Creating Your First Index

This guide will walk you through creating your first document index with Oboyu. In just a few minutes, you'll have a searchable collection of your documents.

What is an Index?

An index is Oboyu's searchable database of your documents. When you create an index, Oboyu:

  • Discovers documents in specified directories
  • Extracts and processes text content
  • Creates semantic embeddings for intelligent search
  • Builds a fast search structure

Quick Start: Index a Directory

The simplest way to create an index is to point Oboyu at a directory:

oboyu index ~/Documents/my-notes

This command will:

  1. Scan the my-notes directory for supported files
  2. Process all documents found
  3. Create a searchable index

Supported File Types

Oboyu currently indexes plain-text files only. It automatically recognizes these file types:

  • Text Documents: .txt, .md, .markdown
  • Code Files: .py, .js, .java, .cpp, etc.
  • Web Documents: .html, .htm
  • Configuration: .json, .yaml, .xml

Note: Binary files like .pdf and .docx are not currently supported.
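
Before indexing, it can help to preview which files in a directory have one of the extensions listed above. The sketch below uses only the standard find utility and does not call Oboyu. The helper name list_indexable is hypothetical, and the extension list is only the subset named in this guide, so Oboyu may recognize more types than this preview shows.

```shell
# Hypothetical helper: list files whose extensions appear in this guide's
# supported-types list. This is a preview only; it does not invoke Oboyu.
list_indexable() {
    dir="$1"
    find "$dir" -type f \( \
        -name '*.txt' -o -name '*.md' -o -name '*.markdown' \
        -o -name '*.html' -o -name '*.htm' \
        -o -name '*.json' -o -name '*.yaml' -o -name '*.xml' \
        -o -name '*.py' -o -name '*.js' -o -name '*.java' -o -name '*.cpp' \
        \)
}
```

For example, list_indexable ~/Documents/my-notes | wc -l gives a rough count of candidate files.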

Monitoring Progress

During indexing, you'll see a progress display showing:

  • Directory scanning progress
  • Documents being processed
  • Final summary with total files and chunks indexed

To reduce screen output for large collections, use:

oboyu index ~/Documents --quiet-progress

Basic Indexing Examples

Index Multiple Directories

oboyu index ~/Documents/projects ~/Documents/notes ~/Documents/research

Index with a Custom Database Path

Specify a custom database location:

oboyu index ~/Documents/work-docs --db-path ~/my-indexes/work.db

Later, search using this specific database:

oboyu search "meeting notes" --db-path ~/my-indexes/work.db

Index Specific File Types

Focus on particular file types using include patterns:

oboyu index ~/Documents --include-patterns "*.md" --include-patterns "*.txt"

Exclude Directories

Skip certain folders using exclude patterns:

oboyu index ~/Documents --exclude-patterns "*/archive/*" --exclude-patterns "*/temp/*"
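
The include and exclude examples above can be wrapped in a small shell function for reuse. This is a sketch that assumes both kinds of pattern may be combined in a single invocation, which this guide does not state explicitly. The function name index_notes_only is hypothetical.

```shell
# Hypothetical wrapper: index only Markdown and text files, skipping
# archive/ and temp/ folders. Assumes --include-patterns and
# --exclude-patterns can be used together in one command.
index_notes_only() {
    oboyu index "$1" \
        --include-patterns "*.md" --include-patterns "*.txt" \
        --exclude-patterns "*/archive/*" --exclude-patterns "*/temp/*"
}
```

Call it as index_notes_only ~/Documents. The patterns are quoted so the shell passes them to Oboyu unexpanded.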

Understanding Index Output

After indexing completes, you'll see a summary like:

Indexed 156 files (234 chunks) in 45.2s

This tells you:

  • Files: Number of documents processed
  • Chunks: Number of text segments created for search
  • Time: Total processing time

You can now search your documents:

oboyu search "your search terms"

Best Practices for Indexing

1. Start Small

Begin with a focused directory to understand how Oboyu works:

oboyu index ~/Documents/current-project

2. Organize Your Documents

Structure your files logically before indexing:

~/Documents/
├── projects/
│   ├── project-a/
│   └── project-b/
├── github-issues/
└── research/

3. Use Separate Database Files

Create separate indices for different purposes using custom database paths:

oboyu index ~/work-docs --db-path ~/indexes/work.db
oboyu index ~/personal-notes --db-path ~/indexes/personal.db
oboyu index ~/research-papers --db-path ~/indexes/research.db

4. Regular Updates

Keep your index current by re-indexing periodically (Oboyu performs incremental updates by default):

oboyu index ~/Documents

Checking Index Status

Check the status of what would be indexed:

# Check what files would be processed
oboyu manage status

# Check differences (what would be updated)
oboyu manage diff

The index database is stored at ~/.oboyu/oboyu.db by default, or at the path specified with --db-path.

Incremental Indexing

Oboyu supports incremental updates to save time and performs them by default:

# Incremental indexing (default behavior)
oboyu index ~/Documents

# Force full reindex
oboyu index ~/Documents --force

Handling Large Document Collections

For large collections (10,000+ files):

1. Index in Batches

oboyu index ~/Documents/2023 --db-path ~/indexes/docs-2023.db
oboyu index ~/Documents/2024 --db-path ~/indexes/docs-2024.db
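
When there are many subdirectories, the per-batch commands above can be generated with a loop. The following is a minimal sketch: index_each_subdir is a hypothetical helper, not part of Oboyu, and the docs-NAME.db naming simply mirrors the example above.

```shell
# Hypothetical helper: index each immediate subdirectory of a base
# directory into its own database file under out_dir.
index_each_subdir() {
    base="$1"      # e.g. ~/Documents
    out_dir="$2"   # e.g. ~/indexes
    mkdir -p "$out_dir"
    for sub in "$base"/*/; do
        [ -d "$sub" ] || continue      # the glob matched nothing
        name=$(basename "$sub")
        # Stop at the first failure so a broken batch is noticed early.
        oboyu index "${sub%/}" --db-path "$out_dir/docs-$name.db" || return 1
    done
}
```

For example, index_each_subdir ~/Documents ~/indexes would run one oboyu index command per subdirectory of ~/Documents.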

2. Adjust Chunk Settings for Performance

# Set the chunk size (larger chunks mean fewer chunks to process)
oboyu index ~/Documents --chunk-size 1024

# Set the chunk overlap so context carries across chunk boundaries
oboyu index ~/Documents --chunk-overlap 100
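
To get a feel for how chunk size and overlap interact, the sketch below estimates how many chunks one document would produce under a simple sliding-window scheme. This is an assumption for illustration only: Oboyu's actual chunking strategy and the unit of these settings (characters or tokens) are not specified here. The function name estimate_chunks is hypothetical.

```shell
# Hypothetical estimate: number of chunks for a document of a given length,
# assuming a sliding window that advances by (size - overlap) each step.
# Oboyu's real chunker may split differently.
estimate_chunks() {
    length="$1"; size="$2"; overlap="$3"
    if [ "$overlap" -ge "$size" ]; then
        echo "overlap must be smaller than chunk size" >&2
        return 1
    fi
    if [ "$length" -le 0 ]; then echo 0; return 0; fi
    if [ "$length" -le "$size" ]; then echo 1; return 0; fi
    step=$((size - overlap))
    # Ceiling division of (length - overlap) by step.
    echo $(( (length - overlap + step - 1) / step ))
}
```

Under these assumptions, estimate_chunks 10000 1024 100 prints 11. A larger overlap shrinks the step and so increases the chunk count for the same document.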

3. Use Minimal Progress Output

# Reduce screen output for faster processing
oboyu index ~/large-collection --quiet-progress

Common Indexing Scenarios

Text Documents and Notes

oboyu index ~/Papers --include-patterns "*.txt" --include-patterns "*.md" --db-path ~/indexes/research.db

Software Documentation

oboyu index ~/dev/docs --include-patterns "*.md" --include-patterns "*.rst" --db-path ~/indexes/dev-docs.db

Meeting Notes

oboyu index ~/OneDrive/MeetingNotes --db-path ~/indexes/meetings.db

Mixed Language Documents

# Oboyu automatically detects Japanese content
oboyu index ~/Documents/日本語資料 --db-path ~/indexes/japanese-docs.db

Troubleshooting

Index Creation Fails

If indexing fails, check:

  1. Permissions: Ensure you have read access to all files
  2. Disk Space: Indices typically need 10-20% of the total size of the source documents
  3. File Corruption: Corrupted files are skipped automatically
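
The disk-space rule of thumb above can be turned into a quick estimate. The sketch below applies the 10-20% range to a size in kilobytes; estimate_index_kb is a hypothetical helper, and the result is only as accurate as the rule of thumb itself.

```shell
# Hypothetical helper: print the low and high index-size estimates (in KB)
# for a source collection of the given size in KB, using the 10-20% range.
estimate_index_kb() {
    source_kb="$1"
    echo "$((source_kb * 10 / 100)) $((source_kb * 20 / 100))"
}
```

To apply it to a real directory, pass in the output of du, for example estimate_index_kb "$(du -sk ~/Documents | cut -f1)", and compare the result with the free space reported by df -k.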

Slow Indexing

To speed up indexing:

  • Close other applications to free up resources
  • Use SSD storage for better performance
  • Index smaller directories separately

Missing Documents

If some documents aren't indexed:

# Check which files were skipped
oboyu index ~/Documents --verbose
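
One common reason for missing documents is that they are in an unsupported binary format. The sketch below lists .pdf and .docx files, the two unsupported formats named earlier in this guide, using the standard find utility. The helper name list_unsupported is hypothetical, and other unsupported formats may exist that it does not catch.

```shell
# Hypothetical helper: list files in the binary formats this guide names
# as unsupported (.pdf and .docx), matched case-insensitively.
list_unsupported() {
    find "$1" -type f \( -iname '*.pdf' -o -iname '*.docx' \)
}
```

Running list_unsupported ~/Documents shows files that would need converting to plain text before Oboyu can index them.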

Next Steps

Now that you've created your first index, you're ready to: