Creating Your First Index

This guide will walk you through creating your first document index with Oboyu. In just a few minutes, you'll have a searchable collection of your documents.

What is an Index?

An index is Oboyu's searchable database of your documents. When you create an index, Oboyu:

Discovers documents in specified directories
Extracts and processes text content
Creates semantic embeddings for intelligent search
Builds a fast search structure

Quick Start: Index a Directory

The simplest way to create an index is to point Oboyu at a directory:

oboyu index ~/Documents/my-notes

This command will:

Scan the my-notes directory for supported files
Process all documents found
Create a searchable index

Supported File Types

Oboyu currently supports plain text files and automatically recognizes these file types:

Text Documents: .txt, .md, .markdown
Code Files: .py, .js, .java, .cpp, etc.
Web Documents: .html, .htm
Configuration: .json, .yaml, .xml

Note: Binary files like .pdf and .docx are not currently supported.
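Before you index a directory, you can check how its contents compare with this list by counting its files per extension. The sketch below is a hypothetical helper, `count_by_extension`, built only on standard tools (`find`, `sed`, `sort`, `uniq`). It is not part of Oboyu and does not reflect how Oboyu actually detects file types.

```shell
# Hypothetical helper (not part of Oboyu): count regular files per extension
# so you can compare a directory's contents with the supported types above.
count_by_extension() {
    dir="$1"
    # List regular files, keep only the file name, then extract the extension.
    # Names without a dot are reported as "(none)".
    find "$dir" -type f | sed 's|.*/||' | while IFS= read -r name; do
        case "$name" in
            *.*) printf '%s\n' "${name##*.}" ;;
            *)   printf '(none)\n' ;;
        esac
    done | sort | uniq -c | sort -rn
}

# Example:
#   count_by_extension ~/Documents/my-notes
```

Extensions are compared case-sensitively, so `.MD` and `.md` are counted separately.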

Monitoring Progress

During indexing, you'll see a progress display showing:

Directory scanning progress
Documents being processed
Final summary with total files and chunks indexed

To reduce screen output for large collections, use:

oboyu index ~/Documents --quiet-progress

Basic Indexing Examples

Index Multiple Directories

oboyu index ~/Documents/projects ~/Documents/notes ~/Documents/research

Index with a Custom Database Path

Specify a custom database location:

oboyu index ~/Documents/work-docs --db-path ~/my-indexes/work.db

Later, search using this specific database:

oboyu search "meeting notes" --db-path ~/my-indexes/work.db

Index Specific File Types

Focus on particular file types using include patterns:

oboyu index ~/Documents --include-patterns "*.md" --include-patterns "*.txt"
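To preview which files these two include patterns would select, you can run the same globs through `find -name`. This assumes Oboyu matches include patterns against file names using shell-style globbing, which the example suggests but the text does not state, so treat the result as an approximation. `preview_includes` is a hypothetical helper.

```shell
# Hypothetical helper (not part of Oboyu): list files whose names match
# "*.md" or "*.txt", approximating the --include-patterns example above.
preview_includes() {
    find "$1" -type f \( -name '*.md' -o -name '*.txt' \) | sort
}

# Example:
#   preview_includes ~/Documents
```

The patterns are quoted for the same reason they are quoted in the `oboyu` command: otherwise your shell would expand them against the current directory before the tool sees them.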

Exclude Directories

Skip certain folders using exclude patterns:

oboyu index ~/Documents --exclude-patterns "*/archive/*" --exclude-patterns "*/temp/*"
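An overly broad exclude pattern can silently drop documents, so it can help to list what the patterns cover first. This sketch assumes Oboyu treats exclude patterns as shell-style globs matched against the full path, which the examples suggest but the text does not state. `preview_excludes` is a hypothetical helper.

```shell
# Hypothetical helper (not part of Oboyu): list files matching "*/archive/*"
# or "*/temp/*". With find -path, * also matches "/", so each pattern applies
# to the whole path and catches those folders at any depth.
preview_excludes() {
    find "$1" -type f \( -path '*/archive/*' -o -path '*/temp/*' \) | sort
}

# Example:
#   preview_excludes ~/Documents
```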

Understanding Index Output

After indexing completes, you'll see a summary like:

Indexed 156 files (234 chunks) in 45.2s

This tells you:

Files: Number of documents processed
Chunks: Number of text segments created for search
Time: Total processing time
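Two derived numbers are often more useful than the raw totals: average chunks per file and files processed per second. The hypothetical helper below extracts them with `awk`, assuming the summary line has exactly the format shown above; the wording may differ between Oboyu versions, in which case the helper prints nothing.

```shell
# Hypothetical helper (not part of Oboyu): derive averages from a summary line
# such as "Indexed 156 files (234 chunks) in 45.2s".
summarize_index_run() {
    printf '%s\n' "$1" | awk '
        /^Indexed [0-9]+ files \([0-9]+ chunks\) in [0-9.]+s$/ {
            files = $2 + 0
            chunks = substr($4, 2) + 0    # strip the leading "("
            secs = $7
            sub(/s$/, "", secs)           # strip the trailing "s"
            secs = secs + 0
            if (files > 0) printf "chunks per file: %.2f\n", chunks / files
            if (secs > 0)  printf "files per second: %.2f\n", files / secs
        }'
}

# Example:
#   summarize_index_run "Indexed 156 files (234 chunks) in 45.2s"
```

For the example summary this reports 1.50 chunks per file and 3.45 files per second.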

You can now search your documents:

oboyu search "your search terms"

Best Practices for Indexing

1. Start Small

Begin with a focused directory to understand how Oboyu works:

oboyu index ~/Documents/current-project

2. Organize Your Documents

Structure your files logically before indexing:

~/Documents/
├── projects/
│   ├── project-a/
│   └── project-b/
├── github-issues/
└── research/

3. Use Separate Database Files

Create separate indices for different purposes using custom database paths:

oboyu index ~/work-docs --db-path ~/indexes/work.db
oboyu index ~/personal-notes --db-path ~/indexes/personal.db
oboyu index ~/research-papers --db-path ~/indexes/research.db

4. Regular Updates

Keep your index current by re-indexing periodically (Oboyu performs incremental updates by default):

oboyu index ~/Documents

Checking Index Status

Check the status of what would be indexed:

# Check what files would be processed
oboyu manage status

# Check differences (what would be updated)
oboyu manage diff

The index database is stored at ~/.oboyu/oboyu.db by default, or at the path specified with --db-path.
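To confirm which database file exists and roughly how large it is, a plain file check is enough. `show_index_db` is a hypothetical helper that only inspects the file on disk; it does not read the database or call Oboyu.

```shell
# Hypothetical helper (not part of Oboyu): report whether an index database
# file exists, defaulting to the documented location ~/.oboyu/oboyu.db.
show_index_db() {
    db="${1:-$HOME/.oboyu/oboyu.db}"
    if [ -f "$db" ]; then
        printf 'index database: %s (%s KB)\n' "$db" "$(du -k "$db" | cut -f1)"
    else
        printf 'no index database at %s\n' "$db"
        return 1
    fi
}

# Examples:
#   show_index_db                         # default location
#   show_index_db ~/my-indexes/work.db    # a database created with --db-path
```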

Incremental Indexing

By default, Oboyu updates the index incrementally, which saves time on repeat runs. To rebuild the entire index instead, add --force:

# Incremental indexing (default behavior)
oboyu index ~/Documents

# Force full reindex
oboyu index ~/Documents --force
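This guide does not say how Oboyu decides which files changed. A common technique is to compare file modification times against the previous run; the sketch below illustrates that idea with a marker file and `find -newer`. It is only an illustration of the concept, not Oboyu's actual change detection, and `changed_since` is a hypothetical name.

```shell
# Hypothetical helper (not part of Oboyu): list files modified after a marker
# file was last touched. Illustrates mtime-based change detection only.
changed_since() {
    marker="$1"
    dir="$2"
    find "$dir" -type f -newer "$marker" | sort
}

# Typical pattern: list what changed, then refresh the marker.
#   changed_since ~/.last-index ~/Documents
#   touch ~/.last-index
```

Modification times are cheap to check but can miss edits that preserve a file's timestamp, which is one reason a tool might also offer a full rebuild such as --force.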

Handling Large Document Collections

For large collections (10,000+ files):

1. Index in Batches

oboyu index ~/Documents/2023 --db-path ~/indexes/docs-2023.db
oboyu index ~/Documents/2024 --db-path ~/indexes/docs-2024.db
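When a collection is already split into subdirectories (by year, for example), you can generate one batch per subdirectory instead of typing each command. The hypothetical helper below only prints `oboyu index` commands using the documented --db-path option, so you can review the plan before running it. It assumes directory names contain no single quotes.

```shell
# Hypothetical helper (not part of Oboyu): print one "oboyu index" command per
# immediate subdirectory of SRC, each writing to OUT/docs-<name>.db.
plan_batches() {
    src="$1"
    out="$2"
    for d in "$src"/*/; do
        [ -d "$d" ] || continue          # skip if the glob matched nothing
        name=$(basename "$d")
        printf "oboyu index '%s' --db-path '%s'\n" "${d%/}" "$out/docs-$name.db"
    done
}

# Review the plan, then run it:
#   plan_batches ~/Documents ~/indexes
#   plan_batches ~/Documents ~/indexes | sh
```

Files that sit directly in the source directory, outside any subdirectory, are not covered by this plan.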

2. Adjust Chunk Settings for Performance

# Adjust chunk size (larger chunks mean fewer chunks to embed)
oboyu index ~/Documents --chunk-size 1024

# Set chunk overlap so text that spans a chunk boundary stays searchable
oboyu index ~/Documents --chunk-overlap 100
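To reason about how these two settings interact, assume a simple sliding window: each chunk holds up to chunk-size units, and each new chunk starts (chunk-size - chunk-overlap) units after the previous one. This guide does not say whether Oboyu measures chunks in characters or tokens, or whether it splits exactly this way, so treat the result as a rough estimate. Under that assumption, a document of length L yields 1 chunk if L <= size, and otherwise 1 + ceil((L - size) / (size - overlap)) chunks. `estimate_chunks` is a hypothetical helper.

```shell
# Estimate the chunk count for a sliding-window splitter. This window model is
# an assumption, not necessarily how Oboyu splits text.
estimate_chunks() {
    length=$1
    size=$2
    overlap=$3
    step=$((size - overlap))
    if [ "$step" -le 0 ]; then
        echo "overlap must be smaller than chunk size" >&2
        return 1
    fi
    if [ "$length" -le "$size" ]; then
        echo 1
    else
        # Integer ceiling of (length - size) / step, plus the first chunk.
        echo $((1 + (length - size + step - 1) / step))
    fi
}
```

For a 3,000-unit document, `estimate_chunks 3000 1024 100` gives 4 while `estimate_chunks 3000 512 100` gives 8: halving the chunk size roughly doubles the number of chunks to embed.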

3. Use Minimal Progress Output

# Reduce screen output for faster processing
oboyu index ~/large-collection --quiet-progress

Common Indexing Scenarios

Text Documents and Notes

oboyu index ~/Papers --include-patterns "*.txt" --include-patterns "*.md" --db-path ~/indexes/research.db

Software Documentation

oboyu index ~/dev/docs --include-patterns "*.md" --include-patterns "*.rst" --db-path ~/indexes/dev-docs.db

Meeting Notes

oboyu index ~/OneDrive/MeetingNotes --db-path ~/indexes/meetings.db

Mixed Language Documents

# Oboyu automatically detects Japanese content
oboyu index ~/Documents/日本語資料 --db-path ~/indexes/japanese-docs.db

Troubleshooting

Index Creation Fails

If indexing fails, check:

Permissions: Ensure you have read access to all files
Disk Space: Indices typically need 10-20% of source document size
File Corruption: Corrupted files are skipped automatically
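The first two checks can be run before you start indexing. This hypothetical pre-flight sketch turns the 10-20% rule of thumb into a concrete range for a given source size and lists files you cannot read. Both helpers are illustrations, not Oboyu features.

```shell
# Hypothetical pre-flight helpers (not part of Oboyu).

# Print the 10-20% disk estimate for a source size given in KB.
estimate_index_kb() {
    src_kb=$1
    printf '%s-%s KB\n' $((src_kb / 10)) $((src_kb / 5))
}

# List files under a directory that the current user cannot read.
# Unreadable directories are reported by find itself on stderr.
list_unreadable() {
    find "$1" -type f | while IFS= read -r f; do
        [ -r "$f" ] || printf '%s\n' "$f"
    done
}

# Examples:
#   estimate_index_kb "$(du -sk ~/Documents | cut -f1)"
#   list_unreadable ~/Documents
```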

Slow Indexing

To speed up indexing:

Close other applications to free up resources
Use SSD storage for better performance
Index smaller directories separately

Missing Documents

If some documents aren't indexed:

# Check which files were skipped
oboyu index ~/Documents --verbose
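A common reason for missing documents is a file type that Oboyu does not recognize. The hypothetical helper below lists files whose extensions are not among those named under "Supported File Types". Because the code-file list there ends in "etc.", this only flags candidates to double-check against the --verbose output; it does not know Oboyu's real list.

```shell
# Hypothetical helper (not part of Oboyu): list files whose extensions are not
# among the examples given in "Supported File Types".
list_unlisted_types() {
    find "$1" -type f | while IFS= read -r f; do
        case "$f" in
            *.txt|*.md|*.markdown|*.py|*.js|*.java|*.cpp|*.html|*.htm|*.json|*.yaml|*.xml) ;;
            *) printf '%s\n' "$f" ;;
        esac
    done
}

# Example:
#   list_unlisted_types ~/Documents
```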

Next Steps

Now that you've created your first index, you're ready to:

Execute Your First Search - Learn how to search your indexed documents
Basic Workflows - Discover daily usage patterns
