Oboyu Architecture Overview
System Design Philosophy
Oboyu (覚ゆ) is designed with a clear architectural vision: to provide powerful semantic search for local documents with exceptional Japanese language support. The system embraces simplicity, privacy, and efficiency while offering advanced search capabilities.
Core Architectural Principles
- Local-First: All processing occurs on the user's machine with no data sent externally
- Modular Design: Clean separation of concerns with distinct components
- Japanese Excellence: First-class support for Japanese throughout the system
- Flexibility: Support for multiple search methodologies (vector, BM25, hybrid)
- Minimal Dependencies: Self-contained system with few external requirements
Component Architecture
Oboyu is built around three primary components, each with distinct responsibilities:
- Crawler: Discovers and extracts documents from the file system
- Indexer: Processes documents and builds search indexes
- Query Engine: Handles search requests and returns relevant results
Component Overview
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│     Crawler     │───▶│     Indexer     │───▶│  Query Engine   │
│                 │    │                 │    │                 │
│ • Discovery     │    │ • Processing    │    │ • Vector Search │
│ • Extraction    │    │ • Embedding     │    │ • BM25 Search   │
│ • Japanese      │    │ • Storage       │    │ • Hybrid Search │
│   Processing    │    │ • Change        │    │ • Reranking     │
│                 │    │   Detection     │    │                 │
└─────────────────┘    └─────────────────┘    └─────────────────┘
         │                      │                      │
         ▼                      ▼                      ▼
┌───────────────────────────────────────────────────────────────┐
│                        DuckDB Database                        │
│ ┌──────────────┐ ┌──────────────┐ ┌─────────────────────────┐ │
│ │ file_metadata│ │    chunks    │ │       embeddings        │ │
│ │              │ │              │ │                         │ │
│ │ • path       │ │ • content    │ │ • vector                │ │
│ │ • metadata   │ │ • language   │ │ • similarity search     │ │
│ │ • checksums  │ │ • metadata   │ │   (VSS extension)       │ │
│ └──────────────┘ └──────────────┘ └─────────────────────────┘ │
└───────────────────────────────────────────────────────────────┘
Data Flow
The system follows a straightforward data flow:
Document Sources → Crawler → Indexer → Database
                                           │
                                           ↓
                        User Query → Query Engine → Results
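The flow above can be sketched as three small functions. This is an illustrative toy, not Oboyu's code: `crawl`, `index_documents`, and `query` are hypothetical names, the "database" is a plain list, and matching is naive substring search standing in for vector or BM25 search.

```python
from __future__ import annotations

from pathlib import Path


def crawl(root: Path, suffixes=(".txt", ".md")) -> dict[str, str]:
    """Discover files under root and extract their text (hypothetical crawler)."""
    return {
        str(p): p.read_text(encoding="utf-8")
        for p in sorted(root.rglob("*"))
        if p.is_file() and p.suffix in suffixes
    }


def index_documents(docs: dict[str, str], chunk_size: int = 200) -> list[dict]:
    """Split each document into fixed-size chunks; the list stands in for the database."""
    database = []
    for path, text in docs.items():
        for start in range(0, len(text), chunk_size):
            database.append({"path": path, "content": text[start:start + chunk_size]})
    return database


def query(database: list[dict], term: str) -> list[dict]:
    """Return chunks containing the term (a stand-in for real search)."""
    return [chunk for chunk in database if term in chunk["content"]]
```

The point of the sketch is the separation: the crawler only reads files, the indexer only writes chunks, and the query side only reads what the indexer stored.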
Technology Stack
- Core Language: Python 3.8+ for cross-platform compatibility
- Database: DuckDB, with the VSS extension for vector similarity search and full-text search (FTS) for term indexing
- Embedding Models: Ruri v3 (cl-nagoya/ruri-v3-30m) with Japanese optimization
- Reranker Models: Ruri Cross-Encoder (cl-nagoya/ruri-reranker-small) for result refinement
- Japanese Processing: MeCab morphological analyzer via fugashi library
- Search Algorithms: Vector search (HNSW), BM25, and hybrid approaches
- ONNX Optimization: Automatic model conversion for 2-4x inference speedup
- CLI Framework: Typer with Rich for interactive command-line interface
- MCP Integration: Model Context Protocol server for AI assistant integration
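The stack lists hybrid search without saying how vector and BM25 results are merged. One common technique is reciprocal rank fusion (RRF); the sketch below illustrates it as an assumption, not as Oboyu's confirmed method. The function name, the document IDs, and `k=60` (a conventional default) are all illustrative.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Each list contributes 1 / (k + rank) for every document it contains.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


vector_hits = ["doc_a", "doc_b", "doc_c"]  # hypothetical vector-search order
bm25_hits = ["doc_b", "doc_d", "doc_a"]    # hypothetical BM25 order
fused = reciprocal_rank_fusion([vector_hits, bm25_hits])
```

RRF uses only ranks, so it needs no calibration between cosine similarities and BM25 scores, which live on different scales.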
Database Schema Overview
Oboyu uses a carefully designed DuckDB schema optimized for semantic search:
Core Tables
- file_metadata: File information, checksums, and processing metadata
- chunks: Document segments with content, language detection, and metadata
- embeddings: Vector representations with VSS extension for similarity search
BM25 Search Tables
- vocabulary: Term vocabulary with IDF scores
- inverted_index: Term-to-document mappings with TF scores
- document_stats: Document length and term count statistics
- collection_stats: Collection-wide statistics for BM25 scoring
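To show how the four BM25 tables fit together, here is a minimal in-memory sketch in which plain dicts stand in for the tables. It assumes pre-tokenized input (in Oboyu, tokens would come from MeCab), the standard BM25 formula with a Lucene-style IDF, and conventional parameters `k1=1.2` and `b=0.75`. The function names and the exact formula variant are assumptions, not taken from Oboyu's source.

```python
import math
from collections import Counter, defaultdict


def build_bm25_index(docs):
    """docs maps doc_id -> list of tokens. Returns dict stand-ins for the four tables."""
    inverted_index = defaultdict(dict)  # term -> {doc_id: term frequency}
    document_stats = {}                 # doc_id -> document length in tokens
    for doc_id, tokens in docs.items():
        document_stats[doc_id] = len(tokens)
        for term, tf in Counter(tokens).items():
            inverted_index[term][doc_id] = tf
    n_docs = len(docs)
    collection_stats = {
        "doc_count": n_docs,
        "avg_doc_length": sum(document_stats.values()) / max(n_docs, 1),
    }
    # Lucene-style IDF, which stays positive even for very common terms.
    vocabulary = {
        term: math.log(1 + (n_docs - len(postings) + 0.5) / (len(postings) + 0.5))
        for term, postings in inverted_index.items()
    }
    return vocabulary, inverted_index, document_stats, collection_stats


def bm25_scores(query_tokens, index, k1=1.2, b=0.75):
    """Score every document that contains at least one query term."""
    vocabulary, inverted_index, document_stats, collection_stats = index
    avgdl = collection_stats["avg_doc_length"]
    scores = defaultdict(float)
    for term in query_tokens:
        for doc_id, tf in inverted_index.get(term, {}).items():
            # Length normalization: longer-than-average documents are penalized.
            norm = k1 * (1 - b + b * document_stats[doc_id] / avgdl)
            scores[doc_id] += vocabulary[term] * tf * (k1 + 1) / (tf + norm)
    return dict(scores)
```

Precomputing IDF and document lengths at index time, as the table layout suggests, means a query only has to walk the postings of its own terms.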
Meta Tables
- schema_version: Database schema versioning for safe migrations
Key Features
- VSS Extension: Vector similarity search with HNSW indexing
- Full-Text Search: Native DuckDB FTS for exact term matching
- Incremental Updates: Change detection prevents redundant processing
- Schema Migrations: Version-controlled database schema evolution
- Transaction Safety: ACID compliance for reliable updates
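Incremental updates rest on comparing a file's current checksum with the one stored at the last indexing run. A minimal sketch, assuming SHA-256 and a plain dict in place of the file_metadata table; the source only says "checksums", so the hash algorithm and both function names are assumptions:

```python
import hashlib
from pathlib import Path


def file_checksum(path: Path) -> str:
    """SHA-256 of the file contents, read in blocks to keep memory use flat."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for block in iter(lambda: handle.read(65536), b""):
            digest.update(block)
    return digest.hexdigest()


def files_needing_reindex(paths, known_checksums):
    """Return paths that are new or changed, and record their new checksums."""
    changed = []
    for path in paths:
        checksum = file_checksum(path)
        if known_checksums.get(str(path)) != checksum:
            changed.append(path)
            known_checksums[str(path)] = checksum
    return changed
```

Only the returned paths would need to be re-chunked and re-embedded, which matters because embedding is by far the most expensive step.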
Interface Architecture
Command-Line Interface
Oboyu provides a rich CLI with multiple interaction modes:
- Single Commands: Direct file indexing and one-shot queries
- Interactive Mode: Persistent REPL for continuous searching with session state
- Management Commands: Index status checking, differential updates, clearing
MCP Server Mode
The Model Context Protocol (MCP) server enables AI assistant integration:
- Transport Options: stdio, Server-Sent Events (SSE), streamable-http
- Tool Exposure: Search, indexing, index management via standardized protocol
- Session Management: Persistent database connections for multiple queries
- Error Handling: Robust error reporting and recovery
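For orientation, MCP clients invoke tools with JSON-RPC 2.0 `tools/call` requests, and over the stdio transport each message is a single line of JSON. The sketch below builds such a request. The tool name `search_tool` and the argument names `query` and `top_k` are placeholders for illustration; the real names and parameters should be taken from the server's tool listing.

```python
import json


def make_tool_call(request_id, tool_name, arguments):
    """Build a JSON-RPC 2.0 `tools/call` request in the shape MCP uses."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# Placeholder tool and argument names; not confirmed against the Oboyu server.
request = make_tool_call(1, "search_tool", {"query": "機械学習", "top_k": 5})

# ensure_ascii=False keeps the Japanese query readable on the wire.
line = json.dumps(request, ensure_ascii=False)
```

A real client would first complete the MCP `initialize` handshake before sending any tool call; MCP client libraries handle that step.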
API Layers
┌─────────────────────────────────────────────────────────────┐
│                       User Interfaces                       │
├─────────────────────┬─────────────────────┬─────────────────┤
│    CLI Commands     │  Interactive Mode   │   MCP Server    │
├─────────────────────┼─────────────────────┼─────────────────┤
│ • index             │ • /search           │ • search_tool   │
│ • query             │ • /mode             │ • index_tool    │
│ • clear             │ • /settings         │ • clear_tool    │
│ • mcp               │ • /stats            │ • status_tool   │
└─────────────────────┴─────────────────────┴─────────────────┘
                               │
                               ▼
┌─────────────────────────────────────────────────────────────┐
│                         Core Engine                         │
├─────────────────────┬─────────────────────┬─────────────────┤
│       Crawler       │       Indexer       │  Query Engine   │
└─────────────────────┴─────────────────────┴─────────────────┘
Configuration System
The system is configured through a YAML file located at ~/.oboyu/config.yaml, providing extensive customization options while maintaining sensible defaults.
Integration Points
- Command Line Interface: Direct document indexing and querying
- MCP Server: Standard stdio interface for integration with other tools
For detailed information on each component, see: