Oboyu Architecture Overview

System Design Philosophy​

Oboyu (覚ゆ) is designed with a clear architectural vision: to provide powerful semantic search for local documents with exceptional Japanese language support. The system embraces simplicity, privacy, and efficiency while offering advanced search capabilities.

Core Architectural Principles​

  • Local-First: All processing occurs on the user's machine with no data sent externally
  • Modular Design: Clean separation of concerns with distinct components
  • Japanese Excellence: First-class support for Japanese throughout the system
  • Flexibility: Support for multiple search methodologies (vector, BM25, hybrid)
  • Minimal Dependencies: Self-contained system with few external requirements

Component Architecture​

Oboyu is built around three primary components, each with distinct responsibilities:

  1. Crawler: Discovers and extracts documents from the file system
  2. Indexer: Processes documents and builds search indexes
  3. Query Engine: Handles search requests and returns relevant results

Component Overview​

┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│     Crawler     │────│     Indexer     │────│  Query Engine   │
│                 │    │                 │    │                 │
│ • Discovery     │    │ • Processing    │    │ • Vector Search │
│ • Extraction    │    │ • Embedding     │    │ • BM25 Search   │
│ • Japanese      │    │ • Storage       │    │ • Hybrid Search │
│   Processing    │    │ • Change        │    │ • Reranking     │
│                 │    │   Detection     │    │                 │
└─────────────────┘    └─────────────────┘    └─────────────────┘
         │                      │                      │
         ▼                      ▼                      ▼
┌───────────────────────────────────────────────────────────────┐
│                        DuckDB Database                        │
│ ┌──────────────┐ ┌──────────────┐ ┌─────────────────────────┐ │
│ │ file_metadata│ │    chunks    │ │       embeddings        │ │
│ │              │ │              │ │                         │ │
│ │ • path       │ │ • content    │ │ • vector                │ │
│ │ • metadata   │ │ • language   │ │ • similarity search     │ │
│ │ • checksums  │ │ • metadata   │ │   (VSS extension)       │ │
│ └──────────────┘ └──────────────┘ └─────────────────────────┘ │
└───────────────────────────────────────────────────────────────┘

Data Flow​

The system follows a straightforward data flow:

Document Sources → Crawler → Indexer → Database
                                           ↑
                                           ↓
                        User Query → Query Engine → Results

Technology Stack​

  • Core Language: Python 3.8+ for cross-platform compatibility
  • Database: DuckDB, using the VSS extension for vector similarity search and DuckDB full-text search (FTS) for term matching
  • Embedding Models: Ruri v3 (cl-nagoya/ruri-v3-30m) with Japanese optimization
  • Reranker Models: Ruri Cross-Encoder (cl-nagoya/ruri-reranker-small) for result refinement
  • Japanese Processing: MeCab morphological analyzer via fugashi library
  • Search Algorithms: Vector search (HNSW), BM25, and hybrid approaches
  • ONNX Optimization: Automatic model conversion for 2-4x inference speedup
  • CLI Framework: Typer with Rich for interactive command-line interface
  • MCP Integration: Model Context Protocol server for AI assistant integration
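The stack lists hybrid search alongside vector search and BM25, but this overview does not say how the two result lists are combined. As an illustration only, the sketch below uses reciprocal rank fusion (RRF), one common fusion technique; it is not confirmed to be the method Oboyu uses. The function name, the constant k = 60, and the chunk IDs are all assumptions.

```python
from collections import defaultdict
from typing import Dict, List


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Fuse several ranked lists of chunk IDs into a single ranking.

    Each list contributes 1 / (k + rank) to an item's score (rank starts at 1).
    This is a generic technique, assumed here for illustration.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by ID for determinism.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


vector_hits = ["c3", "c1", "c7"]  # hypothetical vector-search order
bm25_hits = ["c1", "c9", "c3"]    # hypothetical BM25 order
fused = reciprocal_rank_fusion([vector_hits, bm25_hits])
print(fused)
```

Chunks that rank well in both lists rise to the top, which is the behavior a hybrid mode is generally expected to provide. A reranker could then reorder the first few fused results.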

Database Schema Overview​

Oboyu uses a carefully designed DuckDB schema optimized for semantic search:

Core Tables​

  • file_metadata: File information, checksums, processing metadata
  • chunks: Document segments with content, language detection, and metadata
  • embeddings: Vector representations with VSS extension for similarity search
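To make the role of the embeddings table concrete, here is a minimal sketch of what a similarity lookup computes. It scans every vector exhaustively, whereas an HNSW index answers the same question approximately and much faster. The cosine metric, the two-dimensional toy vectors, and the function names are assumptions for illustration.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: List[float], embeddings: Dict[str, List[float]], k: int = 2) -> List[Tuple[str, float]]:
    """Return the k chunk IDs most similar to the query, best first."""
    scored = [(cid, cosine_similarity(query, vec)) for cid, vec in embeddings.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy stand-in for rows of the embeddings table (chunk ID -> vector).
embeddings = {"c1": [1.0, 0.0], "c2": [0.0, 1.0], "c3": [0.7, 0.7]}
results = top_k([1.0, 0.1], embeddings, k=2)
print(results)
```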

BM25 Search Tables​

  • vocabulary: Term vocabulary with IDF scores
  • inverted_index: Term-to-document mappings with TF scores
  • document_stats: Document length and term count statistics
  • collection_stats: Collection-wide statistics for BM25 scoring
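The four tables above map onto the inputs of the standard BM25 formula: IDF per term, term frequency per document, document length, and the collection average length. The sketch below shows how they could fit together, using in-memory dictionaries in place of the tables. The column layout, the IDF variant, and the parameters k1 = 1.2 and b = 0.75 are assumptions, not Oboyu's confirmed values.

```python
import math
from typing import Dict, List

# Toy stand-ins for the BM25 tables; contents are illustrative only.
inverted_index: Dict[str, Dict[str, int]] = {  # term -> {doc_id: term frequency}
    "検索": {"d1": 2, "d2": 1},
    "日本語": {"d1": 1},
}
document_stats: Dict[str, int] = {"d1": 6, "d2": 4, "d3": 5}  # doc_id -> length in tokens
collection_stats = {"doc_count": 3, "avg_doc_length": 5.0}


def idf(term: str) -> float:
    """Inverse document frequency (a common smoothed variant, assumed here)."""
    df = len(inverted_index.get(term, {}))
    n = collection_stats["doc_count"]
    return math.log((n - df + 0.5) / (df + 0.5) + 1.0)


# Precomputed IDF per term, mirroring the role of the vocabulary table.
vocabulary: Dict[str, float] = {term: idf(term) for term in inverted_index}


def bm25_scores(query_terms: List[str], k1: float = 1.2, b: float = 0.75) -> Dict[str, float]:
    """Score every document that contains at least one query term."""
    avgdl = collection_stats["avg_doc_length"]
    scores: Dict[str, float] = {}
    for term in query_terms:
        term_idf = vocabulary.get(term)
        if term_idf is None:
            continue  # term not in the index
        for doc_id, tf in inverted_index[term].items():
            # Length normalization: longer-than-average documents are penalized.
            norm = 1.0 - b + b * document_stats[doc_id] / avgdl
            scores[doc_id] = scores.get(doc_id, 0.0) + term_idf * tf * (k1 + 1.0) / (tf + k1 * norm)
    return scores


scores = bm25_scores(["検索", "日本語"])
print(sorted(scores.items(), key=lambda item: item[1], reverse=True))
```

The query terms here are already tokenized. In practice Japanese text would first pass through a morphological analyzer such as MeCab so that the index terms and the query terms match.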

Meta Tables​

  • schema_version: Database schema versioning for safe migrations

Key Features​

  • VSS Extension: Vector similarity search with HNSW indexing
  • Full-Text Search: Native DuckDB FTS for exact term matching
  • Incremental Updates: Change detection prevents redundant processing
  • Schema Migrations: Version-controlled database schema evolution
  • Transaction Safety: ACID compliance for reliable updates
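Incremental updates rely on comparing a file's current checksum with the one stored in file_metadata. A minimal sketch of that comparison follows. The choice of SHA-256, the function names, and the sample paths are assumptions; the source states only that checksums are stored.

```python
import hashlib
from typing import Dict, List


def content_checksum(data: bytes) -> str:
    """Hex digest of the file content (SHA-256 is an assumed choice)."""
    return hashlib.sha256(data).hexdigest()


def files_to_reindex(current: Dict[str, bytes], stored: Dict[str, str]) -> List[str]:
    """Return paths whose content is new or has changed since the last index run."""
    changed = []
    for path, data in current.items():
        if stored.get(path) != content_checksum(data):
            changed.append(path)
    return sorted(changed)


# Checksums as they might be recorded from a previous run (hypothetical paths).
stored_checksums = {
    "notes/a.md": content_checksum(b"hello"),
    "notes/b.md": content_checksum(b"old text"),
}
# Current file contents discovered by a crawl.
current_files = {
    "notes/a.md": b"hello",       # unchanged, skipped
    "notes/b.md": b"new text",    # modified, reprocessed
    "notes/c.md": b"brand new",   # new file, processed
}
pending = files_to_reindex(current_files, stored_checksums)
print(pending)
```

Only the paths returned need chunking and embedding again, which is where most indexing time is spent.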

Interface Architecture​

Command-Line Interface​

Oboyu provides a rich CLI with multiple interaction modes:

  • Single Commands: Direct file indexing and one-shot queries
  • Interactive Mode: Persistent REPL for continuous searching with session state
  • Management Commands: Index status checking, differential updates, clearing

MCP Server Mode​

The Model Context Protocol (MCP) server enables AI assistant integration:

  • Transport Options: stdio, Server-Sent Events (SSE), streamable-http
  • Tool Exposure: Search, indexing, index management via standardized protocol
  • Session Management: Persistent database connections for multiple queries
  • Error Handling: Robust error reporting and recovery
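MCP messages are JSON-RPC 2.0, and a client invokes a server tool with the tools/call method. The sketch below builds such a request for a search. The tool name search_tool and the argument names query and top_k are assumptions for illustration; the real tool schema should be read from the server's tools/list response.

```python
import json
from typing import Any, Dict


def build_tool_call(request_id: int, tool_name: str, arguments: Dict[str, Any]) -> str:
    """Serialize a JSON-RPC 2.0 `tools/call` request as used by MCP."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    # ensure_ascii=False keeps Japanese query text readable in the payload.
    return json.dumps(message, ensure_ascii=False)


# Hypothetical tool name and arguments.
request = build_tool_call(1, "search_tool", {"query": "日本語 検索", "top_k": 5})
print(request)
```

Over the stdio transport, a client writes this message to the server process's standard input and reads the response from its standard output.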

API Layers​

┌─────────────────────────────────────────────────────────────┐
│                       User Interfaces                       │
├─────────────────────┬─────────────────────┬─────────────────┤
│    CLI Commands     │  Interactive Mode   │   MCP Server    │
├─────────────────────┼─────────────────────┼─────────────────┤
│ • index             │ • /search           │ • search_tool   │
│ • query             │ • /mode             │ • index_tool    │
│ • clear             │ • /settings         │ • clear_tool    │
│ • mcp               │ • /stats            │ • status_tool   │
└─────────────────────┴─────────────────────┴─────────────────┘
                               │
                               ▼
┌─────────────────────────────────────────────────────────────┐
│                         Core Engine                         │
├─────────────────────┬─────────────────────┬─────────────────┤
│       Crawler       │       Indexer       │  Query Engine   │
└─────────────────────┴─────────────────────┴─────────────────┘

Configuration System​

The system is configured through a YAML file located at ~/.oboyu/config.yaml, providing extensive customization options while maintaining sensible defaults.
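"Sensible defaults" with user overrides usually means the values parsed from the YAML file are merged over a built-in default tree. The sketch below shows one way to do that merge. Every key and value is hypothetical, not Oboyu's actual configuration schema, and the YAML parsing step is omitted so that the example needs only the standard library.

```python
import copy
from typing import Any, Dict

# Hypothetical defaults; the real keys in config.yaml may differ.
DEFAULTS: Dict[str, Any] = {
    "indexer": {"chunk_size": 1024, "chunk_overlap": 256},
    "query": {"mode": "hybrid", "top_k": 5},
}


def merge_config(defaults: Dict[str, Any], overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Recursively overlay user overrides on defaults without mutating either input."""
    merged = copy.deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# What a parsed user config file might contain (hypothetical).
user_config = {"query": {"top_k": 10}}
config = merge_config(DEFAULTS, user_config)
print(config)
```

With this approach a user file only needs to list the settings it changes; everything else falls back to the defaults.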

Integration Points​

  • Command Line Interface: Direct document indexing and querying
  • MCP Server: Model Context Protocol interface (stdio, SSE, or streamable-http transport) for integration with AI assistants and other tools

For detailed information on each component, see: