Oboyu Reranker Guide

Overview

The Oboyu reranker feature significantly improves search result quality by applying Cross-Encoder models to re-score and reorder initial retrieval results. This is particularly beneficial for Retrieval-Augmented Generation (RAG) applications where the quality of retrieved context directly impacts the generated output.

What is Reranking?

Reranking is a two-stage retrieval process:

  1. Initial Retrieval: Fast vector/BM25/hybrid search retrieves a larger set of candidates (e.g., 30-60 documents)
  2. Reranking: A more sophisticated Cross-Encoder model re-scores these candidates for better relevance

This approach combines the efficiency of vector search with the accuracy of Cross-Encoder models.
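The two-stage process just described can be sketched in plain Python. This is a toy illustration, not Oboyu's API: the term-overlap "retrieval" and "cross-encoder" scoring functions are hypothetical stand-ins for real vector search and a real Cross-Encoder model.

```python
# Toy two-stage retrieval: a cheap first pass shortlists candidates,
# then a more expensive scorer re-ranks only that shortlist.

def first_pass(query, corpus, k):
    """Cheap stand-in for vector/BM25 retrieval: score by term overlap."""
    terms = set(query.lower().split())
    scored = [(doc, len(terms & set(doc.lower().split()))) for doc in corpus]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [doc for doc, _ in scored[:k]]

def cross_encoder_score(query, doc):
    """Hypothetical reranker: fraction of query terms found in the document."""
    terms = query.lower().split()
    return sum(t in doc.lower() for t in terms) / len(terms)

def search(query, corpus, top_k=2, multiplier=3):
    candidates = first_pass(query, corpus, top_k * multiplier)   # stage 1
    reranked = sorted(candidates,
                      key=lambda d: cross_encoder_score(query, d),
                      reverse=True)                              # stage 2
    return reranked[:top_k]

corpus = [
    "vector search is fast",
    "cross encoders score query document pairs",
    "reranking improves search quality",
    "unrelated cooking recipe",
]
print(search("reranking search quality", corpus))
```

The key point is that the expensive scorer only ever sees `top_k * multiplier` documents, so its cost stays bounded regardless of corpus size.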

Supported Models

Oboyu supports the following Ruri reranker models:

  • cl-nagoya/ruri-reranker-small (default): Lightweight model offering excellent balance of performance and resource usage
  • cl-nagoya/ruri-v3-reranker-310m: Heavy model with superior accuracy for quality-focused applications

Both models are specifically optimized for Japanese text while maintaining good multilingual capabilities.

Model Comparison

| Model | Size | Memory Usage | Speed | Accuracy | Best For |
|---|---|---|---|---|---|
| cl-nagoya/ruri-reranker-small | ~100M params | ~400MB | Fast (20-40ms) | Excellent | General use, real-time applications |
| cl-nagoya/ruri-v3-reranker-310m | 310M params | ~1.2GB | Moderate (50-100ms) | Superior | Quality-focused, batch processing |

Recommendation: The default ruri-reranker-small model provides the best balance for most use cases. It offers excellent accuracy with significantly lower resource requirements (~67% memory reduction) compared to the 310m model.

Key Features

1. ONNX Optimization

  • Automatic conversion to ONNX format for 2-4x faster CPU inference
  • Lazy model loading to minimize startup time
  • Persistent model caching in XDG-compliant directories

2. Flexible Integration

  • Works with all search modes (vector, BM25, hybrid)
  • Optional feature that can be enabled/disabled per query
  • Configurable top-k multiplier for initial retrieval

3. Performance Benefits

  • Hit Rate: 4-10% improvement in retrieval accuracy
  • MRR (Mean Reciprocal Rank): 20%+ improvement in result ranking
  • Japanese Queries: Significant enhancement over embedding-only search
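MRR, cited above, averages the reciprocal rank of the first relevant result across queries; a higher value means relevant hits surface closer to the top. A small calculator makes the metric concrete (the sample ranks are made up for illustration):

```python
def mean_reciprocal_rank(first_relevant_ranks):
    """first_relevant_ranks: 1-based rank of the first relevant hit per
    query, or None when no relevant result was retrieved at all."""
    reciprocal = [1.0 / r if r is not None else 0.0
                  for r in first_relevant_ranks]
    return sum(reciprocal) / len(reciprocal)

# Example: over four queries the first relevant hit appeared at
# ranks 1, 2, and 4, and one query retrieved nothing relevant.
print(mean_reciprocal_rank([1, 2, 4, None]))   # (1 + 0.5 + 0.25 + 0) / 4
```

Because each query contributes `1/rank`, moving a relevant document from rank 4 to rank 1 has a much larger effect than moving it from rank 10 to rank 7, which is why reranking tends to improve MRR more than raw hit rate.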

Configuration

Global Configuration

Add reranker settings to your ~/.config/oboyu/config.yaml:

indexer:
  # Reranker settings
  reranker_model: "cl-nagoya/ruri-reranker-small"  # Model to use
  use_reranker: true             # Enable by default
  reranker_use_onnx: true        # Use ONNX optimization
  reranker_device: "cpu"         # Device (cpu/cuda)
  reranker_top_k_multiplier: 3   # Retrieve 3x candidates
  reranker_batch_size: 8         # Batch size
  reranker_max_length: 512       # Max sequence length
  reranker_threshold: null       # Score threshold (optional)

Configuration Options

| Option | Description | Default |
|---|---|---|
| reranker_model | Model name or path | cl-nagoya/ruri-reranker-small |
| use_reranker | Enable reranking by default | true |
| reranker_use_onnx | Use ONNX optimization | true |
| reranker_device | Device for inference | cpu |
| reranker_top_k_multiplier | Multiplier for initial retrieval | 3 |
| reranker_batch_size | Batch size for reranking | 8 |
| reranker_max_length | Maximum input length | 512 |
| reranker_threshold | Minimum score threshold | null |

Usage

Command Line Interface

# Search with reranking (uses config default)
oboyu query --query "システムの設計原則について"

# Explicitly enable reranking
oboyu query --query "design principles" --rerank

# Disable reranking for this query
oboyu query --query "quick lookup" --no-rerank

# Rerank with custom top-k
oboyu query --query "重要な概念" --rerank --top-k 5

Python API

from oboyu.indexer.indexer import Indexer
from oboyu.indexer.config import IndexerConfig

# Initialize with reranker enabled
config = IndexerConfig(config_dict={
    "indexer": {
        "db_path": "oboyu.db",
        "use_reranker": True,
        "reranker_model": "cl-nagoya/ruri-reranker-small",
    }
})
indexer = Indexer(config=config)

# Search with reranking
results = indexer.search(
    query="日本語の文書検索",
    limit=5,
    use_reranker=True,  # Or None to use config default
)

# Search without reranking
results = indexer.search(
    query="quick search",
    limit=10,
    use_reranker=False,
)

Custom Reranker

from oboyu.indexer.reranker import BaseReranker, create_reranker

# Create a custom reranker
reranker = create_reranker(
    model_name="cl-nagoya/ruri-reranker-small",
    use_onnx=True,
    device="cpu",
    batch_size=16,
)

# Use with indexer
indexer = Indexer(config=config, reranker=reranker)

How Reranking Works

1. Initial Retrieval Phase

Query → Embedding → Vector Search → Top 15 candidates (if top_k=5, multiplier=3)

2. Reranking Phase

Query + Each Candidate → Cross-Encoder → Relevance Score → Re-order → Top 5 results

3. Architecture Comparison

Bi-Encoder (Embedding Model):

  • Encodes query and documents separately
  • Fast but less accurate for nuanced matching
  • Used in initial retrieval

Cross-Encoder (Reranker):

  • Processes query-document pairs together
  • Slower but more accurate
  • Captures fine-grained semantic relationships
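The interface difference between the two architectures can be shown with toy scorers. The "embedding" and scoring functions below are made-up stand-ins for learned models; the point is only the shape of the computation: the bi-encoder compares two independently produced vectors, while the cross-encoder sees the query-document pair together.

```python
import math

# Bi-encoder: query and document are encoded separately; relevance is a
# similarity between the two fixed vectors.
def encode(text):
    """Toy 'embedding': normalized character-frequency vector over a-z."""
    vec = [0.0] * 26
    for ch in text.lower():
        if ch.isalpha():
            vec[ord(ch) - ord('a')] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def bi_encoder_score(query, doc):
    q, d = encode(query), encode(doc)          # encoded independently
    return sum(a * b for a, b in zip(q, d))    # cosine similarity

# Cross-encoder: the pair is scored jointly, so the scorer can look at
# interactions between query terms and document terms directly.
def cross_encoder_score(query, doc):
    q_terms = set(query.lower().split())
    d_terms = set(doc.lower().split())
    return len(q_terms & d_terms) / max(len(q_terms), 1)

print(bi_encoder_score("search quality", "quality search"))
print(cross_encoder_score("search quality", "quality search"))
```

A real Cross-Encoder feeds the concatenated pair through a transformer, which is what makes it both slower (no precomputed document vectors to reuse) and more accurate (full attention between query and document tokens).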

Performance Considerations

Speed vs. Accuracy Trade-offs

| Configuration | Speed | Accuracy | Use Case |
|---|---|---|---|
| No reranking | Fastest | Good | High-volume, latency-sensitive |
| Small model + ONNX | Fast | Better | Balanced performance |
| 310m model + ONNX | Moderate | Best | Quality-focused RAG |
| 310m model + PyTorch | Slower | Best | GPU environments |

Optimization Tips

  1. Batch Size: Larger batches improve throughput but increase latency
  2. Top-k Multiplier: Higher values improve recall but increase processing time
  3. ONNX: Always use for CPU deployments (2-4x speedup)
  4. Model Selection: Use small model for real-time applications
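A back-of-envelope helper shows how these knobs interact (the per-batch latency figure is illustrative, loosely based on the timings quoted elsewhere in this guide):

```python
import math

def rerank_cost(top_k, multiplier, batch_size, ms_per_batch):
    """Estimate reranking work: candidates scored, batches run, total ms."""
    candidates = top_k * multiplier              # size of initial retrieval
    batches = math.ceil(candidates / batch_size) # cross-encoder forward passes
    return candidates, batches, batches * ms_per_batch

# Defaults from this guide: top_k=5, multiplier=3, batch_size=8.
print(rerank_cost(5, 3, 8, 25))   # 15 candidates, 2 batches
```

Doubling `top_k` or the multiplier grows the candidate pool linearly, while a larger `batch_size` cuts the number of forward passes at the cost of more memory per pass.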

Resource Requirements

Memory Usage (Approximate):

  • 310m model: ~1.2GB (ONNX) / ~1.5GB (PyTorch)
  • Small model: ~400MB (ONNX) / ~500MB (PyTorch)

Processing Time (per query, 15 candidates):

  • 310m + ONNX: ~50-100ms
  • 310m + PyTorch: ~150-300ms
  • Small + ONNX: ~20-40ms

Benchmarking

Run the reranking benchmark to evaluate performance:

# Run benchmark with test queries
python -m bench.benchmark_reranking \
  --db-path oboyu.db \
  --test-queries test_queries.json \
  --output results.json

# Custom configuration
python -m bench.benchmark_reranking \
  --config reranker_config.yaml \
  --db-path oboyu.db \
  --test-queries queries.json \
  --top-k 5 10 20 \
  --initial-k 60

Model Cache Management

ONNX models are cached for faster subsequent loads:

~/.cache/oboyu/embedding/cache/models/onnx/
├── cl-nagoya_ruri-v3-reranker-310m/
│   ├── model.onnx
│   ├── model_optimized.onnx
│   ├── tokenizer_config.json
│   └── onnx_config.json
└── cl-nagoya_ruri-reranker-small/
    └── ...

To clear the cache:

rm -rf ~/.cache/oboyu/embedding/cache/models/onnx/

Troubleshooting

Common Issues

  1. Slow First Query: Models are loaded lazily on first use. Subsequent queries will be faster.

  2. Out of Memory: Reduce batch size or disable reranking temporarily:

    reranker_batch_size: 4
    # Or disable for this query: oboyu query --query "text" --no-rerank
  3. ONNX Conversion Fails: Disable ONNX optimization:

    reranker_use_onnx: false

Debug Mode

Enable debug logging to see reranking details:

import logging
logging.getLogger("oboyu.indexer.reranker").setLevel(logging.DEBUG)

Best Practices

  1. For RAG Applications: Always enable reranking for context retrieval
  2. For Japanese Content: Both models are tuned for Japanese; choose the ruri-v3-reranker-310m model when accuracy outweighs its higher resource cost
  3. For Mixed Language: The models work well for multilingual content
  4. Initial Retrieval: Set multiplier based on result diversity needs (3-5x is typical)
  5. Threshold Setting: Use threshold to filter low-confidence results in critical applications
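The threshold filtering in item 5 amounts to a simple post-filter on reranker scores. A minimal sketch, assuming results arrive as (document, score) pairs (the tuples and cutoff below are illustrative, not Oboyu's internal types):

```python
def apply_threshold(results, threshold):
    """Keep only results whose reranker score meets the threshold."""
    if threshold is None:          # mirrors reranker_threshold: null
        return results
    return [(doc, score) for doc, score in results if score >= threshold]

results = [("doc-a", 0.92), ("doc-b", 0.55), ("doc-c", 0.18)]
print(apply_threshold(results, 0.5))   # drops low-confidence doc-c
```

Note that a threshold can return fewer than `top_k` results, which is usually the right behavior for RAG: passing no context is better than passing irrelevant context.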

Future Enhancements

  • Support for custom Cross-Encoder models
  • GPU acceleration for batch processing
  • Async reranking for better concurrency
  • Fine-tuning support for domain-specific ranking