Oboyu Reranker Guide
Overview
The Oboyu reranker feature significantly improves search result quality by applying Cross-Encoder models to re-score and reorder initial retrieval results. This is particularly beneficial for Retrieval-Augmented Generation (RAG) applications where the quality of retrieved context directly impacts the generated output.
What is Reranking?
Reranking is a two-stage retrieval process:
- Initial Retrieval: Fast vector/BM25/hybrid search retrieves a larger set of candidates (e.g., 30-60 documents)
- Reranking: A more sophisticated Cross-Encoder model re-scores these candidates for better relevance
This approach combines the efficiency of vector search with the accuracy of Cross-Encoder models.
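The two stages can be sketched in a few lines of plain Python. This is an illustration of the pattern only, not Oboyu's implementation: `retrieve_then_rerank`, `overlap`, and `jaccard` are hypothetical names, and the two toy scorers merely stand in for a fast retriever and a Cross-Encoder.

```python
from typing import Callable, List, Tuple

Scorer = Callable[[str, str], float]


def retrieve_then_rerank(
    query: str,
    corpus: List[str],
    cheap_score: Scorer,
    precise_score: Scorer,
    top_k: int = 5,
    multiplier: int = 3,
) -> List[Tuple[str, float]]:
    """Keep top_k * multiplier candidates with a cheap scorer, then re-score them."""
    n_candidates = top_k * multiplier
    # Stage 1: fast, approximate ranking over the whole corpus.
    candidates = sorted(corpus, key=lambda d: cheap_score(query, d), reverse=True)
    candidates = candidates[:n_candidates]
    # Stage 2: slower, more precise scoring over the short list only.
    rescored = [(doc, precise_score(query, doc)) for doc in candidates]
    rescored.sort(key=lambda pair: pair[1], reverse=True)
    return rescored[:top_k]


def _words(text: str) -> set:
    return set(text.lower().split())


def overlap(query: str, doc: str) -> float:
    """Toy stage-1 scorer (hypothetical): number of shared words."""
    return float(len(_words(query) & _words(doc)))


def jaccard(query: str, doc: str) -> float:
    """Toy stage-2 scorer (hypothetical): shared words over all distinct words."""
    q, d = _words(query), _words(doc)
    return len(q & d) / len(q | d) if q | d else 0.0


corpus = [
    "design principles of the system",
    "system design",
    "cooking recipes",
    "principles of cooking and design and many other unrelated things",
]
results = retrieve_then_rerank(
    "system design principles", corpus, overlap, jaccard, top_k=2, multiplier=2
)
for doc, score in results:
    print(f"{score:.2f}  {doc}")
```

The second stage only ever sees the short candidate list, which is why a slower scorer is affordable there.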
Supported Models
Oboyu supports the following Ruri reranker models:
- cl-nagoya/ruri-reranker-small (default): Lightweight model offering an excellent balance of performance and resource usage
- cl-nagoya/ruri-v3-reranker-310m: Heavy model with superior accuracy for quality-focused applications
Both models are specifically optimized for Japanese text while maintaining good multilingual capabilities.
Model Comparison
Model | Size | Memory Usage | Speed | Accuracy | Best For |
---|---|---|---|---|---|
cl-nagoya/ruri-reranker-small | ~100M params | ~400MB | Fast (20-40ms) | Excellent | General use, real-time applications |
cl-nagoya/ruri-v3-reranker-310m | 310M params | ~1.2GB | Moderate (50-100ms) | Superior | Quality-focused, batch processing |
Recommendation: The default ruri-reranker-small model provides the best balance for most use cases. It offers excellent accuracy with significantly lower resource requirements than the 310m model (roughly 67% less memory, ~400MB versus ~1.2GB).
Key Features
1. ONNX Optimization
- Automatic conversion to ONNX format for 2-4x faster CPU inference
- Lazy model loading to minimize startup time
- Persistent model caching in XDG-compliant directories
2. Flexible Integration
- Works with all search modes (vector, BM25, hybrid)
- Optional feature that can be enabled/disabled per query
- Configurable top-k multiplier for initial retrieval
3. Performance Benefits
- Hit Rate: 4-10% improvement in retrieval accuracy
- MRR (Mean Reciprocal Rank): 20%+ improvement in result ranking
- Japanese Queries: Significant enhancement over embedding-only search
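For readers unfamiliar with the two metrics, here is a minimal sketch of how Hit Rate and MRR are commonly computed. The function names and the sample data are made up for illustration; this is not the code behind the figures above.

```python
from typing import Sequence, Set


def hit_rate_at_k(ranked: Sequence[Sequence[str]], relevant: Sequence[Set[str]], k: int) -> float:
    """Fraction of queries with at least one relevant document in the top k."""
    if not ranked:
        return 0.0
    hits = sum(1 for docs, rel in zip(ranked, relevant) if any(d in rel for d in docs[:k]))
    return hits / len(ranked)


def mean_reciprocal_rank(ranked: Sequence[Sequence[str]], relevant: Sequence[Set[str]]) -> float:
    """Average of 1 / rank of the first relevant document (0 when none is found)."""
    if not ranked:
        return 0.0
    total = 0.0
    for docs, rel in zip(ranked, relevant):
        for rank, doc_id in enumerate(docs, start=1):
            if doc_id in rel:
                total += 1.0 / rank
                break
    return total / len(ranked)


# Two queries: the first finds its relevant document at rank 2, the second misses.
ranked = [["a", "b", "c"], ["d", "e", "f"]]
relevant = [{"b"}, {"x"}]
print(hit_rate_at_k(ranked, relevant, k=3))
print(mean_reciprocal_rank(ranked, relevant))
```

Hit Rate only asks whether a relevant document made the cut, while MRR rewards placing it near the top, which is where reordering by a reranker tends to show up.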
Configuration
Global Configuration
Add reranker settings to your ~/.config/oboyu/config.yaml:
indexer:
  # Reranker settings
  reranker_model: "cl-nagoya/ruri-reranker-small"  # Model to use
  use_reranker: true                               # Enable by default
  reranker_use_onnx: true                          # Use ONNX optimization
  reranker_device: "cpu"                           # Device (cpu/cuda)
  reranker_top_k_multiplier: 3                     # Retrieve 3x candidates
  reranker_batch_size: 8                           # Batch size
  reranker_max_length: 512                         # Max sequence length
  reranker_threshold: null                         # Score threshold (optional)
Configuration Options
Option | Description | Default |
---|---|---|
reranker_model | Model name or path | cl-nagoya/ruri-reranker-small |
use_reranker | Enable reranking by default | true |
reranker_use_onnx | Use ONNX optimization | true |
reranker_device | Device for inference | cpu |
reranker_top_k_multiplier | Multiplier for initial retrieval | 3 |
reranker_batch_size | Batch size for reranking | 8 |
reranker_max_length | Maximum input length | 512 |
reranker_threshold | Minimum score threshold | null |
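A quick sanity check before writing settings to the config file can catch typos early. The sketch below is hypothetical and not part of Oboyu: `check_reranker_settings` is an invented helper, the defaults are copied from the table above, and the value ranges it enforces are assumptions rather than rules documented by the library.

```python
from typing import Any, Dict, List

# Defaults copied from the options table above.
DEFAULTS: Dict[str, Any] = {
    "reranker_model": "cl-nagoya/ruri-reranker-small",
    "use_reranker": True,
    "reranker_use_onnx": True,
    "reranker_device": "cpu",
    "reranker_top_k_multiplier": 3,
    "reranker_batch_size": 8,
    "reranker_max_length": 512,
    "reranker_threshold": None,
}


def check_reranker_settings(overrides: Dict[str, Any]) -> List[str]:
    """Merge overrides onto DEFAULTS and return a list of human-readable problems."""
    problems = [f"unknown option: {key}" for key in overrides if key not in DEFAULTS]
    merged = {**DEFAULTS, **overrides}
    # Assumption: only the two devices named in this guide are accepted.
    if merged["reranker_device"] not in ("cpu", "cuda"):
        problems.append("reranker_device must be 'cpu' or 'cuda'")
    # Assumption: these three options are positive integers.
    for key in ("reranker_top_k_multiplier", "reranker_batch_size", "reranker_max_length"):
        value = merged[key]
        if isinstance(value, bool) or not isinstance(value, int) or value < 1:
            problems.append(f"{key} must be a positive integer")
    threshold = merged["reranker_threshold"]
    if threshold is not None and (
        isinstance(threshold, bool) or not isinstance(threshold, (int, float))
    ):
        problems.append("reranker_threshold must be a number or null")
    return problems


print(check_reranker_settings({"reranker_device": "tpu", "reranker_batch_size": 0}))
```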
Usage
Command Line Interface
# Search with reranking (uses config default)
oboyu query --query "システムの設計原則について"
# Explicitly enable reranking
oboyu query --query "design principles" --rerank
# Disable reranking for this query
oboyu query --query "quick lookup" --no-rerank
# Rerank with custom top-k
oboyu query --query "重要な概念" --rerank --top-k 5
Python API
from oboyu.indexer.indexer import Indexer
from oboyu.indexer.config import IndexerConfig
# Initialize with reranker enabled
config = IndexerConfig(config_dict={
    "indexer": {
        "db_path": "oboyu.db",
        "use_reranker": True,
        "reranker_model": "cl-nagoya/ruri-reranker-small",
    }
})
indexer = Indexer(config=config)
# Search with reranking
results = indexer.search(
    query="日本語の文書検索",
    limit=5,
    use_reranker=True,  # Or None to use config default
)
# Search without reranking
results = indexer.search(
    query="quick search",
    limit=10,
    use_reranker=False,
)
Custom Reranker
from oboyu.indexer.indexer import Indexer
from oboyu.indexer.reranker import BaseReranker, create_reranker
# Create a custom reranker
reranker = create_reranker(
    model_name="cl-nagoya/ruri-reranker-small",
    use_onnx=True,
    device="cpu",
    batch_size=16,
)
# Use with indexer (config is the IndexerConfig built in the Python API example)
indexer = Indexer(config=config, reranker=reranker)
How Reranking Works
1. Initial Retrieval Phase
Query → Embedding → Vector Search → Top 15 candidates (if top_k=5, multiplier=3)
2. Reranking Phase
Query + Each Candidate → Cross-Encoder → Relevance Score → Re-order → Top 5 results
3. Architecture Comparison
Bi-Encoder (Embedding Model):
- Encodes query and documents separately
- Fast but less accurate for nuanced matching
- Used in initial retrieval
Cross-Encoder (Reranker):
- Processes query-document pairs together
- Slower but more accurate
- Captures fine-grained semantic relationships
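The difference in calling pattern can be shown with toy stand-ins. Nothing here is a real model: `encode` plays the role of a Bi-Encoder (each text is encoded on its own, so document vectors can be built once at index time), and `cross_score` plays the role of a Cross-Encoder (it needs the query and the document together). Both functions are hypothetical and chosen only to make the contrast visible.

```python
import math
from collections import Counter


def encode(text: str) -> Counter:
    """Bi-Encoder stand-in: encodes one text in isolation as a bag of words."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def bigrams(text: str) -> set:
    words = text.lower().split()
    return set(zip(words, words[1:]))


def cross_score(query: str, doc: str) -> float:
    """Cross-Encoder stand-in: scores the pair jointly, here by shared word order."""
    return float(len(bigrams(query) & bigrams(doc)))


docs = ["dog bites man", "man bites dog"]
doc_vectors = [encode(d) for d in docs]  # computed once, reusable for every query

query = "man bites dog"
query_vector = encode(query)
bi_scores = [cosine(query_vector, v) for v in doc_vectors]
cross_scores = [cross_score(query, d) for d in docs]  # one call per (query, document) pair

print(bi_scores)     # both documents look identical to the bag-of-words encoder
print(cross_scores)  # the pairwise scorer separates them
```

The separate encoding is what makes the first stage fast, and the per-pair scoring is what makes the second stage both slower and more discriminating.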
Performance Considerations
Speed vs. Accuracy Trade-offs
Configuration | Speed | Accuracy | Use Case |
---|---|---|---|
No reranking | Fastest | Good | High-volume, latency-sensitive |
Small model + ONNX | Fast | Better | Balanced performance |
310m model + ONNX | Moderate | Best | Quality-focused RAG |
310m model + PyTorch | Slower | Best | GPU environments |
Optimization Tips
- Batch Size: Larger batches improve throughput but increase memory use and per-batch latency
- Top-k Multiplier: Higher values improve recall but increase processing time
- ONNX: Always use for CPU deployments (2-4x speedup)
- Model Selection: Use small model for real-time applications
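To make the batch-size trade-off concrete, the sketch below splits a candidate list into batches and counts the resulting forward passes. `chunk` is a hypothetical helper, not an Oboyu function; how the library batches internally is not described in this guide.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def chunk(items: Sequence[T], batch_size: int) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# 15 candidates, as in the top_k=5, multiplier=3 example.
candidates: List[str] = [f"doc-{i}" for i in range(15)]
for size in (4, 8, 16):
    sizes = [len(batch) for batch in chunk(candidates, size)]
    print(f"batch_size={size}: {len(sizes)} forward passes, batch sizes {sizes}")
```

With 15 candidates, a batch size above 15 brings no further reduction in passes, so raising it only pays off together with a larger top-k multiplier.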
Resource Requirements
Memory Usage (Approximate):
- 310m model: ~1.2GB (ONNX) / ~1.5GB (PyTorch)
- Small model: ~400MB (ONNX) / ~500MB (PyTorch)
Processing Time (per query, 15 candidates):
- 310m + ONNX: ~50-100ms
- 310m + PyTorch: ~150-300ms
- Small + ONNX: ~20-40ms
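These numbers depend on hardware, so it is worth measuring on your own machine. The helper below is a generic timing sketch using only the standard library; `measure_latency_ms` is a hypothetical name, and the lambda is a placeholder for whatever call you want to time (for example a search with reranking enabled).

```python
import statistics
import time
from typing import Callable, Dict


def measure_latency_ms(
    run_query: Callable[[], object], repeats: int = 20, warmup: int = 1
) -> Dict[str, float]:
    """Time repeated calls and report median and worst-case latency in milliseconds."""
    for _ in range(warmup):
        run_query()  # absorb one-off costs such as lazy model loading
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        samples.append((time.perf_counter() - start) * 1000.0)
    return {"median_ms": statistics.median(samples), "max_ms": max(samples)}


# Placeholder workload; replace the lambda with the query call you want to time.
stats = measure_latency_ms(lambda: sum(range(10_000)))
print(stats)
```

The warm-up call matters here because models are loaded lazily, so the first query would otherwise dominate the measurement.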
Benchmarking
Run the reranking benchmark to evaluate performance:
# Run benchmark with test queries
python -m bench.benchmark_reranking \
    --db-path oboyu.db \
    --test-queries test_queries.json \
    --output results.json
# Custom configuration
python -m bench.benchmark_reranking \
    --config reranker_config.yaml \
    --db-path oboyu.db \
    --test-queries queries.json \
    --top-k 5 10 20 \
    --initial-k 60
Model Cache Management
ONNX models are cached for faster subsequent loads:
~/.cache/oboyu/embedding/cache/models/onnx/
├── cl-nagoya_ruri-v3-reranker-310m/
│   ├── model.onnx
│   ├── model_optimized.onnx
│   ├── tokenizer_config.json
│   └── onnx_config.json
└── cl-nagoya_ruri-reranker-small/
    └── ...
To clear the cache:
rm -rf ~/.cache/oboyu/embedding/cache/models/onnx/
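If you prefer to inspect the cache before deleting it, a small script can list the cached models first. This is a sketch using only the standard library: `clear_onnx_cache` is a hypothetical helper, and the default path simply mirrors the one shown above (it may differ if your cache directory is configured elsewhere).

```python
import shutil
from pathlib import Path
from typing import List

# Path taken from the listing above; adjust it if your cache lives elsewhere.
DEFAULT_ONNX_CACHE = Path.home() / ".cache" / "oboyu" / "embedding" / "cache" / "models" / "onnx"


def clear_onnx_cache(cache_dir: Path = DEFAULT_ONNX_CACHE, dry_run: bool = True) -> List[str]:
    """Return the cached model directory names; delete them unless dry_run is set."""
    if not cache_dir.is_dir():
        return []
    model_dirs = sorted(p for p in cache_dir.iterdir() if p.is_dir())
    if not dry_run:
        for path in model_dirs:
            shutil.rmtree(path)
    return [p.name for p in model_dirs]
```

Calling `clear_onnx_cache()` only lists the cached models; pass `dry_run=False` to delete them. Unlike the `rm -rf` command, this removes the per-model directories and leaves the `onnx/` directory itself in place.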
Troubleshooting
Common Issues
- Slow First Query: Models are loaded lazily on first use. Subsequent queries will be faster.
- Out of Memory: Reduce batch size or disable reranking temporarily:
  reranker_batch_size: 4
  # Or disable for this query: oboyu query --query "text" --no-rerank
- ONNX Conversion Fails: Disable ONNX optimization:
  reranker_use_onnx: false
Debug Mode
Enable debug logging to see reranking details:
import logging
logging.getLogger("oboyu.indexer.reranker").setLevel(logging.DEBUG)
Best Practices
- For RAG Applications: Always enable reranking for context retrieval
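- For Japanese Content: Both Ruri models are optimized for Japanese; the default small model suits most cases, and the 310m model gives the best accuracy when resources allow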
- For Mixed Language: The models work well for multilingual content
- Initial Retrieval: Set multiplier based on result diversity needs (3-5x is typical)
- Threshold Setting: Use threshold to filter low-confidence results in critical applications
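As an illustration of the threshold idea, the sketch below drops reranked results whose score falls under a cutoff. It is not Oboyu's code: `apply_threshold` is a hypothetical helper, the scores are made up, and whether the library treats the boundary as inclusive is an assumption. The score scale depends on the model, so a useful cutoff has to be found empirically.

```python
from typing import List, Optional, Tuple


def apply_threshold(
    results: List[Tuple[str, float]], threshold: Optional[float]
) -> List[Tuple[str, float]]:
    """Keep results scoring at or above threshold; None disables filtering."""
    if threshold is None:
        return list(results)
    # Assumption: a score equal to the threshold is kept.
    return [(doc, score) for doc, score in results if score >= threshold]


# Made-up reranker output, highest score first.
reranked = [("doc-a", 0.91), ("doc-b", 0.42), ("doc-c", 0.07)]
kept = apply_threshold(reranked, 0.4)
print(kept)
```

A threshold can return fewer results than the requested top-k, or none at all, so downstream code should handle an empty list.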
Future Enhancements
- Support for custom Cross-Encoder models
- GPU acceleration for batch processing
- Async reranking for better concurrency
- Fine-tuning support for domain-specific ranking