Sieve
Agent-orchestrated hybrid retrieval. Not a black-box RAG pipeline: the agent plans the ingestion strategy, chooses a search mode, and iterates with HyDE retries and query refinement. 31 MCP tools, zero incremental cost (all inference runs locally), multi-collection, provenance-rich.
The Differentiator
Most RAG systems treat retrieval as a black box: embed → search → return top-k. Sieve inverts this. The agent thinks about how to search, not just what to search for.
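One way to picture "the agent thinks about how to search" is a query router that picks a mode from surface features of the query. The heuristic and function name below are illustrative assumptions, not Sieve's actual logic:

```python
# Hypothetical sketch of agent-side search-mode selection. The routing
# rules here are invented for illustration; Sieve's agent makes this
# decision with an LLM, not a fixed heuristic.

def choose_search_mode(query: str) -> str:
    """Pick a retrieval strategy from surface features of the query."""
    tokens = query.split()
    # Quoted phrases or identifier-like tokens favor exact keyword match.
    if '"' in query or any(t.isupper() or "_" in t for t in tokens):
        return "keyword"
    # Short, ambiguous queries benefit from semantic expansion.
    if len(tokens) <= 3:
        return "semantic"
    # Otherwise blend both signals.
    return "hybrid"

print(choose_search_mode('"plan_hash" lookup'))  # keyword
print(choose_search_mode("vector db"))           # semantic
print(choose_search_mode("how does the agent pick a retrieval strategy"))  # hybrid
```

The point is not the specific rules but that mode selection happens per query, upstream of any index.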
Architecture
31 MCP tools across ingestion, search, collection management, and provenance. All inference runs locally via Ollama (Snowflake Arctic Embed for embeddings) — zero incremental cost after hardware. SQLite FTS5 handles keyword search; Qdrant handles vector search; HuggingFace cross-encoder handles reranking.
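FTS5 and Qdrant each return a ranked list, so hybrid search needs a fusion step before reranking. A common choice is reciprocal rank fusion (RRF); whether Sieve uses RRF specifically is an assumption, and this is a minimal sketch:

```python
# Reciprocal rank fusion of two ranked result lists, e.g. doc ids from
# SQLite FTS5 (keyword) and Qdrant (vector). RRF is a standard fusion
# method; its use here is illustrative, not confirmed as Sieve's approach.
from collections import defaultdict

def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked lists; larger k dampens the influence of top ranks."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

keyword_hits = ["d3", "d1", "d7"]
vector_hits  = ["d1", "d9", "d3"]
print(rrf([keyword_hits, vector_hits]))  # d1 and d3 rise to the top
```

Fused candidates would then go to the cross-encoder for final reranking, which is too expensive to run over the whole corpus but cheap over a short fused list.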
```
# Tool categories (31 total)
ingestion/    ingest_document, chunk_document, embed_chunks, ...
search/       semantic_search, keyword_search, hybrid_search, hyde_search, ...
collections/  create_collection, list_collections, collection_stats, ...
provenance/   get_plan_hash, get_source_metadata, trace_chunk_origin, ...
```

Key Properties
Zero incremental cost
All embedding and reranking run locally. Only the orchestrating LLM has API cost.
Multi-collection
Separate collections per domain, corpus, or temporal scope. Agent selects based on query.
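A toy version of collection selection might match query terms against per-collection descriptions. The collection names and scoring heuristic below are invented for illustration; the real agent reasons over collection metadata:

```python
# Hypothetical collection routing by keyword overlap with collection
# descriptions. Names and the scoring rule are assumptions, not Sieve's API.

COLLECTIONS = {
    "api-docs":  "endpoints, parameters, authentication, sdk",
    "papers":    "retrieval, embeddings, reranking, evaluation",
    "changelog": "release, version, fix, deprecated",
}

def pick_collection(query: str) -> str:
    """Choose the collection whose description shares the most query terms."""
    q = set(query.lower().split())
    overlap = lambda desc: len(q & set(desc.replace(",", " ").split()))
    return max(COLLECTIONS, key=lambda name: overlap(COLLECTIONS[name]))

print(pick_collection("which release deprecated the old fix"))  # changelog
```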
Provenance-rich
Every result traces back to plan_hash, model_version, bounding boxes, and section paths.
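A plan hash like this is typically a stable digest of the ingestion plan, so identical plans always produce identical hashes. The field names below mirror the ones listed above; the hashing scheme itself is an assumption:

```python
# Sketch of a provenance record. plan_hash is derived from a canonical
# JSON encoding of the ingestion plan; the exact scheme Sieve uses is
# an assumption, but any stable digest gives the same reproducibility.
import hashlib
import json
from dataclasses import dataclass

def plan_hash(plan: dict) -> str:
    """Stable hash over a canonical (sorted-key) JSON encoding of the plan."""
    canonical = json.dumps(plan, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()[:16]

@dataclass
class ChunkProvenance:
    plan_hash: str
    model_version: str
    section_path: str
    bbox: tuple[float, float, float, float]  # page-space bounding box

plan = {"chunker": "semantic", "max_tokens": 512, "embed_model": "snowflake-arctic-embed"}
prov = ChunkProvenance(plan_hash(plan), "arctic-embed-v1", "2.1/Architecture", (72.0, 100.0, 540.0, 180.0))
print(prov.plan_hash)  # same plan always yields the same hash
```

Because the hash covers the whole plan, any change to chunking or embedding settings produces a different plan_hash, which is what lets a result be traced to the exact ingestion configuration that produced it.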
Client-orchestrated
The agent decides retrieval strategy — not a fixed pipeline with knobs to turn.
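The HyDE retry loop mentioned above can be sketched as: search, and if the top result is weak, generate a hypothetical answer document and search with that instead. All functions here are stubs standing in for Sieve's MCP search tools; the threshold and retry count are illustrative:

```python
# Minimal HyDE retry loop. `search` and `generate_hypothetical` are stubs;
# in Sieve these would be MCP tool calls (e.g. hyde_search) and an LLM call.

def hyde_retry(query, search, generate_hypothetical, min_score=0.5, max_tries=3):
    """Retry with a hypothetical answer document while results stay weak."""
    results = search(query)
    tries = 1
    while tries < max_tries and (not results or results[0][1] < min_score):
        # A hypothetical answer often lands closer in embedding space
        # than the terse original query does.
        query = generate_hypothetical(query)
        results = search(query)
        tries += 1
    return results

# Stub demo: the first query scores poorly, the rewritten one scores well.
scores = {"short q": [("d1", 0.2)], "expanded hypothetical answer": [("d2", 0.9)]}
search = lambda q: scores.get(q, [])
rewrite = lambda q: "expanded hypothetical answer"
print(hyde_retry("short q", search, rewrite))  # [('d2', 0.9)]
```

Because the loop lives in the agent rather than the server, the agent can also change search mode or collection between retries instead of only rewriting the query.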