Sieve
Agent-orchestrated hybrid retrieval. Not a black-box RAG pipeline: the agent plans the ingestion strategy, chooses a search mode, and iterates with HyDE retries and query refinement. 31 MCP tools, zero incremental cost (all inference runs locally), multi-collection, provenance-rich.
The Differentiator
Most RAG systems treat retrieval as a black box: embed → search → return top-k. Sieve inverts this. The agent thinks about how to search, not just what to search for.
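One way to picture "the agent thinks about how to search" is a query router that picks a mode from surface features of the query. The heuristic and function name below are illustrative assumptions, not Sieve's actual logic:

```python
# Hypothetical sketch of agent-side search-mode selection. The routing
# rules here are invented for illustration; Sieve's agent makes this
# decision with an LLM, not a fixed heuristic.

def choose_search_mode(query: str) -> str:
    """Pick a retrieval strategy from surface features of the query."""
    tokens = query.split()
    # Quoted phrases or identifier-like tokens favor exact keyword match.
    if '"' in query or any(t.isupper() or "_" in t for t in tokens):
        return "keyword"
    # Short, ambiguous queries benefit from semantic expansion.
    if len(tokens) <= 3:
        return "semantic"
    # Otherwise blend both signals.
    return "hybrid"

print(choose_search_mode('"plan_hash" lookup'))  # keyword
print(choose_search_mode("vector db"))           # semantic
print(choose_search_mode("how does the agent pick a retrieval strategy"))  # hybrid
```

The point is not the specific rules but that mode selection happens per query, upstream of any index.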
Architecture
31 MCP tools across ingestion, search, collection management, and provenance. All inference runs locally via Ollama (Snowflake Arctic Embed for embeddings) — zero incremental cost after hardware. SQLite FTS5 handles keyword search; Qdrant handles vector search; HuggingFace cross-encoder handles reranking.
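FTS5 and Qdrant each return a ranked list, so hybrid search needs a fusion step before reranking. A common choice is reciprocal rank fusion (RRF); whether Sieve uses RRF specifically is an assumption, and this is a minimal sketch:

```python
# Reciprocal rank fusion of two ranked result lists, e.g. doc ids from
# SQLite FTS5 (keyword) and Qdrant (vector). RRF is a standard fusion
# method; its use here is illustrative, not confirmed as Sieve's approach.
from collections import defaultdict

def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked lists; larger k dampens the influence of top ranks."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

keyword_hits = ["d3", "d1", "d7"]
vector_hits  = ["d1", "d9", "d3"]
print(rrf([keyword_hits, vector_hits]))  # d1 and d3 rise to the top
```

Fused candidates would then go to the cross-encoder for final reranking, which is too expensive to run over the whole corpus but cheap over a short fused list.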
```
# Tool categories (31 total)
ingestion/    ingest_document, chunk_document, embed_chunks, ...
search/       semantic_search, keyword_search, hybrid_search, hyde_search, ...
collections/  create_collection, list_collections, collection_stats, ...
provenance/   get_plan_hash, get_source_metadata, trace_chunk_origin, ...
```

Key Properties
Zero incremental cost
All embedding and reranking run locally. Only the orchestrating LLM has API cost.
Multi-collection
Separate collections per domain, corpus, or temporal scope. Agent selects based on query.
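A toy version of collection selection might match query terms against per-collection descriptions. The collection names and scoring heuristic below are invented for illustration; the real agent reasons over collection metadata:

```python
# Hypothetical collection routing by keyword overlap with collection
# descriptions. Names and the scoring rule are assumptions, not Sieve's API.

COLLECTIONS = {
    "api-docs":  "endpoints, parameters, authentication, sdk",
    "papers":    "retrieval, embeddings, reranking, evaluation",
    "changelog": "release, version, fix, deprecated",
}

def pick_collection(query: str) -> str:
    """Choose the collection whose description shares the most query terms."""
    q = set(query.lower().split())
    overlap = lambda desc: len(q & set(desc.replace(",", " ").split()))
    return max(COLLECTIONS, key=lambda name: overlap(COLLECTIONS[name]))

print(pick_collection("which release deprecated the old fix"))  # changelog
```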
Provenance-rich
Every result traces back to plan_hash, model_version, bounding boxes, and section paths.
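A plan hash like this is typically a stable digest of the ingestion plan, so identical plans always produce identical hashes. The field names below mirror the ones listed above; the hashing scheme itself is an assumption:

```python
# Sketch of a provenance record. plan_hash is derived from a canonical
# JSON encoding of the ingestion plan; the exact scheme Sieve uses is
# an assumption, but any stable digest gives the same reproducibility.
import hashlib
import json
from dataclasses import dataclass

def plan_hash(plan: dict) -> str:
    """Stable hash over a canonical (sorted-key) JSON encoding of the plan."""
    canonical = json.dumps(plan, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()[:16]

@dataclass
class ChunkProvenance:
    plan_hash: str
    model_version: str
    section_path: str
    bbox: tuple[float, float, float, float]  # page-space bounding box

plan = {"chunker": "semantic", "max_tokens": 512, "embed_model": "snowflake-arctic-embed"}
prov = ChunkProvenance(plan_hash(plan), "arctic-embed-v1", "2.1/Architecture", (72.0, 100.0, 540.0, 180.0))
print(prov.plan_hash)  # same plan always yields the same hash
```

Because the hash covers the whole plan, any change to chunking or embedding settings produces a different plan_hash, which is what lets a result be traced to the exact ingestion configuration that produced it.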
Client-orchestrated
The agent decides retrieval strategy — not a fixed pipeline with knobs to turn.
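The HyDE retry loop mentioned above can be sketched as: search, and if the top result is weak, generate a hypothetical answer document and search with that instead. All functions here are stubs standing in for Sieve's MCP search tools; the threshold and retry count are illustrative:

```python
# Minimal HyDE retry loop. `search` and `generate_hypothetical` are stubs;
# in Sieve these would be MCP tool calls (e.g. hyde_search) and an LLM call.

def hyde_retry(query, search, generate_hypothetical, min_score=0.5, max_tries=3):
    """Retry with a hypothetical answer document while results stay weak."""
    results = search(query)
    tries = 1
    while tries < max_tries and (not results or results[0][1] < min_score):
        # A hypothetical answer often lands closer in embedding space
        # than the terse original query does.
        query = generate_hypothetical(query)
        results = search(query)
        tries += 1
    return results

# Stub demo: the first query scores poorly, the rewritten one scores well.
scores = {"short q": [("d1", 0.2)], "expanded hypothetical answer": [("d2", 0.9)]}
search = lambda q: scores.get(q, [])
rewrite = lambda q: "expanded hypothetical answer"
print(hyde_retry("short q", search, rewrite))  # [('d2', 0.9)]
```

Because the loop lives in the agent rather than the server, the agent can also change search mode or collection between retries instead of only rewriting the query.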