SpecParse

Status: experimental

Stateful REPL with doer-checker structured extraction. Claude drafts, Codex verifies. Session API with pre-injected document access, sub-LLM routing, and provenance tracking. Built for any domain where you need structured data extracted from unstructured documents with high confidence.

Claude, Codex, MCP, Python, FastAPI

Session Model

SpecParse exposes a session API — open, exec, read, close — that maintains state across extraction passes. When a session opens, the target document is pre-injected into context. Subsequent exec calls operate against that context without re-uploading.

# Session lifecycle
import specparse

# ContractSchema is the caller's typed schema, defined elsewhere
session = specparse.open(doc="contract_2024.pdf")
try:
    result  = session.exec(schema=ContractSchema, mode="extract")
    checked = session.exec(schema=ContractSchema, mode="verify")
    final   = session.read()  # merged, reconciled output
finally:
    session.close()           # cleanup, persist provenance (runs even if a pass fails)
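To make the "state across passes" idea concrete, here is a minimal sketch of how a session object could hold a document and accumulate pass outputs. It is not SpecParse's implementation: `SketchSession`, `open_session`, and the handler signature are hypothetical names chosen for illustration, and the lambdas stand in for real model calls.

```python
from dataclasses import dataclass, field
from typing import Any, Callable

# Hypothetical stand-in for a session; not the real SpecParse API.
Handler = Callable[[str, list[dict[str, Any]]], dict[str, Any]]


@dataclass
class SketchSession:
    document_text: str  # injected once at open time, reused by every pass
    passes: list[dict[str, Any]] = field(default_factory=list)
    closed: bool = False

    def exec(self, handler: Handler) -> dict[str, Any]:
        """Run one pass against the stored document and remember its output."""
        if self.closed:
            raise RuntimeError("session is closed")
        output = handler(self.document_text, self.passes)
        self.passes.append(output)
        return output

    def read(self) -> dict[str, Any]:
        """Merge pass outputs; later passes override earlier ones."""
        merged: dict[str, Any] = {}
        for output in self.passes:
            merged.update(output)
        return merged

    def close(self) -> list[dict[str, Any]]:
        """Mark the session closed and return the pass history."""
        self.closed = True
        return list(self.passes)


def open_session(document_text: str) -> SketchSession:
    return SketchSession(document_text=document_text)


# Made-up document text; each lambda plays the role of one extraction pass.
session = open_session("Supplier: Acme Ltd. Term: 24 months.")
session.exec(lambda doc, prior: {"supplier": "Acme Ltd" if "Acme Ltd" in doc else None})
session.exec(lambda doc, prior: {"term_months": 24 if "24 months" in doc else None})
final = session.read()
history = session.close()
```

Each pass receives the same stored text plus the outputs of earlier passes, so a later pass can build on or challenge an earlier one without the document being sent again.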

Doer-Checker Pattern

Two-pass verification with different models. Claude (doer) performs the initial extraction against a typed schema. Codex (checker) independently verifies the extraction against the source document, flagging discrepancies with confidence scores and source citations.

DOER: Claude

Extracts structured data from the document against a typed schema. Optimized for recall: capture everything.

CHECKER: Codex

Verifies each extracted field against source. Flags hallucinations, adds confidence scores, cites provenance.
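The two roles above can be sketched without any model calls. In this illustration both functions are stubs: `doer_extract` returns a hard-coded draft (including one value that is not in the source), and `checker_verify` uses a plain substring search as a stand-in for the checker model. `FieldCheck`, the 1.0/0.0 confidence values, and the character-offset citation are all assumptions, not SpecParse's actual output format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FieldCheck:
    name: str
    value: str
    supported: bool                      # True if the value was found in the source
    confidence: float                    # placeholder score, not a calibrated one
    citation: Optional[tuple[int, int]]  # (start, end) character offsets, if found


def doer_extract(document: str) -> dict[str, str]:
    """Stub doer: a recall-oriented draft with one unsupported field."""
    return {"supplier": "Acme Ltd", "term": "24 months", "governing_law": "Delaware"}


def checker_verify(document: str, draft: dict[str, str]) -> list[FieldCheck]:
    """Stub checker: confirm each drafted value verbatim against the source."""
    checks = []
    for name, value in draft.items():
        start = document.find(value)
        if start >= 0:
            checks.append(FieldCheck(name, value, True, 1.0, (start, start + len(value))))
        else:
            checks.append(FieldCheck(name, value, False, 0.0, None))
    return checks


document = "Supplier: Acme Ltd. Term: 24 months."
checks = checker_verify(document, doer_extract(document))
flagged = [c.name for c in checks if not c.supported]
print(flagged)  # ['governing_law']
```

The useful property is the separation: the doer is free to over-capture because every field must later survive an independent check that points back to a location in the source.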

Applications

  • Contract extraction — parties, terms, obligations, deadlines from legal documents
  • Regulatory compliance — requirements, citations, cross-references from regulatory filings
  • RFP parsing — scope, requirements, evaluation criteria from procurement documents
  • Financial doc analysis — line items, footnotes, adjustments from financial statements
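As an illustration of what a typed schema for the first application might look like, here is a hypothetical `ContractSchema` built from the four categories named in that bullet. The field names and types are assumptions, the record values are made up, and `missing_fields` is an illustrative helper for spotting categories an extraction pass left empty.

```python
from dataclasses import dataclass, field, fields
from datetime import date


@dataclass
class ContractSchema:
    # Hypothetical shape; one list per category named in the bullet above.
    parties: list[str] = field(default_factory=list)
    terms: list[str] = field(default_factory=list)
    obligations: list[str] = field(default_factory=list)
    deadlines: list[date] = field(default_factory=list)


def missing_fields(record: ContractSchema) -> list[str]:
    """Return the names of schema fields that are still empty."""
    return [f.name for f in fields(record) if not getattr(record, f.name)]


# Made-up partial extraction result.
record = ContractSchema(
    parties=["Acme Ltd", "Globex Inc"],
    deadlines=[date(2024, 6, 30)],
)
print(missing_fields(record))  # ['terms', 'obligations']
```

A schema like this gives both passes a shared target: the doer fills fields, and empty or unsupported fields are easy to report back.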