SpecParse
Experimental. Stateful REPL with doer-checker structured extraction. Claude drafts, Codex verifies. Session API with pre-injected document access, sub-LLM routing, and provenance tracking. Built for any domain where you need structured data extracted from unstructured documents with high confidence.
Session Model
SpecParse exposes a session API (open, exec, read, close) that maintains state across extraction passes. When a session opens, the target document is pre-injected into context. Subsequent exec calls operate against that context without re-uploading.
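One way to picture this statefulness is a toy in-memory session. This is a minimal sketch only: ToySession, open_session, and every field on them are hypothetical names invented for illustration, not SpecParse's implementation, and no model is called.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ToySession:
    """Hypothetical stand-in for a session; not SpecParse's real class."""

    doc_text: str  # document attached once, when the session opens
    passes: list[dict[str, Any]] = field(default_factory=list)
    closed: bool = False

    def exec(self, mode: str) -> dict[str, Any]:
        # Every pass reuses self.doc_text; nothing is uploaded again.
        if self.closed:
            raise RuntimeError("session is closed")
        record = {"mode": mode, "doc_chars": len(self.doc_text)}
        self.passes.append(record)
        return record

    def read(self) -> list[str]:
        # Report which passes have run so far, in order.
        return [p["mode"] for p in self.passes]

    def close(self) -> None:
        # After close, further exec calls are rejected.
        self.closed = True


def open_session(doc_text: str) -> ToySession:
    # "Pre-injection" in this toy model: the document is bound up front.
    return ToySession(doc_text=doc_text)


session = open_session("This agreement is between Acme Corp and Beta LLC.")
session.exec("extract")
session.exec("verify")
print(session.read())
session.close()
```

The point of the sketch is that the document lives on the session object, so each exec call only needs to say what to do, not what to do it to.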
# Session lifecycle
session = specparse.open(doc="contract_2024.pdf")
result = session.exec(schema=ContractSchema, mode="extract")
checked = session.exec(schema=ContractSchema, mode="verify")
final = session.read() # merged, reconciled output
session.close() # cleanup, persist provenance
Doer-Checker Pattern
Two-pass verification with different models. Claude (doer) performs the initial extraction against a typed schema. Codex (checker) independently verifies the extraction against the source document, flagging discrepancies with confidence scores and source citations.
Doer (Claude): Extracts structured data from the document against a typed schema. Optimized for recall: capture everything.
Checker (Codex): Verifies each extracted field against the source. Flags hallucinations, adds confidence scores, cites provenance.
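The two passes described above can be sketched as a small control loop. This is an illustrative sketch under stated assumptions, not SpecParse's code: FieldCheck, doer_checker, the 0.5 threshold, and both stub functions are hypothetical, and the stubs stand in for the Claude and Codex calls so the example runs without any model access.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class FieldCheck:
    name: str
    value: str
    confidence: float        # checker's score in [0, 1]
    citation: Optional[str]  # supporting source text, or None if none found
    flagged: bool            # True when the checker could not support the value


def doer_checker(
    document: str,
    doer: Callable[[str], dict[str, str]],
    checker: Callable[[str, str, str], tuple[float, Optional[str]]],
    threshold: float = 0.5,  # assumed cutoff, chosen only for this sketch
) -> list[FieldCheck]:
    """Run the extraction pass, then verify each field independently."""
    draft = doer(document)  # pass 1: recall-oriented draft
    results = []
    for name, value in draft.items():
        confidence, citation = checker(document, name, value)  # pass 2
        flagged = citation is None or confidence < threshold
        results.append(FieldCheck(name, value, confidence, citation, flagged))
    return results


def stub_doer(document: str) -> dict[str, str]:
    # Stand-in for the doer model; "36 months" is deliberately wrong.
    return {"party": "Acme Corp", "term": "36 months"}


def stub_checker(document: str, name: str, value: str) -> tuple[float, Optional[str]]:
    # Naive stand-in for the checker: the value must appear verbatim.
    if value in document:
        return 1.0, value
    return 0.0, None


doc = "This agreement is between Acme Corp and Beta LLC for a term of 24 months."
report = doer_checker(doc, stub_doer, stub_checker)
for item in report:
    print(item.name, item.value, "FLAGGED" if item.flagged else "ok")
```

With these stubs, the party field is supported by the source and the term field is flagged, because "36 months" never appears in the document. A real checker would need something stronger than substring matching, but the shape (draft everything, then verify field by field with a score and a citation) stays the same.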
Applications
- Contract extraction: parties, terms, obligations, deadlines from legal documents
- Regulatory compliance: requirements, citations, cross-references from regulatory filings
- RFP parsing: scope, requirements, evaluation criteria from procurement documents
- Financial doc analysis: line items, footnotes, adjustments from financial statements
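The session lifecycle example passes a ContractSchema without showing its definition. For the contract extraction case, a typed schema might look roughly like the following. The field names, the Obligation helper, and the sample values are all assumptions made up for illustration; the actual schema format SpecParse expects is not documented here.

```python
from dataclasses import dataclass, field, fields
from datetime import date
from typing import Optional


@dataclass
class Obligation:
    party: str                        # who owes the obligation
    description: str
    deadline: Optional[date] = None   # None when the document states no date


@dataclass
class ContractSchema:
    """Hypothetical shape; the real ContractSchema is not shown in the source."""

    parties: list[str]
    term_months: Optional[int] = None
    obligations: list[Obligation] = field(default_factory=list)


# Made-up sample values, only to show what a filled-in record looks like.
example = ContractSchema(
    parties=["Acme Corp", "Beta LLC"],
    term_months=24,
    obligations=[Obligation("Acme Corp", "Deliver audit report", date(2024, 12, 31))],
)
print([f.name for f in fields(ContractSchema)])
```

Optional fields with None defaults give the extractor a way to say "not stated in the document" instead of guessing, which is what a checker pass would otherwise have to flag.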