Production LLM evaluation harness — 11 deterministic judges, 5 LLM graders, 5 paired auditors. Demonstrates a scalable-oversight pattern: every deterministic finding is re-inspected by an LLM auditor before it counts toward the composite score.
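A minimal sketch of the oversight pattern described above, not the project's actual code. It assumes each deterministic finding carries a point penalty, that the composite score is computed by deducting upheld penalties from a 100-point total, and that the LLM auditor can be modeled as a function returning True when it upholds a finding. The names `Finding`, `composite_score`, and `stub_auditor`, the judge names, and the requirement IDs are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass(frozen=True)
class Finding:
    judge: str        # deterministic judge that raised the finding (hypothetical name)
    requirement: str  # requirement ID the finding refers to (hypothetical ID)
    penalty: float    # points deducted if the finding is upheld (assumed scoring rule)


# An auditor inspects one finding and returns True if it upholds it.
Auditor = Callable[[Finding], bool]


def composite_score(
    findings: Iterable[Finding],
    auditor: Auditor,
    max_points: float = 100.0,
) -> Tuple[float, List[Finding]]:
    """Count only findings the auditor upholds; clamp the score at zero."""
    upheld = [f for f in findings if auditor(f)]
    score = max_points - sum(f.penalty for f in upheld)
    return max(score, 0.0), upheld


def stub_auditor(finding: Finding) -> bool:
    # Stand-in for an LLM call: rejects everything from one judge
    # that this example treats as noisy.
    return finding.judge != "keyword_match"


findings = [
    Finding("page_limit", "REQ-07", 5.0),
    Finding("keyword_match", "REQ-12", 3.0),
    Finding("required_form", "REQ-31", 10.0),
]

score, upheld = composite_score(findings, stub_auditor)
print(score, [f.requirement for f in upheld])
```

Passing the auditor in as a plain callable keeps the scoring logic testable without a model call; a real auditor would replace `stub_auditor` with a function that prompts an LLM and parses its verdict.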
Frozen snapshot of a real proposal-ops run against a State/Local Government IT Professional Services RFP. Agency, location, and proprietary identifiers are redacted.
Walk the seven stages; every visitor sees the same canonical output. Download the full system from GitHub to run it against your own RFP.
Stage 01 · RFP loaded
[Agency] — State/Local Government · 38 pp · 39 parsed requirements · 100 pts total