Ground Truth
Ground truth consists of manually created reference datasets whose values are known to be correct. Create them from Validation → Golden Samples. Benchmark runs compare extraction results against these golden samples and report per-field accuracy scores with AI judge verdicts.
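To illustrate what per-field accuracy scoring against golden samples means, here is a minimal sketch. It is not the product's implementation: the document IDs, field names, data layout, and the `per_field_accuracy` helper are all hypothetical, and a simple normalized exact match stands in for the AI judge, which is not modeled here.

```python
from collections import defaultdict


def normalize(value):
    """Compare values case-insensitively, ignoring surrounding whitespace."""
    return str(value).strip().lower()


def per_field_accuracy(golden_samples, extractions):
    """Return {field: fraction of golden samples whose value was matched}.

    Hypothetical helper: both arguments map a document ID to a
    {field: value} dict. A field missing from an extraction counts as wrong.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for doc_id, expected in golden_samples.items():
        extracted = extractions.get(doc_id, {})
        for field, expected_value in expected.items():
            total[field] += 1
            if field in extracted and normalize(extracted[field]) == normalize(expected_value):
                correct[field] += 1
    return {field: correct[field] / total[field] for field in total}


# Hypothetical golden samples (known-correct values) and extraction results.
golden = {
    "invoice-001": {"vendor": "Acme Corp", "total": "120.50"},
    "invoice-002": {"vendor": "Globex", "total": "89.00"},
}
extracted = {
    "invoice-001": {"vendor": "ACME Corp", "total": "120.50"},
    "invoice-002": {"vendor": "Globex", "total": "98.00"},
}

scores = per_field_accuracy(golden, extracted)
print(scores)
```

In this example the `vendor` field matches in both documents and `total` matches in one of two, so the scores are 1.0 and 0.5. An exact-match rule like this would mark equivalent values such as "120.5" and "120.50" as different, which is one reason a judge-based verdict can be useful.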