Benchmark Results
Get per-field accuracy results for a benchmark run, or compare two benchmark runs side by side to track extraction quality improvements over time.
GET /v1/quality/benchmarks/:id/results
Response (Results)
Response fields
data (array): Array of per-document result objects.
data[].id (string): Result UUID.
data[].document_id (string): ID of the document that was evaluated.
data[].ground_truth_entry_id (string): ID of the ground truth entry the document was compared against.
data[].accuracy (number): Accuracy score for this document (0–1).
data[].field_results (object): Per-field accuracy breakdown.
data[].created_at (string): ISO 8601 timestamp.
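As a minimal sketch of working with the per-field breakdown, the snippet below lists every field that did not match its ground truth. It assumes each entry in field_results carries the correct, extracted, and expected keys shown in the example response. The function name incorrect_fields and the inline sample payload are illustrative only and not part of the API.

```python
from typing import Any, Dict, List, Tuple


def incorrect_fields(payload: Dict[str, Any]) -> List[Tuple[str, str, Any, Any]]:
    """Collect (document_id, field, extracted, expected) for every mismatched field.

    Assumes the payload has the shape of the Results response: a "data" array
    whose items each hold a "field_results" object keyed by field name.
    """
    mismatches = []
    for result in payload.get("data", []):
        for field, detail in result.get("field_results", {}).items():
            # Treat a missing "correct" flag as a mismatch so it gets reviewed.
            if not detail.get("correct", False):
                mismatches.append(
                    (
                        result.get("document_id"),
                        field,
                        detail.get("extracted"),
                        detail.get("expected"),
                    )
                )
    return mismatches


# Hypothetical payload, trimmed from the documented example response.
sample = {
    "data": [
        {
            "document_id": "doc_abc123",
            "accuracy": 0.93,
            "field_results": {
                "vendor_name": {"correct": True, "extracted": "Acme Corp", "expected": "Acme Corp"},
                "total_amount": {"correct": False, "extracted": "14000.00", "expected": "14250.00"},
            },
        }
    ]
}

for document_id, field, extracted, expected in incorrect_fields(sample):
    print(f"{document_id}: {field} extracted {extracted!r}, expected {expected!r}")
```

Keeping the function independent of any HTTP client means it can be run against a saved response body as well as a live one.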
Response
{
  "data": [
    {
      "id": "d4e5f6a7-b8c9-0123-defa-234567890123",
      "document_id": "doc_abc123",
      "ground_truth_entry_id": "b2c3d4e5-f6a7-8901-bcde-f12345678901",
      "accuracy": 0.93,
      "field_results": {
        "vendor_name": { "correct": true, "extracted": "Acme Corp", "expected": "Acme Corp" },
        "total_amount": { "correct": false, "extracted": "14000.00", "expected": "14250.00" },
        "invoice_number": { "correct": true, "extracted": "INV-2024-0847", "expected": "INV-2024-0847" }
      },
      "created_at": "2024-09-25T12:00:04.200Z"
    }
  ]
}
GET /v1/quality/benchmarks/compare
Response (Compare)
Response fields
run_a (object): Full benchmark object for the first run.
run_b (object): Full benchmark object for the second run.
accuracy_delta (number | null): Difference in overall accuracy (run_a minus run_b). Null if either run has no accuracy score yet.
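Because accuracy_delta is defined as run_a minus run_b, a negative value means run_b scored higher. The sketch below interprets the delta under that definition and handles the null case. The function name better_run and its return labels are illustrative assumptions, not part of the API.

```python
from typing import Any, Dict


def better_run(compare_payload: Dict[str, Any]) -> str:
    """Report which run scored higher, based on accuracy_delta (run_a minus run_b).

    Returns "run_a", "run_b", "tie", or "pending" when the delta is null
    because at least one run has no accuracy score yet.
    """
    delta = compare_payload.get("accuracy_delta")
    if delta is None:
        return "pending"
    if delta > 0:
        return "run_a"
    if delta < 0:
        return "run_b"
    return "tie"


# Hypothetical payloads containing only the field this sketch reads.
print(better_run({"accuracy_delta": -0.03}))
print(better_run({"accuracy_delta": None}))
```

If run_a is the older run and run_b the newer one, a negative delta indicates that extraction quality improved.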
Response (Compare)
{
  "run_a": {
    "id": "c3d4e5f6-a7b8-9012-cdef-123456789012",
    "name": "Benchmark 2024-09-25",
    "status": "completed",
    "accuracy_overall": 0.93,
    "documents_total": 50,
    "created_at": "2024-09-25T12:00:00.000Z"
  },
  "run_b": {
    "id": "d4e5f6a7-b8c9-0123-defa-234567890123",
    "name": "Benchmark 2024-10-01",
    "status": "completed",
    "accuracy_overall": 0.96,
    "documents_total": 50,
    "created_at": "2024-10-01T09:00:00.000Z"
  },
  "accuracy_delta": -0.03
}
Errors
Error responses
400 bad_request: Both run_a and run_b query parameters are required for the compare endpoint.
401 unauthorized: Missing or invalid API key.
404 not_found: One or both benchmark run IDs were not found in your workspace.
429 rate_limited: Too many requests. Retry after the period indicated in the Retry-After header.
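One way a client might act on these status codes is sketched below: only 429 is treated as retryable, and the wait time is read from the Retry-After header. The sketch assumes the header carries a whole number of seconds. An HTTP-date value is not parsed and falls back to a default delay. The function name retry_delay_seconds and the fallback value are hypothetical.

```python
from typing import Mapping, Optional


def retry_delay_seconds(
    status: int, headers: Mapping[str, str], fallback: float = 1.0
) -> Optional[float]:
    """Return how long to wait before retrying, or None if a retry will not help.

    400, 401, and 404 indicate a problem with the request itself, so only
    429 (rate_limited) yields a delay.
    """
    if status != 429:
        return None
    # Header names are case-insensitive, so normalize before the lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    raw = normalized.get("retry-after", "").strip()
    try:
        # Assumption: the value is an integer number of seconds.
        return float(max(0, int(raw)))
    except ValueError:
        # Missing header or a non-integer value such as an HTTP-date.
        return fallback


print(retry_delay_seconds(429, {"Retry-After": "30"}))
print(retry_delay_seconds(404, {}))
```

The caller decides how to wait (for example with time.sleep) and how many attempts to allow, which keeps the helper easy to test without network access.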