Create Benchmark
Start a benchmark run that compares the output of a job run against a ground truth dataset. The run produces per-field accuracy scores and overall metrics.
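As a rough illustration, a request to POST /v1/quality/benchmarks might be assembled as in the sketch below. The base URL and the Bearer authorization scheme are placeholders assumed for the example, since this page does not state them. `dataset_id` is the only body field this page confirms as required; anything passed through the hypothetical `extra_fields` argument is an assumption.

```python
import json
import urllib.request

# Assumed placeholder: the real API host is not given on this page.
BASE_URL = "https://api.example.com"


def build_create_benchmark_request(api_key, dataset_id, extra_fields=None):
    """Build (but do not send) a POST request for /v1/quality/benchmarks."""
    # dataset_id is the only body field this page confirms as required.
    body = {"dataset_id": dataset_id}
    if extra_fields:
        # Any additional body fields are assumptions, not documented here.
        body.update(extra_fields)
    return urllib.request.Request(
        url=BASE_URL + "/v1/quality/benchmarks",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": "Bearer " + api_key,  # assumed auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

The returned object can be passed to `urllib.request.urlopen` once the real host and authentication scheme are known. Keeping construction separate from sending makes the request easy to inspect first.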
POST /v1/quality/benchmarks
Response
Response fields (201 Created)
Field                Type            Description
id                   string          Benchmark run UUID.
name                 string          Benchmark run name.
dataset_id           string          Ground truth dataset ID.
user_schema_id       string | null   User schema ID, if scoped.
status               string          Initial status: queued.
accuracy_overall     null            Null until the run completes.
documents_processed  integer         Always 0 at creation.
documents_total      integer         Total entries in the dataset to evaluate.
created_at           string          ISO 8601 creation timestamp.
completed_at         null            Null until the run completes.
links.self           string          URL to this benchmark run.
links.results        string          URL to the per-document results.
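A client will usually want only a few of these fields right after creation. The sketch below is one way to pull them out of a decoded 201 response body. The function name and the `base_url` parameter are hypothetical. Resolving the links against a base URL assumes they may be returned as relative paths, as they are in the sample response.

```python
from urllib.parse import urljoin


def summarize_created_run(payload, base_url):
    """Extract the fields a client typically needs right after creation."""
    links = payload["links"]
    return {
        "id": payload["id"],
        # completed_at stays null (None) until the run completes.
        "is_finished": payload["completed_at"] is not None,
        "progress": (payload["documents_processed"], payload["documents_total"]),
        # urljoin leaves absolute URLs untouched and resolves relative paths.
        "self_url": urljoin(base_url, links["self"]),
        "results_url": urljoin(base_url, links["results"]),
    }
```

Because `accuracy_overall` and `completed_at` are null at creation, the useful outputs at this stage are the run ID and the two links.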
Response (201 Created)
{
  "id": "c3d4e5f6-a7b8-9012-cdef-123456789012",
  "name": "Benchmark 2024-09-25",
  "dataset_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
  "user_schema_id": null,
  "status": "queued",
  "accuracy_overall": null,
  "accuracy_by_field": null,
  "documents_processed": 0,
  "documents_total": 50,
  "duration_ms": null,
  "accuracy_delta": null,
  "compared_to_run_id": null,
  "created_at": "2024-09-25T12:00:00.000Z",
  "completed_at": null,
  "links": {
    "self": "/v1/quality/benchmarks/c3d4e5f6-a7b8-9012-cdef-123456789012",
    "results": "/v1/quality/benchmarks/c3d4e5f6-a7b8-9012-cdef-123456789012/results"
  }
}
Errors
Error responses
Status  Code              Description
400     validation_error  Missing required field: dataset_id.
401     unauthorized      Missing or invalid API key.
404     not_found         The specified dataset_id does not exist for your workspace.
429     rate_limited      Too many requests. Retry after the period indicated in the Retry-After header.
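Of the errors listed, only 429 is described as resolvable by waiting. The sketch below shows one way a client might turn the Retry-After header into a wait time. The function names are hypothetical. HTTP allows the header to carry either a number of seconds or an HTTP-date; which form this API sends is not stated here, so the sketch handles both.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def retry_after_seconds(header_value, now=None):
    """Convert a Retry-After header value to a non-negative wait in seconds."""
    value = header_value.strip()
    if value.isdigit():
        # delay-seconds form, e.g. "120"
        return float(value)
    # HTTP-date form, e.g. "Wed, 25 Sep 2024 12:00:30 GMT"
    now = now or datetime.now(timezone.utc)
    retry_at = parsedate_to_datetime(value)
    return max(0.0, (retry_at - now).total_seconds())


def should_retry(status_code):
    """Only 429 is documented as retryable; 400, 401 and 404 need a fix first."""
    return status_code == 429
```

A caller would check `should_retry` on the response status, sleep for `retry_after_seconds(...)`, and then resend the same request. The other errors point to a problem with the request body, the API key, or the dataset ID, and repeating the request unchanged will not help.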