
Create Benchmark

Start a benchmark run that compares a job run's output against a ground truth dataset and produces per-field accuracy scores and overall metrics.

POST /v1/quality/benchmarks
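This page documents only the response, so the request below is a minimal sketch under stated assumptions. `dataset_id` is treated as required because of the 400 error listed under Errors. The optional `name` body field, the base URL `https://api.example.com`, and the Bearer authorization scheme are hypothetical placeholders. The helper builds the request with Python's standard library but does not send it.

```python
import json
import urllib.request
from typing import Optional

# Hypothetical API host; replace with your deployment's base URL.
BASE_URL = "https://api.example.com"


def build_create_benchmark_request(
    api_key: str, dataset_id: str, name: Optional[str] = None
) -> urllib.request.Request:
    """Build (but do not send) a POST /v1/quality/benchmarks request.

    Assumptions: dataset_id is required (per the 400 error); the optional
    "name" field and the Bearer auth scheme are hypothetical.
    """
    body = {"dataset_id": dataset_id}
    if name is not None:
        body["name"] = name  # hypothetical optional field
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/quality/benchmarks",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
    )
```

To send it, pass the request to `urllib.request.urlopen` and decode the 201 body with `json.load`. Error statuses such as 400 or 404 raise `urllib.error.HTTPError`.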

Response

Response fields (201 Created)

id (string): Benchmark run UUID.
name (string): Benchmark run name.
dataset_id (string): Ground truth dataset ID.
user_schema_id (string | null): User schema ID, if the run is scoped to one.
status (string): Initial status: queued.
accuracy_overall (null): Null until the run completes.
documents_processed (integer): Always 0 at creation.
documents_total (integer): Total number of dataset entries to evaluate.
created_at (string): ISO 8601 creation timestamp.
completed_at (null): Null until the run completes.
links.self (string): Relative URL of this benchmark run.
links.results (string): Relative URL of the per-document results.
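The field list above can be mapped onto a typed object. The following is a minimal sketch that covers only the documented fields and ignores the others. Treating `accuracy_overall` as a float once the run completes is an assumption. The `is_complete` and `progress` helpers are illustrative and rely only on the documented behavior: `completed_at` stays null until completion, and `documents_processed` counts up toward `documents_total`.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class BenchmarkRun:
    """Typed view of the documented benchmark run fields (subset)."""

    id: str
    name: str
    dataset_id: str
    user_schema_id: Optional[str]
    status: str
    accuracy_overall: Optional[float]  # assumed numeric once complete
    documents_processed: int
    documents_total: int
    created_at: str
    completed_at: Optional[str]
    self_link: str
    results_link: str

    @property
    def is_complete(self) -> bool:
        # completed_at stays null until the run completes.
        return self.completed_at is not None

    @property
    def progress(self) -> float:
        # Fraction of dataset entries evaluated; 0.0 if the dataset is empty.
        if self.documents_total == 0:
            return 0.0
        return self.documents_processed / self.documents_total


def parse_benchmark_run(payload: Dict[str, Any]) -> BenchmarkRun:
    """Map a decoded JSON response body onto BenchmarkRun; extra keys are ignored."""
    links = payload["links"]
    return BenchmarkRun(
        id=payload["id"],
        name=payload["name"],
        dataset_id=payload["dataset_id"],
        user_schema_id=payload.get("user_schema_id"),
        status=payload["status"],
        accuracy_overall=payload.get("accuracy_overall"),
        documents_processed=payload["documents_processed"],
        documents_total=payload["documents_total"],
        created_at=payload["created_at"],
        completed_at=payload.get("completed_at"),
        self_link=links["self"],
        results_link=links["results"],
    )
```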

Response (201 Created)

{
  "id": "c3d4e5f6-a7b8-9012-cdef-123456789012",
  "name": "Benchmark 2024-09-25",
  "dataset_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
  "user_schema_id": null,
  "status": "queued",
  "accuracy_overall": null,
  "accuracy_by_field": null,
  "documents_processed": 0,
  "documents_total": 50,
  "duration_ms": null,
  "accuracy_delta": null,
  "compared_to_run_id": null,
  "created_at": "2024-09-25T12:00:00.000Z",
  "completed_at": null,
  "links": {
    "self": "/v1/quality/benchmarks/c3d4e5f6-a7b8-9012-cdef-123456789012",
    "results": "/v1/quality/benchmarks/c3d4e5f6-a7b8-9012-cdef-123456789012/results"
  }
}
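In the example, the run starts as `queued` with a relative `links.self` path, and `completed_at` becomes non-null only when the run finishes. A client can therefore poll that link until completion. The following is a minimal sketch that assumes a GET on `links.self` returns the same shape as the 201 body, which this page does not document. The base URL is hypothetical, and `fetch_json` is a caller-supplied function (for example, an authenticated GET that decodes JSON). This keeps the polling loop independent of any HTTP library.

```python
import time
from typing import Any, Callable, Dict
from urllib.parse import urljoin

# Hypothetical API host; the documented links are relative paths.
BASE_URL = "https://api.example.com"


def wait_for_benchmark(
    created: Dict[str, Any],
    fetch_json: Callable[[str], Dict[str, Any]],
    interval_s: float = 5.0,
    timeout_s: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll the run's links.self URL until completed_at is non-null.

    Assumes fetch_json(url) returns the same JSON shape as the 201 response.
    """
    url = urljoin(BASE_URL, created["links"]["self"])  # resolve relative link
    deadline = time.monotonic() + timeout_s
    run = created
    while run.get("completed_at") is None:
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"benchmark {created.get('id')} did not complete within {timeout_s}s"
            )
        sleep(interval_s)
        run = fetch_json(url)
    return run
```

The `sleep` parameter is injectable so the loop can be tested without waiting. In production, the polling interval should also respect any 429 `Retry-After` response.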

Errors

Error responses

400 validation_error: Missing required field dataset_id.
401 unauthorized: Missing or invalid API key.
404 not_found: The specified dataset_id does not exist in your workspace.
429 rate_limited: Too many requests. Retry after the period given in the Retry-After header.
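Of these errors, only 429 is worth retrying unchanged. The other three point to a problem with the request or the credentials. In HTTP, `Retry-After` can hold either a number of seconds or an HTTP-date. The following minimal sketch handles both forms. The 1-second fallback used when the header is absent or unparseable is a hypothetical default, not documented behavior.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def retry_delay_seconds(
    status: int,
    retry_after: Optional[str],
    now: Optional[datetime] = None,
    default_s: float = 1.0,
) -> Optional[float]:
    """Seconds to wait before retrying, or None when retrying will not help."""
    if status != 429:
        # 400, 401 and 404 describe a problem with the request itself.
        return None
    if retry_after is None:
        return default_s  # hypothetical fallback when the header is absent
    value = retry_after.strip()
    if value.isdigit():
        return float(value)  # delay-seconds form, e.g. "120"
    try:
        when = parsedate_to_datetime(value)  # HTTP-date form
    except (TypeError, ValueError):
        return default_s  # unparseable header: fall back to the default
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)  # HTTP-dates are GMT
    if now is None:
        now = datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())
```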