
Benchmark Results

Get per-field accuracy results for a benchmark run, or compare two benchmark runs side by side to track extraction quality improvements over time.

GET /v1/quality/benchmarks/:id/results

Response (Results)

Response fields

data (array): Array of per-document result objects.
data[].id (string): Result UUID.
data[].document_id (string): Document evaluated.
data[].ground_truth_entry_id (string): Ground truth entry compared against.
data[].accuracy (number): Accuracy score for this document (0–1).
data[].field_results (object): Per-field accuracy breakdown.
data[].created_at (string): ISO 8601 timestamp.

Response

{
  "data": [
    {
      "id": "d4e5f6a7-b8c9-0123-defa-234567890123",
      "document_id": "doc_abc123",
      "ground_truth_entry_id": "b2c3d4e5-f6a7-8901-bcde-f12345678901",
      "accuracy": 0.93,
      "field_results": {
        "vendor_name": { "correct": true, "extracted": "Acme Corp", "expected": "Acme Corp" },
        "total_amount": { "correct": false, "extracted": "14000.00", "expected": "14250.00" },
        "invoice_number": { "correct": true, "extracted": "INV-2024-0847", "expected": "INV-2024-0847" }
      },
      "created_at": "2024-09-25T12:00:04.200Z"
    }
  ]
}
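As an illustration of how the per-field breakdown might be consumed, the following minimal sketch aggregates `field_results` across documents and lists the mismatches. The helper names `field_accuracy` and `mismatches` are hypothetical, and the code assumes every `field_results` entry carries the `correct`, `extracted`, and `expected` keys shown in the example above.

```python
from collections import defaultdict


def field_accuracy(results):
    """Return {field_name: fraction of documents where that field was correct}."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for result in results.get("data", []):
        for name, outcome in result.get("field_results", {}).items():
            total[name] += 1
            if outcome.get("correct"):
                correct[name] += 1
    return {name: correct[name] / total[name] for name in total}


def mismatches(results):
    """Return (document_id, field, extracted, expected) for each incorrect field."""
    rows = []
    for result in results.get("data", []):
        for name, outcome in result.get("field_results", {}).items():
            if not outcome.get("correct"):
                rows.append(
                    (result.get("document_id"), name,
                     outcome.get("extracted"), outcome.get("expected"))
                )
    return rows


# Sample payload copied from the response example above (trimmed to the keys used).
sample = {
    "data": [
        {
            "document_id": "doc_abc123",
            "field_results": {
                "vendor_name": {"correct": True, "extracted": "Acme Corp", "expected": "Acme Corp"},
                "total_amount": {"correct": False, "extracted": "14000.00", "expected": "14250.00"},
                "invoice_number": {"correct": True, "extracted": "INV-2024-0847", "expected": "INV-2024-0847"},
            },
        }
    ]
}

print(field_accuracy(sample))
print(mismatches(sample))
```

Grouping by field rather than by document makes it easier to see which fields drag down the overall score across a run.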
GET /v1/quality/benchmarks/compare

Response (Compare)

Response fields

run_a (object): Full benchmark object for the first run.
run_b (object): Full benchmark object for the second run.
accuracy_delta (number | null): Difference in overall accuracy (run_a minus run_b). Null if either run has no accuracy score yet.

Response (Compare)

{
  "run_a": {
    "id": "c3d4e5f6-a7b8-9012-cdef-123456789012",
    "name": "Benchmark 2024-09-25",
    "status": "completed",
    "accuracy_overall": 0.93,
    "documents_total": 50,
    "created_at": "2024-09-25T12:00:00.000Z"
  },
  "run_b": {
    "id": "d4e5f6a7-b8c9-0123-defa-234567890123",
    "name": "Benchmark 2024-10-01",
    "status": "completed",
    "accuracy_overall": 0.96,
    "documents_total": 50,
    "created_at": "2024-10-01T09:00:00.000Z"
  },
  "accuracy_delta": -0.03
}
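The sketch below shows one way to build the compare request URL and to reproduce `accuracy_delta` on the client side. The base URL is a placeholder, the helper names are hypothetical, and rounding to four decimal places is an assumption made here to avoid floating-point noise, not documented API behaviour. The `run_a` and `run_b` query parameter names come from the error table for this endpoint.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.example.com"  # placeholder; substitute the real API host


def compare_url(run_a, run_b, base_url=BASE_URL):
    """Build the compare endpoint URL with both required query parameters."""
    query = urlencode({"run_a": run_a, "run_b": run_b})
    return f"{base_url}/v1/quality/benchmarks/compare?{query}"


def accuracy_delta(run_a, run_b):
    """Return run_a accuracy minus run_b accuracy, or None if either is missing."""
    a = run_a.get("accuracy_overall")
    b = run_b.get("accuracy_overall")
    if a is None or b is None:
        return None
    return round(a - b, 4)  # rounding is an assumption of this sketch


older = {"accuracy_overall": 0.93}
newer = {"accuracy_overall": 0.96}
print(accuracy_delta(older, newer))
```

Because the delta is run_a minus run_b, a negative value means the second run scored higher. Passing the older run as `run_a`, as in the example response, therefore yields a negative delta when quality improves.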

Errors

Error responses

400 bad_request: Both run_a and run_b query parameters are required for the compare endpoint.
401 unauthorized: Missing or invalid API key.
404 not_found: One or both benchmark run IDs not found for your workspace.
429 rate_limited: Too many requests. Retry after the period indicated in the Retry-After header.
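
A client might handle the 429 case along the lines of the sketch below. The function name `retry_delay` and the one-second fallback are hypothetical. The code assumes the Retry-After header carries a number of seconds. HTTP also allows a date in that header, and this sketch falls back to the default delay in that case.

```python
def retry_delay(status, headers, default=1.0):
    """Return seconds to wait before retrying, or None if no retry is advised.

    Only 429 is treated as retryable here; 400, 401, and 404 indicate a
    problem with the request itself and are not retried.
    """
    if status != 429:
        return None
    # Header names are case-insensitive, so normalise the keys before lookup.
    normalised = {key.lower(): value for key, value in headers.items()}
    value = normalised.get("retry-after")
    try:
        return max(0.0, float(value))
    except (TypeError, ValueError):
        # Header missing or not a plain number of seconds (assumed fallback).
        return default


print(retry_delay(429, {"Retry-After": "5"}))
print(retry_delay(404, {}))
```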