Benchmark Results

Get per-field accuracy results for a benchmark run, or compare two benchmark runs side by side to track extraction quality improvements over time.

Retrieve per-document accuracy results for a completed benchmark run, showing which fields matched the ground truth and which diverged. Each entry in field_results records the extracted value, the expected value, and whether the two matched. Use the compare endpoint to track accuracy changes across runs.

GET /v1/quality/benchmarks/:id/results

Response (Results)

Response fields

data (array): Array of per-document result objects.

data[].id (string): Result UUID.

data[].document_id (string): Document evaluated.

data[].ground_truth_entry_id (string): Ground truth entry compared against.

data[].accuracy (number): Accuracy score for this document (0–1).

data[].field_results (object): Per-field accuracy breakdown.

data[].created_at (string): ISO 8601 timestamp.

Response

{
  "data": [
    {
      "id": "d4e5f6a7-b8c9-0123-defa-234567890123",
      "document_id": "doc_abc123",
      "ground_truth_entry_id": "b2c3d4e5-f6a7-8901-bcde-f12345678901",
      "accuracy": 0.93,
      "field_results": {
        "vendor_name": { "correct": true, "extracted": "Acme Corp", "expected": "Acme Corp" },
        "total_amount": { "correct": false, "extracted": "14000.00", "expected": "14250.00" },
        "invoice_number": { "correct": true, "extracted": "INV-2024-0847", "expected": "INV-2024-0847" }
      },
      "created_at": "2024-09-25T12:00:04.200Z"
    }
  ]
}
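
As a rough illustration of how a client might use this payload, the sketch below collects every field marked incorrect. The helper name `mismatched_fields` is hypothetical and not part of the API; it assumes only the `data[].field_results` structure shown in the example response, and the sample payload is that example trimmed to the keys the helper reads.

```python
import json

# Sample payload copied from the example response above,
# trimmed to the keys this helper reads.
SAMPLE = json.loads("""
{
  "data": [
    {
      "document_id": "doc_abc123",
      "field_results": {
        "vendor_name": {"correct": true, "extracted": "Acme Corp", "expected": "Acme Corp"},
        "total_amount": {"correct": false, "extracted": "14000.00", "expected": "14250.00"},
        "invoice_number": {"correct": true, "extracted": "INV-2024-0847", "expected": "INV-2024-0847"}
      }
    }
  ]
}
""")


def mismatched_fields(payload):
    """Return (document_id, field, extracted, expected) for each field marked incorrect.

    Hypothetical helper: a field with a missing "correct" key is treated as a mismatch.
    """
    rows = []
    for result in payload.get("data", []):
        for field, outcome in result.get("field_results", {}).items():
            if not outcome.get("correct", False):
                rows.append((
                    result.get("document_id"),
                    field,
                    outcome.get("extracted"),
                    outcome.get("expected"),
                ))
    return rows


if __name__ == "__main__":
    for row in mismatched_fields(SAMPLE):
        print(row)
```

For the sample above, the only row returned is the `total_amount` field of `doc_abc123`, with the extracted and expected values side by side.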

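To track accuracy trends over time, compare two benchmark runs side by side by passing both run IDs as the run_a and run_b query parameters. The accuracy_delta is the overall accuracy of run_a minus that of run_b, so a negative value means run_b scored higher.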

GET /v1/quality/benchmarks/compare

Response (Compare)

Response fields

run_a (object): Full benchmark object for the first run.

run_b (object): Full benchmark object for the second run.

accuracy_delta (number | null): Difference in overall accuracy (run_a minus run_b). Null if either run has no accuracy score yet.

Response (Compare)

{
  "run_a": {
    "id": "c3d4e5f6-a7b8-9012-cdef-123456789012",
    "name": "Benchmark 2024-09-25",
    "status": "completed",
    "accuracy_overall": 0.93,
    "documents_total": 50,
    "created_at": "2024-09-25T12:00:00.000Z"
  },
  "run_b": {
    "id": "d4e5f6a7-b8c9-0123-defa-234567890123",
    "name": "Benchmark 2024-10-01",
    "status": "completed",
    "accuracy_overall": 0.96,
    "documents_total": 50,
    "created_at": "2024-10-01T09:00:00.000Z"
  },
  "accuracy_delta": -0.03
}
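
The sketch below shows one way a client could build the compare request URL and reproduce the documented delta rule locally. The base URL `https://api.example.com` and both function names are assumptions made for illustration; the `run_a` and `run_b` query parameter names come from the error table. Rounding to four decimals is a choice made here to hide floating-point noise, not documented API behavior.

```python
from urllib.parse import urlencode

# Hypothetical base URL; substitute the host for your deployment.
BASE_URL = "https://api.example.com"


def build_compare_url(run_a, run_b, base_url=BASE_URL):
    """Build the compare URL; both run IDs are required query parameters."""
    if not run_a or not run_b:
        raise ValueError("both run_a and run_b are required")
    query = urlencode({"run_a": run_a, "run_b": run_b})
    return f"{base_url}/v1/quality/benchmarks/compare?{query}"


def accuracy_delta(run_a, run_b):
    """Mirror the documented rule: run_a minus run_b, or None if either score is missing."""
    a = run_a.get("accuracy_overall")
    b = run_b.get("accuracy_overall")
    if a is None or b is None:
        return None
    # Rounding is a local choice to avoid results like -0.030000000000000027.
    return round(a - b, 4)


if __name__ == "__main__":
    print(build_compare_url(
        "c3d4e5f6-a7b8-9012-cdef-123456789012",
        "d4e5f6a7-b8c9-0123-defa-234567890123",
    ))
    print(accuracy_delta({"accuracy_overall": 0.93}, {"accuracy_overall": 0.96}))
```

With the example runs above (0.93 and 0.96), the local calculation agrees with the `accuracy_delta` of -0.03 in the response.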

Errors

Error responses

400 bad_request: Both run_a and run_b query parameters are required for the compare endpoint.

401 unauthorized: Missing or invalid API key.

404 not_found: One or both benchmark run IDs not found for your workspace.

429 rate_limited: Too many requests. Retry after the period indicated in the Retry-After header.
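
A minimal sketch of how a client might turn a 429 response into a wait time. This page does not say whether Retry-After carries a number of seconds or an HTTP date, so the hypothetical helper `retry_delay_seconds` accepts both forms, as HTTP permits; the one-second fallback is likewise an assumption.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def retry_delay_seconds(status, headers, now=None, default=1.0):
    """Return seconds to wait before retrying, or None if the response is not a 429.

    Hypothetical helper. `headers` is any mapping of header names to string values.
    """
    if status != 429:
        return None
    # Header names are case-insensitive, so normalise before lookup.
    lowered = {str(k).lower(): v for k, v in headers.items()}
    value = lowered.get("retry-after")
    if value is None:
        return default  # assumed fallback when the header is absent
    value = value.strip()
    if value.isdigit():
        return float(value)  # delay-seconds form, e.g. "30"
    try:
        when = parsedate_to_datetime(value)  # HTTP-date form
    except (TypeError, ValueError):
        return default
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())


if __name__ == "__main__":
    print(retry_delay_seconds(429, {"Retry-After": "30"}))
```

A caller would sleep for the returned number of seconds before reissuing the request, and treat `None` as "this is not a rate-limit response".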
