List Ground Truth Datasets

List all ground truth datasets used for benchmarking extraction accuracy. Each dataset contains manually verified entries that serve as the gold standard.

The overall flow is to create a dataset, add verified entries to it, then run benchmarks that compare extraction results against those entries.

Use this endpoint to see all available datasets before creating a benchmark run. A typical workflow is to list datasets, select the one covering the document type you want to evaluate, then pass its id to POST /v1/quality/benchmarks to start a run.
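The selection step in this workflow can be sketched as follows. This is a minimal illustration, not the official client: `pick_dataset` and `build_benchmark_request` are hypothetical helpers, and the `dataset_id` body field is an assumption, since this page does not show the request body of POST /v1/quality/benchmarks.

```python
from typing import List, Optional


def pick_dataset(datasets: List[dict], name_contains: str) -> Optional[dict]:
    """Return the first dataset whose name contains the text (case-insensitive)."""
    needle = name_contains.lower()
    for dataset in datasets:
        if needle in dataset["name"].lower():
            return dataset
    return None


def build_benchmark_request(dataset: dict) -> dict:
    """Describe the follow-up call that starts a benchmark run.

    ASSUMPTION: "dataset_id" is a placeholder field name. Check the
    benchmark endpoint's own reference for the real request body.
    """
    return {
        "method": "POST",
        "path": "/v1/quality/benchmarks",
        "json": {"dataset_id": dataset["id"]},
    }


# `listed` stands in for the "data" array returned by this endpoint.
listed = [
    {"id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890", "name": "Invoice Accuracy Set"},
]
chosen = pick_dataset(listed, "invoice")
if chosen is not None:
    request = build_benchmark_request(chosen)
```

Matching on the dataset name is only one option; filtering on `user_schema_id` may fit better when datasets are scoped to a schema.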

Each dataset includes a name, optional description, user_schema_id (if scoped to a schema), document_count (number of verified entries), and a links.self URL for the detail endpoint. Datasets are returned in descending creation order with cursor-based pagination.
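Walking the cursor-based pagination could look like the sketch below. It relies only on the `pagination.has_more` and `pagination.next_cursor` fields documented on this page. `fetch_page` is a caller-supplied function (hypothetical here) that performs the HTTP request; how the cursor is sent to the server, for example as a query parameter, is not specified on this page and is left to that function.

```python
from typing import Callable, Iterator, Optional


def iter_datasets(fetch_page: Callable[[Optional[str]], dict]) -> Iterator[dict]:
    """Yield every dataset across all pages.

    fetch_page(cursor) must return one parsed response body; it is called
    with None for the first page and with next_cursor afterwards.
    """
    cursor: Optional[str] = None
    while True:
        page = fetch_page(cursor)
        yield from page["data"]
        pagination = page["pagination"]
        # Stop when the server reports no further pages or gives no cursor.
        if not pagination["has_more"] or pagination["next_cursor"] is None:
            break
        cursor = pagination["next_cursor"]
```

Taking the fetch function as a parameter keeps the loop independent of any particular HTTP library and makes it easy to exercise with canned pages.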

Create separate datasets for different document types or schema versions to track accuracy independently. Pair this endpoint with the benchmark endpoints to measure extraction quality over time: run benchmarks after schema changes or pipeline updates to detect regressions.

GET /v1/quality/ground-truth

Response

Response fields

data (array): Array of ground truth dataset objects.
data[].id (string): Dataset UUID.
data[].name (string): Dataset name.
data[].description (string | null): Optional description.
data[].user_schema_id (string | null): Associated user schema ID, if any.
data[].document_count (integer): Number of documents (entries) in the dataset.
data[].created_at (string): ISO 8601 creation timestamp.
data[].links.self (string): URL to this dataset.
pagination.total (integer): Total number of datasets.
pagination.limit (integer): Maximum results per page.
pagination.has_more (boolean): Whether more results exist beyond this page.
pagination.next_cursor (string | null): Cursor to fetch the next page.

Example response

{
  "data": [
    {
      "id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
      "name": "Invoice Accuracy Set",
      "description": "Manually verified invoices for Q3 2024",
      "user_schema_id": null,
      "document_count": 50,
      "created_at": "2024-09-01T10:00:00.000Z",
      "links": {
        "self": "/v1/quality/ground-truth/a1b2c3d4-e5f6-7890-abcd-ef1234567890"
      }
    }
  ],
  "pagination": {
    "total": 3,
    "limit": 20,
    "has_more": false,
    "next_cursor": null
  }
}
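A response body of this shape could be loaded into typed records as in the sketch below. The `Dataset` class and `parse_datasets` function are illustrative names, not part of the API; the field names mirror the response fields listed on this page.

```python
import json
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Dataset:
    id: str
    name: str
    description: Optional[str]
    user_schema_id: Optional[str]
    document_count: int
    created_at: str
    self_url: str  # taken from links.self


def parse_datasets(body: str) -> List[Dataset]:
    """Convert the raw JSON response text into Dataset records."""
    payload = json.loads(body)
    return [
        Dataset(
            id=item["id"],
            name=item["name"],
            description=item["description"],
            user_schema_id=item["user_schema_id"],
            document_count=item["document_count"],
            created_at=item["created_at"],
            self_url=item["links"]["self"],
        )
        for item in payload["data"]
    ]
```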

Errors

Error responses

401 unauthorized: Missing or invalid API key.
429 rate_limited: Too many requests. Retry after the period indicated in the Retry-After header.
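One way a client might handle these two errors is sketched below. The `send` callable is a hypothetical stand-in for the real HTTP call and returns a `(status, headers, body)` tuple. The sketch assumes Retry-After carries a number of seconds and that the header key is spelled exactly `Retry-After`; if the value is missing or not numeric (the HTTP specification also allows a date), it falls back to a default delay.

```python
import time
from typing import Callable, Mapping, Tuple

Response = Tuple[int, Mapping[str, str], dict]


def retry_delay(headers: Mapping[str, str], default: float = 1.0) -> float:
    """Read Retry-After as seconds, falling back to `default` if unusable."""
    try:
        return max(0.0, float(headers.get("Retry-After")))
    except (TypeError, ValueError):
        return default


def get_with_retry(
    send: Callable[[], Response],
    max_attempts: int = 3,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call send() until it is no longer rate limited or attempts run out."""
    for attempt in range(1, max_attempts + 1):
        status, headers, body = send()
        if status == 401:
            # Retrying cannot fix a bad key, so fail immediately.
            raise PermissionError("Missing or invalid API key")
        if status != 429:
            return body
        if attempt < max_attempts:
            sleep(retry_delay(headers))
    raise RuntimeError("Still rate limited after %d attempts" % max_attempts)
```

Passing `sleep` as a parameter lets tests substitute a recorder instead of actually waiting.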