List Ground Truth Datasets
List all ground truth datasets used for benchmarking extraction accuracy. Each dataset contains manually verified entries that serve as the gold standard for measuring extraction accuracy.
The overall workflow is to create datasets, add entries, then run benchmarks against extraction results.
Use this endpoint to see all available datasets before creating a benchmark run. A typical workflow is to list datasets, select the one covering the document type you want to evaluate, then pass its id to POST /v1/quality/benchmarks to start a run.
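The selection step of this workflow might look like the following Python sketch. It operates on an already-parsed list response shaped like the example on this page. The helper name `select_dataset_id` and the choice to match on `name` are illustrative assumptions, not part of the API.

```python
from typing import Any, Optional


def select_dataset_id(list_response: dict[str, Any], name: str) -> Optional[str]:
    """Return the id of the first dataset whose name matches, or None if absent."""
    for dataset in list_response.get("data", []):
        if dataset.get("name") == name:
            return dataset.get("id")
    return None


# Parsed response body, trimmed to the fields used here.
example_response = {
    "data": [
        {
            "id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
            "name": "Invoice Accuracy Set",
            "user_schema_id": None,
            "document_count": 50,
        }
    ]
}

dataset_id = select_dataset_id(example_response, "Invoice Accuracy Set")
```

The resulting `dataset_id` is the value to pass to POST /v1/quality/benchmarks. The request body for that endpoint is not described on this page, so the sketch stops at selecting the id.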
Each dataset includes a name, optional description, user_schema_id (if scoped to a schema), document_count (number of verified entries), and a links.self URL for the detail endpoint. Datasets are returned in descending creation order with cursor-based pagination.
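Cursor-based pagination can be walked with a small loop like the Python sketch below. This page does not state how the cursor is sent back to the server, so the HTTP call is abstracted behind a `fetch_page` callable; that callable, the `iter_datasets` helper, and the canned pages with ids `ds-1` and `ds-2` are hypothetical and exist only to make the example self-contained.

```python
from typing import Any, Callable, Iterator, Optional

# A page fetcher takes a cursor (None for the first page) and returns the
# parsed JSON body of one list response. How the cursor is transmitted is an
# assumption left to the caller.
PageFetcher = Callable[[Optional[str]], dict[str, Any]]


def iter_datasets(fetch_page: PageFetcher) -> Iterator[dict[str, Any]]:
    """Yield every dataset, following next_cursor until has_more is false."""
    cursor: Optional[str] = None
    while True:
        page = fetch_page(cursor)
        yield from page.get("data", [])
        pagination = page.get("pagination", {})
        cursor = pagination.get("next_cursor")
        # Stop when the server reports no further pages or gives no cursor.
        if not pagination.get("has_more") or cursor is None:
            break


# Stand-in fetcher with two canned pages (hypothetical data).
_pages: dict[Optional[str], dict[str, Any]] = {
    None: {
        "data": [{"id": "ds-2", "name": "newer"}],
        "pagination": {"total": 2, "limit": 1, "has_more": True, "next_cursor": "c1"},
    },
    "c1": {
        "data": [{"id": "ds-1", "name": "older"}],
        "pagination": {"total": 2, "limit": 1, "has_more": False, "next_cursor": None},
    },
}


def fake_fetch(cursor: Optional[str]) -> dict[str, Any]:
    return _pages[cursor]


all_ids = [dataset["id"] for dataset in iter_datasets(fake_fetch)]
```

Passing the fetcher in as a parameter keeps the loop independent of any particular HTTP client, so the same helper can be exercised against canned pages or a real request function.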
Create separate datasets for different document types or schema versions to track accuracy independently. Pair this endpoint with the benchmark endpoints to measure extraction quality over time: run benchmarks after schema changes or pipeline updates to detect regressions.
/v1/quality/ground-truth
Response
{
  "data": [
    {
      "id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
      "name": "Invoice Accuracy Set",
      "description": "Manually verified invoices for Q3 2024",
      "user_schema_id": null,
      "document_count": 50,
      "created_at": "2024-09-01T10:00:00.000Z",
      "links": {
        "self": "/v1/quality/ground-truth/a1b2c3d4-e5f6-7890-abcd-ef1234567890"
      }
    }
  ],
  "pagination": {
    "total": 3,
    "limit": 20,
    "has_more": false,
    "next_cursor": null
  }
}
Errors
Error responses