Get / Delete Dataset
Retrieve a ground truth dataset by ID, including its metadata, entry count, and sample entries, or delete it permanently. The GET response includes a samples array with the actual ground truth entries, so you can inspect the expected values for each document. Deleting a dataset does not remove associated benchmark results.
Use GET to inspect the dataset contents before running a benchmark. The samples array contains all ground truth entries with their document_id, expected_data (key-value map of verified field values), and optional notes. This lets you verify the dataset is correctly populated.
The document_count field shows how many entries exist. For large datasets, the samples array may produce a sizable response. The user_schema_id indicates whether the dataset is scoped to a specific extraction schema, which improves benchmark accuracy by ensuring field name alignment.
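The inspection described above can be sketched as a small check over the parsed GET response body. This is a minimal illustration, not part of the API: the function name check_dataset is hypothetical, and comparing document_count with the length of samples assumes the response is neither paginated nor truncated.

```python
from typing import Any


def check_dataset(dataset: dict[str, Any]) -> list[str]:
    """Return human-readable findings for a parsed GET response body."""
    findings: list[str] = []
    samples = dataset.get("samples") or []

    # Assumption: samples holds every entry, so its length should
    # equal document_count.
    count = dataset.get("document_count")
    if count != len(samples):
        findings.append(
            f"document_count is {count} but {len(samples)} samples were returned"
        )

    for index, sample in enumerate(samples):
        if not sample.get("document_id"):
            findings.append(f"sample {index}: missing document_id")
        expected = sample.get("expected_data")
        # expected_data should be a non-empty key-value map of verified values.
        if not isinstance(expected, dict) or not expected:
            findings.append(f"sample {index}: expected_data is empty or not a map")

    # A null user_schema_id means the dataset is not scoped to a schema.
    if dataset.get("user_schema_id") is None:
        findings.append("dataset is not scoped to a schema (user_schema_id is null)")

    return findings
```

An empty list means the dataset passed these basic checks. Any finding is worth reviewing before the dataset is used in a benchmark.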
Use DELETE when a dataset is outdated or no longer needed. Benchmark results that referenced this dataset are preserved for historical tracking: each benchmark retains the dataset_id even after the dataset itself is removed. To change ground truth, create a new dataset with updated entries rather than modifying existing ones.
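As a rough sketch, both calls can be built with the Python standard library. The base URL https://api.example.com, the Bearer token header, and the helper names build_request and send are assumptions made for illustration. Only the path /v1/quality/ground-truth/:id and the response shapes come from this page.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical base URL; replace with the real API host.
BASE_URL = "https://api.example.com"


def build_request(method: str, dataset_id: str, api_key: str,
                  base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a GET or DELETE request for a ground truth dataset."""
    if method not in ("GET", "DELETE"):
        raise ValueError("method must be GET or DELETE")
    # Percent-encode the ID so it is safe to place in the URL path.
    path = "/v1/quality/ground-truth/" + urllib.parse.quote(dataset_id, safe="")
    return urllib.request.Request(
        base_url + path,
        method=method,
        headers={
            # Assumption: the API uses Bearer token authentication.
            "Authorization": f"Bearer {api_key}",
            "Accept": "application/json",
        },
    )


def send(request: urllib.request.Request) -> dict:
    """Send the request and parse the JSON response body."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

A typical flow would call send(build_request("GET", dataset_id, api_key)) to review the samples, then send(build_request("DELETE", dataset_id, api_key)) and check that the "deleted" field in the response is true.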
GET /v1/quality/ground-truth/:id
Response (GET)
{
  "id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
  "name": "Invoice Accuracy Set",
  "description": "Manually verified invoices for Q3 2024",
  "user_schema_id": null,
  "document_count": 50,
  "created_at": "2024-09-01T10:00:00.000Z",
  "links": {
    "self": "/v1/quality/ground-truth/a1b2c3d4-e5f6-7890-abcd-ef1234567890"
  },
  "samples": [
    {
      "id": "b2c3d4e5-f6a7-8901-bcde-f12345678901",
      "document_id": "doc_abc123",
      "expected_data": {
        "vendor_name": "Acme Corp",
        "total_amount": 14250.00,
        "invoice_number": "INV-2024-0847"
      },
      "notes": null,
      "created_at": "2024-09-05T12:00:00.000Z"
    }
  ]
}
DELETE /v1/quality/ground-truth/:id
Response (DELETE)
{ "deleted": true }
Errors
Error responses