Create Ground Truth Dataset
Create a new ground truth dataset, optionally linked to a schema. The dataset defines the expected extraction output used for accuracy benchmarking.
Create an empty ground truth dataset that you can populate with verified entries. Datasets serve as the baseline for benchmark runs that measure extraction accuracy. After creating a dataset, add entries individually or import them in bulk via CSV.
The typical workflow is: create the dataset, then populate it using POST /v1/quality/ground-truth/:id/entries for individual entries or POST /v1/quality/ground-truth/:id/entries/import-csv for bulk import. Once populated, create a benchmark run with POST /v1/quality/benchmarks.
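The workflow above can be sketched as a small set of request builders. This is a minimal sketch under stated assumptions, not a definitive client: the host `https://api.example.com`, the bearer-token header, and the `name` request field are placeholders (the request body for this endpoint is not documented in this section), and the create call is assumed to be a POST because the documented response is 201 Created. The code only builds requests and paths; it does not send anything.

```python
import json
import urllib.request

BASE_URL = "https://api.example.com"  # assumption: placeholder host
API_KEY = "YOUR_API_KEY"              # assumption: bearer-token auth

DATASETS_PATH = "/v1/quality/ground-truth"
BENCHMARKS_PATH = "/v1/quality/benchmarks"


def build_post(path, body):
    """Build (but do not send) a JSON POST request for the given path."""
    return urllib.request.Request(
        url=BASE_URL + path,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": "Bearer " + API_KEY,
            "Content-Type": "application/json",
        },
        method="POST",
    )


def create_dataset_request(name):
    """Step 1: create the empty dataset (the `name` field is an assumption)."""
    return build_post(DATASETS_PATH, {"name": name})


def entries_path(dataset_id):
    """Step 2a: path for adding individual entries."""
    return f"{DATASETS_PATH}/{dataset_id}/entries"


def import_csv_path(dataset_id):
    """Step 2b: path for bulk CSV import."""
    return f"{DATASETS_PATH}/{dataset_id}/entries/import-csv"


create_req = create_dataset_request("Invoice Accuracy Set")
```

Once the dataset is populated, step 3 would be a POST to `BENCHMARKS_PATH`; its request body is not described here, so it is left out of the sketch.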
The response returns the dataset with document_count: 0, since a new dataset is empty. The user_schema_id is null unless you associate the dataset with a schema. The links.self URL points to the detail endpoint, where you can retrieve entries or delete the dataset.
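As a small illustration of reading these fields, the sketch below parses a body shaped like the 201 example in this section and resolves links.self against a base URL. The host is a placeholder assumption, and the variable names are illustrative only.

```python
import json
from urllib.parse import urljoin

BASE_URL = "https://api.example.com"  # assumption: placeholder host

# Body shaped like the documented 201 Created response.
sample_body = """
{
  "id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
  "name": "Invoice Accuracy Set",
  "description": null,
  "user_schema_id": null,
  "document_count": 0,
  "created_at": "2024-09-01T10:00:00.000Z",
  "links": {
    "self": "/v1/quality/ground-truth/a1b2c3d4-e5f6-7890-abcd-ef1234567890"
  }
}
"""

dataset = json.loads(sample_body)

# links.self is a relative path, so resolve it against the base URL.
detail_url = urljoin(BASE_URL, dataset["links"]["self"])

# A freshly created dataset has no entries and, by default, no schema.
is_empty = dataset["document_count"] == 0
has_schema = dataset["user_schema_id"] is not None
```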
For best results, aim for at least 30-50 entries per dataset. Linking a dataset to a user_schema_id ensures ground truth field names align with your extraction schema, producing more meaningful benchmark comparisons.
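Before adding entries, it can help to check that the keys of each expected_data object line up with the schema's field names, since fields that do not match are stored but ignored during benchmark comparison. The helper below is a hypothetical client-side check, not part of the API, and the field names in the usage are made up for illustration.

```python
def split_fields(expected_data, schema_fields):
    """Separate expected_data keys into those the schema knows and those it does not."""
    schema = set(schema_fields)
    matched = {key: value for key, value in expected_data.items() if key in schema}
    unmatched = sorted(key for key in expected_data if key not in schema)
    return matched, unmatched


# Illustrative entry and schema; "notes" is not a schema field here.
matched, unmatched = split_fields(
    {"invoice_number": "INV-001", "total": 120.5, "notes": "n/a"},
    ["invoice_number", "total", "due_date"],
)
```

Here `unmatched` lists the keys that would not take part in a comparison, which makes typos in field names easy to spot before a benchmark run.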
expected_data entries should match the field names used in your extraction schema. Unmatched fields are stored but ignored during benchmark comparison.
/v1/quality/ground-truth
Response
Response fields (201 Created)
Response (201 Created)
{
  "id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
  "name": "Invoice Accuracy Set",
  "description": null,
  "user_schema_id": null,
  "document_count": 0,
  "created_at": "2024-09-01T10:00:00.000Z",
  "links": {
    "self": "/v1/quality/ground-truth/a1b2c3d4-e5f6-7890-abcd-ef1234567890"
  }
}
Errors
Error responses