
Get / Delete Dataset

Retrieve a ground truth dataset by ID with metadata and entry count, or delete it permanently. Deleting a dataset does not remove associated benchmark results.

The GET response includes a samples array holding the ground truth entries themselves, so you can inspect the expected values for each document.

Use GET to inspect the dataset contents before running a benchmark. The samples array contains all ground truth entries with their document_id, expected_data (key-value map of verified field values), and optional notes. This lets you verify the dataset is correctly populated.
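The inspection described above can be sketched as a small check over the parsed GET response body. This is only an illustration: `check_dataset` is a hypothetical helper, not part of the API, and treating a difference between document_count and the number of samples as a problem is an assumption about how you want to audit the data.

```python
from typing import Any


def check_dataset(dataset: dict[str, Any]) -> list[str]:
    """Return human-readable problems found in a parsed GET response body."""
    problems: list[str] = []
    samples = dataset.get("samples", [])

    # Assumption: a count mismatch is worth flagging for manual review.
    count = dataset.get("document_count")
    if count != len(samples):
        problems.append(
            f"document_count is {count} but samples has {len(samples)} entries"
        )

    for i, sample in enumerate(samples):
        # Every entry should point at a document and carry expected values.
        if not sample.get("document_id"):
            problems.append(f"samples[{i}] has no document_id")
        if not sample.get("expected_data"):
            problems.append(f"samples[{i}] has empty expected_data")
    return problems


if __name__ == "__main__":
    example = {
        "document_count": 1,
        "samples": [
            {
                "document_id": "doc_abc123",
                "expected_data": {"vendor_name": "Acme Corp"},
                "notes": None,
            }
        ],
    }
    print(check_dataset(example))
```

An empty list means nothing suspicious was found by these particular checks; it does not prove the expected values themselves are correct.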

The document_count field shows how many entries exist. For large datasets, the samples array may produce a sizable response. The user_schema_id field indicates whether the dataset is scoped to a specific extraction schema. Scoping improves benchmark accuracy because the dataset's field names align with the schema's.

Use DELETE when a dataset is outdated or no longer needed. Benchmark results that referenced the dataset are preserved for historical tracking, and each benchmark retains the dataset_id even after the dataset itself is removed. To change a dataset's contents, create a new dataset with updated entries rather than modifying the existing one.

Deleting a dataset is permanent. Benchmark results that used the dataset are retained for historical reference: the benchmark still shows the dataset_id, but the dataset itself can no longer be retrieved.
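Because deletion cannot be undone, it can help to build the request explicitly and review it before sending. The following is a minimal sketch using only the Python standard library. The host `https://api.example.com` and the `Authorization: Bearer` header are assumptions for illustration, since this page does not state the base URL or how the API key is transmitted; `build_delete_request` is a hypothetical helper name.

```python
import urllib.parse
import urllib.request

BASE_URL = "https://api.example.com"  # hypothetical host, replace with the real one


def build_delete_request(dataset_id: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a DELETE request for one ground truth dataset."""
    # Percent-encode the ID so it is always a single path segment.
    safe_id = urllib.parse.quote(dataset_id, safe="")
    url = f"{BASE_URL}/v1/quality/ground-truth/{safe_id}"
    return urllib.request.Request(
        url,
        method="DELETE",
        # Assumed auth scheme; check how your API key must be passed.
        headers={"Authorization": f"Bearer {api_key}"},
    )


if __name__ == "__main__":
    req = build_delete_request("a1b2c3d4-e5f6-7890-abcd-ef1234567890", "YOUR_API_KEY")
    print(req.get_method(), req.full_url)
    # To actually delete, pass the request to urllib.request.urlopen(req).
```

Keeping construction separate from sending makes it easy to log or confirm the target ID first, which suits an operation that is permanent.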
GET /v1/quality/ground-truth/:id

Response (GET)

Response fields

id (string): Dataset UUID.
name (string): Dataset name.
description (string | null): Optional description.
user_schema_id (string | null): Associated user schema ID, if any.
document_count (integer): Number of entries in the dataset.
created_at (string): ISO 8601 creation timestamp.
links.self (string): URL to this dataset.
samples (array): Array of ground truth entry objects (id, document_id, expected_data, notes, created_at).

Response

{
  "id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
  "name": "Invoice Accuracy Set",
  "description": "Manually verified invoices for Q3 2024",
  "user_schema_id": null,
  "document_count": 50,
  "created_at": "2024-09-01T10:00:00.000Z",
  "links": {
    "self": "/v1/quality/ground-truth/a1b2c3d4-e5f6-7890-abcd-ef1234567890"
  },
  "samples": [
    {
      "id": "b2c3d4e5-f6a7-8901-bcde-f12345678901",
      "document_id": "doc_abc123",
      "expected_data": {
        "vendor_name": "Acme Corp",
        "total_amount": 14250.00,
        "invoice_number": "INV-2024-0847"
      },
      "notes": null,
      "created_at": "2024-09-05T12:00:00.000Z"
    }
  ]
}
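A response shaped like the one above could be loaded into typed objects as follows. This is a sketch under the assumption that the fields are exactly those listed in the response fields; the class names `GroundTruthDataset` and `GroundTruthEntry` and the function `parse_dataset` are hypothetical, not names defined by the API.

```python
import json
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class GroundTruthEntry:
    id: str
    document_id: str
    expected_data: dict[str, Any]  # verified field values keyed by field name
    notes: Optional[str]
    created_at: str


@dataclass
class GroundTruthDataset:
    id: str
    name: str
    description: Optional[str]
    user_schema_id: Optional[str]
    document_count: int
    created_at: str
    self_link: str  # taken from links.self
    samples: list[GroundTruthEntry]


def parse_dataset(body: str) -> GroundTruthDataset:
    """Parse a GET response body (JSON text) into typed objects."""
    raw = json.loads(body)
    entries = [
        GroundTruthEntry(
            id=s["id"],
            document_id=s["document_id"],
            expected_data=s["expected_data"],
            notes=s.get("notes"),
            created_at=s["created_at"],
        )
        for s in raw.get("samples", [])
    ]
    return GroundTruthDataset(
        id=raw["id"],
        name=raw["name"],
        description=raw.get("description"),
        user_schema_id=raw.get("user_schema_id"),
        document_count=raw["document_count"],
        created_at=raw["created_at"],
        self_link=raw["links"]["self"],
        samples=entries,
    )
```

Reading fields by name rather than unpacking each object wholesale means an extra field in the response would be ignored instead of raising an error.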
DELETE /v1/quality/ground-truth/:id

Response (DELETE)

Response fields

deleted (boolean): Always true on success.

Response (DELETE)

{ "deleted": true }

Errors

Error responses

401 unauthorized: Missing or invalid API key.
404 not_found: No dataset with this ID exists for your workspace.
429 rate_limited: Too many requests. Retry after the period indicated in the Retry-After header.
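One way a client might react to these status codes is sketched below. The action labels ("retry", "fix_credentials", "give_up", "unexpected") and the one-second fallback delay are illustrative choices, not behaviour defined by the API. The Retry-After parsing accepts both forms permitted by HTTP, a number of seconds or an HTTP date; which form this API sends is not stated here.

```python
import email.utils
from datetime import datetime, timezone
from typing import Mapping, Optional

DEFAULT_DELAY = 1.0  # assumed fallback when Retry-After is missing or unparseable


def retry_after_seconds(value: str, now: Optional[datetime] = None) -> float:
    """Convert a Retry-After header value into a delay in seconds."""
    value = value.strip()
    if value.isdigit():
        return float(value)
    try:
        when = email.utils.parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return DEFAULT_DELAY
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)  # treat naive dates as UTC
    now = now or datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())


def plan_for_error(status: int, headers: Mapping[str, str]) -> tuple[str, float]:
    """Map a documented error status to an (action, delay_seconds) pair."""
    if status == 429:
        # Note: a plain dict lookup is case-sensitive; normalise keys if needed.
        return "retry", retry_after_seconds(headers.get("Retry-After", ""))
    if status == 401:
        return "fix_credentials", 0.0
    if status == 404:
        return "give_up", 0.0
    return "unexpected", 0.0


if __name__ == "__main__":
    print(plan_for_error(429, {"Retry-After": "30"}))
    print(plan_for_error(404, {}))
```

Only 429 is treated as retryable here. Retrying a 401 or 404 without changing the API key or the dataset ID would most likely produce the same response.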