Get Document Fields

Return a document's captured fields bound into the field registry — one row per occurrence with its canonical concept, tier, value, and confidence.

Returns the fields captured from a document and bound into the field registry. Where the extraction endpoints give you the raw field-name/value pairs a document produced, this endpoint gives you the registry-resolved view: each captured value is reported alongside the canonical concept it bound to, that concept's tier, and the confidence of the binding. It is the structured result most integrations actually want per document.

Each row carries the occurrence value and confidence, the resolved registry field (its id, canonical name, display name, and data type), the registry tier (1 = canonical, 2 = provisional, 3 = candidate), and the semantic cluster the field belongs to when one is assigned. Rows are ordered by tier, so the most established concepts come first.
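Because rows arrive ordered by tier, they can be grouped by tier in a single pass. The following is a minimal Python sketch, assuming `rows` is the already-parsed `data` array of a response. The helper name `group_by_tier` and the second sample row are illustrative and not part of the API.

```python
from itertools import groupby
from operator import itemgetter

# Tier labels as described above.
TIER_LABELS = {1: "canonical", 2: "provisional", 3: "candidate"}


def group_by_tier(rows):
    """Group rows under their tier label, preserving the response order."""
    grouped = {}
    for tier, tier_rows in groupby(rows, key=itemgetter("tier")):
        label = TIER_LABELS.get(tier, f"tier {tier}")
        # setdefault/extend keeps the result correct even if a tier
        # were to appear in more than one run.
        grouped.setdefault(label, []).extend(tier_rows)
    return grouped


# Illustrative rows (hypothetical data), trimmed to the keys used here.
rows = [
    {"canonical_name": "invoice_total", "tier": 1, "value": "490.00"},
    {"canonical_name": "po_reference", "tier": 2, "value": "PO-1"},
]

for label, members in group_by_tier(rows).items():
    print(label, [r["canonical_name"] for r in members])
```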

A linked duplicate transparently returns the canonical document's fields. When a re-ingested file is byte-identical to one already in your workspace, it is stored as a thin link to the canonical document, and reads of the duplicate dereference to that document, so you always see the captured data without re-extraction.
GET /v1/documents/{id}/fields
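A request to this endpoint might be built as in the sketch below, which uses only the Python standard library. The base URL `https://api.example.com`, the Bearer-token `Authorization` header, and the helper names are assumptions for illustration; this page does not specify the host or the authentication scheme, so substitute the values for your workspace.

```python
import json
import urllib.request
from urllib.parse import quote

# Assumption: placeholder host, not taken from this page.
BASE_URL = "https://api.example.com"


def build_fields_request(document_id, token, base_url=BASE_URL):
    """Build (but do not send) a GET request for a document's fields."""
    url = f"{base_url}/v1/documents/{quote(document_id, safe='')}/fields"
    return urllib.request.Request(
        url,
        headers={
            # Assumption: Bearer-token authentication.
            "Authorization": f"Bearer {token}",
            "Accept": "application/json",
        },
        method="GET",
    )


def get_document_fields(document_id, token, base_url=BASE_URL):
    """Send the request and return the decoded JSON body."""
    request = build_fields_request(document_id, token, base_url)
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Building the request performs no network I/O, so it is safe to inspect.
req = build_fields_request("7f3a1b2c-0000-0000-0000-000000000000", "YOUR_TOKEN")
print(req.get_method(), req.full_url)
```

Keeping request construction separate from sending makes the URL and headers easy to check before any call is made.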

Response

Response fields

document_id (string): The document UUID the fields belong to.
data[].field_id (string): Registry field UUID the value bound to.
data[].canonical_name (string): Canonical name of the registry concept.
data[].display_name (string | null): Human-readable display name, when set.
data[].cluster_name (string | null): Semantic cluster the field belongs to, when assigned.
data[].data_type (string | null): Declared data type of the concept.
data[].tier (integer): Registry tier: 1 canonical, 2 provisional, 3 candidate.
data[].value (any): The captured value for this occurrence.
data[].confidence (number | null): Confidence of the binding (0–1).
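Several of the fields listed above can be null, so client code should handle missing values explicitly. The sketch below models a row as a Python dataclass and filters for well-established, high-confidence bindings. The `FieldRow` type, the `trusted` helper, and the thresholds (tier 1, confidence 0.9) are hypothetical choices for illustration, not recommendations from the API; the second sample row is invented data.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class FieldRow:
    """One entry of the response's data array (hypothetical client-side type)."""

    field_id: str
    canonical_name: str
    tier: int
    value: Any
    display_name: Optional[str] = None
    cluster_name: Optional[str] = None
    data_type: Optional[str] = None
    confidence: Optional[float] = None

    @classmethod
    def from_json(cls, row: dict) -> "FieldRow":
        # Non-nullable keys are indexed directly; nullable ones default to None.
        return cls(
            field_id=row["field_id"],
            canonical_name=row["canonical_name"],
            tier=row["tier"],
            value=row.get("value"),
            display_name=row.get("display_name"),
            cluster_name=row.get("cluster_name"),
            data_type=row.get("data_type"),
            confidence=row.get("confidence"),
        )

    @property
    def label(self) -> str:
        # Prefer the display name, falling back to the canonical name.
        return self.display_name or self.canonical_name


def trusted(rows, max_tier=1, min_confidence=0.9):
    """Keep rows at or below max_tier with a known, sufficient confidence."""
    return [
        r for r in rows
        if r.tier <= max_tier
        and r.confidence is not None
        and r.confidence >= min_confidence
    ]


rows = [
    FieldRow.from_json({
        "field_id": "b2c3d4e5-0000-0000-0000-000000000000",
        "canonical_name": "invoice_total", "display_name": "Invoice Total",
        "cluster_name": "amounts", "data_type": "number",
        "tier": 1, "value": "490.00", "confidence": 0.97,
    }),
    FieldRow.from_json({
        "field_id": "c3d4e5f6-0000-0000-0000-000000000000",
        "canonical_name": "po_reference", "display_name": None,
        "cluster_name": None, "data_type": None,
        "tier": 3, "value": "PO-1", "confidence": None,
    }),
]

print([r.label for r in trusted(rows)])
```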

Response

{
  "document_id": "7f3a1b2c-0000-0000-0000-000000000000",
  "data": [
    {
      "field_id": "b2c3d4e5-0000-0000-0000-000000000000",
      "canonical_name": "invoice_total",
      "display_name": "Invoice Total",
      "cluster_name": "amounts",
      "data_type": "number",
      "tier": 1,
      "value": "490.00",
      "confidence": 0.97
    }
  ]
}
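In the sample above, `value` is the string "490.00" even though `data_type` is "number". A client that needs arithmetic may therefore want to coerce values itself. The sketch below does this with `decimal.Decimal`; it assumes numeric values can arrive as strings, as in the sample, and the helper name `coerce_value` is hypothetical. Other `data_type` names are not listed on this page, so the sketch leaves their values untouched.

```python
import json
from decimal import Decimal, InvalidOperation

# The sample response from this page.
SAMPLE = """
{
  "document_id": "7f3a1b2c-0000-0000-0000-000000000000",
  "data": [
    {
      "field_id": "b2c3d4e5-0000-0000-0000-000000000000",
      "canonical_name": "invoice_total",
      "display_name": "Invoice Total",
      "cluster_name": "amounts",
      "data_type": "number",
      "tier": 1,
      "value": "490.00",
      "confidence": 0.97
    }
  ]
}
"""


def coerce_value(row):
    """Return a Decimal for string values of number-typed rows.

    Anything else, including strings that do not parse, is returned unchanged.
    """
    value = row.get("value")
    if row.get("data_type") == "number" and isinstance(value, str):
        try:
            return Decimal(value)
        except InvalidOperation:
            return value
    return value


payload = json.loads(SAMPLE)
totals = {row["canonical_name"]: coerce_value(row) for row in payload["data"]}
print(totals)
```

`Decimal` is used rather than `float` so that monetary amounts such as "490.00" keep their exact decimal representation.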