
Get Field & Similar Fields

Retrieve a canonical field by ID with its 20 most recent occurrences, or find semantically similar fields by embedding cosine similarity with top-10 ranked matches.

Retrieve the full details of a canonical field, including its metadata and the 20 most recent extraction occurrences. Each occurrence shows which document the field was extracted from, the raw field name as it appeared in that document, the extracted value, and when the extraction happened.

Occurrences give you a window into how a field is actually used across your document corpus. They reveal naming variations (e.g. a field canonicalized as invoice_number might appear as Inv. No., Invoice #, or Rechnungsnummer in different documents), value distributions, and extraction recency.

The similar fields endpoint uses embedding cosine similarity to find the 10 most semantically related fields in your registry. Each match includes a similarity score between 0 and 1, where 1.0 is an exact semantic match. This is useful for discovering near-duplicate fields that could be merged, or for finding related fields when building new schemas.

Pair these endpoints together for field audit workflows: fetch a field to understand its current usage, then check similar fields to identify potential duplicates or related concepts that should be grouped in the same schema.
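The audit workflow described above can be sketched roughly as follows. This is an illustration under stated assumptions, not official client code: `audit_field` and the `fetch` callable are hypothetical names, and `fetch` stands in for whatever authenticated GET helper you already use (it takes a path and returns the parsed JSON body). The base URL and the API key header are not specified on this page, so the sketch leaves them out and uses stubbed responses in place of real HTTP calls.

```python
from typing import Callable


def audit_field(field_id: str, fetch: Callable[[str], dict]) -> dict:
    """Fetch a field, then its similar fields, and summarize both.

    `fetch` is a hypothetical helper: it takes a path, performs an
    authenticated GET, and returns the parsed JSON body.
    """
    field = fetch(f"/v1/fields/{field_id}")
    # Follow the `similar` link from the response instead of rebuilding the path.
    similar = fetch(field["links"]["similar"])
    return {
        "canonical_name": field["canonical_name"],
        "occurrence_count": field["occurrence_count"],
        "raw_names": sorted({occ["raw_field_name"] for occ in field["occurrences"]}),
        "similar": [(m["canonical_name"], m["similarity"]) for m in similar["data"]],
    }


# Stubbed responses standing in for real HTTP calls (values taken from the examples on this page).
FIELD_ID = "f1a2b3c4-d5e6-7890-abcd-ef1234567890"
_FAKE_RESPONSES = {
    f"/v1/fields/{FIELD_ID}": {
        "canonical_name": "invoice_number",
        "occurrence_count": 1847,
        "occurrences": [
            {"raw_field_name": "Invoice No."},
            {"raw_field_name": "Rechnungsnummer"},
        ],
        "links": {"similar": f"/v1/fields/{FIELD_ID}/similar"},
    },
    f"/v1/fields/{FIELD_ID}/similar": {
        "data": [{"canonical_name": "invoice_no", "similarity": 0.94}],
    },
}

report = audit_field(FIELD_ID, _FAKE_RESPONSES.__getitem__)
print(report)
```

Passing `fetch` in as a parameter keeps the workflow logic separate from transport and authentication details, which makes it easy to exercise with canned responses.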

GET /v1/fields/:id

Get Field Response

Response fields

Field                            Type           Description
id                               string         Field UUID.
canonical_name                   string         Normalized canonical field name.
display_name                     string         Human-readable display name.
data_type                        string         Inferred data type.
tier                             integer        Tier level: 1 (core), 2 (established), 3 (emerging).
cluster_name                     string         Semantic cluster name.
occurrence_count                 integer        Total occurrences across all documents.
master_instruction               string | null  Synthesized extraction instruction.
occurrences                      array          The 20 most recent extraction occurrences.
occurrences[].document_id        string         UUID of the document this occurrence came from.
occurrences[].document_filename  string         Original filename of the source document.
occurrences[].raw_field_name     string         The field name as it appeared in the raw document before canonicalization.
occurrences[].value              string         The extracted value for this field in this document.
occurrences[].created_at         string         ISO 8601 timestamp of when this occurrence was recorded.
created_at                       string         ISO 8601 field creation timestamp.
updated_at                       string         ISO 8601 last update timestamp.
links                            object         Related resource URLs (self, similar).

GET /v1/fields/:id Response

{
  "id": "f1a2b3c4-d5e6-7890-abcd-ef1234567890",
  "canonical_name": "invoice_number",
  "display_name": "Invoice Number",
  "data_type": "string",
  "tier": 1,
  "cluster_name": "identifiers",
  "occurrence_count": 1847,
  "master_instruction": "Extract the unique invoice identifier, typically alphanumeric, found in the header area of the document.",
  "occurrences": [
    {
      "document_id": "aa11bb22-cc33-dd44-ee55-ff6677889900",
      "document_filename": "invoice_2024_march.pdf",
      "raw_field_name": "Invoice No.",
      "value": "INV-2024-00847",
      "created_at": "2024-12-01T08:30:00.000Z"
    },
    {
      "document_id": "bb22cc33-dd44-ee55-ff66-778899001122",
      "document_filename": "rechnung_q4.pdf",
      "raw_field_name": "Rechnungsnummer",
      "value": "RE-2024-12345",
      "created_at": "2024-11-28T14:15:00.000Z"
    },
    {
      "document_id": "cc33dd44-ee55-ff66-7788-990011223344",
      "document_filename": "facture_nov.pdf",
      "raw_field_name": "N° de facture",
      "value": "FAC-2024-0091",
      "created_at": "2024-11-25T11:00:00.000Z"
    }
  ],
  "created_at": "2024-06-15T10:00:00.000Z",
  "updated_at": "2024-12-01T08:30:00.000Z",
  "links": {
    "self": "/v1/fields/f1a2b3c4-d5e6-7890-abcd-ef1234567890",
    "similar": "/v1/fields/f1a2b3c4-d5e6-7890-abcd-ef1234567890/similar"
  }
}
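To see the naming variations behind a canonical field, you can tally the `raw_field_name` values in the returned occurrences. The following is a small sketch operating on an already-parsed response; `raw_name_variants` is a hypothetical helper name, and the sample data is a trimmed copy of the example response above.

```python
from collections import Counter


def raw_name_variants(field: dict) -> list[tuple[str, int]]:
    """Count how often each raw field name appears in the returned occurrences."""
    counts = Counter(occ["raw_field_name"] for occ in field.get("occurrences", []))
    return counts.most_common()


# Trimmed copy of the example response above.
sample_field = {
    "canonical_name": "invoice_number",
    "occurrences": [
        {"raw_field_name": "Invoice No.", "value": "INV-2024-00847"},
        {"raw_field_name": "Rechnungsnummer", "value": "RE-2024-12345"},
        {"raw_field_name": "N° de facture", "value": "FAC-2024-0091"},
    ],
}

for name, count in raw_name_variants(sample_field):
    print(f"{name}: {count}")
```

Because the response carries only the 20 most recent occurrences, these counts describe a recent sample rather than the full `occurrence_count`.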

Similar Fields

Find the 10 most semantically similar fields to a given field using embedding cosine similarity. Each result includes a similarity score from 0 to 1, where higher values indicate closer semantic meaning. Fields with scores above 0.85 are strong candidates for merging.

Merge near-duplicate fields in the platform UI. Before merging, review the occurrences of both fields to confirm that they represent the same concept.

GET /v1/fields/:id/similar

Similar Fields Response

Response fields

Field                  Type     Description
data                   array    Array of similar field objects, sorted by similarity score descending.
data[].id              string   Field UUID.
data[].canonical_name  string   Canonical field name.
data[].display_name    string   Human-readable display name.
data[].data_type       string   Inferred data type.
data[].tier            integer  Tier level.
data[].similarity      number   Cosine similarity score between 0 and 1.

GET /v1/fields/:id/similar Response

{
  "data": [
    {
      "id": "d4e5f6a7-b8c9-0123-def0-678901234567",
      "canonical_name": "invoice_no",
      "display_name": "Invoice No",
      "data_type": "string",
      "tier": 2,
      "similarity": 0.94
    },
    {
      "id": "e5f6a7b8-c9d0-1234-ef01-789012345678",
      "canonical_name": "reference_number",
      "display_name": "Reference Number",
      "data_type": "string",
      "tier": 1,
      "similarity": 0.78
    },
    {
      "id": "f6a7b8c9-d0e1-2345-f012-890123456789",
      "canonical_name": "po_number",
      "display_name": "PO Number",
      "data_type": "string",
      "tier": 1,
      "similarity": 0.72
    }
  ]
}
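One way to act on these scores is to filter the response down to likely merge candidates. The sketch below applies the 0.85 guideline from this page to a parsed response; `merge_candidates` is a hypothetical helper name, and the threshold is a parameter so you can tune it for your own registry.

```python
def merge_candidates(similar_response: dict, threshold: float = 0.85) -> list[dict]:
    """Return the similar fields whose score is strictly above the threshold."""
    return [
        match
        for match in similar_response.get("data", [])
        if match["similarity"] > threshold
    ]


# Trimmed copy of the example response above.
sample_similar = {
    "data": [
        {"canonical_name": "invoice_no", "tier": 2, "similarity": 0.94},
        {"canonical_name": "reference_number", "tier": 1, "similarity": 0.78},
        {"canonical_name": "po_number", "tier": 1, "similarity": 0.72},
    ]
}

for match in merge_candidates(sample_similar):
    print(match["canonical_name"], match["similarity"])
```

With the example data, only `invoice_no` clears the threshold. A high score is a prompt for review, not an automatic decision, so it still makes sense to compare the occurrences of both fields before merging.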

Errors

Error responses

Status  Code              Description
400     validation_error  Invalid field ID format. Must be a valid UUID.
401     unauthorized      Missing or invalid API key.
404     not_found         No field with this ID exists for your organization.
429     rate_limited      Too many requests. Retry after the period indicated in the Retry-After header.
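A client might map these status codes to next steps along the following lines. This is a sketch under assumptions: the function names and the action labels (`"retry"`, `"fix_request"`, and so on) are hypothetical, and the error response body is not shown on this page, so only the status code and headers are inspected. The standard `Retry-After` header can carry either a number of seconds or an HTTP date, and the sketch handles both.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def retry_after_seconds(value: str, now: Optional[datetime] = None) -> float:
    """Convert a Retry-After value (delay in seconds or HTTP date) to seconds."""
    value = value.strip()
    if value.isdigit():
        return float(value)
    when = parsedate_to_datetime(value)
    if when.tzinfo is None:
        # Treat a date without an explicit offset as UTC.
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())


def plan_for_status(status: int, headers: dict) -> tuple:
    """Return a (hypothetical action label, optional wait in seconds) pair."""
    # Header names are case-insensitive, so normalize before looking them up.
    lowered = {key.lower(): value for key, value in headers.items()}
    if status == 429:
        raw = lowered.get("retry-after")
        return ("retry", retry_after_seconds(raw) if raw else None)
    if status == 400:
        return ("fix_request", None)    # field ID is not a valid UUID
    if status == 401:
        return ("check_api_key", None)  # missing or invalid API key
    if status == 404:
        return ("not_found", None)      # no such field for this organization
    return ("ok", None) if 200 <= status < 300 else ("unexpected", None)


print(plan_for_status(429, {"Retry-After": "30"}))
print(plan_for_status(404, {}))
```

Only the 429 case is treated as retryable here. The 400, 401, and 404 cases describe problems with the request itself, so repeating the same call is unlikely to help.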