Get Field & Similar Fields
Retrieve a canonical field by ID along with its 20 most recent occurrences, or find the 10 most semantically similar fields, ranked by embedding cosine similarity.
Retrieve the full details of a canonical field, including its metadata and the 20 most recent extraction occurrences. Each occurrence shows which document the field was extracted from, the raw field name as it appeared in that document, the extracted value, and when the extraction happened.
Occurrences give you a window into how a field is actually used across your document corpus. They reveal naming variations (e.g. a field canonicalized as invoice_number might appear as Inv. No., Invoice #, or Rechnungsnummer in different documents), value distributions, and extraction recency.
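As a rough illustration of reading occurrences this way, the sketch below tallies raw field names and finds the most recent extraction in a trimmed copy of the sample response shown on this page. The helper name summarize_occurrences is hypothetical and not part of the API; it only assumes the occurrence keys documented here (raw_field_name, value, created_at).

```python
from collections import Counter

# Trimmed copy of the sample GET /v1/fields/:id response from this page.
field = {
    "canonical_name": "invoice_number",
    "occurrences": [
        {"raw_field_name": "Invoice No.", "value": "INV-2024-00847",
         "created_at": "2024-12-01T08:30:00.000Z"},
        {"raw_field_name": "Rechnungsnummer", "value": "RE-2024-12345",
         "created_at": "2024-11-28T14:15:00.000Z"},
        {"raw_field_name": "N° de facture", "value": "FAC-2024-0091",
         "created_at": "2024-11-25T11:00:00.000Z"},
    ],
}


def summarize_occurrences(field: dict) -> dict:
    """Hypothetical helper: count naming variations and find the latest extraction."""
    occurrences = field.get("occurrences", [])
    raw_names = Counter(o["raw_field_name"] for o in occurrences)
    # Timestamps that share one ISO 8601 format and timezone sort
    # lexicographically, so max() on the strings gives the most recent one.
    latest = max((o["created_at"] for o in occurrences), default=None)
    return {
        "canonical_name": field["canonical_name"],
        "raw_name_counts": dict(raw_names),
        "latest_extraction": latest,
    }


summary = summarize_occurrences(field)
print(summary["raw_name_counts"])
print(summary["latest_extraction"])
```

Because the response carries only the 20 most recent occurrences, a tally like this reflects recent usage rather than the full history counted in occurrence_count.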
The similar fields endpoint uses embedding cosine similarity to find the 10 most semantically related fields in your registry. Each match includes a similarity score between 0 and 1, where 1.0 is an exact semantic match. This is useful for discovering near-duplicate fields that could be merged, or for finding related fields when building new schemas.
Pair these endpoints together for field audit workflows: fetch a field to understand its current usage, then check similar fields to identify potential duplicates or related concepts that should be grouped in the same schema.
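The audit workflow described above might be sketched as follows. This is a minimal example under stated assumptions: the host in BASE_URL, the bearer-token Authorization header, and the helper names http_get_json and audit_field are all placeholders that do not come from this page; only the two endpoint paths and the response keys are taken from the documented examples.

```python
import json
import urllib.request
from typing import Callable

BASE_URL = "https://api.example.com"  # assumption: placeholder host
API_KEY = "YOUR_API_KEY"              # assumption: bearer-token authentication


def http_get_json(path: str) -> dict:
    """Hypothetical transport helper: GET a path and decode the JSON body."""
    request = urllib.request.Request(
        BASE_URL + path,
        headers={"Authorization": f"Bearer {API_KEY}", "Accept": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def audit_field(field_id: str, get_json: Callable[[str], dict] = http_get_json) -> dict:
    """Fetch a field, then its similar fields, and combine them into one report."""
    field = get_json(f"/v1/fields/{field_id}")
    # Prefer the link returned in the response; fall back to the documented path.
    similar_path = field.get("links", {}).get("similar", f"/v1/fields/{field_id}/similar")
    similar = get_json(similar_path)["data"]
    return {
        "canonical_name": field["canonical_name"],
        "occurrence_count": field["occurrence_count"],
        "raw_names": sorted({o["raw_field_name"] for o in field["occurrences"]}),
        "similar": [(s["canonical_name"], s["similarity"]) for s in similar],
    }
```

Passing the transport in as get_json keeps the workflow logic separate from authentication details, so it can be exercised with a stub before pointing it at a real deployment.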
Get Field
GET /v1/fields/:id
Response
{
  "id": "f1a2b3c4-d5e6-7890-abcd-ef1234567890",
  "canonical_name": "invoice_number",
  "display_name": "Invoice Number",
  "data_type": "string",
  "tier": 1,
  "cluster_name": "identifiers",
  "occurrence_count": 1847,
  "master_instruction": "Extract the unique invoice identifier, typically alphanumeric, found in the header area of the document.",
  "occurrences": [
    {
      "document_id": "aa11bb22-cc33-dd44-ee55-ff6677889900",
      "document_filename": "invoice_2024_march.pdf",
      "raw_field_name": "Invoice No.",
      "value": "INV-2024-00847",
      "created_at": "2024-12-01T08:30:00.000Z"
    },
    {
      "document_id": "bb22cc33-dd44-ee55-ff66-778899001122",
      "document_filename": "rechnung_q4.pdf",
      "raw_field_name": "Rechnungsnummer",
      "value": "RE-2024-12345",
      "created_at": "2024-11-28T14:15:00.000Z"
    },
    {
      "document_id": "cc33dd44-ee55-ff66-7788-990011223344",
      "document_filename": "facture_nov.pdf",
      "raw_field_name": "N° de facture",
      "value": "FAC-2024-0091",
      "created_at": "2024-11-25T11:00:00.000Z"
    }
  ],
  "created_at": "2024-06-15T10:00:00.000Z",
  "updated_at": "2024-12-01T08:30:00.000Z",
  "links": {
    "self": "/v1/fields/f1a2b3c4-d5e6-7890-abcd-ef1234567890",
    "similar": "/v1/fields/f1a2b3c4-d5e6-7890-abcd-ef1234567890/similar"
  }
}
Similar Fields
Find the 10 most semantically similar fields to a given field using embedding cosine similarity. Each result includes a similarity score from 0 to 1, where higher values indicate closer semantic meaning. Fields with scores above 0.85 are strong candidates for merging.
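One way to apply the 0.85 guideline is to filter the similar-fields response client-side. The sketch below uses a trimmed copy of the sample response shown on this page; merge_candidates and MERGE_THRESHOLD are hypothetical names, and the threshold is a tunable default rather than a parameter the API is known to accept.

```python
MERGE_THRESHOLD = 0.85  # guideline from the text above, applied client-side

# Trimmed copy of the sample GET /v1/fields/:id/similar response from this page.
similar_response = {
    "data": [
        {"canonical_name": "invoice_no", "tier": 2, "similarity": 0.94},
        {"canonical_name": "reference_number", "tier": 1, "similarity": 0.78},
        {"canonical_name": "po_number", "tier": 1, "similarity": 0.72},
    ]
}


def merge_candidates(response: dict, threshold: float = MERGE_THRESHOLD) -> list:
    """Hypothetical helper: keep matches scoring above the threshold, best first."""
    matches = [m for m in response["data"] if m["similarity"] > threshold]
    return sorted(matches, key=lambda m: m["similarity"], reverse=True)


candidates = merge_candidates(similar_response)
for match in candidates:
    print(f"{match['canonical_name']}: {match['similarity']:.2f}")
# prints "invoice_no: 0.94"
```

A high score only flags a pair for review; comparing occurrences of both fields before merging helps confirm they really capture the same value.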
GET /v1/fields/:id/similar
Response
{
  "data": [
    {
      "id": "d4e5f6a7-b8c9-0123-def0-678901234567",
      "canonical_name": "invoice_no",
      "display_name": "Invoice No",
      "data_type": "string",
      "tier": 2,
      "similarity": 0.94
    },
    {
      "id": "e5f6a7b8-c9d0-1234-ef01-789012345678",
      "canonical_name": "reference_number",
      "display_name": "Reference Number",
      "data_type": "string",
      "tier": 1,
      "similarity": 0.78
    },
    {
      "id": "f6a7b8c9-d0e1-2345-f012-890123456789",
      "canonical_name": "po_number",
      "display_name": "PO Number",
      "data_type": "string",
      "tier": 1,
      "similarity": 0.72
    }
  ]
}
Errors
Error responses