Analyze Reference Data

Auto-classify the columns of an uploaded reference dataset into key, descriptor, and numeric roles to inform a reconciliation config.

Reconciliation matches extracted document values against an uploaded reference dataset. Before you write a config, this endpoint inspects the dataset and classifies every column so you can see which columns are natural lookup keys, which are human-readable descriptors, and which are numeric values worth checking with a tolerance.

The classification supports keyed lookup, not probabilistic record linkage. A key column is a stable identifier (an invoice number, a SKU, a booking reference) that can anchor a document to one reference row. A descriptor column is free text such as a name or address. A numeric column holds amounts or quantities that a numeric tolerance check can compare against.

Use this read-only endpoint as the first step of a reconciliation flow. Inspect the suggested roles, then either let the server derive a full config with POST /v1/reconciliation/auto-configure or write the config yourself with PUT /v1/reconciliation/config/{referenceDataId}. The dataset must belong to your tenant and must be fully imported.

GET /v1/reconciliation/analyze/{referenceDataId}
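
A minimal sketch of calling this endpoint from Python with only the standard library. The host `https://api.example.com`, the bearer-token `Authorization` header, and the function names are assumptions made for illustration; only the path `/v1/reconciliation/analyze/{referenceDataId}` comes from this page.

```python
import json
import urllib.parse
import urllib.request

# Assumption: placeholder host; substitute the real API base URL.
BASE_URL = "https://api.example.com"


def build_analyze_request(reference_data_id: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) the GET request for the analyze endpoint."""
    # Percent-encode the ID so it is safe as a single path segment.
    dataset_id = urllib.parse.quote(reference_data_id, safe="")
    url = f"{BASE_URL}/v1/reconciliation/analyze/{dataset_id}"
    # Assumption: bearer-token auth; the real API may expect a different header.
    headers = {"Authorization": f"Bearer {api_key}", "Accept": "application/json"}
    return urllib.request.Request(url, headers=headers, method="GET")


def analyze_reference_data(reference_data_id: str, api_key: str) -> dict:
    """Send the request and return the decoded JSON body."""
    request = build_analyze_request(reference_data_id, api_key)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

Keeping request construction separate from sending makes the URL and headers easy to inspect or unit test without touching the network.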

Response

Response fields

reference_data.id (string): Reference dataset UUID.
reference_data.name (string): Human-readable dataset name.
reference_data.row_count (integer): Number of rows in the dataset.
classification.key_columns (string[]): Columns suitable as anchor lookup keys.
classification.descriptor_columns (string[]): Free-text columns such as names and addresses.
classification.numeric_columns (string[]): Columns holding amounts or quantities.

Example response

{
  "reference_data": {
    "id": "rd_a1b2c3d4-e5f6-7890-abcd-ef1234567890",
    "name": "Carrier Master 2024",
    "row_count": 1842
  },
  "classification": {
    "key_columns": ["booking_ref", "container_no"],
    "descriptor_columns": ["carrier_name", "consignee"],
    "numeric_columns": ["freight_amount", "weight_kg"]
  }
}

Classification is a suggestion, not a contract. You can override any role when you save a config: a numeric column can still be used as a narrowing column, and a descriptor can drive a fuzzy_name check.
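
One way to review the suggestions before writing a config is to flatten the response into a column-to-role map and record your overrides against it. The sketch below is local bookkeeping only: the role labels (`key`, `descriptor`, `numeric`, `narrowing`) and both helper functions are hypothetical and do not represent the config schema accepted by the config endpoint.

```python
import json

# The example response shown above.
RESPONSE_BODY = """
{
  "reference_data": {
    "id": "rd_a1b2c3d4-e5f6-7890-abcd-ef1234567890",
    "name": "Carrier Master 2024",
    "row_count": 1842
  },
  "classification": {
    "key_columns": ["booking_ref", "container_no"],
    "descriptor_columns": ["carrier_name", "consignee"],
    "numeric_columns": ["freight_amount", "weight_kg"]
  }
}
"""

# Hypothetical short labels for the three documented response fields.
ROLE_FIELDS = {
    "key_columns": "key",
    "descriptor_columns": "descriptor",
    "numeric_columns": "numeric",
}


def suggested_roles(analysis: dict) -> dict[str, str]:
    """Flatten the classification object into {column_name: role}."""
    classification = analysis["classification"]
    roles = {}
    for field, role in ROLE_FIELDS.items():
        for column in classification.get(field, []):
            roles[column] = role
    return roles


def apply_overrides(roles: dict[str, str], overrides: dict[str, str]) -> dict[str, str]:
    """Return a copy of roles with overrides applied; reject unknown columns."""
    unknown = sorted(set(overrides) - set(roles))
    if unknown:
        raise KeyError(f"columns not in dataset: {unknown}")
    return {**roles, **overrides}


roles = suggested_roles(json.loads(RESPONSE_BODY))
# Treat weight_kg as a narrowing column instead of a tolerance-checked value.
reviewed = apply_overrides(roles, {"weight_kg": "narrowing"})
```

Rejecting overrides for columns that are not in the dataset catches typos early, before they reach a saved config.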

Errors

Error responses

400 validation_error: Invalid reference data ID format. Must be a UUID.
401 unauthorized: Missing or invalid API key.
404 not_found: No reference dataset with this ID exists for your organization.
429 rate_limited: Too many requests. Retry after the period indicated in the Retry-After header.
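
Of the errors listed here, only 429 is worth retrying automatically; the others indicate a problem with the request itself. A minimal sketch of working out how long to wait from the Retry-After header follows. The helper names and the one-second fallback are assumptions. The HTTP specification allows the header to carry either a number of seconds or an HTTP date, so both forms are handled.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

# Per the error list above, only rate limiting is treated as retryable.
RETRYABLE_STATUSES = {429}


def should_retry(status: int) -> bool:
    """Return True when the status code signals a transient condition."""
    return status in RETRYABLE_STATUSES


def retry_after_seconds(header_value, default=1.0, now=None):
    """Convert a Retry-After header value into a delay in seconds.

    Falls back to `default` (an assumed value) when the header is
    missing or cannot be parsed.
    """
    if header_value is None:
        return default
    value = header_value.strip()
    if value.isdigit():
        # Delay given directly as a number of seconds.
        return float(value)
    try:
        retry_at = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return default
    if retry_at.tzinfo is None:
        # Treat dates without zone information as UTC.
        retry_at = retry_at.replace(tzinfo=timezone.utc)
    current = now or datetime.now(timezone.utc)
    # Never return a negative delay if the date is already in the past.
    return max(0.0, (retry_at - current).total_seconds())
```

A caller would check `should_retry(status)` and, if true, sleep for `retry_after_seconds(headers.get("Retry-After"))` before sending the request again, ideally with a cap on the number of attempts.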