Reference Data
Upload CSV or Excel files as lookup tables for the matching engine and schema reference strategies. These reference datasets represent your "ground truth" — the known records you want to match extracted document data against. Each reference dataset is versioned independently and can be shared across multiple schemas and matching configurations without duplication.
Reference data is the foundation of the matching system. Common examples include customer lists, product catalogs, vendor registries, contract databases, and supplier directories. When you upload a reference dataset, the platform indexes all columns and rows for fast lookup during matching runs. You can also import reference data directly from a connected MSSQL or PostgreSQL database using POST /matching/reference-data/from-sql. The import runs asynchronously and streams rows in batches of 500.
Because each dataset is versioned independently, you can update your reference data without affecting in-progress matching configurations.
For best results, ensure your reference data is clean and deduplicated before uploading. Include all columns that you plan to match against — such as names, identifiers, dates, and amounts. Most teams refresh their reference data periodically by re-uploading from their source system or by using the SQL import option to pull directly from a connected database.
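As one way to deduplicate a file before uploading it, the sketch below keeps the header row and the first row seen for each value of a key column. The function name `dedupe_csv` is hypothetical, and the script assumes a simple CSV with no quoted fields that contain commas or line breaks; files with such fields need a real CSV parser instead.

```shell
#!/usr/bin/env bash
# Sketch only: drop duplicate rows from a simple CSV before uploading it.
# ASSUMPTIONS: header on line 1; no quoted fields containing commas or newlines.
# dedupe_csv is a hypothetical helper name, not part of the platform.

# Usage: dedupe_csv KEY_COLUMN < input.csv > output.csv
# KEY_COLUMN is the 1-based index of the column that identifies a record.
dedupe_csv() {
  local key_col="${1:-1}"
  awk -F',' -v k="$key_col" '
    { sub(/\r$/, "") }                      # normalize CRLF line endings
    NR == 1 { print; next }                 # always keep the header row
    {
      key = $k
      gsub(/^[ \t]+|[ \t]+$/, "", key)      # ignore whitespace around the key
      if (!(key in seen)) {                 # keep only the first row per key
        seen[key] = 1
        print
      }
    }
  '
}

# Example (not executed here):
#   dedupe_csv 1 < vendor_registry.csv > vendor_registry.clean.csv
```

The first occurrence wins, so sort the file beforehand if a particular row (for example, the most recently updated one) should be the one that survives.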
curl -X POST https://api.talonic.com/v1/matching/reference-data \
  -H "Authorization: Bearer $TALONIC_API_KEY" \
  -F "file=@vendor_registry.csv"
# Response:
# {
#   "id": "ref_vendor_001",
#   "name": "vendor_registry",
#   "status": "ready",
#   "row_count": 2450,
#   "columns": ["vendor_id", "vendor_name", "country", "tax_id"],
#   "created_at": "2025-04-22T10:00:00Z"
# }

curl -X POST https://api.talonic.com/v1/matching/reference-data/from-sql \
  -H "Authorization: Bearer $TALONIC_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "connection_id": "src_sql_001",
    "kind": "table",
    "table_name": "vendors",
    "schema_name": "public"
  }'
# Response (async; poll status until ready):
# {
#   "id": "ref_vendor_002",
#   "status": "importing",
#   "source_meta": {
#     "connection_id": "src_sql_001",
#     "table_name": "vendors"
#   }
# }

- CSV and Excel (XLSX) file uploads for quick one-time imports.
- SQL database imports for live reference data from connected sources.
- Versioning — each dataset tracks versions independently.
- Cross-schema sharing — one dataset can be referenced by multiple schemas and matching configurations.
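
Because a SQL import returns while its status is still "importing", a client has to poll until the dataset is ready. The sketch below shows one way to do that. It assumes that GET /matching/reference-data/{id} returns the dataset JSON with a "status" field; that endpoint is not confirmed by this page, and the helper names `fetch_dataset`, `extract_status`, and `wait_until_ready`, as well as the `TALONIC_API_BASE` variable, are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: poll a reference dataset until its status is "ready".
# ASSUMPTION: GET $API_BASE/matching/reference-data/{id} returns the dataset
# JSON including a "status" field. This endpoint is not confirmed by the docs.

API_BASE="${TALONIC_API_BASE:-https://api.talonic.com/v1}"

# Hypothetical helper: fetch the dataset JSON for the given id.
fetch_dataset() {
  curl -sS -H "Authorization: Bearer $TALONIC_API_KEY" \
    "$API_BASE/matching/reference-data/$1"
}

# Print the value of the first "status" string field found in JSON on stdin.
extract_status() {
  tr -d '\n' \
    | grep -o '"status"[[:space:]]*:[[:space:]]*"[^"]*"' \
    | head -n 1 \
    | sed 's/.*"\([^"]*\)"$/\1/'
}

# Usage: wait_until_ready ID [MAX_ATTEMPTS] [DELAY_SECONDS]
# Prints the final status. Returns 0 on "ready", 1 on any other terminal
# status, and 2 if the attempts run out.
wait_until_ready() {
  local id="$1" max="${2:-30}" delay="${3:-2}" attempt=1 status
  while [ "$attempt" -le "$max" ]; do
    status="$(fetch_dataset "$id" | extract_status)"
    case "$status" in
      ready) echo "ready"; return 0 ;;
      importing|"") sleep "$delay" ;;     # still importing, or no status parsed
      *) echo "$status"; return 1 ;;      # treat anything else as terminal
    esac
    attempt=$((attempt + 1))
  done
  echo "timeout"
  return 2
}

# Example (not executed here):
#   wait_until_ready ref_vendor_002 60 5 && echo "dataset can be used for matching"
```

Capping the number of attempts keeps a stalled import from blocking a pipeline indefinitely; the defaults of 30 attempts and a 2 second delay are placeholders to tune for the size of the source table.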