Matching Configurations
Define field-to-field comparisons between extracted data and reference datasets. Each comparison uses a weighted strategy to score matches. A matching configuration specifies which fields to compare, which strategy to use for each comparison, and the relative weight that determines how much each field contributes to the overall confidence score:
Matching strategies
| Parameter | Type | Description |
|---|---|---|
| exact | strategy | Case-insensitive exact string match. Weight determines contribution to overall score. |
| fuzzy | strategy | Token-based fuzzy matching with configurable similarity threshold. |
| date_range | strategy | Matches dates within a configurable tolerance window (e.g. +/- 7 days). |
| numeric_range | strategy | Matches numbers within a configurable percentage or absolute tolerance. |
You can also use AI strategy generation to let the platform suggest field mappings and strategies automatically based on the schema and reference data structure.
Each field comparison carries a weight that determines how much it contributes to the overall confidence score. Set high weights on fields that are strong identifiers (like reference numbers or unique IDs) and lower weights on fields that are common or prone to variation (like names or descriptions). The weighted aggregate produces a final score between 0% and 100%.
Most teams start with AI strategy generation and then fine-tune weights based on initial results. A common pattern is to set a high weight on a unique identifier field (like a PO number) with exact strategy, combined with lower-weighted fuzzy matches on name and description fields as supporting evidence. Review the first batch of results to calibrate thresholds before running at scale.
curl -X POST https://api.talonic.com/v1/matching/configs \
-H "Authorization: Bearer $TALONIC_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"name": "Invoice to PO Matching",
"reference_data_id": "ref_vendor_001",
"field_mappings": [
{ "source": "vendor_name", "target": "vendor_name", "strategy": "fuzzy", "weight": 0.4 },
{ "source": "po_number", "target": "po_number", "strategy": "exact", "weight": 0.35 },
{ "source": "total_amount", "target": "amount", "strategy": "numeric_range", "weight": 0.15, "tolerance": 0.02 },
{ "source": "invoice_date", "target": "po_date", "strategy": "date_range", "weight": 0.1, "tolerance_days": 7 }
]
}'curl -X POST https://api.talonic.com/v1/matching/strategies/generate \
-H "Authorization: Bearer $TALONIC_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"schema_id": "us_def456",
"reference_data_id": "ref_vendor_001"
}'
# Response:
# {
# "id": "strat_001",
# "suggested_mappings": [
# { "source": "vendor_name", "target": "vendor_name", "strategy": "fuzzy", "weight": 0.4, "confidence": 0.95 },
# { "source": "invoice_number", "target": "ref_number", "strategy": "exact", "weight": 0.35, "confidence": 0.88 }
# ]
# }- exact — case-insensitive string comparison. Best for unique identifiers like PO numbers, invoice IDs, and reference codes where values should match verbatim.
- fuzzy — token-based similarity with a configurable threshold (0-100%). Handles misspellings, abbreviations, and word reordering. Ideal for company names, addresses, and descriptions.
- date_range — matches dates within a configurable tolerance window (e.g., +/- 7 days). Useful when documents report dates with slight offsets, such as invoice date vs. received date.
- numeric_range — matches numbers within a percentage or absolute tolerance. Handles rounding differences in amounts, quantities, and prices across systems.