Field Resolution

When a document is processed, each extracted field is resolved against the registry using a three-band matching model. The bands determine whether a match is accepted automatically, flagged for confirmation, or treated as a new field. Resolution is the core mechanism that turns raw, document-specific field names into canonical registry entries — building a unified knowledge graph across all your documents.

Band | Similarity | Outcome
Auto band | ≥ 0.80 | High-confidence match. The field is linked to the existing registry entry and its occurrence count is incremented.
Confirm band | 0.50 – 0.79 | Candidate match. The field is linked but flagged for manual review in the reconciliation queue.
New band | < 0.50 | No match found. A new Tier 3 field and cluster are created in the registry.
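The thresholds in the table can be expressed as a small classifier. This is a minimal sketch, not the platform's implementation: the function name `classify_band` is hypothetical, and it assumes that scores falling between 0.79 and 0.80 belong to the confirm band, which the table does not state explicitly.

```python
AUTO_THRESHOLD = 0.80     # lower bound of the auto band (from the table)
CONFIRM_THRESHOLD = 0.50  # lower bound of the confirm band (from the table)


def classify_band(similarity: float) -> str:
    """Map a similarity score in [0, 1] to 'auto', 'confirm', or 'new'.

    Assumption: anything below 0.80 and at or above 0.50 is 'confirm',
    including scores between 0.79 and 0.80.
    """
    if not 0.0 <= similarity <= 1.0:
        raise ValueError(f"similarity must be in [0, 1], got {similarity}")
    if similarity >= AUTO_THRESHOLD:
        return "auto"
    if similarity >= CONFIRM_THRESHOLD:
        return "confirm"
    return "new"
```

For example, `classify_band(0.62)` returns `"confirm"`, so a field with that score would be linked and queued for review.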

How Resolution Works

Resolution follows a strict three-step order, and no step is ever skipped. First, the system checks for an exact name match against existing registry entries. If no exact match is found, it checks for a cluster member match: whether the field name matches any synonym in an existing semantic cluster. Finally, it computes embedding similarity to find conceptually similar fields, and the resulting score determines which band the field falls into. This graduated approach tries fast, deterministic matches before falling back to more expensive similarity comparisons.
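The matching order described above can be sketched as follows. Everything here is illustrative: `RegistryEntry`, `resolve`, and the in-memory registry are hypothetical, cosine similarity is an assumed metric (the page does not name one), and treating exact and cluster matches as a score of 1.0 is also an assumption.

```python
from __future__ import annotations

import math
from dataclasses import dataclass, field


@dataclass
class RegistryEntry:
    """Hypothetical registry entry: canonical name, cluster synonyms, embedding."""
    name: str
    synonyms: set[str] = field(default_factory=set)
    embedding: list[float] = field(default_factory=list)


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def resolve(name: str, embedding: list[float], registry: list[RegistryEntry]):
    """Return (best entry or None, step that matched, similarity score)."""
    # Step 1: exact name match against canonical names.
    for entry in registry:
        if entry.name == name:
            return entry, "exact", 1.0
    # Step 2: cluster member match against known synonyms.
    for entry in registry:
        if name in entry.synonyms:
            return entry, "cluster", 1.0
    # Step 3: embedding similarity; the caller maps the score to a band.
    best, best_score = None, 0.0
    for entry in registry:
        score = cosine(embedding, entry.embedding)
        if score > best_score:
            best, best_score = entry, score
    return best, "embedding", best_score
```

The embedding comparison only runs when both cheaper checks fail, which matches the "deterministic first" ordering in the text.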

Resolution runs concurrently across documents. Each document's fields are resolved in an isolated transaction to prevent lock contention. Occurrence counts are updated atomically in the same SQL transaction using upserts with deadlock retry logic. This keeps the registry eventually consistent without blocking concurrent ingestion, even when hundreds of documents are being processed simultaneously.

After resolution completes, the platform evaluates tier promotions and regenerates affected schemas in a fixed chain: resolve, then promote, then regenerate. This chain ensures that newly promoted fields immediately appear in auto-generated schemas. Resolution also feeds into the job pipeline. During Phase 1 of a job run, the system uses a 3-tier lookup cascade (string normalization, token fuzzy matching, then AI fallback) to fill 60-80% of cells without a full LLM call, which lowers extraction cost for those cells.
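The 3-tier lookup cascade can be illustrated with a short sketch. The names `lookup_cell`, `normalize`, and `token_overlap` are hypothetical, Jaccard token overlap stands in for whatever fuzzy matcher the platform uses, and the 0.5 cutoff is an invented value for illustration only.

```python
import re
from typing import Callable

FUZZY_THRESHOLD = 0.5  # hypothetical cutoff, not a documented value


def normalize(name: str) -> str:
    """Lowercase and collapse any run of non-alphanumerics to one space."""
    return re.sub(r"[^a-z0-9]+", " ", name.lower()).strip()


def token_overlap(a: str, b: str) -> float:
    """Jaccard overlap between the normalized token sets of two names."""
    ta, tb = set(normalize(a).split()), set(normalize(b).split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def lookup_cell(column: str, extracted: dict[str, str],
                ai_fallback: Callable[[str], str]) -> tuple[str, str]:
    """Fill one cell; return (value, tier that produced it)."""
    target = normalize(column)
    # Tier 1: string normalization, then exact comparison.
    for name, value in extracted.items():
        if normalize(name) == target:
            return value, "normalized"
    # Tier 2: token fuzzy matching, taking the best-scoring candidate.
    best_name, best_score = None, 0.0
    for name in extracted:
        score = token_overlap(column, name)
        if score > best_score:
            best_name, best_score = name, score
    if best_name is not None and best_score >= FUZZY_THRESHOLD:
        return extracted[best_name], "fuzzy"
    # Tier 3: AI fallback, the only tier that would involve a model call.
    return ai_fallback(column), "ai_fallback"
```

Only cells that fall through both cheap tiers reach the fallback, which is where the cost saving described above would come from.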

Pending confirmations from the confirm band appear in Resolution > Pending Confirmations. Accept to merge into an existing cluster, or reject to create a new field.

Running Resolution via API

Batch resolution can be triggered programmatically through the REST API. This is useful for teams that want to run resolution on a schedule or immediately after a large document ingestion. The resolution endpoint starts an asynchronous run that processes all unresolved field occurrences against the registry. You can check the resolution status endpoint to monitor how many fields remain unresolved and track progress over time.

Trigger batch resolution:

curl -X POST https://api.talonic.com/v1/schemas/resolution/run \
  -H "Authorization: Bearer $TALONIC_API_KEY"

Check resolution status:

curl https://api.talonic.com/v1/schemas/resolution/status \
  -H "Authorization: Bearer $TALONIC_API_KEY"

Response:

{
  "unresolved_count": 14,
  "total_occurrences": 1847,
  "pending_confirmations": 6,
  "last_run_at": "2026-05-07T09:00:00Z"
}
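To track progress from a script, the status response can be fetched and reduced to a couple of numbers. This is a sketch under assumptions: the helper names are hypothetical, the endpoint and field names are taken from the example above, and it assumes `unresolved_count` is a subset of `total_occurrences`, which the response format suggests but does not confirm.

```python
import json
import urllib.request

STATUS_URL = "https://api.talonic.com/v1/schemas/resolution/status"


def fetch_status(api_key: str) -> dict:
    """GET the resolution status using the same bearer header as the curl call."""
    request = urllib.request.Request(
        STATUS_URL, headers={"Authorization": f"Bearer {api_key}"}
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def summarize_status(status: dict) -> dict:
    """Derive the resolved fraction and whether manual review is waiting."""
    total = status["total_occurrences"]
    unresolved = status["unresolved_count"]
    # Assumption: unresolved occurrences are counted within the total.
    resolved_fraction = (total - unresolved) / total if total else 1.0
    return {
        "resolved_fraction": resolved_fraction,
        "needs_review": status["pending_confirmations"] > 0,
    }
```

A scheduled job could call `summarize_status(fetch_status(os.environ["TALONIC_API_KEY"]))` after each run and alert when `needs_review` is true.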

For best results, process pending confirmations promptly after each resolution run. Unconfirmed fields in the confirm band (0.50-0.79 similarity) remain in a pending state that can affect downstream extraction accuracy. Confirming correct matches strengthens the cluster and improves future resolution, while rejecting incorrect matches prevents bad data from propagating through the knowledge graph. Teams that review confirmations weekly typically see their auto-band match rate increase steadily over the first few months of platform usage.

The resolution system operates concurrently across documents with strict isolation guarantees. Each document's fields are resolved in an independent transaction, and occurrence counts are updated atomically using SQL upserts with a 3-attempt deadlock retry mechanism. This design means resolution can run on hundreds of documents simultaneously without lock contention or data inconsistency. The system is eventually consistent — all occurrence counts converge to the correct values even under high concurrent load.
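The upsert-with-retry pattern can be sketched with Python's built-in `sqlite3` module. This is not the platform's code: the table and column names are hypothetical, SQLite stands in for whatever database is actually used, and SQLite's `OperationalError` (raised on lock contention) stands in for a deadlock error. Only the three-attempt limit comes from the text.

```python
import sqlite3
import time

MAX_ATTEMPTS = 3  # matches the 3-attempt retry described above

# Hypothetical schema, used only for this illustration.
SCHEMA = """
CREATE TABLE IF NOT EXISTS field_occurrences (
    field_name TEXT PRIMARY KEY,
    occurrence_count INTEGER NOT NULL
)
"""

# Insert a new row with count 1, or increment the existing row's count.
UPSERT = """
INSERT INTO field_occurrences (field_name, occurrence_count)
VALUES (?, 1)
ON CONFLICT(field_name) DO UPDATE SET occurrence_count = occurrence_count + 1
"""


def record_occurrences(conn: sqlite3.Connection, field_names: list[str]) -> int:
    """Upsert one document's fields in a single transaction; return attempts used."""
    # A consistent row order reduces the chance of lock-order deadlocks.
    rows = [(name,) for name in sorted(field_names)]
    for attempt in range(1, MAX_ATTEMPTS + 1):
        try:
            with conn:  # commits on success, rolls back on exception
                conn.executemany(UPSERT, rows)
            return attempt
        except sqlite3.OperationalError:
            if attempt == MAX_ATTEMPTS:
                raise
            time.sleep(0.05 * attempt)  # brief linear backoff before retrying
    raise AssertionError("unreachable")
```

Because each document's fields are written in their own transaction, a failed attempt rolls back cleanly and the retry starts from a consistent state.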

After each resolution batch, a fixed chain of operations runs automatically: first tier promotion evaluation, then affected schema regeneration, and finally cross-schema view updates. This chain ensures that newly promoted fields immediately appear in auto-generated schemas and that the cross-schema harmonization view stays current. The chain is never interrupted — if promotion detects new Tier 2 fields, the downstream regeneration and view update steps run as part of the same workflow.

Frequently asked questions

How does field resolution work in Talonic?
Each extracted field is matched against the registry in three steps, in strict order: exact name match, cluster member match, then semantic embedding similarity. Results fall into one of three bands: auto (>=0.80, auto-linked), confirm (0.50-0.79, flagged for review), or new (<0.50, creates a new Tier 3 field). No step in the matching order is ever skipped.
Where can I review pending field confirmations?
Navigate to Resolution > Pending Confirmations to review fields in the confirm band. Accept to merge the field into an existing cluster, or reject to create a new independent field. Processing confirmations promptly improves resolution accuracy for future documents.
What happens after resolution completes?
After resolution, the platform evaluates tier promotions and regenerates affected schemas in a fixed chain: resolve, then promote, then regenerate. This ensures that newly promoted fields immediately appear in auto-generated schemas. The chain is atomic — it never breaks midway.
How does resolution reduce extraction cost during job runs?
During job runs, the system uses a 3-tier lookup cascade (string normalization, token fuzzy matching, then AI fallback) to fill 60-80% of cells without a full LLM call. Fields that are well established in the registry with high occurrence counts are the most likely to resolve via lookup, so Tier 1 and Tier 2 fields rarely need a full LLM call.