Field Resolution
When a document is processed, each extracted field is resolved against the registry using a three-band matching model. The bands determine whether a match is accepted automatically, flagged for confirmation, or treated as a new field. Resolution is the core mechanism that turns raw, document-specific field names into canonical registry entries — building a unified knowledge graph across all your documents.
| Band | Similarity | Behavior |
|---|---|---|
| Auto band | ≥ 0.80 similarity | High-confidence match. The field is linked to the existing registry entry and its occurrence count is incremented. |
| Confirm band | 0.50 – 0.79 | Candidate match. The field is linked but flagged for manual review in the reconciliation queue. |
| New band | < 0.50 | No match found. A new Tier 3 field and cluster are created in the registry. |
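The thresholds in the table can be read as a simple classification function. The following is a minimal sketch, not the platform's implementation: the names `Band` and `classify_band` are hypothetical, and treating scores between 0.79 and 0.80 as part of the confirm band is an assumption, since the table does not say how unrounded scores are handled.

```python
from enum import Enum

# Thresholds taken from the table above.
AUTO_THRESHOLD = 0.80
CONFIRM_THRESHOLD = 0.50


class Band(Enum):
    """Hypothetical labels for the three bands."""
    AUTO = "auto"        # linked automatically, occurrence count incremented
    CONFIRM = "confirm"  # linked, but flagged for manual review
    NEW = "new"          # new Tier 3 field and cluster are created


def classify_band(similarity: float) -> Band:
    """Map a similarity score in [0, 1] to a band."""
    if not 0.0 <= similarity <= 1.0:
        raise ValueError(f"similarity out of range: {similarity}")
    if similarity >= AUTO_THRESHOLD:
        return Band.AUTO
    # Assumption: anything below 0.80 and at or above 0.50 is "confirm",
    # including unrounded values such as 0.795.
    if similarity >= CONFIRM_THRESHOLD:
        return Band.CONFIRM
    return Band.NEW
```

For example, `classify_band(0.62)` returns `Band.CONFIRM`, which corresponds to a field that is linked but placed in the reconciliation queue.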
How Resolution Works
Resolution follows a strict three-step order, and no step is ever skipped. First, the system checks for an exact name match against existing registry entries. If no exact match is found, it checks for a cluster member match, that is, whether the field name matches any synonym in an existing semantic cluster. Finally, it computes embedding similarity to find conceptually similar fields, and the resulting score determines the band. This graduated approach prioritizes fast, deterministic matches before falling back to more expensive similarity comparisons.
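The matching order described above can be sketched as follows. This is an illustrative sketch under stated assumptions: `RegistryEntry`, `cosine`, and `resolve` are hypothetical names, embeddings are plain tuples of floats standing in for real model output, and assigning a score of 1.0 to exact and cluster matches is an assumption rather than documented behavior.

```python
import math
from dataclasses import dataclass, field


@dataclass
class RegistryEntry:
    """Hypothetical registry entry: canonical name, cluster synonyms, embedding."""
    name: str
    synonyms: set[str] = field(default_factory=set)
    embedding: tuple[float, ...] = ()


def cosine(a: tuple[float, ...], b: tuple[float, ...]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def resolve(name: str, embedding: tuple[float, ...], registry: list[RegistryEntry]):
    """Return (best entry or None, similarity, step that produced the match)."""
    # Step 1: exact name match against registry entries.
    for entry in registry:
        if entry.name == name:
            return entry, 1.0, "exact"
    # Step 2: cluster member match against known synonyms.
    for entry in registry:
        if name in entry.synonyms:
            return entry, 1.0, "cluster"
    # Step 3: embedding similarity; the score is then mapped to a band.
    best, best_score = None, 0.0
    for entry in registry:
        score = cosine(embedding, entry.embedding)
        if score > best_score:
            best, best_score = entry, score
    return best, best_score, "embedding"
```

The cheaper checks run first so that the embedding comparison is only reached when neither deterministic lookup succeeds.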
Resolution runs concurrently across documents. Each document's fields are resolved in an isolated transaction to prevent lock contention. Occurrence counts are updated atomically in the same SQL transaction using upserts with deadlock retry logic. This keeps the registry eventually consistent without blocking concurrent ingestion, even when hundreds of documents are being processed simultaneously.
After resolution completes, the platform evaluates tier promotions and regenerates affected schemas in a fixed chain: resolve, then promote, then regenerate. This chain ensures that newly promoted fields immediately appear in auto-generated schemas. The resolution process also feeds into the job pipeline — during Phase 1 of a job run, the system uses a 3-tier lookup cascade (string normalization, token fuzzy matching, then AI fallback) to fill 60-80% of cells without a full LLM call, dramatically reducing cost.
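The Phase 1 lookup cascade can be illustrated with a small sketch. The text names the three tiers but not their algorithms, so everything here is an assumption: `normalize`, `lookup`, the Jaccard token overlap used for the fuzzy tier, the 0.5 fuzzy threshold, and the `ai_fallback` callable are all hypothetical stand-ins.

```python
import re
from typing import Callable


def normalize(name: str) -> str:
    """Tier 1 helper: lowercase and collapse non-alphanumerics to single spaces."""
    return re.sub(r"[^a-z0-9]+", " ", name.lower()).strip()


def lookup(
    name: str,
    known: dict[str, str],              # normalized name -> canonical field
    ai_fallback: Callable[[str], str],  # hypothetical stand-in for the AI tier
    fuzzy_threshold: float = 0.5,       # assumed cutoff, not from the source
) -> tuple[str, str]:
    """Return (canonical field, tier used), trying the cheapest tier first."""
    key = normalize(name)
    # Tier 1: string normalization followed by a direct dictionary hit.
    if key in known:
        return known[key], "normalized"
    # Tier 2: token fuzzy matching, here Jaccard overlap of word sets.
    tokens = set(key.split())
    best_value, best_score = None, 0.0
    for candidate, value in known.items():
        candidate_tokens = set(candidate.split())
        union = tokens | candidate_tokens
        score = len(tokens & candidate_tokens) / len(union) if union else 0.0
        if score > best_score:
            best_value, best_score = value, score
    if best_value is not None and best_score >= fuzzy_threshold:
        return best_value, "fuzzy"
    # Tier 3: only now fall back to the expensive AI call.
    return ai_fallback(name), "ai"
```

With `known = {"invoice number": "invoice_number"}`, the input `"Invoice-Number"` resolves in tier 1, `"invoice number ref"` resolves in tier 2 with an overlap of 2/3, and an unrelated name such as `"vendor"` reaches the fallback.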
Running Resolution via API
Batch resolution can be triggered programmatically through the REST API. This is useful for teams that want to run resolution on a schedule or immediately after a large document ingestion. The resolution endpoint starts an asynchronous run that processes all unresolved field occurrences against the registry. You can check the resolution status endpoint to monitor how many fields remain unresolved and track progress over time.
```bash
curl -X POST https://api.talonic.com/v1/schemas/resolution/run \
  -H "Authorization: Bearer $TALONIC_API_KEY"
```

```bash
curl https://api.talonic.com/v1/schemas/resolution/status \
  -H "Authorization: Bearer $TALONIC_API_KEY"
```

```json
{
  "unresolved_count": 14,
  "total_occurrences": 1847,
  "pending_confirmations": 6,
  "last_run_at": "2026-05-07T09:00:00Z"
}
```

For best results, process pending confirmations promptly after each resolution run. Unconfirmed fields in the confirm band (0.50-0.79 similarity) remain in a pending state that can affect downstream extraction accuracy. Confirming correct matches strengthens the cluster and improves future resolution, while rejecting incorrect matches prevents bad data from propagating through the knowledge graph. Teams that review confirmations weekly typically see their auto-band match rate increase steadily over the first few months of platform usage.
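For scheduled runs, the status endpoint can be polled from a script. The sketch below uses only the Python standard library and the status URL and response fields shown in this section. The helper names `fetch_status` and `wait_for_resolution` are hypothetical, and treating `unresolved_count == 0` as the signal that a run has finished is an assumption, since the response shown here has no explicit completion flag.

```python
import json
import os
import time
import urllib.request
from typing import Callable

STATUS_URL = "https://api.talonic.com/v1/schemas/resolution/status"


def fetch_status() -> dict:
    """GET the status endpoint using the bearer token in TALONIC_API_KEY."""
    request = urllib.request.Request(
        STATUS_URL,
        headers={"Authorization": f"Bearer {os.environ['TALONIC_API_KEY']}"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def wait_for_resolution(
    fetch: Callable[[], dict] = fetch_status,
    interval_s: float = 10.0,
    max_polls: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until unresolved_count is 0 (assumed "done") or max_polls is reached."""
    status = fetch()
    for _ in range(max_polls - 1):
        if status["unresolved_count"] == 0:
            break
        sleep(interval_s)
        status = fetch()
    return status
```

The returned dictionary also carries `pending_confirmations`, which a script could use to notify reviewers when the reconciliation queue is not empty.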
The resolution system operates concurrently across documents with strict isolation guarantees. Each document's fields are resolved in an independent transaction, and occurrence counts are updated atomically using SQL upserts with a 3-attempt deadlock retry mechanism. This design means resolution can run on hundreds of documents simultaneously without lock contention or data inconsistency. The system is eventually consistent — all occurrence counts converge to the correct values even under high concurrent load.
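The upsert-with-retry pattern described above can be sketched with Python's built-in `sqlite3` module. This is an illustration under assumptions, not the platform's code: the table and column names are hypothetical, the production database is not named in this section, and SQLite signals lock conflicts through `sqlite3.OperationalError` rather than a true deadlock error, so that exception is used as a stand-in. The `ON CONFLICT` upsert syntax requires SQLite 3.24 or newer.

```python
import sqlite3
import time

MAX_ATTEMPTS = 3  # the 3-attempt retry mentioned in the text

# Hypothetical schema for illustration only.
SCHEMA = """
CREATE TABLE IF NOT EXISTS field_occurrences (
    field_name TEXT PRIMARY KEY,
    occurrence_count INTEGER NOT NULL
)
"""

# Insert the field with a count of 1, or increment the existing row.
UPSERT = """
INSERT INTO field_occurrences (field_name, occurrence_count)
VALUES (?, 1)
ON CONFLICT (field_name)
DO UPDATE SET occurrence_count = occurrence_count + 1
"""


def record_occurrences(conn: sqlite3.Connection, field_names: list[str],
                       backoff_s: float = 0.05) -> None:
    """Apply all upserts for one document in a single transaction, with retries."""
    for attempt in range(1, MAX_ATTEMPTS + 1):
        try:
            # The connection context manager commits on success and
            # rolls back if an exception is raised inside the block.
            with conn:
                for name in field_names:
                    conn.execute(UPSERT, (name,))
            return
        except sqlite3.OperationalError:
            if attempt == MAX_ATTEMPTS:
                raise
            time.sleep(backoff_s * attempt)  # brief backoff before retrying
```

Because each document's upserts commit or roll back together, a retried transaction never leaves a partial set of increments behind.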
After each resolution batch, a fixed chain of operations runs automatically: first tier promotion evaluation, then affected schema regeneration, and finally cross-schema view updates. This chain ensures that newly promoted fields immediately appear in auto-generated schemas and that the cross-schema harmonization view stays current. The chain is never interrupted — if promotion detects new Tier 2 fields, the downstream regeneration and view update steps run as part of the same workflow.