
Evidence Validation

The evidence validation engine applies rule-based checks to extracted field values. Results appear as colored validation badges in the Evidence tab of the case detail page.

Structural validators (S1–S7)

| Validator | Type | Description |
|---|---|---|
| S1 | rule | Free-text spillover: field value contains unstructured text that leaked from adjacent content. |
| S2 | rule | Empty value: required field is blank or whitespace-only. |
| S3 | rule | Email/URL misclassification: value looks like an email or URL but is in a non-email/URL field. |
| S4 | rule | Name in URL field: a person or company name extracted into a URL-typed field. |
| S5 | rule | Alpha in numeric field: alphabetic characters in a field expected to be numeric. |
| S6 | rule | Cross-field duplicate: identical value appears in multiple unrelated fields on the same document. |
| S7 | rule | Checksum validation: Luhn (credit cards), ABA (routing numbers), IBAN, and ISBN checksums verified via a parameterized factory. |
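To make a few of the structural rules concrete, here is a minimal sketch of how S2, S3, S5, and S6 could be expressed. The regular expressions, function names, and the `field_type` strings (`"email"`, `"url"`) are hypothetical illustrations, not the engine's actual patterns; each function returns `True` when the rule should fail.

```python
import re
from collections import defaultdict

# Hypothetical heuristics for illustration; the engine's actual patterns are not documented.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[A-Za-z]{2,}$")
URL_RE = re.compile(r"^(?:https?://|www\.)\S+$", re.IGNORECASE)


def s2_empty(value: str) -> bool:
    """S2: True (fail) when a required field is blank or whitespace-only."""
    return value.strip() == ""


def s3_email_url_misclassified(value: str, field_type: str) -> bool:
    """S3: True (fail) when the value looks like an email or URL but the field is typed otherwise."""
    v = value.strip()
    looks_like_contact = bool(EMAIL_RE.match(v) or URL_RE.match(v))
    return looks_like_contact and field_type not in {"email", "url"}


def s5_alpha_in_numeric(value: str) -> bool:
    """S5: True (fail) when a numeric field contains alphabetic characters."""
    return any(ch.isalpha() for ch in value)


def s6_cross_field_duplicates(fields: dict[str, str]) -> dict[str, list[str]]:
    """S6: map each non-empty value found in more than one field to the field keys holding it.

    This sketch treats every field pair as unrelated; a real rule would skip
    fields that are expected to share a value.
    """
    seen: dict[str, list[str]] = defaultdict(list)
    for key, value in fields.items():
        v = value.strip()
        if v:
            seen[v].append(key)
    return {v: keys for v, keys in seen.items() if len(keys) > 1}
```

Note that S5 only looks for letters: a value such as `$12,450.00` passes it, while `12,450.00 USD` fails.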

Domain packs extend validation with industry-specific rules. The freight domain pack includes DOT number state detection and MC number validation. Additional packs can be added to the domain-packs/ directory without modifying the core engine.

Validation runs automatically after extraction and linking complete. Each field value is checked against every applicable validator — a single field can trigger multiple rules. Results are displayed as colored badges in the Evidence tab: green for pass, red for fail, and amber for warnings. You can filter by status, document, category, or free-text search.
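The "every applicable validator" behavior and the badge colors can be sketched as a small runner. This is an illustration under assumptions: the `Validator` shape, the `field_types` applicability filter, and the two toy rules are hypothetical; only the pass/green, fail/red, warning/amber mapping comes from this page.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Optional

# Badge colors as described for the Evidence tab.
BADGE_COLORS = {"pass": "green", "fail": "red", "warning": "amber"}


@dataclass(frozen=True)
class Validator:
    code: str                                      # e.g. "S2"
    check: Callable[[str], str]                    # returns "pass", "fail", or "warning"
    field_types: Optional[FrozenSet[str]] = None   # None means "applies to every field type"

    def applies_to(self, field_type: str) -> bool:
        return self.field_types is None or field_type in self.field_types


def validate_field(value: str, field_type: str, validators: list[Validator]) -> list[dict]:
    """Run every applicable validator; one field can produce several results."""
    results = []
    for v in validators:
        if v.applies_to(field_type):
            status = v.check(value)
            results.append({"validator": v.code, "status": status, "badge": BADGE_COLORS[status]})
    return results


# Two toy rules for illustration (hypothetical, not the engine's real logic).
VALIDATORS = [
    Validator("S2", lambda s: "fail" if not s.strip() else "pass"),
    Validator(
        "S5",
        lambda s: "warning" if any(c.isalpha() for c in s) else "pass",
        frozenset({"number", "currency"}),
    ),
]
```

For example, `validate_field("12,450.00 USD", "currency", VALIDATORS)` yields a green S2 result and an amber S5 result for the same field, while a `"text"` field only runs S2.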

The checksum validator (S7) uses a parameterized factory pattern — it accepts a checksum algorithm name and applies the corresponding verification logic. Supported algorithms include Luhn (credit card numbers), ABA (bank routing numbers), IBAN (international bank accounts), and ISBN (book identifiers). For best results, ensure your schema fields are typed correctly so the engine knows which checksum to apply.
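A parameterized factory of this kind can be sketched as a lookup from algorithm name to verification function. The four checks below follow the standard definitions (Luhn mod 10, ABA 3-7-1 weights mod 10, IBAN mod 97 per ISO 13616, ISBN-10 mod 11 and ISBN-13 mod 10); the function names, the input normalization (stripping spaces and hyphens), and the `ValueError` for unknown names are assumptions of this sketch, not the engine's API.

```python
import re
from typing import Callable, Optional


def _digits(value: str) -> Optional[str]:
    """Drop spaces and hyphens; return None if anything other than ASCII digits remains."""
    s = value.replace(" ", "").replace("-", "")
    return s if re.fullmatch(r"[0-9]+", s) else None


def _luhn(value: str) -> bool:
    s = _digits(value)
    if s is None or len(s) < 2:
        return False
    total = 0
    # Double every second digit, counting leftward from the check digit.
    for i, ch in enumerate(reversed(s)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def _aba(value: str) -> bool:
    s = _digits(value)
    if s is None or len(s) != 9:
        return False
    d = [int(ch) for ch in s]
    # ABA routing numbers use repeating weights 3, 7, 1.
    total = 3 * (d[0] + d[3] + d[6]) + 7 * (d[1] + d[4] + d[7]) + (d[2] + d[5] + d[8])
    return total % 10 == 0


def _iban(value: str) -> bool:
    s = value.replace(" ", "").upper()
    if not re.fullmatch(r"[A-Z]{2}[0-9]{2}[A-Z0-9]{11,30}", s):
        return False
    # Move the first four characters to the end, map A=10 ... Z=35, then take mod 97.
    rearranged = s[4:] + s[:4]
    numeric = "".join(str(int(ch, 36)) for ch in rearranged)
    return int(numeric) % 97 == 1


def _isbn(value: str) -> bool:
    s = value.replace(" ", "").replace("-", "").upper()
    if re.fullmatch(r"[0-9]{9}[0-9X]", s):  # ISBN-10: weights 10..1, mod 11, X = 10
        total = sum((10 - i) * (10 if ch == "X" else int(ch)) for i, ch in enumerate(s))
        return total % 11 == 0
    if re.fullmatch(r"[0-9]{13}", s):  # ISBN-13: weights 1, 3, 1, 3, ..., mod 10
        total = sum((3 if i % 2 else 1) * int(ch) for i, ch in enumerate(s))
        return total % 10 == 0
    return False


_ALGORITHMS: dict[str, Callable[[str], bool]] = {
    "luhn": _luhn, "aba": _aba, "iban": _iban, "isbn": _isbn,
}


def make_checksum_validator(algorithm: str) -> Callable[[str], bool]:
    """Factory: return the verification function for a named checksum algorithm."""
    try:
        return _ALGORITHMS[algorithm.lower()]
    except KeyError:
        raise ValueError(f"unsupported checksum algorithm: {algorithm!r}") from None
```

With this shape, adding an algorithm means adding one function and one dictionary entry; callers only pass a name such as `"luhn"`, which is why correct field typing in the schema matters.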

In a typical workflow, validation has already run by the time you open a case, because it starts automatically after extraction and linking. Navigate to the case, open the Evidence tab, and you will see colored badges next to each field value. Red badges indicate failures that need attention: click a badge to see which validator fired and what format or value was expected. Use the filter bar to narrow results by status (pass, fail, or warning), by document, by category, or by free-text search, and use the collapsible group-by-document sections to review one document at a time within a case.
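The filter bar and group-by-document view can be approximated client-side over a list of result dictionaries. This is a sketch, not the UI's implementation: the `category` key is an assumption (the sample API response on this page does not show one), and free-text search here is a case-insensitive substring match over `field_key` and `message`.

```python
from collections import defaultdict
from typing import Optional


def filter_evidence(
    results: list[dict],
    status: Optional[str] = None,
    document_id: Optional[str] = None,
    category: Optional[str] = None,
    text: Optional[str] = None,
) -> list[dict]:
    """Keep results that match every filter supplied; None means 'do not filter'."""
    needle = text.lower() if text else None
    kept = []
    for r in results:
        if status is not None and r.get("status") != status:
            continue
        if document_id is not None and r.get("document_id") != document_id:
            continue
        # "category" is an assumed key; the documented response does not show one.
        if category is not None and r.get("category") != category:
            continue
        haystack = f"{r.get('field_key', '')} {r.get('message', '')}".lower()
        if needle is not None and needle not in haystack:
            continue
        kept.append(r)
    return kept


def group_by_document(results: list[dict]) -> dict[str, list[dict]]:
    """Bucket results per document, mirroring the collapsible sections in the Evidence tab."""
    groups: dict[str, list[dict]] = defaultdict(list)
    for r in results:
        groups[r["document_id"]].append(r)
    return dict(groups)
```

Filters combine with AND semantics in this sketch, so `filter_evidence(results, status="fail", document_id="doc_001")` returns only failures on that one document.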

  • S1 — Free-text spillover: unstructured text leaked from adjacent content
  • S2 — Empty value: required field is blank or whitespace-only
  • S3 — Email/URL misclassification: value looks like an email or URL in the wrong field type
  • S4 — Name in URL field: person or company name extracted into a URL-typed field
  • S5 — Alpha in numeric field: alphabetic characters in a numeric-only field
  • S6 — Cross-field duplicate: identical value in multiple unrelated fields on the same document
  • S7 — Checksum validation: Luhn, ABA, IBAN, ISBN verification via parameterized factory
  • Domain packs: industry-specific rules (e.g., freight: DOT numbers, MC numbers)
View evidence validation results for a case
curl -s "https://api.talonic.com/v1/cases/case_abc/evidence" \
  -H "Authorization: Bearer $TALONIC_API_KEY"

# Response:
# {
#   "evidence": [
#     {
#       "document_id": "doc_001",
#       "field_key": "credit_card_number",
#       "validator": "S7",
#       "algorithm": "luhn",
#       "status": "fail",
#       "message": "Luhn checksum failed for value 4111-1111-1111-1112",
#       "severity": "error"
#     },
#     {
#       "document_id": "doc_001",
#       "field_key": "total_amount",
#       "validator": "S5",
#       "status": "fail",
#       "message": "Alphabetic characters found in numeric field: 'USD 12,450.00'",
#       "severity": "warning"
#     }
#   ]
# }
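The same request can be made from Python with only the standard library. This sketch mirrors the curl call above (same URL and bearer header); the helper names are hypothetical, and it assumes the response body has the `evidence` array shown in the sample.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.talonic.com/v1"


def build_evidence_request(case_id: str, api_key: str) -> urllib.request.Request:
    """GET /cases/{case_id}/evidence with bearer-token auth, as in the curl example."""
    url = f"{API_BASE}/cases/{urllib.parse.quote(case_id, safe='')}/evidence"
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {api_key}"})


def fetch_evidence(case_id: str, api_key: str, timeout: float = 30.0) -> list[dict]:
    """Send the request and return the "evidence" array from the JSON body."""
    with urllib.request.urlopen(build_evidence_request(case_id, api_key), timeout=timeout) as resp:
        return json.load(resp)["evidence"]


def blocking_errors(evidence: list[dict]) -> list[dict]:
    """Failures with severity "error", such as the Luhn failure in the sample response."""
    return [e for e in evidence if e.get("status") == "fail" and e.get("severity") == "error"]
```

A call such as `blocking_errors(fetch_evidence("case_abc", os.environ["TALONIC_API_KEY"]))` would return only the S7 entry from the sample response, leaving the S5 warning out.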

The evidence validation engine is extensible through domain packs, which add industry-specific rules without modifying the core validators. Each domain pack is a self-contained module that registers its validators during application startup. The freight domain pack, for example, validates DOT numbers against state-issued format rules and verifies MC (Motor Carrier) numbers. Additional packs for financial services, healthcare, and legal domains can be added by creating a new module in the domain-packs directory with validator implementations that follow the standard interface. This plug-in architecture means the validation engine grows with your industry needs without accumulating complexity in the core rule set.
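The plug-in idea can be sketched as a registry that packs write to when they load and that the core engine only reads from. Everything here is hypothetical: the `register` decorator, the `"mc_number"` field type, and especially the MC number pattern, which is an assumed format for illustration and not the freight pack's real rule. DOT state detection is left out because its rules are not described on this page.

```python
import re
from typing import Callable

FieldCheck = Callable[[str], bool]

# Hypothetical registry: pack name -> {field type -> check}. The real plug-in interface may differ.
_REGISTRY: dict[str, dict[str, FieldCheck]] = {}


def register(pack: str, field_type: str) -> Callable[[FieldCheck], FieldCheck]:
    """Decorator a domain pack uses to add a rule; it runs when the pack module is imported."""
    def decorator(check: FieldCheck) -> FieldCheck:
        _REGISTRY.setdefault(pack, {})[field_type] = check
        return check
    return decorator


# --- would live in a hypothetical freight module under domain-packs/ ---
@register("freight", "mc_number")
def mc_number_format(value: str) -> bool:
    # Assumed format for illustration only: optional "MC" or "MC-" prefix plus 1-7 digits.
    return re.fullmatch(r"(?:MC-?)?[0-9]{1,7}", value.strip().upper()) is not None


# --- core engine: asks the registry and never names a specific pack ---
def pack_checks_for(field_type: str) -> list[FieldCheck]:
    return [checks[field_type] for checks in _REGISTRY.values() if field_type in checks]
```

Because the core only calls `pack_checks_for`, a new pack adds rules by being imported at startup, without any edit to the engine code.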

Evidence validation results are stored separately from extraction and linking data. This means you can re-run validation independently without re-extracting documents. Results are keyed by (document_id, entity_id, field_key) for precise field-level tracking.
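Keeping results in their own store, keyed by `(document_id, entity_id, field_key)`, is what makes an independent re-run cheap. The class below is a hypothetical sketch of that separation: it reads already-extracted values, never modifies them, and replaces the previous result set wholesale. The string labels such as `"S5:fail"` are illustrative only.

```python
from typing import Callable

# (document_id, entity_id, field_key), the key described above.
Key = tuple[str, str, str]


class EvidenceStore:
    """Hypothetical sketch: validation results kept apart from extracted values."""

    def __init__(self) -> None:
        self._results: dict[Key, list[str]] = {}

    def rerun(self, extracted: dict[Key, str], validate: Callable[[str], list[str]]) -> None:
        # Validate the values already extracted; nothing is re-extracted or mutated.
        # The new result set replaces the previous run wholesale.
        self._results = {key: validate(value) for key, value in extracted.items()}

    def results_for(self, document_id: str, entity_id: str, field_key: str) -> list[str]:
        return self._results.get((document_id, entity_id, field_key), [])
```

Re-running with a different `validate` function (for example, after a new domain pack is registered) changes only the stored results; the `extracted` mapping is untouched.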

Frequently asked questions

What is evidence validation?
A rule-based engine that checks extracted field values for structural errors (spillover, misclassification, duplicates) and checksum validity (Luhn, IBAN, etc.). Results appear as colored badges in the case Evidence tab.
What are domain packs?
Domain packs add industry-specific validation rules. For example, the freight domain pack validates DOT numbers and MC numbers. New packs can be added without modifying the core engine.
How are evidence validation results displayed?
Results appear as colored badges in the Evidence tab of the case detail page. Green indicates pass, red indicates fail, and amber indicates a warning. Use the filter bar to narrow results by status, document, category, or free-text search.
Can I re-run evidence validation without re-extracting documents?
Yes. Evidence validation results are stored separately from extraction data, keyed by (document_id, entity_id, field_key). You can re-run validation independently at any time — for example, after adding a new domain pack or updating validator rules. The results replace the previous validation run without affecting extracted values or linking data.