Format Constraints
Format constraints apply regex-based validation to schema fields. They are evaluated after extraction, in Phase 4 of the pipeline, once all transforms have been applied. The original value is preserved in original_extractions for audit, so you can always see what the AI extracted before the constraint cleared or replaced it.
Mismatch behaviors
| Behavior | Default? | Description |
|---|---|---|
| empty | Yes | If the extracted value does not match the regex, the cell is cleared. The original is preserved for audit. |
| flag | No | The value is kept but flagged with a format_applied indicator, shown as an amber dot in the results grid. |
| constant | No | The value is replaced with a constant you specify (e.g. "INVALID", "N/A"). |
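The three behaviors can be sketched in a few lines of Python. This is an illustration only, not the platform's implementation: the function name `apply_format_constraint`, the keys of the returned dictionary, and the use of Python's `re` module (standing in for whatever regex engine the service runs) are all assumptions made for this example.

```python
import re

def apply_format_constraint(value, pattern, on_mismatch="empty", constant=None):
    """Hypothetical sketch of the three mismatch behaviors.

    Returns the resulting cell value, a flag marker, and the original
    value kept for audit. The key names are illustrative only.
    """
    result = {"value": value, "flagged": False, "original": value}
    # Assumption: the pattern is searched within the value, so anchors (^, $)
    # are needed for a whole-value match. re.ASCII limits \d to 0-9.
    if re.search(pattern, value, flags=re.ASCII):
        return result  # matching values pass through unchanged
    if on_mismatch == "empty":
        result["value"] = ""          # cell is cleared
    elif on_mismatch == "flag":
        result["flagged"] = True      # value kept, marked for review
    elif on_mismatch == "constant":
        result["value"] = constant    # replaced with the sentinel
    else:
        raise ValueError(f"unknown mismatch behavior: {on_mismatch!r}")
    return result

PO_PATTERN = r"^PO-\d{6}$"
print(apply_format_constraint("PO-123456", PO_PATTERN))
print(apply_format_constraint("PO 123456", PO_PATTERN, "flag"))
print(apply_format_constraint("123456", PO_PATTERN, "constant", constant="INVALID"))
```

In every branch the original value stays available alongside the result, which mirrors the audit guarantee described above.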
Define format constraints in the schema field editor. The pattern uses standard regex syntax with support for inline flags like (?i) for case-insensitive matching. The editor provides a live test input so you can verify the pattern against sample values before saving. This immediate feedback loop helps you catch overly strict or overly permissive patterns before they affect real extraction runs.
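To get a similar feedback loop outside the editor, a pattern can be tried against sample values locally before it is saved. The sketch below uses Python's `re` module as a stand-in, so the platform's regex engine may differ in corner cases; the helper name `try_pattern` and the sample values are made up for illustration.

```python
import re

def try_pattern(pattern, samples):
    """Return {sample: True/False} for a quick look at what a pattern accepts."""
    compiled = re.compile(pattern, flags=re.ASCII)  # re.ASCII limits \d to 0-9
    return {s: compiled.search(s) is not None for s in samples}

samples = ["PO-123456", "po-123456", "PO-12345", "PO-1234567", "XPO-123456"]

strict = try_pattern(r"^PO-\d{6}$", samples)
relaxed = try_pattern(r"(?i)^PO-\d{6}$", samples)  # inline case-insensitive flag

for s in samples:
    print(f"{s!r:14} strict={strict[s]!s:5} case-insensitive={relaxed[s]}")
```

Including near misses in the samples (one digit short, one digit long, an extra prefix) is a simple way to spot a pattern that is too permissive.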
Format constraints are especially useful for fields with strict formatting requirements in downstream systems. For example, a purchase order number that must follow the pattern PO-\d{6} or a date that must match \d{4}-\d{2}-\d{2}. By catching format violations at extraction time, you avoid importing malformed data into your ERP, accounting, or analytics systems.
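One thing worth keeping in mind with patterns like these is that a regex checks the shape of a value, not its meaning. The short Python check below (again using `re` as a stand-in for the platform's regex engine, with made-up sample values) shows that \d{4}-\d{2}-\d{2} accepts an impossible date, and that a somewhat tighter pattern narrows the month and day ranges.

```python
import re

LOOSE_DATE = r"^\d{4}-\d{2}-\d{2}$"
# Tighter: month 01-12, day 01-31. It still does not know month lengths.
TIGHT_DATE = r"^\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])$"

def matches(pattern, value):
    """True if the pattern is found in the value (re.ASCII limits \\d to 0-9)."""
    return re.search(pattern, value, flags=re.ASCII) is not None

for value in ["2025-03-15", "2025-13-45", "2025-02-30", "March 15, 2025"]:
    print(value, matches(LOOSE_DATE, value), matches(TIGHT_DATE, value))
```

If the downstream system needs calendar-valid dates, a format constraint narrows the input but may not be sufficient on its own.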
Choose the mismatch behavior based on your data quality requirements. Use empty (the default) when you prefer no data over bad data — the downstream system will see a blank cell. Use flag when you want to review mismatches manually before deciding — flagged cells appear with an amber dot in the results grid. Use constant when your downstream system needs a specific sentinel value like "N/A" or "INVALID" to trigger its own error handling.
curl -X PATCH https://api.talonic.com/v1/schemas/us_def456/fields/fld_po_number \
  -H "Authorization: Bearer $TALONIC_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "constraints": {
      "format": {
        "type": "regex",
        "pattern": "^PO-\\d{6}$",
        "on_format_mismatch": "flag"
      }
    }
  }'
# Values matching PO-123456 pass through unchanged.
# Values like "PO 123456" or "123456" are flagged with an amber dot.
# Original values are always preserved in original_extractions for audit.

# Validate ISO date format with optional time component:
curl -X PATCH https://api.talonic.com/v1/schemas/us_def456/fields/fld_date \
  -H "Authorization: Bearer $TALONIC_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "constraints": {
      "format": {
        "type": "regex",
        "pattern": "^\\d{4}-\\d{2}-\\d{2}(T\\d{2}:\\d{2}:\\d{2})?$",
        "on_format_mismatch": "empty"
      }
    }
  }'
# Values like "2025-03-15" or "2025-03-15T14:30:00" pass.
# Values like "March 15, 2025" are cleared (on_format_mismatch: "empty").

Format constraints are one of the most effective tools for ensuring downstream system compatibility. Many ERP and accounting systems reject records with malformed identifiers, dates outside their expected format, or amounts with unexpected characters. By catching these issues at extraction time with format constraints, you prevent bad data from reaching downstream systems entirely. The three mismatch behaviors give you control over the trade-off: use "empty" when no data is better than bad data, "flag" when you want human review before deciding, and "constant" when your downstream system needs a specific sentinel value to trigger error handling.
Use the (?i) inline flag for case-insensitive matching. Format constraints support standard JavaScript regex syntax, so you can also use character classes, alternation, and lookahead assertions for complex validation patterns.
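As a rough illustration of those constructs, the Python sketch below combines alternation, a lookahead, and character classes in one pattern. The invoice-number format is invented for the example, and Python's `re` is used only as a stand-in; these particular constructs are written the same way in JavaScript regex, but behavior should still be confirmed in the editor's live test input.

```python
import re

# Hypothetical format: prefix INV or CRN (alternation), a dash, then exactly
# 8 characters from A-Z and 0-9 (character class) that must include at least
# one digit (lookahead).
PATTERN = r"^(?:INV|CRN)-(?=[A-Z0-9]*\d)[A-Z0-9]{8}$"

def is_valid(value):
    """True if the value matches the hypothetical invoice-number format."""
    return re.search(PATTERN, value, flags=re.ASCII) is not None

for value in ["INV-AB12CD34", "CRN-00000001", "INV-ABCDEFGH", "PO-AB12CD34", "INV-AB12CD3"]:
    print(value, is_valid(value))
```

The lookahead checks a condition without consuming characters, which lets the length check and the "contains a digit" check apply to the same eight characters.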