Schema Features Reference
Every field in a template supports advanced features beyond the basic name and type. These features control how values are extracted, validated, transformed, and delivered. You can layer features independently — for example, a single field can have a format constraint, a reference table for code lookup, modifiers for post-processing, and an output name remap for delivery. Features compose without conflicts, giving you fine-grained control over every aspect of the extraction and output pipeline.
Field features
| Feature | Type | Description |
|---|---|---|
| Format constraint | regex | Validates extracted values against a regex pattern. Failing values can be emptied, flagged, or replaced with a constant. |
| Modifiers | pipeline | Post-processing transforms applied in order: format (date/number), alias (value mapping), max_length (truncation). |
| Constraints | validation | Rules evaluated after modifiers: required, enum, date-format, length, cross-field expressions. |
| Bypass strategy | skip LLM | Fields that don't need extraction: constant (fixed value), generator (auto-ID), reference (lookup from reference table). |
| Reference table | key-value | Inline lookup table for code mapping (e.g., country name → ISO code). Also supports multi-hop resolution chains. |
| Manual instruction | text | User-written extraction directive. Overrides the AI-synthesized master instruction from the field registry. |
| Capture submoves | array | Ordered execution: match (field matching), compute (calculation), reason (LLM inference). |
| Output name | string | Renamed field in export output. The internal name stays the same. |
When configuring a field, start with the basics (name, type, and registry mapping), then layer on advanced features as needed. For example, add a format constraint to enforce a date pattern, attach a reference table for code lookups, or define capture submoves to control the exact extraction sequence.
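To make the format constraint idea concrete, here is a minimal Python sketch of how mismatch handling could behave. The function name, the mode strings, and the choice of full-string matching are assumptions made for illustration; the service's actual matching semantics may differ.

```python
import re
from typing import Optional, Tuple


def apply_format_constraint(
    value: str,
    pattern: str,
    on_mismatch: str = "flag",
    replacement: str = "",
) -> Tuple[Optional[str], bool]:
    """Check value against pattern and return (resulting value, flagged).

    Assumption: the whole value must match the pattern (re.fullmatch).
    """
    if re.fullmatch(pattern, value):
        return value, False          # value passes, nothing to do
    if on_mismatch == "clear":
        return None, False           # empty the failing value
    if on_mismatch == "flag":
        return value, True           # keep the value, mark it for review
    if on_mismatch == "replace":
        return replacement, False    # substitute a constant
    raise ValueError(f"unknown mismatch behavior: {on_mismatch}")


# Hypothetical date pattern: four-digit year, two-digit month and day.
DATE_PATTERN = r"\d{4}-\d{2}-\d{2}"
checked = apply_format_constraint("31.01.2024", DATE_PATTERN, on_mismatch="flag")
```

In this sketch a flagged value is kept unchanged so a reviewer can still see what was extracted, while the clear mode drops it entirely.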
- Format constraint — Regex validation with configurable mismatch behavior (clear, flag, or replace).
- Modifiers — Post-processing pipeline: format (date/number conversion), alias (value mapping), max_length (truncation).
- Constraints — Validation rules: required, enum, date-format, length, cross-field expressions.
- Bypass strategy — Skip AI extraction: constant value, deterministic ID generator, or reference table lookup.
- Reference table — Key-value pairs for code mapping with a 3-tier lookup cascade (normalization, fuzzy, AI).
- Manual instruction — User-written extraction directive that overrides the AI-synthesized master instruction.
- Capture submoves — Ordered extraction sequence: match (field matching), compute (calculation), reason (LLM inference).
- Output name — Remap the field name in delivery and export output without changing the internal schema name.
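The reference table cascade listed above (normalization, then fuzzy matching, then AI) can be pictured with the following Python sketch. It is an approximation under stated assumptions: the normalization rules, the use of `difflib` for fuzzy matching, the 0.8 cutoff, and the `ai_fallback` callback are all hypothetical stand-ins, not the service's real logic.

```python
import difflib
from typing import Callable, Dict, Optional


def normalize(text: str) -> str:
    """Lowercase, trim, and collapse internal whitespace."""
    return " ".join(text.strip().lower().split())


def lookup(
    value: str,
    table: Dict[str, str],
    ai_fallback: Optional[Callable[[str], Optional[str]]] = None,
    cutoff: float = 0.8,
) -> Optional[str]:
    """Resolve value through three tiers, stopping at the first hit."""
    normalized = {normalize(key): code for key, code in table.items()}
    key = normalize(value)

    # Tier 1: exact match after normalization.
    if key in normalized:
        return normalized[key]

    # Tier 2: closest key above the similarity cutoff.
    close = difflib.get_close_matches(key, list(normalized), n=1, cutoff=cutoff)
    if close:
        return normalized[close[0]]

    # Tier 3: hand the original value to a model (stubbed as a callback here).
    if ai_fallback is not None:
        return ai_fallback(value)
    return None


countries = {"Germany": "DE", "France": "FR"}
exact = lookup("  GERMANY ", countries)   # resolved by tier 1
fuzzy = lookup("Germny", countries)       # resolved by tier 2
```

Ordering the tiers from cheapest to most expensive means the model call is only reached when the deterministic tiers find nothing.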
The modifier pipeline runs in a fixed order during Phase 4 of the extraction pipeline: format transforms first (converting dates or numbers to your target format), then alias mapping (replacing values using a lookup), and finally max_length truncation. Constraint evaluation happens after all modifiers have been applied, so constraints validate the final transformed value, not the raw extraction.
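The ordering described above can be sketched in Python as follows. This is a simplified model, not the service's code: the function names, the keyword arguments, and the assumed `DD/MM/YYYY` input date layout are illustrative choices.

```python
from datetime import datetime
from typing import Dict, List, Optional


def apply_modifiers(
    raw: str,
    date_format: Optional[str] = None,
    aliases: Optional[Dict[str, str]] = None,
    max_length: Optional[int] = None,
) -> str:
    """Apply modifiers in the fixed order: format, alias, max_length."""
    value = raw
    # 1. Format: convert a date to the target layout.
    #    Assumption: incoming dates are written as DD/MM/YYYY.
    if date_format is not None:
        value = datetime.strptime(value, "%d/%m/%Y").strftime(date_format)
    # 2. Alias: replace the value if the lookup contains it.
    if aliases is not None:
        value = aliases.get(value, value)
    # 3. max_length: truncate whatever is left.
    if max_length is not None:
        value = value[:max_length]
    return value


def check_constraints(
    value: str, required: bool = False, enum: Optional[List[str]] = None
) -> List[str]:
    """Validate the final, transformed value and return failed rule names."""
    errors: List[str] = []
    if required and not value:
        errors.append("required")
    if enum is not None and value not in enum:
        errors.append("enum")
    return errors


final = apply_modifiers(
    "United States of America",
    aliases={"United States of America": "USA"},
    max_length=3,
)
errors = check_constraints(final, required=True, enum=["USA", "CAN"])
```

The example shows why the order matters: if truncation ran before alias mapping, the value would be cut to "Uni", the alias would never match, and the enum constraint would fail.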
For best results, use manual instructions sparingly and only for fields that the registry cannot match. A well-written instruction should describe the field in plain language, specify where in the document to look, and note any formatting expectations. Avoid vague instructions like "extract the value" — instead, write something like "Extract the net payment amount from the invoice summary section, excluding VAT."
```bash
curl -X POST https://api.talonic.com/v1/schemas/us_def456/fields \
  -H "Authorization: Bearer $TALONIC_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "display_name": "Purchase Order Number",
    "data_type": "string",
    "manual_instruction": "Extract the PO number from the order reference section",
    "constraints": {
      "format": {
        "type": "regex",
        "pattern": "PO-\\d{6}",
        "on_format_mismatch": "flag"
      }
    },
    "modifiers": {
      "max_length": 20
    },
    "output_name": "po_number"
  }'
```

```bash
curl -X PATCH https://api.talonic.com/v1/schemas/us_def456/fields/fld_xyz \
  -H "Authorization: Bearer $TALONIC_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "strategy": "constant",
    "constant_value": "USD"
  }'
# The field will always resolve to "USD" without any LLM call.
# Bypass strategies execute during Phase 1, before AI extraction.
```

Schema features can be combined to build sophisticated field definitions. For example, a "Vendor Code" field might use a reference table for code mapping, a format constraint to validate the output format (^V\d{5}$), an alias modifier to normalize legacy codes, and an output name remap for the downstream ERP system. Each feature operates at a different stage of the pipeline (bypass strategies in Phase 1, extraction instructions in Phase 2, reference table lookups in Phases 1 and 3, and modifiers plus format constraints in Phase 4), so they compose without conflicts.
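As a rough picture of the Phase 1 bypass step, the Python sketch below resolves a field without any extraction when a strategy is set. Only `strategy` and `constant_value` appear in the request shown earlier; the `source_field` and `reference_table` keys, the ID layout, and the function names are hypothetical and exist only to make the example self-contained.

```python
import itertools
from typing import Dict, Iterator, Optional


def make_id_generator(prefix: str, start: int = 1) -> Iterator[str]:
    """Yield deterministic, sequential IDs such as DOC-000001."""
    return (f"{prefix}-{n:06d}" for n in itertools.count(start))


def resolve_bypass(
    field: dict, ids: Iterator[str], context: Dict[str, str]
) -> Optional[str]:
    """Return a value for bypassed fields, or None if extraction is needed."""
    strategy = field.get("strategy")
    if strategy == "constant":
        return field["constant_value"]      # fixed value, no model call
    if strategy == "generator":
        return next(ids)                    # auto-generated ID
    if strategy == "reference":
        # Hypothetical keys: look up another field's value in an inline table.
        source = context.get(field["source_field"], "")
        return field["reference_table"].get(source)
    return None                             # no bypass configured


ids = make_id_generator("DOC")
currency = resolve_bypass({"strategy": "constant", "constant_value": "USD"}, ids, {})
doc_id = resolve_bypass({"strategy": "generator"}, ids, {})
```

Fields for which the function returns `None` would continue to the later extraction phases in this model.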