Schema Formats
The Talonic extract endpoint accepts three schema formats: JSON Schema for full control, simplified fields for ease of use, and flat key-type maps for quick prototyping.
The schema parameter accepts three formats. Use whichever is most natural for your use case. (Note: the API parameter is still called schema.)
1. JSON Schema (full control)
JSON Schema format
{
"type": "object",
"properties": {
"vendor_name": {
"type": "string",
"description": "Legal name of the vendor"
},
"total_amount": {
"type": "number",
"description": "Invoice total including tax"
},
"line_items": {
"type": "array",
"items": {
"type": "object",
"properties": {
"description": { "type": "string" },
"quantity": { "type": "integer" },
"unit_price": { "type": "number" }
}
}
}
},
"required": ["vendor_name", "total_amount"]
}2. Simplified fields (recommended)
Simplified fields
{
"fields": [
{ "name": "vendor_name", "type": "string", "description": "Legal name of the vendor" },
{ "name": "total_amount", "type": "number", "required": true },
{ "name": "due_date", "type": "date" },
{
"name": "line_items",
"type": "array",
"children": [
{ "name": "description", "type": "string" },
{ "name": "quantity", "type": "number" },
{ "name": "unit_price", "type": "number" }
]
}
]
}3. Flat key-type map (quick prototyping)
Flat map
{
"vendor_name": "string",
"invoice_number": "string",
"total_amount": "number",
"due_date": "date",
"is_paid": "boolean"
}Supported types: string, number, integer, boolean, date, array, object, enum.
Full Schema Reference
Beyond basic field types, Talonic schemas support advanced features: format constraints, value modifiers, bypass strategies, reference lookups, cross-field validation, and delivery configuration. Use the x-talonic extension on individual fields and x-talonic-schema at the root level.
Supported data types
string, number, integer, boolean, date, array, object, enum
Format constraints
Validate extracted values with regex patterns. When a value fails validation, the on_format_mismatch action determines the fallback.
Format constraint
"invoice_number": {
"type": "string",
"x-talonic": {
"format": { "type": "regex", "pattern": "^INV-\\d{4,}$" },
"on_format_mismatch": { "action": "flag" }
}
}Actions: empty (clear value), flag (keep + flag), constant (replace with fixed value).
Modifiers (post-processing transforms)
Applied in order after extraction. Three types: format (date/number formatting), alias (value mapping), max_length (truncation).
Modifiers
"invoice_date": {
"type": "string",
"format": "date",
"x-talonic": {
"modifiers": [
{ "type": "format", "config": { "pattern": "YYYY-MM-DD" } }
]
}
},
"status": {
"type": "string",
"x-talonic": {
"modifiers": [
{ "type": "alias", "config": {
"mappings": { "paid": "SETTLED", "open": "PENDING", "overdue": "PAST_DUE" }
}},
{ "type": "max_length", "config": { "limit": 20, "on_overflow": "truncate" } }
]
}
}Constraints (validation rules)
Evaluated after modifiers. Failures flag the value but don't erase it.
Constraints
"currency": {
"type": "string",
"x-talonic": {
"constraints": [
{ "type": "enum", "config": { "values": ["EUR", "USD", "GBP", "CHF"] } },
{ "type": "length", "config": { "min": 3, "max": 3 } }
]
}
},
"due_date": {
"type": "string",
"x-talonic": {
"constraints": [
{ "type": "cross-field", "config": {
"expression": "due_date >= invoice_date",
"fields": ["due_date", "invoice_date"]
}}
]
}
}Types: required, enum, date-format, length, cross-field.
Bypass strategies
Fields that don't need LLM extraction. Set a fixed value, generate an ID, or look up from reference data.
Bypass strategies
"document_category": {
"type": "string",
"x-talonic": {
"strategy": "constant",
"constant_value": "financial/invoice"
}
},
"processing_id": {
"type": "string",
"x-talonic": {
"strategy": "generator",
"generator_type": "deterministic-id",
"generator_config": { "prefix": "EXT", "pad": 6 }
}
},
"country_code": {
"type": "string",
"x-talonic": {
"strategy": "reference",
"reference_name": "country_codes",
"key_expression": "{field.vendor_country}",
"reference_table": [
{ "key": "DE", "value": "Germany" },
{ "key": "US", "value": "United States" }
]
}
}Strategies: constant, generator (deterministic-id, context-fallback), reference (single or multi-hop via resolution_chain).
Extraction instructions
Manual instruction
"tax_amount": {
"type": "number",
"x-talonic": {
"manual_instruction": "Calculate as subtotal * tax_rate / 100. If stated explicitly, use the stated value.",
"uses_manual_instruction": true,
"capture_submoves": ["compute", "reason"]
}
}Schema-level configuration
x-talonic-schema (root level)
{
"type": "object",
"properties": { "..." : {} },
"x-talonic-schema": {
"extraction_type": "pipeline",
"extraction_model": "claude-sonnet",
"validation_mode": "sample",
"validation_sample_rate": 10,
"approval_action": "stage",
"delivery_config": {
"dialect": {
"inline": {
"date_format": "YYYY-MM-DD",
"number_locale": "en-US",
"null_representation": "",
"boolean_format": ["Yes", "No"],
"delimiter": ",",
"encoding": "UTF-8"
}
},
"output_files": [
{ "name": "invoices.csv", "fields": ["invoice_number", "vendor_name", "total_amount"] },
{ "name": "line_items.csv", "fields": ["invoice_number", "line_items"] }
],
"cross_file_constraints": [
{ "type": "uniqueness", "config": { "fields": ["invoice_number"] } }
],
"escalation_threshold": 0.9
}
}
}