Skip to main content

Schema Formats

The Talonic extract endpoint accepts three schema formats: JSON Schema for full control, simplified fields for ease of use, and flat key-type maps for quick prototyping.

The schema parameter accepts three formats. Use whichever is most natural for your use case. (Note: the API parameter is still called schema.)

1. JSON Schema (full control)

JSON Schema format

{
  "type": "object",
  "properties": {
    "vendor_name": {
      "type": "string",
      "description": "Legal name of the vendor"
    },
    "total_amount": {
      "type": "number",
      "description": "Invoice total including tax"
    },
    "line_items": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "description": { "type": "string" },
          "quantity": { "type": "integer" },
          "unit_price": { "type": "number" }
        }
      }
    }
  },
  "required": ["vendor_name", "total_amount"]
}

2. Simplified fields (recommended)

Simplified fields

{
  "fields": [
    { "name": "vendor_name", "type": "string", "description": "Legal name of the vendor" },
    { "name": "total_amount", "type": "number", "required": true },
    { "name": "due_date", "type": "date" },
    {
      "name": "line_items",
      "type": "array",
      "children": [
        { "name": "description", "type": "string" },
        { "name": "quantity", "type": "number" },
        { "name": "unit_price", "type": "number" }
      ]
    }
  ]
}

3. Flat key-type map (quick prototyping)

Flat map

{
  "vendor_name": "string",
  "invoice_number": "string",
  "total_amount": "number",
  "due_date": "date",
  "is_paid": "boolean"
}

Supported types: string, number, integer, boolean, date, array, object, enum.

Full Schema Reference

Beyond basic field types, Talonic schemas support advanced features: format constraints, value modifiers, bypass strategies, reference lookups, cross-field validation, and delivery configuration. Use the x-talonic extension on individual fields and x-talonic-schema at the root level.

Supported data types

string, number, integer, boolean, date, array, object, enum

Format constraints

Validate extracted values with regex patterns. When a value fails validation, the on_format_mismatch action determines the fallback.

Format constraint

"invoice_number": {
  "type": "string",
  "x-talonic": {
    "format": { "type": "regex", "pattern": "^INV-\\d{4,}$" },
    "on_format_mismatch": { "action": "flag" }
  }
}

Actions: empty (clear value), flag (keep + flag), constant (replace with fixed value).

Modifiers (post-processing transforms)

Applied in order after extraction. Three types: format (date/number formatting), alias (value mapping), max_length (truncation).

Modifiers

"invoice_date": {
  "type": "string",
  "format": "date",
  "x-talonic": {
    "modifiers": [
      { "type": "format", "config": { "pattern": "YYYY-MM-DD" } }
    ]
  }
},
"status": {
  "type": "string",
  "x-talonic": {
    "modifiers": [
      { "type": "alias", "config": {
        "mappings": { "paid": "SETTLED", "open": "PENDING", "overdue": "PAST_DUE" }
      }},
      { "type": "max_length", "config": { "limit": 20, "on_overflow": "truncate" } }
    ]
  }
}

Constraints (validation rules)

Evaluated after modifiers. Failures flag the value but don't erase it.

Constraints

"currency": {
  "type": "string",
  "x-talonic": {
    "constraints": [
      { "type": "enum", "config": { "values": ["EUR", "USD", "GBP", "CHF"] } },
      { "type": "length", "config": { "min": 3, "max": 3 } }
    ]
  }
},
"due_date": {
  "type": "string",
  "x-talonic": {
    "constraints": [
      { "type": "cross-field", "config": {
        "expression": "due_date >= invoice_date",
        "fields": ["due_date", "invoice_date"]
      }}
    ]
  }
}

Types: required, enum, date-format, length, cross-field.

Bypass strategies

Fields that don't need LLM extraction. Set a fixed value, generate an ID, or look up from reference data.

Bypass strategies

"document_category": {
  "type": "string",
  "x-talonic": {
    "strategy": "constant",
    "constant_value": "financial/invoice"
  }
},
"processing_id": {
  "type": "string",
  "x-talonic": {
    "strategy": "generator",
    "generator_type": "deterministic-id",
    "generator_config": { "prefix": "EXT", "pad": 6 }
  }
},
"country_code": {
  "type": "string",
  "x-talonic": {
    "strategy": "reference",
    "reference_name": "country_codes",
    "key_expression": "{field.vendor_country}",
    "reference_table": [
      { "key": "DE", "value": "Germany" },
      { "key": "US", "value": "United States" }
    ]
  }
}

Strategies: constant, generator (deterministic-id, context-fallback), reference (single or multi-hop via resolution_chain).

Extraction instructions

Manual instruction

"tax_amount": {
  "type": "number",
  "x-talonic": {
    "manual_instruction": "Calculate as subtotal * tax_rate / 100. If stated explicitly, use the stated value.",
    "uses_manual_instruction": true,
    "capture_submoves": ["compute", "reason"]
  }
}

Schema-level configuration

x-talonic-schema (root level)

{
  "type": "object",
  "properties": { "..." : {} },
  "x-talonic-schema": {
    "extraction_type": "pipeline",
    "extraction_model": "claude-sonnet",
    "validation_mode": "sample",
    "validation_sample_rate": 10,
    "approval_action": "stage",
    "delivery_config": {
      "dialect": {
        "inline": {
          "date_format": "YYYY-MM-DD",
          "number_locale": "en-US",
          "null_representation": "",
          "boolean_format": ["Yes", "No"],
          "delimiter": ",",
          "encoding": "UTF-8"
        }
      },
      "output_files": [
        { "name": "invoices.csv", "fields": ["invoice_number", "vendor_name", "total_amount"] },
        { "name": "line_items.csv", "fields": ["invoice_number", "line_items"] }
      ],
      "cross_file_constraints": [
        { "type": "uniqueness", "config": { "fields": ["invoice_number"] } }
      ],
      "escalation_threshold": 0.9
    }
  }
}
The full reference schema template (invoice example with all features) is available as a downloadable JSON file in the [repository](https://github.com/talonicdev/platform/blob/main/packages/docs/src/schema-template.json).