talonic_extract

Extract structured, schema-validated data from a document.

Inputs: a document source, given as one of `file_data` plus `filename` (recommended for chat clients), `file_path`, `file_url`, or `document_id`; plus either a `schema` or a `schema_id`. Returns clean JSON with per-field confidence scores.
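For chat clients that pass raw bytes, the `file_data` value is the file's contents as Base64 text, sent together with the original `filename`. The following is a minimal Python sketch of assembling that input. It assumes the tool accepts standard Base64 (RFC 4648 alphabet, with padding), and the helper name `build_file_data_input` is hypothetical:

```python
import base64
import os


def build_file_data_input(path: str, schema_id: str) -> dict:
    """Build a talonic_extract input from a local file (hypothetical helper).

    Assumes standard (non-URL-safe) Base64 with padding is what the tool expects.
    """
    with open(path, "rb") as f:
        raw = f.read()
    return {
        # Raw file bytes, Base64-encoded and decoded to an ASCII string for JSON
        "file_data": base64.b64encode(raw).decode("ascii"),
        # Original name only (no directories), used for MIME type inference
        "filename": os.path.basename(path),
        "schema_id": schema_id,
    }
```

For example, `build_file_data_input("./invoice-2026-001.pdf", "SCH-A1B2C3D4")` yields a dict with `file_data`, `filename`, and `schema_id` keys, ready to serialize as the tool input.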

| Parameter | Type | Description |
|---|---|---|
| `file_data` | string | Base64-encoded file bytes. Recommended for chat clients (drag-and-drop). |
| `filename` | string | Original filename (used for MIME type inference when using `file_data`). |
| `file_path` | string | Local file path. |
| `file_url` | string | Remote file URL. |
| `document_id` | string | ID of a previously uploaded document. |
| `schema` | object | Inline schema definition (JSON Schema or flat key-type map). |
| `schema_id` | string | UUID or `SCH-XXXXXXXX` short ID of a saved schema. |
| `instructions` | string | Natural-language guidance for the extractor. |
| `include_markdown` | boolean | Include OCR markdown alongside structured data. |

Always provide a `schema` or `schema_id`. Auto-discovery extraction (without a schema) is not reliable in v0.1.
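A client can catch malformed inputs before calling the tool. The Python sketch below checks the rules stated above: a document source is present, `file_data` comes with `filename`, and a `schema` or `schema_id` is given. Two parts are assumptions rather than documented behavior: it treats more than one document source as an error (the page only says "one of"), and it assumes the short ID is `SCH-` followed by eight uppercase letters or digits, as in `SCH-A1B2C3D4`. The function name `check_extract_input` is hypothetical:

```python
import re

# Document-source parameters from the table above.
SOURCE_KEYS = ("file_data", "file_path", "file_url", "document_id")

# Assumption: short IDs are "SCH-" plus 8 uppercase letters/digits.
SHORT_ID = re.compile(r"SCH-[A-Z0-9]{8}")
UUID = re.compile(
    r"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"
)


def check_extract_input(args: dict) -> list:
    """Return a list of problems with a talonic_extract input (empty if none)."""
    problems = []
    sources = [k for k in SOURCE_KEYS if k in args]
    # Assumption: exactly one source is expected.
    if len(sources) != 1:
        problems.append(f"expected one document source, got {sources or 'none'}")
    if "file_data" in args and "filename" not in args:
        problems.append("file_data should be accompanied by filename")
    if "schema" not in args and "schema_id" not in args:
        problems.append("provide schema or schema_id")
    sid = args.get("schema_id")
    # fullmatch rejects partial matches such as trailing characters.
    if sid is not None and not (SHORT_ID.fullmatch(sid) or UUID.fullmatch(sid)):
        problems.append(f"schema_id {sid!r} is neither a UUID nor an SCH- short ID")
    return problems
```

An empty list means the input passed these local checks; the service may still apply rules of its own.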

Example: inline schema

Tool input
```json
{
  "file_url": "https://example.com/invoice-2026-001.pdf",
  "schema": {
    "type": "object",
    "properties": {
      "vendor_name": { "type": "string" },
      "invoice_number": { "type": "string" },
      "total_amount": { "type": "number" },
      "due_date": { "type": "string", "format": "date" },
      "line_items": {
        "type": "array",
        "items": {
          "type": "object",
          "properties": {
            "description": { "type": "string" },
            "amount": { "type": "number" }
          }
        }
      }
    },
    "required": ["vendor_name", "total_amount"]
  },
  "instructions": "Amounts are in EUR. Focus on the billing section."
}
```
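The `schema` parameter also accepts a flat key-type map, but this page does not spell out that format. Assuming it maps each field name to a JSON Schema type name (for example `{"vendor_name": "string", "total_amount": "number"}`), a client could normalize such a map into the object-schema form used in the example above. This is a sketch under that assumption; `flat_to_json_schema` is a hypothetical helper, not part of the tool:

```python
from typing import Dict, List, Optional

# Assumption: flat-map values are JSON Schema type names.
JSON_TYPES = {"string", "number", "integer", "boolean", "array", "object"}


def flat_to_json_schema(flat: Dict[str, str], required: Optional[List[str]] = None) -> dict:
    """Convert a flat key-type map into an object-typed JSON Schema."""
    properties = {}
    for name, type_name in flat.items():
        if type_name not in JSON_TYPES:
            raise ValueError(f"unsupported type {type_name!r} for field {name!r}")
        properties[name] = {"type": type_name}
    schema = {"type": "object", "properties": properties}
    if required:
        schema["required"] = list(required)
    return schema
```

This converter covers only plain types. Constraints such as `"format": "date"` on `due_date` or the nested `line_items` structure still need the full JSON Schema form.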
Tool response
```json
{
  "document_id": "doc_8f3a...",
  "data": {
    "vendor_name": "Meridian Energy AG",
    "invoice_number": "INV-2026-001",
    "total_amount": 1500.00,
    "due_date": "2026-06-15",
    "line_items": [
      { "description": "Consulting — April", "amount": 1200.00 },
      { "description": "Travel expenses", "amount": 300.00 }
    ]
  },
  "confidence": {
    "vendor_name": 0.98,
    "invoice_number": 0.95,
    "total_amount": 0.99,
    "due_date": 0.97
  },
  "document": {
    "filename": "invoice-2026-001.pdf",
    "pages": 2,
    "type_detected": "Invoice",
    "language_detected": "de"
  }
}
```
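The per-field `confidence` scores can drive a review step. The Python sketch below flags fields scoring under a threshold and lists fields in `data` that have no score at all; in the sample response, `line_items` has no confidence entry. The 0.96 threshold is an arbitrary illustration, not a documented recommendation, and `fields_needing_review` is a hypothetical helper:

```python
def fields_needing_review(response: dict, threshold: float = 0.96) -> dict:
    """Split extracted fields into low-confidence and unscored ones."""
    data = response.get("data", {})
    confidence = response.get("confidence", {})
    # Scored fields whose confidence falls below the chosen threshold.
    low = sorted(k for k, score in confidence.items() if score < threshold)
    # Fields present in data without any confidence entry.
    unscored = sorted(k for k in data if k not in confidence)
    return {"low_confidence": low, "unscored": unscored}
```

Applied to the response above, this flags `invoice_number` (0.95) as low-confidence and `line_items` as unscored.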

Example: saved schema

Tool input
```json
{
  "file_path": "./contracts/lease-agreement.pdf",
  "schema_id": "SCH-A1B2C3D4"
}
```
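Because a saved schema is referenced by ID, the same `schema_id` can be reused for many documents. The Python sketch below builds one tool input per matching file in a folder. How those inputs are then sent to the tool depends on your MCP client and is not shown. `inputs_for_folder` is a hypothetical helper:

```python
from pathlib import Path


def inputs_for_folder(folder: str, schema_id: str, pattern: str = "*.pdf") -> list:
    """Build one talonic_extract input per matching file, all sharing a saved schema."""
    return [
        {"file_path": str(path), "schema_id": schema_id}
        # sorted() gives a stable, predictable processing order
        for path in sorted(Path(folder).glob(pattern))
        if path.is_file()
    ]
```

For example, `inputs_for_folder("./contracts", "SCH-A1B2C3D4")` produces inputs shaped like the example above, one per PDF in `./contracts`.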