talonic_extract

Extract structured, schema-validated data from a document.

Inputs: one of file_data + filename (recommended for chat clients), file_path, file_url, or document_id, plus a schema (or schema_id). Returns clean JSON with per-field confidence scores.

This is the primary tool in the Talonic MCP server. When an agent calls `talonic_extract`, the MCP server uploads the document to the Talonic API, runs OCR and field extraction against the provided schema, and returns structured JSON with confidence metadata. The entire pipeline — upload, OCR, extraction, validation — runs server-side in a single request.

The response includes a document.id that persists in your workspace. Subsequent calls can reference this ID via the document_id parameter to re-extract with a different schema, retrieve markdown, or fetch metadata — all without re-uploading the file. This is both faster and cheaper than sending the file again.
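As a minimal sketch of that reuse pattern, the helper below builds the arguments for a follow-up `talonic_extract` call from an earlier response. The function name `followup_arguments` and the sample `doc_example` ID are hypothetical; only the `document.id` response field and the `document_id` and `schema_id` parameters come from this page.

```python
from typing import Any


def followup_arguments(response: dict[str, Any], schema_id: str) -> dict[str, Any]:
    """Build arguments for a second talonic_extract call that reuses the stored document."""
    # The first response carries the persistent ID under document.id.
    document_id = response["document"]["id"]
    # Passing document_id instead of a file source avoids re-uploading the file.
    return {"document_id": document_id, "schema_id": schema_id}


# Hypothetical, trimmed-down first response; a real one has more fields.
first_response = {"status": "complete", "document": {"id": "doc_example"}}
args = followup_arguments(first_response, "SCH-A1B2C3D4")
print(args)
```

The resulting dictionary can be passed as the tool input for the next call in whatever MCP client you use.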

Parameter	Type	Description
file_data	string	Base64-encoded file bytes. Recommended for chat clients (drag-and-drop).
filename	string	Original filename (used for MIME type inference when using `file_data`).
file_path	string	Local file path.
file_url	string	Remote file URL.
document_id	string	ID of a previously uploaded document.
schema	object	Inline schema definition (JSON Schema or flat key-type map).
schema_id	string	UUID or SCH-XXXXXXXX short ID of a saved schema.
instructions	string	Natural-language guidance for the extractor.
include_markdown	boolean	Include OCR markdown alongside structured data.

Always provide a schema or schema_id. Schema-less auto-discovery extraction is not reliable in v0.1.
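The parameter rules above can be checked on the client before the tool is called. The sketch below is one possible way to do that, not part of the Talonic server: `build_extract_arguments` is a hypothetical helper, and the "exactly one document source" check is an assumption drawn from the "one of" wording in the inputs description.

```python
import base64
from typing import Any, Optional


def build_extract_arguments(
    *,
    file_bytes: Optional[bytes] = None,
    filename: Optional[str] = None,
    file_path: Optional[str] = None,
    file_url: Optional[str] = None,
    document_id: Optional[str] = None,
    schema: Optional[dict[str, Any]] = None,
    schema_id: Optional[str] = None,
    instructions: Optional[str] = None,
    include_markdown: Optional[bool] = None,
) -> dict[str, Any]:
    """Assemble a talonic_extract tool input, rejecting incomplete combinations."""
    sources: dict[str, Any] = {}
    if file_bytes is not None:
        if not filename:
            raise ValueError("file_data needs a filename for MIME type inference")
        # file_data is documented as base64-encoded file bytes.
        sources["file_data"] = base64.b64encode(file_bytes).decode("ascii")
    if file_path is not None:
        sources["file_path"] = file_path
    if file_url is not None:
        sources["file_url"] = file_url
    if document_id is not None:
        sources["document_id"] = document_id

    # Assumption: the tool expects exactly one document source per call.
    if len(sources) != 1:
        raise ValueError(
            "provide exactly one of file_data, file_path, file_url, document_id"
        )
    # The page says to always provide a schema or schema_id.
    if schema is None and schema_id is None:
        raise ValueError("provide a schema or schema_id")

    args: dict[str, Any] = dict(sources)
    if "file_data" in args:
        args["filename"] = filename
    if schema is not None:
        args["schema"] = schema
    if schema_id is not None:
        args["schema_id"] = schema_id
    if instructions is not None:
        args["instructions"] = instructions
    if include_markdown is not None:
        args["include_markdown"] = include_markdown
    return args


# Example: a chat-client style call with in-memory bytes and a saved schema.
args = build_extract_arguments(
    file_bytes=b"%PDF-1.7",
    filename="invoice.pdf",
    schema_id="SCH-A1B2C3D4",
)
print(sorted(args))
```

Failing early on the client keeps malformed requests from reaching the server, at the cost of duplicating rules that the server may enforce differently.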

Example: inline schema

Tool input

{
  "file_url": "https://example.com/invoice-2026-001.pdf",
  "schema": {
    "type": "object",
    "properties": {
      "vendor_name": { "type": "string" },
      "invoice_number": { "type": "string" },
      "total_amount": { "type": "number" },
      "due_date": { "type": "string", "format": "date" },
      "line_items": {
        "type": "array",
        "items": {
          "type": "object",
          "properties": {
            "description": { "type": "string" },
            "amount": { "type": "number" }
          }
        }
      }
    },
    "required": ["vendor_name", "total_amount"]
  },
  "instructions": "Amounts are in EUR. Focus on the billing section."
}

Tool response

{
  "extraction_id": "ext_8f3a...",
  "request_id": "req_2c91...",
  "status": "complete",
  "document": {
    "id": "doc_8f3a...",
    "filename": "invoice-2026-001.pdf",
    "pages": 2,
    "type_detected": "invoice",
    "language_detected": "de"
  },
  "data": {
    "vendor_name": "Meridian Energy AG",
    "invoice_number": "INV-2026-001",
    "total_amount": 1500.00,
    "due_date": "2026-06-15",
    "line_items": [
      { "description": "Consulting, April", "amount": 1200.00 },
      { "description": "Travel expenses", "amount": 300.00 }
    ]
  },
  "schema": {
    "source": "inline"
  },
  "confidence": {
    "overall": 0.97,
    "fields": {
      "vendor_name": 0.98,
      "invoice_number": 0.95,
      "total_amount": 0.99,
      "due_date": 0.97
    }
  },
  "processing": {
    "duration_ms": 2840,
    "pages_processed": 2,
    "region": "eu-central-1"
  }
}
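One way to act on the per-field confidence scores is to flag fields that fall below a threshold for manual review. The sketch below assumes the response shape shown above; the `fields_needing_review` helper and the 0.96 threshold are illustrative choices, not Talonic recommendations. Fields that have no score, such as `line_items` in this example, are treated as needing review.

```python
from typing import Any


def fields_needing_review(response: dict[str, Any], threshold: float) -> list[str]:
    """Return extracted field names whose confidence is missing or below threshold."""
    scores = response.get("confidence", {}).get("fields", {})
    data = response.get("data", {})
    # A field with no reported score is treated as unscored (0.0), so it is flagged.
    flagged = [name for name in data if scores.get(name, 0.0) < threshold]
    return sorted(flagged)


# Trimmed copy of the example response above.
response = {
    "data": {
        "vendor_name": "Meridian Energy AG",
        "invoice_number": "INV-2026-001",
        "total_amount": 1500.00,
        "due_date": "2026-06-15",
        "line_items": [
            {"description": "Consulting, April", "amount": 1200.00},
            {"description": "Travel expenses", "amount": 300.00},
        ],
    },
    "confidence": {
        "overall": 0.97,
        "fields": {
            "vendor_name": 0.98,
            "invoice_number": 0.95,
            "total_amount": 0.99,
            "due_date": 0.97,
        },
    },
}

print(fields_needing_review(response, threshold=0.96))
# → ['invoice_number', 'line_items']
```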

Example: saved schema

Tool input

{
  "file_path": "./contracts/lease-agreement.pdf",
  "schema_id": "SCH-A1B2C3D4"
}

Tool response

{
  "extraction_id": "ext_b29f...",
  "request_id": "req_4c81...",
  "status": "complete",
  "document": {
    "id": "doc_91ad...",
    "filename": "lease-agreement.pdf",
    "pages": 8,
    "type_detected": "lease_agreement",
    "language_detected": "en"
  },
  "data": {
    "lessor": "Acme Holdings Ltd.",
    "lessee": "Meridian Energy AG",
    "premises_address": "12 Hauptstrasse, 10115 Berlin",
    "term_start": "2026-07-01",
    "term_end": "2031-06-30",
    "monthly_rent_eur": 4250.00
  },
  "schema": {
    "source": "saved",
    "id": "SCH-A1B2C3D4"
  },
  "confidence": {
    "overall": 0.94,
    "fields": {
      "lessor": 0.97,
      "lessee": 0.97,
      "premises_address": 0.92,
      "term_start": 0.95,
      "term_end": 0.95,
      "monthly_rent_eur": 0.91
    }
  },
  "processing": {
    "duration_ms": 4380,
    "pages_processed": 8,
    "region": "eu-central-1"
  }
}

Frequently asked questions

How does talonic_extract work?

Send a document (file_data, file_path, file_url, or document_id) with a schema or schema_id. The tool returns schema-validated JSON with per-field confidence scores.

What is the fastest way to extract data from a document?

If the document is already in your workspace, pass its document_id instead of re-uploading. For new documents in chat clients, use file_data with base64-encoded bytes. For files on the web, use file_url to avoid downloading locally first.

How long does an extraction typically take?

Processing time depends on document size. A 2-page invoice typically completes in 2-4 seconds. Larger documents (8+ pages) may take 4-8 seconds. The processing.duration_ms field in the response shows the exact server-side duration.
