talonic_extract
Extract structured, schema-validated data from a document.
Inputs: one of file_data + filename (recommended for chat clients), file_path, file_url, or document_id, plus a schema (or schema_id). Returns clean JSON with per-field confidence scores.
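The input rule above can be checked on the client before a call is made. The following is a minimal sketch, not part of the Talonic MCP server: `check_extract_args` is a hypothetical helper, and reading "one of" as "exactly one document source" is an assumption about how the server interprets the parameters.

```python
from typing import Any, List, Mapping

# Parameter names come from the tool description. Treating the four sources
# as mutually exclusive is an assumption, not documented server behavior.
SOURCE_KEYS = ("file_data", "file_path", "file_url", "document_id")


def check_extract_args(args: Mapping[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the arguments look valid."""
    problems: List[str] = []

    # Collect the document sources that were actually supplied.
    sources = [key for key in SOURCE_KEYS if args.get(key)]
    if len(sources) != 1:
        problems.append(
            f"expected exactly one document source, got {len(sources)}: {sources}"
        )

    # file_data is paired with filename in the input description.
    if "file_data" in sources and not args.get("filename"):
        problems.append("file_data requires filename")

    # A schema must be given inline or by reference.
    if not (args.get("schema") or args.get("schema_id")):
        problems.append("schema or schema_id is required")

    return problems


ok = check_extract_args(
    {"file_url": "https://example.com/invoice-2026-001.pdf", "schema_id": "SCH-A1B2C3D4"}
)
bad = check_extract_args({"file_data": "JVBERi0=", "file_path": "./a.pdf"})
```

Here `ok` is an empty list, while `bad` collects three problems: two sources, a missing `filename`, and no schema.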
This is the primary tool in the Talonic MCP server. When an agent calls `talonic_extract`, the MCP server uploads the document to the Talonic API, runs OCR and field extraction against the provided schema, and returns structured JSON with confidence metadata. The entire pipeline — upload, OCR, extraction, validation — runs server-side in a single request.
The response includes a document.id that persists in your workspace. Subsequent calls can reference this ID via the document_id parameter to re-extract with a different schema, retrieve markdown, or fetch metadata — all without re-uploading the file. This is both faster and cheaper than sending the file again.
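As an illustration of the reuse pattern described above, the sketch below builds the arguments for a follow-up call from a first response. `reextract_args` is a hypothetical helper and the IDs are placeholders; how the arguments are sent depends on the MCP client in use and is not shown.

```python
from typing import Any, Dict, Mapping


def reextract_args(first_response: Mapping[str, Any], schema_id: str) -> Dict[str, Any]:
    """Build arguments for a second talonic_extract call that reuses the upload."""
    # document.id is the persistent handle returned by the first extraction.
    document_id = first_response["document"]["id"]
    # No file_data, file_path, or file_url: the stored document is referenced by ID.
    return {"document_id": document_id, "schema_id": schema_id}


# Trimmed-down first response with a placeholder document ID.
first = {"status": "complete", "document": {"id": "doc_123", "filename": "invoice.pdf"}}
followup = reextract_args(first, "SCH-A1B2C3D4")
```

The resulting `followup` dictionary carries only `document_id` and `schema_id`, so nothing is uploaded a second time.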
| Parameter | Type | Description |
|---|---|---|
| file_data | string | Base64-encoded file bytes. Recommended for chat clients (drag-and-drop). |
| filename | string | Original filename (used for MIME type inference when using `file_data`). |
| file_path | string | Local file path. |
| file_url | string | Remote file URL. |
| document_id | string | ID of a previously uploaded document. |
| schema | object | Inline schema definition (JSON Schema or flat key-type map). |
| schema_id | string | UUID or SCH-XXXXXXXX short ID of a saved schema. |
| instructions | string | Natural-language guidance for the extractor. |
| include_markdown | boolean | Include OCR markdown alongside structured data. |
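
For clients that hold the file in memory, `file_data` and `filename` from the table above might be prepared as follows. This is a sketch under assumptions: `file_data_args` is a hypothetical helper, and standard (not URL-safe) Base64 without line breaks is assumed to be what the server expects.

```python
import base64
from pathlib import Path
from typing import Any, Dict


def file_data_args(content: bytes, filename: str, schema_id: str) -> Dict[str, Any]:
    """Build talonic_extract arguments from in-memory file bytes."""
    return {
        # Standard Base64, decoded to str so the payload is JSON-serializable.
        "file_data": base64.b64encode(content).decode("ascii"),
        # Send only the base name; its extension is used for MIME type inference.
        "filename": Path(filename).name,
        "schema_id": schema_id,
    }


args = file_data_args(
    b"%PDF-1.7 example bytes", "./contracts/lease-agreement.pdf", "SCH-A1B2C3D4"
)
```

Stripping the directory part keeps local paths out of the request while preserving the extension.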
Required: `schema` or `schema_id`. Auto-discovery extraction (no schema) is not reliable in v0.1.

Example: inline schema
{
"file_url": "https://example.com/invoice-2026-001.pdf",
"schema": {
"type": "object",
"properties": {
"vendor_name": { "type": "string" },
"invoice_number": { "type": "string" },
"total_amount": { "type": "number" },
"due_date": { "type": "string", "format": "date" },
"line_items": {
"type": "array",
"items": {
"type": "object",
"properties": {
"description": { "type": "string" },
"amount": { "type": "number" }
}
}
}
},
"required": ["vendor_name", "total_amount"]
},
"instructions": "Amounts are in EUR. Focus on the billing section."
}

Response:

{
"extraction_id": "ext_8f3a...",
"request_id": "req_2c91...",
"status": "complete",
"document": {
"id": "doc_8f3a...",
"filename": "invoice-2026-001.pdf",
"pages": 2,
"type_detected": "invoice",
"language_detected": "de"
},
"data": {
"vendor_name": "Meridian Energy AG",
"invoice_number": "INV-2026-001",
"total_amount": 1500.00,
"due_date": "2026-06-15",
"line_items": [
{ "description": "Consulting, April", "amount": 1200.00 },
{ "description": "Travel expenses", "amount": 300.00 }
]
},
"schema": {
"source": "inline"
},
"confidence": {
"overall": 0.97,
"fields": {
"vendor_name": 0.98,
"invoice_number": 0.95,
"total_amount": 0.99,
"due_date": 0.97
}
},
"processing": {
"duration_ms": 2840,
"pages_processed": 2,
"region": "eu-central-1"
}
}

Example: saved schema
{
"file_path": "./contracts/lease-agreement.pdf",
"schema_id": "SCH-A1B2C3D4"
}

Response:

{
"extraction_id": "ext_b29f...",
"request_id": "req_4c81...",
"status": "complete",
"document": {
"id": "doc_91ad...",
"filename": "lease-agreement.pdf",
"pages": 8,
"type_detected": "lease_agreement",
"language_detected": "en"
},
"data": {
"lessor": "Acme Holdings Ltd.",
"lessee": "Meridian Energy AG",
"premises_address": "12 Hauptstrasse, 10115 Berlin",
"term_start": "2026-07-01",
"term_end": "2031-06-30",
"monthly_rent_eur": 4250.00
},
"schema": {
"source": "saved",
"id": "SCH-A1B2C3D4"
},
"confidence": {
"overall": 0.94,
"fields": {
"lessor": 0.97,
"lessee": 0.97,
"premises_address": 0.92,
"term_start": 0.95,
"term_end": 0.95,
"monthly_rent_eur": 0.91
}
},
"processing": {
"duration_ms": 4380,
"pages_processed": 8,
"region": "eu-central-1"
}
}
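
The per-field confidence scores in the responses above can be used to route uncertain values to manual review. The sketch below is one possible approach: `fields_below` is a hypothetical helper, and the 0.95 threshold is an arbitrary choice for illustration, not a documented recommendation.

```python
from typing import Any, List, Mapping, Tuple


def fields_below(response: Mapping[str, Any], threshold: float) -> List[Tuple[str, float]]:
    """Return (field, score) pairs scoring under the threshold, lowest first."""
    fields = response.get("confidence", {}).get("fields", {})
    flagged = [(name, score) for name, score in fields.items() if score < threshold]
    return sorted(flagged, key=lambda item: item[1])


# Confidence block copied from the saved-schema example response.
lease = {
    "confidence": {
        "overall": 0.94,
        "fields": {
            "lessor": 0.97,
            "lessee": 0.97,
            "premises_address": 0.92,
            "term_start": 0.95,
            "term_end": 0.95,
            "monthly_rent_eur": 0.91,
        },
    }
}

flagged = fields_below(lease, 0.95)
# flagged == [("monthly_rent_eur", 0.91), ("premises_address", 0.92)]
```

Fields scoring exactly at the threshold, such as `term_start`, are not flagged because the comparison is strict.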