POST /v1/extract
The core of the Talonic API. Send a document and a schema, receive structured, validated data back. Supports PDFs, images, Word documents, spreadsheets, and plain text.
Response
Response fields (200 Synchronous)
Field          Type           Description
extraction_id  string         UUID of the created extraction record.
request_id     string         Unique request identifier for tracing and support.
status         string         Always "complete" for synchronous responses.
document       object         Source document summary: id, filename, pages, size_bytes, type_detected, language_detected.
data           object         Extracted field values as a key-value map matching your schema.
schema         object         Schema used: source (provided, saved, auto_discovered), id, definition, and save_url to persist it.
confidence     object         Confidence scores: overall (0–1) and fields (per-field score map).
processing     object         Processing metadata: duration_ms, pages_processed, region.
markdown       string | null  OCR-converted markdown of the document. Only present when include_markdown=true.
links          object         Related resource URLs: self (extraction), document, dashboard.
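The per-field scores in confidence.fields can be used to route uncertain values to manual review. The following is a minimal sketch, not an official client: the function name low_confidence_fields and the 0.95 threshold are assumptions chosen for illustration, and the sample dictionary is a trimmed copy of the synchronous response shown in this section.

```python
from typing import Any


def low_confidence_fields(response: dict[str, Any], threshold: float = 0.9) -> dict[str, float]:
    """Return the per-field scores below `threshold`, lowest score first."""
    scores = response.get("confidence", {}).get("fields", {})
    flagged = {name: score for name, score in scores.items() if score < threshold}
    return dict(sorted(flagged.items(), key=lambda item: item[1]))


# Trimmed copy of the sample 200 response; only the keys used here are kept.
sample = {
    "status": "complete",
    "confidence": {
        "overall": 0.94,
        "fields": {
            "vendor_name": 0.99,
            "invoice_number": 0.98,
            "total_amount": 0.96,
            "due_date": 0.91,
            "line_items": 0.87,
        },
    },
}

needs_review = low_confidence_fields(sample, threshold=0.95)  # threshold is an arbitrary example value
print(needs_review)  # {'line_items': 0.87, 'due_date': 0.91}
```

A suitable threshold depends on how costly a wrong value is for your use case; the reference does not recommend a specific cut-off.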
Response (200 — Synchronous)
{
  "extraction_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
  "request_id": "req_b2c3d4e5f6a78901",
  "status": "complete",
  "document": {
    "id": "c3d4e5f6-a7b8-9012-cdef-123456789012",
    "filename": "invoice-0847.pdf",
    "pages": 2,
    "size_bytes": 184320,
    "type_detected": "invoice",
    "language_detected": "en"
  },
  "data": {
    "vendor_name": "Acme Corp",
    "invoice_number": "INV-2024-0847",
    "total_amount": 14250.00,
    "due_date": "2024-03-15",
    "line_items": [
      { "description": "Enterprise license (annual)", "quantity": 1, "unit_price": 12000.00 },
      { "description": "Implementation services", "quantity": 15, "unit_price": 150.00 }
    ]
  },
  "schema": {
    "source": "provided",
    "id": null,
    "definition": { "type": "object", "properties": { "vendor_name": { "type": "string" } } },
    "save_url": "https://app.talonic.com/schemas/save?from=a1b2c3d4-e5f6-7890-abcd-ef1234567890"
  },
  "confidence": {
    "overall": 0.94,
    "fields": {
      "vendor_name": 0.99,
      "invoice_number": 0.98,
      "total_amount": 0.96,
      "due_date": 0.91,
      "line_items": 0.87
    }
  },
  "processing": {
    "duration_ms": 3420,
    "pages_processed": 2,
    "region": "eu-west"
  },
  "links": {
    "self": "/v1/extractions/a1b2c3d4-e5f6-7890-abcd-ef1234567890",
    "document": "/v1/documents/c3d4e5f6-a7b8-9012-cdef-123456789012",
    "dashboard": "https://app.talonic.com/extractions/a1b2c3d4-e5f6-7890-abcd-ef1234567890"
  }
}
Response fields (202 Asynchronous)
Field              Type     Description
request_id         string   Unique request identifier for tracing.
status             string   Always "processing" for asynchronous responses.
document           object   Source document summary: id, filename, pages.
poll_url           string   URL to poll for document processing status.
estimated_seconds  integer  Estimated processing time in seconds.
links              object   Related resource URLs: document, extractions, dashboard.
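A client that receives a 202 needs to poll poll_url until processing finishes. The sketch below shows one way to structure that loop under stated assumptions: wait_for_extraction, fetch, is_done, poll_interval, and max_wait_seconds are hypothetical names, and because this section does not document the payload returned by poll_url, the completion check is passed in by the caller instead of being hard-coded. The HTTP call itself is also left to the caller, so the example uses a stand-in fetch function.

```python
import time
from typing import Any, Callable


def wait_for_extraction(
    accepted: dict[str, Any],
    fetch: Callable[[str], dict[str, Any]],
    is_done: Callable[[dict[str, Any]], bool],
    poll_interval: float = 5.0,
    max_wait_seconds: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict[str, Any]:
    """Poll accepted["poll_url"] until is_done(payload) is true or the time budget is spent."""
    # Wait for the server's own estimate first, then fall back to a fixed interval.
    delay = float(accepted.get("estimated_seconds", poll_interval))
    waited = 0.0
    while waited < max_wait_seconds:
        sleep(delay)
        waited += delay
        payload = fetch(accepted["poll_url"])
        if is_done(payload):
            return payload
        delay = poll_interval
    raise TimeoutError(f"document still processing after {waited:.0f}s")


# Usage with a stand-in fetch; a real client would issue an authenticated GET here.
accepted = {
    "status": "processing",
    "poll_url": "/v1/documents/c3d4e5f6-a7b8-9012-cdef-123456789012",
    "estimated_seconds": 63,
}
replies = iter([{"status": "processing"}, {"status": "complete"}])  # assumed payload shape
result = wait_for_extraction(
    accepted,
    fetch=lambda url: next(replies),
    is_done=lambda payload: payload.get("status") != "processing",  # assumption, see above
    sleep=lambda seconds: None,  # skip real sleeping in this demonstration
)
print(result["status"])  # complete
```

Injecting fetch and sleep keeps the loop independent of any particular HTTP library and makes it easy to exercise without network access.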
Response (202 — Asynchronous)
{
  "request_id": "req_b2c3d4e5f6a78901",
  "status": "processing",
  "document": {
    "id": "c3d4e5f6-a7b8-9012-cdef-123456789012",
    "filename": "large-report.pdf",
    "pages": 42
  },
  "poll_url": "/v1/documents/c3d4e5f6-a7b8-9012-cdef-123456789012",
  "estimated_seconds": 63,
  "links": {
    "document": "/v1/documents/c3d4e5f6-a7b8-9012-cdef-123456789012",
    "extractions": "/v1/documents/c3d4e5f6-a7b8-9012-cdef-123456789012/extractions",
    "dashboard": "https://app.talonic.com/documents/c3d4e5f6-a7b8-9012-cdef-123456789012"
  }
}
Errors
Error responses
Status  Code                   Description
400     missing_document       No document source provided. Supply one of: file, file_url, or document_id.
400     ambiguous_document     More than one document source provided. Supply only one of: file, file_url, or document_id.
400     unsupported_file_type  The uploaded file type is not supported. Accepted: PDF, PNG, JPG, TIFF, WEBP, DOCX, TXT, CSV.
400     invalid_options        The options field is not valid JSON.
401     unauthorized           Missing or invalid API key.
422     extraction_failed      Extraction completed but produced no usable output. Check the document quality or schema definition.
429     rate_limited           Too many requests. Check X-RateLimit-Reset for when the window resets.
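Most of the 400 errors above can be caught before a request is sent. The following is a rough client-side pre-check, not a description of how the server validates requests: preflight_error is a hypothetical helper, the check uses the file extension only (the API may well inspect file contents instead), and the ".jpeg" and ".tif" aliases are assumptions that the reference does not confirm.

```python
import json
from pathlib import PurePath
from typing import Optional

# Extensions for the accepted types in the error table.
# ".jpeg" and ".tif" are assumed aliases, not confirmed by the reference.
ACCEPTED_EXTENSIONS = {
    ".pdf", ".png", ".jpg", ".jpeg", ".tiff", ".tif", ".webp", ".docx", ".txt", ".csv",
}


def preflight_error(
    file: Optional[str] = None,
    file_url: Optional[str] = None,
    document_id: Optional[str] = None,
    options: Optional[str] = None,
) -> Optional[str]:
    """Return the error code a request would likely trigger, or None if it looks valid."""
    sources = [source for source in (file, file_url, document_id) if source]
    if not sources:
        return "missing_document"
    if len(sources) > 1:
        return "ambiguous_document"
    if file and PurePath(file).suffix.lower() not in ACCEPTED_EXTENSIONS:
        return "unsupported_file_type"
    if options is not None:
        try:
            json.loads(options)  # options must be a valid JSON string
        except json.JSONDecodeError:
            return "invalid_options"
    return None


print(preflight_error())                                           # missing_document
print(preflight_error(file="a.pdf", document_id="c3d4e5f6"))       # ambiguous_document
print(preflight_error(file="notes.odt"))                           # unsupported_file_type
print(preflight_error(file="invoice-0847.pdf", options="{oops"))   # invalid_options
print(preflight_error(file="invoice-0847.pdf", options="{}"))      # None
```

A check like this only saves a round-trip; the server's response remains authoritative, so 401, 422, and 429 still need to be handled after the call.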
Cost Headers
Synchronous 200 responses include cost transparency headers so you can track spend per call without a separate API round-trip:
Cost response headers
Header                             Type     Description
X-Talonic-Cost-Credits             integer  Credits consumed by this extraction.
X-Talonic-Cost-EUR                 number   EUR cost of this extraction.
X-Talonic-Balance-Credits          integer  Remaining credit balance after this call.
X-Talonic-Cells-Resolved-Registry  integer  Fields resolved from the registry (no AI cost).
X-Talonic-Cells-Resolved-AI        integer  Fields resolved by the AI model.
Cost headers on a sync response
Cost headers are only present on synchronous (200) responses. For async (202) extractions, check the credits balance endpoint or listen for the extraction.complete webhook, which includes cost data.
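To track spend per call, the cost headers can be read into a small typed record. This is a sketch under assumptions: ExtractionCost and parse_cost_headers are hypothetical names, and the header values in the example are invented for illustration, not real pricing. The function returns None when the headers are absent, as on a 202 response.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class ExtractionCost:
    credits: int
    eur: float
    balance_credits: int
    cells_registry: int
    cells_ai: int


def parse_cost_headers(headers: Mapping[str, str]) -> Optional[ExtractionCost]:
    """Build an ExtractionCost from response headers, or None if they are absent."""
    # HTTP header names are case-insensitive, so normalise before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    if "x-talonic-cost-credits" not in lowered:
        return None  # e.g. an asynchronous (202) response
    return ExtractionCost(
        credits=int(lowered["x-talonic-cost-credits"]),
        eur=float(lowered["x-talonic-cost-eur"]),
        balance_credits=int(lowered["x-talonic-balance-credits"]),
        cells_registry=int(lowered["x-talonic-cells-resolved-registry"]),
        cells_ai=int(lowered["x-talonic-cells-resolved-ai"]),
    )


# Illustrative values only; real responses will differ.
example_headers = {
    "X-Talonic-Cost-Credits": "12",
    "X-Talonic-Cost-EUR": "0.06",
    "X-Talonic-Balance-Credits": "4988",
    "X-Talonic-Cells-Resolved-Registry": "3",
    "X-Talonic-Cells-Resolved-AI": "2",
}

cost = parse_cost_headers(example_headers)
print(cost.credits, cost.eur, cost.balance_credits)  # 12 0.06 4988
print(parse_cost_headers({"Content-Type": "application/json"}))  # None
```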