POST /v1/extract

Extracts structured data from a document. This is the core endpoint of the Talonic API: send a document and a schema, and receive structured, validated data back. Supported inputs are PDFs, images, Word documents, spreadsheets, and plain text.
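A minimal Python sketch of a schema for the invoice fields that appear in the example 200 response. It assumes the endpoint accepts a JSON Schema-style object, as suggested by the definition echoed back under schema.definition in that response. The function name invoice_schema is hypothetical, and because the request field that carries the schema is not described in this section, the sketch stops at serialization.

```python
import json


def invoice_schema() -> dict:
    """Build a JSON Schema-style definition for the invoice fields in the example."""
    # Assumption: the endpoint accepts a JSON Schema object like the one echoed
    # back under schema.definition in the 200 response.
    line_item = {
        "type": "object",
        "properties": {
            "description": {"type": "string"},
            "quantity": {"type": "number"},
            "unit_price": {"type": "number"},
        },
    }
    return {
        "type": "object",
        "properties": {
            "vendor_name": {"type": "string"},
            "invoice_number": {"type": "string"},
            "total_amount": {"type": "number"},
            "due_date": {"type": "string", "format": "date"},
            "line_items": {"type": "array", "items": line_item},
        },
    }


# Serialize once; how the schema is attached to the request is not shown here.
schema_json = json.dumps(invoice_schema())
```

Nesting line_items as an array of objects mirrors the shape of the data map in the example response, so each returned line item can be read with the same keys.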

Response

Response fields (200, synchronous)

extraction_id (string): UUID of the created extraction record.
request_id (string): Unique request identifier for tracing and support.
status (string): Always "complete" for synchronous responses.
document (object): Source document summary: id, filename, pages, size_bytes, type_detected, language_detected.
data (object): Extracted field values as a key-value map matching your schema.
schema (object): The schema that was applied: source (one of provided, saved, auto_discovered), id, definition, and save_url, a link for persisting it as a saved schema.
confidence (object): Confidence scores: overall (0 to 1) and fields (a map of per-field scores).
processing (object): Processing metadata: duration_ms, pages_processed, region.
markdown (string | null): OCR-converted Markdown of the document. Only present when include_markdown=true.
links (object): Related resource URLs: self (this extraction), document, dashboard.
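As a minimal Python sketch, the fields above can be pulled into a small typed record for logging. The class ExtractionSummary and the choice of fields are hypothetical; the only behavior taken from this page is that status is "complete" on a 200 response and that markdown is absent unless include_markdown=true was set, which is why it is read with .get().

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ExtractionSummary:
    extraction_id: str
    request_id: str
    filename: str
    pages: int
    overall_confidence: float
    duration_ms: int
    markdown: Optional[str]  # None unless the request set include_markdown=true

    @classmethod
    def from_response(cls, body: dict) -> "ExtractionSummary":
        """Pick commonly logged fields out of a parsed 200 (synchronous) response body."""
        if body.get("status") != "complete":
            raise ValueError(f"expected a 'complete' response, got {body.get('status')!r}")
        return cls(
            extraction_id=body["extraction_id"],
            request_id=body["request_id"],
            filename=body["document"]["filename"],
            pages=body["document"]["pages"],
            overall_confidence=body["confidence"]["overall"],
            duration_ms=body["processing"]["duration_ms"],
            # markdown may be missing entirely, so avoid indexing it directly.
            markdown=body.get("markdown"),
        )
```

Keeping request_id on the record makes it easy to quote in support tickets, since the field list above describes it as the tracing identifier.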

Response (200 — Synchronous)

{
  "extraction_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
  "request_id": "req_b2c3d4e5f6a78901",
  "status": "complete",
  "document": {
    "id": "c3d4e5f6-a7b8-9012-cdef-123456789012",
    "filename": "invoice-0847.pdf",
    "pages": 2,
    "size_bytes": 184320,
    "type_detected": "invoice",
    "language_detected": "en"
  },
  "data": {
    "vendor_name": "Acme Corp",
    "invoice_number": "INV-2024-0847",
    "total_amount": 14250.00,
    "due_date": "2024-03-15",
    "line_items": [
      { "description": "Enterprise license (annual)", "quantity": 1, "unit_price": 12000.00 },
      { "description": "Implementation services", "quantity": 15, "unit_price": 150.00 }
    ]
  },
  "schema": {
    "source": "provided",
    "id": null,
    "definition": { "type": "object", "properties": { "vendor_name": { "type": "string" } } },
    "save_url": "https://app.talonic.com/schemas/save?from=a1b2c3d4-e5f6-7890-abcd-ef1234567890"
  },
  "confidence": {
    "overall": 0.94,
    "fields": {
      "vendor_name": 0.99,
      "invoice_number": 0.98,
      "total_amount": 0.96,
      "due_date": 0.91,
      "line_items": 0.87
    }
  },
  "processing": {
    "duration_ms": 3420,
    "pages_processed": 2,
    "region": "eu-west"
  },
  "links": {
    "self": "/v1/extractions/a1b2c3d4-e5f6-7890-abcd-ef1234567890",
    "document": "/v1/documents/c3d4e5f6-a7b8-9012-cdef-123456789012",
    "dashboard": "https://app.talonic.com/extractions/a1b2c3d4-e5f6-7890-abcd-ef1234567890"
  }
}
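The per-field scores under confidence.fields lend themselves to routing uncertain values to human review. A minimal Python sketch follows; the function name and the 0.9 default threshold are hypothetical choices for illustration, not values recommended by the API.

```python
def low_confidence_fields(body: dict, threshold: float = 0.9) -> list[tuple[str, float]]:
    """Return (field, score) pairs scoring below threshold, lowest score first."""
    scores = body.get("confidence", {}).get("fields", {})
    flagged = [(name, score) for name, score in scores.items() if score < threshold]
    return sorted(flagged, key=lambda pair: pair[1])


# Confidence block copied from the example response above.
example = {
    "confidence": {
        "overall": 0.94,
        "fields": {
            "vendor_name": 0.99,
            "invoice_number": 0.98,
            "total_amount": 0.96,
            "due_date": 0.91,
            "line_items": 0.87,
        },
    }
}
print(low_confidence_fields(example))  # [('line_items', 0.87)]
```

Sorting lowest first puts the least certain fields at the top of a review queue; raising the threshold to 0.95 would also flag due_date.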

Response fields (202, asynchronous)

request_id (string): Unique request identifier for tracing.
status (string): Always "processing" for asynchronous responses.
document (object): Source document summary: id, filename, pages.
poll_url (string): URL to poll for the document's processing status.
estimated_seconds (integer): Estimated processing time, in seconds.
links (object): Related resource URLs: document, extractions, dashboard.

Response (202 — Asynchronous)

{
  "request_id": "req_b2c3d4e5f6a78901",
  "status": "processing",
  "document": {
    "id": "c3d4e5f6-a7b8-9012-cdef-123456789012",
    "filename": "large-report.pdf",
    "pages": 42
  },
  "poll_url": "/v1/documents/c3d4e5f6-a7b8-9012-cdef-123456789012",
  "estimated_seconds": 63,
  "links": {
    "document": "/v1/documents/c3d4e5f6-a7b8-9012-cdef-123456789012",
    "extractions": "/v1/documents/c3d4e5f6-a7b8-9012-cdef-123456789012/extractions",
    "dashboard": "https://app.talonic.com/documents/c3d4e5f6-a7b8-9012-cdef-123456789012"
  }
}
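Because the same call can return either shape, a client typically branches on the status code. A minimal Python sketch of that branch follows; Completed, Pending, and handle_extract_response are hypothetical names. The API host is not stated in this section, so base_url is a caller-supplied parameter used to resolve the relative poll_url. The shape of the document status response is also not described here, so the polling loop itself is left out.

```python
from dataclasses import dataclass
from typing import Union
from urllib.parse import urljoin


@dataclass
class Completed:
    extraction_id: str
    data: dict


@dataclass
class Pending:
    document_id: str
    poll_url: str  # absolute URL, resolved against base_url
    wait_seconds: int


def handle_extract_response(status_code: int, body: dict, base_url: str) -> Union[Completed, Pending]:
    """Turn a parsed /v1/extract response into either a result or a polling plan."""
    if status_code == 200:
        return Completed(extraction_id=body["extraction_id"], data=body["data"])
    if status_code == 202:
        return Pending(
            document_id=body["document"]["id"],
            # poll_url is relative (e.g. /v1/documents/...), so join it to the host.
            poll_url=urljoin(base_url, body["poll_url"]),
            # Use the server's estimate as the first wait before polling.
            wait_seconds=body["estimated_seconds"],
        )
    raise ValueError(f"unexpected status code {status_code}")
```

Note that a 202 body carries no extraction_id; the extraction is reached through the document, which is why Pending keeps the document id.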

Errors

Error responses

400 missing_document: No document source provided. Supply one of: file, file_url, or document_id.
400 ambiguous_document: More than one document source provided. Supply only one of: file, file_url, or document_id.
400 unsupported_file_type: The uploaded file type is not supported. Accepted: PDF, PNG, JPG, TIFF, WEBP, DOCX, TXT, CSV.
400 invalid_options: The options field is not valid JSON.
401 unauthorized: Missing or invalid API key.
422 extraction_failed: Extraction completed but produced no usable output. Check the document quality or schema definition.
429 rate_limited: Too many requests. Check the X-RateLimit-Reset header for when the rate-limit window resets.
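The two document-source errors can be caught before a request is sent. A minimal Python sketch follows; check_document_source and should_retry are hypothetical helpers that mirror the rules in the table above rather than any client library. The sketch treats only 429 as retryable, because the 400, 401, and 422 errors each call for changing the request, the key, or the document.

```python
from typing import Any, Optional


def check_document_source(
    file: Any = None,
    file_url: Optional[str] = None,
    document_id: Optional[str] = None,
) -> str:
    """Return the name of the single source provided, mirroring the 400 checks above."""
    candidates = (("file", file), ("file_url", file_url), ("document_id", document_id))
    provided = [name for name, value in candidates if value is not None]
    if not provided:
        raise ValueError("missing_document: supply one of file, file_url, or document_id")
    if len(provided) > 1:
        raise ValueError(f"ambiguous_document: got {', '.join(provided)}")
    return provided[0]


def should_retry(status_code: int) -> bool:
    """Only rate limiting (429) is transient; other listed errors need a changed request."""
    return status_code == 429
```

When should_retry returns True, wait until the time given by X-RateLimit-Reset before sending again.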

Cost Headers

Synchronous 200 responses include cost transparency headers so you can track spend per call without a separate API round-trip:

Cost response headers

X-Talonic-Cost-Credits (integer): Credits consumed by this extraction.
X-Talonic-Cost-EUR (number): EUR cost of this extraction.
X-Talonic-Balance-Credits (integer): Remaining credit balance after this call.
X-Talonic-Cells-Resolved-Registry (integer): Fields resolved from the registry (no AI cost).
X-Talonic-Cells-Resolved-AI (integer): Fields resolved by the AI model.
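A minimal Python sketch for turning these headers into a typed record follows. ExtractionCost, parse_cost_headers, and the registry_share property are hypothetical, and the header values in the test are made up. The sketch uses Decimal for the EUR amount to avoid binary floating-point rounding in spend totals, lowercases header names because HTTP field names are case-insensitive, and returns None when the headers are absent, as they are on a 202 response.

```python
from collections.abc import Mapping
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional


@dataclass
class ExtractionCost:
    credits: int
    eur: Decimal
    balance_credits: int
    cells_registry: int
    cells_ai: int

    @property
    def registry_share(self) -> float:
        """Fraction of resolved fields that came from the registry (no AI cost)."""
        total = self.cells_registry + self.cells_ai
        return self.cells_registry / total if total else 0.0


def parse_cost_headers(headers: Mapping[str, str]) -> Optional[ExtractionCost]:
    """Parse the X-Talonic-* cost headers; return None if they are absent."""
    h = {name.lower(): value for name, value in headers.items()}
    if "x-talonic-cost-credits" not in h:
        return None  # e.g. a 202 asynchronous response
    return ExtractionCost(
        credits=int(h["x-talonic-cost-credits"]),
        eur=Decimal(h["x-talonic-cost-eur"]),
        balance_credits=int(h["x-talonic-balance-credits"]),
        cells_registry=int(h["x-talonic-cells-resolved-registry"]),
        cells_ai=int(h["x-talonic-cells-resolved-ai"]),
    )
```

Summing eur across calls gives a running spend figure without an extra request to a balance endpoint.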

Cost headers are present only on synchronous (200) responses. For asynchronous (202) extractions, check the credits balance endpoint or listen for the extraction.complete webhook, which includes cost data.