Get Document

Retrieve a single document by ID with full metadata, including file size, extraction count, and creation timestamp.

Retrieve full metadata for a single document, including processing status, detected type and language, triage information, and links to related resources. The response includes a dashboard link for viewing the document in the Talonic platform UI.

GET /v1/documents/:id

Response

Response fields

id (string): Document UUID.
filename (string): Original filename of the uploaded document.
pages (integer): Estimated page count.
size_bytes (integer): File size in bytes.
mime_type (string): MIME type of the original file.
type_detected (string | null): Document type inferred during processing (e.g. invoice, contract).
language_detected (string | null): ISO 639-1 language code detected during extraction.
status (string): Processing status, one of pending, processing, completed, error.
error (string | null): Error message when status is error; null or omitted otherwise.
source (object): Source information, with id (source connection ID or null) and type (e.g. manual, google_drive).
triage (object | null): Triage metadata, with sensitivity, department, jurisdiction, pii_detected, pii_categories, regulated_data, confidentiality_marking.
original_path (string | null): Original file path if ingested via a connector.
extraction_count (integer): Number of extractions performed on this document (0 or 1).
latest_extraction_id (string | null): ID of the most recent extraction, if any.
created_at (string): ISO 8601 creation timestamp.
links (object): Related resource URLs (self, extractions, dashboard).

Response

{
  "id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
  "filename": "invoice-0847.pdf",
  "pages": 2,
  "size_bytes": 184320,
  "mime_type": "application/pdf",
  "type_detected": "invoice",
  "language_detected": "en",
  "status": "completed",
  "source": {
    "id": null,
    "type": "manual"
  },
  "triage": {
    "sensitivity": "internal",
    "department": "finance",
    "jurisdiction": "EU",
    "pii_detected": false,
    "pii_categories": [],
    "regulated_data": false,
    "confidentiality_marking": null
  },
  "original_path": null,
  "extraction_count": 1,
  "latest_extraction_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
  "created_at": "2024-09-14T10:32:00.000Z",
  "links": {
    "self": "/v1/documents/a1b2c3d4-e5f6-7890-abcd-ef1234567890",
    "extractions": "/v1/documents/a1b2c3d4-e5f6-7890-abcd-ef1234567890/extractions",
    "dashboard": "https://app.talonic.com/documents/a1b2c3d4-e5f6-7890-abcd-ef1234567890"
  }
}
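
As a rough illustration of calling this endpoint, the sketch below builds the request and reads a few of the documented fields from a response body. The base URL and the Bearer authorization scheme are assumptions, since neither is stated on this page; the helper names build_get_document_request and summarize_document are hypothetical.

```python
import json
import urllib.request
from urllib.parse import quote

# ASSUMPTION: placeholder host, not taken from this page. Use your real API host.
BASE_URL = "https://api.example.com"


def build_get_document_request(document_id: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a GET /v1/documents/:id request."""
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/documents/{quote(document_id, safe='')}",
        headers={
            # ASSUMPTION: Bearer scheme; check how your API key must be sent.
            "Authorization": f"Bearer {api_key}",
            "Accept": "application/json",
        },
        method="GET",
    )


def summarize_document(body: str) -> dict:
    """Pick a few documented fields out of a JSON response body."""
    doc = json.loads(body)
    return {
        "id": doc["id"],
        "status": doc["status"],
        "pages": doc["pages"],
        "type_detected": doc.get("type_detected"),  # may be null
        "dashboard": doc["links"]["dashboard"],
    }
```

The request object can be sent with urllib.request.urlopen(request), and the decoded response body passed to summarize_document.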

Most integrations call this endpoint after receiving an extraction.complete webhook or after polling a document's status until it reaches completed. A typical workflow is to extract a document via POST /v1/extract, store the returned document.id, then fetch full metadata here when needed.
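
The polling variant of this workflow might look like the following minimal sketch. The fetch argument is a hypothetical callable that performs GET /v1/documents/:id and returns the parsed JSON; the interval and attempt limit are illustrative defaults, not values recommended by the API.

```python
import time
from typing import Callable

# Statuses after which the document will not change further.
TERMINAL_STATUSES = {"completed", "error"}


def wait_for_document(
    fetch: Callable[[], dict],
    interval_seconds: float = 2.0,
    max_attempts: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call fetch() until status is completed or error, then return the document."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        doc = fetch()
        if doc["status"] in TERMINAL_STATUSES:
            return doc
        # Wait between attempts, but not after the last one.
        if attempt < max_attempts - 1:
            sleep(interval_seconds)
    raise TimeoutError(
        f"document still {doc['status']!r} after {max_attempts} attempts"
    )
```

Passing sleep as a parameter keeps the loop testable without real delays. Where webhooks are available, they avoid the repeated requests that polling incurs.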

The status field is pending or processing while the document has not finished, completed when extraction has finished, and error if something went wrong. Use latest_extraction_id to fetch the extraction result directly via GET /v1/extractions/:id.
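
One way to follow latest_extraction_id safely is sketched below. The helper name extraction_path is hypothetical; it returns a path only when the document is completed and an extraction ID is present, since the field is nullable.

```python
from typing import Optional


def extraction_path(doc: dict) -> Optional[str]:
    """Return the GET /v1/extractions/:id path for a completed document, else None."""
    if doc.get("status") != "completed":
        return None
    extraction_id = doc.get("latest_extraction_id")
    if extraction_id is None:
        # Documented as string | null: no extraction exists yet.
        return None
    return f"/v1/extractions/{extraction_id}"
```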

Pair this with GET /v1/documents/:id/markdown to retrieve the raw OCR text, or with GET /v1/extractions/:id/data for just the structured field values. Note that the triage object is only populated after ingestion completes and may be null for documents still in processing.
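
Because triage can be null, client code should check for it before reading nested fields. The sketch below shows one possible pattern; the function name and the rule it applies (flagging documents with PII or regulated data) are illustrative assumptions, not behavior defined by the API.

```python
from typing import Optional


def needs_restricted_handling(doc: dict) -> Optional[bool]:
    """Return True/False once triage is available, or None if it is not yet populated."""
    triage = doc.get("triage")
    if triage is None:
        # Triage has not been populated, e.g. the document is still processing.
        return None
    # Hypothetical policy: restrict anything with PII or regulated data.
    return bool(triage.get("pii_detected")) or bool(triage.get("regulated_data"))
```

Returning None instead of False keeps "not triaged yet" distinct from "triaged and unrestricted".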

The links.dashboard URL opens the document directly in the Talonic platform UI, which is useful for sharing with team members who need to review or correct extractions.

Errors

Error responses

401 unauthorized: Missing or invalid API key.
404 document_not_found: No document with this ID exists for your organization.
429 rate_limited: Too many requests. Retry after the period indicated in the Retry-After header.
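
A client might turn a 429 response into a wait time along these lines. This is a sketch that assumes Retry-After carries a number of seconds; HTTP also permits a date in that header, which falls through to the default here. The function name and the default value are hypothetical.

```python
from typing import Mapping, Optional


def retry_delay_seconds(
    status_code: int,
    headers: Mapping[str, str],
    default: float = 1.0,
) -> Optional[float]:
    """Return seconds to wait before retrying, or None if the status is not retryable."""
    if status_code != 429:
        # 401 and 404 will not succeed on retry without changing the request.
        return None
    # Header names are case-insensitive, so normalize before the lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    raw = lowered.get("retry-after", "")
    try:
        return max(0.0, float(raw))
    except (TypeError, ValueError):
        # Missing or non-numeric value (e.g. an HTTP date): use the fallback.
        return default
```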