Source Documents

Ingest documents into a source for processing, or list the documents that belong to a source. The ingestion endpoint accepts a file upload or a URL, runs the document through the extraction pipeline, and returns a document ID for status tracking.

Documents can be processed in realtime (default, results in seconds) or batch mode (50% cost discount, results within 48 hours). Duplicate files are detected via SHA-256 hash and rejected with a duplicate status.
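
Because duplicates are detected by SHA-256 hash, a client can avoid re-sending files it has already uploaded by hashing them locally first. This is a minimal sketch that assumes the service hashes the raw file bytes, which this page does not confirm; `sha256_hex`, `should_upload`, and the `seen_hashes` set are hypothetical names introduced for illustration.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of raw bytes as a lowercase hex string."""
    return hashlib.sha256(data).hexdigest()


def should_upload(data: bytes, seen_hashes: set) -> bool:
    """Return True (and record the hash) the first time this content is seen.

    Assumption: the service hashes the raw file bytes, so identical bytes
    would come back with a duplicate status if uploaded again.
    """
    digest = sha256_hex(data)
    if digest in seen_hashes:
        return False
    seen_hashes.add(digest)
    return True
```

A local check like this only saves a round trip; the server-side `duplicate` status remains the authoritative answer.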

Use `processing_mode=batch` for large ingestion jobs where latency is not critical: it halves the cost, but results can take up to 48 hours.

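
To make the batch trade-off concrete, the arithmetic can be sketched as follows. Only the 50% discount comes from this page; the per-document `unit_price` is a hypothetical input, and whether billing is actually per document is an assumption.

```python
BATCH_DISCOUNT = 0.5  # batch mode is documented as a 50% cost discount


def estimate_cost(num_documents: int, unit_price: float,
                  processing_mode: str = "realtime") -> float:
    """Estimate the cost of an ingestion job.

    unit_price is a hypothetical realtime price per document; it is not a
    published rate.
    """
    if processing_mode not in ("realtime", "batch"):
        raise ValueError(f"unknown processing_mode: {processing_mode!r}")
    multiplier = 1.0 - BATCH_DISCOUNT if processing_mode == "batch" else 1.0
    return num_documents * unit_price * multiplier
```

Under these assumptions, a job that costs 500 units in realtime mode would cost 250 units in batch mode, in exchange for waiting up to 48 hours for results.
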
POST /v1/sources/:id/documents

Response

Response fields

| Field | Type | Description |
| --- | --- | --- |
| `document_id` | string or null | UUID of the newly created document. Null if the file was a duplicate. |
| `filename` | string | Stored filename of the document. |
| `status` | string | `queued` when accepted for processing, or `duplicate` if the file already exists. |
| `processing_mode` | string | `realtime` or `batch`; reflects the mode the document was queued under. |
| `source_id` | string | ID of the source the document was ingested into. |
| `existing_document_id` | string or null | Set to the existing document ID when `status` is `duplicate`. |
| `links` | object | Related resource URLs (`document`, `source`). |

Example response

{
  "document_id": "d4e5f6a7-b8c9-0123-defa-234567890123",
  "filename": "invoice-2024-09.pdf",
  "status": "queued",
  "processing_mode": "realtime",
  "source_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
  "links": {
    "document": "/v1/documents/d4e5f6a7-b8c9-0123-defa-234567890123",
    "source": "/v1/sources/a1b2c3d4-e5f6-7890-abcd-ef1234567890"
  }
}
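
A client usually wants a single document ID to track, whether the upload was accepted or rejected as a duplicate. The sketch below picks that ID from the documented response fields; `trackable_document_id` is a hypothetical helper name, and it treats any status other than `queued` or `duplicate` as unexpected, which is an assumption about this endpoint.

```python
def trackable_document_id(response: dict) -> str:
    """Return the ID of the document to follow up on after ingestion.

    For `queued` responses this is `document_id`; for `duplicate` responses
    it is `existing_document_id`, since `document_id` is null in that case.
    """
    status = response.get("status")
    if status == "queued":
        doc_id = response.get("document_id")
    elif status == "duplicate":
        doc_id = response.get("existing_document_id")
    else:
        # Assumption: the ingestion endpoint only returns these two statuses.
        raise ValueError(f"unexpected status: {status!r}")
    if not doc_id:
        raise ValueError(f"response with status {status!r} has no document ID")
    return doc_id
```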

Errors

Error responses

| Status | Code | Description |
| --- | --- | --- |
| 400 | `missing_document` | No file was provided in the request. |
| 401 | `unauthorized` | Missing or invalid API key. |
| 404 | `not_found` | No source with this ID exists for your organization. |
| 429 | `rate_limited` | Too many requests. Retry after the period indicated in the `Retry-After` header. |

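
When a request is rejected with 429, the wait time has to be derived from the `Retry-After` header. HTTP allows that header to carry either a number of seconds or an HTTP-date, and this page does not say which form the API sends, so the sketch below handles both. `retry_delay_seconds` is a hypothetical helper name.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def retry_delay_seconds(retry_after: str, now: Optional[datetime] = None) -> float:
    """Convert a Retry-After header value into a number of seconds to wait."""
    value = retry_after.strip()
    if value.isdigit():
        # Delay-seconds form, e.g. "30".
        return float(value)
    # HTTP-date form, e.g. "Wed, 21 Oct 2015 07:28:00 GMT".
    target = parsedate_to_datetime(value)
    if target.tzinfo is None:
        # Treat a date without an explicit zone as UTC.
        target = target.replace(tzinfo=timezone.utc)
    current = now or datetime.now(timezone.utc)
    # A date in the past means the request can be retried immediately.
    return max(0.0, (target - current).total_seconds())
```

A caller would sleep for the returned number of seconds before retrying the same request.
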
GET /v1/sources/:id/documents

Response

Response fields

| Field | Type | Description |
| --- | --- | --- |
| `data` | array | Array of document objects. |
| `data[].id` | string | Document UUID. |
| `data[].filename` | string | Stored filename. |
| `data[].status` | string | Processing status (e.g. `processing`, `complete`, `failed`). |
| `data[].size_bytes` | integer or null | File size in bytes. |
| `data[].type_detected` | string or null | Inferred document type. |
| `data[].created_at` | string | ISO 8601 ingestion timestamp. |
| `data[].links` | object | Related resource URLs (`self`). |
| `pagination.total` | integer | Total number of documents matching the query. |
| `pagination.limit` | integer | Maximum results per page. |
| `pagination.has_more` | boolean | Whether more results exist beyond this page. |
| `pagination.next_cursor` | string or null | Cursor to fetch the next page. Null if no more results. |

Example response

{
  "data": [
    {
      "id": "d4e5f6a7-b8c9-0123-defa-234567890123",
      "filename": "invoice-2024-09.pdf",
      "status": "complete",
      "size_bytes": 204800,
      "type_detected": "Invoice",
      "created_at": "2024-09-14T12:30:00.000Z",
      "links": {
        "self": "/v1/documents/d4e5f6a7-b8c9-0123-defa-234567890123"
      }
    }
  ],
  "pagination": {
    "total": 234,
    "limit": 50,
    "has_more": true,
    "next_cursor": "eyJjcmVhdGVkQXQiOiIyMDI0LTA5LTE0VDEyOjMwOjAwLjAwMFoiLCJpZCI6ImQ0ZTVmNmE3In0="
  }
}
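
Listing every document in a source means following `pagination.next_cursor` until `has_more` is false. The sketch below keeps the HTTP call out of the loop: `fetch_page` is a hypothetical caller-supplied function that requests one page for a given cursor (None for the first page) and returns the decoded JSON body. How the cursor is passed to the endpoint is not documented on this page, so that detail is left to `fetch_page`.

```python
from typing import Callable, Iterator, Optional


def iter_documents(fetch_page: Callable[[Optional[str]], dict]) -> Iterator[dict]:
    """Yield every document object across pages by following next_cursor."""
    cursor = None
    while True:
        page = fetch_page(cursor)
        yield from page["data"]
        pagination = page["pagination"]
        cursor = pagination.get("next_cursor")
        # Stop when the API reports no further pages or gives no cursor.
        if not pagination.get("has_more") or cursor is None:
            break
```

The cursor is treated as an opaque string and passed back unchanged.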

Errors

Error responses

| Status | Code | Description |
| --- | --- | --- |
| 401 | `unauthorized` | Missing or invalid API key. |
| 404 | `not_found` | No source with this ID exists for your organization. |
| 429 | `rate_limited` | Too many requests. Retry after the period indicated in the `Retry-After` header. |