Source Documents
Ingest documents into a source for processing, or list all documents that belong to a source. The ingestion endpoint accepts a file upload or a URL, processes the document through the extraction pipeline, and returns the document ID for status tracking.
Documents can be processed in realtime (default, results in seconds) or batch mode (50% cost discount, results within 48 hours). Duplicate files are detected via SHA-256 hash and rejected with a duplicate status.
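Because duplicates are detected by SHA-256 hash, a client can hash files locally and avoid uploading the same content twice. The sketch below assumes the service hashes the raw file bytes, which this page does not confirm. The names `sha256_hex` and `group_by_hash` are illustrative and not part of the API.

```python
import hashlib
import io
from typing import BinaryIO, Dict, List


def sha256_hex(stream: BinaryIO, chunk_size: int = 65536) -> str:
    """Return the hex SHA-256 digest of a binary stream, read in chunks."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def group_by_hash(files: Dict[str, bytes]) -> Dict[str, List[str]]:
    """Map each digest to the names of the local files that share it."""
    groups: Dict[str, List[str]] = {}
    for name, content in files.items():
        groups.setdefault(sha256_hex(io.BytesIO(content)), []).append(name)
    return groups
```

Any group with more than one filename holds byte-identical files, so only one of them needs to be uploaded.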
Use processing_mode=batch for large ingestion jobs where latency is not critical.
POST /v1/sources/:id/documents
Response fields
document_id (string | null): UUID of the newly created document. Null if the file was a duplicate.
filename (string): Stored filename of the document.
status (string): `queued` when accepted for processing, or `duplicate` if the file already exists.
processing_mode (string): `realtime` or `batch`; reflects the mode the document was queued under.
source_id (string): ID of the source the document was ingested into.
existing_document_id (string | null): Set to the existing document ID when `status` is `duplicate`.
links (object): Related resource URLs (document, source).
Response
{
  "document_id": "d4e5f6a7-b8c9-0123-defa-234567890123",
  "filename": "invoice-2024-09.pdf",
  "status": "queued",
  "processing_mode": "realtime",
  "source_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
  "links": {
    "document": "/v1/documents/d4e5f6a7-b8c9-0123-defa-234567890123",
    "source": "/v1/sources/a1b2c3d4-e5f6-7890-abcd-ef1234567890"
  }
}
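A client usually wants a single document ID to track, whether the upload was accepted or recognized as a duplicate. The following is a minimal sketch that relies only on the response fields described above. The helper name `document_to_track` is hypothetical, and treating any other status as an error is an assumption.

```python
from typing import Any, Dict, Tuple


def document_to_track(response: Dict[str, Any]) -> Tuple[str, bool]:
    """Return (document_id, is_new) for a parsed ingestion response body."""
    status = response.get("status")
    if status == "queued":
        return response["document_id"], True
    if status == "duplicate":
        # For duplicates, document_id is null and existing_document_id is set.
        return response["existing_document_id"], False
    # Assumption: only the two documented statuses are expected here.
    raise ValueError(f"unexpected ingestion status: {status!r}")
```

The boolean lets the caller decide whether to poll for processing or reuse results that already exist for the earlier upload.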
Error responses
400 missing_document: No file was provided in the request.
401 unauthorized: Missing or invalid API key.
404 not_found: No source with this ID exists for your organization.
429 rate_limited: Too many requests. Retry after the period indicated in the Retry-After header.
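To honor a 429 response, a client needs to turn the Retry-After header into a wait time. The HTTP specification allows that header to carry either a number of seconds or an HTTP date; this page does not say which form the API sends, so the sketch below accepts both. The function name and the one-second fallback are assumptions.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def retry_delay_seconds(
    retry_after: Optional[str],
    now: Optional[datetime] = None,
    default: float = 1.0,
) -> float:
    """Convert a Retry-After header value into a non-negative delay in seconds."""
    if retry_after is None:
        return default
    value = retry_after.strip()
    if value.isdigit():
        # Delta-seconds form, e.g. "120".
        return float(value)
    try:
        # HTTP-date form, e.g. "Wed, 21 Oct 2015 07:28:00 GMT".
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return default
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    current = now or datetime.now(timezone.utc)
    return max(0.0, (when - current).total_seconds())
```

A caller would sleep for the returned number of seconds before repeating the request, ideally with a cap on the total number of attempts.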
GET /v1/sources/:id/documents
Response fields
data (array): Array of document objects.
data[].id (string): Document UUID.
data[].filename (string): Stored filename.
data[].status (string): Processing status (e.g. `processing`, `complete`, `failed`).
data[].size_bytes (integer | null): File size in bytes.
data[].type_detected (string | null): Inferred document type.
data[].created_at (string): ISO 8601 ingestion timestamp.
data[].links (object): Related resource URLs (self).
pagination.total (integer): Total number of documents matching the query.
pagination.limit (integer): Maximum results per page.
pagination.has_more (boolean): Whether more results exist beyond this page.
pagination.next_cursor (string | null): Cursor to fetch the next page. Null if no more results.
Response
{
  "data": [
    {
      "id": "d4e5f6a7-b8c9-0123-defa-234567890123",
      "filename": "invoice-2024-09.pdf",
      "status": "complete",
      "size_bytes": 204800,
      "type_detected": "Invoice",
      "created_at": "2024-09-14T12:30:00.000Z",
      "links": {
        "self": "/v1/documents/d4e5f6a7-b8c9-0123-defa-234567890123"
      }
    }
  ],
  "pagination": {
    "total": 234,
    "limit": 50,
    "has_more": true,
    "next_cursor": "eyJjcmVhdGVkQXQiOiIyMDI0LTA5LTE0VDEyOjMwOjAwLjAwMFoiLCJpZCI6ImQ0ZTVmNmE3In0="
  }
}
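Listing every document in a source means following `pagination.next_cursor` until `has_more` is false. The sketch below shows that loop only. This page does not document how the cursor is passed back to the endpoint, so the request itself is left to a caller-supplied `fetch_page` function, which is hypothetical: it takes the current cursor (or None for the first page) and returns the parsed response body.

```python
from typing import Any, Callable, Dict, Iterator, Optional

Page = Dict[str, Any]


def iter_documents(
    fetch_page: Callable[[Optional[str]], Page],
) -> Iterator[Dict[str, Any]]:
    """Yield every document object across all pages of a listing."""
    cursor: Optional[str] = None
    while True:
        page = fetch_page(cursor)
        yield from page["data"]
        pagination = page["pagination"]
        cursor = pagination.get("next_cursor")
        # Stop when the API reports no further pages or gives no cursor.
        if not pagination.get("has_more") or cursor is None:
            break
```

Keeping the HTTP call outside the generator also makes the loop easy to exercise with canned pages.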
Error responses
401 unauthorized: Missing or invalid API key.
404 not_found: No source with this ID exists for your organization.
429 rate_limited: Too many requests. Retry after the period indicated in the Retry-After header.