Overview

Documents can be processed in batch mode at half the real-time extraction cost, with a 48-hour delivery window. Toggle batch mode on the upload screen, or set processing_mode=batch when uploading into a source via the API (POST /v1/sources/{id}/documents). Batch processing is ideal for large backlog ingestion where real-time results are not required. The cost reduction comes from the provider's native batch API, which schedules processing during off-peak capacity. Extraction quality is unchanged because the same Claude model and prompts are used as in real-time mode.

Batch mode cuts extraction cost in half. Stage 1 (OCR + classify) still runs immediately — only Stage 2 (Claude extraction) is deferred.

Under the hood, batch inference uses the provider's native batch API (Anthropic Message Batches or AWS Bedrock invocation jobs). Documents accumulate in a queue and are submitted together, allowing the provider to schedule processing during off-peak capacity. Off-peak scheduling is what makes the cost reduction possible; extraction quality is unaffected because the model and prompts do not change.

Batch mode is best suited for backlog ingestion, periodic bulk uploads, and any scenario where results are not needed in real time. Most teams use batch mode for overnight processing of large document volumes and reserve real-time processing for time-sensitive documents that need immediate attention.

When batch results arrive, they pass through the same post-processing steps as real-time extractions — including markdown pre-processing, field parsing, quality metrics, and extraction metadata computation. The only difference is that LLM-based quality passes (field estimation, verification, cross-reference enrichment) are skipped in batch mode to preserve the cost savings.

50% cost reduction on all Claude extraction calls in Stage 2.
48-hour delivery window — most batches complete well within this timeframe.
No quality difference — the same extraction model and prompts are used as in real-time mode.
Immediate visibility — documents appear in your library right after Stage 1 (OCR + classification).
Automatic result application — when the batch completes, results are applied and documents transition to their final status.

Upload documents in batch mode via API

# Upload into a source with processing_mode=batch:
curl -X POST https://api.talonic.com/v1/sources/a1b2c3d4-e5f6-7890-abcd-ef1234567890/documents \
  -H "Authorization: Bearer $TALONIC_API_KEY" \
  -F "file=@invoice-042.pdf" \
  -F "processing_mode=batch"

# Response:
# {
#   "document_id": "f0e1d2c3-b4a5-9687-8765-432109876543",
#   "filename": "invoice-042.pdf",
#   "status": "queued",
#   "processing_mode": "batch",
#   "source_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
#   "links": {
#     "document": "/v1/documents/f0e1d2c3-b4a5-9687-8765-432109876543",
#     "source": "/v1/sources/a1b2c3d4-e5f6-7890-abcd-ef1234567890"
#   }
# }

# Stage 1 (OCR + classify) runs immediately; the document then shows
# the batch_queued status in your library until batch results arrive.
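
For a backlog, the single-file upload above can be wrapped in a loop. The following is a minimal sketch rather than an official client: the function name upload_batch_dir, the DRY_RUN switch, and the assumption that the backlog is a flat directory of PDF files are all illustrative. Only the endpoint, the file and processing_mode form fields, and the TALONIC_API_KEY variable come from the example above.

```shell
#!/usr/bin/env bash
# Sketch: upload every PDF in a directory to one source in batch mode.
# upload_batch_dir and DRY_RUN are hypothetical helpers, not part of the API.

API_BASE="https://api.talonic.com/v1"

upload_batch_dir() {
  local source_id="$1" dir="$2" file count=0
  for file in "$dir"/*.pdf; do
    [ -e "$file" ] || continue   # the glob matched nothing
    if [ "${DRY_RUN:-0}" = "1" ]; then
      # Dry run: list what would be uploaded without calling the API.
      echo "would upload: $file"
    else
      # Same request as the single-file example, one call per file.
      curl -sS --fail -X POST "$API_BASE/sources/$source_id/documents" \
        -H "Authorization: Bearer $TALONIC_API_KEY" \
        -F "file=@$file" \
        -F "processing_mode=batch" > /dev/null || return 1
    fi
    count=$((count + 1))
  done
  echo "total: $count"
}
```

Running it once with DRY_RUN=1 shows which files would be sent before any request is made, for example: DRY_RUN=1 upload_batch_dir a1b2c3d4-e5f6-7890-abcd-ef1234567890 ./backlog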

Check batch status

curl -s https://api.talonic.com/v1/batches \
  -H "Authorization: Bearer $TALONIC_API_KEY"

# Response (paginated):
# {
#   "data": [
#     {
#       "id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
#       "status": "submitted",
#       "provider": "anthropic",
#       "item_count": 150,
#       "succeeded_count": 0,
#       "errored_count": 0,
#       "expired_count": 0,
#       "submitted_at": "2026-04-22T10:15:00Z",
#       "created_at": "2026-04-22T09:58:00Z"
#     }
#   ]
# }
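
The counters in each batch entry can be combined into a rough progress figure. This is a sketch under one assumption that the response itself does not confirm: every item eventually lands in exactly one of succeeded_count, errored_count, or expired_count, so a batch is finished once those three add up to item_count. The helper names batch_remaining and batch_summary are hypothetical.

```shell
# Sketch: derive progress from the counters of one /v1/batches entry.
# Assumption: each item ends up in exactly one of the three counters.
batch_remaining() {
  local item_count="$1" succeeded="$2" errored="$3" expired="$4"
  echo $(( item_count - succeeded - errored - expired ))
}

batch_summary() {
  local remaining
  remaining="$(batch_remaining "$@")"
  if [ "$remaining" -le 0 ]; then
    echo "done ($2 succeeded, $3 errored, $4 expired)"
  else
    echo "pending ($remaining of $1 items outstanding)"
  fi
}

# Counters from the sample response above: nothing has finished yet.
batch_summary 150 0 0 0
```

The arguments are passed in the order item_count, succeeded_count, errored_count, expired_count. The status field remains the authoritative signal; the arithmetic is only a convenience for dashboards or logs.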

Batch inference is particularly cost-effective for periodic bulk operations. A common workflow is to configure a source connection (such as Google Drive or S3) with the batch processing toggle enabled, so all documents ingested through that source are automatically routed to the batch queue. This suits overnight processing of large document volumes: documents arrive throughout the day, accumulate in the batch queue, and are submitted together to the provider for off-peak processing. Because most batches complete well within the 48-hour window, results have typically been applied by morning and documents are ready for review at half the extraction cost.

Frequently asked questions

What is batch inference?

Batch inference processes documents at 50% cost with a 48-hour delivery window. Stage 1 (OCR + classification) runs immediately; Stage 2 (Claude extraction) is deferred to the batch API.

When should I use batch mode?

Batch mode is ideal for large backlog ingestion where real-time results are not required. It cuts extraction cost in half compared to real-time processing.

Is there a minimum number of documents for batch processing?

The batch system requires a minimum of 100 items per batch (a Bedrock requirement). If fewer documents are uploaded in batch mode, the system falls back to real-time processing with a warning.
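
A client-side check can flag this fallback before uploading. The sketch below only mirrors the 100-item rule stated above. The function name expected_mode and the label "realtime" are hypothetical, and it assumes the caller knows how many documents will end up in the same batch, which this page does not spell out.

```shell
# Sketch: client-side check against the documented 100-item minimum.
# "realtime" is a hypothetical label for the fallback, not an API value.
BATCH_MIN_ITEMS=100

expected_mode() {
  local doc_count="$1"
  if [ "$doc_count" -ge "$BATCH_MIN_ITEMS" ]; then
    echo "batch"
  else
    echo "realtime"
  fi
}

expected_mode 42    # prints "realtime"
expected_mode 150   # prints "batch"
```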

Does batch mode affect extraction quality?

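No. Batch mode uses the same Claude extraction model and prompts as real-time processing; extraction is simply deferred to take advantage of provider off-peak pricing. Note that the LLM-based quality passes (field estimation, verification, cross-reference enrichment) are skipped in batch mode to preserve the cost savings.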

Can I mix batch and real-time documents in the same workspace?

Yes. Batch and real-time documents coexist in the same workspace and library. Each document tracks its processing_mode independently. You can toggle batch mode per upload or per source connection. Time-sensitive documents use real-time processing while bulk backlogs use batch mode — both produce identical extraction output.
