Overview
Documents can be processed in batch mode at 50% cost with a 48-hour delivery window. Toggle batch mode on the upload screen or set processing_mode=batch via the API. Batch processing is ideal for large backlog ingestion where real-time results are not required. The cost reduction comes from the provider's native batch API, which schedules processing during off-peak capacity — there is no loss in extraction quality because the same Claude model and prompts are used as in real-time mode.
Under the hood, batch inference leverages the provider's native batch API (Anthropic Message Batches or AWS Bedrock invocation jobs). Documents accumulate in a queue and are submitted together, allowing the provider to schedule processing during off-peak capacity. This is why the cost reduction is possible without any loss in extraction quality.
Batch mode is best suited for backlog ingestion, periodic bulk uploads, and any scenario where results are not needed in real time. Most teams use batch mode for overnight processing of large document volumes and reserve real-time processing for time-sensitive documents that need immediate attention.
When batch results arrive, they pass through the same post-processing pipeline as real-time extractions — including markdown pre-processing, field parsing, quality metrics, and extraction metadata computation. The only difference is that LLM-based quality passes (field estimation, verification, cross-reference enrichment) are skipped in batch mode to preserve the cost savings.
- 50% cost reduction on all Claude extraction calls in Stage 2.
- 48-hour delivery window — most batches complete well within this timeframe.
- No quality difference — the same extraction model and prompts are used as in real-time mode.
- Immediate visibility — documents appear in your library right after Stage 1 (OCR + classification).
- Automatic result application — when the batch completes, results are applied and documents transition to their final status.
curl -X POST https://api.talonic.com/v1/extract \
-H "Authorization: Bearer $TALONIC_API_KEY" \
-F "file=@invoices_batch.pdf" \
-F "processing_mode=batch"
# Response:
# {
# "document_id": "doc_batch_001",
# "status": "batch_queued",
# "processing_mode": "batch",
# "stage_1_completed": true,
# "document_type": "Invoice",
# "message": "Stage 1 complete. Stage 2 deferred to batch API."
# }curl -s https://api.talonic.com/v1/batches \
-H "Authorization: Bearer $TALONIC_API_KEY"
# Response:
# {
# "batches": [
# {
# "id": "batch_abc",
# "status": "submitted",
# "item_count": 150,
# "submitted_at": "2025-04-22T10:15:00Z",
# "provider": "anthropic"
# }
# ]
# }Batch inference is particularly cost-effective for periodic bulk operations. A common workflow is to configure a source connection (such as Google Drive or S3) with the batch processing toggle enabled, so all documents ingested through that source are automatically routed to the batch queue. This is ideal for overnight processing of large document volumes — documents arrive throughout the day, accumulate in the batch queue, and are submitted together to the provider for off-peak processing. By morning, all results have been applied and documents are ready for review at half the extraction cost.