Jobs

The jobs resource handles asynchronous batch extraction across multiple documents. Create a job with a schema_id and an array of document_ids, then poll for completion or fetch results when the job finishes.

Create and monitor a batch job
// Create a batch extraction job
const job = await talonic.jobs.create({
  schema_id: 'sch_abc123',
  document_ids: ['doc_001', 'doc_002', 'doc_003', 'doc_004', 'doc_005'],
  name: 'Q4 Invoice Batch',
})

console.log(job.id)     // 'job_xyz789'
console.log(job.status) // 'queued'

// Poll for completion
let current = await talonic.jobs.get(job.id)
while (current.status === 'queued' || current.status === 'processing') {
  console.log(`Progress: ${current.completed_documents}/${current.total_documents} (${current.current_phase})`)
  await new Promise(r => setTimeout(r, 5000))
  current = await talonic.jobs.get(job.id)
}

console.log(`Job ${current.status}: ${current.completed_documents} completed, ${current.failed_documents} failed`)

The create() method accepts CreateJobParams with schema_id (required), optional document_ids (array of document UUIDs to process), and optional name (human-readable label). When document_ids is omitted, the job processes all unprocessed documents in the workspace. The returned Job object includes id, status, progress, total_documents, completed_documents, failed_documents, current_phase, estimated_completion, and links with URLs for the job and its results.
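As a rough illustration of the parameters described above, the sketch below models CreateJobParams as a TypeScript interface and adds a small builder that validates input before the call. The interface is inferred from the prose, not copied from the SDK, and buildCreateParams is a hypothetical helper. It rejects an empty document_ids array, since this page does not say whether the API would treat that as "no documents" or "all unprocessed documents".

```typescript
// Hypothetical shape inferred from the description above; the SDK's own
// CreateJobParams type may differ.
interface CreateJobParams {
  schema_id: string        // required: ID of a saved schema
  document_ids?: string[]  // optional: omit to process all unprocessed documents
  name?: string            // optional: human-readable label
}

// Hypothetical helper (not part of the SDK): validates input and only sets
// optional fields that were actually provided.
function buildCreateParams(
  schemaId: string,
  documentIds?: string[],
  name?: string,
): CreateJobParams {
  if (schemaId.trim() === '') {
    throw new Error('schema_id is required')
  }
  if (documentIds !== undefined && documentIds.length === 0) {
    // Ambiguous case: refuse rather than guess what the API would do.
    throw new Error('document_ids is empty; omit it to process all unprocessed documents')
  }
  const params: CreateJobParams = { schema_id: schemaId }
  if (documentIds !== undefined) params.document_ids = documentIds.slice()
  if (name !== undefined && name !== '') params.name = name
  return params
}
```

The result can then be passed straight to the client, for example talonic.jobs.create(buildCreateParams('sch_abc123', ids, 'Q4 Invoice Batch')).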

Jobs run server-side and process documents in parallel. Use get() to poll the job status ('queued', 'processing', 'completed', 'failed', 'cancelled') and getResults() to retrieve the extraction output once complete. For long-running batches, poll on a reasonable interval such as every 5 seconds. The grid_stats block on the job object provides total_cells, filled, empty, and fill_rate for monitoring extraction quality across the batch.
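The polling pattern described above can be wrapped in a reusable helper that stops on any terminal status and gives up after a timeout. This is a minimal sketch, not an SDK feature: waitForJob, isTerminal, JobLike, and WaitOptions are hypothetical names, and the 10-minute default timeout is an arbitrary choice. The job fetcher is passed in as a function so the helper does not depend on a particular client.

```typescript
type JobStatus = 'queued' | 'processing' | 'completed' | 'failed' | 'cancelled'

// Minimal structural view of a job: only the fields this helper reads.
interface JobLike {
  id: string
  status: JobStatus
}

// A job is finished once it is no longer 'queued' or 'processing'.
function isTerminal(status: JobStatus): boolean {
  return status === 'completed' || status === 'failed' || status === 'cancelled'
}

interface WaitOptions {
  intervalMs?: number  // delay between polls (default 5000, as suggested above)
  timeoutMs?: number   // give up after this long (default 10 minutes, arbitrary)
}

// Poll `getJob` until the job reaches a terminal status, or throw on timeout.
async function waitForJob<T extends JobLike>(
  jobId: string,
  getJob: (id: string) => Promise<T>,
  options: WaitOptions = {},
): Promise<T> {
  const intervalMs = options.intervalMs ?? 5000
  const timeoutMs = options.timeoutMs ?? 10 * 60 * 1000
  const deadline = Date.now() + timeoutMs

  while (true) {
    const job = await getJob(jobId)
    if (isTerminal(job.status)) return job
    // Stop if the next poll would land after the deadline.
    if (Date.now() + intervalMs > deadline) {
      throw new Error(`Timed out waiting for job ${jobId} (last status: ${job.status})`)
    }
    await new Promise<void>(resolve => setTimeout(resolve, intervalMs))
  }
}
```

With the client from the examples on this page, a call might look like waitForJob(job.id, id => talonic.jobs.get(id)). Note that the helper returns for 'failed' and 'cancelled' as well as 'completed', so callers should still check the final status.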

Retrieve batch results
// Get structured results from a completed job
const results = await talonic.jobs.getResults('job_xyz789')

for (const row of results.data) {
  console.log(`${row.document_filename}: ${JSON.stringify(row.values)}`)
}
// invoice_001.pdf: { vendor_name: 'Acme', total: 1500, ... }
// invoice_002.pdf: { vendor_name: 'Globex', total: 3200, ... }

// Flatten each row into a plain object, ready for CSV export or a database insert
const rows = results.data.map(r => ({
  filename: r.document_filename,
  document_id: r.document_id,
  ...r.values,
}))
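To take the export step further, the sketch below serialises result rows to a CSV string. It assumes each row has document_id, document_filename, and values as shown in the example above; ResultRow, csvCell, and resultsToCsv are hypothetical names, not SDK functions. Columns are the union of all extracted field names, so a document with a missing field yields an empty cell instead of shifted columns.

```typescript
// Assumed row shape, based on the fields used in the example above.
interface ResultRow {
  document_id: string
  document_filename: string
  values: Record<string, unknown>
}

// Quote a cell when it contains a comma, double quote, or line break
// (RFC 4180 style). Nested objects and arrays are written as JSON text.
function csvCell(value: unknown): string {
  if (value === null || value === undefined) return ''
  const text = typeof value === 'object' ? JSON.stringify(value) : String(value)
  return /[",\r\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text
}

// Build a CSV string: one header line, then one line per document.
function resultsToCsv(rows: ResultRow[]): string {
  // Collect field names in first-seen order across all rows.
  const fields: string[] = []
  for (const row of rows) {
    for (const key of Object.keys(row.values)) {
      if (fields.indexOf(key) === -1) fields.push(key)
    }
  }
  const header: unknown[] = ['filename', 'document_id', ...fields]
  const lines = rows.map(row => {
    const cells: unknown[] = [
      row.document_filename,
      row.document_id,
      ...fields.map(field => row.values[field]),
    ]
    return cells.map(cell => csvCell(cell)).join(',')
  })
  return [header.map(cell => csvCell(cell)).join(','), ...lines].join('\n')
}
```

Given the output of getResults(), resultsToCsv(results.data) would produce the full file contents, provided the real rows match the assumed shape.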

Use cancel() to abort a job that is still in progress. A cancelled job stops processing its remaining documents, but any extractions already completed are preserved and remain accessible via getResults(). The cancelled_at timestamp is set on the job object when cancellation takes effect.

List and cancel jobs
// List jobs filtered by status
const runningJobs = await talonic.jobs.list({ status: 'processing', limit: 10 })
for (const j of runningJobs.data) {
  console.log(`${j.id}: ${j.completed_documents}/${j.total_documents} (est. ${j.estimated_completion})`)
}

// Cancel a job — partial results are retained
const cancelled = await talonic.jobs.cancel('job_xyz789')
console.log(cancelled.status)        // 'cancelled'
console.log(cancelled.cancelled_at)  // '2025-06-15T14:30:00.000Z'

// Retrieve whatever completed before cancellation
const partial = await talonic.jobs.getResults('job_xyz789')
console.log(`Retrieved ${partial.data.length} results before cancellation`)

The list() method accepts ListJobsParams with optional status filter, cursor-based pagination (cursor and limit), and order for sorting. The JobResults response contains a data array where each entry has document_id, document_filename, and values (the extracted field data). This flat structure makes it straightforward to aggregate results across documents for reporting or database insertion.
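Cursor-based pagination usually means feeding a cursor from each response into the next request until none is returned. The sketch below shows that loop under one important assumption: this page does not name the response field that carries the next cursor, so next_cursor is a guess and should be checked against the SDK's types. Page, ListParams, and listAllJobs are likewise hypothetical names, and the maxPages cap is only a safety guard.

```typescript
// Assumed page shape: `data` appears in the examples on this page, but
// `next_cursor` is a guessed field name.
interface Page<T> {
  data: T[]
  next_cursor?: string | null
}

// Subset of the list parameters described above.
interface ListParams {
  status?: string
  limit?: number
  cursor?: string
  order?: string
}

// Collect items from every page, passing each page's cursor to the next call.
// `listPage` is injected so the loop works with any client.
async function listAllJobs<T>(
  listPage: (params: ListParams) => Promise<Page<T>>,
  params: ListParams = {},
  maxPages: number = 100,
): Promise<T[]> {
  const all: T[] = []
  let cursor: string | undefined = params.cursor
  for (let i = 0; i < maxPages; i++) {
    const request: ListParams = cursor ? { ...params, cursor } : { ...params }
    const page = await listPage(request)
    for (const item of page.data) all.push(item)
    if (!page.next_cursor) break  // no further pages
    cursor = page.next_cursor
  }
  return all
}
```

A call against the client used on this page might look like listAllJobs(p => talonic.jobs.list(p), { status: 'processing', limit: 50 }), assuming the response really exposes its cursor under that name.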

Jobs require a saved schema_id rather than an inline schema. Create your schema with schemas.create() first, then pass the returned ID to jobs.create().

Frequently asked questions

How do I run batch extraction with the Talonic SDK?
Call talonic.jobs.create({ schema_id, document_ids }) to start an async batch job, then poll with talonic.jobs.get(id) or retrieve results with talonic.jobs.getResults(id). Omit document_ids to process all unprocessed documents in the workspace.
Can I use an inline schema with jobs?
No. Jobs require a saved schema_id. Create your schema first with talonic.schemas.create(), then pass the returned ID to talonic.jobs.create().
What happens to completed extractions if I cancel a job?
Extractions already completed before cancellation are preserved. You can still retrieve them with talonic.jobs.getResults(id). The job status changes to 'cancelled' and the cancelled_at timestamp is set.
How do I monitor batch job progress?
Poll talonic.jobs.get(id) on a regular interval (e.g. every 5 seconds). The Job object includes completed_documents, total_documents, failed_documents, current_phase, progress, estimated_completion, and grid_stats with fill_rate for quality monitoring.
What job statuses are available for filtering?
Jobs can be in one of five statuses: 'queued' (waiting to start), 'processing' (actively extracting), 'completed' (all documents processed), 'failed' (terminal error), or 'cancelled' (manually stopped). Use the status parameter on list() to filter.