Jobs

The jobs resource handles asynchronous batch extraction across multiple documents. Create a job with a schema_id and an array of document_ids, poll until it reaches a terminal status, then fetch the results.

Create and monitor a batch job

// Create a batch extraction job
const job = await talonic.jobs.create({
  schema_id: 'sch_abc123',
  document_ids: ['doc_001', 'doc_002', 'doc_003', 'doc_004', 'doc_005'],
  name: 'Q4 Invoice Batch',
})

console.log(job.id)     // 'job_xyz789'
console.log(job.status) // 'queued'

// Poll for completion
let current = await talonic.jobs.get(job.id)
while (current.status === 'queued' || current.status === 'processing') {
  console.log(`Progress: ${current.completed_documents}/${current.total_documents} (${current.current_phase})`)
  await new Promise(r => setTimeout(r, 5000))
  current = await talonic.jobs.get(job.id)
}

console.log(`Job ${current.status}: ${current.completed_documents} completed, ${current.failed_documents} failed`)

The create() method accepts CreateJobParams with schema_id (required), optional document_ids (array of document UUIDs to process), and optional name (human-readable label). When document_ids is omitted, the job processes all unprocessed documents in the workspace. The returned Job object includes id, status, progress, total_documents, completed_documents, failed_documents, current_phase, estimated_completion, and links with URLs for the job and its results.

Jobs run server-side and process documents in parallel. Use get() to poll the job status ('queued', 'processing', 'completed', 'failed', 'cancelled') and getResults() to retrieve the extraction output once complete. For long-running batches, poll on a reasonable interval such as every 5 seconds. The grid_stats block on the job object provides total_cells, filled, empty, and fill_rate for monitoring extraction quality across the batch.
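The polling loop shown earlier can be wrapped in a reusable helper. The sketch below is one possible approach and is not part of the SDK: waitForJob, intervalMs, and timeoutMs are hypothetical names, and the code assumes only that jobs.get(id) resolves to an object with a status field. It adds a timeout so that a stuck job does not poll forever.

```javascript
// Statuses that mean the job is still running, taken from the list above.
const ACTIVE_STATUSES = new Set(['queued', 'processing'])

// Hypothetical helper (not an SDK method): polls jobs.get() until the job
// leaves the active statuses, or throws once the timeout would be exceeded.
async function waitForJob(jobs, jobId, { intervalMs = 5000, timeoutMs = 600000 } = {}) {
  const deadline = Date.now() + timeoutMs
  for (;;) {
    const job = await jobs.get(jobId)
    // Any status outside the active set is treated as terminal.
    if (!ACTIVE_STATUSES.has(job.status)) return job
    // Give up if the next sleep would run past the deadline.
    if (Date.now() + intervalMs > deadline) {
      throw new Error(`Job ${jobId} did not finish within ${timeoutMs} ms (last status: ${job.status})`)
    }
    await new Promise(resolve => setTimeout(resolve, intervalMs))
  }
}
```

Called as waitForJob(talonic.jobs, job.id), it resolves with the final Job object for any terminal status, so the caller still needs to check whether status is 'completed', 'failed', or 'cancelled'.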

Retrieve batch results

// Get structured results from a completed job
const results = await talonic.jobs.getResults('job_xyz789')

for (const row of results.data) {
  console.log(`${row.document_filename}: ${JSON.stringify(row.values)}`)
}
// invoice_001.pdf: { vendor_name: 'Acme', total: 1500, ... }
// invoice_002.pdf: { vendor_name: 'Globex', total: 3200, ... }

// Export to CSV or send to your database
const rows = results.data.map(r => ({
  filename: r.document_filename,
  document_id: r.document_id,
  ...r.values,
}))
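The mapping above stops short of producing CSV text. As a minimal sketch, assuming each entry in results.data has the document_id, document_filename, and values fields shown in this section, the helper below serializes entries with RFC 4180 style quoting. resultsToCsv and csvCell are hypothetical names, not SDK functions, and JSON-encoding nested values is a choice made for this sketch.

```javascript
// Quote a cell only when it contains a comma, a double quote, or a line break.
function csvCell(value) {
  if (value === null || value === undefined) return ''
  // Assumption of this sketch: nested objects and arrays are JSON-encoded.
  const text = typeof value === 'object' ? JSON.stringify(value) : String(value)
  return /[",\r\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text
}

// Hypothetical helper: turns JobResults-style entries into a CSV string.
function resultsToCsv(entries) {
  // Use the union of field names, in case some documents lack a field.
  const fields = [...new Set(entries.flatMap(e => Object.keys(e.values)))]
  const header = ['document_id', 'document_filename', ...fields]
  const lines = entries.map(e =>
    [e.document_id, e.document_filename, ...fields.map(f => e.values[f])]
      .map(csvCell)
      .join(',')
  )
  return [header.map(csvCell).join(','), ...lines].join('\r\n')
}
```

Passing results.data to resultsToCsv() yields one header line followed by one line per document. Fields missing from a document become empty cells.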

Use cancel() to abort a job that is still in progress. A cancelled job stops processing its remaining documents, but extractions that already completed are preserved and remain accessible via getResults(). The cancelled_at timestamp is set on the job object when cancellation takes effect.

List and cancel jobs

// List jobs filtered by status
const runningJobs = await talonic.jobs.list({ status: 'processing', limit: 10 })
for (const j of runningJobs.data) {
  console.log(`${j.id}: ${j.completed_documents}/${j.total_documents} (est. ${j.estimated_completion})`)
}

// Cancel a job — partial results are retained
const cancelled = await talonic.jobs.cancel('job_xyz789')
console.log(cancelled.status)        // 'cancelled'
console.log(cancelled.cancelled_at)  // '2025-06-15T14:30:00.000Z'

// Retrieve whatever completed before cancellation
const partial = await talonic.jobs.getResults('job_xyz789')
console.log(`Retrieved ${partial.data.length} results before cancellation`)

The list() method accepts ListJobsParams with an optional status filter, cursor-based pagination (cursor and limit), and order for sorting. The JobResults response returned by getResults() contains a data array in which each entry has document_id, document_filename, and values (the extracted field data). This flat structure makes it straightforward to aggregate results across documents for reporting or database insertion.
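The cursor and limit parameters of list() suggest a loop that fetches pages until none remain. This section does not show how a page reports the next cursor, so the sketch below assumes a hypothetical next_cursor field on the list response. Check the actual response shape before relying on it. iterateJobs is likewise a hypothetical helper, not an SDK method.

```javascript
// Hypothetical helper: yields every job across all pages of jobs.list().
// ASSUMPTION: each page looks like { data: [...], next_cursor: string | null }.
// The next_cursor name is a guess and is not confirmed by the documentation above.
async function* iterateJobs(jobs, params = {}) {
  let cursor
  do {
    // Only send cursor once a previous page has supplied one.
    const page = await jobs.list({ ...params, ...(cursor ? { cursor } : {}) })
    yield* page.data
    cursor = page.next_cursor
  } while (cursor)
}
```

It can be consumed with for await (const j of iterateJobs(talonic.jobs, { status: 'processing', limit: 10 })), which keeps the per-page size small while still visiting every matching job.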

Jobs require a saved schema_id rather than an inline schema. Create your schema with schemas.create() first, then pass the returned ID to jobs.create().

Frequently asked questions

How do I run batch extraction with the Talonic SDK?

Call talonic.jobs.create({ schema_id, document_ids }) to start an async batch job, then poll with talonic.jobs.get(id) or retrieve results with talonic.jobs.getResults(id). Omit document_ids to process all unprocessed documents in the workspace.

Can I use an inline schema with jobs?

No. Jobs require a saved schema_id. Create your schema first with talonic.schemas.create(), then pass the returned ID to talonic.jobs.create().

What happens to completed extractions if I cancel a job?

Extractions already completed before cancellation are preserved. You can still retrieve them with talonic.jobs.getResults(id). The job status changes to 'cancelled' and the cancelled_at timestamp is set.

How do I monitor batch job progress?

Poll talonic.jobs.get(id) on a regular interval (e.g. every 5 seconds). The Job object includes completed_documents, total_documents, failed_documents, current_phase, progress, estimated_completion, and grid_stats with fill_rate for quality monitoring.

What job statuses are available for filtering?

Jobs can be in one of five statuses: 'queued' (waiting to start), 'processing' (actively extracting), 'completed' (all documents processed), 'failed' (terminal error), or 'cancelled' (manually stopped). Use the status parameter on list() to filter.
