
Extractions

The extractions resource gives you access to the structured data produced by each extraction run. Every extract() call creates an extraction record that you can query, export, and correct after the fact.

List and retrieve extractions
// List extractions for a specific document
const extractions = await talonic.extractions.list({
  document_id: 'doc_abc123',
  status: 'complete',
  limit: 10,
})

for (const ext of extractions.data) {
  console.log(`${ext.id}: confidence ${ext.confidence_overall}, created ${ext.created_at}`)
}

// Get full extraction with per-field confidence
const extraction = await talonic.extractions.get('ext_xyz789')
console.log(extraction.data)                    // { vendor_name: 'Acme', total: 1500, ... }
console.log(extraction.confidence?.overall)     // 0.94
console.log(extraction.confidence?.fields)      // { vendor_name: 0.99, total: 0.91, ... }
console.log(extraction.metadata?.processing_time_ms) // 1820
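
The per-field scores returned by get() can be used to route uncertain values to manual review. The following is a minimal sketch rather than part of the SDK: the helper name fieldsNeedingReview and the 0.95 threshold are illustrative assumptions, and it assumes confidence.fields is a flat map of field name to a score between 0 and 1, as the sample output above suggests.

```typescript
// Hypothetical helper, not part of the Talonic SDK.
// Assumes `fields` is a flat map of field name -> confidence score in [0, 1].
function fieldsNeedingReview(
  fields: Record<string, number>,
  threshold = 0.95, // illustrative cutoff; tune for your own data
): string[] {
  return Object.entries(fields)
    .filter(([, score]) => score < threshold) // keep only uncertain fields
    .sort(([, a], [, b]) => a - b)            // least confident first
    .map(([name]) => name)
}

// Sample scores shaped like extraction.confidence.fields (values are made up)
const review = fieldsNeedingReview({ vendor_name: 0.99, total: 0.91, due_date: 0.72 })
console.log(review) // [ 'due_date', 'total' ]
```

Sorting by ascending score puts the least trustworthy fields at the front of a review queue.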

The list() method accepts ListExtractionsParams with optional filters for document_id, schema_id, and status ('complete', 'processing', or 'failed'). Pagination is cursor-based: pass cursor and limit to page through results. The legacy page and per_page parameters are still accepted for backward compatibility. Each extraction in the list response includes a compact confidence_overall number, while the get() response for a single extraction includes the full confidence object with per-field scores.
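
Cursor-based pagination is usually consumed in a loop that follows cursors until none is returned. The sketch below shows that pattern in a generic form. It makes assumptions: the data array matches the example above, but next_cursor is a hypothetical field name for the cursor returned with each page, and collectAll is not an SDK function. Check the actual list() response type before relying on it.

```typescript
// Assumed page shape: `data` appears in the example above; `next_cursor` is a
// hypothetical field name, so check the actual list() response type.
interface Page<T> {
  data: T[]
  next_cursor?: string | null
}

// Generic helper (not part of the SDK): follows cursors until none is returned.
async function collectAll<T>(
  fetchPage: (cursor?: string) => Promise<Page<T>>,
  maxPages = 100, // safety cap against an endless cursor chain
): Promise<T[]> {
  const items: T[] = []
  let cursor: string | undefined
  for (let i = 0; i < maxPages; i++) {
    const page = await fetchPage(cursor)
    items.push(...page.data)
    if (!page.next_cursor) break // no further pages
    cursor = page.next_cursor
  }
  return items
}

// Possible usage, assuming list() resolves to the Page shape above:
// const all = await collectAll((cursor) =>
//   talonic.extractions.list({ document_id: 'doc_abc123', limit: 50, cursor }))
```

Passing the page fetcher as a callback keeps the loop independent of the filters in use, so the same helper works for any combination of document_id, schema_id, and status.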

Use getData() to retrieve extraction results as JSON (the default) or CSV. The JSON format returns typed objects matching your schema definition, while CSV is convenient for spreadsheet workflows and bulk data exports. The method uses TypeScript overloads: awaiting a call with { format: 'json' } or with no options yields WithRateLimit<Record<string, unknown>>, while { format: 'csv' } yields a plain string.
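
To illustrate how overloads let the format option select the return type, here is a self-contained sketch of the pattern. It is not the SDK's source: getDataSketch is a hypothetical stand-in, the WithRateLimit definition is a placeholder whose real shape may differ, and the function is synchronous and stubbed so it runs without a network call.

```typescript
// Illustrative only: these declarations mimic the overload pattern described
// above. `WithRateLimit` here is a placeholder; the SDK's real type may differ.
type WithRateLimit<T> = T & { rateLimit?: unknown }

function getDataSketch(id: string): WithRateLimit<Record<string, unknown>>
function getDataSketch(id: string, opts: { format: 'json' }): WithRateLimit<Record<string, unknown>>
function getDataSketch(id: string, opts: { format: 'csv' }): string
function getDataSketch(
  id: string,
  opts?: { format: 'json' | 'csv' },
): WithRateLimit<Record<string, unknown>> | string {
  // Stubbed values so the sketch runs without calling an API.
  return opts?.format === 'csv' ? `id\n${id}` : { id }
}

const asJson = getDataSketch('ext_xyz789')                   // typed as an object
const asCsv = getDataSketch('ext_xyz789', { format: 'csv' }) // typed as string
console.log(typeof asJson, asCsv.split('\n').length)
```

Because the compiler picks the overload from the literal type of format, string methods such as split() are available on the CSV result without a cast.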

Export extraction data
import { writeFile } from 'node:fs/promises'

// Get structured data as JSON (default)
const jsonData = await talonic.extractions.getData('ext_xyz789')
console.log(jsonData) // { vendor_name: 'Acme', line_items: [...], total: 1500 }

// Get structured data as CSV for spreadsheet workflows
const csvData = await talonic.extractions.getData('ext_xyz789', { format: 'csv' })
console.log(csvData)
// "vendor_name,total,due_date\nAcme Corp,1500,2025-03-15"

// Write the CSV string to a file
await writeFile('./export.csv', csvData)

The patch() method submits field-level corrections back to the extraction. Corrections are fed back into the extraction engine and improve accuracy for similar documents in future runs, so submitting them is worthwhile even if you also fix the value downstream. Each correction specifies the field name, the corrected value, and an optional reason string explaining the change. The propagate parameter controls scope: 'this_document_only' (the default) applies the correction to this extraction only, while 'all_similar' propagates it to similar extractions across your workspace.

Submit corrections
// Submit field-level corrections to improve future accuracy
await talonic.extractions.patch('ext_xyz789', {
  corrections: [
    {
      field: 'vendor_name',
      value: 'Acme Corporation',
      reason: 'Full legal name required',
    },
    {
      field: 'total',
      value: 14250.00,
      reason: 'OCR misread decimal separator',
    },
  ],
  propagate: 'all_similar', // apply to similar documents too
})
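
When corrections come from a human review step, the corrections array can be derived by comparing the extracted values with the reviewed ones. The sketch below is one way to do that under stated assumptions: buildCorrections is a hypothetical helper, not an SDK function, and the Correction shape is inferred from the example above.

```typescript
// Shape taken from the example above: field, value, optional reason.
interface Correction {
  field: string
  value: unknown
  reason?: string
}

// Hypothetical helper, not part of the SDK: compares extracted values with
// human-reviewed values and emits one correction per changed field.
function buildCorrections(
  extracted: Record<string, unknown>,
  reviewed: Record<string, unknown>,
  reason?: string,
): Correction[] {
  const corrections: Correction[] = []
  for (const [field, value] of Object.entries(reviewed)) {
    // JSON comparison handles nested values; it is sensitive to key order.
    if (JSON.stringify(extracted[field]) !== JSON.stringify(value)) {
      corrections.push(reason ? { field, value, reason } : { field, value })
    }
  }
  return corrections
}

const corrections = buildCorrections(
  { vendor_name: 'Acme', total: 1500 },
  { vendor_name: 'Acme Corporation', total: 1500 },
  'Manual review',
)
console.log(JSON.stringify(corrections))
// [{"field":"vendor_name","value":"Acme Corporation","reason":"Manual review"}]
```

The resulting array could then be passed as the corrections property of a patch() call; unchanged fields are skipped, so only genuine fixes are submitted.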

The Extraction interface also carries metadata about the extraction run. The metadata block provides pages (number of pages processed), language (detected language code), document_type (detected document category), and processing_time_ms. The links object contains URLs for the extraction resource, related endpoints, and the dashboard view. Use these links to navigate between your application and the Talonic web dashboard.
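
As a small illustration of working with the metadata block, the sketch below formats it into a one-line summary suitable for logs. The field names come from the description above, but their types and optionality are assumptions, and ExtractionMetadataSketch and summarizeRun are hypothetical names rather than SDK exports. The sample values are made up.

```typescript
// Field names come from the text above; the types and optionality are assumptions.
interface ExtractionMetadataSketch {
  pages?: number
  language?: string
  document_type?: string
  processing_time_ms?: number
}

// Hypothetical helper: formats run metadata into a one-line summary for logs.
function summarizeRun(meta: ExtractionMetadataSketch): string {
  const parts: string[] = []
  if (meta.document_type) parts.push(meta.document_type)
  if (meta.language) parts.push(`lang=${meta.language}`)
  if (meta.pages !== undefined) parts.push(`${meta.pages} page(s)`)
  if (meta.processing_time_ms !== undefined) {
    // Convert milliseconds to seconds with two decimals.
    parts.push(`${(meta.processing_time_ms / 1000).toFixed(2)}s`)
  }
  return parts.join(', ')
}

const summary = summarizeRun({
  pages: 3,
  language: 'en',
  document_type: 'invoice',
  processing_time_ms: 1820,
})
console.log(summary) // invoice, lang=en, 3 page(s), 1.82s
```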

A single document can have multiple extractions if it was re-extracted with different schemas. Use list({ document_id }) to see all extraction runs for a given document.
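
When a document has several runs, a common need is to keep only the most recent extraction for each schema. The following sketch operates on items shaped like the list() results. It assumes each item exposes schema_id (documented above only as a filter) and that created_at is an ISO 8601 timestamp; latestPerSchema and ExtractionSummary are hypothetical names.

```typescript
// Assumed list-item shape: `id` and `created_at` appear in the example above;
// `schema_id` on each item is an assumption (it is documented only as a filter).
interface ExtractionSummary {
  id: string
  schema_id: string
  created_at: string // assumed ISO 8601 timestamp
}

// Hypothetical helper: keeps the most recent extraction for each schema.
function latestPerSchema(items: ExtractionSummary[]): Map<string, ExtractionSummary> {
  const latest = new Map<string, ExtractionSummary>()
  for (const item of items) {
    const current = latest.get(item.schema_id)
    if (!current || Date.parse(item.created_at) > Date.parse(current.created_at)) {
      latest.set(item.schema_id, item)
    }
  }
  return latest
}

// Made-up sample runs for one document
const runs: ExtractionSummary[] = [
  { id: 'ext_1', schema_id: 'sch_invoice', created_at: '2025-01-01T00:00:00Z' },
  { id: 'ext_2', schema_id: 'sch_invoice', created_at: '2025-02-01T00:00:00Z' },
  { id: 'ext_3', schema_id: 'sch_receipt', created_at: '2025-01-15T00:00:00Z' },
]
console.log(latestPerSchema(runs).get('sch_invoice')?.id) // ext_2
```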

Frequently asked questions

How do I get extraction data as CSV?
Call talonic.extractions.getData(id, { format: "csv" }) to receive the structured data as a CSV string. The return type is a plain string rather than a WithRateLimit wrapper.
What does the patch() method do?
It submits field-level corrections to an extraction result. Each correction specifies a field name, corrected value, and optional reason. Corrections are fed back into the extraction engine to improve accuracy for similar documents in future runs. Use the propagate parameter to control whether corrections apply to this document only or all similar documents.
Can a document have multiple extractions?
Yes. Re-extracting a document with a different schema creates a new extraction record. Use extractions.list({ document_id }) to retrieve all runs for one document.
What is the difference between get() and getData()?
The get() method returns the full Extraction object including metadata, confidence scores, status, timestamps, and links alongside the extracted data. The getData() method returns just the extracted field values as a plain JSON object or CSV string, without the surrounding metadata. Use get() for inspection and debugging, getData() for data export and downstream processing.
How do I filter extractions by status?
Pass the status parameter to extractions.list() with one of 'complete', 'processing', or 'failed'. You can combine this with document_id and schema_id filters to narrow results further. Pagination is cursor-based using cursor and limit parameters.