Extractions
The extractions resource gives you access to the structured data produced by each extraction run. Every extract() call creates an extraction record that you can query, export, and correct after the fact.
// List extractions for a specific document
const extractions = await talonic.extractions.list({
  document_id: 'doc_abc123',
  status: 'complete',
  limit: 10,
})
for (const ext of extractions.data) {
  console.log(`${ext.id}: confidence ${ext.confidence_overall}, created ${ext.created_at}`)
}
// Get full extraction with per-field confidence
const extraction = await talonic.extractions.get('ext_xyz789')
console.log(extraction.data) // { vendor_name: 'Acme', total: 1500, ... }
console.log(extraction.confidence?.overall) // 0.94
console.log(extraction.confidence?.fields) // { vendor_name: 0.99, total: 0.91, ... }
console.log(extraction.metadata?.processing_time_ms) // 1820
The list() method accepts ListExtractionsParams with optional filters for document_id, schema_id, and status ('complete', 'processing', 'failed'). Pagination uses cursor-based navigation with cursor and limit parameters. The legacy page and per_page parameters are accepted for compatibility. Each extraction in the list response includes a compact confidence_overall number, while the individual get() response includes the full confidence object with per-field scores.
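The cursor-based pagination described above can be walked with a small async generator. This is a minimal sketch, not the SDK's own helper: the `next_cursor` field name, the `Page` shape, and the `paginate` function are assumptions made for illustration, and an in-memory stand-in replaces talonic.extractions.list() so the snippet runs without network access.

```typescript
// Hypothetical page shape: `next_cursor` is an assumed field name,
// not something the documentation above confirms.
interface Page<T> {
  data: T[]
  next_cursor?: string | null
}

type FetchPage<T> = (params: { cursor?: string | undefined; limit: number }) => Promise<Page<T>>

// Yield every item across pages, following the cursor until none is returned.
async function* paginate<T>(fetchPage: FetchPage<T>, limit = 10): AsyncGenerator<T> {
  let cursor: string | undefined
  do {
    const page = await fetchPage({ cursor, limit })
    for (const item of page.data) yield item
    cursor = page.next_cursor ?? undefined
  } while (cursor)
}

// In-memory stand-in for a list() call; the cursor here is just an array offset.
const fakeIds = ['ext_1', 'ext_2', 'ext_3', 'ext_4', 'ext_5']
const fakeList: FetchPage<{ id: string }> = async ({ cursor, limit }) => {
  const start = cursor ? Number(cursor) : 0
  const end = start + limit
  return {
    data: fakeIds.slice(start, end).map((id) => ({ id })),
    next_cursor: end < fakeIds.length ? String(end) : null,
  }
}

// Collect all ids, two per page, to show that the loop crosses page boundaries.
async function collectIds(): Promise<string[]> {
  const ids: string[] = []
  for await (const ext of paginate(fakeList, 2)) ids.push(ext.id)
  return ids
}
```

Against the real client, the fetch function would presumably wrap talonic.extractions.list() with your filters merged into the params, provided the response exposes a next-page cursor under some name; check the response type before relying on `next_cursor`.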
Use getData() to retrieve extraction results as JSON (default) or CSV. The JSON format returns typed objects matching your schema definition, while CSV is convenient for spreadsheet workflows or bulk data exports. The method uses TypeScript overloads: calling with { format: 'json' } or no options returns WithRateLimit<Record<string, unknown>>, while { format: 'csv' } returns a plain string.
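To illustrate how overloads let the return type follow the format option, here is a self-contained sketch. `getDataSketch` is a hypothetical local function, not the SDK method, and it omits the WithRateLimit wrapper because its shape is not described here; only the narrowing pattern is the point.

```typescript
type JsonResult = Record<string, unknown>

// Overload signatures: the declared return type depends on the `format` literal.
function getDataSketch(id: string, options?: { format?: 'json' }): Promise<JsonResult>
function getDataSketch(id: string, options: { format: 'csv' }): Promise<string>
// Implementation signature (not visible to callers).
async function getDataSketch(
  id: string,
  options?: { format?: 'json' | 'csv' },
): Promise<JsonResult | string> {
  // Hard-coded sample row standing in for a real extraction result; `id` is unused here.
  const row: Record<string, unknown> = { vendor_name: 'Acme Corp', total: 1500 }
  if (options?.format === 'csv') {
    const keys = Object.keys(row)
    return keys.join(',') + '\n' + keys.map((k) => String(row[k])).join(',')
  }
  return row
}

async function demo(): Promise<{ vendor: unknown; header: string }> {
  const json = await getDataSketch('ext_xyz789') // typed as Record<string, unknown>
  const csv = await getDataSketch('ext_xyz789', { format: 'csv' }) // typed as string
  // csv.split() compiles only because the compiler knows `csv` is a string.
  return { vendor: json.vendor_name, header: csv.split('\n')[0] ?? '' }
}
```

Because the CSV call resolves to a string at compile time, string methods can be used on it without a type assertion, while the JSON call keeps its object type.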
// Get structured data as JSON (default)
const jsonData = await talonic.extractions.getData('ext_xyz789')
console.log(jsonData) // { vendor_name: 'Acme', line_items: [...], total: 1500 }
// Get structured data as CSV for spreadsheet workflows
const csvData = await talonic.extractions.getData('ext_xyz789', { format: 'csv' })
console.log(csvData)
// "vendor_name,total,due_date\nAcme Corp,1500,2025-03-15"
// Write CSV to file
import { writeFile } from 'node:fs/promises'
await writeFile('./export.csv', csvData)
The patch() method submits field-level corrections back to the extraction. Corrections improve future extraction accuracy for similar documents by feeding the correction loop, so submitting them is worth the effort even if you fix the value downstream. Each correction specifies the field name, the corrected value, and an optional reason string explaining the change. The propagate parameter controls scope: 'this_document_only' (default) applies the correction to this extraction only, while 'all_similar' propagates it to similar extractions across your workspace.
// Submit field-level corrections to improve future accuracy
await talonic.extractions.patch('ext_xyz789', {
  corrections: [
    {
      field: 'vendor_name',
      value: 'Acme Corporation',
      reason: 'Full legal name required',
    },
    {
      field: 'total_amount',
      value: 14250.00,
      reason: 'OCR misread decimal separator',
    },
  ],
  propagate: 'all_similar', // apply to similar documents too
})
The Extraction interface includes metadata about the extraction run. The metadata block provides pages (number of pages processed), language (detected language code), document_type (detected document category), and processing_time_ms. The links object contains URLs for the extraction resource, related endpoints, and the dashboard view. Use these links to build navigation between the SDK and the Talonic web dashboard.
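As a rough picture of how the metadata and links might be consumed, the sketch below models them with local types and formats a one-line summary. The metadata field names come from the description above; the `links` key names (`self`, `dashboard`), the `ExtractionSketch` type, the `describeRun` helper, and the example URL are all assumptions for illustration, so check the SDK's Extraction type for the real shape.

```typescript
// Metadata fields as described in the text; all treated as optional here.
interface ExtractionMetadata {
  pages?: number
  language?: string
  document_type?: string
  processing_time_ms?: number
}

// Hypothetical key names: the text only says the object holds URLs for the
// resource, related endpoints, and the dashboard view.
interface ExtractionLinks {
  self?: string
  dashboard?: string
}

interface ExtractionSketch {
  id: string
  metadata?: ExtractionMetadata
  links?: ExtractionLinks
}

// Build a short human-readable summary, falling back when a field is missing.
function describeRun(ext: ExtractionSketch): string {
  const m: ExtractionMetadata = ext.metadata ?? {}
  const parts = [
    `${m.pages ?? '?'} page(s)`,
    m.language ?? 'unknown language',
    m.document_type ?? 'unknown type',
    `${m.processing_time_ms ?? '?'} ms`,
  ]
  const link = ext.links?.dashboard ? ` (${ext.links.dashboard})` : ''
  return `${ext.id}: ${parts.join(', ')}${link}`
}

// Sample record with a placeholder URL, not a real Talonic address.
const sample: ExtractionSketch = {
  id: 'ext_xyz789',
  metadata: { pages: 3, language: 'en', document_type: 'invoice', processing_time_ms: 1820 },
  links: { dashboard: 'https://example.invalid/extractions/ext_xyz789' },
}
const summary = describeRun(sample)
// summary: "ext_xyz789: 3 page(s), en, invoice, 1820 ms (https://example.invalid/extractions/ext_xyz789)"
```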