Documents
The documents resource lets you manage uploaded files in your workspace. Every call to extract() creates a document automatically, but you can also list, inspect, re-extract, filter, and delete documents independently.
// List documents with cursor-based pagination
const docs = await talonic.documents.list({ limit: 50 })
console.log(docs.data.map(d => d.filename))
console.log(docs.pagination) // { next_cursor: '...', has_more: true }
// Get a single document with full metadata
const doc = await talonic.documents.get('doc_abc123')
console.log(doc.filename) // 'invoice.pdf'
console.log(doc.status) // 'completed'
console.log(doc.pages) // 3
console.log(doc.triage) // { sensitivity: 'internal', pii_detected: true, ... }
// Get OCR markdown
const md = await talonic.documents.getMarkdown('doc_abc123')
console.log(md.markdown) // '# Invoice\n\nVendor: Acme Corp...'
The list() method accepts ListDocumentsParams, which supports filtering by source_id, status ('pending', 'processing', 'completed', 'error'), date range (after and before, as ISO 8601 strings), and full-text search across filenames and extracted content. Pagination is cursor-based: pass limit to set the page size, and pass cursor (taken from a previous response's pagination.next_cursor) to fetch the next page. The legacy page and per_page parameters are still accepted as aliases, but cursor-based pagination is the canonical form. Each response includes a pagination object with next_cursor and has_more.
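The cursor flow described above can be wrapped in a loop. The following is a minimal sketch, not part of the SDK: iterateDocuments, collectFilenames, and makeFakeClient are hypothetical names, and the fake client exists only so the loop can run without network access. It assumes the response shape shown earlier ({ data, pagination: { next_cursor, has_more } }).

```javascript
// Sketch: walk every page of documents.list() using cursor pagination.
// `client` is any object exposing documents.list(params) that resolves to
// { data, pagination: { next_cursor, has_more } } (shape taken from the docs above).
async function* iterateDocuments(client, params = {}) {
  let cursor; // undefined on the first request
  while (true) {
    const query = cursor === undefined ? { ...params } : { ...params, cursor };
    const page = await client.documents.list(query);
    for (const doc of page.data) yield doc;
    const { has_more, next_cursor } = page.pagination;
    if (!has_more || !next_cursor) return; // last page reached
    cursor = next_cursor;
  }
}

// Convenience wrapper: gather every filename across all pages.
async function collectFilenames(client, params) {
  const names = [];
  for await (const doc of iterateDocuments(client, params)) names.push(doc.filename);
  return names;
}

// Hypothetical in-memory stand-in for the SDK client, used only to demo the loop.
// Its cursors are stringified array offsets; real cursors should be treated as opaque.
function makeFakeClient(filenames) {
  return {
    documents: {
      async list({ limit = 2, cursor } = {}) {
        const start = cursor ? Number(cursor) : 0;
        const end = start + limit;
        const has_more = end < filenames.length;
        return {
          data: filenames.slice(start, end).map((filename) => ({ filename })),
          pagination: { next_cursor: has_more ? String(end) : null, has_more },
        };
      },
    },
  };
}
```

With the real client, the same generator would presumably be called as iterateDocuments(talonic, { limit: 50, status: 'completed' }), since it only relies on documents.list() and the pagination object.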
Use getMarkdown() to retrieve the raw OCR output for a document. This is useful for debugging extraction quality or building custom post-processing pipelines on top of the parsed text.
// Filter documents using composable conditions on extracted fields
const filtered = await talonic.documents.filter({
  conditions: [
    { field: 'vendor.name', operator: 'eq', value: 'Acme Corp' },
    { field: 'total_amount', operator: 'gt', value: 10000 },
  ],
  sort: { field: 'invoice_date', direction: 'desc' },
  limit: 25,
})
console.log(filtered.total) // 47
console.log(filtered.documents) // [{ id: '...', filename: '...', fieldValues: { ... } }, ...]
The filter() method queries documents by extracted field values using composable conditions. Each condition specifies a field (a canonical name such as 'vendor.name') or a fieldId (UUID), an operator (eq, neq, gt, gte, lt, lte, between, contains, is_empty, is_not_empty), and a value. The between operator also accepts valueTo as the upper bound of a range query. Each document in the results includes fieldValues with the matched field data. You can optionally scope results to a specific source connection with source_connection_id.
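As an illustration of composing conditions, including a between range with valueTo, here is a small sketch. The condition and buildInvoiceQuery helpers are hypothetical and not part of the SDK. Two choices here are assumptions rather than documented behavior: that is_empty and is_not_empty take no value, and that between should always be given valueTo.

```javascript
// Operators listed in the documentation above.
const OPERATORS = new Set([
  'eq', 'neq', 'gt', 'gte', 'lt', 'lte',
  'between', 'contains', 'is_empty', 'is_not_empty',
]);

// Hypothetical helper (not part of the SDK): build and validate one condition object.
function condition(field, operator, value, valueTo) {
  if (!OPERATORS.has(operator)) throw new Error(`Unknown operator: ${operator}`);
  const cond = { field, operator };
  // Assumption: emptiness checks need no value.
  if (operator === 'is_empty' || operator === 'is_not_empty') return cond;
  if (value === undefined) throw new Error(`Operator ${operator} requires a value`);
  cond.value = value;
  if (operator === 'between') {
    // Assumption: this helper insists on an upper bound for range queries.
    if (valueTo === undefined) throw new Error('between requires valueTo');
    cond.valueTo = valueTo;
  }
  return cond;
}

// Build a filter() payload: one vendor, totals within [minTotal, maxTotal], newest first.
function buildInvoiceQuery(vendor, minTotal, maxTotal) {
  return {
    conditions: [
      condition('vendor.name', 'eq', vendor),
      condition('total_amount', 'between', minTotal, maxTotal),
    ],
    sort: { field: 'invoice_date', direction: 'desc' },
    limit: 25,
  };
}
```

The resulting object has the same shape as the filter() example above, so it could be passed as talonic.documents.filter(buildInvoiceQuery('Acme Corp', 1000, 10000)).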
// Re-run extraction on an existing document (e.g. after schema update)
const reExtracted = await talonic.documents.reExtract('doc_abc123')
console.log(reExtracted.status) // 'processing'
console.log(reExtracted.message) // 'Re-extraction started'
// Delete a document and all associated extractions (irreversible)
const deleted = await talonic.documents.delete('doc_abc123')
console.log(deleted.deleted) // true
The get() method returns a Document object with full metadata, including triage classification data when available. The triage block contains sensitivity (public, internal, or restricted), department, jurisdiction (ISO country code), pii_detected, pii_categories, regulated_data, and confidentiality_marking. The processing_log array records each pipeline step with its status, duration, and detail. These fields are populated progressively as the document moves through the processing pipeline, so a document that is still processing may not have all of them yet.
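Because these fields fill in progressively, code that reads triage right after extract() or reExtract() may want to wait for a terminal status first. The sketch below shows one way to do that by polling get(). waitForDocument and describeTriage are hypothetical helpers, not SDK methods. The polling interval and attempt limit are arbitrary, and pii_categories is assumed to be an array of strings.

```javascript
// Hypothetical helper (not part of the SDK): poll documents.get() until the
// document reaches a terminal status ('completed' or 'error').
async function waitForDocument(client, id, { intervalMs = 2000, maxAttempts = 30 } = {}) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const doc = await client.documents.get(id);
    if (doc.status === 'completed') return doc;
    if (doc.status === 'error') throw new Error(`Document ${id} failed processing`);
    // Still 'pending' or 'processing': wait before the next attempt.
    if (attempt < maxAttempts) await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`Document ${id} not completed after ${maxAttempts} attempts`);
}

// Summarize the triage block defensively, since it may be absent or partial.
function describeTriage(doc) {
  const triage = doc.triage;
  if (!triage) return 'triage not available yet';
  // Assumption: pii_categories is an array of strings when present.
  const categories = (triage.pii_categories ?? []).join(', ') || 'unspecified';
  const pii = triage.pii_detected ? `PII: ${categories}` : 'no PII detected';
  return `${triage.sensitivity ?? 'unknown'} (${pii})`;
}
```

A typical call might look like const doc = await waitForDocument(talonic, 'doc_abc123') followed by console.log(describeTriage(doc)). Polling is only one option; the source does not say whether the API offers another way to be notified when processing finishes.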