Schemas
The schemas resource manages reusable extraction definitions. Instead of passing an inline schema on every extract() call, create a named schema once and reference it by schema_id across all your extractions.
// Create a reusable schema for invoice extraction
const schema = await talonic.schemas.create({
  name: 'Invoice Schema v2',
  description: 'Standard invoice fields for AP automation',
  definition: {
    type: 'object',
    properties: {
      vendor_name: { type: 'string', description: 'Legal entity name of the vendor' },
      invoice_number: { type: 'string' },
      invoice_date: { type: 'string', format: 'date' },
      due_date: { type: 'string', format: 'date' },
      total_amount: { type: 'number' },
      currency: { type: 'string' },
      line_items: {
        type: 'array',
        items: {
          type: 'object',
          properties: {
            description: { type: 'string' },
            quantity: { type: 'number' },
            unit_price: { type: 'number' },
          },
        },
      },
    },
  },
})
console.log(schema.id) // 'sch_abc123'
console.log(schema.short_id) // 'SCH-3A4D79D2'
console.log(schema.version) // 1
// Use the saved schema for extraction
const result = await talonic.extract({
  file_path: './invoice.pdf',
  schema_id: schema.id,
})
The create() method accepts a CreateSchemaParams object with name (required), definition (required, JSON Schema object), and an optional description. The returned Schema object includes id (canonical UUID), short_id (human-readable identifier like 'SCH-3A4D79D2' visible in the dashboard), version (starts at 1, bumps on each update), field_count, extraction_count, and links with URLs for the schema, its extractions, and the dashboard view. Both id and short_id are accepted as lookup keys on get(), update(), and delete().
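The parameter and return shapes described above can be written down as TypeScript types. This is a sketch inferred only from the field names on this page, so the SDK's actual type exports may differ. The numeric types of field_count and extraction_count, the key names under links (taken from the get() example), and the isCreateSchemaParams helper are all assumptions.

```typescript
// Sketch of the shapes described above; inferred from this page, not copied from the SDK.
type JsonSchemaObject = {
  type: 'object'
  properties: Record<string, unknown>
}

interface CreateSchemaParams {
  name: string                  // required
  definition: JsonSchemaObject  // required, JSON Schema object
  description?: string          // optional
}

interface Schema extends CreateSchemaParams {
  id: string                    // canonical identifier
  short_id: string              // e.g. 'SCH-3A4D79D2'
  version: number               // starts at 1, bumps on each update
  field_count: number           // assumed numeric
  extraction_count: number      // assumed numeric
  links: { self: string; extractions: string; dashboard: string }
}

// Hypothetical helper: checks an untyped value before it is passed to create().
function isCreateSchemaParams(value: unknown): value is CreateSchemaParams {
  if (typeof value !== 'object' || value === null) return false
  const v = value as Record<string, unknown>
  const def = v.definition as Record<string, unknown> | null | undefined
  return (
    typeof v.name === 'string' &&
    typeof def === 'object' && def !== null &&
    def.type === 'object' &&
    typeof def.properties === 'object' && def.properties !== null &&
    (v.description === undefined || typeof v.description === 'string')
  )
}
```

A guard like this is mainly useful when schema definitions are loaded from JSON files or user input rather than written inline.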
Schema definitions follow JSON Schema format with type: "object" and properties. Each property specifies a field name and type that the extraction engine will resolve. Updating a schema does not retroactively change existing extractions, but all future extract() calls using that schema_id will pick up the new definition. The version field is bumped automatically on each update so you can track which version a given extraction used.
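To make the versioning rule concrete, here is a toy in-memory model of it. It only illustrates the behaviour described above and says nothing about how the service is implemented; ToySchemaStore and the schema_version field are hypothetical names, not SDK APIs.

```typescript
// Toy in-memory model of the versioning rule above. Illustrative only:
// ToySchemaStore and schema_version are hypothetical names, not SDK APIs.
type Definition = { type: 'object'; properties: Record<string, unknown> }

interface ToyExtraction {
  schema_id: string
  schema_version: number   // version in effect when the extraction ran
  fields: string[]         // field names taken from that version
}

class ToySchemaStore {
  private schemas = new Map<string, { version: number; definition: Definition }>()
  readonly extractions: ToyExtraction[] = []

  create(id: string, definition: Definition): void {
    this.schemas.set(id, { version: 1, definition })   // versions start at 1
  }

  update(id: string, definition: Definition): number {
    const current = this.schemas.get(id)
    if (!current) throw new Error(`unknown schema: ${id}`)
    const next = { version: current.version + 1, definition }   // bump on each update
    this.schemas.set(id, next)
    return next.version
  }

  extract(id: string): ToyExtraction {
    const current = this.schemas.get(id)
    if (!current) throw new Error(`unknown schema: ${id}`)
    // Snapshot the version and field names; later updates never touch this record.
    const record: ToyExtraction = {
      schema_id: id,
      schema_version: current.version,
      fields: Object.keys(current.definition.properties),
    }
    this.extractions.push(record)
    return record
  }
}

const store = new ToySchemaStore()
store.create('sch_demo', { type: 'object', properties: { total_amount: { type: 'number' } } })
const firstRun = store.extract('sch_demo')    // runs against version 1
store.update('sch_demo', {
  type: 'object',
  properties: { total_amount: { type: 'number' }, tax_amount: { type: 'number' } },
})
const secondRun = store.extract('sch_demo')   // runs against version 2
console.log(firstRun.schema_version, firstRun.fields)
console.log(secondRun.schema_version, secondRun.fields)
```

The first record keeps version 1 and its single field after the update, while the second picks up version 2 and the added tax_amount field.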
// List all schemas in the workspace
const schemas = await talonic.schemas.list()
for (const s of schemas.data) {
  console.log(`${s.name} (v${s.version}): ${s.field_count} fields, ${s.extraction_count} extractions`)
}
// Update a schema definition (bumps version, does not change existing extractions)
const updated = await talonic.schemas.update('sch_abc123', {
  name: 'Invoice Schema v3',
  definition: {
    type: 'object',
    properties: {
      vendor_name: { type: 'string' },
      invoice_number: { type: 'string' },
      total_amount: { type: 'number' },
      tax_amount: { type: 'number' }, // new field
      payment_terms: { type: 'string' }, // new field
    },
  },
})
console.log(updated.version) // 2
// Delete a schema (existing extractions are retained)
const deleted = await talonic.schemas.delete('sch_abc123')
console.log(deleted.deleted) // true
Saved schemas are workspace-scoped, so every team member with API access can reference the same schema_id. This makes schemas the right tool for standardising extraction output across a pipeline or team. The update() method uses HTTP PUT and replaces the entire schema definition. Pass all fields you want to keep, not just the changes. The delete() method removes the schema but retains all existing extractions that used it, so historical data is preserved.
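Because update() replaces the whole definition, adding a field safely means reading the current definition first and sending back the merged result. The following is a minimal sketch of that read-merge-write pattern. mergeProperties and addFields are hypothetical helpers, and SchemasClient is only an assumed description of the get() and update() shapes used on this page.

```typescript
// Hypothetical helpers for a read-merge-write update. Not part of the SDK.
type Definition = { type: 'object'; properties: Record<string, unknown> }

// Minimal shape of the calls used here; assumed from the examples on this page.
interface SchemasClient {
  get(id: string): Promise<{ name: string; definition: Definition }>
  update(id: string, params: { name: string; definition: Definition }): Promise<{ version: number }>
}

// Pure merge: keeps every existing property and adds or overrides the given ones.
function mergeProperties(current: Definition, additions: Record<string, unknown>): Definition {
  return { ...current, properties: { ...current.properties, ...additions } }
}

// Read the current schema, merge in the new fields, and send the full definition back.
async function addFields(
  schemas: SchemasClient,
  id: string,
  additions: Record<string, unknown>,
): Promise<number> {
  const current = await schemas.get(id)
  const updated = await schemas.update(id, {
    name: current.name,   // resent in case the PUT would otherwise drop it
    definition: mergeProperties(current.definition, additions),
  })
  return updated.version
}
```

Under these assumptions a call would look like addFields(talonic.schemas, 'sch_abc123', { tax_amount: { type: 'number' } }). The read and the write are separate requests, so two callers updating the same schema at the same time can still overwrite each other.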
// Get a schema by canonical UUID or short_id
const schema = await talonic.schemas.get('sch_abc123')
// or: await talonic.schemas.get('SCH-3A4D79D2')
console.log(schema.name) // 'Invoice Schema v2'
console.log(schema.definition) // { type: 'object', properties: { ... } }
console.log(schema.created_at) // '2025-06-01T10:30:00.000Z'
console.log(schema.updated_at) // '2025-06-15T14:22:00.000Z'
console.log(schema.links) // { self: '...', extractions: '...', dashboard: '...' }
Use the full JSON Schema format (type: "object" with properties) rather than the flat key-type shorthand. The server-side normaliser for the shorthand format is not fully supported yet.
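If existing code builds schemas in a flat shorthand, one option is to expand them client-side before calling create(). The sketch below assumes the shorthand is a plain map of field name to type name, such as { vendor_name: 'string' }. This page does not define the shorthand, so treat that shape and the expandShorthand helper as hypothetical.

```typescript
// Hypothetical client-side expansion of a flat { field: type } map into
// a full JSON Schema object. The shorthand shape is an assumption.
type PrimitiveType = 'string' | 'number' | 'integer' | 'boolean'
type Shorthand = Record<string, PrimitiveType>

interface ObjectSchema {
  type: 'object'
  properties: Record<string, { type: PrimitiveType }>
}

function expandShorthand(shorthand: Shorthand): ObjectSchema {
  const properties: ObjectSchema['properties'] = {}
  for (const [field, typeName] of Object.entries(shorthand)) {
    properties[field] = { type: typeName }   // one JSON Schema property per field
  }
  return { type: 'object', properties }
}

const definition = expandShorthand({ vendor_name: 'string', total_amount: 'number' })
console.log(JSON.stringify(definition))
// {"type":"object","properties":{"vendor_name":{"type":"string"},"total_amount":{"type":"number"}}}
```

The expanded object has the type: "object" and properties structure that the definition parameter expects, so it does not rely on server-side normalisation.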