Schemas

The schemas resource manages reusable extraction definitions. Instead of passing an inline schema on every extract() call, create a named schema once and reference it by schema_id across all your extractions.

Create and use a saved schema
// Create a reusable schema for invoice extraction
const schema = await talonic.schemas.create({
  name: 'Invoice Schema v2',
  description: 'Standard invoice fields for AP automation',
  definition: {
    type: 'object',
    properties: {
      vendor_name: { type: 'string', description: 'Legal entity name of the vendor' },
      invoice_number: { type: 'string' },
      invoice_date: { type: 'string', format: 'date' },
      due_date: { type: 'string', format: 'date' },
      total_amount: { type: 'number' },
      currency: { type: 'string' },
      line_items: {
        type: 'array',
        items: {
          type: 'object',
          properties: {
            description: { type: 'string' },
            quantity: { type: 'number' },
            unit_price: { type: 'number' },
          },
        },
      },
    },
  },
})

console.log(schema.id)        // 'sch_abc123'
console.log(schema.short_id)  // 'SCH-3A4D79D2'
console.log(schema.version)   // 1

// Use the saved schema for extraction
const result = await talonic.extract({
  file_path: './invoice.pdf',
  schema_id: schema.id,
})

The create() method accepts a CreateSchemaParams object with name (required), definition (required, JSON Schema object), and an optional description. The returned Schema object includes id (canonical UUID), short_id (human-readable identifier like 'SCH-3A4D79D2' visible in the dashboard), version (starts at 1, bumps on each update), field_count, extraction_count, and links with URLs for the schema, its extractions, and the dashboard view. Both id and short_id are accepted as lookup keys on get(), update(), and delete().
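As a rough guide, the shapes described above could be written down as TypeScript interfaces. This is a sketch reconstructed from the field names on this page, not the SDK's own type definitions: the exact types are assumptions, and validateCreateParams is a hypothetical helper that only checks the two required fields locally before a create() call.

```typescript
// Assumed shapes, reconstructed from the fields named on this page.
// The SDK's real type definitions may differ.
interface CreateSchemaParams {
  name: string                         // required
  definition: Record<string, unknown>  // required, JSON Schema object
  description?: string                 // optional
}

interface SchemaLinks {
  self: string
  extractions: string
  dashboard: string
}

interface Schema {
  id: string
  short_id: string
  version: number
  field_count: number
  extraction_count: number
  links: SchemaLinks
}

// Hypothetical local check: reports which required fields are missing.
function validateCreateParams(params: Partial<CreateSchemaParams>): string[] {
  const errors: string[] = []
  if (!params.name) errors.push('name is required')
  if (!params.definition) errors.push('definition is required')
  return errors
}
```

Running the check before create() surfaces a missing name or definition without a round trip to the API.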

Schema definitions follow JSON Schema format with type: "object" and properties. Each property specifies a field name and type that the extraction engine will resolve. Updating a schema does not retroactively change existing extractions, but all future extract() calls using that schema_id will pick up the new definition. The version field is bumped automatically on each update so you can track which version a given extraction used.
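The versioning rules above can be illustrated with a small in-memory model. It does not call the Talonic API: SchemaStoreModel, schema_version, and the other names are hypothetical, chosen only to show that an update bumps the version while earlier extractions keep the version they were created with.

```typescript
// In-memory model of the versioning rules described above. This is not the
// Talonic SDK; class and field names (e.g. schema_version) are hypothetical.
type Definition = { type: 'object'; properties: Record<string, { type: string }> }

interface StoredSchema { id: string; version: number; definition: Definition }
interface StoredExtraction { schema_id: string; schema_version: number; fields: string[] }

class SchemaStoreModel {
  private schemas = new Map<string, StoredSchema>()

  // New schemas start at version 1.
  create(id: string, definition: Definition): StoredSchema {
    const schema: StoredSchema = { id, version: 1, definition }
    this.schemas.set(id, schema)
    return schema
  }

  // An update replaces the definition and bumps the version by one.
  update(id: string, definition: Definition): StoredSchema {
    const current = this.lookup(id)
    const next: StoredSchema = { id, version: current.version + 1, definition }
    this.schemas.set(id, next)
    return next
  }

  // An extraction records the schema version that was current when it ran.
  extract(schemaId: string): StoredExtraction {
    const schema = this.lookup(schemaId)
    return {
      schema_id: schemaId,
      schema_version: schema.version,
      fields: Object.keys(schema.definition.properties),
    }
  }

  private lookup(id: string): StoredSchema {
    const schema = this.schemas.get(id)
    if (!schema) throw new Error(`unknown schema: ${id}`)
    return schema
  }
}

const store = new SchemaStoreModel()
store.create('sch_demo', { type: 'object', properties: { total_amount: { type: 'number' } } })
const first = store.extract('sch_demo')   // runs against version 1
store.update('sch_demo', {
  type: 'object',
  properties: { total_amount: { type: 'number' }, tax_amount: { type: 'number' } },
})
const second = store.extract('sch_demo')  // runs against the updated definition

console.log(first.schema_version, first.fields)    // 1 [ 'total_amount' ]
console.log(second.schema_version, second.fields)  // 2 [ 'total_amount', 'tax_amount' ]
```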

List, update, and delete schemas
// List all schemas in the workspace
const schemas = await talonic.schemas.list()
for (const s of schemas.data) {
  console.log(`${s.name} (v${s.version}): ${s.field_count} fields, ${s.extraction_count} extractions`)
}

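// Update a schema definition (bumps version, does not change existing extractions).
// update() replaces the whole definition, so fields omitted here (invoice_date,
// due_date, currency, line_items) are removed from the schema.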
const updated = await talonic.schemas.update('sch_abc123', {
  name: 'Invoice Schema v3',
  definition: {
    type: 'object',
    properties: {
      vendor_name: { type: 'string' },
      invoice_number: { type: 'string' },
      total_amount: { type: 'number' },
      tax_amount: { type: 'number' },  // new field
      payment_terms: { type: 'string' }, // new field
    },
  },
})
console.log(updated.version) // 2

// Delete a schema (existing extractions are retained)
const deleted = await talonic.schemas.delete('sch_abc123')
console.log(deleted.deleted) // true

Saved schemas are workspace-scoped, so every team member with API access can reference the same schema_id. This makes schemas the right tool for standardising extraction output across a pipeline or team.

The update() method uses HTTP PUT and replaces the entire schema definition, so pass every field you want to keep, not just the changes. The delete() method removes the schema but retains every extraction that used it, so historical data is preserved.
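Because update() replaces the whole definition, one way to add fields safely is to read the current definition, merge the additions locally, and send the complete result. The helper below is a minimal sketch of that merge step. withAddedFields and the ObjectSchema type are hypothetical names, not part of the SDK, and the commented usage assumes an initialised talonic client as in the examples on this page.

```typescript
type JsonSchemaProperty = Record<string, unknown>

interface ObjectSchema {
  type: 'object'
  properties: Record<string, JsonSchemaProperty>
}

// Returns a new, complete definition: every existing property plus the additions.
// The input is not mutated. An addition with the same key overrides the old property.
function withAddedFields(
  current: ObjectSchema,
  additions: Record<string, JsonSchemaProperty>,
): ObjectSchema {
  return { ...current, properties: { ...current.properties, ...additions } }
}

// Usage sketch (not executed here):
// const current = await talonic.schemas.get('sch_abc123')
// await talonic.schemas.update('sch_abc123', {
//   name: current.name,
//   definition: withAddedFields(current.definition, {
//     tax_amount: { type: 'number' },
//   }),
// })
```

Merging on the client keeps the existing fields in the payload, so the full replacement cannot drop them by accident.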

Retrieve a schema by ID
// Get a schema by canonical UUID or short_id
const schema = await talonic.schemas.get('sch_abc123')
// or: await talonic.schemas.get('SCH-3A4D79D2')

console.log(schema.name)         // 'Invoice Schema v2'
console.log(schema.definition)   // { type: 'object', properties: { ... } }
console.log(schema.created_at)   // '2025-06-01T10:30:00.000Z'
console.log(schema.updated_at)   // '2025-06-15T14:22:00.000Z'
console.log(schema.links)        // { self: '...', extractions: '...', dashboard: '...' }

Use full JSON Schema format (type: "object" with properties) rather than the flat key-type shorthand. The server does not yet fully support normalising the shorthand format.
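If existing code already holds definitions in a flat form, they can be expanded on the client before being sent. The sketch below assumes the shorthand is a plain map of field name to type name, such as { vendor_name: 'string' }. That shape is an assumption, since this page does not define the shorthand, and expandShorthand is a hypothetical helper, not an SDK function.

```typescript
// Assumed shorthand shape: field name -> JSON Schema type name.
type Shorthand = Record<string, string>

interface ExpandedSchema {
  type: 'object'
  properties: Record<string, { type: string }>
}

// Expands a flat key-type map into a full JSON Schema object definition.
function expandShorthand(flat: Shorthand): ExpandedSchema {
  const properties: Record<string, { type: string }> = {}
  for (const [field, type] of Object.entries(flat)) {
    properties[field] = { type }
  }
  return { type: 'object', properties }
}

const expanded = expandShorthand({ vendor_name: 'string', total_amount: 'number' })
console.log(JSON.stringify(expanded))
// {"type":"object","properties":{"vendor_name":{"type":"string"},"total_amount":{"type":"number"}}}
```

The result can be passed as definition to create() or update(). Nested arrays and objects would need handling beyond this sketch.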

Frequently asked questions

How do I create a reusable schema?
Call talonic.schemas.create({ name, definition }) with a JSON Schema definition. The returned object includes id and short_id. Use either as schema_id in future extract calls.

Are schemas shared across team members?
Yes. Schemas are workspace-scoped. Anyone with API access to the workspace can reference the same schema_id in their extract calls.

Does updating a schema change existing extractions?
No. Updating a schema bumps the version number and only affects future extract calls that reference that schema_id. Existing extraction results remain unchanged and retain their original schema version.

What happens to extractions when I delete a schema?
Existing extractions are retained when you delete a schema. The extraction data remains accessible via the extractions resource. Only the schema definition itself is removed, so you can no longer use that schema_id for new extractions.

Can I look up a schema by its short_id?
Yes. Both the canonical UUID (id) and the human-readable short_id (e.g. 'SCH-3A4D79D2') are accepted as lookup keys on get(), update(), and delete(). The short_id is displayed in the Talonic dashboard for easy reference.