The Spec & One Engine

A Spec is the configuration layer for production structuring. It sits on top of one schema and describes the full path a document travels: which stages run, in what order, with which policies and validation gates attached. Where a quick Job answers "structure these documents against this schema right now", a Spec answers "this is how documents of this kind are always structured", and it can be run again and again as new documents arrive. The Spec is the front door for any workload that is more than a one-off.

Running a Spec compiles its composed rail into a pipeline and executes it on the One Engine: a single, per-document phase runner that replaces the older multi-strategy structuring run. Each document moves through the same ordered phases independently and in parallel, so one slow document never holds up the rest. Every value the engine writes lands in the value plane as a versioned cell with provenance, so the output is auditable from raw extraction through resolution and human review.

The engine runs four kinds of per-document phase in sequence. Transfer fills cells deterministically from the Field Registry, binding known values with no AI call. Extraction runs Claude over the gap fields that transfer could not fill. Resolution applies your Data Policies to normalize and transform values. Validation checks the document against the gates you placed in the rail. After every document is terminal, a pipeline-scoped Assembly step can compose grouped documents into a single record. Phases are strictly sequential per document: phase N+1 starts only once phase N completes.
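The phase ordering described above can be modelled in a few lines. This is only a conceptual sketch, not the engine's implementation: the four phase names come from the text, while `Document`, `run_document`, `run_pipeline`, the dictionary-based registry, and the stand-in logic inside each phase are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field

# Phase names follow the text; every function and type below is hypothetical.
PHASES = ("transfer", "extraction", "resolution", "validation")


@dataclass
class Document:
    doc_id: str
    cells: dict = field(default_factory=dict)      # field name -> value
    completed: list = field(default_factory=list)  # phases finished, in order
    valid: bool = False


def run_document(doc, registry, schema_fields):
    """Run the four phases strictly in order for one document."""

    def transfer():
        # Deterministic fill from the registry; no model call.
        for name in schema_fields:
            if name in registry:
                doc.cells[name] = registry[name]

    def extraction():
        # Stand-in for the AI call: only fields still missing are filled.
        for name in schema_fields:
            doc.cells.setdefault(name, f"<extracted:{name}>")

    def resolution():
        # Stand-in for Data Policies: trim whitespace on string values.
        for name, value in list(doc.cells.items()):
            if isinstance(value, str):
                doc.cells[name] = value.strip()

    def validation():
        # Stand-in gate: every schema field must hold a non-empty value.
        doc.valid = all(doc.cells.get(name) for name in schema_fields)

    steps = {"transfer": transfer, "extraction": extraction,
             "resolution": resolution, "validation": validation}
    for phase in PHASES:
        steps[phase]()  # phase N+1 starts only after phase N returns
        doc.completed.append(phase)
    return doc


def run_pipeline(docs, registry, schema_fields):
    """Process documents in parallel, then assemble once all are terminal."""
    with ThreadPoolExecutor() as pool:
        finished = list(pool.map(
            lambda d: run_document(d, registry, schema_fields), docs))
    # Assembly is pipeline-scoped: it sees every finished document at once.
    return {doc.doc_id: doc.cells for doc in finished}
```

Calling `run_pipeline([Document("doc_a")], {"currency": "EUR"}, ["supplier", "currency"])` fills `currency` in the transfer step and leaves only `supplier` for the extraction stand-in, which mirrors how registry hits reduce the number of gap fields.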

Spec, Pipeline, and Job are distinct. A Spec is the saved configuration (rail, policies, gates, assembly) over one schema. A Pipeline is one compiled run of that Spec over a set of documents. A Job is the quick one-time tier for ad-hoc structuring that does not justify setting up a Spec. Specs and Pipelines are the production path; Jobs are the fast path.

This separation is what makes the platform repeatable. You invest once in composing a Spec for a document kind (a delivery note, a purchase order, a contract), attach the policies and checkpoints that encode your quality bar, and from then on every run is a single call. Because the pipeline is compiled directly from the rail, each run follows exactly the path you configured, and a re-run follows that same path. As the Field Registry grows, each run also resolves more cells deterministically in the Transfer phase, with no AI call.

Spec and Pipeline via API

A Spec is a schema with a composed rail. Create the schema, set its rail, then run a pipeline against it with a set of document IDs. The public pipeline endpoint compiles the rail and starts processing in one call, so there is no separate start step. Poll the progress endpoint to watch each phase advance.

Run a Spec as a pipeline
curl -X POST https://api.talonic.com/v1/pipelines \
  -H "Authorization: Bearer $TALONIC_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "schema_id": "sch_delivery_notes",
    "document_ids": ["doc_7f3a1b2c", "doc_9e4d5f6a"]
  }'
Response
{
  "id": "pl_x8k2m9",
  "status": "active",
  "schema": { "id": "sch_delivery_notes", "name": "Delivery Notes" },
  "document_count": 2,
  "enqueued_documents": 2,
  "message": "Pipeline created and queued for processing.",
  "links": {
    "self": "/v1/pipelines/pl_x8k2m9",
    "progress": "/v1/pipelines/pl_x8k2m9/progress",
    "data_product": "/v1/pipelines/pl_x8k2m9/data-product"
  }
}
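The `progress` link in the response can be polled with a small loop. The sketch below uses only the Python standard library and makes two assumptions that this page does not confirm: that the progress body carries a top-level `status` field, and that the terminal status names are the ones listed in `TERMINAL_STATUSES`. The helper names `fetch_json` and `wait_for_pipeline` are hypothetical.

```python
import json
import os
import time
import urllib.request

API_BASE = "https://api.talonic.com"  # host taken from the curl example

# ASSUMPTION: terminal status names are not documented on this page.
TERMINAL_STATUSES = {"completed", "failed", "cancelled"}


def fetch_json(path, api_key):
    """GET a path on the API and decode the JSON body."""
    request = urllib.request.Request(
        API_BASE + path,
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def wait_for_pipeline(created, fetch=None, interval=5.0, max_polls=120):
    """Poll the progress link of a create response until a terminal status."""
    if fetch is None:
        api_key = os.environ["TALONIC_API_KEY"]
        fetch = lambda path: fetch_json(path, api_key)
    progress_path = created["links"]["progress"]
    for _ in range(max_polls):
        progress = fetch(progress_path)
        # ASSUMPTION: the progress body has a top-level "status" field.
        if progress.get("status") in TERMINAL_STATUSES:
            return progress
        time.sleep(interval)
    raise TimeoutError(
        f"pipeline {created['id']} not terminal after {max_polls} polls")
```

Passing the decoded create response straight in, as `wait_for_pipeline(created)`, keeps the caller from having to rebuild the progress URL by hand. The `fetch` parameter exists so the loop can be exercised without network access.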
A pipeline run requires the Spec to have a composed rail. If the schema has no rail, the pipeline endpoint returns a 400 telling you to configure the Spec first, or to use POST /v1/jobs for a quick one-off run. Set the rail with PUT /v1/schemas/{id}/rail before your first run.

Frequently asked questions

What is the difference between a Spec, a Pipeline, and a Job?
A Spec is a reusable configuration over one schema: the composed rail, attached Data Policies, validation checkpoints, and assembly rule. A Pipeline is one compiled run of that Spec over a set of documents. A Job is the quick one-time tier for ad-hoc structuring that does not justify a Spec. Specs and Pipelines are the production path; Jobs are the fast path.
What is the One Engine?
The One Engine is the per-document phase runner that executes a compiled Spec. Each document passes through Transfer (registry fill), Extraction (AI for gaps), Resolution (Data Policies), and Validation (your gates) in strict sequence, with documents processed in parallel. It replaces the older structuring-run engine and writes every value as a versioned cell in the value plane.
Do I need to call a start endpoint after creating a pipeline?
No. The public POST /v1/pipelines endpoint compiles the Spec rail and enqueues the documents in a single call. The response status is "active", and enqueued_documents reports how many documents were queued. Poll GET /v1/pipelines/{id}/progress to watch each phase advance.
Why does my pipeline run fail with "no composed rail"?
A pipeline needs the Spec to have a rail to compile. Set one with PUT /v1/schemas/{id}/rail. A minimal runnable rail is a single extraction stage. If you only need a one-off structuring run without configuring a Spec, use POST /v1/jobs instead.
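A client can recover from the missing-rail case by setting a rail and retrying once. The sketch below is illustrative only. The endpoint paths and the 400 status come from this page, but the rail payload in `MINIMAL_RAIL` is a guessed placeholder for "a single extraction stage", and the helpers `send` and `run_spec` are hypothetical. It also treats any 400 as a missing rail, which real code should narrow by inspecting the error body.

```python
import json
import urllib.error
import urllib.request

API_BASE = "https://api.talonic.com"  # host taken from the curl example

# ASSUMPTION: the rail payload shape is not shown on this page.
MINIMAL_RAIL = {"stages": [{"type": "extraction"}]}


def send(method, path, body, api_key):
    """Send a JSON request and return (status_code, decoded_body)."""
    request = urllib.request.Request(
        API_BASE + path,
        data=json.dumps(body).encode("utf-8"),
        method=method,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
    try:
        with urllib.request.urlopen(request, timeout=30) as response:
            return response.status, json.load(response)
    except urllib.error.HTTPError as error:
        # 4xx/5xx arrive as exceptions; return them as ordinary results.
        text = error.read().decode("utf-8", "replace")
        try:
            return error.code, json.loads(text)
        except ValueError:
            return error.code, {"raw": text}


def run_spec(schema_id, document_ids, request, rail=MINIMAL_RAIL):
    """Start a pipeline; on a 400, set a rail and retry exactly once."""
    payload = {"schema_id": schema_id, "document_ids": document_ids}
    status, body = request("POST", "/v1/pipelines", payload)
    if status == 400:
        # ASSUMPTION: this 400 means the Spec has no composed rail.
        put_status, put_body = request(
            "PUT", f"/v1/schemas/{schema_id}/rail", rail)
        if put_status >= 400:
            raise RuntimeError(f"setting rail failed: {put_status} {put_body}")
        status, body = request("POST", "/v1/pipelines", payload)
    if status >= 400:
        raise RuntimeError(f"pipeline create failed: {status} {body}")
    return body
```

To call the real API, bind the key first, for example `request = functools.partial(send, api_key=os.environ["TALONIC_API_KEY"])`, then call `run_spec("sch_delivery_notes", ["doc_7f3a1b2c"], request)`. Because a PUT replaces whatever rail is configured, the retry is only appropriate for Specs that genuinely have none.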