
Node Jobs

Run a single pipeline stage as a standalone async job — transfer, extract, resolve, validate, or assemble — chained over one record set.

Node jobs expose the individual stages of the structuring pipeline as standalone primitives. Where a Spec run (POST /v1/pipelines) executes the whole rail end to end, node jobs let you run one stage at a time (Transfer, Extraction, Resolution, Validation, or Assembly) and chain the stages yourself over a shared record_set_id. Each call is asynchronous: it returns a node-run id that you poll for status and results.

The typical chain starts with POST /v1/nodes/transfer (or POST /v1/nodes/extract) against a set of document_ids and a schema_id; this creates a record set and fills its cells. Each subsequent stage takes the record_set_id from the previous one: extract fills the gaps the registry could not fill, resolve applies Data Policies, validate runs your gates, and assemble composes grouped documents. This gives you fine-grained control over the structuring flow when a single Spec run is too coarse.
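The chain described above can be sketched as follows. This is a minimal illustration, not an official client: it assumes a hypothetical `run_stage(path, body)` callable that submits a stage and blocks until its node run finishes (for example, a POST followed by the polling loop described under "Polling a node run"), returning the finished run as a dict. It also assumes the finished run exposes the record set under a `record_set_id` field; where that id actually appears in the API responses is not specified here.

```python
from typing import Callable, Dict, List

# Hypothetical transport: submit a stage at `path` with a JSON `body` and
# return the finished node run as a dict. Not part of any published SDK.
RunStage = Callable[[str, Dict], Dict]

# Stages that follow the opening transfer, in the order described above.
FOLLOW_UP_STAGES = ["extract", "resolve", "validate", "assemble"]


def run_chain(run_stage: RunStage, document_ids: List[str], schema_id: str) -> str:
    """Run transfer, then each later stage over the same record set.

    Returns the record_set_id that was carried through the chain.
    """
    # The opening stage takes document_ids and a schema_id and creates the record set.
    first = run_stage("/v1/nodes/transfer", {
        "document_ids": document_ids,
        "schema_id": schema_id,
    })
    # Assumption: the finished run reports the new record set as "record_set_id".
    record_set_id = first["record_set_id"]

    for stage in FOLLOW_UP_STAGES:
        run = run_stage(f"/v1/nodes/{stage}", {"record_set_id": record_set_id})
        # Carry the record set forward; keep the current id if the stage
        # does not echo one back (assumed behavior).
        record_set_id = run.get("record_set_id", record_set_id)
    return record_set_id
```

Injecting `run_stage` keeps the chaining logic separate from HTTP and polling details, so the same function works against a real client or a test double.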

Node jobs are an advanced primitive. Most integrations should run a configured Spec via POST /v1/pipelines, which compiles and runs the whole rail in one call. Reach for node jobs when you need to drive stages individually.
POST /v1/nodes/transfer
POST /v1/nodes/extract
POST /v1/nodes/resolve
POST /v1/nodes/validate
POST /v1/nodes/assemble
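Submitting any of the five stage endpoints above could look roughly like this, using only the Python standard library. The base URL, the Bearer authorization scheme, and the `id` field on the response are all assumptions made for illustration; substitute whatever your deployment and the API reference actually specify.

```python
import json
import urllib.request
from typing import Dict

# Hypothetical host; replace with your real API base URL.
BASE_URL = "https://api.example.com"

# The five stage endpoints listed above.
STAGES = {"transfer", "extract", "resolve", "validate", "assemble"}


def build_stage_request(stage: str, body: Dict, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST /v1/nodes/<stage> request."""
    if stage not in STAGES:
        raise ValueError(f"unknown stage: {stage}")
    return urllib.request.Request(
        f"{BASE_URL}/v1/nodes/{stage}",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumption: Bearer token auth; use the scheme the API requires.
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def submit_stage(stage: str, body: Dict, api_key: str) -> str:
    """Send the stage request and return the node-run id.

    Assumes the response body is JSON with the node-run id under "id".
    """
    req = build_stage_request(stage, body, api_key)
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["id"]
```

For example, `submit_stage("resolve", {"record_set_id": record_set_id}, api_key)` would queue a Resolution run over an existing record set and return immediately with its node-run id.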

Polling a node run

Each stage call returns a node-run id. Poll GET /v1/nodes/:id for its status and progress, and GET /v1/nodes/:id/results for its output once it completes. Carry the record_set_id forward to the next stage to chain the pipeline.

GET /v1/nodes/:id
GET /v1/nodes/:id/results
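A polling loop over these two endpoints might look like the sketch below. It assumes a hypothetical `get_json(path)` callable that performs an authenticated GET and returns the parsed JSON body, and it assumes the status response carries a `status` field whose terminal values are `"completed"` and `"failed"`. Those status strings are placeholders; check the actual values the API returns.

```python
import time
from typing import Callable, Dict

# Hypothetical transport: GET `path` and return the parsed JSON body.
GetJson = Callable[[str], Dict]

# Assumed terminal status values; confirm the real strings in the API reference.
TERMINAL_STATUSES = {"completed", "failed"}


def wait_for_results(get_json: GetJson, node_run_id: str,
                     interval_s: float = 2.0, timeout_s: float = 600.0,
                     sleep: Callable[[float], None] = time.sleep) -> Dict:
    """Poll GET /v1/nodes/:id until the run ends, then fetch /results."""
    waited = 0.0
    while True:
        run = get_json(f"/v1/nodes/{node_run_id}")
        status = run.get("status")
        if status in TERMINAL_STATUSES:
            break
        if waited >= timeout_s:
            raise TimeoutError(
                f"node run {node_run_id} still {status!r} after {waited}s")
        sleep(interval_s)
        waited += interval_s

    # Only a successful run has results worth fetching.
    if status != "completed":
        raise RuntimeError(f"node run {node_run_id} ended with status {status!r}")
    return get_json(f"/v1/nodes/{node_run_id}/results")
```

A fixed interval keeps the sketch simple; for long-running stages you may prefer exponential backoff to reduce request volume. The `sleep` parameter is injectable so the loop can be exercised without real delays.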