Running a Pipeline

Running a pipeline applies a Spec to a set of documents. Each document is enqueued as its own job and processed in parallel; within a document the phases run in strict sequence. The run is observable while it executes: the progress endpoint reports per-phase counts (pending, running, completed, errors), so you can watch transfer drain into extraction, extraction into resolution, and so on. Results appear progressively, the same way they do for a Job.

A document that clears every phase finishes as complete. A document with fields still blocked at a validation gate, held for declarative review, or waiting on a dependency finishes as partial: its clean fields are done, and the flagged fields wait in the review queue. This is by design. The pipeline never silently ships a value that failed a gate; it holds that cell and surfaces it for a human decision while the rest of the document proceeds.
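The complete/partial rule above can be illustrated with a small sketch. The `FieldState` names and the `document_status` helper are hypothetical, chosen only to mirror the three hold reasons named in the text; they are not part of the API, and documents that fail with an error are not modelled here.

```python
from enum import Enum


class FieldState(Enum):
    """Hypothetical per-field states mirroring the hold reasons in the text."""
    CLEAN = "clean"      # cleared every phase
    BLOCKED = "blocked"  # blocked at a validation gate
    HELD = "held"        # held for declarative review
    WAITING = "waiting"  # waiting on a dependency


def document_status(field_states):
    """Return 'complete' when every field is clean, otherwise 'partial'."""
    flagged = [state for state in field_states if state is not FieldState.CLEAN]
    return "partial" if flagged else "complete"
```

A single flagged field is enough to make the document partial, while its clean fields are still treated as done.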

Once a pipeline is finished you can turn it into a data product: an assembled, deliverable dataset that reads the latest canonical value per cell. The data product owns its own review surface, so creating one replays the pipeline gates over the stored verdicts to reconstruct the review queue. Held cells surface with their status only, and no value, until a reviewer resolves them. From there the product flows into delivery the same as any other data product.

Polling progress

Poll the progress endpoint to track a run. It derives its phase rows from the Spec rail, so the phases you see match what you composed, and it returns an errors array with per-document messages for anything that failed.

Check pipeline progress
curl https://api.talonic.com/v1/pipelines/pl_x8k2m9/progress \
  -H "Authorization: Bearer $TALONIC_API_KEY"

Response
{
  "pipelineId": "pl_x8k2m9",
  "status": "active",
  "totalDocuments": 2,
  "completedDocuments": 1,
  "errorDocuments": 0,
  "phases": [
    { "phaseId": "transfer", "name": "Field Registry", "type": "transfer",
      "completed": 2, "running": 0, "pending": 0, "errors": 0 },
    { "phaseId": "extraction", "name": "Extraction", "type": "extraction",
      "completed": 1, "running": 1, "pending": 0, "errors": 0 }
  ],
  "errors": []
}
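A polling loop over this endpoint might look like the following sketch, using only the Python standard library. The helper names (`fetch_progress`, `summarize`, `poll`) and the polling interval are illustrative. The sketch also assumes that any status other than "active" means the run has stopped, since "active" is the only status value shown above.

```python
import json
import time
import urllib.request

BASE_URL = "https://api.talonic.com/v1"


def fetch_progress(pipeline_id, api_key):
    """GET the progress document for one pipeline and parse it as JSON."""
    request = urllib.request.Request(
        f"{BASE_URL}/pipelines/{pipeline_id}/progress",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def summarize(progress):
    """Build one human-readable line per phase from the per-phase counts."""
    return [
        f"{phase['name']}: {phase['completed']} completed, "
        f"{phase['running']} running, {phase['pending']} pending, "
        f"{phase['errors']} errors"
        for phase in progress["phases"]
    ]


def poll(pipeline_id, api_key, interval_seconds=5.0, fetch=fetch_progress):
    """Print phase counts until the pipeline leaves the "active" status.

    Assumption: any status other than "active" means the run has stopped.
    The `fetch` parameter exists so the loop can be exercised without network.
    """
    while True:
        progress = fetch(pipeline_id, api_key)
        for line in summarize(progress):
            print(line)
        for error in progress["errors"]:
            print("error:", error)
        if progress["status"] != "active":
            return progress
        time.sleep(interval_seconds)
```

Calling `poll("pl_x8k2m9", os.environ["TALONIC_API_KEY"])` (after `import os`) would print one line per phase on each pass and return the last progress document once the status changes.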

Re-run and data product

A finished pipeline can be re-run from a phase onward without redoing the work before it: pass the phase type to start from, and earlier phases reuse their stored cells. When the output looks right, create a data product from the run to make it deliverable.

Re-run from resolution onward
curl -X POST https://api.talonic.com/v1/pipelines/pl_x8k2m9/rerun \
  -H "Authorization: Bearer $TALONIC_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{ "from_phase": "resolution" }'

Create a data product from the run
curl -X POST https://api.talonic.com/v1/pipelines/pl_x8k2m9/data-product \
  -H "Authorization: Bearer $TALONIC_API_KEY"

Held and blocked cells surface with status only and a null value in data-product reads, the public share page, and CSV exports, until a reviewer resolves them. This holdback is enforced at read time, so an in-review value can never leak into delivered output.
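The two calls above can also be issued from a script. The following is a minimal sketch using only the Python standard library. The function names are illustrative, and `send` assumes the endpoints reply with a JSON body or an empty one, which the text above does not specify.

```python
import json
import urllib.request

BASE_URL = "https://api.talonic.com/v1"


def rerun_request(pipeline_id, from_phase, api_key):
    """Build the POST that re-runs a pipeline from `from_phase` onward."""
    body = json.dumps({"from_phase": from_phase}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/pipelines/{pipeline_id}/rerun",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


def data_product_request(pipeline_id, api_key):
    """Build the POST that creates a data product from a finished run."""
    return urllib.request.Request(
        f"{BASE_URL}/pipelines/{pipeline_id}/data-product",
        method="POST",
        headers={"Authorization": f"Bearer {api_key}"},
    )


def send(request):
    """Send a prepared request; assumes a JSON (or empty) response body."""
    with urllib.request.urlopen(request) as response:
        raw = response.read()
        return json.loads(raw) if raw else None
```

For example, `send(rerun_request("pl_x8k2m9", "resolution", api_key))` mirrors the first curl command, and `send(data_product_request("pl_x8k2m9", api_key))` mirrors the second. Building the request separately from sending it keeps the URL, method, and body easy to inspect before anything goes over the network.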

Frequently asked questions

How do I track a running pipeline?
Poll GET /v1/pipelines/{id}/progress. It reports overall status and per-phase counts (pending, running, completed, errors) derived from the Spec rail, plus an errors array with per-document messages. Results are progressive, so you can begin reviewing completed documents while others are still processing.
Why did a document finish as "partial"?
A document finishes partial when it has fields blocked at a validation gate, held for declarative review, or waiting on a dependency. Its clean fields are complete; the flagged fields wait in the review queue for a human decision. The pipeline never ships a value that failed a blocking gate.
Can I re-run only part of a pipeline?
Yes. POST /v1/pipelines/{id}/rerun with from_phase re-runs from that phase onward and reuses the stored cells from earlier phases. This is useful after changing a Data Policy or a gate: re-run from resolution or validation without repeating transfer and extraction.
How does a pipeline become deliverable?
Create a data product from the finished run with POST /v1/pipelines/{id}/data-product. The data product reads the latest canonical value per cell, owns its own review surface, and feeds delivery. Cells still in review surface with status only and a null value until resolved, so an in-review value never reaches delivered output.