Dataset Templates

A data product packages your extraction, resolution, and validation outputs into a shareable, deliverable dataset. Each product wraps one or more run_ids, a validation_run_id, or a pipeline_id, and moves through a status lifecycle of draft, ready, published, and archived (archiving is a soft delete, so the underlying data is retained).

The standalone Dataset Templates page is a non-functional stub. The earlier model, in which templates defined the output shape and assemblies combined sources, has been superseded. There is no dataset-template object and no separate template step: a data product is composed directly from its source runs, validation run, or pipeline.

Data products live under Data Products (/delivery/data-products), a tabbed shell that shows the data-products list alongside a Delivery tab. The Delivery surface itself is hidden from the main nav. Open the list to see each product's name, the schema it was built against, its document counts, and its status.

Shaping the output

To shape the output of a product, configure the upstream pipeline rather than a separate template. Column order, renamed headers, excluded fields, and transforms are governed by the schema and the Resolution stage of the pipeline that feeds the product. Re-running the pipeline produces a fresh set of run values, and the data product reflects them on its next assembly.

Most teams keep one data product per downstream consumer. If your finance team and operations team need different views of the same documents, build two products from the appropriate runs (or pipelines) rather than reconfiguring a single export each time. Archive a product when it is no longer needed: the soft delete keeps its history available for audit while removing it from the active list.

A data product wraps run_ids[], a validation_run_id, or a pipeline_id
Status lifecycle: draft → ready → published → archived (archive is a soft delete)
Output shape is governed by the schema and pipeline, not a separate template object
Lives under Data Products (/delivery/data-products), a tabbed shell with the list plus a Delivery tab
The Dataset Templates page is a non-functional stub
One data product per downstream consumer is the recommended pattern

List data products, filtered by status

curl -s "https://api.talonic.com/v1/data-products?status=published" \
  -H "Authorization: Bearer $TALONIC_API_KEY"

# Response:
# {
#   "data": [
#     {
#       "id": "6a7b8c9d-…",
#       "name": "Q1 2026 Invoice Extract",
#       "description": "Finance handover set",
#       "schema_id": "…",
#       "run_id": "…",
#       "status": "published",
#       "created_at": "2026-04-01T08:00:00Z"
#     }
#   ],
#   "pagination": { ... }
# }
# status filter accepts: draft, ready, published, archived
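
Because the status filter accepts only four values, a small wrapper can reject a typo before any request is sent. The following is a minimal sketch: the function names valid_status and list_url are hypothetical helpers, and only the endpoint and the four status values come from the example above.

```shell
# Hypothetical helpers; only the endpoint and the four status values
# come from the documentation above.

# Succeeds (exit 0) when $1 is one of the documented status values.
valid_status() {
  case "$1" in
    draft|ready|published|archived) return 0 ;;
    *) return 1 ;;
  esac
}

# Prints the list URL for a status, or fails on an unknown status.
list_url() {
  valid_status "$1" || { echo "unknown status: $1" >&2; return 1; }
  printf 'https://api.talonic.com/v1/data-products?status=%s\n' "$1"
}

# Usage (requires TALONIC_API_KEY to be set):
#   curl -s "$(list_url published)" -H "Authorization: Bearer $TALONIC_API_KEY"
```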

Create a data product from completed runs

# The public API create path is run-backed; validation-session and
# pipeline-backed products are created from within the app.
curl -X POST https://api.talonic.com/v1/data-products \
  -H "Authorization: Bearer $TALONIC_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Q1 2026 Invoice Extract",
    "run_ids": ["d4e5f6a7-…"],
    "thresholds": { "min_confidence": 0.8 }
  }'
# -> 201: the created product, with its share token minted
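
The request body can also be built by a small function so the same shape is reused across scripts. This is a sketch under stated assumptions: create_payload is a hypothetical helper, and require_validation_pass is assumed to be a boolean placed inside thresholds next to min_confidence (the field name is documented, its type and position are not confirmed here). The helper does not JSON-escape its arguments, so it suits simple names only.

```shell
# Hypothetical helper: prints a create payload for a single run ID.
# Assumption: require_validation_pass is a boolean inside "thresholds".
# Arguments: $1 = product name, $2 = run ID, $3 = min_confidence (default 0.8).
# Note: no JSON escaping is applied, so avoid quotes or backslashes in $1.
create_payload() {
  printf '{"name":"%s","run_ids":["%s"],"thresholds":{"min_confidence":%s,"require_validation_pass":true}}\n' \
    "$1" "$2" "${3:-0.8}"
}

# Usage (requires TALONIC_API_KEY; RUN_ID is a placeholder for a completed run):
#   curl -X POST https://api.talonic.com/v1/data-products \
#     -H "Authorization: Bearer $TALONIC_API_KEY" \
#     -H "Content-Type: application/json" \
#     -d "$(create_payload "Q1 2026 Invoice Extract" "$RUN_ID")"
```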

The data product is the bridge between your pipeline and production-ready delivery. A schema defines which fields to extract; the pipeline (extraction, resolution, validation) governs how those fields appear in the final output. The product wraps the resulting runs (or a validation run, or a pipeline) so the assembled dataset is reproducible. To serve different consumers, build separate products from the appropriate runs rather than maintaining a separate template layer.
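
The one-product-per-consumer pattern can be scripted by generating one create payload per consumer. The sketch below is illustrative only: payloads_per_consumer, the "<consumer> extract" naming scheme, and the RUN_ID_A / RUN_ID_B placeholders are all hypothetical, and each printed payload would still need to be sent to POST /v1/data-products.

```shell
# Hypothetical helper: reads "consumer run_id" pairs from stdin and prints
# one create payload per line. Names and run IDs here are placeholders.
payloads_per_consumer() {
  while read -r consumer run_id; do
    # Skip blank lines.
    [ -n "$consumer" ] || continue
    printf '{"name":"%s extract","run_ids":["%s"]}\n' "$consumer" "$run_id"
  done
}

# Usage: one product for finance, one for operations, each from its own run.
printf '%s\n' "finance RUN_ID_A" "operations RUN_ID_B" | payloads_per_consumer
```

Keeping the consumer-to-run mapping in one place makes it clear which runs each product references, which is the only thing that differs between the products.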

Data products are workspace-scoped. Any team member can create, view, or archive a product. Archiving is a soft delete: the product leaves the active list but its history is retained for audit and can be referenced later.

Frequently asked questions

Is there a Dataset Templates feature?

No. The Dataset Templates page is a non-functional stub. The earlier model, in which templates defined the output shape and assemblies combined sources, has been superseded. A data product is composed directly from its source runs, a validation run, or a pipeline.

What does a data product wrap?

A data product wraps one or more run_ids, a validation_run_id, or a pipeline_id. Its status moves through draft, ready, published, and archived, where archiving is a soft delete that retains the underlying history.

How do I shape the output of a data product?

Configure the upstream pipeline. The schema and the Resolution stage govern column order, renamed headers, excluded fields, and transforms. Re-running the pipeline updates the run values the product assembles from.

How do I serve different downstream consumers?

Build a separate data product per consumer from the appropriate runs or pipelines. There is no separate template object to maintain. The underlying extraction is shared; only the runs each product references differ.

Can I create a data product via the API?

Yes. POST /v1/data-products with a name and one or more completed run IDs creates a run-backed product and mints its share token. Optional quality thresholds (min_confidence, require_validation_pass, require_approval) gate which values are included. Validation-session and pipeline-backed products are created from within the app.
