Assemblies

An assembly combines documents from one or more sources into a single structured dataset based on a template. Assemblies track their constituent documents, source counts, and processing status.

Navigate to Data Products → Assemblies to view and create assemblies. Each assembly shows its document count, linked schema, processing status, and the date it was created.

When you create an assembly, you select a dataset template and one or more document sources. The system pulls all matching documents, applies the template's column mappings and transforms, and produces a single structured output. The assembly tracks which documents contributed to each row, giving you full traceability from output back to source.
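The flow described above can be sketched roughly as follows. This is an illustrative model only, not the platform's implementation: the `Document`, `Column`, and `assemble` names and the `_source_document` and `_source` provenance fields are hypothetical, and the sketch assumes one output row per document for simplicity.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Document:
    """Hypothetical stand-in for an extracted document."""
    doc_id: str
    source: str
    fields: dict[str, str]


@dataclass(frozen=True)
class Column:
    """Hypothetical template column: a mapping plus an optional transform."""
    name: str                # output column name
    source_field: str        # extracted field the column reads from
    transform: Optional[Callable[[str], str]] = None


def assemble(template: list[Column], documents: list[Document]) -> list[dict[str, str]]:
    """Apply the template to every document and record where each row came from."""
    rows = []
    for doc in documents:
        row = {}
        for col in template:
            value = doc.fields.get(col.source_field, "")
            row[col.name] = col.transform(value) if col.transform else value
        # Provenance columns (assumed names) link the row back to its source document.
        row["_source_document"] = doc.doc_id
        row["_source"] = doc.source
        rows.append(row)
    return rows


template = [Column("invoice_number", "invoice_no", str.strip), Column("amount", "total")]
documents = [Document("doc_1", "drive", {"invoice_no": " 00123 ", "total": "10.50"})]
print(assemble(template, documents))
```

Because the template is separate from the document list, the same `template` can be reused with a different set of documents to produce the same output shape.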

Use assemblies whenever you need a repeatable, auditable output for downstream systems or stakeholders. Most teams create one assembly per reporting period or delivery cycle. Because assemblies reference a template, you can regenerate the same output shape from different document sets without reconfiguring columns or transforms each time.

Assemblies also support incremental updates. When new documents arrive in a source that is already part of an assembly, you can regenerate the assembly to include them without reconfiguring anything. The system re-applies the template, pulls the updated document set, and produces a fresh output. Previous assembly versions are retained for comparison, so you can track how your dataset evolves over successive runs.

  • Select a dataset template and one or more document sources to create an assembly
  • Column mappings and transforms from the template are applied automatically
  • Full traceability from every output row back to its source document
  • Incremental updates — regenerate to include newly arrived documents
  • Previous assembly versions retained for comparison and auditing
  • Export the assembled dataset as CSV with leading zeros preserved

Export an assembled dataset as CSV
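# Download the assembled dataset with leading zero preservation.
# --fail makes curl exit non-zero on an HTTP error response
# rather than writing the error body to the output file.
curl -sS --fail "https://api.talonic.com/v1/data-products/dp_001/export?format=csv" \
  -H "Authorization: Bearer $TALONIC_API_KEY" \
  -o "q1_invoices.csv"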

# CSV values are never coerced to numbers — leading zeros
# on codes like ZIP codes and account numbers are preserved.
# Fields like "00123" remain "00123", not 123.
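
Leading zeros that survive the export can still be lost downstream if the consumer parses values as numbers. The following is a minimal sketch of reading the export with Python's standard `csv` module, which returns every value as a string. The column names `account_number` and `zip_code` are assumptions for illustration, not fields the export is known to contain.

```python
import csv
import io


def read_export(csv_text: str) -> list[dict[str, str]]:
    """Parse CSV text into rows, keeping every value as a string."""
    return list(csv.DictReader(io.StringIO(csv_text)))


# Hypothetical sample; for the downloaded file, read it with
# open("q1_invoices.csv", newline="") and pass the contents in.
sample = "account_number,zip_code\n00123,02134\n"
rows = read_export(sample)
print(rows[0]["account_number"])  # prints 00123
```

If you load the file with pandas instead, passing `dtype=str` to `read_csv` keeps values as text in the same way.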

Assemblies fit recurring reporting schedules. A common pattern is a weekly assembly that pulls newly arrived documents from connected sources, applies the template transforms, and produces a fresh output. Because previous versions are retained, you can compare this week's output against last week's to see which records were added, which values changed, and which documents were removed. This comparison is particularly useful for reconciliation workflows, where you need to track what changed between reporting periods.
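
As an illustration of the week-over-week comparison, here is a minimal client-side sketch that diffs two CSV exports. It is not the platform's own comparison feature. It assumes each row carries a unique key column, and `invoice_id` and the sample data are hypothetical.

```python
import csv
import io


def load_by_key(csv_text: str, key: str) -> dict[str, dict[str, str]]:
    """Index the rows of a CSV export by the value of the key column."""
    return {row[key]: row for row in csv.DictReader(io.StringIO(csv_text))}


def diff_exports(previous_csv: str, current_csv: str, key: str) -> dict[str, list[str]]:
    """Return the keys that were added, removed, or changed between two exports."""
    previous = load_by_key(previous_csv, key)
    current = load_by_key(current_csv, key)
    shared = previous.keys() & current.keys()
    return {
        "added": sorted(current.keys() - previous.keys()),
        "removed": sorted(previous.keys() - current.keys()),
        "changed": sorted(k for k in shared if previous[k] != current[k]),
    }


last_week = "invoice_id,amount\n001,10.00\n002,20.00\n003,30.00\n"
this_week = "invoice_id,amount\n001,10.00\n002,25.00\n004,40.00\n"
print(diff_exports(last_week, this_week, "invoice_id"))
```

Keys are compared as strings, so identifiers with leading zeros such as `001` match across exports without any numeric conversion.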

Assemblies are the recommended way to produce production datasets. They provide a single audit trail from source documents through extraction, resolution, and validation to the final output. If your workflow requires repeatable, auditable deliverables, assemblies eliminate the need for manual export configuration on every run.

Frequently asked questions

What is an assembly?
An assembly combines documents from one or more sources into a single structured dataset based on a template. It tracks constituent documents, source counts, and processing status.
Why should I use assemblies for production data?
Assemblies provide a single audit trail from source documents through extraction, resolution, and validation to the final output, making them the recommended approach for production datasets. Unlike ad-hoc exports, assemblies are versioned and reproducible — you can regenerate the same output shape from different document sets without reconfiguring columns or transforms. Previous versions are retained automatically, so you can compare outputs across time periods and demonstrate compliance with audit requirements.
Can an assembly pull from multiple sources?
Yes. An assembly can combine documents from any number of sources — uploaded files, connected drives, email attachments, and more — into a single structured dataset. This is particularly useful for cross-functional reporting where data arrives through different channels. For example, you can combine invoices from a Google Drive connector, purchase orders uploaded manually, and contracts ingested via the API into a single unified procurement dataset.
How do I regenerate an assembly with updated documents?
Navigate to the assembly detail page and click Regenerate, or trigger regeneration via the API. The system re-applies the template, pulls the updated document set from all configured sources, and produces a fresh output. The previous assembly version is retained for comparison. You do not need to reconfigure columns or transforms, because the template already defines them.