Assembly

Assembly is the post-run compose step. Where the per-document phases each handle one document, assembly works across documents: it groups them and folds each group into a single composed record. It runs once per run, after every document in the pipeline reaches a terminal state, so it composes the resolved values (the latest cell version per field), not raw extraction output. Assembly is how a pipeline turns many related documents into one authoritative row.

Assembly is driven by two fields you name in the rail. The grouping field decides which documents belong together. Within each group, the anchor field identifies the Anchor document: the one whose anchor field matches a configured anchor value. The anchor seeds the composed record, and the other documents in the group act as Amendments that override selected fields according to the assembly rule. A group with no anchor, or with more than one, is a conflict and composes nothing, so an ambiguous group is surfaced rather than guessed.
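The grouping, anchor, and conflict behavior can be sketched in a few lines of Python. This is an illustration only, not the product's implementation: the function name `compose_groups`, the plain-dict document shape, and the field names in the usage note are hypothetical. It also assumes that amendments apply in input order and that only non-null values override, neither of which is stated above.

```python
from collections import defaultdict


def compose_groups(docs, grouping_field, anchor_field, anchor_value, amendable_fields):
    """Group docs, pick one anchor per group, and fold amendments into it.

    Hypothetical sketch: documents are plain dicts of resolved values.
    Returns (composed, conflicts), both keyed by the grouping value.
    """
    groups = defaultdict(list)
    for doc in docs:
        groups[doc.get(grouping_field)].append(doc)

    composed, conflicts = {}, {}
    for key, members in groups.items():
        anchors = [d for d in members if d.get(anchor_field) == anchor_value]
        if len(anchors) != 1:
            # Zero or multiple anchors: surface the conflict, compose nothing.
            conflicts[key] = f"expected exactly 1 anchor, found {len(anchors)}"
            continue

        anchor = anchors[0]
        record = dict(anchor)  # the anchor seeds the composed record
        for doc in members:
            if doc is anchor:
                continue
            # Assumption: amendments apply in input order, and only
            # non-null values in the amendable set override.
            for name in amendable_fields:
                if doc.get(name) is not None:
                    record[name] = doc[name]
        composed[key] = record
    return composed, conflicts
```

For example, with an original order and one revision sharing `contract_id` "C1", and a lone revision under "C2", the first group composes with the revision's amount and the second is reported as a conflict because it has no anchor.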

The composed overrides are written back onto the anchor record as new cell versions with an assembly source. As a result, the anchor's cell version history reads extraction, then resolution, then assembly, in order, and each assembly cell records the override detail: the previous value, the source document it came from, and the rule that produced it. The version history is therefore the assembly audit trail. The full composed product is also materialized as its own record set for delivery.
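To make the shape of that audit trail concrete, here is a minimal sketch of a per-field version history. The `CellVersion` class, the `apply_override` helper, and the keys in `detail` are hypothetical names chosen for illustration; the actual storage model is not described here.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class CellVersion:
    """One entry in a field's version history (hypothetical shape)."""
    value: Any
    source: str  # "extraction", "resolution", or "assembly"
    detail: Optional[dict] = None  # override detail, set for assembly versions


def apply_override(history, new_value, source_document, rule):
    """Append an assembly version that records what it replaced and why."""
    previous = history[-1].value  # the latest version is the resolved value
    history.append(
        CellVersion(
            value=new_value,
            source="assembly",
            detail={
                "previous_value": previous,
                "source_document": source_document,
                "rule": rule,
            },
        )
    )
    return history
```

Earlier versions are never rewritten in this sketch: the override is appended, so reading the list top to bottom gives extraction, resolution, then assembly.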

The grouping field can be a customer-injected field that never appears in the extracted data. Assembly merges each document's injected fields with its extracted cells before grouping, so you can group by an external contract ID supplied at ingest even though that ID was never written into the run.
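The merge-before-group step might look like the sketch below. The document shape (`id`, `extracted`, `injected`) and the function name `group_documents` are assumptions for illustration. The sketch lets an injected field win if it shares a name with an extracted cell; the precedence actually used is not stated above.

```python
from collections import defaultdict


def group_documents(documents, grouping_field):
    """Group document ids by a field that may exist only as an injected value.

    Hypothetical sketch: each document is a dict with "id", "extracted",
    and "injected" keys.
    """
    groups = defaultdict(list)
    for doc in documents:
        # Merge first, so the grouping field is visible even when it was
        # never extracted. Assumption: injected values win on a name clash.
        merged = {**doc["extracted"], **doc["injected"]}
        groups[merged.get(grouping_field)].append(doc["id"])
    return dict(groups)
```

With this shape, two documents injected with the same contract ID at ingest land in the same group even though neither document's extracted cells contain that ID.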

A common shape is a base document amended by later ones: an original order and its revisions, or a master agreement and its addenda. The original is the anchor; each amendment overrides the fields it changes. The composed record reflects the current state of the agreement, while the version history on every field preserves exactly which document set each value and when. This gives you both the answer and its provenance in one place.

Assembly is configured as an assembly stage in the rail, naming the grouping and anchor fields and the amendable field set. Because assembly composes the resolved values, place your resolution and validation stages before it so the values it folds together are already normalized and checked. After a run you can re-compose without redoing extraction by re-running only the assembly step over the existing cells.

Assembly composes only after all documents are terminal, so a pipeline with documents still in review will not produce a final composed record for those groups until the review clears. Resolve the held fields in the review queue, then re-run assembly to fold the now-canonical values into the anchor record.
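One way to picture this gating is a check that separates groups that are ready to compose from groups still waiting on review. The state names (`done`, `failed`, `in_review`) and the function `split_ready_groups` are hypothetical; the sketch only illustrates the rule that a group with any non-terminal document is held back.

```python
from collections import defaultdict

# Assumption: these state names are illustrative, not the product's own.
TERMINAL_STATES = {"done", "failed"}


def split_ready_groups(docs, grouping_field):
    """Return (ready, held) lists of group keys, in first-seen order."""
    groups = defaultdict(list)
    for doc in docs:
        groups[doc[grouping_field]].append(doc)

    ready, held = [], []
    for key, members in groups.items():
        if all(d["state"] in TERMINAL_STATES for d in members):
            ready.append(key)
        else:
            held.append(key)  # at least one document is not terminal yet
    return ready, held
```

Once the held fields are resolved and the documents become terminal, running the same check again moves the group from held to ready.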

Frequently asked questions

What does assembly do?
Assembly composes grouped documents into a single record after a pipeline finishes. It groups documents by a grouping field, picks an anchor document per group, seeds the record from the anchor, and lets amendment documents override selected fields. It runs once after every document is terminal, composing resolved values rather than raw extraction.

How does assembly choose which document leads a group?
By the anchor field. Within a group, the anchor is the document whose anchor field matches a configured anchor value; it seeds the composed record. Other documents are amendments that override specific fields. A group with zero or multiple anchors is a conflict and composes nothing, so ambiguity is surfaced, not guessed.

Where is the audit trail for a composed value?
In the cell version history. Assembly writes overrides onto the anchor record as new cell versions with an assembly source, so each field reads extraction, then resolution, then assembly. Every assembly cell records the previous value, the source document, and the rule that set it.

Can I group by a field that is not in the documents?
Yes. Assembly merges each document's customer-injected fields with its extracted cells before grouping, so the grouping field can be an external identifier supplied at ingest (for example a contract ID) even though it was never written into the run's extracted data.