Assembly
Assembly is the post-run compose step. Where the per-document phases each handle one document, assembly works across documents: it groups them and folds each group into a single composed record. It runs once, after every document in the pipeline reaches a terminal state, so it composes the resolved values (the latest cell version per field), not raw extraction. Assembly is how a pipeline turns many related documents into one authoritative row.
Assembly is driven by two fields you name in the rail. The grouping field decides which documents belong together. Within each group, the anchor field identifies the Anchor document: the one whose anchor field matches a configured anchor value. The anchor seeds the composed record, and the other documents in the group act as Amendments that override fields in the amendable field set according to the configured rule. A group with no anchor, or with more than one, is a conflict and composes nothing, so an ambiguous group is surfaced rather than guessed.
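The grouping and anchor rules above can be sketched in Python. This is a minimal illustration, not the product's implementation: it assumes each document is a plain dict of resolved field values, and the function name `group_and_find_anchors` and its parameters are hypothetical. Groups with zero or several anchors are returned as conflicts instead of being composed.

```python
from collections import defaultdict


def group_and_find_anchors(docs, grouping_field, anchor_field, anchor_value):
    """Group documents and pick exactly one anchor per group (hypothetical helper).

    docs: list of dicts mapping field name -> resolved value; every document
    is assumed to carry the grouping field.
    Returns (groups, conflicts):
      groups    maps group key -> (anchor, list of amendments)
      conflicts maps group key -> number of anchors found (0 or more than 1)
    """
    by_key = defaultdict(list)
    for doc in docs:
        by_key[doc[grouping_field]].append(doc)

    groups, conflicts = {}, {}
    for key, members in by_key.items():
        anchors = [d for d in members if d.get(anchor_field) == anchor_value]
        if len(anchors) != 1:
            # Ambiguous group: surface it, compose nothing.
            conflicts[key] = len(anchors)
            continue
        anchor = anchors[0]
        amendments = [d for d in members if d is not anchor]
        groups[key] = (anchor, amendments)
    return groups, conflicts
```

For example, with `grouping_field="order_id"`, `anchor_field="doc_type"` and `anchor_value="original"` (all illustrative), an order with one original and one revision forms a group, while an order with only revisions, or with two originals, lands in `conflicts`.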
The composed overrides are written back onto the anchor record as new cell versions with an assembly source. As a result, each anchor cell's version history reads extraction, then resolution, then assembly, in that order, and each assembly version records the override detail: the previous value, the source document it came from, and the rule that produced it. The version history is the assembly audit trail. The full composed product is also materialized as its own record set for delivery.
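To make the audit trail concrete, here is a minimal sketch of appending an assembly cell version to an anchor field's history. The `CellVersion` type, the `write_override` helper, and the per-field history dict are hypothetical shapes chosen for illustration; the recorded detail (previous value, source document, rule) follows the description above.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class CellVersion:
    value: Any
    source: str                            # "extraction", "resolution" or "assembly"
    previous: Any = None                   # override detail, set on assembly versions
    source_document: Optional[str] = None  # document the override came from
    rule: Optional[str] = None             # rule that produced the override


def write_override(history, field, new_value, source_document, rule):
    """Append an assembly version for one field of the anchor record.

    history: dict mapping field -> list[CellVersion], oldest first
    (a hypothetical shape). Earlier versions are never modified, so the
    list itself is the audit trail.
    """
    versions = history.setdefault(field, [])
    previous = versions[-1].value if versions else None
    version = CellVersion(new_value, "assembly", previous, source_document, rule)
    versions.append(version)
    return version
```

Because versions are appended rather than overwritten, the field's history keeps the extraction and resolution versions in front of the new assembly version, and the latest version is still the value downstream consumers read. The rule name used in any call (for example `"latest_amendment_wins"`) is illustrative only.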
A common shape is a base document amended by later ones: an original order and its revisions, or a master agreement and its addenda. The original is the anchor; each amendment overrides the fields it changes. The composed record reflects the current state of the agreement, while the version history on every field records exactly which document supplied each value, and when. This gives you both the answer and its provenance in one place.
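The master-agreement case can be sketched as a fold over one group. This text does not specify the order in which several amendments are applied, so this sketch assumes they are applied in ascending order of a date-like field, with later documents winning. The `compose` function, the `"id"` key, and the field names are hypothetical.

```python
def compose(anchor, amendments, amendable, order_field):
    """Fold amendments onto the anchor, for amendable fields only.

    Assumptions (hypothetical, for illustration):
    - each document is a dict with an "id" key;
    - amendments are applied in ascending order of order_field, so later
      documents win;
    - a field an amendment does not carry is left unchanged.
    Returns the composed record and, per field, the id of the document
    that supplied its final value.
    """
    composed = dict(anchor)
    provenance = {f: anchor["id"] for f in anchor if f != "id"}
    for doc in sorted(amendments, key=lambda d: d[order_field]):
        for f in amendable:
            if f in doc:
                composed[f] = doc[f]
                provenance[f] = doc["id"]
    return composed, provenance


anchor = {"id": "MSA-1", "term_months": 12, "price": 100, "party": "Acme"}
addenda = [
    {"id": "ADD-2", "effective": "2024-06-01", "price": 110},
    {"id": "ADD-1", "effective": "2024-03-01", "price": 105, "term_months": 24},
]
record, sources = compose(anchor, addenda, {"price", "term_months"}, "effective")
# record:  price 110 (ADD-2), term_months 24 (ADD-1), party "Acme" (MSA-1)
```

Restricting overrides to the amendable set keeps an addendum from silently changing a field, such as the contracting party, that the configuration does not allow amendments to touch.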
Assembly is configured as an assembly stage in the rail, naming the grouping and anchor fields and the amendable field set. Because assembly composes the resolved values, place your resolution and validation stages before it so the values it folds together are already normalized and checked. After a run you can re-compose without redoing extraction by re-running only the assembly step over the existing cells.
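The rail's actual syntax is not shown here, so the following is only a hypothetical Python representation of a rail as a list of stage dicts, with a check that the assembly stage names its fields and runs after every resolution and validation stage. The stage keys (`type`, `grouping_field`, `anchor_field`, `anchor_value`, `amendable_fields`) and the `check_rail` helper are assumptions for illustration.

```python
# Hypothetical assembly stage; key names are illustrative, not the rail's syntax.
ASSEMBLY_STAGE = {
    "type": "assembly",
    "grouping_field": "order_id",
    "anchor_field": "doc_type",
    "anchor_value": "original",
    "amendable_fields": ["price", "quantity", "ship_date"],
}

REQUIRED_KEYS = ("grouping_field", "anchor_field", "anchor_value", "amendable_fields")


def check_rail(stages):
    """Return a list of problems with the assembly stage's configuration
    and placement; an empty list means the rail passes these checks."""
    kinds = [s["type"] for s in stages]
    if "assembly" not in kinds:
        return ["no assembly stage"]
    pos = kinds.index("assembly")
    problems = []
    for key in REQUIRED_KEYS:
        if not stages[pos].get(key):
            problems.append(f"assembly stage is missing {key}")
    # Assembly composes resolved values, so normalization and checks must run first.
    for i, kind in enumerate(kinds):
        if kind in ("resolution", "validation") and i > pos:
            problems.append(f"{kind} stage at position {i} runs after assembly")
    return problems
```

A rail ordered extraction, resolution, validation, assembly passes with no problems; moving resolution after assembly is reported, since assembly would then fold together values that have not been normalized yet.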