Extract data from bills of materials

A bill of materials is the recipe for a manufactured product: every part, the quantity of each, and how the parts nest into subassemblies. Procurement, manufacturing engineering, and contract manufacturers all work from it, and it frequently arrives as a PDF exported from a PLM or ERP system rather than as a clean data file. An electronics BOM for a control board might run 220 lines, each with a part number, a manufacturer such as Texas Instruments or Murata, a manufacturer part number, a quantity, a unit of measure, and the reference designators (R12, C7, U3) that tie the line to a position on the board. A mechanical BOM nests differently: a top-level assembly contains subassemblies, which in turn contain individual machined parts.

The structural challenge is hierarchy and identity. A multi-level BOM uses an indented level column (0, 1, 2) to show that a subassembly belongs to a parent, and that indentation is visual, not tagged. The same physical part can appear under two different internal part numbers when two divisions maintain separate catalogs. Reference designators are often listed as ranges (R1 to R8) that have to expand to a count of eight. Quantities depend on the unit of measure, so 2.5 is a valid 2.5 meters of wire on one line but a likely rounding error on a line counted in each (EA), where the quantity should be a whole number. Alternate or approved-manufacturer parts are noted in a sub-line under the primary.

Talonic reads the BOM and returns a structured part list that preserves the level hierarchy, expands the reference designators to counts, and keeps the manufacturer and internal part numbers as distinct fields. Procurement can price the list and engineering can reconcile it against the design without retyping 220 rows. As an example, a 220-line BOM revised 2026-03-15 might list Texas Instruments and Analog Devices parts, each with a manufacturer part number (MPN), a SKU, and a unit of measure (UOM); it is exported from the PLM as a PDF, priced in USD, and issued to procurement via EDI into the ERP.

What gets extracted from bills of materials

Assembly Name: Motor Controller Rev C
Assembly Part Number: ASM-3050-C
Level: 1 (indent depth in a multi-level BOM)
Line Part Number: PN-10422
Manufacturer: Texas Instruments
Manufacturer Part Number: LM2596S-ADJ
Quantity: 8
Unit of Measure: EA
Reference Designators: R1 to R8

How extraction works for bills of materials

Bills of materials are exported from PLM and ERP systems such as Arena, SolidWorks PDM, SAP, and Oracle, and often land as a printed PDF attached to a quote, so the column set and the way hierarchy is shown vary widely. Talonic reads the BOM and maps it to the bill-of-materials schema in the Field Registry, which models the parent-child level structure rather than a flat parts list. The indented level column is used to attach each subassembly and part to its parent. Reference designator ranges such as R1 to R8 are expanded so the designator count can be matched against the stated quantity. Manufacturer part numbers and internal part numbers are kept as separate fields, and alternate or approved manufacturers are captured as sub-lines. Every value is returned with a confidence score and pixel-region provenance, in conformity with DIN SPEC 91491, so engineering and procurement can verify a line against the source BOM.

Sample extraction

A multi-level electronics BOM exported to PDF

{
  "assembly_name": "Motor Controller Rev C",
  "assembly_part_number": "ASM-3050-C",
  "revision": "C",
  "lines": [
    {
      "level": 1,
      "part_number": "PN-10422",
      "manufacturer": "Texas Instruments",
      "mpn": "LM2596S-ADJ",
      "quantity": 8,
      "uom": "EA",
      "ref_designators": [
        "R1",
        "R2",
        "R3",
        "R4",
        "R5",
        "R6",
        "R7",
        "R8"
      ]
    },
    {
      "level": 1,
      "part_number": "PN-10588",
      "manufacturer": "Murata",
      "mpn": "GRM188R61A106KE69D",
      "quantity": 12,
      "uom": "EA",
      "ref_designators": [
        "C1",
        "C2",
        "C3",
        "C4",
        "C5",
        "C6",
        "C7",
        "C8",
        "C9",
        "C10",
        "C11",
        "C12"
      ]
    }
  ]
}
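
As a rough illustration of how procurement might price output shaped like the sample above, the sketch below multiplies each line's quantity by a unit price looked up by MPN. The `extended_costs` helper and the unit prices are hypothetical and are not part of Talonic's API; only the field names (`lines`, `part_number`, `mpn`, `quantity`) come from the sample.

```python
import json
from decimal import Decimal

def extended_costs(bom, unit_prices):
    """Return (part_number, mpn, quantity, extended_cost) per BOM line.

    unit_prices maps MPN -> Decimal unit price. Lines without a known
    price get an extended_cost of None so they can be reviewed by hand.
    """
    rows = []
    for line in bom["lines"]:
        price = unit_prices.get(line["mpn"])
        cost = None if price is None else price * line["quantity"]
        rows.append((line["part_number"], line["mpn"], line["quantity"], cost))
    return rows

# Abbreviated copy of the sample extraction (designators omitted).
sample = json.loads("""
{
  "assembly_part_number": "ASM-3050-C",
  "lines": [
    {"part_number": "PN-10422", "mpn": "LM2596S-ADJ", "quantity": 8},
    {"part_number": "PN-10588", "mpn": "GRM188R61A106KE69D", "quantity": 12}
  ]
}
""")

# Hypothetical unit prices in USD, for illustration only.
prices = {
    "LM2596S-ADJ": Decimal("2.15"),
    "GRM188R61A106KE69D": Decimal("0.04"),
}

rows = extended_costs(sample, prices)
for part_number, mpn, quantity, cost in rows:
    print(part_number, mpn, quantity, cost)
```

Using `Decimal` rather than floats keeps the currency arithmetic exact, which matters once a few hundred lines are summed.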

Frequently asked

Does it preserve multi-level BOM hierarchy?

Yes. The indented level column is read so subassemblies attach to their parent assembly, and the result is a tree. A flat extraction would lose the relationship between a subassembly and the parts it contains.
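
One common way to turn a level column into a tree is a stack that tracks the most recent row at each depth. The sketch below illustrates that general technique and is not Talonic's implementation; `BomNode`, `build_tree`, and the part number `SUB-200` are hypothetical names used for the example.

```python
from dataclasses import dataclass, field

@dataclass
class BomNode:
    level: int
    part_number: str
    children: list["BomNode"] = field(default_factory=list)

def build_tree(rows):
    """Build a tree from (level, part_number) rows given in document order.

    Returns the list of top-level nodes. Raises ValueError if a row
    is negative or skips a level (for example 0 followed directly by 2).
    """
    root = BomNode(level=-1, part_number="ROOT")  # synthetic parent of level 0
    stack = [root]  # path from the root to the most recently added node
    for level, part_number in rows:
        if level < 0:
            raise ValueError(f"negative level for {part_number}")
        # Pop back up until the top of the stack is shallower than this row.
        while stack[-1].level >= level:
            stack.pop()
        if level - stack[-1].level > 1:
            raise ValueError(f"{part_number} skips a level")
        node = BomNode(level, part_number)
        stack[-1].children.append(node)
        stack.append(node)
    return root.children

rows = [
    (0, "ASM-3050-C"),
    (1, "PN-10422"),
    (1, "SUB-200"),    # hypothetical subassembly
    (2, "PN-10588"),   # belongs to SUB-200, not directly to the assembly
]
tree = build_tree(rows)
```

Here `PN-10588` ends up as a child of `SUB-200`, which is the relationship a flat list would lose.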

How are reference designators handled?

Designator ranges such as R1 to R8 are expanded to an explicit list of eight, and the count is checked against the stated quantity so a mismatch between designators and quantity is flagged.
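
The expansion and count check can be sketched as follows. This is a minimal illustration that assumes designators are a letter prefix followed by a number and that ranges are written as "R1 to R8" or "R1-R8"; the function names are hypothetical, and real BOMs may use other notations.

```python
import re

_DESIGNATOR = re.compile(r"^([A-Za-z]+)(\d+)$")
_RANGE_SEP = re.compile(r"\s+to\s+|\s*-\s*", re.IGNORECASE)

def expand_designators(text):
    """Expand a string such as 'R1 to R8, R12' into an explicit list."""
    result = []
    for chunk in text.split(","):
        chunk = chunk.strip()
        if not chunk:
            continue
        parts = _RANGE_SEP.split(chunk)
        if len(parts) == 1:
            # A single designator such as 'R12'.
            if not _DESIGNATOR.match(parts[0]):
                raise ValueError(f"unrecognized designator: {chunk!r}")
            result.append(parts[0])
        elif len(parts) == 2:
            # A range such as 'R1 to R8'; both ends must share a prefix.
            start, end = (_DESIGNATOR.match(p) for p in parts)
            if not start or not end or start.group(1) != end.group(1):
                raise ValueError(f"unrecognized range: {chunk!r}")
            lo, hi = int(start.group(2)), int(end.group(2))
            if hi < lo:
                raise ValueError(f"descending range: {chunk!r}")
            result.extend(f"{start.group(1)}{n}" for n in range(lo, hi + 1))
        else:
            raise ValueError(f"unrecognized range: {chunk!r}")
    return result

def quantity_matches(designators, quantity):
    """True when the designator count agrees with the stated quantity."""
    return len(designators) == quantity

designators = expand_designators("R1 to R8")
mismatch = not quantity_matches(designators, 8)
```

A line whose stated quantity disagrees with the expanded count would be flagged for review rather than silently corrected.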

Can it keep manufacturer and internal part numbers separate?

Yes. The internal part number, the manufacturer name, and the manufacturer part number are distinct fields, and approved alternate manufacturers are captured as sub-lines under the primary part.

Author note

Reviewed by Talonic engineering, schema review · last reviewed 2026-06-12