Extract data from purchase orders
Purchase orders sit at the start of every B2B procurement workflow. The PO is the formal commitment: buyer X requests these items, at these unit prices, from supplier Y, for delivery to this location, by this date, under these terms. Sourcing teams generate POs out of an ERP. Vendors receive them as PDFs in email, in EDI feeds (X12 850 in the US, EDIFACT ORDERS in Europe), or through buyer-side punchout integrations. Order desks at supplier companies then re-key the same data into their own systems to acknowledge the PO and confirm fulfillment. That re-keying is the bottleneck: a typical mid-market supplier processes 200 to 2,000 POs a month, mostly from buyers using different ERP templates, and even small mistakes (wrong SKU, transposed quantity, missed delivery date) cascade into ship-block exceptions and missed cut-offs.
The hard parts live in the table. POs almost always carry line items: SKU or part number, description, ordered quantity, unit of measure (each, case, pallet, kilogram), unit price, line total, requested delivery date per line, and sometimes a ship-to address that differs from the header. Some POs include service lines (consulting hours, milestone deliverables) alongside material lines. Long-running blanket POs have release schedules that look like child line items. Currency is mostly USD for US buyers but can mix EUR or GBP for European multinationals. Header fields are usually clean: PO number, PO date, buyer name and address, supplier name and address, payment terms, Incoterms, and the buyer's authorized signature. The full structure has to land in the supplier's ERP exactly the way the buyer's ERP rendered it, or the order acknowledgement bounces.
Talonic extracts the full PO structure from any source format. Line items are returned as a structured array with all the fields above, regardless of whether the source PDF uses one table per page or stitches the table across continuation pages. Multi-line addresses are parsed into structured components. Every extracted cell carries a confidence score and a pixel-region reference so the supplier's order desk can verify any field before acknowledging the PO in their ERP.
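As an illustration of how an order desk might use per-cell confidence, the sketch below builds a review queue of low-confidence fields. The ExtractedField shape, the 0.0 to 1.0 confidence scale, the (x, y, width, height) region layout, and the 0.90 threshold are all assumptions made for this example; they are not Talonic's documented response format.

```python
from dataclasses import dataclass
from typing import Any, List, Tuple


@dataclass
class ExtractedField:
    """One extracted cell. Hypothetical shape, not Talonic's schema."""
    name: str
    value: Any
    confidence: float                  # assumed scale: 0.0 to 1.0
    region: Tuple[int, int, int, int]  # assumed pixel box: (x, y, width, height)


def fields_needing_review(fields: List[ExtractedField],
                          threshold: float = 0.90) -> List[ExtractedField]:
    """Return fields scoring below the threshold, least confident first."""
    flagged = [f for f in fields if f.confidence < threshold]
    return sorted(flagged, key=lambda f: f.confidence)


# Illustrative values only.
fields = [
    ExtractedField("po_number", "PO-2026-01102", 0.99, (120, 80, 210, 24)),
    ExtractedField("line_items[0].quantity", 5, 0.72, (410, 352, 40, 20)),
    ExtractedField("line_items[1].delivery_date", "2026-04-30", 0.85, (610, 380, 90, 20)),
]
review_queue = fields_needing_review(fields)
for field in review_queue:
    print(field.name, field.confidence, field.region)
```

Sorting by ascending confidence puts the cells most likely to be wrong at the top of the queue, and the region lets a reviewer jump to the matching spot on the source page.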
What gets extracted from purchase orders
How extraction works for purchase orders
POs originate in buyer-side ERPs in dozens of formats: SAP Ariba, Coupa, NetSuite, Microsoft Dynamics, Oracle, and custom systems. Talonic classifies each PO and matches it against the procurement schema in the Field Registry, which maps every header and line-item field regardless of the source ERP layout. Tables that continue across pages are stitched into a single line-item array. Unit-of-measure variations (each, case, pallet, KG, LB) are normalized to a canonical UOM string. Currency codes follow ISO 4217. Delivery dates that vary per line are preserved per line rather than collapsed to a single header date. The output is structured so it can be routed into a supplier-side order management system without re-keying, and the per-cell confidence with pixel-region provenance keeps the extraction auditable for DIN SPEC 91491 conformity.
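To make the unit-of-measure step concrete, here is a minimal sketch of alias-based normalization. The alias table and the canonical strings are hypothetical and chosen for illustration; the actual canonical vocabulary may differ.

```python
# Hypothetical alias table: canonical UOM string -> spellings seen on POs.
UOM_ALIASES = {
    "EACH": {"each", "ea", "pc", "pcs"},
    "CASE": {"case", "cs"},
    "PALLET": {"pallet", "plt"},
    "KG": {"kg", "kgs", "kilogram", "kilograms"},
    "LB": {"lb", "lbs", "pound", "pounds"},
}

# Invert the table once so each lookup is a single dict access.
_LOOKUP = {alias: canon
           for canon, aliases in UOM_ALIASES.items()
           for alias in aliases}


def normalize_uom(raw: str) -> str:
    """Map a raw unit string to its canonical form, ignoring case,
    surrounding whitespace, and a trailing period."""
    key = raw.strip().lower().rstrip(".")
    if key not in _LOOKUP:
        # Fail loudly instead of guessing; a wrong unit changes the order.
        raise ValueError(f"unrecognized unit of measure: {raw!r}")
    return _LOOKUP[key]
```

Raising on an unknown unit is a deliberate choice in this sketch: silently passing through an unmapped string would defeat the point of a canonical value.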
Sample extraction
A typical B2B purchase order in USD with two line items
{
  "po_number": "PO-2026-01102",
  "po_date": "2026-04-05",
  "buyer": "Globex Logistics LLC",
  "supplier": "Acme Software, Inc.",
  "ship_to": "Globex Warehouse 3, 4421 Logistics Pkwy, Memphis, TN 38116",
  "currency": "USD",
  "line_items": [
    {
      "sku": "ASW-PRO-AN",
      "description": "Annual subscription, Pro plan",
      "quantity": 5,
      "uom": "EACH",
      "unit_price": 1200,
      "line_total": 6000,
      "delivery_date": "2026-05-15"
    },
    {
      "sku": "ASW-ONBOARD",
      "description": "Onboarding services",
      "quantity": 8,
      "uom": "HR",
      "unit_price": 250,
      "line_total": 2000,
      "delivery_date": "2026-04-30"
    }
  ],
  "totals": {
    "subtotal": 8000,
    "tax": 0,
    "total": 8000
  },
  "payment_terms": "Net 45, FOB Origin"
}
Frequently asked
Can Talonic handle POs from any buyer-side ERP?
Yes. SAP Ariba, Coupa, NetSuite, Microsoft Dynamics, Oracle, and custom systems each produce different PO layouts. The schema does not require per-template configuration. Extraction adapts to whatever the source PDF looks like.
What about EDI POs (X12 850, EDIFACT ORDERS)?
Talonic processes PDF renders of EDI POs, which is the common form when EDI is exchanged outside an integrated EDI VAN. Native EDI flat-file ingest is supported through the API but is a separate code path from PDF extraction.
How are blanket POs and call-offs handled?
Blanket POs (long-running agreements with periodic releases) are extracted as a standard PO; each release is treated as a child line item with its own quantity and delivery date. The aggregate annual quantity sits in the header notes if the source includes it.
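Because each release arrives as its own line item, a downstream check can total the released quantity per SKU and compare it with the aggregate figure from the header notes. The sketch below assumes release lines carry the same sku and quantity keys as ordinary line items; the SKU and quantities are made up for the example.

```python
from collections import defaultdict


def released_quantity_by_sku(line_items):
    """Sum the quantity of every release line, grouped by SKU."""
    totals = defaultdict(int)
    for line in line_items:
        totals[line["sku"]] += line["quantity"]
    return dict(totals)


# Illustrative blanket PO with three quarterly releases of one part.
releases = [
    {"sku": "BRKT-10", "quantity": 500, "delivery_date": "2026-03-31"},
    {"sku": "BRKT-10", "quantity": 500, "delivery_date": "2026-06-30"},
    {"sku": "BRKT-10", "quantity": 250, "delivery_date": "2026-09-30"},
]
released = released_quantity_by_sku(releases)
print(released)
```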
Are line-level delivery dates preserved?
Yes. Each line item carries its own delivery_date when the source PO specifies one. If only a header-level delivery date is shown, that date is repeated on every line for downstream compatibility.
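The fallback described above can be sketched as follows. The header-level delivery_date key is an assumption (the sample output shows dates only on lines), and filling just the lines that lack a date is one reasonable reading for POs that mix both styles.

```python
import copy


def fill_line_delivery_dates(po: dict) -> dict:
    """Return a copy of the PO in which every line has a delivery_date,
    using the header date where the line has none."""
    result = copy.deepcopy(po)  # leave the caller's data untouched
    header_date = result.get("delivery_date")  # hypothetical header-level key
    for line in result.get("line_items", []):
        if not line.get("delivery_date") and header_date:
            line["delivery_date"] = header_date
    return result


# Illustrative PO: one line has its own date, one relies on the header.
po = {
    "delivery_date": "2026-05-01",
    "line_items": [
        {"sku": "A-1", "quantity": 10},
        {"sku": "B-2", "quantity": 4, "delivery_date": "2026-05-20"},
    ],
}
filled = fill_line_delivery_dates(po)
```

Downstream code can then read delivery_date from each line without checking the header first.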
Can the output be routed directly into our order management system?
The structured JSON maps cleanly into common order management formats. Field name mapping into your specific destination system happens downstream; Talonic provides the structured PO data, your integration layer routes it.
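A downstream mapping layer can be as small as a pair of rename tables. In the sketch below the source keys match the sample extraction, while the destination names (customer_po_ref, item_code, and so on) are hypothetical stand-ins for whatever your order management system expects.

```python
# Source key (as in the sample extraction) -> hypothetical destination key.
HEADER_MAP = {
    "po_number": "customer_po_ref",
    "po_date": "order_date",
    "buyer": "customer_name",
    "currency": "currency_code",
}
LINE_MAP = {
    "sku": "item_code",
    "quantity": "qty",
    "uom": "unit",
    "unit_price": "price",
    "delivery_date": "requested_date",
}


def rename(record: dict, mapping: dict) -> dict:
    """Copy the mapped keys present in record under their destination names."""
    return {dest: record[src] for src, dest in mapping.items() if src in record}


def to_oms_order(po: dict) -> dict:
    """Build a destination-shaped order from an extracted PO."""
    order = rename(po, HEADER_MAP)
    order["lines"] = [rename(line, LINE_MAP) for line in po.get("line_items", [])]
    return order


po = {
    "po_number": "PO-2026-01102",
    "po_date": "2026-04-05",
    "buyer": "Globex Logistics LLC",
    "currency": "USD",
    "line_items": [
        {"sku": "ASW-PRO-AN", "quantity": 5, "uom": "EACH",
         "unit_price": 1200, "delivery_date": "2026-05-15"},
    ],
}
order = to_oms_order(po)
```

Keeping the mapping in data rather than code means a new destination system needs a new table, not a new function.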
Author note
Reviewed by Talonic engineering, procurement subject-matter review · last reviewed 2026-05-14