Extract data from air waybills
An air waybill (AWB) is the contract of carriage for air cargo. Every shipment moving on a passenger or freighter flight travels with an AWB, a numbered document issued by the airline (the carrier) or by a freight forwarder consolidating cargo from multiple shippers. The AWB number follows a strict IATA-defined format: a three-digit airline prefix (Lufthansa is 020, Singapore Airlines is 618, FedEx is 023) followed by an 8-digit serial. That prefix is how customs, ground handlers, and the carrier's own systems route the cargo from origin to destination.
Air freight teams at large importers, customs brokers clearing cargo into a country, and freight forwarders consolidating shipments at gateway airports all need the same data: AWB number, shipper, consignee, origin airport (IATA code), destination airport, carrier, flight numbers, pieces, weight, dimensions, and freight charges.
The hard parts are layout variation and the charges block. AWBs are printed in a fixed eight-block layout that has been the IATA standard for decades, but print quality varies wildly: an electronic AWB (e-AWB) from a major carrier renders cleanly, a thermal-printed AWB from a regional handler smudges, and a forwarder-house AWB inserts its own branding into the same template. The cargo description block holds free text in two columns. Dimensions are often handwritten on the original AWB and re-typed on the carrier copy. The charges columns include weight charge, valuation charge, taxes, and other charges, each in the local currency at the origin airport, with a Currency Code (CC) field that identifies the currency. Master AWBs (issued by carriers) and House AWBs (issued by forwarders, referencing a master) carry the same fields and link to each other through their prefix-serial numbers.
Talonic extracts the full AWB structure regardless of carrier prefix or layout variant. Airport codes are validated against the IATA airport list. Weight is normalized to a numeric kilogram value with an explicit unit field. Charges are returned as typed numeric values with the explicit currency code. Master and house AWB linkages are detected when present. Every extracted cell carries a confidence score and a pixel-region pointer so a customs broker or air freight forwarder can audit any value against the source AWB before clearing the cargo at the destination airport.
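The prefix-plus-serial format described above can be checked with a few lines of code. The following is a minimal sketch, not Talonic's implementation: the function and field names are hypothetical, the prefix table holds only the three carriers quoted in the text (a real check would load the full IATA list), and the check-digit rule (eighth serial digit equals the first seven digits modulo 7) is a widely used AWB convention that this page does not itself state.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Only the prefixes quoted in the text; a real validator would load the IATA list.
KNOWN_PREFIXES = {"020": "Lufthansa", "618": "Singapore Airlines", "023": "FedEx"}

# Three-digit prefix, optional hyphen or space, eight-digit serial.
AWB_PATTERN = re.compile(r"([0-9]{3})[- ]?([0-9]{8})")


@dataclass(frozen=True)
class AwbNumber:
    prefix: str
    serial: str
    carrier: Optional[str]   # None when the prefix is not in the lookup table
    check_digit_ok: bool     # assumption: modulo-7 check on the first seven serial digits


def parse_awb_number(raw: str) -> AwbNumber:
    """Split an AWB number into prefix and serial and run basic checks."""
    match = AWB_PATTERN.fullmatch(raw.strip())
    if match is None:
        raise ValueError(f"not a 3+8 digit AWB number: {raw!r}")
    prefix, serial = match.groups()
    check_digit_ok = int(serial[:7]) % 7 == int(serial[7])
    return AwbNumber(prefix, serial, KNOWN_PREFIXES.get(prefix), check_digit_ok)


# Hypothetical number chosen so that 1234567 % 7 == 5.
print(parse_awb_number("020-12345675"))
```

Returning flags rather than raising on an unknown prefix or a failed check digit keeps the decision with the reviewer, which matches the audit-first workflow described above.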
What gets extracted from air waybills
How extraction works for air waybills
AWBs originate from carrier reservation systems, forwarder TMS platforms, and consolidator software. Talonic classifies each AWB by issue type (master, house, neutral) and routes it through the air-cargo schema in the Field Registry, which encodes the eight-block IATA layout. Airline prefixes are validated against the IATA list. Airport codes are validated as three-letter IATA identifiers. Weight is normalized to kilograms with an explicit unit field. Currency follows ISO 4217, and the AWB Currency Code (CC) box is captured separately. Master-to-house linkages are detected when both prefix-serial numbers are present on the same document. Per-cell confidence and pixel-region provenance conform to DIN SPEC 91491, so air freight forwarders and customs brokers can audit charges or cargo descriptions against the source AWB before clearing the shipment.
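To make the normalization steps above concrete, here is a small sketch of weight conversion and code-shape checks. It is illustrative only: the function names and the output shape are hypothetical, the unit table assumes the AWB weight-unit box carries K or L (or KG/LB), and the three-letter check confirms only the shape of an airport or currency code, not membership in the IATA or ISO 4217 lists.

```python
import re
from decimal import Decimal

# Conversion factors to kilograms; the pound factor is the exact international definition.
KG_PER_UNIT = {
    "K": Decimal("1"),
    "KG": Decimal("1"),
    "L": Decimal("0.45359237"),
    "LB": Decimal("0.45359237"),
}

THREE_LETTERS = re.compile(r"[A-Z]{3}")


def normalize_weight(value: str, unit: str) -> dict:
    """Return the weight in kilograms, rounded to grams, with an explicit unit field."""
    factor = KG_PER_UNIT.get(unit.strip().upper())
    if factor is None:
        raise ValueError(f"unknown weight unit: {unit!r}")
    kilograms = (Decimal(value) * factor).quantize(Decimal("0.001"))
    return {"value": kilograms, "unit": "kg"}


def has_code_shape(code: str) -> bool:
    """Shape check only: exactly three uppercase ASCII letters (e.g. FRA, JFK, EUR)."""
    return THREE_LETTERS.fullmatch(code) is not None


print(normalize_weight("142.5", "K"))   # already in kilograms
print(normalize_weight("100", "LB"))    # converted from pounds
print(has_code_shape("FRA"), has_code_shape("fra"))
```

Using `Decimal` rather than floats avoids rounding drift when the same weight is later multiplied by a rate per kilogram.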
Sample extraction
A Lufthansa Cargo Air Waybill, Frankfurt to New York JFK
{
  "awb_number": "020-44782311",
  "issue_date": "2026-04-18",
  "shipper_name": "Berlin Tech Components GmbH",
  "shipper_address": "Friedrichstr. 200, 10117 Berlin, Germany",
  "consignee_name": "Acme Imports LLC",
  "consignee_address": "88 Harbor Blvd, New York, NY 10001, USA",
  "carrier_name": "Lufthansa Cargo",
  "vessel_or_flight": "LH 400 / 2026-04-19",
  "port_of_loading": "FRA Frankfurt",
  "port_of_discharge": "JFK New York",
  "cargo_items_description": "Industrial machine parts, HS 8479.89",
  "cargo_items_quantity": 4,
  "cargo_items_gross_weight": 142.5,
  "cargo_items_volume": 0.42,
  "cargo_items_marks_and_numbers": "BTC PROJ-2026-04 / labels 1-4",
  "incoterms": "FCA Frankfurt",
  "freight_terms": "Freight Prepaid"
}
Frequently asked
What is the difference between a Master AWB and a House AWB?
A Master AWB (MAWB) is issued by the actual airline (carrier) covering one or more consolidated shipments. A House AWB (HAWB) is issued by a freight forwarder for each individual shipper whose cargo is consolidated under the master. The two link through the prefix-serial numbers; Talonic captures both AWB numbers when both are present on the document.
How are airline prefixes validated?
Each AWB number starts with a three-digit airline prefix assigned by IATA (Lufthansa 020, Singapore 618, Cathay 160, FedEx 023, and so on). Talonic validates the prefix against the IATA airline list; an invalid prefix is flagged in the confidence output so the broker can verify against the carrier copy.
Are dimensions and chargeable weight captured?
Yes. Pieces, gross weight, dimensions (length x width x height per piece, when shown), and total volume are extracted. Volumetric (chargeable) weight is computed downstream by the carrier or forwarder using their own dimensional weight divisor; the extraction provides the raw inputs.
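As a rough illustration of that downstream step, the sketch below computes chargeable weight from the extracted raw inputs. It assumes the extracted volume is in cubic metres and uses a divisor of 6000 cm³ per kg, a common air freight convention; neither is stated on this page, and carriers or forwarders may apply a different divisor. The function names are hypothetical.

```python
def volumetric_weight_kg(volume_m3: float, divisor_cm3_per_kg: float = 6000.0) -> float:
    """Convert a volume in cubic metres to volumetric weight in kilograms."""
    if divisor_cm3_per_kg <= 0:
        raise ValueError("divisor must be positive")
    volume_cm3 = volume_m3 * 1_000_000  # 1 m^3 = 1,000,000 cm^3
    return volume_cm3 / divisor_cm3_per_kg


def chargeable_weight_kg(gross_kg: float, volume_m3: float,
                         divisor_cm3_per_kg: float = 6000.0) -> float:
    """Chargeable weight is the greater of gross weight and volumetric weight."""
    return max(gross_kg, volumetric_weight_kg(volume_m3, divisor_cm3_per_kg))


# Values from the sample extraction: 142.5 kg gross, 0.42 volume (assumed m^3).
print(volumetric_weight_kg(0.42))
print(chargeable_weight_kg(142.5, 0.42))
```

With the sample values and this divisor, the volumetric weight comes to about 70 kg, so the 142.5 kg gross weight would be the chargeable figure. A lighter, bulkier shipment would flip that result.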
What about e-AWB versus paper AWB?
Most international air cargo now moves on e-AWB (electronic AWB), which is a digital data exchange rather than a paper document. When an e-AWB is rendered as a PDF for handler or broker reference, Talonic extracts it the same way as a paper scan. Native EDI message ingest (Cargo-IMP, Cargo-XML) goes through a separate API path.
Can it handle multi-leg routings?
Yes. AWBs with transit segments (e.g., Frankfurt-Singapore-Sydney) show all legs on the document. Talonic captures the origin, destination, and any intermediate airports as a structured route array.
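One way to work with such a route downstream is to expand the ordered airport list into individual legs. The sketch below is an assumption about shape, not the documented output format: the `leg`, `origin`, and `destination` field names and the `build_route` helper are hypothetical.

```python
def build_route(airports: list[str]) -> list[dict]:
    """Expand an ordered list of IATA airport codes into consecutive legs."""
    if len(airports) < 2:
        raise ValueError("a route needs at least an origin and a destination")
    return [
        {"leg": index + 1, "origin": origin, "destination": destination}
        for index, (origin, destination) in enumerate(zip(airports, airports[1:]))
    ]


# Frankfurt to Sydney via Singapore, as in the example above.
for leg in build_route(["FRA", "SIN", "SYD"]):
    print(leg)
```

A direct shipment produces a single leg, so the same consumer code handles both direct and multi-leg routings.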
Author note
Reviewed by Talonic engineering, freight document review · last reviewed 2026-05-13