Extract data from expense reports
Expense reports are the back-of-house side of every business trip, client dinner, and corporate card transaction. A finance ops team at a mid-market company processes thousands of them per quarter. Each one is a stack of receipts (the actual proof of spend), aggregated by an employee in an expense management platform (Expensify, Concur, Brex, Ramp, Navan), routed for approval, and ultimately settled against a corporate card statement or reimbursed via payroll. The data inside the receipts is what matters: merchant name and address, date, total amount, currency, tax, payment method (corporate card last four, cash, personal card), and the type of expense (meals, lodging, ground transport, supplies). Compliance, GL categorization, tax recovery (VAT reclaim on European receipts), and the audit trail all depend on getting that data structured.
The hard part is receipt variety. A taxi receipt from a Berlin airport is a 4-line thermal-paper stub. A hotel folio from a 4-night stay at a Hilton in Tokyo is a 2-page itemized invoice in Japanese and English. A coffee shop receipt is a phone photo taken at a 30-degree angle on a wrinkled surface. A grocery receipt has 30 line items, but the only one that matters for reimbursement is the bottle of wine for the client. Mileage logs are not receipts at all; they are spreadsheet entries with a date, miles driven, business purpose, and a per-mile reimbursement rate. Per diem expenses use a fixed rate per location regardless of actual spend. Multi-attendee meals require an attendee list under IRS Section 274 documentation rules. Foreign-currency receipts need a conversion rate captured at the date of the expense.
Talonic returns each receipt as a structured row with the fields above, sign-corrected and currency-normalized, with a category assigned where the source supports it (meals, lodging, ground transport, supplies, communication, fuel). Attendee fields are captured when present. Mileage and per diem entries pass through as flagged non-receipt rows. The aggregated expense-report bundle is a flat array of rows that maps directly into the destination expense management system or the GL.
What gets extracted from expense reports
How extraction works for expense reports
Receipt formats span thermal paper, phone photos at oblique angles, hotel folio PDFs, ride-share email confirmations, and itemized restaurant tabs. Talonic classifies each receipt by category (lodging, meal, ground transport, fuel, supplies, communication) and runs it through the receipt schema in the Field Registry, which captures merchant, date, total, tax, payment method, and attendee count without per-merchant configuration. Currency is normalized to ISO 4217 codes and a captured exchange rate is preserved on the receipt for foreign-currency expenses. Image quality is assessed at extraction so finance ops can flag low-confidence receipts for employee re-submission. Every extracted field carries a confidence score and a pixel-region pointer under DIN SPEC 91491 conformity, so an auditor can verify any expense report line against the source receipt before approving reimbursement.
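The per-field confidence scores described above lend themselves to a simple re-submission filter. The following is a minimal Python sketch, assuming a hypothetical response shape in which each extracted field is an object with `value` and `confidence` keys. That shape, the function name `low_confidence_fields`, and the 0.85 cutoff are illustrative assumptions, not Talonic's documented API.

```python
from typing import Any

# Assumed cutoff; in practice finance ops would tune this to policy.
CONFIDENCE_THRESHOLD = 0.85


def low_confidence_fields(
    receipt: dict[str, dict[str, Any]],
    threshold: float = CONFIDENCE_THRESHOLD,
) -> list[str]:
    """Return the names of fields whose confidence falls below the threshold."""
    return sorted(
        name
        for name, field in receipt.items()
        if field.get("confidence", 0.0) < threshold
    )


# Hypothetical extraction result for a blurry phone photo of a receipt.
receipt = {
    "merchant_name": {"value": "Cafe Einstein", "confidence": 0.97},
    "receipt_date": {"value": "2026-04-18", "confidence": 0.62},
    "total_amount": {"value": 7.80, "confidence": 0.91},
}

flagged = low_confidence_fields(receipt)
if flagged:
    print("Ask the employee to re-submit; uncertain fields:", ", ".join(flagged))
```

A field with no confidence value is treated as zero confidence here, so missing scores are flagged rather than silently accepted.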
Sample extraction
A €842.50 Lufthansa airfare receipt from a Berlin-New York business trip
{
"merchant_name": "Lufthansa",
"merchant_address": "Lufthansaallee 1, 60486 Frankfurt am Main, Germany",
"receipt_number": "220-4567891234",
"receipt_date": "2026-04-18",
"currency": "EUR",
"subtotal": 800,
"tax_amount": 42.5,
"total_amount": 842.5,
"payment_method": "credit_card",
"card_last_four": "8842",
"expense_category": "travel",
"number_of_attendees": 1,
"line_items": "BER-JFK economy, fare class Y, ticket 220-4567891234",
"notes": "Trip purpose: client onboarding visit"
}
Frequently asked
How does it handle multi-attendee meals for IRS §274 documentation?
When the source receipt includes an attendee list (annotated by the employee or printed by the venue), Talonic captures the attendee count and any named attendees as a structured array. For meals over the IRS substantiation threshold ($75 in current rules), the attendee list is required documentation; the extraction makes the relevant fields available for downstream audit.
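A downstream audit step could use those fields to spot meals that are missing their attendee documentation. The sketch below assumes the named attendees arrive under a hypothetical `attendees` key and that the row is already in USD. The $75 figure is taken from the answer above and should be confirmed against current IRS guidance before relying on it.

```python
from decimal import Decimal

# Threshold as stated in the text above; verify before use.
SUBSTANTIATION_THRESHOLD_USD = Decimal("75")


def missing_attendee_list(row: dict) -> bool:
    """True when a USD meal above the threshold has no named attendees."""
    if row.get("expense_category") != "meals" or row.get("currency") != "USD":
        return False
    over_threshold = Decimal(str(row["total_amount"])) > SUBSTANTIATION_THRESHOLD_USD
    # `attendees` is an assumed key name for the structured attendee array.
    return over_threshold and not row.get("attendees")


# Illustrative rows, not real extraction output.
client_dinner = {
    "expense_category": "meals",
    "currency": "USD",
    "total_amount": 212.40,
    "number_of_attendees": 3,
    "attendees": [],
}
solo_lunch = {
    "expense_category": "meals",
    "currency": "USD",
    "total_amount": 18.50,
    "number_of_attendees": 1,
    "attendees": [],
}

print(missing_attendee_list(client_dinner))
print(missing_attendee_list(solo_lunch))
```

Non-USD rows are skipped rather than converted, since the comparison only makes sense once a corporate FX rate has been applied.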
What about foreign-currency receipts?
The receipt currency is captured as ISO 4217 (EUR, JPY, GBP, etc.). If the source receipt or an attached annotation includes a conversion rate or USD-equivalent amount, both are preserved. The expense management system typically applies a corporate FX rate at submission; Talonic returns the raw figures.
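For teams that want a USD figure before the expense system applies its own rate, the raw fields can be combined as follows. This is a sketch only: `usd_amount` and `exchange_rate` are hypothetical key names for the preserved values, and the 1.08 rate in the example is made up for illustration.

```python
from decimal import Decimal, ROUND_HALF_UP
from typing import Optional


def usd_equivalent(row: dict) -> Optional[Decimal]:
    """Prefer a USD amount captured from the receipt; else derive it from a captured rate."""
    if row.get("usd_amount") is not None:
        return Decimal(str(row["usd_amount"]))
    if row.get("exchange_rate") is not None:
        amount = Decimal(str(row["total_amount"])) * Decimal(str(row["exchange_rate"]))
        return amount.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
    # Nothing was captured; leave conversion to the expense management system.
    return None


# Illustrative row with an assumed annotated rate.
airfare = {"currency": "EUR", "total_amount": 842.5, "exchange_rate": 1.08}
taxi = {"currency": "EUR", "total_amount": 31.0}

print(usd_equivalent(airfare))
print(usd_equivalent(taxi))
```

`Decimal` is used instead of floats so that rounding to cents is explicit and repeatable.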
Does it process mileage logs and per diem entries?
Yes, with a non-receipt flag. A mileage entry returns business_purpose, miles, rate, and total. A per diem entry returns location, day count, and rate. Both pass through the same expense-report bundle as flagged rows so the destination GL can categorize them correctly.
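Because both entry types are simple rate-times-quantity calculations, the receiving system can sanity-check the totals. The sketch below assumes hypothetical key names (`row_type`, `day_count`, and a `total` on per diem rows) alongside the fields named above; the rates and amounts are illustrative, not official reimbursement rates.

```python
from decimal import Decimal


def expected_total(row: dict) -> Decimal:
    """Recompute rate x quantity for a flagged non-receipt row."""
    if row["row_type"] == "mileage":
        quantity = row["miles"]
    elif row["row_type"] == "per_diem":
        quantity = row["day_count"]
    else:
        raise ValueError(f"not a non-receipt row: {row['row_type']!r}")
    product = Decimal(str(quantity)) * Decimal(str(row["rate"]))
    return product.quantize(Decimal("0.01"))


def totals_match(row: dict) -> bool:
    """Compare the reported total with the recomputed one."""
    return expected_total(row) == Decimal(str(row["total"]))


# Illustrative rows with assumed rates.
mileage = {
    "row_type": "mileage",
    "business_purpose": "Client site visit",
    "miles": 120,
    "rate": 0.70,
    "total": 84.00,
}
per_diem = {
    "row_type": "per_diem",
    "location": "Berlin",
    "day_count": 3,
    "rate": 60,
    "total": 180,
}

print(totals_match(mileage), totals_match(per_diem))
```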
Can it pull line items from a multi-item restaurant receipt?
Yes. Itemized restaurant tabs return a line_items array (appetizers, entrees, drinks, dessert). Most expense workflows do not require line-level itemization for meals, but it is available when policy or audit requires it (e.g., alcohol separated for tax-deductibility limits in certain jurisdictions).
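Where policy does require alcohol to be separated, the split is a short aggregation over the line items. The following sketch assumes each line item is an object with `description`, `amount`, and a boolean `is_alcohol` flag. That item shape is hypothetical, and in practice the flag might have to be derived downstream from the item description.

```python
from decimal import Decimal


def split_alcohol(line_items: list[dict]) -> tuple[Decimal, Decimal]:
    """Return (alcohol_total, other_total) for an itemized restaurant tab."""
    alcohol = Decimal("0")
    other = Decimal("0")
    for item in line_items:
        amount = Decimal(str(item["amount"]))
        # `is_alcohol` is an assumed flag; a missing flag counts as non-alcohol.
        if item.get("is_alcohol"):
            alcohol += amount
        else:
            other += amount
    return alcohol, other


# Illustrative tab, not real extraction output.
line_items = [
    {"description": "Burrata", "amount": 14.00, "is_alcohol": False},
    {"description": "Risotto", "amount": 26.50, "is_alcohol": False},
    {"description": "Glass of Riesling", "amount": 12.00, "is_alcohol": True},
    {"description": "Tiramisu", "amount": 9.00, "is_alcohol": False},
]

alcohol_total, other_total = split_alcohol(line_items)
print(f"alcohol: {alcohol_total}, other: {other_total}")
```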
Is the output ready for Concur, Expensify, Brex, or Ramp?
The structured output maps cleanly into the receipt-import formats used by major expense management systems. Field mapping into a specific platform happens via the integration layer; Talonic provides the structured expense data per receipt.
Author note
Reviewed by Talonic engineering, schema review · last reviewed 2026-05-16