Extract data from bank statements
Bank statements are the lingua franca of financial reconciliation. Every bank ships them differently: multi-page PDFs from Chase, two-column scans from a community credit union, e-statements from JPMorgan with header noise on every page, statements with redacted account numbers, statements that mix USD, EUR, and GBP transactions across one accounting period. The shape changes constantly while the underlying data does not.
Accounts payable teams reconciling vendor payments, lenders running cash-flow analysis on small-business applicants, accountants closing the books on March 31, mortgage underwriters verifying income against W-2 wages, and bookkeepers categorizing ACH credits, wire transfers, and merchant settlements all need the same thing: every line of the statement as a structured row with a date, a description, a signed amount, and a running balance, plus the account metadata captured exactly once at the statement level.
The hard parts are usually invisible until you try to extract at scale. Banks change their layouts without notice. Statement periods cross months, so the opening and closing balances anchor a window that has to tie out. Debit and credit conventions differ: some statements present withdrawals as negative numbers in a signed column, others as a separate Debits column with the sign implied. Running balances may or may not appear per row. Scanned statements lose alignment between the Date, Description, and Amount columns. Multi-currency accounts mix three or four ISO 4217 currencies on the same page. Page headers and footers, including the bank logo, the statement period, the cycle date, and the disclaimer, repeat on every page and have to be filtered without losing the actual data.
Talonic processes any bank statement against a schema designed for these realities. Every transaction becomes a row with a normalized date in ISO 8601 form, a description, an amount in the canonical sign convention (debits negative, credits positive), and a running balance where present. Account metadata, including bank name, account holder, account number, statement start and end dates, opening balance, and closing balance, is captured once at the statement level, not duplicated per row. Every extracted cell carries a confidence score and a pixel-region reference back to the source PDF so any number can be audited in seconds.
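For readers who consume the output in code, the statement-level and row-level split can be sketched as two typed records. This is a minimal illustration, not Talonic's published schema: the field names follow the sample extraction on this page, while the class names, the `_money` helper, and `statement_from_dict` are hypothetical, and the per-cell confidence and pixel-region data are left out for brevity.

```python
from dataclasses import dataclass, field
from datetime import date
from decimal import Decimal
from typing import List, Optional


@dataclass
class Transaction:
    transaction_date: date            # parsed from an ISO 8601 string
    description: str
    amount: Decimal                   # debits negative, credits positive
    running_balance: Optional[Decimal] = None  # absent on some statements


@dataclass
class Statement:
    # Account metadata lives here once, never repeated per row.
    bank_name: str
    account_holder: str
    account_number: str
    statement_start_date: date
    statement_end_date: date
    opening_balance: Decimal
    closing_balance: Decimal
    transactions: List[Transaction] = field(default_factory=list)


def _money(value) -> Decimal:
    # Going through str() avoids carrying binary float noise into Decimal.
    return Decimal(str(value))


def statement_from_dict(raw: dict) -> Statement:
    """Build a Statement from a parsed JSON object (hypothetical helper)."""
    rows = []
    for tx in raw.get("transactions", []):
        balance = tx.get("running_balance")
        rows.append(Transaction(
            transaction_date=date.fromisoformat(tx["transaction_date"]),
            description=tx["description"],
            amount=_money(tx["amount"]),
            running_balance=_money(balance) if balance is not None else None,
        ))
    return Statement(
        bank_name=raw["bank_name"],
        account_holder=raw["account_holder"],
        account_number=raw["account_number"],
        statement_start_date=date.fromisoformat(raw["statement_start_date"]),
        statement_end_date=date.fromisoformat(raw["statement_end_date"]),
        opening_balance=_money(raw["opening_balance"]),
        closing_balance=_money(raw["closing_balance"]),
        transactions=rows,
    )
```

Using `Decimal` rather than `float` for money is a design choice of this sketch, made so that balance comparisons are exact to the cent.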
What gets extracted from bank statements
How extraction works for bank statements
Bank statements arrive in dozens of layouts even within a single bank, so per-bank templates fail almost immediately at scale. Talonic classifies each statement and runs it through the Bank Statement schema in the Field Registry without per-bank configuration. Page headers and footers are filtered so they do not appear as transaction rows. The sign convention is normalized: withdrawals are negative and deposits are positive, regardless of whether the source statement uses parentheses, separate columns, or color coding. Multi-page statements are stitched into a single transaction stream, and the opening and closing balances are tied out against the per-row running balance. Every extracted cell is returned with a confidence score and a pixel-region pointer, in line with DIN SPEC 91491, so for scanned or low-resolution statements any value below the confidence threshold can be reviewed against the source image in the dashboard.
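To make the sign normalization concrete, here is a minimal sketch of how a printed amount in accounting notation could be mapped to the canonical convention. It is an illustration under stated assumptions, not Talonic's implementation: `parse_signed_amount` is a hypothetical helper, it assumes US-style formatting (comma thousands separator, optional dollar sign), and it covers only text-based conventions, since color coding cannot be recovered from text alone.

```python
from decimal import Decimal


def parse_signed_amount(text: str) -> Decimal:
    """Convert a printed amount to a signed Decimal (hypothetical helper).

    Handles "(842.16)", "-842.16", "842.16-" and plain "2,450.00".
    Parentheses and a leading or trailing minus all mean a withdrawal.
    """
    cleaned = text.strip().replace("$", "").replace(",", "")
    negative = False
    if cleaned.startswith("(") and cleaned.endswith(")"):
        negative = True
        cleaned = cleaned[1:-1]
    elif cleaned.startswith("-"):
        negative = True
        cleaned = cleaned[1:]
    elif cleaned.endswith("-"):
        negative = True
        cleaned = cleaned[:-1]
    value = Decimal(cleaned.strip())
    return -value if negative else value
```

For example, `parse_signed_amount("(842.16)")` and `parse_signed_amount("-842.16")` both yield the same negative amount, which is the point of normalizing before reconciliation.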
Sample extraction
A 3-page Chase business checking statement (April 2026)
{
  "bank_name": "JPMorgan Chase Bank, N.A.",
  "account_holder": "Acme Corporation",
  "account_number": "****6431",
  "statement_start_date": "2026-04-01",
  "statement_end_date": "2026-04-30",
  "opening_balance": 12480.55,
  "closing_balance": 18902.11,
  "transactions": [
    {
      "transaction_date": "2026-04-03",
      "description": "ACH CREDIT, STRIPE PAYOUT",
      "amount": 2450,
      "running_balance": 14930.55
    },
    {
      "transaction_date": "2026-04-05",
      "description": "CHECK #2174, ABC SUPPLIES",
      "amount": -842.16,
      "running_balance": 14088.39
    },
    {
      "transaction_date": "2026-04-15",
      "description": "WIRE OUT, VENDOR PAYMENT",
      "amount": -5000,
      "running_balance": 9088.39
    }
  ]
}
Frequently asked
Does it work on scanned bank statements, or only digital PDFs?
Both. Scanned statements are OCRed and run through the same schema, with confidence scores per cell so low-confidence rows can be reviewed against the source image. Digital PDFs extract at higher confidence because the text layer is already present.
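As a rough illustration of the review step, low-confidence cells can be pulled out with a simple threshold filter. The cell layout used here (`field`, `row`, `value`, `confidence`), the function name, and the 0.90 default are all assumptions made for this sketch; the actual response format and threshold may differ.

```python
from typing import Dict, List


def cells_needing_review(cells: List[Dict], threshold: float = 0.90) -> List[Dict]:
    """Return the cells whose confidence falls below the threshold.

    Each cell is assumed to be a dict with a numeric "confidence" key
    between 0 and 1 (hypothetical layout). Order is preserved so the
    reviewer can walk the statement top to bottom.
    """
    return [cell for cell in cells if cell["confidence"] < threshold]
```

A scanned page would typically contribute more entries to this list than a digital PDF, which matches the higher confidence expected when a text layer is present.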
How are debits and credits handled if the statement uses two columns instead of signed amounts?
The output is always sign-normalized: withdrawals are negative, deposits are positive, in a single amount column. Two-column source layouts are merged at extraction time so downstream reconciliation does not have to handle two formats.
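The two-column merge can be pictured with a short sketch. It assumes each source row has exactly one of the Debits or Credits cells filled and uses US-style number formatting; `merge_debit_credit` is a hypothetical name, not part of any published API.

```python
from decimal import Decimal
from typing import Optional


def merge_debit_credit(debit: Optional[str], credit: Optional[str]) -> Decimal:
    """Collapse separate Debits/Credits cells into one signed amount."""

    def to_decimal(text: str) -> Decimal:
        return Decimal(text.replace("$", "").replace(",", "").strip())

    has_debit = bool(debit and debit.strip())
    has_credit = bool(credit and credit.strip())
    if has_debit == has_credit:
        # Either both cells are filled or both are empty: not a valid row.
        raise ValueError("expected exactly one of debit or credit to be filled")
    if has_debit:
        return -abs(to_decimal(debit))   # withdrawals become negative
    return abs(to_decimal(credit))       # deposits stay positive
```

Rejecting rows where both or neither cell is filled is a choice of this sketch; it surfaces column misalignment from scans instead of silently guessing a sign.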
What happens with multi-currency accounts?
Each transaction carries its own currency code. The account-level metadata records the statement currency. Mixed-currency statements are extracted as-is with the transaction-level currency preserved; no automatic conversion is performed.
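Because no conversion is performed, downstream code usually keeps currencies apart when summing. The sketch below shows one way to do that; the per-row key name `currency` is an assumption (the sample on this page does not show it), and `totals_by_currency` is a hypothetical helper.

```python
from collections import defaultdict
from decimal import Decimal
from typing import Dict, List


def totals_by_currency(transactions: List[dict]) -> Dict[str, Decimal]:
    """Sum signed amounts per ISO 4217 code without converting anything."""
    totals = defaultdict(Decimal)  # Decimal() starts each bucket at 0
    for tx in transactions:
        # "currency" is an assumed key name for the transaction-level code.
        totals[tx["currency"]] += Decimal(str(tx["amount"]))
    return dict(totals)
```

Any conversion to a reporting currency would then be an explicit, separate step with a rate the caller chooses.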
Can it stitch a multi-page statement into one transaction stream?
Yes. Multi-page statements return a single ordered transaction array. The opening balance from page 1 and the closing balance from the last page are tied out against the per-row running balance; any discrepancy raises a validation flag.
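The tie-out can be checked independently on the consumer side. The following is a minimal sketch of such a check, assuming the complete transaction list for the period is present and the field names match the sample extraction; `tie_out` and its message format are hypothetical, and this is not a description of Talonic's internal validation.

```python
from decimal import Decimal
from typing import List


def tie_out(statement: dict) -> List[str]:
    """Walk the rows from the opening balance and report any mismatch."""
    problems = []
    expected = Decimal(str(statement["opening_balance"]))
    for index, tx in enumerate(statement["transactions"]):
        expected += Decimal(str(tx["amount"]))
        printed = tx.get("running_balance")
        if printed is None:
            continue  # some statements do not print a per-row balance
        printed = Decimal(str(printed))
        if printed != expected:
            problems.append(
                f"row {index}: running balance {printed} != expected {expected}"
            )
            expected = printed  # resync so one bad row is reported only once
    closing = Decimal(str(statement["closing_balance"]))
    if expected != closing:
        problems.append(f"closing balance {closing} != expected {expected}")
    return problems
```

An empty list means the opening balance, every signed amount, every printed running balance, and the closing balance are mutually consistent. Resyncing after a mismatch is a choice of this sketch so that a single misread row does not cascade into a flag on every later row.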
Is the output ready to import into Excel or an accounting system?
The structured output exports cleanly to CSV for spreadsheets and to JSON for ERPs, accounting platforms, and lending engines. Account metadata appears once at the top level; transactions are a flat array of rows.
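If only the JSON is at hand, the flat transaction array converts to CSV with the Python standard library alone. This sketch assumes the field names from the sample extraction; `transactions_to_csv` is a hypothetical helper and not the product's export feature.

```python
import csv
import io

# Column order for the spreadsheet; names follow the sample extraction.
COLUMNS = ["transaction_date", "description", "amount", "running_balance"]


def transactions_to_csv(statement: dict) -> str:
    """Render the transactions array as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=COLUMNS,
        restval="",             # missing running_balance becomes an empty cell
        extrasaction="ignore",  # skip keys that are not in COLUMNS
    )
    writer.writeheader()
    for tx in statement.get("transactions", []):
        writer.writerow(tx)
    return buffer.getvalue()
```

The `csv` module quotes descriptions that contain commas, such as "ACH CREDIT, STRIPE PAYOUT", so the columns stay aligned when the file is opened in a spreadsheet.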
Author note
Reviewed by Talonic engineering, schema review · last reviewed 2026-05-12