Extract data from pay stubs
A pay stub is the document a lender, a landlord, or a benefits administrator reaches for when they need proof of income that a bank statement cannot give them. It breaks a single paycheck into its parts: gross earnings, the taxes withheld, the voluntary deductions, and the net amount that actually reached the employee. A mortgage underwriter verifying income for a $480,000 loan needs the most recent two stubs to show a stable base, any overtime, and year-to-date gross that reconciles to the W-2 the borrower will file. Payroll providers such as ADP, Gusto, and Paychex each render the same information in a different grid, and an employer running its own payroll may produce something different again. The structure under the surface is consistent even when the layout is not. Earnings split into regular hours at a rate, overtime at 1.5x, bonus, and commission lines. Taxes appear as Federal Income Tax, Social Security at 6.2%, Medicare at 1.45%, and a state line that depends on jurisdiction. Pre-tax deductions for a 401(k) contribution or a Section 125 health plan reduce taxable wages, while post-tax deductions for garnishments do not. Every line carries both a current-period amount and a year-to-date total, and the two have to tie out. A stub for the pay period ending 2026-05-31 with $4,615.38 gross and $3,212.04 net is only useful if the deductions in between are itemized correctly. Talonic reads the stub and returns earnings, taxes, and deductions as itemized arrays with both current and year-to-date amounts. Net pay is reconciled against gross minus total withholdings so a verification team can trust the figure without recomputing it from the image. A US worker paid by ACH sees Federal Income Tax, Social Security, and Medicare withheld before the net reaches the bank, with the YTD totals reconciling to the IRS Form W-2 and reported in USD.
What gets extracted from pay stubs
How extraction works for pay stubs
Pay stubs come out of payroll platforms such as ADP, Gusto, Paychex, and Workday, plus in-house systems, so the grid placement of earnings and deductions shifts from provider to provider. Talonic classifies the stub and maps it to the earnings-statement schema in the Field Registry, which separates earnings, taxes, pre-tax deductions, and post-tax deductions, each with a current-period and a year-to-date amount. Statutory lines such as Social Security at 6.2% and Medicare at 1.45% are recognized by name so they are not mixed with voluntary deductions. The arithmetic is checked: gross minus total withholdings is reconciled against the printed net pay, and a mismatch is flagged. Every value returns with a confidence score and pixel-region provenance under DIN SPEC 91491 conformity, so an underwriter or auditor can confirm a figure against the source stub.
Sample extraction
A single bi-weekly pay stub from a payroll provider
{
"employee_name": "Marcus Lee",
"employer_name": "Northgate Logistics Inc.",
"pay_period_start": "2026-05-18",
"pay_period_end": "2026-05-31",
"pay_date": "2026-06-05",
"gross_pay_current": 4615.38,
"earnings": [
{
"type": "regular",
"amount": 4615.38
}
],
"taxes": [
{
"type": "federal_income_tax",
"current": 612.4,
"ytd": 6124
},
{
"type": "social_security",
"current": 286.15,
"ytd": 2861.5
},
{
"type": "medicare",
"current": 66.92,
"ytd": 669.2
}
],
"deductions": [
{
"type": "401k",
"pre_tax": true,
"current": 230.77,
"ytd": 2307.7
}
],
"net_pay_current": 3212.04,
"ytd_gross": 46153.8,
"currency": "USD"
}Frequently asked
Does it separate pre-tax and post-tax deductions?
Yes. A 401(k) contribution or a Section 125 health premium is tagged pre-tax because it reduces taxable wages, while a wage garnishment or a Roth contribution is tagged post-tax. The distinction matters for any team reconciling taxable gross to the W-2.
Are year-to-date totals captured alongside current amounts?
Every earnings, tax, and deduction line returns with both the current-period figure and the year-to-date total, because income verification and payroll audit both depend on the YTD column tying out across consecutive stubs.
How does it handle overtime and multiple earning types?
Regular, overtime at 1.5x, bonus, and commission are captured as separate earnings lines with hours and rate when shown, so an underwriter can distinguish stable base pay from variable income.
Ready to extract from your own pay stubs?
Author note
Reviewed by Talonic engineering, schema review · last reviewed 2026-06-14