Extract data from income statements

An income statement, the profit and loss statement that finance teams call a P&L, tells you whether a company made money over a period and where the money went. Analysts read it constantly, but almost never as structured data. A lender sizing a $3,000,000 working-capital line, an investor building a quarterly model, and a controller closing the books all begin with a PDF: a statement out of QuickBooks, a page of an audited 10-K, or a management report. The line items they extract are predictable: Revenue, Cost of Goods Sold, Gross Profit, operating expenses such as Selling, General and Administrative (SG&A) and Depreciation, Operating Income, Interest Expense, taxes, and Net Income at the bottom.

What complicates extraction is that the statement is a cascade of subtotals, and the labels move. Revenue minus Cost of Goods Sold yields Gross Profit; Gross Profit minus operating expenses yields Operating Income; and each subtotal depends on the lines above it. Comparative statements stack the current quarter against the prior-year quarter, sometimes with a year-to-date column beside both, so three-month and twelve-month figures for the period ended 2026-03-31 can sit in the same table. Negative values for expenses or losses are shown in parentheses. One company calls the top line Revenue, another Net Sales, a third Total Income, and the same SG&A bucket might be split into a dozen departmental lines.

Talonic reads the income statement and returns each line with its label, amount, and period, with the subtotal cascade preserved. Gross Profit, Operating Income, and Net Income are kept as computed subtotals so an analyst can verify the margin stack without retyping a single figure from the page.

What gets extracted from income statements

Entity Name: Harbor Freight Components Inc.
Period: Q1 2026, three months ended 2026-03-31
Revenue: $2,800,000
Cost of Goods Sold: $1,540,000
Gross Profit: $1,260,000 (Subtotal)
SG&A: $640,000
Operating Income: $520,000
Interest Expense: $48,000
Net Income: $352,000

How extraction works for income statements

Income statements are exported from QuickBooks, NetSuite, and Xero, pulled from audited filings, or assembled in Excel, so the top-line label and the depth of the expense breakdown vary by source. Talonic reads the statement and maps it to the financial-statement schema in the Field Registry, which models the revenue-to-net-income cascade rather than a flat list. Each subtotal (Gross Profit, Operating Income, Net Income) is attached to the lines that produce it, so the margin stack stays intact. Top-line synonyms such as Revenue, Net Sales, and Total Income resolve to one field. Comparative and year-to-date columns are split by period. Parenthesized figures are read as negatives. Every value is returned with a confidence score and pixel-region provenance, in conformity with DIN SPEC 91491, so a controller or analyst can verify a captured figure against the source P&L.
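
To make the "parenthesized figures are read as negatives" rule concrete, here is a minimal Python sketch of how a printed amount could be turned into a signed number. It is an illustration only, not Talonic's implementation; the function name parse_amount and the set of characters it strips are assumptions.

```python
import re
from decimal import Decimal


def parse_amount(text: str) -> Decimal:
    """Convert a printed figure such as '$1,540,000' or '(48,000)' to a signed Decimal.

    Hypothetical helper for illustration; handles only dollar signs,
    comma thousands separators, and accounting-style parentheses.
    """
    # Drop currency symbols, thousands separators, and whitespace.
    cleaned = re.sub(r"[$,\s]", "", text)
    # Accounting notation: a figure wrapped in parentheses is negative.
    negative = cleaned.startswith("(") and cleaned.endswith(")")
    if negative:
        cleaned = cleaned[1:-1]
    value = Decimal(cleaned)
    return -value if negative else value


print(parse_amount("$1,540,000"))  # 1540000
print(parse_amount("(48,000)"))    # -48000
```

Decimal is used rather than float so that amounts with cents would survive without binary rounding error.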

Sample extraction

A single-period profit and loss statement exported to PDF

{
  "entity_name": "Harbor Freight Components Inc.",
  "period_label": "Three months ended 2026-03-31",
  "period_end": "2026-03-31",
  "currency": "USD",
  "revenue": 2800000,
  "cost_of_goods_sold": 1540000,
  "gross_profit": 1260000,
  "operating_expenses": {
    "sga": 640000,
    "depreciation": 100000,
    "total": 740000
  },
  "operating_income": 520000,
  "interest_expense": 48000,
  "income_tax": 120000,
  "net_income": 352000
}
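
Because the subtotals are returned alongside their components, the cascade in the sample above can be rechecked mechanically. The sketch below is one way a consumer might do that in Python; the function check_cascade is a hypothetical helper, and it assumes the field names shown in the sample and that Net Income equals Operating Income minus interest and tax, which holds for these figures.

```python
# Values copied from the sample extraction above.
statement = {
    "revenue": 2800000,
    "cost_of_goods_sold": 1540000,
    "gross_profit": 1260000,
    "operating_expenses": {"sga": 640000, "depreciation": 100000, "total": 740000},
    "operating_income": 520000,
    "interest_expense": 48000,
    "income_tax": 120000,
    "net_income": 352000,
}


def check_cascade(s: dict) -> list[str]:
    """Return a list of mismatches between stated and recomputed subtotals."""
    opex = s["operating_expenses"]
    checks = [
        ("gross_profit", s["revenue"] - s["cost_of_goods_sold"], s["gross_profit"]),
        ("operating_expenses.total", opex["sga"] + opex["depreciation"], opex["total"]),
        ("operating_income", s["gross_profit"] - opex["total"], s["operating_income"]),
        ("net_income",
         s["operating_income"] - s["interest_expense"] - s["income_tax"],
         s["net_income"]),
    ]
    return [
        f"{name}: computed {computed}, stated {stated}"
        for name, computed, stated in checks
        if computed != stated
    ]


print(check_cascade(statement))  # [] means every subtotal ties out
```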

Frequently asked

Does it keep the subtotal cascade intact?

Yes. Gross Profit, Operating Income, and Net Income are returned as subtotals linked to the lines that compute them, so a model can recompute any margin from the components or rely on the stated subtotal.

How does it handle different names for revenue?

Top-line synonyms such as Revenue, Net Sales, and Total Income are mapped to a single revenue field, while the original label is preserved so the source wording is never lost.
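
As a rough illustration of that mapping, the Python sketch below resolves the three synonyms named here to one field while keeping the source wording. The names TOP_LINE_SYNONYMS and normalize_label and the output keys are assumptions for this example, not Talonic's schema.

```python
# Hypothetical synonym set; only the three labels named in the text are included.
TOP_LINE_SYNONYMS = {"revenue", "net sales", "total income"}


def normalize_label(label: str) -> dict:
    """Map a printed top-line label to a canonical field, keeping the original."""
    # Compare case-insensitively and collapse repeated whitespace.
    key = " ".join(label.lower().split())
    field = "revenue" if key in TOP_LINE_SYNONYMS else None
    return {"field": field, "original_label": label}


print(normalize_label("Net Sales"))
# {'field': 'revenue', 'original_label': 'Net Sales'}
```

A label outside the set returns a field of None, leaving it to the caller to decide how to treat unmapped lines.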

Can it separate quarterly and year-to-date columns?

Yes. When a statement presents a three-month and a twelve-month column for the same period end, each is split out and tagged with its period length so the figures are not conflated.
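
One way to picture the tagging is to derive the period length from each column header. The Python sketch below assumes headers shaped like the sample period label ("Three months ended 2026-03-31"); the function tag_column, the output keys, and the list of recognized month words are hypothetical.

```python
import re

# Assumed header vocabulary; real statements may use other wording.
WORD_TO_MONTHS = {"three": 3, "six": 6, "nine": 9, "twelve": 12}
HEADER = re.compile(
    r"\b(three|six|nine|twelve)\s+months\s+ended\s+(\d{4}-\d{2}-\d{2})",
    re.IGNORECASE,
)


def tag_column(header: str) -> dict:
    """Extract period length in months and period end date from a column header."""
    match = HEADER.search(header)
    if match is None:
        raise ValueError(f"unrecognized period header: {header!r}")
    return {
        "period_months": WORD_TO_MONTHS[match.group(1).lower()],
        "period_end": match.group(2),
    }


headers = ["Three months ended 2026-03-31", "Twelve months ended 2026-03-31"]
for h in headers:
    print(tag_column(h))
# {'period_months': 3, 'period_end': '2026-03-31'}
# {'period_months': 12, 'period_end': '2026-03-31'}
```

Keying each figure on both period end and period length is what keeps two columns with the same end date from being conflated.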

Author note

Reviewed by Talonic engineering, schema review · last reviewed 2026-06-15