
Extract data from resumes

A resume is the most format-chaotic document in any hiring pipeline. For a single Senior Backend Engineer opening, a recruiting team at a 600-person company might receive 400 applications inside a week: single-column PDFs exported from LinkedIn, two-column designer templates with skills in a sidebar, academic CVs running nine pages with a publications list, and federal-style resumes that spell out every responsibility in full sentences. The data a recruiter or an applicant tracking system actually needs is the same across all of them: candidate name, email, phone, location, a short summary, the list of jobs (employer, title, and dates), the education history, and the skills.

The difficulty is layout, not content. Two-column resumes confuse naive text extraction because the reading order interleaves the sidebar with the body, so "Python" lands in the middle of a job description. Dates appear as "Jan 2022 to Present", "2019 to 2021", or "06/2020 to 09/2023", and tenure has to be computed consistently. Employer names carry suffixes (Stripe, Inc. versus Stripe) that need to reconcile to one entity. A candidate who worked at Google from 2018 to 2022 and then at a startup that was acquired while they worked there should still show a clean two-role history. Skills are scattered across a header list, the summary, and the bullet points under each job.

Talonic reads the resume regardless of column layout and returns a structured profile. Contact details are isolated from body text. Work experience is captured as an ordered array of roles with employer, title, start and end dates, and normalized tenure. Education and skills come back as discrete lists, so an applicant tracking system can match candidates without a recruiter retyping anything.

For example, on one Senior Backend Engineer requisition, a recruiter screens resumes citing Acme Robotics tenure from 2018-06-01 to 2022-03-15, a graduation date of 2016-05-20, US and EU work authorization, and skills that the applicant tracking system's API ingests. The parsed PDF, a CSV export, and the OCR confidence scores are all posted to the ERP.
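To make the shape of the structured profile concrete, here is a minimal sketch of how a downstream consumer might type it in Python. The field names mirror the sample extraction on this page; the class names `Role`, `Education`, and `CandidateProfile` and the `from_dict` and `current_roles` helpers are hypothetical and are not part of any Talonic SDK. The sketch assumes the payload has already been parsed from JSON into a dict.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class Role:
    employer: str
    title: str
    start: str            # ISO month, e.g. "2020-06"
    end: Optional[str]    # ISO month, or None while the role is ongoing


@dataclass
class Education:
    degree: str
    institution: str
    year: int


@dataclass
class CandidateProfile:
    full_name: str
    email: str
    phone: str
    location: str
    summary: str
    work_experience: list[Role] = field(default_factory=list)
    education: list[Education] = field(default_factory=list)
    skills: list[str] = field(default_factory=list)

    @classmethod
    def from_dict(cls, data: dict[str, Any]) -> CandidateProfile:
        """Build a profile from a payload shaped like the sample extraction (hypothetical helper)."""
        roles = [
            Role(
                employer=r["employer"],
                title=r["title"],
                start=r["start"],
                # Turn the "present" marker into None so current roles are easy to filter.
                end=None if r["end"] == "present" else r["end"],
            )
            for r in data.get("work_experience", [])
        ]
        return cls(
            full_name=data["full_name"],
            email=data["email"],
            phone=data["phone"],
            location=data["location"],
            summary=data["summary"],
            work_experience=roles,
            education=[Education(**e) for e in data.get("education", [])],
            skills=list(data.get("skills", [])),
        )

    def current_roles(self) -> list[Role]:
        """Return the roles that have an open end date."""
        return [r for r in self.work_experience if r.end is None]
```

Mapping "present" to `None` is a design choice in this sketch: it lets an applicant tracking integration filter current roles with a simple `is None` check instead of a string comparison.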

What gets extracted from resumes

Full Name: Daniel Okafor
Email: daniel.okafor@example.com
Phone: +1 415 555 0182
Location: Austin, TX
Summary: Backend engineer, 7 years, distributed systems
Work Experience: Stripe, Inc., Software Engineer, 2020-06 to present
Education: BS Computer Science, University of Texas, 2016
Skills: Python, Go, PostgreSQL, Kubernetes

How extraction works for resumes

Resumes arrive as PDFs, DOCX exports, and scans through job boards, referral inboxes, and applicant tracking systems, with no shared layout. Talonic first detects the column structure so that the reading order is correct on two-column and sidebar templates, then runs the document through the resume schema in the Field Registry, which captures contact details, summary, work experience, education, and skills. Employment dates in mixed formats (Jan 2022, 06/2020, 2019 to 2021) are normalized to ISO month precision, and tenure is computed per role. Employer name variants such as Stripe and Stripe, Inc. are reconciled to a single entity. Every field is returned with a confidence score and pixel-region provenance under DIN SPEC 91491 conformity, so a recruiter can verify a parsed title or date against the original resume before advancing the candidate.
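Employer reconciliation can be illustrated with a simple normalization key. The sketch below is a minimal illustration, not Talonic's reconciliation method: it assumes that casefolding and stripping trailing legal-entity suffixes is enough to match variants such as Stripe and Stripe, Inc. The suffix list, the `employer_key` and `reconcile` names, and the rule that the first variant seen becomes the canonical name are all hypothetical.

```python
import re

# Hypothetical, deliberately short list of legal-entity suffixes.
LEGAL_SUFFIXES = {"inc", "incorporated", "llc", "ltd", "limited", "corp",
                  "corporation", "co", "gmbh", "plc"}


def employer_key(name: str) -> str:
    """Normalize an employer string so that 'Stripe' and 'Stripe, Inc.' share a key."""
    tokens = re.findall(r"[\w&]+", name.casefold())
    # Strip legal suffixes only from the end, and never the last remaining token.
    while len(tokens) > 1 and tokens[-1] in LEGAL_SUFFIXES:
        tokens.pop()
    return " ".join(tokens)


def reconcile(names: list[str]) -> dict[str, str]:
    """Map each raw employer string to one canonical name per normalized key.

    Assumption: the first variant seen for a key becomes the canonical name.
    """
    canonical: dict[str, str] = {}
    mapping: dict[str, str] = {}
    for name in names:
        key = employer_key(name)
        canonical.setdefault(key, name)
        mapping[name] = canonical[key]
    return mapping
```

For example, `reconcile(["Stripe, Inc.", "Stripe"])` maps both strings to "Stripe, Inc.". Suffix stripping alone will not catch renamed or acquired companies; those need a reference list of known entities.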

Sample extraction

A two-column, two-page resume exported as PDF

{
  "full_name": "Daniel Okafor",
  "email": "daniel.okafor@example.com",
  "phone": "+1 415 555 0182",
  "location": "Austin, TX",
  "summary": "Backend engineer with 7 years in distributed systems",
  "work_experience": [
    {
      "employer": "Stripe, Inc.",
      "title": "Software Engineer",
      "start": "2020-06",
      "end": "present"
    },
    {
      "employer": "Google",
      "title": "Software Engineer",
      "start": "2018-01",
      "end": "2020-05"
    }
  ],
  "education": [
    {
      "degree": "BS Computer Science",
      "institution": "University of Texas",
      "year": 2016
    }
  ],
  "skills": [
    "Python",
    "Go",
    "PostgreSQL",
    "Kubernetes"
  ]
}
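The sample shows start and end months for each role. A consumer of this output could derive per-role tenure from them along the lines below. This is a sketch under stated assumptions: tenure is counted in whole months, inclusive of both the start and end month, and an open "present" end is resolved against a hypothetical reference month `as_of`. The `month_index` and `tenure_months` helpers are illustrative names, not part of the product.

```python
import json


def month_index(iso_month: str) -> int:
    """Convert 'YYYY-MM' to a running month count so months can be subtracted."""
    year, month = iso_month.split("-")
    return int(year) * 12 + int(month)


def tenure_months(role: dict, as_of: str) -> int:
    """Inclusive month count; an open 'present' end is resolved to as_of."""
    end = as_of if role["end"] == "present" else role["end"]
    return month_index(end) - month_index(role["start"]) + 1


# The work_experience array from the sample extraction above.
sample = json.loads("""
[
  {"employer": "Stripe, Inc.", "title": "Software Engineer", "start": "2020-06", "end": "present"},
  {"employer": "Google", "title": "Software Engineer", "start": "2018-01", "end": "2020-05"}
]
""")

as_of = "2024-06"  # hypothetical reference month for open-ended roles
for role in sample:
    print(role["employer"], tenure_months(role, as_of))
```

With `as_of = "2024-06"` this prints 49 months for Stripe, Inc. and 29 months for Google. Whether tenure counts both boundary months is a policy decision; the important thing is to apply one convention to every role.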

Frequently asked

Does it handle two-column and sidebar resume layouts?

Yes. Column detection runs before field extraction, so a skills sidebar is not interleaved into a job description. The reading order is reconstructed from the visual layout, which is the most common failure point for plain text extraction on designer templates.
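One common way to reconstruct reading order is to look for a vertical strip of whitespace (a gutter) that no text block crosses, then read each side top to bottom. The sketch below shows that heuristic only; it is not Talonic's column detector. The `Block` type and its coordinates are hypothetical, y is assumed to grow downward, and the sketch assumes no block spans both columns, so a full-width name header would need separate handling.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Block:
    x0: float   # left edge
    y0: float   # top edge; assumes y grows downward
    x1: float   # right edge
    text: str


def find_gutter(blocks: list[Block], min_gap: float) -> Optional[float]:
    """Return the midpoint of the widest vertical strip no block overlaps, or None."""
    if not blocks:
        return None
    intervals = sorted((b.x0, b.x1) for b in blocks)
    reach = intervals[0][1]          # rightmost edge seen so far
    best_mid, best_width = None, min_gap
    for x0, x1 in intervals[1:]:
        gap = x0 - reach
        if gap > best_width:
            best_mid, best_width = reach + gap / 2, gap
        reach = max(reach, x1)
    return best_mid


def reading_order(blocks: list[Block], min_gap: float = 20.0) -> list[str]:
    """Read the left column top to bottom, then the right column."""
    gutter = find_gutter(blocks, min_gap)
    if gutter is None:               # single column: plain top-to-bottom order
        return [b.text for b in sorted(blocks, key=lambda b: b.y0)]
    left = sorted((b for b in blocks if b.x1 <= gutter), key=lambda b: b.y0)
    right = sorted((b for b in blocks if b.x0 >= gutter), key=lambda b: b.y0)
    return [b.text for b in left + right]
```

Sorting every block by vertical position alone would interleave a skills sidebar with the job entries beside it. Splitting at the gutter first keeps each column's text contiguous.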

How are employment dates normalized?

Mixed formats such as "Jan 2022", "06/2020", and "2019 to 2021" are converted to ISO month precision, and per-role tenure is computed. An ongoing role marked Present is kept as an open end date so downstream filters can treat it as current.
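The three formats named above can be normalized with a handful of patterns. This is a minimal sketch, not the product's parser. It represents an open "Present" end as `None` (the sample output uses the string "present" instead), and it assumes a bare year becomes January when it starts a range and December when it ends one; that policy is an illustrative choice, not documented behavior. The function names are hypothetical.

```python
import re
from typing import Optional

_MONTHS = {name: i for i, name in enumerate(
    ["jan", "feb", "mar", "apr", "may", "jun",
     "jul", "aug", "sep", "oct", "nov", "dec"], start=1)}
_OPEN_ENDED = {"present", "current", "now"}


def normalize_date(token: str, is_end: bool = False) -> Optional[str]:
    """Return 'YYYY-MM', or None for an ongoing ("Present") end date."""
    t = token.strip().casefold()
    if t in _OPEN_ENDED:
        return None
    m = re.fullmatch(r"(\d{1,2})/(\d{4})", t)            # 06/2020
    if m and 1 <= int(m.group(1)) <= 12:
        return f"{m.group(2)}-{int(m.group(1)):02d}"
    m = re.fullmatch(r"([a-z]+)\.?\s+(\d{4})", t)         # Jan 2022, January 2022
    if m and m.group(1)[:3] in _MONTHS:
        return f"{m.group(2)}-{_MONTHS[m.group(1)[:3]]:02d}"
    if re.fullmatch(r"\d{4}", t):                         # 2019 (year only)
        # Assumption: a bare year becomes January as a start, December as an end.
        month = 12 if is_end else 1
        return f"{t}-{month:02d}"
    raise ValueError(f"unrecognized date: {token!r}")


def normalize_range(text: str) -> tuple[Optional[str], Optional[str]]:
    """Split 'start to end' (or 'start - end') and normalize both sides."""
    parts = re.split(r"\s+(?:to|-)\s+", text.strip(), maxsplit=1)
    if len(parts) != 2:
        raise ValueError(f"not a date range: {text!r}")
    return normalize_date(parts[0]), normalize_date(parts[1], is_end=True)
```

For example, `normalize_range("Jan 2022 to Present")` returns `("2022-01", None)`, and `normalize_range("06/2020 to 09/2023")` returns `("2020-06", "2023-09")`. Raising on unrecognized input, rather than guessing, keeps bad dates visible for review.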

Can it parse academic CVs and federal resumes?

Yes. Longer formats with publications, grants, or detailed responsibility statements run through the same schema. Sections that have no structured target, such as a publications list, are returned as labeled text blocks rather than discarded.
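The rule of keeping unmapped sections rather than discarding them can be sketched as a routing step over (heading, body) pairs. This is an illustration under assumptions: it presumes layout analysis has already split the document into headed sections, and the heading map, the `route_sections` name, and the `label`/`text` keys are hypothetical rather than the actual output format.

```python
# Hypothetical heading-to-field map; real resumes use many more heading variants.
SECTION_FIELDS = {
    "summary": "summary",
    "experience": "work_experience",
    "work experience": "work_experience",
    "education": "education",
    "skills": "skills",
}


def route_sections(
    sections: list[tuple[str, str]],
) -> tuple[dict[str, list[str]], list[dict[str, str]]]:
    """Split (heading, body) pairs into schema-bound text and labeled leftovers."""
    structured: dict[str, list[str]] = {}
    labeled_blocks: list[dict[str, str]] = []
    for heading, body in sections:
        target = SECTION_FIELDS.get(heading.strip().casefold())
        if target is not None:
            # A real parser would extract typed fields here; this keeps the raw text.
            structured.setdefault(target, []).append(body)
        else:
            # No structured target: keep the text under its original heading.
            labeled_blocks.append({"label": heading.strip(), "text": body})
    return structured, labeled_blocks
```

Collecting bodies in lists means a CV with two education sections keeps both, and a "Publications" or "Grants" section survives as a labeled block that a reviewer can still read.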

Author note

Reviewed by Talonic engineering, resume schema review · last reviewed 2026-06-11