
Options

Configure extraction options including output format, strict mode, async processing, webhook callbacks, raw text inclusion, page ranges, and language hints.

Pass these options as fields in the options JSON object on POST /v1/extract to control extraction behavior. Options let you switch between sync and async mode, include raw text, restrict page ranges, and configure webhook delivery.

format (string): Output format for the extracted data.
strict (boolean): When true, fields not in the schema are omitted from the response. When false, additional discovered fields may be included.
async (boolean): When true, returns a 202 with a job ID instead of blocking. Poll the job endpoint for results.
webhook_url (string): URL to POST results to when extraction completes. Implies async behavior.
include_raw_text (boolean): Include the raw extracted text alongside structured data.
page_range (string): Pages to extract from, e.g. "1-5" or "1,3,7-10". PDF only.
language_hint (string): ISO 639-1 language code hint. Improves extraction for non-English documents.
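As a rough illustration of the options listed above, the following Python sketch builds an options object client-side and checks option names and types before serializing it. The helper name build_options and the idea of validating locally are assumptions made for this example; the API itself may validate differently, and how the options object is nested inside the full request body is not shown here.

```python
import json

# Option names and types as listed above.
OPTION_TYPES = {
    "format": str,
    "strict": bool,
    "async": bool,
    "webhook_url": str,
    "include_raw_text": bool,
    "page_range": str,
    "language_hint": str,
}


def build_options(**options):
    """Hypothetical helper: check option names and types, return a plain dict."""
    for name, value in options.items():
        if name not in OPTION_TYPES:
            raise ValueError(f"unknown option: {name}")
        expected = OPTION_TYPES[name]
        if not isinstance(value, expected):
            raise TypeError(f"{name} must be {expected.__name__}")
    return dict(options)


# "async" is a reserved word in Python, so it is passed via dict unpacking.
opts = build_options(strict=False, page_range="1-5", **{"async": True})

# Serialize the options object for inclusion in the POST /v1/extract body.
print(json.dumps(opts, sort_keys=True))
```

Checking names locally catches typos such as "include_rawtext" before a request is sent, which may be easier to debug than a server-side error.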

Most integrations use strict: true (default) to receive only the schema-defined fields. Set strict: false when you want the AI to also return additional fields it discovers beyond your schema. The async and webhook_url options complement each other: set webhook_url to have results POSTed to you when extraction completes and avoid polling entirely.
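To make the effect of strict concrete, here is a small Python sketch that mimics the described filtering on the client side. It is not the service's implementation; the function apply_strict and the sample field names are hypothetical and exist only to show which fields a caller could expect in each mode.

```python
def apply_strict(extracted, schema_fields, strict=True):
    """Illustrate the documented effect of strict on the returned fields.

    extracted: dict of everything the extractor found (hypothetical sample).
    schema_fields: set of field names defined in the caller's schema.
    """
    if not strict:
        # strict: false -> additional discovered fields may be included.
        return dict(extracted)
    # strict: true -> fields not in the schema are omitted.
    return {name: value for name, value in extracted.items() if name in schema_fields}


found = {"invoice_number": "A-17", "total": 99.5, "po_number": "PO-3"}
schema = {"invoice_number", "total"}

print(apply_strict(found, schema, strict=True))
print(apply_strict(found, schema, strict=False))
```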

The page_range option accepts comma-separated page numbers and ranges (e.g. "1-5", "1,3,7-10") and applies only to PDF files. Use language_hint with an ISO 639-1 code (e.g. "de", "ja") to improve extraction accuracy for non-English documents, especially when the OCR needs guidance on character sets.
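A client may want to sanity-check a page_range string before sending it. The Python sketch below expands a value such as "1,3,7-10" into explicit page numbers. The exact grammar the server accepts is not documented here, so 1-based numbering, tolerance for whitespace, and rejection of descending ranges are assumptions of this example, as is the function name parse_page_range.

```python
def parse_page_range(spec):
    """Expand a page_range string like "1,3,7-10" into a list of page numbers."""
    pages = []
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
            # Assumption: pages are 1-based and ranges must ascend.
            if start < 1 or end < start:
                raise ValueError(f"invalid range: {part!r}")
            pages.extend(range(start, end + 1))
        else:
            page = int(part)
            if page < 1:
                raise ValueError(f"invalid page: {part!r}")
            pages.append(page)
    return pages


print(parse_page_range("1,3,7-10"))  # [1, 3, 7, 8, 9, 10]
```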

Pair include_raw_text: true with schema-driven extraction when your downstream system needs both structured data and the original text for audit or display purposes. Note that setting webhook_url implicitly enables async behavior: the response is 202 Accepted regardless of the async flag.
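The interaction between async and webhook_url can be summarized in a few lines of Python. This is a sketch of the rule as described above, useful for deciding client-side whether to expect a 202 with a job ID. The function name expects_202 is hypothetical, and treating an empty webhook_url as unset is an assumption.

```python
def expects_202(options):
    """Return True when the documented rules imply a 202 Accepted response."""
    # async: true returns 202 with a job ID instead of blocking.
    is_async = bool(options.get("async"))
    # webhook_url implies async behavior regardless of the async flag.
    # Assumption: an empty string counts as "not set".
    has_webhook = bool(options.get("webhook_url"))
    return is_async or has_webhook


print(expects_202({"strict": True}))
print(expects_202({"async": False, "webhook_url": "https://example.com/hook"}))
```

A caller could branch on this result: read the extracted data directly from a blocking response, or store the job ID and wait for the webhook or poll the job endpoint.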

The format option controls the output shape of the data field. Use "json" (default) for programmatic consumption. CSV output is available from the GET /v1/extractions/:id/data endpoint instead.