Options
Configure extraction options including output format, strict mode, async processing, webhook callbacks, raw text inclusion, page ranges, and language hints.
Pass these options as fields in the options JSON object on POST /v1/extract to control extraction behavior. Options let you switch between sync and async mode, include raw text, restrict page ranges, and configure webhook delivery.
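To make the shape of the request concrete, here is a minimal sketch that assembles a POST /v1/extract body with an options object. This page does not document how the rest of the request is encoded, so the JSON body and the top-level "document_url" and "schema" fields are placeholder assumptions. Only the option names come from this page. The client-side check rejects keys not listed here, although the server may accept options this page does not mention.

```python
import json

# Option names documented on this page.
KNOWN_OPTIONS = {
    "format", "strict", "async", "webhook_url",
    "include_raw_text", "page_range", "language_hint",
}


def build_extract_body(document_url: str, schema: dict, options: dict) -> str:
    """Serialize a hypothetical request body for POST /v1/extract.

    Assumption: "document_url" and "schema" are placeholder names for the
    request's other top-level fields. Only the "options" object and its keys
    come from this page.
    """
    # Catch misspelled option names client-side before sending the request.
    unknown = set(options) - KNOWN_OPTIONS
    if unknown:
        raise ValueError(f"unknown option(s): {sorted(unknown)}")
    return json.dumps({"document_url": document_url, "schema": schema, "options": options})


# Example: strict extraction of the first five pages of a German PDF.
body = build_extract_body(
    "https://example.com/rechnung.pdf",
    {"invoice_number": "string", "total": "number"},  # hypothetical schema shape
    {"strict": True, "page_range": "1-5", "language_hint": "de"},
)
print(body)
```

The options are passed as a plain dict rather than keyword arguments because async is a reserved word in Python.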
Most integrations use strict: true (the default) to receive only the fields defined in the schema. Set strict: false when you also want the AI to return additional fields it finds beyond your schema. The async and webhook_url options work together: set webhook_url to receive results by callback and avoid polling entirely.
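With strict: false, the data field can contain keys your schema did not define. A client that wants to handle those separately could split them out, as in the sketch below. It assumes data is a flat JSON object keyed by field name; nested schemas would need a recursive walk. The sample values are illustrative, not real API output.

```python
def split_extra_fields(data: dict, schema_fields: set) -> tuple:
    """Separate schema-defined fields from extra fields returned when strict is false.

    Assumes the data field is a flat JSON object keyed by field name.
    """
    known = {k: v for k, v in data.items() if k in schema_fields}
    extra = {k: v for k, v in data.items() if k not in schema_fields}
    return known, extra


# Illustrative response data, not real API output.
data = {"invoice_number": "INV-001", "total": 120.5, "vendor_vat_id": "DE123456789"}
known, extra = split_extra_fields(data, {"invoice_number", "total"})
print(extra)  # {'vendor_vat_id': 'DE123456789'}
```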
The page_range option accepts comma-separated page numbers and ranges (e.g. "1-5", "1,3,7-10") and applies only to PDF files. Use language_hint with an ISO 639-1 code (e.g. "de", "ja") to improve extraction accuracy for non-English documents, especially when the OCR needs guidance on character sets.
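A client can catch malformed page_range and language_hint values before sending a request. The sketch below expands a page_range into explicit page numbers and checks that a language_hint has the two-lowercase-letter shape of an ISO 639-1 code. It is not the server's parser: 1-based pages, ignored spaces, merged duplicates, and rejection of uppercase codes are all assumptions. The language check also verifies shape only, not whether the code is actually assigned.

```python
import re

# One page number or a start-end range, comma-separated.
_PAGE_RANGE = re.compile(r"\d+(?:-\d+)?(?:,\d+(?:-\d+)?)*")
# ISO 639-1 codes are two letters; this checks shape only, not membership.
_ISO_639_1 = re.compile(r"[a-z]{2}")


def expand_page_range(page_range: str) -> list:
    """Expand a page_range such as "1,3,7-10" into sorted, de-duplicated pages.

    Assumptions: pages are 1-based and spaces are ignored.
    """
    spec = page_range.replace(" ", "")
    if not _PAGE_RANGE.fullmatch(spec):
        raise ValueError(f"malformed page_range: {page_range!r}")
    pages = set()
    for part in spec.split(","):
        start, _, end = part.partition("-")  # "7-10" -> ("7", "-", "10")
        first = int(start)
        last = int(end) if end else first
        if first < 1 or last < first:
            raise ValueError(f"invalid page range: {part!r}")
        pages.update(range(first, last + 1))
    return sorted(pages)


def check_language_hint(code: str) -> str:
    """Reject values that are not two lowercase letters."""
    if not _ISO_639_1.fullmatch(code):
        raise ValueError(f"language_hint should be a two-letter ISO 639-1 code, got {code!r}")
    return code


print(expand_page_range("1,3,7-10"))  # [1, 3, 7, 8, 9, 10]
print(check_language_hint("ja"))      # ja
```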
Pair include_raw_text: true with schema-driven extraction when your downstream system needs both the structured data and the original text for auditing or display. Setting webhook_url implicitly enables async behavior: the response is 202 Accepted regardless of the async flag.
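Because webhook_url overrides the async flag, a client deciding how to wait for results should check webhook_url first. The sketch below encodes that precedence. Treating an omitted async flag as false is an assumption, since this page does not state its default, and the "webhook", "poll", and "inline" labels are hypothetical names for illustration only.

```python
def runs_async(options: dict) -> bool:
    """Return True if POST /v1/extract should answer 202 Accepted.

    Per this page, webhook_url implies async behavior regardless of the
    async flag. Treating an omitted async flag as false is an assumption.
    """
    return bool(options.get("webhook_url")) or bool(options.get("async", False))


def result_delivery(options: dict) -> str:
    """Describe how results reach the client (hypothetical labels)."""
    if options.get("webhook_url"):
        return "webhook"  # results arrive at the callback URL; no polling needed
    if options.get("async"):
        return "poll"     # 202 Accepted; the client fetches the result later
    return "inline"       # sync mode: results come back in the response itself


# async: False is ignored here because webhook_url is set.
opts = {"async": False, "webhook_url": "https://example.com/hooks/extract", "include_raw_text": True}
print(runs_async(opts), result_delivery(opts))  # True webhook
```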
The format option controls the output shape of the data field. Use "json" (the default) for programmatic consumption. CSV output is available from the GET /v1/extractions/:id/data endpoint instead.