Convert to Markdown

Convert an uploaded file to markdown on demand via multipart upload, without ingesting it. A stateless one-off conversion that does not create a document or run extraction.

Convert an arbitrary uploaded file to markdown on demand, without ingesting it. This is a stateless operation: it does not create a document record, does not run field extraction, and does not touch the field registry. It simply turns a one-off file into clean markdown so you can inspect the OCR output or feed it into your own processing.

This endpoint is the complement of GET /v1/documents/{id}/markdown. That endpoint returns the stored markdown of a document that has already been ingested and processed, looked up by ID. This endpoint takes a raw file you upload in the request and returns its markdown immediately, with no persistence. Use convert for ad-hoc inspection, and the stored-markdown endpoint for documents already in your workspace.

The file is sent as multipart form data under the field name "file". PDFs, images, and common document and text formats are supported. Scanned pages and images are escalated to vision OCR by default; set the "vision" form field to the string "false" to disable that escalation and rely on the native text layer only.
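As a rough illustration of the upload described above, the sketch below builds the multipart request with only the Python standard library. The base URL `https://api.example.com`, the `Authorization: Bearer` header scheme, and the helper names `encode_multipart` and `build_convert_request` are assumptions made for this example; the source only states that an API key is required, that the file goes under the field "file", and that "vision" may be set to "false".

```python
import urllib.request
import uuid


def encode_multipart(filename: str, data: bytes, vision: bool, boundary: str) -> bytes:
    """Encode the upload as multipart/form-data (hypothetical helper).

    Assumes the filename contains no double quotes or line breaks.
    """
    delimiter = b"--" + boundary.encode("ascii")
    lines = [
        delimiter,
        f'Content-Disposition: form-data; name="file"; filename="{filename}"'.encode("utf-8"),
        b"Content-Type: application/octet-stream",
        b"",
        data,
    ]
    if not vision:
        # Vision OCR is on by default, so the field is only sent to turn it off.
        lines += [
            delimiter,
            b'Content-Disposition: form-data; name="vision"',
            b"",
            b"false",
        ]
    lines += [delimiter + b"--", b""]
    return b"\r\n".join(lines)


def build_convert_request(base_url: str, api_key: str, filename: str,
                          data: bytes, vision: bool = True) -> urllib.request.Request:
    """Build (but do not send) a POST to /v1/documents/convert."""
    boundary = uuid.uuid4().hex
    body = encode_multipart(filename, data, vision, boundary)
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/documents/convert",
        data=body,
        method="POST",
        headers={
            "Content-Type": f"multipart/form-data; boundary={boundary}",
            # Assumption: the API key is passed as a bearer token.
            "Authorization": f"Bearer {api_key}",
        },
    )
```

The returned request can be passed to `urllib.request.urlopen` to send it; keeping construction separate from sending makes the request easy to inspect before it leaves the machine.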

The response returns the rendered markdown plus conversion metadata: the detected source_format, a page_count, a table_count of tables found, the 1-based vision_pages that required vision OCR, and any warnings raised during conversion (for example an unsupported file type that fell back to a plain text decode).

Convert is stateless. It produces markdown for inspection only: nothing is persisted, no document is created, and no extraction runs. To ingest a file into your workspace for extraction and linking, use the ingestion endpoints instead.
POST /v1/documents/convert

Response

Response fields

filename (string): Original filename of the uploaded file.
markdown (string): Rendered markdown representation of the file.
source_format (string): Detected source format of the input (e.g. pdf, docx, image, markdown).
page_count (integer): Number of pages detected in the source.
table_count (integer): Number of tables found and rendered.
vision_pages (integer[]): 1-based page numbers whose content came from vision OCR.
warnings (string[]): Warnings raised during conversion (e.g. unsupported type fallbacks).
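One way to work with these fields on the client side is a small typed wrapper. The sketch below is illustrative only: the class name `ConvertResult` and its methods are hypothetical, and it assumes every documented field is present in the response body.

```python
import json
from dataclasses import dataclass, field
from typing import List


@dataclass
class ConvertResult:
    """Typed view of the convert response (hypothetical client-side class)."""
    filename: str
    markdown: str
    source_format: str
    page_count: int
    table_count: int
    vision_pages: List[int] = field(default_factory=list)
    warnings: List[str] = field(default_factory=list)

    @classmethod
    def from_json(cls, payload: str) -> "ConvertResult":
        data = json.loads(payload)
        return cls(
            filename=data["filename"],
            markdown=data["markdown"],
            source_format=data["source_format"],
            page_count=data["page_count"],
            table_count=data["table_count"],
            vision_pages=list(data.get("vision_pages", [])),
            warnings=list(data.get("warnings", [])),
        )

    def non_vision_pages(self) -> List[int]:
        """1-based page numbers that are not listed in vision_pages."""
        vision = set(self.vision_pages)
        return [page for page in range(1, self.page_count + 1) if page not in vision]
```

Because vision_pages is 1-based, `non_vision_pages` also counts from 1, so both lists can be compared directly against page numbers shown in a PDF viewer.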

Response

{
  "filename": "invoice-0847.pdf",
  "markdown": "# Invoice 0847\n\n| Item | Qty | Price |\n| --- | --- | --- |\n| Widget A | 10 | 25.00 |\n| Widget B | 4 | 60.00 |\n\n**Total:** 490.00\n",
  "source_format": "pdf",
  "page_count": 2,
  "table_count": 1,
  "vision_pages": [],
  "warnings": []
}

A common pattern is to convert a file, review the markdown and the warnings, and only then decide whether to ingest it for full extraction. Because convert never persists anything, you can run it repeatedly on the same file (for example with and without vision) without creating duplicate documents.
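The review step in this pattern might look like the sketch below. The checks are an illustrative policy, not something the API prescribes: `review_conversion` and `vision_changed_output` are hypothetical helpers that operate on already-parsed response bodies.

```python
from typing import Dict, List


def review_conversion(result: Dict) -> List[str]:
    """Collect reasons to hold off on ingesting; an empty list means nothing was flagged."""
    concerns = list(result.get("warnings", []))
    if not result.get("markdown", "").strip():
        concerns.append("markdown is empty")
    return concerns


def vision_changed_output(with_vision: Dict, without_vision: Dict) -> bool:
    """Compare two convert runs of the same file, one with vision and one without."""
    return with_vision["markdown"] != without_vision["markdown"]


# Example input shaped like the sample response above.
sample = {
    "filename": "invoice-0847.pdf",
    "markdown": "# Invoice 0847\n",
    "source_format": "pdf",
    "page_count": 2,
    "table_count": 1,
    "vision_pages": [],
    "warnings": [],
}
concerns = review_conversion(sample)
```

If `concerns` comes back empty, the file is a reasonable candidate for the ingestion endpoints; otherwise the listed warnings indicate what to inspect first.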

Errors

Error responses

400 missing_file: No file was provided under the multipart field "file".
401 unauthorized: Missing or invalid API key.
429 rate_limited: Too many requests. Retry after the period indicated in the Retry-After header.
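For the 429 case, a client needs to turn the Retry-After header into a wait time. The source does not say whether the header carries a number of seconds or an HTTP date, so this sketch accepts both forms that HTTP allows. The helper names and the 1-second fallback are assumptions made for illustration.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional

# Status codes and error codes as listed in the table above.
ERROR_CODES = {400: "missing_file", 401: "unauthorized", 429: "rate_limited"}


def should_retry(status: int) -> bool:
    """Only rate limiting is worth retrying; 400 and 401 need a changed request."""
    return status == 429


def retry_delay_seconds(retry_after: Optional[str],
                        now: Optional[datetime] = None,
                        default: float = 1.0) -> float:
    """Convert a Retry-After header value into seconds to wait."""
    if retry_after is None:
        return default  # assumed fallback when the header is absent
    value = retry_after.strip()
    if value.isdigit():
        return float(value)  # delay-seconds form
    try:
        when = parsedate_to_datetime(value)  # HTTP-date form
    except (TypeError, ValueError):
        return default
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    current = now or datetime.now(timezone.utc)
    return max(0.0, (when - current).total_seconds())
```

A caller would check `should_retry(status)`, sleep for `retry_delay_seconds(headers.get("Retry-After"))`, and then resend the same request; since convert persists nothing, repeating the upload has no side effects.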