Parse files to get cleaned, chunked target content (e.g. markdown).
The Parse endpoint converts documents into structured, machine-readable formats with fine-grained control over the parsing process. It is ideal for extracting cleaned document content to be used as context for downstream processing, e.g. RAG pipelines, custom ingestion pipelines, embeddings classification, etc. Unlike processor and workflow runs, parsing is a synchronous endpoint and returns the parsed content in the response. Expected latency depends primarily on file size. This makes it suitable for workflows where you need immediate access to document content without waiting for asynchronous processing. For a deeper guide on how to use the output of this endpoint, jump to Using Parsed Output.
PROCESSED: The file was successfully processed
FAILED: The processing failed (see failureReason for details)

content: A fully formatted representation of the entire chunk in the target format (e.g., markdown). This is ready to use as-is if you need the complete formatted content of a page.
blocks: An array of individual content blocks that make up the chunk, each with its own formatting, position information, and metadata.
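chunk.content vs. chunk.blocks
Use chunk.content when you need the complete formatted content of a chunk, ready to use as-is.
Use chunk.blocks when you need the individual content blocks, each with its own formatting, position information, and metadata.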
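The difference between the two fields can be sketched in a few lines of Python. This is only an illustration: it assumes the parsed response is a dictionary with a top-level `chunks` list, and the `type` key on a block is a hypothetical field name. Only `content` and `blocks` come from the description above.

```python
from typing import Any


def chunk_texts(parsed: dict[str, Any]) -> list[str]:
    """Return the ready-to-use formatted content of every chunk."""
    # Assumption: chunks live under a top-level "chunks" key.
    return [chunk["content"] for chunk in parsed.get("chunks", [])]


def blocks_of_type(parsed: dict[str, Any], block_type: str) -> list[dict[str, Any]]:
    """Return the individual blocks whose (hypothetical) "type" matches."""
    return [
        block
        for chunk in parsed.get("chunks", [])
        for block in chunk.get("blocks", [])
        if block.get("type") == block_type
    ]
```

`chunk_texts` fits the case where whole chunks are embedded or passed to a model as context, while `blocks_of_type` fits the case where only certain elements of a page are needed.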
Each block's position information includes a polygon (precise outline) and a simplified boundingBox. This information can be used to:
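As one illustration of how the two shapes relate, the sketch below derives a simple axis-aligned box from a polygon. The data shapes are assumptions rather than the documented schema: the polygon is taken to be a list of points with `x` and `y` keys, and the `left`, `top`, `right`, and `bottom` key names are hypothetical.

```python
from typing import Mapping, Sequence


def bounding_box_from_polygon(
    polygon: Sequence[Mapping[str, float]],
) -> dict[str, float]:
    """Compute the smallest axis-aligned box that contains every polygon point."""
    if not polygon:
        raise ValueError("polygon must contain at least one point")
    xs = [point["x"] for point in polygon]
    ys = [point["y"] for point in polygon]
    # Hypothetical key names; adapt them to the actual boundingBox schema.
    return {"left": min(xs), "top": min(ys), "right": max(xs), "bottom": max(ys)}
```

A box like this is cheaper to compare and draw than the full outline, which is why a simplified form is useful alongside the precise polygon.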
Whether an error can be retried is indicated by the retryable=true|false field in the response body, but you can also find a breakdown below. Most errors are not retryable and are client errors related to the file provided for parsing.
| Error Code | Description | Retryable |
|---|---|---|
| INVALID_CONFIG_OPTIONS | Invalid combination of options in the incoming config. | ❌ |
| UNABLE_TO_DOWNLOAD_FILE | The system could not download the file from the provided URL. This usually means your presigned URL has expired or is malformed. | ❌ |
| FILE_TYPE_NOT_SUPPORTED | The file type is not supported for parsing. | ❌ |
| FILE_SIZE_TOO_LARGE | The file exceeds the maximum allowed size. | ❌ |
| CORRUPT_FILE | The file is corrupt and cannot be parsed. | ❌ |
| OCR_ERROR | An error occurred in the OCR system. This is a rare error code and would indicate downtime, so requests can be retried. We suggest retrying with backoff for this error. | ✅ |
| PASSWORD_PROTECTED_FILE | The file is password protected and cannot be parsed. | ❌ |
| FAILED_TO_CONVERT_TO_PDF | The system could not convert the file to PDF format. | ❌ |
| FAILED_TO_GENERATE_TARGET_FORMAT | The system could not generate the requested target format. | ❌ |
| INTERNAL_ERROR | An unexpected internal error occurred. We suggest retrying with backoff for this error, as it is likely the result of an outage. | ✅ |
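
The retry guidance in the table can be sketched as a small wrapper with exponential backoff. This is a minimal illustration, not an official client: `ParseError`, `send`, and the `code` key are hypothetical names, and the error body is assumed to be a dictionary in which `retryable` is a JSON boolean.

```python
import time
from typing import Any, Callable

# The only error codes marked retryable in the table above.
RETRYABLE_CODES = {"OCR_ERROR", "INTERNAL_ERROR"}


class ParseError(Exception):
    """Hypothetical exception carrying the error body of a failed parse call."""

    def __init__(self, error: dict[str, Any]) -> None:
        super().__init__(error.get("code", "UNKNOWN"))
        self.error = error


def is_retryable(error: dict[str, Any]) -> bool:
    """Prefer the explicit retryable field; fall back to the error code."""
    if "retryable" in error:
        return error["retryable"] is True
    return error.get("code") in RETRYABLE_CODES


def parse_with_backoff(
    send: Callable[[], Any],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Call send(), retrying retryable failures with exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return send()
        except ParseError as exc:
            out_of_attempts = attempt == max_attempts - 1
            if out_of_attempts or not is_retryable(exc.error):
                raise
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            sleep(base_delay * 2 ** attempt)
```

Here `send` stands for whatever function performs the actual request and raises `ParseError` on failure. Non-retryable client errors such as CORRUPT_FILE are re-raised immediately, since repeating the same request would fail the same way.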
A required field is missing (e.g., file)
Neither fileUrl nor fileBase64 is provided in the file object
fileUrl is invalid
fileBase64 is invalid
config contains invalid values (e.g., unsupported target format or chunking strategy)
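
Several of these conditions can be caught on the client before a request is sent. The sketch below is a hypothetical pre-flight check and not the server's actual validation logic: the `file`, `fileUrl`, and `fileBase64` names come from the conditions above, while the specific rules (an http or https URL, strictly valid base64) are assumptions.

```python
import base64
import binascii
from typing import Any
from urllib.parse import urlparse


def validate_parse_request(body: dict[str, Any]) -> list[str]:
    """Return a list of problems found in a request body (empty if none)."""
    file = body.get("file")
    if not isinstance(file, dict):
        return ["file is required"]

    problems: list[str] = []
    url = file.get("fileUrl")
    encoded = file.get("fileBase64")

    if url is None and encoded is None:
        problems.append("neither fileUrl nor fileBase64 is provided")

    if url is not None:
        # Assumption: only absolute http(s) URLs are acceptable.
        parts = urlparse(url) if isinstance(url, str) else None
        if parts is None or parts.scheme not in ("http", "https") or not parts.netloc:
            problems.append("fileUrl is invalid")

    if encoded is not None:
        try:
            if not isinstance(encoded, str) or not encoded:
                raise ValueError("empty or non-string payload")
            base64.b64decode(encoded, validate=True)
        except (binascii.Error, ValueError):
            problems.append("fileBase64 is invalid")

    return problems
```

Checking `config` values is left out because the set of supported target formats and chunking strategies is not listed here.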