{
  "object": "file",
  "id": "file_1234",
  "name": "example_file",
  "type": "PDF",
  "presignedUrl": "https://s3.example.com/file_1234.pdf",
  "parentFileId": "file_5678", // Optional, only set if this file is a derivative of another file
  "contents": {
    "rawText": "This is the raw text content of the file...",
    "pages": [
      {
        "pageNumber": 1,
        "markdown": "This is the markdown content of the page..."
      }
    ]
  },
  "metadata": {
    "parentSplit": { // Optional, only set if this file is a derivative of another file
      "id": "324kjlfsd",
      "type": "addendum",
      "identifier": "addendum_1",
      "startPage": 7,
      "endPage": 9
    }
  },
  "createdAt": "2024-01-01T00:00:00Z",
  "updatedAt": "2024-01-01T00:00:00Z"
}
The File object represents a file in Extend. Files are created for each workflow run, and can also be created directly via API for use in evaluation sets.
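As a rough illustration of the shape described here, the sketch below models a File payload with Python dataclasses. The field names follow the example payload on this page; the class names and the `parse_file` helper are hypothetical and are not part of any Extend SDK. It also assumes that `html`, like `markdown`, sits on each page entry.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class Page:
    page_number: int
    markdown: Optional[str] = None  # present only when requested
    html: Optional[str] = None      # assumed to sit next to markdown


@dataclass
class ParentSplit:
    id: str
    type: str
    identifier: str
    start_page: int
    end_page: int


@dataclass
class File:
    id: str
    name: str
    type: str
    parent_file_id: Optional[str] = None
    raw_text: Optional[str] = None
    pages: List[Page] = field(default_factory=list)
    parent_split: Optional[ParentSplit] = None


def parse_file(payload: Dict[str, Any]) -> File:
    """Build a File from a decoded JSON payload (hypothetical helper)."""
    contents = payload.get("contents") or {}
    split = (payload.get("metadata") or {}).get("parentSplit")
    parent_split = None
    if split:
        # parentSplit is optional: only set on derivative files.
        parent_split = ParentSplit(
            split["id"], split["type"], split["identifier"],
            split["startPage"], split["endPage"],
        )
    return File(
        id=payload["id"],
        name=payload["name"],
        type=payload["type"],
        parent_file_id=payload.get("parentFileId"),
        raw_text=contents.get("rawText"),
        pages=[
            Page(p["pageNumber"], p.get("markdown"), p.get("html"))
            for p in contents.get("pages") or []
        ],
        parent_split=parent_split,
    )
```

Optional fields default to `None`, so the same helper should handle both original files and derivatives that carry `parentFileId` and `parentSplit`.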
The markdown field holds the cleaned and structured Markdown content of the page.
Available for PDF and IMG file types.
Only included if the markdown query parameter is set to true in the endpoint request.
The html field holds the cleaned and structured HTML content of the page.
Available for DOCX file types (those that were not auto-converted to PDF).
Only included if the html query parameter is set to true in the endpoint request.
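Since both fields are opt-in, a request has to ask for them explicitly. The sketch below builds only the query string, because the endpoint path is not given on this page; `file_query` is a hypothetical helper name, and the lowercase string "true" is an assumption about how the boolean is encoded.

```python
from urllib.parse import urlencode


def file_query(markdown: bool = False, html: bool = False) -> str:
    """Return a query string that opts in to optional page content."""
    params = {}
    if markdown:
        params["markdown"] = "true"  # assumed encoding of the boolean
    if html:
        params["html"] = "true"
    return urlencode(params)


# Append the result to whatever file endpoint URL you are calling.
print(file_query(markdown=True))             # markdown=true
print(file_query(markdown=True, html=True))  # markdown=true&html=true
```

Omitting both flags yields an empty string, in which case neither field should be expected in the response.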
Note: Some deprecated fields are still included in the payload for backwards compatibility. These are:
markdown and rawText for IMG files, where they are not nested under the pages array. These will still be included in payloads until full deprecation in December 2024.
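Client code that must handle both shapes during the deprecation window could prefer the page-nested value and fall back to the legacy one. This is a minimal sketch under a stated assumption: the text does not say exactly where the deprecated fields live, so the helper below guesses that the legacy `markdown` sits directly under `contents`. The function name `page_markdown` is hypothetical.

```python
from typing import Any, Dict, List


def page_markdown(file_payload: Dict[str, Any]) -> List[str]:
    """Return markdown per page, falling back to the legacy field."""
    contents = file_payload.get("contents") or {}
    pages = contents.get("pages") or []
    nested = [p["markdown"] for p in pages if p.get("markdown")]
    if nested:
        return nested
    # Assumption: the deprecated IMG field lives at contents.markdown.
    legacy = contents.get("markdown")
    return [legacy] if legacy else []
```

Reading the nested value first means the fallback branch can simply be deleted once the deprecated fields are gone.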