Document Processor Configuration Guide
Document processors are the core components that analyze and manipulate information from your documents. This guide explains how to configure processors through our API, including detailed examples for each processor type.
Overview
We support three types of document processors:
- Extraction Processors: Extract specific fields from documents.
- Classification Processors: Categorize documents.
- Splitter Processors: Divide documents into logical sub-documents.
Schema Definitions
Base Processor Schema
All processor configurations share these base properties:
Extraction Processor Configuration
Extraction processors extract specific fields from documents.
JSON Schema Structure (schema)
This section is relevant for processors using the JSON Schema config type. If you are using the legacy Fields Array config type, please see the Fields Array Structure documentation. If you aren’t sure which config type you are using, please see the Migrating to JSON Schema documentation.
- The root must be an object type
- Allowed types are string, number, integer, boolean, object, and array
- All primitive fields (string, number, boolean, integer) must be nullable (use an array type with "null" as an option, e.g. "type": ["string", "null"])
- Maximum nesting level is 3 (each non-root object counts as 1 level)
- Property keys and names must only contain lowercase letters, numbers, and underscores
- Array items must be objects
- Enums must only contain strings and must contain a null option
- Custom types are supported by adding an "extend:type": "currency", "extend:type": "signature", or "extend:type": "date" property to the appropriate field type with the required properties. See below for examples.
- Property names can be added using the "extend:name" property. If supplied, this will override the name of the property as it will appear to the model, but not in the output returned to you. This is useful for providing more descriptive names or instructions to the model without altering the actual keys in your output data structure.
- You can add descriptions to individual enum values using the "extend:descriptions" property.
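The rules above lend themselves to a local pre-flight check before a schema is submitted. The following is a minimal sketch, not the service's own validator: the function names (check_schema, _walk, _type_list) are hypothetical, and the way nesting is counted (every non-root object, including objects used as array items, adds one level) is an assumption drawn from the wording of the list.

```python
import re

PRIMITIVES = {"string", "number", "integer", "boolean"}
ALLOWED_TYPES = PRIMITIVES | {"object", "array"}
KEY_PATTERN = re.compile(r"[a-z0-9_]+")
MAX_NESTING = 3  # each non-root object counts as one level (assumed interpretation)


def _type_list(node):
    """Return the node's "type" as a list, whether it was a string or a list."""
    declared = node.get("type")
    return declared if isinstance(declared, list) else [declared]


def check_schema(schema):
    """Return a list of rule violations; an empty list means none were found."""
    errors = []
    if schema.get("type") != "object":
        errors.append("root: type must be 'object'")
    _walk(schema, "root", 0, errors, is_root=True)
    return errors


def _walk(node, path, level, errors, is_root=False):
    types = _type_list(node)
    base = [t for t in types if t != "null"]

    for t in base:
        if t not in ALLOWED_TYPES:
            errors.append(f"{path}: type {t!r} is not allowed")
    if any(t in PRIMITIVES for t in base) and "null" not in types:
        errors.append(f"{path}: primitive fields must be nullable")

    if "enum" in node:
        values = node["enum"]
        if None not in values:
            errors.append(f"{path}: enum must contain null")
        if any(v is not None and not isinstance(v, str) for v in values):
            errors.append(f"{path}: enum values must be strings")

    if "object" in base:
        if not is_root:
            level += 1
            if level > MAX_NESTING:
                errors.append(f"{path}: nesting level {level} exceeds {MAX_NESTING}")
        for key, child in node.get("properties", {}).items():
            if not KEY_PATTERN.fullmatch(key):
                errors.append(f"{path}.{key}: key must use lowercase letters, numbers, underscores")
            _walk(child, f"{path}.{key}", level, errors)

    if "array" in base:
        items = node.get("items") or {}
        if "object" not in _type_list(items):
            errors.append(f"{path}: array items must be objects")
        else:
            _walk(items, f"{path}[]", level, errors)
```

Calling check_schema on a schema dict returns a list of messages, so it can run in a test suite or CI step; the API remains the authority on what is actually accepted.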
Unsupported Features
Although we support the JSON Schema structure, we do not support many of its additional features, including:
- Schema composition like anyOf, oneOf, allOf, schema definitions, or recursive schemas
- Regular expressions and other type-specific validation keywords
- Conditional schema validation
- Constant values
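One way to catch these features early is to scan a schema for the standard JSON Schema keywords that correspond to them. The sketch below is illustrative only: the keyword set is an assumption that maps the list above onto standard JSON Schema keywords (it is not an exhaustive list published by the service), and find_unsupported is a hypothetical helper name.

```python
# Standard JSON Schema keywords assumed to correspond to the unsupported features.
UNSUPPORTED_KEYWORDS = {
    "anyOf", "oneOf", "allOf",        # schema composition
    "$defs", "definitions", "$ref",   # schema definitions and recursive schemas
    "pattern",                        # regular expressions
    "if", "then", "else",             # conditional schema validation
    "const",                          # constant values
}


def find_unsupported(node, path="root"):
    """Return the paths of unsupported keywords found in a schema dict."""
    found = []
    if not isinstance(node, dict):
        return found
    for key, value in node.items():
        if key in UNSUPPORTED_KEYWORDS:
            found.append(f"{path}.{key}")
        if key == "properties" and isinstance(value, dict):
            # Keys under "properties" are field names, not keywords, so a field
            # that happens to be called "pattern" or "if" is not flagged.
            for name, child in value.items():
                found.extend(find_unsupported(child, f"{path}.{name}"))
        elif key == "items":
            found.extend(find_unsupported(value, f"{path}[]"))
    return found
```

The function only descends into properties and items, which is where nested field schemas live under the supported subset.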
Schema Examples
Primitive Schema
All primitive types must be nullable.
Object Schema
Objects must have properties. If you set a required array listing the properties, we will respect that order when extracting. If you do not set a required array, we will generate one and enforce its order.
Array Schema
Array items must be objects.
Enum Schema
Enums must include null as an option. Only strings are supported for enums. extend:descriptions is an optional array of strings. We recommend using it to give more context for each enum option, which leads to more accurate extraction.
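The four schema shapes described in this section (primitive, object, array, and enum) can be sketched together as a single Python dict that serializes to JSON. All field names here (invoice_number, vendor, line_items, payment_status, and so on) are hypothetical. The alignment of extend:descriptions with enum, one entry per option in the same order, is an assumption; the text only states that it is an optional array of strings.

```python
import json


def nullable(type_name):
    """Build a nullable primitive field, e.g. {"type": ["string", "null"]}."""
    return {"type": [type_name, "null"]}


schema = {
    "type": "object",  # the root must be an object
    "properties": {
        # Primitive: nullable string.
        "invoice_number": nullable("string"),
        # Primitive with "extend:name": the model sees the longer name,
        # while the output key stays "po_number".
        "po_number": {**nullable("string"), "extend:name": "purchase_order_number"},
        # Object: has properties, with "required" fixing the extraction order.
        "vendor": {
            "type": "object",
            "properties": {
                "name": nullable("string"),
                "tax_id": nullable("string"),
            },
            "required": ["name", "tax_id"],
        },
        # Array: items must be objects.
        "line_items": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "item_description": nullable("string"),
                    "quantity": nullable("integer"),
                },
                "required": ["item_description", "quantity"],
            },
        },
        # Enum: strings only, plus null (None serializes to JSON null).
        "payment_status": {
            "type": ["string", "null"],
            "enum": ["paid", "unpaid", None],
            # Assumed to line up with "enum", one description per option.
            "extend:descriptions": [
                "The invoice is marked as fully paid",
                "The invoice shows an outstanding balance",
                "The payment status is not stated",
            ],
        },
    },
    "required": ["invoice_number", "po_number", "vendor", "line_items", "payment_status"],
}

schema_json = json.dumps(schema, indent=2)
```

Building the schema in code and serializing it with json.dumps avoids hand-written JSON mistakes such as writing Python's None instead of null.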
Custom Field Types
The extend:type keyword enables custom pre-processing and post-processing of fields, baking in best practices and heuristics for the field type.
Date Schema
Date fields must be strings and use the extend:type keyword with the value date. This guarantees that the extracted date is always in ISO 8601 format (yyyy-mm-dd).
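As a small sketch of the date rule, a date field can be declared as a nullable string carrying extend:type, and the extracted yyyy-mm-dd value can then be parsed with the Python standard library. The field variable name and the parse_extracted_date helper are hypothetical; the field is made nullable because all primitive fields must be.

```python
from datetime import date

# A date field: a nullable string tagged with the custom "date" type.
invoice_date_field = {
    "type": ["string", "null"],
    "extend:type": "date",
}


def parse_extracted_date(value):
    """Convert an extracted yyyy-mm-dd string into a date; pass None through."""
    if value is None:
        return None
    return date.fromisoformat(value)
```

Because the field is nullable, downstream code should expect None whenever no date was found in the document.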
Currency Schema
Currency fields must be objects with specific properties.
Signature Schema
Signature fields must be objects with specific properties. This will auto-enable our advanced signature detection in the parsing step prior to extraction, and apply a number of prompt and post-processing heuristics to improve accuracy, particularly on reduction of false positives for signature blocks that are not actually signed.
Configuration Examples
Basic Example
Example with nested fields
Example with nested arrays and objects
Example with signature, currency, and date fields
Type Definitions
JSON Schema Type Definitions
Fields Array Structure (fields)
This section is relevant for the Fields Array config type. If you are using the JSON Schema config type, please see the JSON Schema Structure documentation. If you aren’t sure which config type you are using, please see the Migrating to JSON Schema documentation.
Configuration Examples
Basic Example
Example with Nested Fields
Example with nested arrays and objects

