Builder

Once you have created an Extraction processor, navigate to the “Build” tab.

Configuring Properties

This section is relevant for processors using the JSON Schema config type. If you are using the legacy Fields Array config type, please see the Configuring Fields documentation. If you aren’t sure which config type you are using, please see the Migrating to JSON Schema documentation.

To configure a field, add a semantically accurate field name and write a description that explains how to identify and extract that field from the document. Property keys are sent to the model during extraction, so use a name that is meaningful to the model rather than an internal identifier. If you’d like to change what gets sent to the model without changing the property key, use the “Property Name” field.

Property Types

The following property types are supported:

Basic Types

- String: Used for text values.
- Number: Used for numeric values.
- Boolean: Used for true/false values.
- Integer: Used for whole number values.
- Enum: Used for fields with a predefined set of string values.
- Object: Used to group related fields together.
- Array: Used for lists of items.
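
As an illustration of how these basic types map onto a schema, here is a minimal sketch written as a Python dictionary and printed as JSON. It uses only standard JSON Schema keywords (type, enum, properties, items, description). The invoice property names and descriptions are hypothetical, and the exact schema the Schema Builder generates may differ, so compare against the output of the “JSON” toggle for your own processor.

```python
import json

# Hypothetical invoice schema. Every property name and description here is
# illustrative; only the JSON Schema keywords (type, enum, properties, items,
# description) are standard.
invoice_schema = {
    "type": "object",
    "properties": {
        "vendor_name": {
            "type": "string",
            "description": "Legal name of the company that issued the invoice.",
        },
        "subtotal": {
            "type": "number",
            "description": "Invoice amount before tax, as printed.",
        },
        "page_count": {
            "type": "integer",
            "description": "Number of pages in the invoice.",
        },
        "is_paid": {
            "type": "boolean",
            "description": "True if the invoice is stamped or marked as paid.",
        },
        "payment_method": {
            "type": "string",
            "enum": ["card", "wire", "check"],
            "description": "How the invoice asks to be paid.",
        },
        "billing_address": {
            "type": "object",
            "description": "Address the invoice is billed to.",
            "properties": {
                "city": {"type": "string"},
                "postal_code": {"type": "string"},
            },
        },
        "line_items": {
            "type": "array",
            "description": "One entry per row of the line-item table.",
            "items": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "quantity": {"type": "integer"},
                },
            },
        },
    },
}

# Print the schema as JSON so it can be compared with a real one.
print(json.dumps(invoice_schema, indent=2))
```

Note that the keys describe what the value means in the document (vendor_name, is_paid) rather than how a downstream system labels it.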

Custom Types

Custom types are extensions of the basic types with added validation, formatting, and processing logic.

- Date: A string type that ensures an ISO 8601-compliant date format (yyyy-mm-dd).
- Currency: An object containing currency details, including:
  - amount (number)
  - iso_4217_currency_code (string)
- Signature: An object containing signature details, including:
  - printed_name (string)
  - signature_date (date)
  - is_signed (boolean)
  - title_or_role (string)
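
To make the shapes of the custom types concrete, the following sketch checks example values against the sub-fields listed above. The helper names (is_iso_date, is_currency, is_signature) and the sample values are hypothetical, and the sketch assumes every sub-field is always present; real output may allow missing or null sub-fields, for example on an unsigned document.

```python
import re
from datetime import date

# Matches the yyyy-mm-dd layout only; calendar validity is checked separately.
ISO_DATE_PATTERN = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")


def is_iso_date(value):
    """Return True if value is a real calendar date written as yyyy-mm-dd."""
    if not isinstance(value, str) or not ISO_DATE_PATTERN.fullmatch(value):
        return False
    year, month, day = (int(part) for part in value.split("-"))
    try:
        date(year, month, day)
    except ValueError:
        return False
    return True


def is_number(value):
    """Return True for ints and floats, excluding booleans."""
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def is_currency(value):
    """Check the Currency shape: amount plus an ISO 4217 code string."""
    return (
        isinstance(value, dict)
        and is_number(value.get("amount"))
        and isinstance(value.get("iso_4217_currency_code"), str)
    )


def is_signature(value):
    """Check the Signature shape, assuming all four sub-fields are present."""
    return (
        isinstance(value, dict)
        and isinstance(value.get("printed_name"), str)
        and is_iso_date(value.get("signature_date"))
        and isinstance(value.get("is_signed"), bool)
        and isinstance(value.get("title_or_role"), str)
    )


# Hypothetical example values, not real extraction output.
example_currency = {"amount": 1250.5, "iso_4217_currency_code": "USD"}
example_signature = {
    "printed_name": "Jane Doe",
    "signature_date": "2024-03-15",
    "is_signed": True,
    "title_or_role": "Chief Financial Officer",
}

print(is_currency(example_currency), is_signature(example_signature))
```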

Once you’ve configured your schema using the Schema Builder, you can view the JSON Schema you’ve created by clicking the “JSON” toggle.

Configuring Fields

This section is relevant for the Fields Array config type. If you are using the JSON Schema config type, please see the Configuring Properties documentation. If you aren’t sure which config type you are using, please see the Migrating to JSON Schema documentation.

To configure a field, add a semantically accurate field name and write a description that explains how to identify and extract that field from the document. You must also configure the proper field type:

Text

Use the text data type when you want to extract a string of text from a document. For example, if you want to extract the name of a person from a document, you would use the text data type.

Number

Use the number data type when you want to extract a number from a document. For example, if you want to extract the age of a person from a document, you would use the number data type.

Currency

Use the currency data type when you want to extract a currency value from a document. For example, if you want to extract the price of a product from a document, you would use the currency data type.

Boolean

Use the boolean data type when you want to extract a boolean value from a document. For example, if you want to extract whether a product is in stock from a document, you would use the boolean data type.

Date

Use the date data type when you want to extract a date from a document. For example, if you want to extract the date of birth of a person from a document, you would use the date data type.

Signature

Use the signature data type when you want to extract a signature from a document. For example, if you want to extract the signature of a person from a document, you would use the signature data type. Signature fields will automatically extract all relevant details of a document’s signature block:

- is_signed
- printed_name
- signatory_title
- signature_date

Object

Use the object data type when you want to extract a set of related fields from a document. For example, if you want to extract the address, name, and birth date of a person from a document, you would use the object data type.

Array

Use the array data type when you want to extract a list of related fields from a document. For example, if you want to extract a list of products that each have a name, price, and quantity from a document, you would use the array data type.

Configuration table

In the field configuration table, you can use the drag button to move a field up or down. Performance is best when fields that are related in the document are also placed together, in a matching order, in the configuration table.

The following note about field IDs applies only to the legacy Fields Array config type. It is not relevant for the JSON Schema config type.

You can also set a field ID which is a unique identifier for the field to use in your downstream system, so that you can make changes to the semantic field name without updating your downstream system.
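
The sketch below illustrates why a stable field ID helps downstream code. The output shape (a list of entries with id, name, and value keys) is a hypothetical stand-in, not the documented output format; the point is only that code keyed on the ID keeps working when the semantic name is reworded.

```python
# Hypothetical extraction output before a rename. The "id", "name", and
# "value" keys are assumptions made for this illustration.
output_before_rename = [
    {"id": "invoice_total", "name": "Total amount due", "value": 1250.5},
    {"id": "vendor", "name": "Vendor name", "value": "Acme Corp"},
]

# The semantic names were reworded to guide the model; the ids are unchanged.
output_after_rename = [
    {"id": "invoice_total", "name": "Grand total including tax", "value": 1250.5},
    {"id": "vendor", "name": "Legal name of the vendor", "value": "Acme Corp"},
]


def values_by_id(fields):
    """Index extracted values by field id, ignoring the semantic name."""
    return {field["id"]: field["value"] for field in fields}


def values_by_name(fields):
    """Index extracted values by semantic name (breaks when names change)."""
    return {field["name"]: field["value"] for field in fields}


# Lookups by id are identical before and after the rename.
print(values_by_id(output_before_rename) == values_by_id(output_after_rename))

# Lookups by name no longer line up after the rename.
print(values_by_name(output_before_rename) == values_by_name(output_after_rename))
```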

Configuring Custom Settings

In addition to the fields, you can also configure custom settings for each field. These settings allow you to further customize the extraction process to better suit your specific needs. However, please note that these settings are experimental and may not work as expected in all cases. Before using these settings, we recommend consulting with the Extend team to understand their potential impact on the extraction process.

Using the Run tab

While it often makes sense to run files from the “Build” tab when getting set up, once you are ready to start testing your processor at scale, you should move over to the “Run” tab. From this tab you can:

- Quickly run any number of files in a batch (supported file types can be found here)
- Select the version of the processor you want to run (or default to the saved draft version)
- Run an existing Evaluation set for the processor

Once you run a batch of files, you will be redirected to a results page.

From here you can:

- See at a glance the coverage of fields extracted and the average confidence
- Drill down into individual files to see the extracted fields and confidence levels
- Optionally correct or edit the results of each output, then turn the entire batch into an Evaluation set
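
As a rough illustration of the two summary numbers mentioned above, the sketch below computes coverage and average confidence for a small batch. The result structure and both metric definitions are assumptions made for this example (coverage as the share of fields with a non-null value, average confidence as a plain mean); the results page may calculate them differently.

```python
from statistics import mean

# Hypothetical batch results; the structure is assumed for this example.
batch_results = [
    {
        "file": "invoice_a.pdf",
        "fields": {
            "vendor_name": {"value": "Acme Corp", "confidence": 1.0},
            "total": {"value": None, "confidence": 0.5},
        },
    },
    {
        "file": "invoice_b.pdf",
        "fields": {
            "vendor_name": {"value": "Globex", "confidence": 0.75},
            "total": {"value": 1250.5, "confidence": 0.75},
        },
    },
]


def all_fields(results):
    """Flatten every extracted field across every file in the batch."""
    return [field for result in results for field in result["fields"].values()]


def coverage(results):
    """Share of fields that came back with a non-null value (assumed definition)."""
    fields = all_fields(results)
    if not fields:
        return 0.0
    return sum(field["value"] is not None for field in fields) / len(fields)


def average_confidence(results):
    """Mean confidence across all fields (assumed definition)."""
    fields = all_fields(results)
    if not fields:
        return 0.0
    return mean(field["confidence"] for field in fields)


print(f"coverage: {coverage(batch_results):.2f}")
print(f"average confidence: {average_confidence(batch_results):.2f}")
```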

Note: We recommend not creating Evaluation sets until you have mostly finalized which fields you are extracting, even if you are still iterating on the field descriptions. Evaluation sets are used to compare current and expected outputs, so if you add or remove fields from the processor, the expected outputs in the Evaluation set will no longer match the fields the processor produces. Everything will still run, but metrics like accuracy and coverage will drop, and you will need to update the Evaluation set to reflect the new expected outputs. Doing this repeatedly is tedious, so it is best to wait until you have mostly finalized the set of fields you are extracting or the set of classification types you are using.
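
The effect described in this note can be sketched with a toy accuracy calculation. The scoring rule here (a field counts as correct only if it appears in both the expected and current output with the same value) is a hypothetical definition chosen for illustration, not the product's actual metric.

```python
def accuracy(expected, actual):
    """Share of fields, across both outputs, that match exactly.

    Assumed definition for this sketch: a field that is missing from either
    side counts as a mismatch.
    """
    field_names = set(expected) | set(actual)
    if not field_names:
        return 1.0
    matches = sum(
        1
        for name in field_names
        if name in expected and name in actual and expected[name] == actual[name]
    )
    return matches / len(field_names)


# Expected output saved in a hypothetical Evaluation set.
expected_output = {"vendor_name": "Acme Corp", "total": 1250.5}

# Current output while the processor still has the same two fields.
output_same_fields = {"vendor_name": "Acme Corp", "total": 1250.5}

# Current output after a new field is added to the processor. The Evaluation
# set has no expected value for it, so it cannot match.
output_with_new_field = {**output_same_fields, "due_date": "2024-04-01"}

print(accuracy(expected_output, output_same_fields))
print(accuracy(expected_output, output_with_new_field))
```

Under this assumed rule the second score is lower even though no extracted value got worse, which is why the Evaluation set needs updating after fields change.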

Publishing

See the Publishing page for more information on how to publish and use processors.
