
Indexing


Indexing is the last step in the capture process. During this step, the documents (SE Document) generated from the capture batch are created. This section is divided into two subsections:

 

Output Settings

In this section, you can establish the output settings of the capture batch files. To do that, the following fields are available:

Category

Select the category in which the batch documents will be saved. Refer to the specific documentation of the SE Document component for more information about creating and configuring a document category.

Dynamic category selection

Checked: It will be possible to:

Select a different category for each document in the batch (if there is more than one document in the batch);

Define the category through an index of the recognition profile during the execution of the indexing step.

Unchecked: The batch documents will be saved in the previously defined category, and it will not be possible to edit it.

File format

PDF (image only)

Select this option for the electronic files to be saved in the PDF bitmap format. In this format, even if the electronic file contains text, it will not be possible to search for words in it.

Searchable PDF

Select this option for the electronic files to be saved in PDF format after going through OCR, so that word searches can be performed on their content.

PDF/A

Select this option for the electronic files to be saved in the PDF/A standard, also known as ISO 19005-1. This is an archiving standard that preserves the electronic file so that it can be viewed with the same appearance in the long term. This option does not allow word searches in the file content.

TIFF multipage

Select this option for the electronic files to be saved in the document in the TIFF multipage format. TIFF is a tag-based, high-resolution graphic format used for the interchange of digital graphic elements. Through its tags, a single multipage .tiff file can store several images along with related information, such as compression type and orientation.

Images

Select this option for the electronic files to be saved in the document in the image format. In the field next to it, select the desired extension: TIFF, JPEG or GIF.

JPEG compression

If the format set previously is "Images" and the selected extension is "JPEG", select the compression level that will be applied to the electronic file:

Regular

Good

Better

Customized: Enter the desired compression level.

When setting the compression level, take into account that a high level of compression produces smaller files with lower image quality, while a low level of compression produces larger files with higher image quality.

A low-quality JPEG image is not necessarily a bad image.

OCR

Binarize image before OCR (1)

Checked: The system will convert the batch image to black and white before any OCR operation. For example, if the image used by the batch is in color or grayscale, when a capture step that requires OCR is performed, the system will binarize the image, perform the OCR, and then discard the black and white image, keeping the original image.

Unchecked: The system will not convert the batch image to black and white before OCR operations.

1 - This option is only available for editing after saving the record for the first time, if the "Recognition" step is parameterized and the file format is "Searchable PDF".
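The idea behind binarization can be illustrated with a minimal Python sketch. It assumes a grayscale image represented as rows of values from 0 (black) to 255 (white) and a fixed threshold of 128. The function name, the threshold, and the sample pixels are all hypothetical; the algorithm actually used by the system is not described here.

```python
from typing import List

# Hypothetical image representation: rows of grayscale values, 0 (black) to 255 (white).
Image = List[List[int]]


def binarize(image: Image, threshold: int = 128) -> Image:
    """Return a black and white copy of the image; the input is not modified."""
    return [[255 if pixel >= threshold else 0 for pixel in row] for row in image]


original = [[12, 200], [130, 90]]   # made-up sample pixels
black_and_white = binarize(original)

print(black_and_white)  # [[0, 255], [255, 0]]
print(original)         # [[12, 200], [130, 90]] (the original image is kept)
```

Because the function returns a copy, the black and white version can be passed to OCR and then discarded, which mirrors the behavior described for the checked option.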

 

After saving the capture configuration record for the first time, if the selected category has associated attributes, these may be used in the "Processing" section.

 

Processing

In this section, configure the data that will be created in the category selected in the "Output Settings" section. To do that, click on the button on the side toolbar. On the screen that will be displayed, configure the following fields:

 

 

Source

 

Define where the data used to compose a property of the document created from the capture batch will be retrieved from. In the Type field, select the desired option:

Type

Variable

The source data will be a variable previously recorded in the system.

In the Name field select the desired variable.

Database (1)

The source data will be one of the template metadata used in the service associated with the capture configuration.

In the Name field, select the desired metadata.

Web Service data source (2)

The source data will be a return variable of the Web Service associated with the capture configuration.

In the Name field, select the desired metadata.

Recognition profile (3)

The source data will be one of the indexes configured in the recognition profile associated with the capture configuration. Fill out the following fields that will be displayed:

Recognition profile: If the batch type is "Multiple documents" and the document type is "Image", and more than one recognition profile is associated with the capture configuration, you can select the recognition profile you wish to use as the source data. Otherwise, this field will be filled in by the system with the recognition profile associated with the capture configuration.

Index name: Select the index of the previously selected recognition profile that you wish to use as the source data.

# of pages

The source data will be the number of pages that the document contains.

File name

The source data will be the name of the imported file in the batch. If the document has more than one file, the name of the first file in the document will be used.

Fixed value

The source data will be a preset value. In this case, enter the desired value in the respective field.

1 - This option is only available if the capture configuration has relationships configured in the "Relationship" section.

2 - This option will only be available if the Relationship section defines that it will take place through a "Web Service data source".

3 - This option is only available if the capture configuration has an associated recognition profile in the "Recognition" section.

 

Destination

 

Configure where the value obtained from the source will be used in the document created from the capture batch. In the Type field, select the desired option.

 

Document property

The value will be a property of the document that will be created. In the Name field, select the desired property:

ID #

The value will be used in the "ID #" field of the document created from the batch.

Title

The value will be used in the "Title" field of the document created from the batch.

Summary

The value will be used in the "Summary" field of the document created from the batch.

Attribute

The value will be applied to the attribute of the document created from a batch. In the Attribute field that will be enabled:

The attributes associated with the category that was selected in the "Output Settings" section will be displayed, if any.

If the "Dynamic category selection" option was checked, all attributes recorded in the SE Document component will be available for selection.

Complex file container

The value will be used in the "Complex file container" field of the document category created from a batch.

Category

The value will be used in the "Category" field of the document created from a batch.

New variable

Name

Enter the variable name that will be created containing the value of the source data.

Existing variable

Name

Select the desired variable. It will receive the obtained value.

Concatenate variable value

Checked: The value obtained from the source will be appended to the end of the existing value of the selected variable.

Unchecked: The data obtained from the source will replace the value of the selected variable.
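The difference between the two states of the "Concatenate variable value" option can be sketched in Python as follows. This is only an illustration: the `store_value` function, the dictionary of variables, and the sample values are hypothetical and do not correspond to any interface of the system.

```python
from typing import Dict


def store_value(variables: Dict[str, str], name: str, value: str, concatenate: bool) -> None:
    """Store a source value in an existing variable, appending or replacing."""
    if concatenate and name in variables:
        # Checked: the new value is appended to the end of the existing value.
        variables[name] = variables[name] + value
    else:
        # Unchecked: the new value replaces the existing value.
        variables[name] = value


variables = {"doc_id": "CAP-"}          # made-up existing variable
store_value(variables, "doc_id", "0042", concatenate=True)
print(variables["doc_id"])              # CAP-0042

store_value(variables, "doc_id", "0099", concatenate=False)
print(variables["doc_id"])              # 0099
```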

Regular expression

Name

Enter a name for the regular expression. This name will be available as a new variable, which can then be the source of a document property.

Pattern matching

Enter the regular expression that will be used to obtain the value from the source data.
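As an illustration of how a pattern can pull a value out of the source data, the sketch below uses the Python `re` module. It assumes that the whole first match becomes the value of the new variable. The regular expression syntax supported by the system and the way it selects the result are not stated here, so the `apply_pattern` function, the pattern, and the sample text are hypothetical.

```python
import re
from typing import Optional


def apply_pattern(source: str, pattern: str) -> Optional[str]:
    """Return the first match of the pattern in the source, or None if there is none."""
    match = re.search(pattern, source)
    if match is None:
        return None
    return match.group(0)


# Made-up source value, for example text recognized in a batch document.
source_value = "Contract 2024-0157 signed"
print(apply_pattern(source_value, r"\d{4}-\d{4}"))  # 2024-0157
```

It is advisable to validate the pattern against real source values before saving the configuration, since a pattern that matches nothing produces no value in this sketch.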

Form field

The value is the data retrieved from a field of the form associated as a template with the category (SE Document) that was selected in the "Output Settings" section. For this feature to work correctly, SE Form must be one of the solutions acquired by your organization. To do that, use the following fields that will be displayed:

Entity

Select the template form of the category to which the document belongs.

Entity field

Select the form field from which the desired value will be obtained.

 

Extract text function

The target field will be an excerpt extracted from the previously defined source. To do that, use the following fields that will be displayed:

Name

Enter a name for the function. This name will be available as a new variable, which can then be the source of a document property.

Initial character

Enter the number that corresponds to the position of the character that the extraction of the text will start from.

For example, when entering the number 1, the extraction will begin from the first character of the text.

Number of characters

Enter the number of characters, from the initial character, that will be extracted to form the desired value.

For example, if the initial character is 1 and the number of characters is 5, the desired value will be formed from the 1st to the 5th character.
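The behavior described for the "Initial character" and "Number of characters" fields can be sketched in Python as a substring operation with a 1-based start position. The `extract_text` function and the sample value are hypothetical and only illustrate the calculation. How the system handles positions beyond the end of the text is not described here.

```python
def extract_text(source: str, initial_character: int, number_of_characters: int) -> str:
    """Return number_of_characters characters of source, starting at the
    1-based position initial_character."""
    if initial_character < 1 or number_of_characters < 0:
        raise ValueError("initial_character must be >= 1 and number_of_characters >= 0")
    start = initial_character - 1  # convert the 1-based position to a 0-based index
    return source[start:start + number_of_characters]


source_value = "INV-2024-0001"           # made-up source value
print(extract_text(source_value, 1, 3))  # INV
print(extract_text(source_value, 5, 4))  # 2024
```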

 

Wrap text function

The target field will be a piece of text from the source defined previously. To do that, use the following fields that will be displayed:

Name

Enter a name for the function. This name will be available as a new variable, which can then be the source of a document property.

Delimiter

Enter the character that will be used as the delimiter when wrapping (splitting) the text.

For example, when the ";" character is entered and the source is composed by the "Name;date;marital status" value, the system will wrap the text into 3 values: "Name", "date" and "marital status".

Position

Enter the number corresponding to the position of the part that you wish to use as the value.

For example, when entering the value 2, in the above example, only the "date" value will be used.
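The example above can be reproduced with a short Python sketch that splits the source at the delimiter and selects one part by its 1-based position. The `wrap_text` function is a hypothetical name used for illustration. Raising an error for an out-of-range position is an assumption, since the behavior of the system in that case is not described here.

```python
def wrap_text(source: str, delimiter: str, position: int) -> str:
    """Split source at the delimiter and return the part at the 1-based position."""
    parts = source.split(delimiter)
    if not 1 <= position <= len(parts):
        # Assumption: an out-of-range position is treated as an error.
        raise IndexError(f"position must be between 1 and {len(parts)}")
    return parts[position - 1]


source_value = "Name;date;marital status"
print(source_value.split(";"))            # ['Name', 'date', 'marital status']
print(wrap_text(source_value, ";", 2))    # date
```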

 

Save the record after performing the necessary configurations. Use the other side toolbar buttons to edit and delete the data selected in the list of records.