PDF Assistant
Table and form tags for guidance
You can list a component in the marketplace and specify whether it can be used as a template.
❌ Not available in Marketplace
❌ Cannot be used as a template
Meta-Labels
This taxonomy consists of taxons that represent meta-labels. Meta-labels are applied to elements of a document to inform, or guide, the model or processing.
🏷️ Table Markers
🏷️ Page Start
Options
This taxon has the following options:
Option Name | Default | Required? | Type | Description |
---|---|---|---|---|
regular_expression | None | False | string | Regular expression used to identify the lines that will be given this label. If this string is not given, the system will identify these lines by generalizing the line text. |
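As a rough illustration of how a `regular_expression` option can select the lines that receive a label, here is a minimal Python sketch. The function name `label_lines`, the sample lines, and the use of `re.search` (a match anywhere in the line) are assumptions made for the example; the actual matching semantics of the system are not described here.

```python
import re

def label_lines(lines, regular_expression, label):
    """Pair each line with `label` if the pattern is found in it, else None.

    Assumption: the pattern may match anywhere in the line (re.search).
    """
    pattern = re.compile(regular_expression)
    return [(line, label if pattern.search(line) else None) for line in lines]

# Hypothetical page content, used only for illustration.
lines = [
    "Statement of Account    Page 1 of 3",
    "Date   Description   Amount",
    "01/02  Coffee   3.50",
]
labelled = label_lines(lines, r"^Statement of Account", "Page Start")
```

Anchoring the pattern with `^` keeps it from matching the same words in the middle of a data row.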
🏷️ Page End
Options
This taxon has the following options:
Option Name | Default | Required? | Type | Description |
---|---|---|---|---|
regular_expression | None | False | string | Regular expression used to identify the lines that will be given this label. If this string is not given, the system will identify these lines by generalizing the line text. |
include_page_end_line | False | False | boolean | Set to True if the Page End line is part of the table. |
🏷️ Table Start
Options
This taxon has the following options:
Option Name | Default | Required? | Type | Description |
---|---|---|---|---|
regular_expression | None | False | string | Regular expression used to identify the lines that will be given this label. If this string is not given, the system will identify these lines by generalizing the line text. |
col_space_multiplier | 3.0 | False | number | The multiplier applied to the mean character width on the page to get the minimum gap that separates two columns. The default of 3.0 means columns must be at least 3 spaces apart. Set to a lower number if multiple columns are being merged into one. Set to a higher number if a single column is being split into several. |
use_graphical_nodes | False | False | boolean | Set to True to use the graphical lines in the document to determine the columns of the table. |
advanced_table_analysis | False | False | boolean | Set to True to use advanced table analysis. |
ata_flavour | lattice | False | string | The algorithm used to determine the columns of the table for advanced table analysis. |
ata_line_scale | 15 | False | number | The line scale used to determine the columns of the table for advanced table analysis. |
transpose | False | False | boolean | Set to True if the table needs to be transposed before extracting data. |
header_label | None | False | taxon_label_with_properties | The label used to identify the header when the table is transposed. |
header_lines_count | 1 | False | number | The number of header lines for a transposed table. |
column_header_labels | None | False | list | The labels used for columns with empty headers. |
row_white_list_regex | None | False | string | All the lines that do NOT match this regular expression will be ignored in this table. |
row_black_list_regex | None | False | string | All the lines that match this regular expression will be ignored in this table. |
extract_header | False | False | boolean | Set to True if the header needs to be extracted. |
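The effect of `col_space_multiplier`, `row_white_list_regex`, and `row_black_list_regex` can be sketched in plain Python. This is not the system's implementation: it assumes monospaced text (so a gap of N spaces equals N mean character widths), and the helper names `split_columns` and `filter_rows` and the sample rows are hypothetical.

```python
import math
import re

def split_columns(line, col_space_multiplier=3.0):
    """Split a text line into cells wherever the gap is wide enough.

    Assumption: monospaced text, so the minimum gap in spaces is the
    multiplier rounded up.
    """
    min_gap = max(1, math.ceil(col_space_multiplier))
    return re.split(r" {%d,}" % min_gap, line.strip())

def filter_rows(lines, row_white_list_regex=None, row_black_list_regex=None):
    """Keep lines that match the white list (if given) and do not match the black list."""
    kept = []
    for line in lines:
        if row_white_list_regex and not re.search(row_white_list_regex, line):
            continue  # does not match the white list: ignored
        if row_black_list_regex and re.search(row_black_list_regex, line):
            continue  # matches the black list: ignored
        kept.append(line)
    return kept

# Hypothetical table rows, used only for illustration.
rows = [
    "01/02   Coffee shop    3.50",
    "        continued overleaf",
    "02/02   Book store    12.00",
    "Subtotal              15.50",
]
kept = filter_rows(
    rows,
    row_white_list_regex=r"\d+\.\d{2}$",   # keep only rows ending in an amount
    row_black_list_regex=r"^Subtotal",     # drop subtotal rows
)
table = [split_columns(row) for row in kept]
narrow = split_columns(rows[0], col_space_multiplier=1.0)
```

With the default multiplier, "Coffee shop" stays in one cell because its words are one space apart. With a multiplier of 1.0 the same row splits into four cells, which is the over-splitting that a higher value corrects.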
🏷️ Table End
Options
This taxon has the following options:
Option Name | Default | Required? | Type | Description |
---|---|---|---|---|
regular_expression | None | False | string | Regular expression used to identify the lines that will be given this label. If this string is not given, the system will identify these lines by generalizing the line text. |
include_table_end_line | False | False | boolean | Set to True if the table end line is part of the table. |
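To show what `include_table_end_line` changes, the sketch below slices a table region out of a list of lines. The function `table_region`, the sample lines, the choice to include the Table Start line, and the fallback of running to the last line when no end marker is found are all assumptions for illustration.

```python
import re

def table_region(lines, start_regex, end_regex, include_table_end_line=False):
    """Return the lines from the first start match up to the next end match."""
    start = next((i for i, l in enumerate(lines) if re.search(start_regex, l)), None)
    if start is None:
        return []
    end = next(
        (i for i in range(start + 1, len(lines)) if re.search(end_regex, lines[i])),
        None,
    )
    if end is None:
        return lines[start:]  # assumption: no end marker means "to the last line"
    stop = end + 1 if include_table_end_line else end
    return lines[start:stop]

# Hypothetical document lines, used only for illustration.
lines = [
    "Invoice 42",
    "Item   Qty   Price",
    "Pen    2     1.00",
    "Total        2.00",
    "Thank you",
]
without_end = table_region(lines, r"^Item", r"^Total")
with_end = table_region(lines, r"^Item", r"^Total", include_table_end_line=True)
```

Here the "Total" line appears in the result only when `include_table_end_line` is True, which is the case to choose when the closing line carries data that belongs to the table.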
🏷️ Column Marker
Options
This taxon has the following options:
Option Name | Default | Required? | Type | Description |
---|---|---|---|---|
regular_expression | None | False | string | Regular expression used to identify the lines that will be given this label. If this string is not given, the system will identify these lines by generalizing the line text. |
insert_col_before | None | False | boolean | Set to True to insert a column before the first column in the Column Marker line. |
insert_col_after | None | False | boolean | Set to True to insert a column after the last column in the Column Marker line. |
insert_col_idx | None | False | number | The index where to insert a column in the Column Marker line. |
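The three insert options can be pictured as operations on the list of columns found in the Column Marker line. The sketch below is only a mental model: the function `apply_column_inserts`, the empty-string placeholder for a new column, the zero-based index, and the order in which the options are applied are all assumptions.

```python
def apply_column_inserts(columns, insert_col_before=None,
                         insert_col_after=None, insert_col_idx=None):
    """Return a copy of `columns` with empty placeholder columns inserted."""
    result = list(columns)
    if insert_col_idx is not None:
        # Assumption: zero-based index into the original column list.
        result.insert(int(insert_col_idx), "")
    if insert_col_before:
        result.insert(0, "")   # new column before the first column
    if insert_col_after:
        result.append("")      # new column after the last column
    return result

# Hypothetical columns detected in a Column Marker line.
marker_columns = ["Date", "Amount"]
middle = apply_column_inserts(marker_columns, insert_col_idx=1)
both_ends = apply_column_inserts(
    marker_columns, insert_col_before=True, insert_col_after=True
)
```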
🏷️ Page Header
Options
This taxon has the following options:
Option Name | Default | Required? | Type | Description |
---|---|---|---|---|
regular_expression | None | False | string | Regular expression used to identify the lines that will be given this label. If this string is not given, the system will identify these lines by generalizing the line text. |
🏷️ Page Footer
Options
This taxon has the following options:
Option Name | Default | Required? | Type | Description |
---|---|---|---|---|
regular_expression | None | False | string | Regular expression used to identify the lines that will be given this label. If this string is not given, the system will identify these lines by generalizing the line text. |
🏷️ Page Number
Options
This taxon has the following options:
Option Name | Default | Required? | Type | Description |
---|---|---|---|---|
regular_expression | None | False | string | Regular expression used to identify the lines that will be given this label. If this string is not given, the system will identify these lines by generalizing the line text. |
🏷️ Ignore
Options
This taxon has the following options:
Option Name | Default | Required? | Type | Description |
---|---|---|---|---|
regular_expression | None | False | string | Regular expression used to identify the lines that will be given this label. If this string is not given, the system will identify these lines by generalizing the line text. |
🏷️ Form Markers
🏷️ Form Start
Options
This taxon has the following options:
Option Name | Default | Required? | Type | Description |
---|---|---|---|---|
regular_expression | None | True | string | Regular expression used to identify the first line of the form. |
🏷️ Form End
Options
This taxon has the following options:
Option Name | Default | Required? | Type | Description |
---|---|---|---|---|
regular_expression | None | False | string | Regular expression used to identify the last line of the form. |
include_form_end_line | False | False | boolean | Set to True if the Form End line is part of the form. |
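The Form Start and Form End options can be read together as a pair of markers that delimit a form. The following sketch shows one way to picture that, assuming a document may contain several forms and that a form with no end marker runs to the last line. The function `extract_forms` and the sample lines are hypothetical. The only behaviour taken from the tables above is that `regular_expression` is required for Form Start and optional for Form End.

```python
import re

def extract_forms(lines, form_start_regex, form_end_regex=None,
                  include_form_end_line=False):
    """Group lines into forms delimited by start and end patterns."""
    if not form_start_regex:
        # Form Start marks regular_expression as required.
        raise ValueError("Form Start requires regular_expression")
    forms, current = [], None
    for line in lines:
        if current is None:
            if re.search(form_start_regex, line):
                current = [line]          # first line of a new form
        elif form_end_regex and re.search(form_end_regex, line):
            if include_form_end_line:
                current.append(line)      # end line belongs to the form
            forms.append(current)
            current = None
        else:
            current.append(line)
    if current is not None:
        forms.append(current)             # assumption: unterminated form runs to the end
    return forms

# Hypothetical document lines, used only for illustration.
lines = ["Header", "Claim Form", "Name: Ada", "Signature: ____", "Footer"]
with_end = extract_forms(lines, r"^Claim Form", r"^Signature", include_form_end_line=True)
without_end = extract_forms(lines, r"^Claim Form", r"^Signature")
```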