
PDF Assistant

table-taxonomy

Table and form tags for guidance

You can list a component in the marketplace and choose whether it can be used as a template.

❌ Not available in Marketplace

❌ Cannot be used as a template

Meta-Labels

This taxonomy consists of taxons that represent meta-labels. These labels are applied to elements of a document to inform, or guide, the model or processing.

🏷️ Table Markers

🏷️ Page Start

Options

This taxon has the following options:

| Option Name | Default | Required? | Type | Description |
| --- | --- | --- | --- | --- |
| regular_expression | None | False | string | Regular expression used to identify the lines that will be given this label. If this string is not given, the system will identify these lines by generalizing the line text. |
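
As an illustration of how a `regular_expression` option might select lines, here is a minimal sketch. It assumes the expression is searched against each line's text; the pattern, the sample lines, and the `label_lines` helper are hypothetical and not part of the product.

```python
import re

# Hypothetical value for the Page Start `regular_expression` option.
PAGE_START_PATTERN = r"^Statement of Account\b"


def label_lines(lines, pattern, label):
    """Pair every line with `label` if the pattern matches it, else with None.

    Assumption: the pattern is applied with search semantics to the raw line text.
    """
    compiled = re.compile(pattern)
    return [(line, label if compiled.search(line) else None) for line in lines]


page = [
    "Statement of Account   Page 1",
    "Date   Description   Amount",
    "01/02  Coffee  3.50",
]
labelled = label_lines(page, PAGE_START_PATTERN, "Page Start")
```

Only the first sample line receives the Page Start label; the other two are left unlabelled.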

🏷️ Page End

Options

This taxon has the following options:

| Option Name | Default | Required? | Type | Description |
| --- | --- | --- | --- | --- |
| regular_expression | None | False | string | Regular expression used to identify the lines that will be given this label. If this string is not given, the system will identify these lines by generalizing the line text. |
| include_page_end_line | False | False | boolean | Set to True if the Page End line is part of the table. |

🏷️ Table Start

Options

This taxon has the following options:

| Option Name | Default | Required? | Type | Description |
| --- | --- | --- | --- | --- |
| regular_expression | None | False | string | Regular expression used to identify the lines that will be given this label. If this string is not given, the system will identify these lines by generalizing the line text. |
| col_space_multiplier | 3.0 | False | number | The multiplier used in the column spacing calculation, relative to the mean width of the characters on the page. The default is 3.0, i.e. columns are at least 3 character widths apart. Set to a lower number if multiple columns are being merged into one. Set to a higher number if extra columns are being introduced. |
| use_graphical_nodes | False | False | boolean | Set to True if the graphical lines will be used to determine the columns of the table. |
| advanced_table_analysis | False | False | boolean | Set to True to use advanced table analysis. |
| ata_flavour | lattice | False | string | The algorithm used to determine the columns of the table for advanced table analysis. |
| ata_line_scale | 15 | False | number | The line scale used to determine the columns of the table for advanced table analysis. |
| transpose | False | False | boolean | Set to True if the table needs to be transposed before extracting data. |
| header_label | None | False | taxon_label_with_properties | The label used to identify the header when the table is transposed. |
| header_lines_count | 1 | False | number | The number of header lines for a transposed table. |
| column_header_labels | None | False | list | The labels used for columns with empty headers. |
| row_white_list_regex | None | False | string | All the lines that do NOT match this regular expression will be ignored in this table. |
| row_black_list_regex | None | False | string | All the lines that match this regular expression will be ignored in this table. |
| extract_header | False | False | boolean | Set to True if the header needs to be extracted. |
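
The effect of `col_space_multiplier`, `row_white_list_regex`, and `row_black_list_regex` can be sketched roughly as follows. This is not the product's implementation: the word tuples `(text, x_start, x_end)`, the helper names, and the strict "gap greater than threshold" comparison are assumptions made for illustration.

```python
import re


def keep_row(text, white_list=None, black_list=None):
    """Drop a row that fails the white list or matches the black list."""
    if white_list is not None and not re.search(white_list, text):
        return False
    if black_list is not None and re.search(black_list, text):
        return False
    return True


def split_columns(words, mean_char_width, col_space_multiplier=3.0):
    """Group (text, x_start, x_end) words into cells.

    A horizontal gap wider than col_space_multiplier * mean_char_width
    is treated as a column boundary (assumed rule).
    """
    threshold = col_space_multiplier * mean_char_width
    cells, current, prev_end = [], [], None
    for text, x_start, x_end in sorted(words, key=lambda w: w[1]):
        if prev_end is not None and x_start - prev_end > threshold:
            cells.append(" ".join(current))
            current = []
        current.append(text)
        prev_end = x_end
    if current:
        cells.append(" ".join(current))
    return cells


# Hypothetical line: four words with their horizontal extents in page units.
words = [("Blue", 0, 20), ("widget", 25, 55), ("12", 100, 110), ("4.50", 150, 170)]
default_cells = split_columns(words, mean_char_width=5)
tight_cells = split_columns(words, mean_char_width=5, col_space_multiplier=0.5)
```

With the default multiplier the small gap between "Blue" and "widget" stays inside one cell, while a lower multiplier splits them into separate columns, which matches the tuning advice in the table.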

🏷️ Table End

Options

This taxon has the following options:

| Option Name | Default | Required? | Type | Description |
| --- | --- | --- | --- | --- |
| regular_expression | None | False | string | Regular expression used to identify the lines that will be given this label. If this string is not given, the system will identify these lines by generalizing the line text. |
| include_table_end_line | False | False | boolean | Set to True if the table end line is part of the table. |
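
To show what `include_table_end_line` changes, here is a small sketch that slices a list of lines between a Table Start match and a Table End match. The `table_lines` helper, the sample lines, and the choice to keep the start line in the result are assumptions made for illustration only.

```python
import re


def table_lines(lines, start_regex, end_regex, include_table_end_line=False):
    """Return the lines from the first start match up to the next end match.

    Assumption: the Table Start line itself belongs to the table.
    """
    start = next((i for i, line in enumerate(lines) if re.search(start_regex, line)), None)
    if start is None:
        return []
    end = next(
        (i for i in range(start + 1, len(lines)) if re.search(end_regex, lines[i])),
        None,
    )
    if end is None:
        # No end marker found: keep everything after the start line.
        return lines[start:]
    stop = end + 1 if include_table_end_line else end
    return lines[start:stop]


doc = ["Invoice 42", "Item Qty Price", "Widget 2 9.00", "Total 18.00", "Thank you"]
without_end = table_lines(doc, r"^Item", r"^Total")
with_end = table_lines(doc, r"^Item", r"^Total", include_table_end_line=True)
```

Here the "Total" line is excluded by default and only becomes the last table row when the flag is set.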

🏷️ Column Marker

Options

This taxon has the following options:

| Option Name | Default | Required? | Type | Description |
| --- | --- | --- | --- | --- |
| regular_expression | None | False | string | Regular expression used to identify the lines that will be given this label. If this string is not given, the system will identify these lines by generalizing the line text. |
| insert_col_before | None | False | boolean | Set to True to insert a column before the first column in the Column Marker line. |
| insert_col_after | None | False | boolean | Set to True to insert a column after the last column in the Column Marker line. |
| insert_col_idx | None | False | number | The index where to insert a column in the Column Marker line. |
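
One way to picture the three insert options is as operations on the list of cells found in the Column Marker line. The sketch below assumes that an inserted column is an empty cell, that `insert_col_idx` is zero-based, and that the index is applied before the other two options; none of this is confirmed by the reference above, and `apply_column_inserts` is a hypothetical helper.

```python
def apply_column_inserts(cells, insert_col_before=None, insert_col_after=None,
                         insert_col_idx=None):
    """Return a copy of `cells` with empty columns inserted as requested."""
    row = list(cells)
    if insert_col_idx is not None:
        # Assumed: zero-based position relative to the original cells.
        row.insert(insert_col_idx, "")
    if insert_col_before:
        row.insert(0, "")
    if insert_col_after:
        row.append("")
    return row


marker = ["Date", "Amount"]
before = apply_column_inserts(marker, insert_col_before=True)
middle = apply_column_inserts(marker, insert_col_idx=1)
after = apply_column_inserts(marker, insert_col_after=True)
```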

Options

This taxon has the following options:

| Option Name | Default | Required? | Type | Description |
| --- | --- | --- | --- | --- |
| regular_expression | None | False | string | Regular expression used to identify the lines that will be given this label. If this string is not given, the system will identify these lines by generalizing the line text. |

Options

This taxon has the following options:

| Option Name | Default | Required? | Type | Description |
| --- | --- | --- | --- | --- |
| regular_expression | None | False | string | Regular expression used to identify the lines that will be given this label. If this string is not given, the system will identify these lines by generalizing the line text. |

🏷️ Page Number

Options

This taxon has the following options:

| Option Name | Default | Required? | Type | Description |
| --- | --- | --- | --- | --- |
| regular_expression | None | False | string | Regular expression used to identify the lines that will be given this label. If this string is not given, the system will identify these lines by generalizing the line text. |

🏷️ Ignore

Options

This taxon has the following options:

| Option Name | Default | Required? | Type | Description |
| --- | --- | --- | --- | --- |
| regular_expression | None | False | string | Regular expression used to identify the lines that will be given this label. If this string is not given, the system will identify these lines by generalizing the line text. |

🏷️ Form Markers

🏷️ Form Start

Options

This taxon has the following options:

| Option Name | Default | Required? | Type | Description |
| --- | --- | --- | --- | --- |
| regular_expression | None | True | string | Regular expression used to identify the first line of the form. |

🏷️ Form End

Options

This taxon has the following options:

| Option Name | Default | Required? | Type | Description |
| --- | --- | --- | --- | --- |
| regular_expression | None | False | string | Regular expression used to identify the last line of the form. |
| include_form_end_line | False | False | boolean | Set to True if the Form End line is part of the form. |
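
The Default and Required? columns in these option tables can be read as a simple resolution rule: a supplied value wins, a missing optional value falls back to its default, and a missing required value is an error. The sketch below encodes the Form Start and Form End options that way. `OptionSpec` and `resolve_options` are hypothetical names, and the product may validate options differently.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class OptionSpec:
    name: str
    default: Any
    required: bool
    type_: type


# Specs transcribed from the Form Start and Form End option tables.
FORM_START = [OptionSpec("regular_expression", None, True, str)]
FORM_END = [
    OptionSpec("regular_expression", None, False, str),
    OptionSpec("include_form_end_line", False, False, bool),
]


def resolve_options(specs, supplied):
    """Merge supplied values with defaults and enforce required options."""
    resolved = {}
    for spec in specs:
        if spec.name in supplied:
            value = supplied[spec.name]
            if not isinstance(value, spec.type_):
                raise TypeError(f"{spec.name} must be of type {spec.type_.__name__}")
            resolved[spec.name] = value
        elif spec.required:
            raise ValueError(f"{spec.name} is required")
        else:
            resolved[spec.name] = spec.default
    return resolved


form_end_options = resolve_options(FORM_END, {"regular_expression": r"^Signature"})
```

Calling `resolve_options(FORM_START, {})` raises `ValueError`, because Form Start is the one taxon here whose `regular_expression` is marked as required.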