Data Model Utilities
This page describes the utility functions included with Fonduer, which can be used to label candidates based on textual, structural, tabular, and visual information. The utilities are grouped by the modality of information they leverage.
General Data Model Utilities
Fonduer data model utils.
-
fonduer.utils.data_model_utils.utils.get_matches(lf, candidate_set, match_values=[1, -1])
Return a list of candidates that are matched by a particular LF.
A simple helper function to see how many matches (non-zero by default) an LF gets.
- Parameters
lf (Callable) – The labeling function to apply to the candidate_set
candidate_set (Set[Candidate]) – The set of candidates to evaluate
match_values (List[int]) – An optional list of the values to consider as matched. [1, -1] by default.
- Return type
List[Candidate]
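The behaviour described for get_matches can be sketched without Fonduer installed. This is a minimal stand-in, not the library's implementation: `get_matches_sketch`, `lf_has_digit`, and the string candidates are hypothetical names used only for illustration.

```python
from typing import Callable, Hashable, List, Sequence, Set


def get_matches_sketch(
    lf: Callable[[Hashable], int],
    candidate_set: Set[Hashable],
    match_values: Sequence[int] = (1, -1),
) -> List[Hashable]:
    """Collect the candidates whose label from ``lf`` is in ``match_values``."""
    return [c for c in candidate_set if lf(c) in match_values]


# Hypothetical labeling function: vote 1 for strings that contain a digit,
# and 0 (abstain) otherwise. Real LFs receive Fonduer Candidate objects.
def lf_has_digit(candidate: str) -> int:
    return 1 if any(ch.isdigit() for ch in candidate) else 0


candidates = {"BC547", "transistor", "2N3904"}
matched = sorted(get_matches_sketch(lf_has_digit, candidates))
print(matched)
```

Passing `match_values=[1]` or `match_values=[-1]` would restrict the result to candidates that received only that label.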
-
fonduer.utils.data_model_utils.utils.is_superset(a, b)
Check if a is a superset of b.
This is typically used to check if ALL of a list of sentences is in the ngrams returned by an lf_helper.
- Parameters
a (Iterable) – A collection of items
b (Iterable) – A collection of items
- Return type
bool
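The superset check can be pictured with plain Python sets. The sketch below is an assumption about the semantics (every item of b must appear in a), and `is_superset_sketch` and the sample ngrams are hypothetical.

```python
from typing import Hashable, Iterable


def is_superset_sketch(a: Iterable[Hashable], b: Iterable[Hashable]) -> bool:
    """Return True when every item of ``b`` also appears in ``a``."""
    return set(a).issuperset(b)


# Pretend these ngrams were yielded by one of the ngram helpers on this page.
row_ngrams = ["maximum", "collector", "current", "ma"]

all_present = is_superset_sketch(row_ngrams, ["collector", "current"])
some_missing = is_superset_sketch(row_ngrams, ["collector", "voltage"])
print(all_present, some_missing)
```

Because the ngram helpers return iterators, converting to a set also means the iterator is consumed exactly once.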
Textual Data Model Utilities
Fonduer textual modality utilities.
-
fonduer.utils.data_model_utils.textual.get_between_ngrams(c, attrib='words', n_min=1, n_max=1, lower=True)
Return the ngrams between two unary Mentions of a binary-Mention Candidate.
Get the ngrams between two unary Mentions of a binary-Mention Candidate, where both share the same sentence Context.
- Parameters
c (Candidate) – The binary-Mention Candidate to evaluate.
attrib (str) – The token attribute type (e.g. words, lemmas, poses)
n_min (int) – The minimum n of the ngrams that should be returned
n_max (int) – The maximum n of the ngrams that should be returned
lower (bool) – If True, all ngrams will be returned in lower case
- Return type
Iterator[str]
-
fonduer.utils.data_model_utils.textual.get_left_ngrams(mention, window=3, attrib='words', n_min=1, n_max=1, lower=True)
Get the ngrams within a window to the left from the sentence Context.
For higher-arity Candidates, defaults to the first argument.
- Parameters
mention (Union[Candidate, Mention, TemporarySpanMention]) – The Mention to evaluate. If a candidate is given, default to its first Mention.
window (int) – The number of tokens to the left of the first argument to return.
attrib (str) – The token attribute type (e.g. words, lemmas, poses)
n_min (int) – The minimum n of the ngrams that should be returned
n_max (int) – The maximum n of the ngrams that should be returned
lower (bool) – If True, all ngrams will be returned in lower case
- Return type
Iterator[str]
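How window, n_min, n_max, and lower interact can be illustrated on a plain token list. This is a rough sketch under stated assumptions: `ngrams_sketch` and `left_ngrams_sketch` are hypothetical helpers, the mention is identified by a token index rather than a Mention object, and the order in which Fonduer yields ngrams may differ.

```python
from typing import Iterator, List


def ngrams_sketch(
    tokens: List[str], n_min: int = 1, n_max: int = 1, lower: bool = True
) -> Iterator[str]:
    """Yield every ngram of length n_min..n_max from ``tokens``."""
    for n in range(n_min, n_max + 1):
        for start in range(len(tokens) - n + 1):
            ngram = " ".join(tokens[start:start + n])
            yield ngram.lower() if lower else ngram


def left_ngrams_sketch(
    tokens: List[str],
    mention_start: int,
    window: int = 3,
    n_min: int = 1,
    n_max: int = 1,
    lower: bool = True,
) -> Iterator[str]:
    """Yield ngrams from the ``window`` tokens left of ``mention_start``."""
    left = tokens[max(0, mention_start - window):mention_start]
    yield from ngrams_sketch(left, n_min, n_max, lower)


sentence = ["The", "Maximum", "Collector", "Current", "is", "200", "mA"]
mention_index = 5  # the token "200"

unigrams = list(left_ngrams_sketch(sentence, mention_index))
up_to_bigrams = list(left_ngrams_sketch(sentence, mention_index, window=2, n_max=2))
print(unigrams)
print(up_to_bigrams)
```

A right-hand window would be the mirror image, slicing the tokens that follow the end of the mention instead.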
-
fonduer.utils.data_model_utils.textual.get_neighbor_sentence_ngrams(mention, d=1, attrib='words', n_min=1, n_max=1, lower=True)
Get the ngrams that are in the neighboring Sentences of the given Mention.
Note that if a candidate is passed in, all of its Mentions will be searched.
- Parameters
mention (Union[Candidate, Mention, TemporarySpanMention]) – The Mention whose neighbor Sentences are being searched
attrib (str) – The token attribute type (e.g. words, lemmas, poses)
n_min (int) – The minimum n of the ngrams that should be returned
n_max (int) – The maximum n of the ngrams that should be returned
lower (bool) – If True, all ngrams will be returned in lower case
- Return type
Iterator[str]
-
fonduer.utils.data_model_utils.textual.get_right_ngrams(mention, window=3, attrib='words', n_min=1, n_max=1, lower=True)
Get the ngrams within a window to the right from the sentence Context.
For higher-arity Candidates, defaults to the last argument.
- Parameters
mention (Union[Candidate, Mention, TemporarySpanMention]) – The Mention to evaluate. If a candidate is given, default to its last Mention.
window (int) – The number of tokens to the right of the last argument to return
attrib (str) – The token attribute type (e.g. words, lemmas, poses)
n_min (int) – The minimum n of the ngrams that should be returned
n_max (int) – The maximum n of the ngrams that should be returned
lower (bool) – If True, all ngrams will be returned in lower case
- Return type
Iterator[str]
-
fonduer.utils.data_model_utils.textual.get_sentence_ngrams(mention, attrib='words', n_min=1, n_max=1, lower=True)
Get the ngrams that are in the Sentence of the given Mention, not including itself.
Note that if a candidate is passed in, all of its Mentions will be searched.
- Parameters
mention (Union[Candidate, Mention, TemporarySpanMention]) – The Mention whose Sentence is being searched
attrib (str) – The token attribute type (e.g. words, lemmas, poses)
n_min (int) – The minimum n of the ngrams that should be returned
n_max (int) – The maximum n of the ngrams that should be returned
lower (bool) – If True, all ngrams will be returned in lower case
- Return type
Iterator[str]
Structural Data Model Utilities
Fonduer structural modality utilities.
-
fonduer.utils.data_model_utils.structural.common_ancestor(c)
Return the path to the root that is shared among the Mentions of a multinary-Mention Candidate.
In particular, this is the common path of HTML tags.
- Parameters
c (Tuple[SpanMention, …]) – The multinary-Mention Candidate to evaluate
- Return type
List[str]
-
fonduer.utils.data_model_utils.structural.get_ancestor_class_names(mention)
Return the HTML classes of the Mention’s ancestors.
If a candidate is passed in, only the ancestors of its first Mention are returned.
- Parameters
mention (Union[Candidate, Mention, TemporarySpanMention]) – The Mention to evaluate
- Return type
List[str]
-
fonduer.utils.data_model_utils.structural.get_ancestor_id_names(mention)
Return the HTML ids of the Mention’s ancestors.
If a candidate is passed in, only the ancestors of its first Mention are returned.
- Parameters
mention (Union[Candidate, Mention, TemporarySpanMention]) – The Mention to evaluate
- Return type
List[str]
-
fonduer.utils.data_model_utils.structural.get_ancestor_tag_names(mention)
Return the HTML tags of the Mention’s ancestors.
For example, ['html', 'body', 'p']. If a candidate is passed in, only the ancestors of its first Mention are returned.
- Parameters
mention (Union[Candidate, Mention, TemporarySpanMention]) – The Mention to evaluate
- Return type
List[str]
-
fonduer.utils.data_model_utils.structural.get_attributes(mention)
Return the HTML attributes of the Mention.
If a candidate is passed in, only the attributes of its first Mention are returned.
A sample output of this function on a Mention in a paragraph tag is [u'style=padding-top: 8pt;padding-left: 20pt;text-indent: 0pt;text-align: left;']
- Parameters
mention (Union[Candidate, Mention, TemporarySpanMention]) – The Mention to evaluate
- Return type
List[str]
- Returns
list of strings representing HTML attributes
-
fonduer.utils.data_model_utils.structural.get_next_sibling_tags(mention)
Return the HTML tags of the Mention’s next siblings.
Next siblings are Mentions which are at the same level in the HTML tree as the given mention, but are declared after the given mention. If a candidate is passed in, only the next siblings of its last Mention are considered in the calculation.
- Parameters
mention (Union[Candidate, Mention, TemporarySpanMention]) – The Mention to evaluate
- Return type
List[str]
-
fonduer.utils.data_model_utils.structural.get_parent_tag(mention)
Return the HTML tag of the Mention’s parent.
These may be tags such as 'p', 'h2', 'table', 'div', etc. If a candidate is passed in, only the tag of its first Mention is returned.
- Parameters
mention (Union[Candidate, Mention, TemporarySpanMention]) – The Mention to evaluate
- Return type
Optional[str]
-
fonduer.utils.data_model_utils.structural.get_prev_sibling_tags(mention)
Return the HTML tags of the Mention’s previous siblings.
Previous siblings are Mentions which are at the same level in the HTML tree as the given mention, but are declared before the given mention. If a candidate is passed in, only the previous siblings of its first Mention are considered in the calculation.
- Parameters
mention (Union[Candidate, Mention, TemporarySpanMention]) – The Mention to evaluate
- Return type
List[str]
-
fonduer.utils.data_model_utils.structural.get_tag(mention)
Return the HTML tag of the Mention.
If a candidate is passed in, only the tag of its first Mention is returned.
These may be tags such as 'p', 'h2', 'table', 'div', etc.
- Parameters
mention (Union[Candidate, Mention, TemporarySpanMention]) – The Mention to evaluate
- Return type
str
-
fonduer.utils.data_model_utils.structural.lowest_common_ancestor_depth(c)
Return the lowest common ancestor depth.
In particular, return the minimum distance from the Mentions of a multinary-Mention Candidate to their lowest common ancestor.
For example, if the tree looked like this:
html
├──<div> Mention 1 </div>
├──table
│    ├──tr
│    │  └──<th> Mention 2 </th>
we return 1, the distance from Mention 1 to the html root. Smaller values indicate that two Mentions are close structurally, while larger values indicate that two Mentions are spread far apart structurally in the document.
- Parameters
c (Tuple[SpanMention, …]) – The multinary-Mention Candidate to evaluate
- Return type
int
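The tree example above can be worked through with tag paths written as plain lists. This is a sketch under assumptions, not Fonduer's code: each Mention is represented by a hypothetical root-to-element list of tags, and `common_ancestor_sketch` and `lca_depth_sketch` are illustrative names.

```python
from typing import List, Sequence


def common_ancestor_sketch(paths: Sequence[List[str]]) -> List[str]:
    """Return the leading tags shared by every root-to-element path."""
    shared: List[str] = []
    for tags in zip(*paths):
        if any(tag != tags[0] for tag in tags):
            break
        shared.append(tags[0])
    return shared


def lca_depth_sketch(paths: Sequence[List[str]]) -> int:
    """Distance from the shallowest element to the lowest common ancestor."""
    shared = common_ancestor_sketch(paths)
    return min(len(path) for path in paths) - len(shared)


# Paths for the two Mentions in the tree shown above.
mention_1 = ["html", "div"]
mention_2 = ["html", "table", "tr", "th"]

shared_path = common_ancestor_sketch([mention_1, mention_2])
depth = lca_depth_sketch([mention_1, mention_2])
print(shared_path, depth)
```

Two Mentions inside the same element would share their whole path, giving a depth of 0 in this sketch.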
Tabular Data Model Utilities
Fonduer tabular modality utilities.
-
fonduer.utils.data_model_utils.tabular.get_aligned_ngrams(mention, attrib='words', n_min=1, n_max=1, spread=[0, 0], lower=True)
Get the ngrams from all Cells in the same row or column as the given Mention.
Note that if a candidate is passed in, all of its Mentions will be searched. Also note that if the mention is not tabular, nothing will be yielded.
- Parameters
mention (Union[Candidate, Mention, TemporarySpanMention]) – The Mention whose row and column Cells are being searched
attrib (str) – The token attribute type (e.g. words, lemmas, poses)
n_min (int) – The minimum n of the ngrams that should be returned
n_max (int) – The maximum n of the ngrams that should be returned
spread (List[int]) – The number of rows/cols above/below/left/right to also consider “aligned”.
lower (bool) – If True, all ngrams will be returned in lower case
- Return type
Iterator[str]
-
fonduer.utils.data_model_utils.tabular.get_cell_ngrams(mention, attrib='words', n_min=1, n_max=1, lower=True)
Get the ngrams that are in the Cell of the given mention, not including itself.
Note that if a candidate is passed in, all of its Mentions will be searched. Also note that if the mention is not tabular, nothing will be yielded.
- Parameters
mention (Union[Candidate, Mention, TemporarySpanMention]) – The Mention whose Cell is being searched
attrib (str) – The token attribute type (e.g. words, lemmas, poses)
n_min (int) – The minimum n of the ngrams that should be returned
n_max (int) – The maximum n of the ngrams that should be returned
lower (bool) – If True, all ngrams will be returned in lower case
- Return type
Iterator[str]
-
fonduer.utils.data_model_utils.tabular.get_col_ngrams(mention, attrib='words', n_min=1, n_max=1, spread=[0, 0], lower=True)
Get the ngrams from all Cells that are in the same column as the given Mention.
Note that if a candidate is passed in, all of its Mentions will be searched. Also note that if the mention is not tabular, nothing will be yielded.
- Parameters
mention (Union[Candidate, Mention, TemporarySpanMention]) – The Mention whose column Cells are being searched
attrib (str) – The token attribute type (e.g. words, lemmas, poses)
n_min (int) – The minimum n of the ngrams that should be returned
n_max (int) – The maximum n of the ngrams that should be returned
spread (List[int]) – The number of cols left and right to also consider “aligned”.
lower (bool) – If True, all ngrams will be returned in lower case
- Return type
Iterator[str]
-
fonduer.utils.data_model_utils.tabular.get_head_ngrams(mention, axis=None, attrib='words', n_min=1, n_max=1, lower=True)
Get the ngrams from the cell in the head of the row or column.
More specifically, this returns the ngrams in the leftmost cell in a row and/or the ngrams in the topmost cell in the column, depending on the axis parameter.
Note that if a candidate is passed in, all of its Mentions will be searched. Also note that if the mention is not tabular, nothing will be yielded.
- Parameters
mention (Union[Candidate, Mention, TemporarySpanMention]) – The Mention whose head Cells are being returned
axis (Optional[str]) – Which axis {'row', 'col'} to search. If None, then both row and col are searched.
attrib (str) – The token attribute type (e.g. words, lemmas, poses)
n_min (int) – The minimum n of the ngrams that should be returned
n_max (int) – The maximum n of the ngrams that should be returned
lower (bool) – If True, all ngrams will be returned in lower case
- Return type
Iterator[str]
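The effect of the axis parameter can be pictured on a table held as a list of rows. This is only a sketch with hypothetical names: `head_ngrams_sketch` takes a grid of cell strings and a (row, col) position instead of a Mention, and it yields unigrams only.

```python
from typing import Iterator, List, Optional


def head_ngrams_sketch(
    table: List[List[str]],
    row: int,
    col: int,
    axis: Optional[str] = None,
    lower: bool = True,
) -> Iterator[str]:
    """Yield words from the leftmost cell of the row and/or the topmost cell of the column."""
    if axis not in (None, "row", "col"):
        raise ValueError("axis must be 'row', 'col', or None")
    heads = []
    if axis in (None, "row"):
        heads.append(table[row][0])  # head of the row: leftmost cell
    if axis in (None, "col"):
        heads.append(table[0][col])  # head of the column: topmost cell
    for cell_text in heads:
        for word in cell_text.split():
            yield word.lower() if lower else word


table = [
    ["Parameter", "Symbol", "Max Value"],
    ["Collector current", "IC", "200 mA"],
]

row_heads = list(head_ngrams_sketch(table, 1, 2, axis="row"))
col_heads = list(head_ngrams_sketch(table, 1, 2, axis="col"))
both_heads = list(head_ngrams_sketch(table, 1, 2))
print(row_heads, col_heads, both_heads)
```

In a labeling function, the row head often names the quantity and the column head names the kind of value, so checking both is a common pattern.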
-
fonduer.utils.data_model_utils.tabular.get_max_col_num(mention)
Return the largest column number that a Mention occupies.
- Parameters
mention (Union[Candidate, Mention, TemporarySpanMention]) – The Mention to evaluate. If a candidate is given, default to its last Mention.
- Return type
Optional[int]
-
fonduer.utils.data_model_utils.tabular.get_max_row_num(mention)
Return the largest row number that a Mention occupies.
- Parameters
mention (Union[Candidate, Mention, TemporarySpanMention]) – The Mention to evaluate. If a candidate is given, default to its last Mention.
- Return type
Optional[int]
-
fonduer.utils.data_model_utils.tabular.get_min_col_num(mention)
Return the lowest column number that a Mention occupies.
- Parameters
mention (Union[Candidate, Mention, TemporarySpanMention]) – The Mention to evaluate. If a candidate is given, default to its first Mention.
- Return type
Optional[int]
-
fonduer.utils.data_model_utils.tabular.get_min_row_num(mention)
Return the lowest row number that a Mention occupies.
- Parameters
mention (Union[Candidate, Mention, TemporarySpanMention]) – The Mention to evaluate. If a candidate is given, default to its first Mention.
- Return type
Optional[int]
-
fonduer.utils.data_model_utils.tabular.get_neighbor_cell_ngrams(mention, dist=1, directions=False, attrib='words', n_min=1, n_max=1, lower=True)
Get ngrams from all neighbor Cells.
Get the ngrams from all Cells that are within a given Cell distance in one direction from the given Mention.
Note that if a candidate is passed in, all of its Mentions will be searched. If directions=True, each ngram will be returned with a direction in {'UP', 'DOWN', 'LEFT', 'RIGHT'}. Also note that if the mention is not tabular, nothing will be yielded.
- Parameters
mention (Union[Candidate, Mention, TemporarySpanMention]) – The Mention whose neighbor Cells are being searched
dist (int) – The Cell distance within which a neighbor Cell must be to be considered
directions (bool) – A Boolean expressing whether or not to return the direction of each ngram
attrib (str) – The token attribute type (e.g. words, lemmas, poses)
n_min (int) – The minimum n of the ngrams that should be returned
n_max (int) – The maximum n of the ngrams that should be returned
lower (bool) – If True, all ngrams will be returned in lower case
- Return type
Iterator[Union[str, Tuple[str, str]]]
- Returns
a generator of ngrams (or (ngram, direction) tuples if directions=True)
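The two return shapes, bare ngrams versus (ngram, direction) tuples, can be sketched on a small grid. Everything here is an assumption for illustration: `neighbor_cell_ngrams_sketch` is a hypothetical helper, a neighbor is taken to be a cell in the same row or the same column within dist cells, and directions describe where the neighbor lies relative to the given cell.

```python
from typing import Iterator, List, Tuple, Union


def neighbor_cell_ngrams_sketch(
    table: List[List[str]],
    row: int,
    col: int,
    dist: int = 1,
    directions: bool = False,
) -> Iterator[Union[str, Tuple[str, str]]]:
    """Yield lowercased words from cells within ``dist`` of (row, col) along one axis."""
    for r, cells in enumerate(table):
        for c, text in enumerate(cells):
            if r == row and 0 < abs(c - col) <= dist:
                direction = "LEFT" if c < col else "RIGHT"
            elif c == col and 0 < abs(r - row) <= dist:
                direction = "UP" if r < row else "DOWN"
            else:
                continue  # not a neighbor along a single axis
            for word in text.lower().split():
                yield (word, direction) if directions else word


table = [
    ["Symbol", "Min", "Max"],
    ["IC", "10", "200"],
    ["Unit", "mA", "mA"],
]

plain = list(neighbor_cell_ngrams_sketch(table, 1, 1))
with_directions = list(neighbor_cell_ngrams_sketch(table, 1, 1, directions=True))
print(plain)
print(with_directions)
```

A labeling function can then test, for example, whether a unit such as "ma" appears directly below or to the right of a numeric Mention.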
-
fonduer.utils.data_model_utils.tabular.get_neighbor_sentence_ngrams(mention, d=1, attrib='words', n_min=1, n_max=1, lower=True)
Get the ngrams that are in the neighboring Sentences of the given Mention.
Note that if a candidate is passed in, all of its Mentions will be searched.
- Parameters
mention (Union[Candidate, Mention, TemporarySpanMention]) – The Mention whose neighbor Sentences are being searched
attrib (str) – The token attribute type (e.g. words, lemmas, poses)
n_min (int) – The minimum n of the ngrams that should be returned
n_max (int) – The maximum n of the ngrams that should be returned
lower (bool) – If True, all ngrams will be returned in lower case
Deprecated since version 0.8.3: This will be removed in 0.9.0. Use textual.get_neighbor_sentence_ngrams() instead.
- Return type
Iterator[str]
-
fonduer.utils.data_model_utils.tabular.get_row_ngrams(mention, attrib='words', n_min=1, n_max=1, spread=[0, 0], lower=True)
Get the ngrams from all Cells that are in the same row as the given Mention.
Note that if a candidate is passed in, all of its Mentions will be searched. Also note that if the mention is not tabular, nothing will be yielded.
- Parameters
mention (Union[Candidate, Mention, TemporarySpanMention]) – The Mention whose row Cells are being searched
attrib (str) – The token attribute type (e.g. words, lemmas, poses)
n_min (int) – The minimum n of the ngrams that should be returned
n_max (int) – The maximum n of the ngrams that should be returned
spread (List[int]) – The number of rows above and below to also consider “aligned”.
lower (bool) – If True, all ngrams will be returned in lower case
- Return type
Iterator[str]
-
fonduer.utils.data_model_utils.tabular.get_sentence_ngrams(mention, attrib='words', n_min=1, n_max=1, lower=True)
Get the ngrams that are in the Sentence of the given Mention, not including itself.
Note that if a candidate is passed in, all of its Mentions will be searched.
- Parameters
mention (Union[Candidate, Mention, TemporarySpanMention]) – The Mention whose Sentence is being searched
attrib (str) – The token attribute type (e.g. words, lemmas, poses)
n_min (int) – The minimum n of the ngrams that should be returned
n_max (int) – The maximum n of the ngrams that should be returned
lower (bool) – If True, all ngrams will be returned in lower case
Deprecated since version 0.8.3: This will be removed in 0.9.0. Use textual.get_sentence_ngrams() instead.
- Return type
Iterator[str]
-
fonduer.utils.data_model_utils.tabular.is_tabular_aligned(c)
Return True if all Mentions in the given candidate are from the same Row or Col.
- Parameters
c (Candidate) – The candidate whose Mentions are being compared
- Return type
bool
-
fonduer.utils.data_model_utils.tabular.same_cell(c)
Return True if all Mentions in the given candidate are from the same Cell.
- Parameters
c (Candidate) – The candidate whose Mentions are being compared
- Return type
bool
-
fonduer.utils.data_model_utils.tabular.same_col(c)
Return True if all Mentions in the given candidate are from the same Col.
- Parameters
c (Candidate) – The candidate whose Mentions are being compared
- Return type
bool
-
fonduer.utils.data_model_utils.tabular.same_row(c)
Return True if all Mentions in the given candidate are from the same Row.
- Parameters
c (Candidate) – The candidate whose Mentions are being compared
- Return type
bool
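The row and column predicates above can be pictured with cell positions given as row and column ranges. This sketch is an assumption about the semantics rather than Fonduer's implementation: `CellSpan` is a hypothetical record, and two cells are treated as sharing a row (or column) when their row (or column) ranges overlap, which also covers cells that span several rows or columns.

```python
from typing import NamedTuple, Sequence


class CellSpan(NamedTuple):
    """Hypothetical cell position; ranges are inclusive to allow spanning cells."""
    row_start: int
    row_end: int
    col_start: int
    col_end: int


def _overlap(a_start: int, a_end: int, b_start: int, b_end: int) -> bool:
    return a_start <= b_end and b_start <= a_end


def same_row_sketch(cells: Sequence[CellSpan]) -> bool:
    first = cells[0]
    return all(
        _overlap(first.row_start, first.row_end, c.row_start, c.row_end)
        for c in cells[1:]
    )


def same_col_sketch(cells: Sequence[CellSpan]) -> bool:
    first = cells[0]
    return all(
        _overlap(first.col_start, first.col_end, c.col_start, c.col_end)
        for c in cells[1:]
    )


def is_tabular_aligned_sketch(cells: Sequence[CellSpan]) -> bool:
    return same_row_sketch(cells) or same_col_sketch(cells)


header = CellSpan(0, 0, 2, 2)  # column header in row 0, column 2
label = CellSpan(3, 3, 0, 0)   # row label in row 3, column 0
value = CellSpan(3, 3, 2, 2)   # data cell in row 3, column 2

print(same_col_sketch([header, value]), same_row_sketch([label, value]))
print(is_tabular_aligned_sketch([header, label]))
```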
-
fonduer.utils.data_model_utils.tabular.same_sentence(c)
Return True if all Mentions in the given candidate are from the same Sentence.
- Parameters
c (Candidate) – The candidate whose Mentions are being compared
Deprecated since version 0.8.3: This will be removed in 0.9.0. Use textual.same_sentence() instead.
- Return type
bool
Visual Data Model Utilities
Fonduer visual modality utilities.
-
fonduer.utils.data_model_utils.visual.get_aligned_lemmas(mention)
Return a set of the lemmas aligned visually with the Mention.
Note that if a candidate is passed in, all of its Mentions will be searched.
- Parameters
mention (Union[Candidate, Mention, TemporarySpanMention]) – The Mention to evaluate.
- Return type
Set[str]
-
fonduer.utils.data_model_utils.visual.get_horz_ngrams(mention, attrib='words', n_min=1, n_max=1, lower=True, from_sentence=True)
Return all ngrams which are visually horizontally aligned with the Mention.
Note that if a candidate is passed in, all of its Mentions will be searched.
- Parameters
mention (Union[Candidate, Mention, TemporarySpanMention]) – The Mention to evaluate
attrib (str) – The token attribute type (e.g. words, lemmas, pos_tags). This option is valid only when from_sentence==True.
n_min (int) – The minimum n of the ngrams that should be returned
n_max (int) – The maximum n of the ngrams that should be returned
lower (bool) – If True, all ngrams will be returned in lower case
from_sentence (bool) – If True, return ngrams of any Sentence that is horizontally aligned (in the same page) with the mention’s Sentence. If False, return ngrams that are horizontally aligned with the mention no matter which Sentence they are from.
- Return type
Iterator[str]
- Returns
a generator of ngrams
-
fonduer.utils.data_model_utils.visual.get_page(mention)
Return the page number of the given mention.
If a candidate is passed in, this returns the page of its first Mention.
- Parameters
mention (Union[Candidate, Mention, TemporarySpanMention]) – The Mention to get the page number of.
- Return type
int
-
fonduer.utils.data_model_utils.visual.get_page_horz_percentile(mention, page_width=612, page_height=792)
Return which percentile from the LEFT in the page the Mention is located in.
Percentile is calculated where the left of the page is 0.0, and the right of the page is 1.0.
Page width and height are based on pt values:
Letter      612x792
Tabloid     792x1224
Ledger      1224x792
Legal       612x1008
Statement   396x612
Executive   540x720
A0          2384x3371
A1          1685x2384
A2          1190x1684
A3          842x1190
A4          595x842
A4Small     595x842
A5          420x595
B4          729x1032
B5          516x729
Folio       612x936
Quarto      610x780
10x14       720x1008
and should match the source documents. Letter size is used by default.
Note that if a candidate is passed in, only the horizontal percentile of its first Mention is returned.
- Parameters
mention (Union[Candidate, Mention, TemporarySpanMention]) – The Mention to evaluate
page_width (int) – The width of the page. Default to Letter paper width.
page_height (int) – The height of the page. Default to Letter paper height.
- Return type
float
-
fonduer.utils.data_model_utils.visual.get_page_vert_percentile(mention, page_width=612, page_height=792)
Return which percentile from the TOP in the page the Mention is located in.
Percentile is calculated where the top of the page is 0.0, and the bottom of the page is 1.0. For example, a Mention at the top 1/4 of the page will have a percentile of 0.25.
Page width and height are based on pt values:
Letter      612x792
Tabloid     792x1224
Ledger      1224x792
Legal       612x1008
Statement   396x612
Executive   540x720
A0          2384x3371
A1          1685x2384
A2          1190x1684
A3          842x1190
A4          595x842
A4Small     595x842
A5          420x595
B4          729x1032
B5          516x729
Folio       612x936
Quarto      610x780
10x14       720x1008
and should match the source documents. Letter size is used by default.
Note that if a candidate is passed in, only the vertical percentile of its first Mention is returned.
- Parameters
mention (Union[Candidate, Mention, TemporarySpanMention]) – The Mention to evaluate
page_width (int) – The width of the page. Default to Letter paper width.
page_height (int) – The height of the page. Default to Letter paper height.
- Return type
float
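The percentile arithmetic described for the two page-percentile functions can be sketched as a simple ratio. The function names below are hypothetical, and the sketch assumes the Mention's bounding box is given in points measured from the top-left corner of the page, using its top and left edges; the library may measure the position differently.

```python
import math


def vert_percentile_sketch(bbox_top: float, page_height: float = 792) -> float:
    """Fraction of the page height above the box (0.0 = top, 1.0 = bottom)."""
    return bbox_top / page_height


def horz_percentile_sketch(bbox_left: float, page_width: float = 612) -> float:
    """Fraction of the page width left of the box (0.0 = left, 1.0 = right)."""
    return bbox_left / page_width


# A box whose top edge is 198 pt down a Letter page (792 pt tall).
letter_vert = vert_percentile_sketch(198)

# A box whose left edge is 297.5 pt across an A4 page (595 pt wide).
a4_horz = horz_percentile_sketch(297.5, page_width=595)

print(letter_vert, a4_horz)
```

Passing the wrong page size skews the result: the same 198 pt offset on an A4 page (842 pt tall) is about 0.235 rather than 0.25, which is why the dimensions should match the source documents.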
-
fonduer.utils.data_model_utils.visual.get_vert_ngrams(mention, attrib='words', n_min=1, n_max=1, lower=True, from_sentence=True)
Return all ngrams which are visually vertically aligned with the Mention.
Note that if a candidate is passed in, all of its Mentions will be searched.
- Parameters
mention (Union[Candidate, Mention, TemporarySpanMention]) – The Mention to evaluate
attrib (str) – The token attribute type (e.g. words, lemmas, pos_tags). This option is valid only when from_sentence==True.
n_min (int) – The minimum n of the ngrams that should be returned
n_max (int) – The maximum n of the ngrams that should be returned
lower (bool) – If True, all ngrams will be returned in lower case
from_sentence (bool) – If True, return ngrams of any Sentence that is vertically aligned (in the same page) with the mention’s Sentence. If False, return ngrams that are vertically aligned with the mention no matter which Sentence they are from.
- Return type
Iterator[str]
- Returns
a generator of ngrams
-
fonduer.utils.data_model_utils.visual.get_visual_aligned_lemmas(mention)
Return a generator of the lemmas aligned visually with the Mention.
Note that if a candidate is passed in, all of its Mentions will be searched.
- Parameters
mention (Union[Candidate, Mention, TemporarySpanMention]) – The Mention to evaluate.
- Return type
Iterator[str]
-
fonduer.utils.data_model_utils.visual.get_visual_header_ngrams(c, axis=None)
Not implemented.
-
fonduer.utils.data_model_utils.visual.is_horz_aligned(c)
Return True if all the components of c are horizontally aligned.
Horizontal alignment means that the bounding boxes of the Mentions of c share a similar y-axis value in the visual rendering of the document.
- Parameters
c (Candidate) – The candidate to evaluate
- Return type
bool
-
fonduer.utils.data_model_utils.visual.is_vert_aligned(c)
Return True if all the components of c are vertically aligned.
Vertical alignment means that the bounding boxes of the Mentions of c share a similar x-axis value in the visual rendering of the document.
- Parameters
c (Candidate) – The candidate to evaluate
- Return type
bool
-
fonduer.utils.data_model_utils.visual.is_vert_aligned_center(c)
Return True if all the components are vertically aligned on their center.
Vertical alignment means that the bounding boxes of the Mentions of c share a similar x-axis value in the visual rendering of the document. In this function the similarity of the x-axis value is based on the center of their bounding boxes.
- Parameters
c (Candidate) – The candidate to evaluate
- Return type
bool
-
fonduer.utils.data_model_utils.visual.is_vert_aligned_left(c)
Return True if all components are vertically aligned on their left border.
Vertical alignment means that the bounding boxes of the Mentions of c share a similar x-axis value in the visual rendering of the document. In this function the similarity of the x-axis value is based on the left border of their bounding boxes.
- Parameters
c (Candidate) – The candidate to evaluate
- Return type
bool
-
fonduer.utils.data_model_utils.visual.is_vert_aligned_right(c)
Return True if all components are vertically aligned on their right border.
Vertical alignment means that the bounding boxes of the Mentions of c share a similar x-axis value in the visual rendering of the document. In this function the similarity of the x-axis value is based on the right border of their bounding boxes.
- Parameters
c (Candidate) – The candidate to evaluate
- Return type
bool
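The difference between the left, right, and center variants of vertical alignment, and horizontal alignment, can be sketched with plain bounding boxes. All names here are hypothetical, and both the 2-point tolerance and the use of the top edge for horizontal alignment are assumptions made for illustration; the library's exact rule for "similar" values may differ.

```python
from typing import NamedTuple, Sequence


class Box(NamedTuple):
    """Hypothetical bounding box in points, with the origin at the top-left."""
    left: float
    top: float
    right: float
    bottom: float


def _similar(values: Sequence[float], tol: float) -> bool:
    """True when all values lie within ``tol`` of each other."""
    return max(values) - min(values) <= tol


def vert_aligned_left(boxes: Sequence[Box], tol: float = 2.0) -> bool:
    return _similar([b.left for b in boxes], tol)


def vert_aligned_right(boxes: Sequence[Box], tol: float = 2.0) -> bool:
    return _similar([b.right for b in boxes], tol)


def vert_aligned_center(boxes: Sequence[Box], tol: float = 2.0) -> bool:
    return _similar([(b.left + b.right) / 2 for b in boxes], tol)


def horz_aligned(boxes: Sequence[Box], tol: float = 2.0) -> bool:
    return _similar([b.top for b in boxes], tol)


# Two boxes stacked in a left-justified column: same left edge, different widths.
label = Box(left=72, top=100, right=150, bottom=112)
value = Box(left=72, top=130, right=120, bottom=142)
pair = [label, value]

print(vert_aligned_left(pair), vert_aligned_right(pair))
print(vert_aligned_center(pair), horz_aligned(pair))
```

Left-justified text columns tend to satisfy only the left-border check, while right-justified numeric columns tend to satisfy only the right-border check, which is why the three variants are exposed separately.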
-
fonduer.utils.data_model_utils.visual.same_page(c)
Return True if all the components of c are on the same page of the document.
Page numbers are based on the PDF rendering of the document. If a PDF file is provided, it is used. Otherwise, if only an HTML/XML document is provided, a PDF is created and then used to determine the page number of a Mention.
- Parameters
c (Candidate) – The candidate to evaluate
- Return type
bool