Data Model Utilities

This page describes the utility functions included with Fonduer that can be used to label candidates based on textual, structural, tabular, and visual information. The utilities are grouped by the modality of information they leverage.


General Data Model Utilities

Fonduer data model utils.

fonduer.utils.data_model_utils.utils.get_matches(lf, candidate_set, match_values=[1, -1])[source]

Return a list of candidates that are matched by a particular LF.

A simple helper function to see how many matches (non-zero by default) an LF gets.

Parameters
  • lf (Callable) – The labeling function to apply to the candidate_set

  • candidate_set (Set[Candidate]) – The set of candidates to evaluate

  • match_values (List[int]) – An optional list of the values to consider as matched. [1, -1] by default.

Return type

List[Candidate]
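As a rough illustration of the behavior described above, the sketch below reimplements the filtering logic in plain Python. The names get_matches_sketch and lf_contains_volt, and the use of strings in place of Candidate objects, are assumptions made for illustration; they are not part of Fonduer.

```python
from typing import Callable, Iterable, List, Sequence


def get_matches_sketch(
    lf: Callable[[str], int],
    candidate_set: Iterable[str],
    match_values: Sequence[int] = (1, -1),
) -> List[str]:
    """Collect the candidates whose LF label is one of match_values."""
    return [c for c in candidate_set if lf(c) in match_values]


def lf_contains_volt(candidate: str) -> int:
    """Toy LF (hypothetical): vote 1 if 'volt' appears, otherwise abstain with 0."""
    return 1 if "volt" in candidate.lower() else 0


# Strings stand in for Candidate objects in this sketch.
candidates = ["5 Volts", "10 mA", "3.3 volt rail"]
matched = get_matches_sketch(lf_contains_volt, candidates)
print(len(matched), "matches")
```

Because 0 is not in the default match_values, candidates on which the labeling function abstains are left out of the result.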

fonduer.utils.data_model_utils.utils.is_superset(a, b)[source]

Check if a is a superset of b.

This is typically used to check if ALL of a list of sentences is in the ngrams returned by an lf_helper.

Parameters
  • a (Iterable) – A collection of items

  • b (Iterable) – A collection of items

Return type

bool

fonduer.utils.data_model_utils.utils.overlap(a, b)[source]

Check if a overlaps b.

This is typically used to check if ANY of a list of sentences is in the ngrams returned by an lf_helper.

Parameters
  • a (Iterable) – A collection of items

  • b (Iterable) – A collection of items

Return type

bool
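The ALL versus ANY distinction between is_superset and overlap can be sketched with Python's built-in set operations. This is a minimal illustration of the semantics described above, not necessarily how Fonduer implements them; the function names ending in _sketch and the sample ngrams are hypothetical.

```python
from typing import Hashable, Iterable


def is_superset_sketch(a: Iterable[Hashable], b: Iterable[Hashable]) -> bool:
    """True if every item of b also appears in a."""
    return set(a).issuperset(b)


def overlap_sketch(a: Iterable[Hashable], b: Iterable[Hashable]) -> bool:
    """True if a and b share at least one item."""
    return not set(a).isdisjoint(b)


# Pretend these ngrams came from an lf_helper such as get_sentence_ngrams.
ngrams = ["storage", "temperature", "range"]

all_present = is_superset_sketch(ngrams, ["storage", "temperature"])
any_present = overlap_sketch(ngrams, ["voltage", "range"])
none_present = overlap_sketch(ngrams, ["voltage", "current"])
```

Since the helpers that return ngrams yield iterators, converting the first argument to a set once, as done here, avoids exhausting it during repeated membership tests.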

Textual Data Model Utilities

Fonduer textual modality utilities.

fonduer.utils.data_model_utils.textual.get_between_ngrams(c, attrib='words', n_min=1, n_max=1, lower=True)[source]

Return the ngrams between two unary Mentions of a binary-Mention Candidate.

Get the ngrams between two unary Mentions of a binary-Mention Candidate, where both share the same sentence Context.

Parameters
  • c (Candidate) – The binary-Mention Candidate to evaluate.

  • attrib (str) – The token attribute type (e.g. words, lemmas, poses)

  • n_min (int) – The minimum n of the ngrams that should be returned

  • n_max (int) – The maximum n of the ngrams that should be returned

  • lower (bool) – If True, all ngrams will be returned in lower case

Return type

Iterator[str]

fonduer.utils.data_model_utils.textual.get_left_ngrams(mention, window=3, attrib='words', n_min=1, n_max=1, lower=True)[source]

Get the ngrams within a window to the left from the sentence Context.

For higher-arity Candidates, defaults to the first argument.

Parameters
  • mention (Union[Candidate, Mention, TemporarySpanMention]) – The Mention to evaluate. If a candidate is given, default to its first Mention.

  • window (int) – The number of tokens to the left of the first argument to return.

  • attrib (str) – The token attribute type (e.g. words, lemmas, poses)

  • n_min (int) – The minimum n of the ngrams that should be returned

  • n_max (int) – The maximum n of the ngrams that should be returned

  • lower (bool) – If True, all ngrams will be returned in lower case

Return type

Iterator[str]
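To make the interaction of window, n_min, n_max, and lower concrete, here is a small sketch that works on a plain list of tokens instead of a Mention. The helper names, the token list, and the order in which ngrams are produced are assumptions for illustration; Fonduer's own ordering may differ.

```python
from typing import Iterator, List


def tokens_to_ngrams_sketch(
    tokens: List[str], n_min: int = 1, n_max: int = 1, lower: bool = True
) -> Iterator[str]:
    """Yield every ngram of length n_min..n_max from a list of tokens."""
    for start in range(len(tokens)):
        for n in range(n_min, n_max + 1):
            if start + n <= len(tokens):
                gram = " ".join(tokens[start : start + n])
                yield gram.lower() if lower else gram


def left_ngrams_sketch(
    tokens: List[str],
    mention_start: int,
    window: int = 3,
    n_min: int = 1,
    n_max: int = 1,
    lower: bool = True,
) -> Iterator[str]:
    """Yield ngrams from the `window` tokens immediately left of the mention."""
    left_tokens = tokens[max(0, mention_start - window) : mention_start]
    return tokens_to_ngrams_sketch(left_tokens, n_min, n_max, lower)


words = ["The", "maximum", "Collector", "current", "is", "200", "mA"]
# Treat the token "200" (index 5) as the mention.
left = list(left_ngrams_sketch(words, 5, window=3, n_min=1, n_max=2))
```

With window=3 only "Collector current is" is considered, so "maximum" never appears in the output even though it is in the same sentence.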

fonduer.utils.data_model_utils.textual.get_neighbor_sentence_ngrams(mention, d=1, attrib='words', n_min=1, n_max=1, lower=True)[source]

Get the ngrams that are in the neighboring Sentences of the given Mention.

Note that if a candidate is passed in, all of its Mentions will be searched.

Parameters
  • mention (Union[Candidate, Mention, TemporarySpanMention]) – The Mention whose neighbor Sentences are being searched

  • attrib (str) – The token attribute type (e.g. words, lemmas, poses)

  • n_min (int) – The minimum n of the ngrams that should be returned

  • n_max (int) – The maximum n of the ngrams that should be returned

  • lower (bool) – If True, all ngrams will be returned in lower case

Return type

Iterator[str]

fonduer.utils.data_model_utils.textual.get_right_ngrams(mention, window=3, attrib='words', n_min=1, n_max=1, lower=True)[source]

Get the ngrams within a window to the right from the sentence Context.

For higher-arity Candidates, defaults to the last argument.

Parameters
  • mention (Union[Candidate, Mention, TemporarySpanMention]) – The Mention to evaluate. If a candidate is given, default to its last Mention.

  • window (int) – The number of tokens to the right of the last argument to return

  • attrib (str) – The token attribute type (e.g. words, lemmas, poses)

  • n_min (int) – The minimum n of the ngrams that should be returned

  • n_max (int) – The maximum n of the ngrams that should be returned

  • lower (bool) – If True, all ngrams will be returned in lower case

Return type

Iterator[str]

fonduer.utils.data_model_utils.textual.get_sentence_ngrams(mention, attrib='words', n_min=1, n_max=1, lower=True)[source]

Get the ngrams that are in the Sentence of the given Mention, not including itself.

Note that if a candidate is passed in, all of its Mentions will be searched.

Parameters
  • mention (Union[Candidate, Mention, TemporarySpanMention]) – The Mention whose Sentence is being searched

  • attrib (str) – The token attribute type (e.g. words, lemmas, poses)

  • n_min (int) – The minimum n of the ngrams that should be returned

  • n_max (int) – The maximum n of the ngrams that should be returned

  • lower (bool) – If True, all ngrams will be returned in lower case

Return type

Iterator[str]

fonduer.utils.data_model_utils.textual.same_sentence(c)[source]

Return True if all Mentions in the given candidate are from the same Sentence.

Parameters

c (Candidate) – The candidate whose Mentions are being compared

Return type

bool

Structural Data Model Utilities

Fonduer structural modality utilities.

fonduer.utils.data_model_utils.structural.common_ancestor(c)[source]

Return the path to the root that is shared between a multinary-Mention Candidate.

In particular, this is the common path of HTML tags.

Parameters

c (Tuple[SpanMention, …]) – The multinary-Mention Candidate to evaluate

Return type

List[str]
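The idea of a shared path to the root can be sketched as a longest-common-prefix computation over tag paths. The sketch below takes each Mention's path as a list of tags, which is an assumption for illustration; the function name and sample paths are hypothetical.

```python
from typing import List, Sequence


def common_ancestor_sketch(paths: Sequence[Sequence[str]]) -> List[str]:
    """Return the longest shared prefix of several root-to-node tag paths."""
    shared: List[str] = []
    # zip stops at the shortest path, so the prefix never overruns any path.
    for tags in zip(*paths):
        if any(tag != tags[0] for tag in tags):
            break
        shared.append(tags[0])
    return shared


path_a = ["html", "body", "table", "tr", "td"]
path_b = ["html", "body", "table", "tr[2]", "td"]
shared_path = common_ancestor_sketch([path_a, path_b])
```

Here the two Mentions sit in different rows of the same table, so the shared path ends at the table element.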

fonduer.utils.data_model_utils.structural.get_ancestor_class_names(mention)[source]

Return the HTML classes of the Mention’s ancestors.

If a candidate is passed in, only the ancestors of its first Mention are returned.

Parameters

mention (Union[Candidate, Mention, TemporarySpanMention]) – The Mention to evaluate

Return type

List[str]

fonduer.utils.data_model_utils.structural.get_ancestor_id_names(mention)[source]

Return the HTML ids of the Mention’s ancestors.

If a candidate is passed in, only the ancestors of its first Mention are returned.

Parameters

mention (Union[Candidate, Mention, TemporarySpanMention]) – The Mention to evaluate

Return type

List[str]

fonduer.utils.data_model_utils.structural.get_ancestor_tag_names(mention)[source]

Return the HTML tags of the Mention’s ancestors.

For example, [‘html’, ‘body’, ‘p’]. If a candidate is passed in, only the ancestors of its first Mention are returned.

Parameters

mention (Union[Candidate, Mention, TemporarySpanMention]) – The Mention to evaluate

Return type

List[str]

fonduer.utils.data_model_utils.structural.get_attributes(mention)[source]

Return the HTML attributes of the Mention.

If a candidate is passed in, only the attributes of its first Mention are returned.

A sample output of this function on a Mention in a paragraph tag is [u’style=padding-top: 8pt;padding-left: 20pt;text-indent: 0pt;text-align: left;’]

Parameters

mention (Union[Candidate, Mention, TemporarySpanMention]) – The Mention to evaluate

Return type

List[str]

Returns

list of strings representing HTML attributes

fonduer.utils.data_model_utils.structural.get_next_sibling_tags(mention)[source]

Return the HTML tags of the Mention’s next siblings.

Next siblings are Mentions which are at the same level in the HTML tree as the given mention, but are declared after the given mention. If a candidate is passed in, only the next siblings of its last Mention are considered in the calculation.

Parameters

mention (Union[Candidate, Mention, TemporarySpanMention]) – The Mention to evaluate

Return type

List[str]

fonduer.utils.data_model_utils.structural.get_parent_tag(mention)[source]

Return the HTML tag of the Mention’s parent.

These may be tags such as ‘p’, ‘h2’, ‘table’, ‘div’, etc. If a candidate is passed in, only the tag of its first Mention is returned.

Parameters

mention (Union[Candidate, Mention, TemporarySpanMention]) – The Mention to evaluate

Return type

Optional[str]

fonduer.utils.data_model_utils.structural.get_prev_sibling_tags(mention)[source]

Return the HTML tags of the Mention’s previous siblings.

Previous siblings are Mentions which are at the same level in the HTML tree as the given mention, but are declared before the given mention. If a candidate is passed in, only the previous siblings of its first Mention are considered in the calculation.

Parameters

mention (Union[Candidate, Mention, TemporarySpanMention]) – The Mention to evaluate

Return type

List[str]

fonduer.utils.data_model_utils.structural.get_tag(mention)[source]

Return the HTML tag of the Mention.

If a candidate is passed in, only the tag of its first Mention is returned.

These may be tags such as ‘p’, ‘h2’, ‘table’, ‘div’, etc.

Parameters

mention (Union[Candidate, Mention, TemporarySpanMention]) – The Mention to evaluate

Return type

str

fonduer.utils.data_model_utils.structural.lowest_common_ancestor_depth(c)[source]

Return the lowest common ancestor depth.

In particular, return the minimum distance from the Mentions of a multinary-Mention Candidate to their lowest common ancestor.

For example, if the tree looked like this:

html
├──<div> Mention 1 </div>
├──table
│    ├──tr
│    │  └──<th> Mention 2 </th>

we return 1, the distance from Mention 1 to the html root. Smaller values indicate that two Mentions are close structurally, while larger values indicate that two Mentions are spread far apart structurally in the document.

Parameters

c (Tuple[SpanMention, …]) – The multinary-Mention Candidate to evaluate

Return type

int
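The tree example above can be reproduced with a short sketch: count how many leading tags the paths share, then subtract that from the length of the shortest path. Representing each Mention by a list of tags is an assumption for illustration, and the function name is hypothetical.

```python
from typing import Sequence


def lowest_common_ancestor_depth_sketch(paths: Sequence[Sequence[str]]) -> int:
    """Distance from the shallowest Mention to the lowest common ancestor."""
    min_len = min(len(path) for path in paths)
    shared = 0
    for tags in zip(*paths):
        if any(tag != tags[0] for tag in tags):
            break
        shared += 1
    return min_len - shared


# Paths for the tree shown above.
mention_1 = ["html", "div"]
mention_2 = ["html", "table", "tr", "th"]
depth = lowest_common_ancestor_depth_sketch([mention_1, mention_2])
```

The two paths share only the html root, and the shorter path has two elements, so the result is 2 - 1 = 1, matching the example.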

Tabular Data Model Utilities

Fonduer tabular modality utilities.

fonduer.utils.data_model_utils.tabular.get_aligned_ngrams(mention, attrib='words', n_min=1, n_max=1, spread=[0, 0], lower=True)[source]

Get the ngrams from all Cells in the same row or column as the given Mention.

Note that if a candidate is passed in, all of its Mentions will be searched. Also note that if the mention is not tabular, nothing will be yielded.

Parameters
  • mention (Union[Candidate, Mention, TemporarySpanMention]) – The Mention whose row and column Cells are being searched

  • attrib (str) – The token attribute type (e.g. words, lemmas, poses)

  • n_min (int) – The minimum n of the ngrams that should be returned

  • n_max (int) – The maximum n of the ngrams that should be returned

  • spread (List[int]) – The number of rows/cols above/below/left/right to also consider “aligned”.

  • lower (bool) – If True, all ngrams will be returned in lower case

Return type

Iterator[str]

fonduer.utils.data_model_utils.tabular.get_cell_ngrams(mention, attrib='words', n_min=1, n_max=1, lower=True)[source]

Get the ngrams that are in the Cell of the given mention, not including itself.

Note that if a candidate is passed in, all of its Mentions will be searched. Also note that if the mention is not tabular, nothing will be yielded.

Parameters
  • mention (Union[Candidate, Mention, TemporarySpanMention]) – The Mention whose Cell is being searched

  • attrib (str) – The token attribute type (e.g. words, lemmas, poses)

  • n_min (int) – The minimum n of the ngrams that should be returned

  • n_max (int) – The maximum n of the ngrams that should be returned

  • lower (bool) – If True, all ngrams will be returned in lower case

Return type

Iterator[str]

fonduer.utils.data_model_utils.tabular.get_col_ngrams(mention, attrib='words', n_min=1, n_max=1, spread=[0, 0], lower=True)[source]

Get the ngrams from all Cells that are in the same column as the given Mention.

Note that if a candidate is passed in, all of its Mentions will be searched. Also note that if the mention is not tabular, nothing will be yielded.

Parameters
  • mention (Union[Candidate, Mention, TemporarySpanMention]) – The Mention whose column Cells are being searched

  • attrib (str) – The token attribute type (e.g. words, lemmas, poses)

  • n_min (int) – The minimum n of the ngrams that should be returned

  • n_max (int) – The maximum n of the ngrams that should be returned

  • spread (List[int]) – The number of cols left and right to also consider “aligned”.

  • lower (bool) – If True, all ngrams will be returned in lower case

Return type

Iterator[str]

fonduer.utils.data_model_utils.tabular.get_head_ngrams(mention, axis=None, attrib='words', n_min=1, n_max=1, lower=True)[source]

Get the ngrams from the cell in the head of the row or column.

More specifically, this returns the ngrams in the leftmost cell in a row and/or the ngrams in the topmost cell in the column, depending on the axis parameter.

Note that if a candidate is passed in, all of its Mentions will be searched. Also note that if the mention is not tabular, nothing will be yielded.

Parameters
  • mention (Union[Candidate, Mention, TemporarySpanMention]) – The Mention whose head Cells are being returned

  • axis (Optional[str]) – Which axis {‘row’, ‘col’} to search. If None, then both row and col are searched.

  • attrib (str) – The token attribute type (e.g. words, lemmas, poses)

  • n_min (int) – The minimum n of the ngrams that should be returned

  • n_max (int) – The maximum n of the ngrams that should be returned

  • lower (bool) – If True, all ngrams will be returned in lower case

Return type

Iterator[str]
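The effect of the axis parameter can be sketched on a table stored as a list of rows. This is a simplified illustration under several assumptions: cells are plain strings, only unigrams are produced, and the name head_ngrams_sketch and the sample table are hypothetical.

```python
from typing import Iterator, List, Optional


def head_ngrams_sketch(
    table: List[List[str]],
    row: int,
    col: int,
    axis: Optional[str] = None,
    lower: bool = True,
) -> Iterator[str]:
    """Yield unigrams from the row head and/or column head of a cell."""
    if axis not in (None, "row", "col"):
        raise ValueError("axis must be 'row', 'col', or None")
    head_cells = []
    if axis in (None, "row"):
        head_cells.append(table[row][0])  # leftmost cell of the row
    if axis in (None, "col"):
        head_cells.append(table[0][col])  # topmost cell of the column
    for text in head_cells:
        for word in text.split():
            yield word.lower() if lower else word


table = [
    ["Parameter", "Min", "Max"],
    ["Storage temperature", "-65", "150"],
]
# The cell containing "150" is at row 1, column 2.
both_heads = list(head_ngrams_sketch(table, 1, 2))
col_head = list(head_ngrams_sketch(table, 1, 2, axis="col"))
```

A labeling function could then test whether "max" or "temperature" appears among the head ngrams of a numeric Mention.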

fonduer.utils.data_model_utils.tabular.get_max_col_num(mention)[source]

Return the largest column number that a Mention occupies.

Parameters

mention (Union[Candidate, Mention, TemporarySpanMention]) – The Mention to evaluate. If a candidate is given, default to its last Mention.

Return type

Optional[int]

fonduer.utils.data_model_utils.tabular.get_max_row_num(mention)[source]

Return the largest row number that a Mention occupies.

Parameters

mention (Union[Candidate, Mention, TemporarySpanMention]) – The Mention to evaluate. If a candidate is given, default to its last Mention.

Return type

Optional[int]

fonduer.utils.data_model_utils.tabular.get_min_col_num(mention)[source]

Return the lowest column number that a Mention occupies.

Parameters

mention (Union[Candidate, Mention, TemporarySpanMention]) – The Mention to evaluate. If a candidate is given, default to its first Mention.

Return type

Optional[int]

fonduer.utils.data_model_utils.tabular.get_min_row_num(mention)[source]

Return the lowest row number that a Mention occupies.

Parameters

mention (Union[Candidate, Mention, TemporarySpanMention]) – The Mention to evaluate. If a candidate is given, default to its first Mention.

Return type

Optional[int]

fonduer.utils.data_model_utils.tabular.get_neighbor_cell_ngrams(mention, dist=1, directions=False, attrib='words', n_min=1, n_max=1, lower=True)[source]

Get ngrams from all neighbor Cells.

Get the ngrams from all Cells that are within a given Cell distance in one direction from the given Mention.

Note that if a candidate is passed in, all of its Mentions will be searched. If directions=True, each ngram will be returned with a direction in {‘UP’, ‘DOWN’, ‘LEFT’, ‘RIGHT’}. Also note that if the mention is not tabular, nothing will be yielded.

Parameters
  • mention (Union[Candidate, Mention, TemporarySpanMention]) – The Mention whose neighbor Cells are being searched

  • dist (int) – The Cell distance within which a neighbor Cell must be to be considered

  • directions (bool) – A Boolean expressing whether or not to return the direction of each ngram

  • attrib (str) – The token attribute type (e.g. words, lemmas, poses)

  • n_min (int) – The minimum n of the ngrams that should be returned

  • n_max (int) – The maximum n of the ngrams that should be returned

  • lower (bool) – If True, all ngrams will be returned in lower case

Return type

Iterator[Union[str, Tuple[str, str]]]

Returns

a generator of ngrams (or (ngram, direction) tuples if directions=True)

fonduer.utils.data_model_utils.tabular.get_neighbor_sentence_ngrams(mention, d=1, attrib='words', n_min=1, n_max=1, lower=True)[source]

Get the ngrams that are in the neighboring Sentences of the given Mention.

Note that if a candidate is passed in, all of its Mentions will be searched.

Parameters
  • mention (Union[Candidate, Mention, TemporarySpanMention]) – The Mention whose neighbor Sentences are being searched

  • attrib (str) – The token attribute type (e.g. words, lemmas, poses)

  • n_min (int) – The minimum n of the ngrams that should be returned

  • n_max (int) – The maximum n of the ngrams that should be returned

  • lower (bool) – If True, all ngrams will be returned in lower case

Deprecated since version 0.8.3: This will be removed in 0.9.0. Use textual.get_neighbor_sentence_ngrams() instead

Return type

Iterator[str]

fonduer.utils.data_model_utils.tabular.get_row_ngrams(mention, attrib='words', n_min=1, n_max=1, spread=[0, 0], lower=True)[source]

Get the ngrams from all Cells that are in the same row as the given Mention.

Note that if a candidate is passed in, all of its Mentions will be searched. Also note that if the mention is not tabular, nothing will be yielded.

Parameters
  • mention (Union[Candidate, Mention, TemporarySpanMention]) – The Mention whose row Cells are being searched

  • attrib (str) – The token attribute type (e.g. words, lemmas, poses)

  • n_min (int) – The minimum n of the ngrams that should be returned

  • n_max (int) – The maximum n of the ngrams that should be returned

  • spread (List[int]) – The number of rows above and below to also consider “aligned”.

  • lower (bool) – If True, all ngrams will be returned in lower case

Return type

Iterator[str]

fonduer.utils.data_model_utils.tabular.get_sentence_ngrams(mention, attrib='words', n_min=1, n_max=1, lower=True)[source]

Get the ngrams that are in the Sentence of the given Mention, not including itself.

Note that if a candidate is passed in, all of its Mentions will be searched.

Parameters
  • mention (Union[Candidate, Mention, TemporarySpanMention]) – The Mention whose Sentence is being searched

  • attrib (str) – The token attribute type (e.g. words, lemmas, poses)

  • n_min (int) – The minimum n of the ngrams that should be returned

  • n_max (int) – The maximum n of the ngrams that should be returned

  • lower (bool) – If True, all ngrams will be returned in lower case

Deprecated since version 0.8.3: This will be removed in 0.9.0. Use textual.get_sentence_ngrams() instead

Return type

Iterator[str]

fonduer.utils.data_model_utils.tabular.is_tabular_aligned(c)[source]

Return True if all Mentions in the given candidate are from the same Row or Col.

Parameters

c (Candidate) – The candidate whose Mentions are being compared

Return type

bool

fonduer.utils.data_model_utils.tabular.same_cell(c)[source]

Return True if all Mentions in the given candidate are from the same Cell.

Parameters

c (Candidate) – The candidate whose Mentions are being compared

Return type

bool

fonduer.utils.data_model_utils.tabular.same_col(c)[source]

Return True if all Mentions in the given candidate are from the same Col.

Parameters

c (Candidate) – The candidate whose Mentions are being compared

Return type

bool

fonduer.utils.data_model_utils.tabular.same_row(c)[source]

Return True if all Mentions in the given candidate are from the same Row.

Parameters

c (Candidate) – The candidate whose Mentions are being compared

Return type

bool
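One way to picture the row and column checks is to give each Mention the range of rows and columns its Cell occupies and test whether those ranges overlap. The CellSpan class, the overlap rule, and the choice to compare every Mention against the first one are all assumptions for this sketch, not a description of Fonduer's internals.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class CellSpan:
    """Hypothetical stand-in for the table position of a Mention's Cell."""

    row_start: int
    row_end: int
    col_start: int
    col_end: int


def _ranges_overlap(a_start: int, a_end: int, b_start: int, b_end: int) -> bool:
    """True if the inclusive ranges [a_start, a_end] and [b_start, b_end] intersect."""
    return a_start <= b_end and b_start <= a_end


def same_row_sketch(cells: Sequence[CellSpan]) -> bool:
    """True if every cell shares at least one row with the first cell."""
    first = cells[0]
    return all(
        _ranges_overlap(c.row_start, c.row_end, first.row_start, first.row_end)
        for c in cells
    )


def same_col_sketch(cells: Sequence[CellSpan]) -> bool:
    """True if every cell shares at least one column with the first cell."""
    first = cells[0]
    return all(
        _ranges_overlap(c.col_start, c.col_end, first.col_start, first.col_end)
        for c in cells
    )


label_cell = CellSpan(row_start=2, row_end=2, col_start=0, col_end=0)
value_cell = CellSpan(row_start=2, row_end=3, col_start=4, col_end=4)
in_same_row = same_row_sketch([label_cell, value_cell])
in_same_col = same_col_sketch([label_cell, value_cell])
```

Using ranges rather than single indices lets the sketch handle cells that span several rows or columns.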

fonduer.utils.data_model_utils.tabular.same_sentence(c)[source]

Return True if all Mentions in the given candidate are from the same Sentence.

Parameters

c (Candidate) – The candidate whose Mentions are being compared

Deprecated since version 0.8.3: This will be removed in 0.9.0. Use textual.same_sentence() instead

Return type

bool

fonduer.utils.data_model_utils.tabular.same_table(c)[source]

Return True if all Mentions in the given candidate are from the same Table.

Parameters

c (Candidate) – The candidate whose Mentions are being compared

Return type

bool

Visual Data Model Utilities

Fonduer visual modality utilities.

fonduer.utils.data_model_utils.visual.get_aligned_lemmas(mention)[source]

Return a set of the lemmas aligned visually with the Mention.

Note that if a candidate is passed in, all of its Mentions will be searched.

Parameters

mention (Union[Candidate, Mention, TemporarySpanMention]) – The Mention to evaluate.

Return type

Set[str]

fonduer.utils.data_model_utils.visual.get_horz_ngrams(mention, attrib='words', n_min=1, n_max=1, lower=True, from_sentence=True)[source]

Return all ngrams which are visually horizontally aligned with the Mention.

Note that if a candidate is passed in, all of its Mentions will be searched.

Parameters
  • mention (Union[Candidate, Mention, TemporarySpanMention]) – The Mention to evaluate

  • attrib (str) – The token attribute type (e.g. words, lemmas, pos_tags). This option is valid only when from_sentence==True.

  • n_min (int) – The minimum n of the ngrams that should be returned

  • n_max (int) – The maximum n of the ngrams that should be returned

  • lower (bool) – If True, all ngrams will be returned in lower case

  • from_sentence (bool) – If True, return ngrams of any Sentence that is horizontally aligned (in the same page) with the mention’s Sentence. If False, return ngrams that are horizontally aligned with the mention no matter which Sentence they are from.

Return type

Iterator[str]

Returns

a generator of ngrams

fonduer.utils.data_model_utils.visual.get_page(mention)[source]

Return the page number of the given mention.

If a candidate is passed in, this returns the page of its first Mention.

Parameters

mention (Union[Candidate, Mention, TemporarySpanMention]) – The Mention to get the page number of.

Return type

int

fonduer.utils.data_model_utils.visual.get_page_horz_percentile(mention, page_width=612, page_height=792)[source]

Return which percentile from the LEFT in the page the Mention is located in.

Percentile is calculated where the left of the page is 0.0, and the right of the page is 1.0.

Page width and height are based on pt values:

Letter      612x792
Tabloid     792x1224
Ledger      1224x792
Legal       612x1008
Statement   396x612
Executive   540x720
A0          2384x3371
A1          1685x2384
A2          1190x1684
A3          842x1190
A4          595x842
A4Small     595x842
A5          420x595
B4          729x1032
B5          516x729
Folio       612x936
Quarto      610x780
10x14       720x1008

and should match the source documents. Letter size is used by default.

Note that if a candidate is passed in, only the horizontal percentile of its first Mention is returned.

Parameters
  • mention (Union[Candidate, Mention, TemporarySpanMention]) – The Mention to evaluate

  • page_width (int) – The width of the page. Defaults to Letter paper width.

  • page_height (int) – The height of the page. Defaults to Letter paper height.

Return type

float

fonduer.utils.data_model_utils.visual.get_page_vert_percentile(mention, page_width=612, page_height=792)[source]

Return which percentile from the TOP in the page the Mention is located in.

Percentile is calculated where the top of the page is 0.0, and the bottom of the page is 1.0. For example, a Mention at the top 1/4 of the page will have a percentile of 0.25.

Page width and height are based on pt values:

Letter      612x792
Tabloid     792x1224
Ledger      1224x792
Legal       612x1008
Statement   396x612
Executive   540x720
A0          2384x3371
A1          1685x2384
A2          1190x1684
A3          842x1190
A4          595x842
A4Small     595x842
A5          420x595
B4          729x1032
B5          516x729
Folio       612x936
Quarto      610x780
10x14       720x1008

and should match the source documents. Letter size is used by default.

Note that if a candidate is passed in, only the vertical percentile of its first Mention is returned.

Parameters
  • mention (Union[Candidate, Mention, TemporarySpanMention]) – The Mention to evaluate

  • page_width (int) – The width of the page. Defaults to Letter paper width.

  • page_height (int) – The height of the page. Defaults to Letter paper height.

Return type

float
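The percentile arithmetic for both page functions reduces to dividing a coordinate by the page dimension. The sketch below assumes the coordinate is measured in points from the top-left corner of the page and that the top and left edges of the Mention's bounding box are the ones used; the function names and sample coordinates are hypothetical.

```python
def page_vert_percentile_sketch(top: float, page_height: float = 792) -> float:
    """Fraction of the page height above the given top coordinate."""
    return top / page_height


def page_horz_percentile_sketch(left: float, page_width: float = 612) -> float:
    """Fraction of the page width to the left of the given left coordinate."""
    return left / page_width


# Letter page (612x792): a Mention whose top edge is 198 pt from the top.
vert = page_vert_percentile_sketch(198.0)

# Letter page: a Mention whose left edge is at the horizontal midpoint.
horz = page_horz_percentile_sketch(306.0)

# A4 page (595x842): the page size must be passed explicitly.
vert_a4 = page_vert_percentile_sketch(421.0, page_height=842)
```

If the source documents are not Letter size, leaving the defaults in place would skew every percentile, which is why the page dimensions should match the documents.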

fonduer.utils.data_model_utils.visual.get_vert_ngrams(mention, attrib='words', n_min=1, n_max=1, lower=True, from_sentence=True)[source]

Return all ngrams which are visually vertically aligned with the Mention.

Note that if a candidate is passed in, all of its Mentions will be searched.

Parameters
  • mention (Union[Candidate, Mention, TemporarySpanMention]) – The Mention to evaluate

  • attrib (str) – The token attribute type (e.g. words, lemmas, pos_tags). This option is valid only when from_sentence==True.

  • n_min (int) – The minimum n of the ngrams that should be returned

  • n_max (int) – The maximum n of the ngrams that should be returned

  • lower (bool) – If True, all ngrams will be returned in lower case

  • from_sentence (bool) – If True, return ngrams of any Sentence that is vertically aligned (in the same page) with the mention’s Sentence. If False, return ngrams that are vertically aligned with the mention no matter which Sentence they are from.

Return type

Iterator[str]

Returns

a generator of ngrams

fonduer.utils.data_model_utils.visual.get_vert_ngrams_center(c)[source]

Not implemented.

fonduer.utils.data_model_utils.visual.get_vert_ngrams_left(c)[source]

Not implemented.

fonduer.utils.data_model_utils.visual.get_vert_ngrams_right(c)[source]

Not implemented.

fonduer.utils.data_model_utils.visual.get_visual_aligned_lemmas(mention)[source]

Return a generator of the lemmas aligned visually with the Mention.

Note that if a candidate is passed in, all of its Mentions will be searched.

Parameters

mention (Union[Candidate, Mention, TemporarySpanMention]) – The Mention to evaluate.

Return type

Iterator[str]

fonduer.utils.data_model_utils.visual.get_visual_distance(c, axis=None)[source]

Not implemented.

fonduer.utils.data_model_utils.visual.get_visual_header_ngrams(c, axis=None)[source]

Not implemented.

fonduer.utils.data_model_utils.visual.is_horz_aligned(c)[source]

Return True if all the components of c are horizontally aligned.

Horizontal alignment means that the bounding boxes of all Mentions of c share a similar y-axis value in the visual rendering of the document.

Parameters

c (Candidate) – The candidate to evaluate

Return type

bool
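A minimal sketch of a horizontal alignment test is shown below, assuming that "a similar y-axis value" means the vertical extents of two bounding boxes overlap and that y grows downward from the top of the page. The Box class and function name are hypothetical, and Fonduer's actual tolerance may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Hypothetical bounding box in page points, with y increasing downward."""

    top: float
    bottom: float
    left: float
    right: float


def horz_aligned_sketch(a: Box, b: Box) -> bool:
    """True if the two boxes overlap vertically, i.e. sit on the same visual line."""
    return not (a.top > b.bottom or b.top > a.bottom)


label_box = Box(top=100.0, bottom=112.0, left=50.0, right=90.0)
value_box = Box(top=102.0, bottom=114.0, left=300.0, right=340.0)
other_line = Box(top=200.0, bottom=212.0, left=300.0, right=340.0)

on_same_line = horz_aligned_sketch(label_box, value_box)
on_other_line = horz_aligned_sketch(label_box, other_line)
```

Note that the horizontal positions play no role here: two boxes far apart on the same visual line still count as horizontally aligned.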

fonduer.utils.data_model_utils.visual.is_vert_aligned(c)[source]

Return True if all the components of c are vertically aligned.

Vertical alignment means that the bounding boxes of all Mentions of c share a similar x-axis value in the visual rendering of the document.

Parameters

c (Candidate) – The candidate to evaluate

Return type

bool

fonduer.utils.data_model_utils.visual.is_vert_aligned_center(c)[source]

Return True if all the components are vertically aligned on their center.

Vertical alignment means that the bounding boxes of all Mentions of c share a similar x-axis value in the visual rendering of the document. In this function the similarity of the x-axis value is based on the center of their bounding boxes.

Parameters

c (Candidate) – The candidate to evaluate

Return type

bool

fonduer.utils.data_model_utils.visual.is_vert_aligned_left(c)[source]

Return True if all components are vertically aligned on their left border.

Vertical alignment means that the bounding boxes of all Mentions of c share a similar x-axis value in the visual rendering of the document. In this function the similarity of the x-axis value is based on the left border of their bounding boxes.

Parameters

c (Candidate) – The candidate to evaluate

Return type

bool

fonduer.utils.data_model_utils.visual.is_vert_aligned_right(c)[source]

Return True if all components are vertically aligned on their right border.

Vertical alignment means that the bounding boxes of all Mentions of c share a similar x-axis value in the visual rendering of the document. In this function the similarity of the x-axis value is based on the right border of their bounding boxes.

Parameters

c (Candidate) – The candidate to evaluate

Return type

bool
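The difference between the left, center, and right variants comes down to which x coordinate of the bounding boxes is compared. The sketch below assumes a fixed tolerance in points; the HBox class, the tolerance value, and the function names are hypothetical and are not taken from Fonduer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HBox:
    """Hypothetical horizontal extent of a bounding box, in page points."""

    left: float
    right: float


TOLERANCE_PT = 2.0  # assumed threshold for "similar" x values


def vert_aligned_left_sketch(a: HBox, b: HBox) -> bool:
    """Compare the left borders of the two boxes."""
    return abs(a.left - b.left) <= TOLERANCE_PT


def vert_aligned_right_sketch(a: HBox, b: HBox) -> bool:
    """Compare the right borders of the two boxes."""
    return abs(a.right - b.right) <= TOLERANCE_PT


def vert_aligned_center_sketch(a: HBox, b: HBox) -> bool:
    """Compare the horizontal midpoints of the two boxes."""
    return abs((a.left + a.right) / 2 - (b.left + b.right) / 2) <= TOLERANCE_PT


# Two left-justified lines of different lengths in the same column.
short_line = HBox(left=50.0, right=90.0)
long_line = HBox(left=50.0, right=120.0)

left_ok = vert_aligned_left_sketch(short_line, long_line)
right_ok = vert_aligned_right_sketch(short_line, long_line)
center_ok = vert_aligned_center_sketch(short_line, long_line)
```

Left-justified text of different lengths passes only the left-border check, which is why the three variants can give different answers for the same candidate.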

fonduer.utils.data_model_utils.visual.same_page(c)[source]

Return True if all the components of c are on the same page of the document.

Page numbers are based on the PDF rendering of the document. If a PDF file is provided, it is used. Otherwise, if only an HTML/XML document is provided, a PDF is created and then used to determine the page number of a Mention.

Parameters

c (Candidate) – The candidate to evaluate

Return type

bool