Supervision

The fourth stage of Fonduer’s pipeline provides weak supervision, which can be used to generate a large set of training data.

Supervision Model Classes

These are the model classes used for supervision in Fonduer.

Fonduer’s supervision model module.

class fonduer.supervision.models.GoldLabel(**kwargs)

Bases: fonduer.utils.models.annotation.AnnotationMixin, sqlalchemy.orm.decl_api.Base

Gold label class.

A separate class for labels from human annotators or other gold standards.

A simple constructor that allows initialization from kwargs.

Sets attributes on the constructed instance using the names and values in kwargs.

Only keys that are present as attributes of the instance’s class are allowed. These could be, for example, any mapped columns or relationships.

candidate

The Candidate being annotated.

candidate_id

Id of the Candidate being annotated.

keys

List of strings of each Key name.

values: sqlalchemy.sql.schema.Column

A list of integer values for each Key.
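A GoldLabel row pairs two parallel lists: keys holds the names of the GoldLabelKeys (the annotators) and values holds the integer label each one assigned. The sketch below mimics that layout with a plain dataclass so the lookup can be shown without a database; `GoldLabelRow` and `label_for` are hypothetical names, not part of Fonduer.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class GoldLabelRow:
    """Hypothetical stand-in for a GoldLabel row; not Fonduer's ORM class."""

    candidate_id: int
    keys: List[str] = field(default_factory=list)  # GoldLabelKey (annotator) names
    values: List[int] = field(default_factory=list)  # one integer label per key


def label_for(row: GoldLabelRow, annotator: str) -> Optional[int]:
    """Return the label that `annotator` assigned, or None if they gave none."""
    for key, value in zip(row.keys, row.values):
        if key == annotator:
            return value
    return None


row = GoldLabelRow(candidate_id=7, keys=["alice", "bob"], values=[1, 0])
print(label_for(row, "bob"))    # 0
print(label_for(row, "carol"))  # None
```

Keeping keys and values index-aligned means a single zip is enough to recover each annotator's label.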

class fonduer.supervision.models.GoldLabelKey(**kwargs)

Bases: fonduer.utils.models.annotation.AnnotationKeyMixin, sqlalchemy.orm.decl_api.Base

Gold label key class.

A gold label’s key that identifies the annotator of the gold label.

A simple constructor that allows initialization from kwargs.

Sets attributes on the constructed instance using the names and values in kwargs.

Only keys that are present as attributes of the instance’s class are allowed. These could be, for example, any mapped columns or relationships.

candidate_classes

List of the names of the candidate classes this Key applies to.

name

Name of the Key.

class fonduer.supervision.models.Label(**kwargs)

Bases: fonduer.utils.models.annotation.AnnotationMixin, sqlalchemy.orm.decl_api.Base

Label class.

A discrete label associated with a Candidate, indicating a target prediction value.

Labels are used to represent the output of labeling functions. A Label’s annotation key identifies the labeling function that provided the Label.

A simple constructor that allows initialization from kwargs.

Sets attributes on the constructed instance using the names and values in kwargs.

Only keys that are present as attributes of the instance’s class are allowed. These could be, for example, any mapped columns or relationships.

candidate

The Candidate being annotated.

candidate_id

Id of the Candidate being annotated.

keys

List of strings of each Key name.

values: sqlalchemy.sql.schema.Column

A list of integer values for each Key.
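Labels store the output of labeling functions (LFs), which are ordinary Python callables that take a candidate and return an integer vote. The sketch below assumes the common ABSTAIN = -1, FALSE = 0, TRUE = 1 convention and uses hypothetical `FakePart`/`FakeCandidate` stubs in place of real database-backed candidates; it also assumes the LabelKey recording an LF's output is named after the function's `__name__`.

```python
from dataclasses import dataclass

# Assumed label convention; confirm the values your Fonduer version expects.
ABSTAIN, FALSE, TRUE = -1, 0, 1


@dataclass
class FakePart:
    """Hypothetical stand-in for a mention; real candidates are database objects."""

    text: str


@dataclass
class FakeCandidate:
    """Hypothetical candidate holding a single part mention."""

    part: FakePart


def lf_part_has_bc_prefix(c: FakeCandidate) -> int:
    """Vote TRUE when the part number starts with 'BC'; otherwise abstain."""
    return TRUE if c.part.text.startswith("BC") else ABSTAIN


# Assumption: the LabelKey for this LF's output is named after __name__.
print(lf_part_has_bc_prefix.__name__)                            # lf_part_has_bc_prefix
print(lf_part_has_bc_prefix(FakeCandidate(FakePart("BC546"))))   # 1
print(lf_part_has_bc_prefix(FakeCandidate(FakePart("2N3904"))))  # -1
```

Abstaining rather than voting FALSE lets an LF stay silent on candidates it knows nothing about.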

class fonduer.supervision.models.LabelKey(**kwargs)

Bases: fonduer.utils.models.annotation.AnnotationKeyMixin, sqlalchemy.orm.decl_api.Base

Label key class.

A label’s key that identifies the labeling function.

A simple constructor that allows initialization from kwargs.

Sets attributes on the constructed instance using the names and values in kwargs.

Only keys that are present as attributes of the instance’s class are allowed. These could be, for example, any mapped columns or relationships.

candidate_classes

List of the names of the candidate classes this Key applies to.

name

Name of the Key.

class fonduer.supervision.models.StableLabel(**kwargs)

Bases: sqlalchemy.orm.decl_api.Base

Stable label table.

A special secondary table for preserving labels created by human annotators in a stable format that does not cascade, and is independent of the Candidate IDs.

Note

This is currently unused.

A simple constructor that allows initialization from kwargs.

Sets attributes on the constructed instance using the names and values in kwargs.

Only keys that are present as attributes of the instance’s class are allowed. These could be, for example, any mapped columns or relationships.

annotator_name

The annotator’s name.

context_stable_ids

Delimited list of the context stable ids.

split

Which split the label belongs to.

Core Objects

These are Fonduer’s core objects used for supervision.

Fonduer’s supervision module.

class fonduer.supervision.Labeler(session, candidate_classes, parallelism=1)

Bases: fonduer.utils.udf.UDFRunner

An operator to add Label Annotations to Candidates.

Parameters
  • session (Session) – The database session to use.

  • candidate_classes (List[Type[Candidate]]) – A list of candidate_subclasses to label.

  • parallelism (int) – The number of processes to use in parallel. Default 1.

Initialize the Labeler.

apply(docs=None, split=0, train=False, lfs=None, clear=True, parallelism=None, progress_bar=True, table=<class 'fonduer.supervision.models.label.Label'>)

Label the specified candidates by applying the provided LFs.

Parameters
  • docs (Optional[Collection[Document]]) – If provided, apply the LFs to all the candidates in these documents.

  • split (int) – If docs is None, apply the LFs to the candidates in this particular split.

  • train (bool) – Whether or not to update the global key set of labels and the labels of candidates.

  • lfs (Optional[List[List[Callable]]]) – A list of lists of labeling functions to apply. Each list should correspond with the candidate_classes used to initialize the Labeler.

  • clear (bool) – Whether or not to clear the labels table before applying these LFs.

  • parallelism (Optional[int]) – How many processes to use for labeling. If provided, this overrides the parallelism value used to initialize the Labeler.

  • progress_bar (bool) – Whether or not to display a progress bar. The progress bar is measured per document.

  • table (Table) – The database table that labels are written to. Takes Label (by default) or GoldLabel.

Raises

ValueError – If labeling functions are not provided for each candidate class.

Return type

None
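Conceptually, apply pairs the i-th list in lfs with the i-th candidate class and runs every LF on every candidate of that class. The sketch below is a simplified, in-memory picture of that loop, not Fonduer's implementation: `apply_lfs` is a hypothetical helper, candidates are plain strings, ABSTAIN = -1 is assumed, and skipping abstentions to keep the output sparse is a choice made for the sketch.

```python
from typing import Callable, List, Sequence, Tuple

ABSTAIN = -1  # assumed abstain value


def apply_lfs(
    cands_per_class: Sequence[Sequence[str]],
    lfs: Sequence[Sequence[Callable[[str], int]]],
) -> List[Tuple[str, str, int]]:
    """Run each class's LFs on its candidates; return (cand, lf_name, label) triples."""
    # Mirrors the documented check: one list of LFs per candidate class.
    if len(lfs) != len(cands_per_class):
        raise ValueError("Please provide LFs for each candidate class.")
    triples = []
    for cands, class_lfs in zip(cands_per_class, lfs):
        for cand in cands:
            for lf in class_lfs:
                label = lf(cand)
                if label != ABSTAIN:  # keep only actual votes
                    triples.append((cand, lf.__name__, label))
    return triples


def lf_long(c: str) -> int:
    return 1 if len(c) > 3 else ABSTAIN


def lf_short(c: str) -> int:
    return 0 if len(c) <= 3 else ABSTAIN


print(apply_lfs([["abcd", "ab"]], [[lf_long, lf_short]]))
# [('abcd', 'lf_long', 1), ('ab', 'lf_short', 0)]
```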

clear(train, split, lfs=None, table=<class 'fonduer.supervision.models.label.Label'>, **kwargs)

Delete Labels of each class from the database.

Parameters
  • train (bool) – Whether or not to clear the LabelKeys.

  • split (int) – Which split of candidates to clear labels from.

  • lfs (Optional[List[List[Callable]]]) – This parameter is ignored.

  • table (Table) – The database table that labels are cleared from. Takes Label (by default) or GoldLabel.

Return type

None

clear_all(table=<class 'fonduer.supervision.models.label.Label'>)

Delete all Labels.

Parameters

table (Table) – The database table that labels are cleared from. Takes Label (by default) or GoldLabel.

Return type

None

drop_keys(keys, candidate_classes=None)

Drop the specified keys from LabelKeys.

Parameters
  • keys (Iterable[Union[str, Callable]]) – The labeling functions, or their names, whose keys should be deleted.

  • candidate_classes (Union[Type[Candidate], List[Type[Candidate]], None]) – A list of the Candidates to drop the key for. If None, drops the keys for all candidate classes associated with this Labeler.

Return type

None
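drop_keys accepts keys as either names or LF callables, and candidate_classes as a single class, a list, or None. A minimal sketch of that argument normalization, assuming a callable is identified by its `__name__`; `key_names`, `as_class_list`, and the demo classes are hypothetical, not Fonduer functions.

```python
from typing import Callable, Iterable, List, Sequence, Union


def key_names(keys: Iterable[Union[str, Callable]]) -> List[str]:
    """Map each key to a name: strings pass through, callables use __name__."""
    return [k if isinstance(k, str) else k.__name__ for k in keys]


def as_class_list(
    candidate_classes: Union[type, Sequence[type], None],
    all_classes: Sequence[type],
) -> List[type]:
    """Normalize one class, a list of classes, or None (all classes) to a list."""
    if candidate_classes is None:
        return list(all_classes)
    if isinstance(candidate_classes, type):
        return [candidate_classes]
    return list(candidate_classes)


class PartTemp:  # hypothetical candidate classes for the demo
    pass


class PartVolt:
    pass


def lf_old(c):
    return -1


print(key_names(["lf_a", lf_old]))  # ['lf_a', 'lf_old']
print([c.__name__ for c in as_class_list(None, [PartTemp, PartVolt])])  # ['PartTemp', 'PartVolt']
print([c.__name__ for c in as_class_list(PartVolt, [PartTemp, PartVolt])])  # ['PartVolt']
```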

get_gold_labels(cand_lists, annotator=None)

Load a dense matrix of GoldLabels for each candidate class.

Parameters
  • cand_lists (List[List[Candidate]]) – The candidates to get gold labels for.

  • annotator (Optional[str]) – A specific annotator key to get labels for. Default None.

Raises

ValueError – If get_gold_labels is called before gold labels are loaded. In that case the result would contain ABSTAIN values, so a ValueError is raised to flag this likely mistake.

Return type

List[ndarray]

Returns

A list of MxN dense matrices, where M is the number of candidates and N is the number of annotators. If annotator is provided, each matrix is Mx1.
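The shapes above can be illustrated with NumPy: selecting one annotator's column with a list index keeps the result two-dimensional (Mx1). The sketch assumes ABSTAIN = -1 marks a missing gold label and treats an all-ABSTAIN result as the sign that gold labels were never loaded; `select_gold` is a hypothetical helper and Fonduer's exact check may differ.

```python
from typing import List, Optional

import numpy as np

ABSTAIN = -1  # assumed value for "no gold label"


def select_gold(
    gold: np.ndarray, annotators: List[str], annotator: Optional[str] = None
) -> np.ndarray:
    """Return the MxN gold matrix, or the Mx1 column for a single annotator."""
    if annotator is not None:
        j = annotators.index(annotator)
        gold = gold[:, [j]]  # indexing with a list keeps the result 2-D (Mx1)
    if np.all(gold == ABSTAIN):
        # An all-ABSTAIN result most likely means gold labels were never loaded.
        raise ValueError("No gold labels found; load gold labels first.")
    return gold


gold = np.array([[1, 0], [0, 0], [1, 1]])  # 3 candidates x 2 annotators
print(select_gold(gold, ["alice", "bob"], "bob").tolist())  # [[0], [0], [1]]
```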

get_keys()

Return a list of keys for the Labels.

Return type

List[LabelKey]

Returns

List of LabelKeys.

get_label_matrices(cand_lists)

Load a dense matrix of Labels for each candidate class.

Parameters

cand_lists (List[List[Candidate]]) – The candidates to get labels for.

Return type

List[ndarray]

Returns

A list of MxN dense matrices, where M is the number of candidates and N is the number of labeling functions.
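One way to picture such a matrix is to scatter sparse (candidate id, LF name, label) votes into an array pre-filled with ABSTAIN: rows follow the candidate order and columns follow the LF order. This is a sketch under assumptions, not Fonduer's code; the triple format, `dense_label_matrix`, and ABSTAIN = -1 are all assumed for illustration.

```python
from typing import Iterable, List, Tuple

import numpy as np

ABSTAIN = -1  # assumed value for "LF did not vote"


def dense_label_matrix(
    cand_ids: List[int],
    lf_names: List[str],
    triples: Iterable[Tuple[int, str, int]],
) -> np.ndarray:
    """Scatter sparse (cand_id, lf_name, label) triples into an MxN matrix."""
    row = {cid: i for i, cid in enumerate(cand_ids)}
    col = {name: j for j, name in enumerate(lf_names)}
    L = np.full((len(cand_ids), len(lf_names)), ABSTAIN, dtype=int)
    for cid, name, label in triples:
        L[row[cid], col[name]] = label
    return L


L = dense_label_matrix([10, 11, 12], ["lf_a", "lf_b"], [(10, "lf_a", 1), (12, "lf_b", 0)])
print(L.tolist())  # [[1, -1], [-1, -1], [-1, 0]]
```

Pre-filling with ABSTAIN means any candidate or LF without a stored vote reads as an abstention rather than a FALSE.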

last_docs: Set[str]

The last set of documents that apply() was called on.

update(docs=None, split=0, lfs=None, parallelism=None, progress_bar=True, table=<class 'fonduer.supervision.models.label.Label'>)

Update the labels of the specified candidates based on the provided LFs.

Parameters
  • docs (Optional[Collection[Document]]) – If provided, apply the updated LFs to all the candidates in these documents.

  • split (int) – If docs is None, apply the updated LFs to the candidates in this particular split.

  • lfs (Optional[List[List[Callable]]]) – A list of lists of labeling functions to update. Each list should correspond with the candidate_classes used to initialize the Labeler.

  • parallelism (Optional[int]) – How many processes to use for labeling. If provided, this overrides the parallelism value used to initialize the Labeler.

  • progress_bar (bool) – Whether or not to display a progress bar. The progress bar is measured per document.

  • table (Table) – The database table that labels are written to. Takes Label (by default) or GoldLabel.

Return type

None
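The practical difference from apply is whether existing labels survive. As an assumption, this sketch treats update like apply with clear=False: votes from LFs not passed in are kept, while the supplied LFs' votes are refreshed. The in-memory store and `relabel` are hypothetical, and dropping a stale vote when an LF now abstains is a choice made for the sketch.

```python
from typing import Callable, Dict, Iterable, Sequence

ABSTAIN = -1  # assumed abstain value

# Hypothetical in-memory store: candidate id -> {LF name -> label}
LabelStore = Dict[str, Dict[str, int]]


def relabel(
    store: LabelStore,
    cands: Iterable[str],
    lfs: Sequence[Callable[[str], int]],
    clear: bool,
) -> LabelStore:
    """Apply `lfs` to `cands`; with clear=True, drop all existing labels first."""
    if clear:
        store.clear()
    for cand in cands:
        votes = store.setdefault(cand, {})
        for lf in lfs:
            label = lf(cand)
            if label == ABSTAIN:
                votes.pop(lf.__name__, None)  # sketch choice: abstaining removes a stale vote
            else:
                votes[lf.__name__] = label
    return store


def lf_new(c: str) -> int:
    return 0


print(relabel({"c1": {"lf_old": 1}}, ["c1"], [lf_new], clear=False))
# {'c1': {'lf_old': 1, 'lf_new': 0}}
print(relabel({"c1": {"lf_old": 1}}, ["c1"], [lf_new], clear=True))
# {'c1': {'lf_new': 0}}
```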

upsert_keys(keys, candidate_classes=None)

Upsert the specified keys into LabelKeys.

Parameters
  • keys (Iterable[Union[str, Callable]]) – The labeling functions, or their names, to upsert.

  • candidate_classes (Union[Type[Candidate], List[Type[Candidate]], None]) – A list of the Candidates to upsert the key for. If None, upsert the keys for all candidate classes associated with this Labeler.

Return type

None