Supervision

The fourth stage of Fonduer’s pipeline provides weak supervision, which is used to generate a large set of training data.

Supervision Model Classes

These are the model classes used for supervision in Fonduer.

class fonduer.supervision.models.GoldLabel(**kwargs)

Bases: fonduer.utils.models.annotation.AnnotationMixin, sqlalchemy.ext.declarative.api.Base

A separate class for labels from human annotators or other gold standards.

candidate

The Candidate.

candidate_id

The id of the Candidate being annotated.

keys

A list of strings of each Key name.

values

A list of integer values for each Key.

class fonduer.supervision.models.GoldLabelKey(**kwargs)

Bases: fonduer.utils.models.annotation.AnnotationKeyMixin, sqlalchemy.ext.declarative.api.Base

A gold label’s key that identifies the annotator of the gold label.

candidate_classes

The names of the candidate classes that this Key applies to.

name

The name of the Key.

class fonduer.supervision.models.Label(**kwargs)

Bases: fonduer.utils.models.annotation.AnnotationMixin, sqlalchemy.ext.declarative.api.Base

A discrete label associated with a Candidate, indicating a target prediction value.

Labels are used to represent the output of labeling functions. A Label’s annotation key identifies the labeling function that provided the Label.
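
As a minimal sketch of that relationship, the plain-Python example below runs two hypothetical labeling functions over stand-in candidates and keys each result by the function's name. The `Candidate` tuple, the function names, the label values (0 for abstain, 1 and 2 for the two classes), and the choice to skip abstentions are all assumptions made for illustration; none of them is taken from Fonduer's API.

```python
from collections import namedtuple

# Hypothetical stand-in for a Fonduer Candidate (a real one is database-backed).
Candidate = namedtuple("Candidate", ["id", "text"])

# Label convention assumed for this sketch only.
ABSTAIN, FALSE, TRUE = 0, 1, 2


def lf_mentions_voltage(c):
    """Vote TRUE when the text mentions voltage, otherwise abstain."""
    return TRUE if "voltage" in c.text.lower() else ABSTAIN


def lf_too_short(c):
    """Vote FALSE for very short text, otherwise abstain."""
    return FALSE if len(c.text) < 5 else ABSTAIN


def label_candidate(c, lfs):
    """Return parallel (keys, values) lists, skipping abstentions."""
    keys, values = [], []
    for lf in lfs:
        value = lf(c)
        if value != ABSTAIN:
            keys.append(lf.__name__)  # the key identifies the labeling function
            values.append(value)
    return keys, values


cands = [Candidate(1, "Collector voltage 45 V"), Candidate(2, "n/a")]
labels = {c.id: label_candidate(c, [lf_mentions_voltage, lf_too_short]) for c in cands}
```

The parallel `keys` and `values` lists mirror the two attributes of the same names documented for this class.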

candidate

The Candidate.

candidate_id

The id of the Candidate being annotated.

keys

A list of strings of each Key name.

values

A list of integer values for each Key.

class fonduer.supervision.models.LabelKey(**kwargs)

Bases: fonduer.utils.models.annotation.AnnotationKeyMixin, sqlalchemy.ext.declarative.api.Base

A label’s key that identifies the labeling function.

candidate_classes

The names of the candidate classes that this Key applies to.

name

The name of the Key.

class fonduer.supervision.models.StableLabel(**kwargs)

Bases: sqlalchemy.ext.declarative.api.Base

A special secondary table for preserving labels created by human annotators in a stable format that does not cascade, and is independent of the Candidate IDs.

Note

This is currently unused.

annotator_name

The annotator’s name.

context_stable_ids

Delimited list of the context stable ids.

split

Which split the label belongs to.

Core Objects

These are Fonduer’s core objects used for supervision.

class fonduer.supervision.Labeler(session, candidate_classes, parallelism=1)

Bases: fonduer.utils.udf.UDFRunner

An operator to add Label Annotations to Candidates.

Parameters:
  • session – The database session to use.
  • candidate_classes (list) – A list of Candidate subclasses to label.
  • parallelism (int) – The number of processes to use in parallel. Default 1.
apply(docs=None, split=0, train=False, lfs=None, clear=True, parallelism=None, progress_bar=True)

Label the specified candidates by applying the provided labeling functions (LFs).

Parameters:
  • docs – If provided, apply the LFs to all the candidates in these documents.
  • split (int) – If docs is None, apply the LFs to the candidates in this particular split.
  • train (bool) – Whether or not to update the global key set of labels and the labels of candidates.
  • lfs (list of lists) – A list of lists of labeling functions to apply. Each list should correspond with the candidate_classes used to initialize the Labeler.
  • clear (bool) – Whether or not to clear the labels table before applying these LFs.
  • parallelism (int) – How many processes to use for labeling. If provided, this overrides the parallelism value used to initialize the Labeler.
  • progress_bar (bool) – Whether or not to display a progress bar. The progress bar is measured per document.
Raises:

ValueError – If labeling functions are not provided for each candidate class.
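
The shape of `lfs` described above can be sketched in plain Python. The helper below is a hypothetical illustration of the pairing rule, not Fonduer's implementation: it matches each candidate class with its own list of labeling functions and raises `ValueError` when the two lists do not line up. The class names `PartTemp` and `PartVolt` and the three labeling functions are made up for the example.

```python
def pair_lfs_with_classes(candidate_classes, lfs):
    """Pair each candidate class with its own list of labeling functions."""
    if lfs is None or len(lfs) != len(candidate_classes):
        raise ValueError("Provide one list of labeling functions per candidate class.")
    # Position i in lfs belongs to position i in candidate_classes.
    return dict(zip(candidate_classes, lfs))


# Hypothetical labeling functions that always abstain.
def lf_a(c):
    return 0


def lf_b(c):
    return 0


def lf_c(c):
    return 0


classes = ["PartTemp", "PartVolt"]  # stand-ins for Candidate subclasses
paired = pair_lfs_with_classes(classes, [[lf_a, lf_b], [lf_c]])
```

With a Labeler initialized on two candidate classes, the analogous call would pass `lfs=[[lf_a, lf_b], [lf_c]]`, the inner lists following the same order as `candidate_classes`.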

clear(train, split, lfs=None)

Delete Labels of each class from the database.

Parameters:
  • train (bool) – Whether or not to clear the LabelKeys.
  • split (int) – Which split of candidates to clear labels from.
  • lfs – This parameter is ignored.
clear_all()

Delete all Labels.

drop_keys(keys, candidate_classes=None)

Drop the specified keys from LabelKeys.

Parameters:
  • keys (list, tuple) – A list of labeling functions to delete.
  • candidate_classes (list, tuple) – A list of the candidate classes to drop the keys for. If None, drops the keys for all candidate classes associated with this Labeler.
get_gold_labels(cand_lists, annotator=None)

Load a sparse matrix of GoldLabels for each candidate_class.

Parameters:
  • cand_lists (list of lists of candidates) – The candidates to get gold labels for.
  • annotator (str) – A specific annotator key to get labels for. Default None.
Returns:

A list of MxN sparse matrices, where M is the number of candidates and N is the number of annotators. If annotator is provided, each matrix is Mx1.

Return type:

list[csr_matrix]

get_keys()

Return a list of keys for the Labels.

Returns:

List of LabelKeys.

Return type:

list
get_label_matrices(cand_lists)

Load a sparse matrix of Labels for each candidate_class.

Parameters:
  • cand_lists (list of lists of candidates) – The candidates to get labels for.
Returns:

A list of MxN sparse matrices, where M is the number of candidates and N is the number of labeling functions.

Return type:

list[csr_matrix]
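
To make the MxN layout concrete, here is a small dependency-free sketch that turns per-candidate `(keys, values)` pairs into a matrix with one row per candidate and one column per key. It uses nested lists rather than a SciPy `csr_matrix`, treats 0 as "no label", and the helper name `to_matrix` and the sample data are assumptions for illustration only.

```python
def to_matrix(rows, key_names):
    """Build a dense M x N matrix from per-candidate (keys, values) pairs."""
    col = {name: j for j, name in enumerate(key_names)}
    matrix = [[0] * len(key_names) for _ in rows]
    for i, (keys, values) in enumerate(rows):
        for key, value in zip(keys, values):
            if key in col:  # ignore keys outside the requested columns
                matrix[i][col[key]] = value
    return matrix


# One (keys, values) pair per candidate; the second candidate has no labels.
rows = [(["lf_a"], [2]), ([], []), (["lf_a", "lf_b"], [1, 2])]

full = to_matrix(rows, ["lf_a", "lf_b"])  # 3 candidates x 2 keys
single = to_matrix(rows, ["lf_b"])        # 3 candidates x 1 key
```

Restricting the columns to a single key, as in `single`, gives the Mx1 shape that `get_gold_labels` is documented to return when an `annotator` is specified.
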
update(docs=None, split=0, lfs=None, parallelism=None, progress_bar=True)

Update the labels of the specified candidates based on the provided LFs.

Parameters:
  • docs – If provided, apply the updated LFs to all the candidates in these documents.
  • split – If docs is None, apply the updated LFs to the candidates in this particular split.
  • lfs – A list of lists of labeling functions to update. Each list should correspond with the candidate_classes used to initialize the Labeler.
  • parallelism (int) – How many processes to use for labeling. If provided, this overrides the parallelism value used to initialize the Labeler.
  • progress_bar (bool) – Whether or not to display a progress bar. The progress bar is measured per document.
upsert_keys(keys, candidate_classes=None)

Upsert the specified keys into LabelKeys.

Parameters:
  • keys (list, tuple) – A list of labeling functions to upsert.
  • candidate_classes (list, tuple) – A list of the candidate classes to upsert the keys for. If None, upserts the keys for all candidate classes associated with this Labeler.
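
The upsert and drop semantics can be illustrated with an in-memory stand-in for the key table. This is a sketch under stated assumptions, not Fonduer's implementation: `store` is a hypothetical dict mapping each key name to the set of candidate class names it applies to, a key is removed once no class uses it, and passing `None` to `drop_keys` is taken to mean every class.

```python
def upsert_keys(store, keys, candidate_classes):
    """Add each key if missing and associate it with the given classes."""
    for key in keys:
        store.setdefault(key, set()).update(candidate_classes)


def drop_keys(store, keys, candidate_classes=None):
    """Detach keys from the given classes; remove a key once no class uses it."""
    for key in keys:
        if key not in store:
            continue
        if candidate_classes is None:  # assumed to mean "all classes"
            del store[key]
            continue
        store[key] -= set(candidate_classes)
        if not store[key]:
            del store[key]


store = {}
upsert_keys(store, ["lf_a", "lf_b"], ["PartTemp"])
upsert_keys(store, ["lf_a"], ["PartVolt"])   # existing key gains a class
drop_keys(store, ["lf_a"], ["PartTemp"])     # lf_a stays for PartVolt
drop_keys(store, ["lf_b"])                   # lf_b is removed entirely
```

An upsert never duplicates a key: inserting `lf_a` a second time only extends the set of classes it is associated with.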