Supervision
The fourth stage of Fonduer’s pipeline provides weak supervision, which can be used to generate a large set of training data.
Supervision Model Classes
These are the model classes used for supervision in Fonduer.
Fonduer’s supervision model module.
class fonduer.supervision.models.GoldLabel(**kwargs)
Bases: fonduer.utils.models.annotation.AnnotationMixin, sqlalchemy.orm.decl_api.Base
Gold label class.
A separate class for labels from human annotators or other gold standards.
A simple constructor that allows initialization from kwargs.
Sets attributes on the constructed instance using the names and values in kwargs. Only keys that are present as attributes of the instance’s class are allowed. These could be, for example, any mapped columns or relationships.
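The keyword-only constructor described above can be pictured with the following stand-alone sketch. It is an illustration of the stated rule (only names that exist as class attributes are accepted), not SQLAlchemy’s or Fonduer’s code; the class `KwargsModel` and its two attributes are hypothetical stand-ins for a mapped class.

```python
class KwargsModel:
    """Hypothetical stand-in for a mapped class (not a Fonduer class)."""

    # Class-level attributes play the role of mapped columns in this sketch.
    candidate_id = None
    values = None

    def __init__(self, **kwargs):
        cls = type(self)
        for name, value in kwargs.items():
            # Reject any keyword that is not an attribute of the class.
            if not hasattr(cls, name):
                raise TypeError(
                    f"{name!r} is an invalid keyword argument for {cls.__name__}"
                )
            setattr(self, name, value)


# Known attribute names are accepted and set on the instance.
label = KwargsModel(candidate_id=7, values=[1, 0])
```

Passing a name the class does not define, such as `KwargsModel(unknown=1)`, raises a `TypeError` in this sketch.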
- candidate: Candidate.
- candidate_id: Id of the Candidate being annotated.
- keys: List of strings of each Key name.
- values (sqlalchemy.sql.schema.Column): A list of integer values for each Key.
class fonduer.supervision.models.GoldLabelKey(**kwargs)
Bases: fonduer.utils.models.annotation.AnnotationKeyMixin, sqlalchemy.orm.decl_api.Base
Gold label key class.
A gold label’s key that identifies the annotator of the gold label.
A simple constructor that allows initialization from kwargs.
Sets attributes on the constructed instance using the names and values in kwargs. Only keys that are present as attributes of the instance’s class are allowed. These could be, for example, any mapped columns or relationships.
- candidate_classes: List of strings naming the candidate classes associated with this Key.
- name: Name of the Key.
class fonduer.supervision.models.Label(**kwargs)
Bases: fonduer.utils.models.annotation.AnnotationMixin, sqlalchemy.orm.decl_api.Base
Label class.
A discrete label associated with a Candidate, indicating a target prediction value.
Labels are used to represent the output of labeling functions. A Label’s annotation key identifies the labeling function that provided the Label.
A simple constructor that allows initialization from kwargs.
Sets attributes on the constructed instance using the names and values in kwargs. Only keys that are present as attributes of the instance’s class are allowed. These could be, for example, any mapped columns or relationships.
- candidate: Candidate.
- candidate_id: Id of the Candidate being annotated.
- keys: List of strings of each Key name.
- values (sqlalchemy.sql.schema.Column): A list of integer values for each Key.
class fonduer.supervision.models.LabelKey(**kwargs)
Bases: fonduer.utils.models.annotation.AnnotationKeyMixin, sqlalchemy.orm.decl_api.Base
Label key class.
A label’s key that identifies the labeling function.
A simple constructor that allows initialization from kwargs.
Sets attributes on the constructed instance using the names and values in kwargs. Only keys that are present as attributes of the instance’s class are allowed. These could be, for example, any mapped columns or relationships.
- candidate_classes: List of strings naming the candidate classes associated with this Key.
- name: Name of the Key.
class fonduer.supervision.models.StableLabel(**kwargs)
Bases: sqlalchemy.orm.decl_api.Base
Stable label table.
A special secondary table for preserving labels created by human annotators in a stable format that does not cascade, and is independent of the Candidate IDs.
Note: This is currently unused.
A simple constructor that allows initialization from kwargs.
Sets attributes on the constructed instance using the names and values in kwargs. Only keys that are present as attributes of the instance’s class are allowed. These could be, for example, any mapped columns or relationships.
- annotator_name: The annotator’s name.
- context_stable_ids: Delimited list of the context stable ids.
- split: Which split the label belongs to.
Core Objects
These are Fonduer’s core objects used for supervision.
Fonduer’s supervision module.
class fonduer.supervision.Labeler(session, candidate_classes, parallelism=1)
Bases: fonduer.utils.udf.UDFRunner
An operator to add Label Annotations to Candidates.
- Parameters
session (Session) – The database session to use.
candidate_classes (List[Type[Candidate]]) – A list of candidate subclasses to label.
parallelism (int) – The number of processes to use in parallel. Default 1.
Initialize the Labeler.
apply(docs=None, split=0, train=False, lfs=None, clear=True, parallelism=None, progress_bar=True, table=<class 'fonduer.supervision.models.label.Label'>)
Apply the labels of the specified candidates based on the provided LFs.
- Parameters
docs (Optional[Collection[Document]]) – If provided, apply the LFs to all the candidates in these documents.
split (int) – If docs is None, apply the LFs to the candidates in this particular split.
train (bool) – Whether or not to update the global key set of labels and the labels of candidates.
lfs (Optional[List[List[Callable]]]) – A list of lists of labeling functions to apply. Each inner list should correspond to one of the candidate_classes used to initialize the Labeler, in the same order.
clear (bool) – Whether or not to clear the labels table before applying these LFs.
parallelism (Optional[int]) – How many processes to use in parallel. This will override the parallelism value used to initialize the Labeler if it is provided.
progress_bar (bool) – Whether or not to display a progress bar. The progress bar is measured per document.
table (Table) – A (database) table labels are written to. Takes Label (by default) or GoldLabel.
- Raises
ValueError – If labeling functions are not provided for each candidate class.
- Return type
None
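The shape of the `lfs` argument, one inner list of labeling functions per candidate class, can be sketched in plain Python as follows. Nothing here imports Fonduer: the candidate classes `PartTemp` and `PartVolt`, the labeling functions, the `ABSTAIN = -1` convention, and the helper `check_lfs` are all hypothetical names chosen for illustration, and `check_lfs` only mimics the documented `ValueError` rule.

```python
from dataclasses import dataclass
from typing import Callable, List, Type

ABSTAIN = -1  # assumed abstain value for this sketch
FALSE, TRUE = 0, 1


@dataclass
class PartTemp:
    """Hypothetical candidate: a part number paired with a temperature."""
    part: str
    temp: int


@dataclass
class PartVolt:
    """Hypothetical candidate: a part number paired with a voltage."""
    part: str
    volt: float


def lf_temp_in_range(c: PartTemp) -> int:
    # Vote TRUE for plausible temperatures, otherwise abstain.
    return TRUE if -65 <= c.temp <= 150 else ABSTAIN


def lf_temp_too_low(c: PartTemp) -> int:
    # Vote FALSE for implausibly low temperatures, otherwise abstain.
    return FALSE if c.temp < -100 else ABSTAIN


def lf_volt_positive(c: PartVolt) -> int:
    # This LF never abstains: it votes TRUE or FALSE on every candidate.
    return TRUE if c.volt > 0 else FALSE


def check_lfs(candidate_classes: List[Type], lfs: List[List[Callable]]) -> None:
    """Mimic the documented rule: LFs must be given for each candidate class."""
    if len(lfs) != len(candidate_classes):
        raise ValueError("Provide one list of LFs per candidate class.")


candidate_classes = [PartTemp, PartVolt]
# lfs[i] holds the labeling functions for candidate_classes[i].
lfs = [[lf_temp_in_range, lf_temp_too_low], [lf_volt_positive]]
check_lfs(candidate_classes, lfs)

# One vote per labeling function for a single PartTemp candidate.
votes = [lf(PartTemp("BC546", 150)) for lf in lfs[0]]
```

With a real Labeler constructed for the same candidate classes, a list shaped like `lfs` is what would be passed as the `lfs` argument of `apply`.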
clear(train, split, lfs=None, table=<class 'fonduer.supervision.models.label.Label'>, **kwargs)
Delete Labels of each class from the database.
- Parameters
train (bool) – Whether or not to clear the LabelKeys.
split (int) – Which split of candidates to clear labels from.
lfs (Optional[List[List[Callable]]]) – This parameter is ignored.
table (Table) – A (database) table labels are cleared from. Takes Label (by default) or GoldLabel.
- Return type
None
clear_all(table=<class 'fonduer.supervision.models.label.Label'>)
Delete all Labels.
- Parameters
table (Table) – A (database) table labels are cleared from. Takes Label (by default) or GoldLabel.
- Return type
None
drop_keys(keys, candidate_classes=None)
Drop the specified keys from LabelKeys.
- Parameters
keys (Iterable[Union[str, Callable]]) – A list of labeling functions to delete.
candidate_classes (Union[Type[Candidate], List[Type[Candidate]], None]) – A list of the Candidates to drop the key for. If None, drops the keys for all candidate classes associated with this Labeler.
- Return type
None
get_gold_labels(cand_lists, annotator=None)
Load dense matrix of GoldLabels for each candidate_class.
- Parameters
cand_lists (List[List[Candidate]]) – The candidates to get gold labels for.
annotator (Optional[str]) – A specific annotator key to get labels for. Default None.
- Raises
ValueError – If get_gold_labels is called before gold labels are loaded, the result will contain ABSTAIN values. We raise a ValueError to help indicate this potential mistake to the user.
- Return type
List[ndarray]
- Returns
A list of MxN dense matrices, where M is the number of candidates and N is the number of annotators. If annotator is provided, a list of Mx1 matrices is returned.
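The documented behaviour of the gold-label matrix (one column per annotator, a single column when `annotator` is given, and a `ValueError` when ABSTAIN values appear) can be sketched without Fonduer as below. The function `gold_matrix`, the annotator names, the nested-list representation in place of an ndarray, and `ABSTAIN = -1` are assumptions made for this illustration only.

```python
from typing import Dict, List, Optional

ABSTAIN = -1  # assumed value for a missing gold label in this sketch


def gold_matrix(
    rows: List[Dict[str, int]],
    annotators: List[str],
    annotator: Optional[str] = None,
) -> List[List[int]]:
    """Build an M x N matrix (M candidates, N annotators) as nested lists."""
    columns = [annotator] if annotator is not None else annotators
    matrix = [[row.get(name, ABSTAIN) for name in columns] for row in rows]
    # A missing gold label suggests gold labels were never loaded.
    if any(value == ABSTAIN for line in matrix for value in line):
        raise ValueError("Gold labels contain ABSTAIN; load gold labels first.")
    return matrix


# Two candidates, each labeled by two hypothetical annotators.
gold = [{"alice": 1, "bob": 1}, {"alice": 0, "bob": 1}]
full = gold_matrix(gold, ["alice", "bob"])                       # 2 x 2
only_bob = gold_matrix(gold, ["alice", "bob"], annotator="bob")  # 2 x 1
```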
get_keys()
Return a list of keys for the Labels.
- Return type
List[LabelKey]
- Returns
List of LabelKeys.
get_label_matrices(cand_lists)
Load dense matrix of Labels for each candidate_class.
- Parameters
cand_lists (List[List[Candidate]]) – The candidates to get labels for.
- Return type
List[ndarray]
- Returns
A list of MxN dense matrices, where M is the number of candidates and N is the number of labeling functions.
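To make the MxN layout concrete, the sketch below turns per-candidate `keys` and `values` lists (the two parallel fields a Label carries) into a dense matrix with one row per candidate and one column per labeling function. It is a plain-Python illustration, not Fonduer’s implementation: the helper `to_dense`, the use of nested lists instead of an ndarray, the key names, and filling missing entries with `ABSTAIN = -1` are all assumptions.

```python
from typing import Dict, List

ABSTAIN = -1  # assumed fill value where a labeling function gave no label


def to_dense(rows: List[Dict[str, int]], key_names: List[str]) -> List[List[int]]:
    """Build an M x N matrix: one row per candidate, one column per key."""
    return [[row.get(name, ABSTAIN) for name in key_names] for row in rows]


# Each entry mimics one candidate's Label: parallel `keys` and `values` lists.
stored = [
    {"keys": ["lf_a", "lf_b"], "values": [1, 0]},
    {"keys": ["lf_b"], "values": [1]},
    {"keys": [], "values": []},
]

# Pair each key with its value, then lay the rows out column by column.
rows = [dict(zip(s["keys"], s["values"])) for s in stored]
matrix = to_dense(rows, ["lf_a", "lf_b"])
```

Here `matrix` has three rows (candidates) and two columns (`lf_a`, `lf_b`).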
last_docs: Set[str]
The last set of documents that apply() was called on.
update(docs=None, split=0, lfs=None, parallelism=None, progress_bar=True, table=<class 'fonduer.supervision.models.label.Label'>)
Update the labels of the specified candidates based on the provided LFs.
- Parameters
docs (Optional[Collection[Document]]) – If provided, apply the updated LFs to all the candidates in these documents.
split (int) – If docs is None, apply the updated LFs to the candidates in this particular split.
lfs (Optional[List[List[Callable]]]) – A list of lists of labeling functions to update. Each inner list should correspond to one of the candidate_classes used to initialize the Labeler, in the same order.
parallelism (Optional[int]) – How many processes to use in parallel. This will override the parallelism value used to initialize the Labeler if it is provided.
progress_bar (bool) – Whether or not to display a progress bar. The progress bar is measured per document.
table (Table) – A (database) table labels are written to. Takes Label (by default) or GoldLabel.
- Return type
None
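One plausible reading of the difference between applying with `clear=True` and updating is sketched below with an in-memory dictionary in place of the labels table. This is an assumption-laden illustration rather than Fonduer’s behaviour: `apply_lfs`, `update_lfs`, the integer "candidates", and the two labeling functions are hypothetical, and treating an update as "apply without clearing" is this sketch’s own simplification.

```python
from typing import Callable, Dict, List

Store = Dict[int, Dict[str, int]]  # candidate id -> {LF name: label}


def apply_lfs(
    store: Store,
    candidates: Dict[int, int],
    lfs: List[Callable[[int], int]],
    clear: bool = True,
) -> None:
    """Write one label per (candidate, LF); optionally wipe the store first."""
    if clear:
        store.clear()
    for cand_id, cand in candidates.items():
        row = store.setdefault(cand_id, {})
        for lf in lfs:
            row[lf.__name__] = lf(cand)


def update_lfs(
    store: Store, candidates: Dict[int, int], lfs: List[Callable[[int], int]]
) -> None:
    """Add or overwrite labels for the given LFs, keeping all other labels."""
    apply_lfs(store, candidates, lfs, clear=False)


def lf_positive(value: int) -> int:
    return 1 if value > 0 else 0


def lf_large(value: int) -> int:
    return 1 if value > 4 else 0


candidates = {1: 5, 2: -3}
store: Store = {}
apply_lfs(store, candidates, [lf_positive])   # fresh labels from one LF
update_lfs(store, candidates, [lf_large])     # second LF added, first kept
after_update = {cid: dict(row) for cid, row in store.items()}
apply_lfs(store, candidates, [lf_large])      # clearing drops lf_positive
```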
upsert_keys(keys, candidate_classes=None)
Upsert the specified keys into LabelKeys.
- Parameters
keys (Iterable[Union[str, Callable]]) – A list of labeling functions to upsert.
candidate_classes (Union[Type[Candidate], List[Type[Candidate]], None]) – A list of the Candidates to upsert the key for. If None, upsert the keys for all candidate classes associated with this Labeler.
- Return type
None
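Both `drop_keys` and `upsert_keys` accept keys given either as strings or as callables. The sketch below shows one way such a key table could behave, using a dictionary from key name to a set of candidate class names. It is a hypothetical model, not Fonduer’s code: the helper names carry a `_sketch` suffix to avoid confusion with the real methods, resolving a callable to its `__name__` is an assumption, and so is removing a key once no candidate class refers to it.

```python
from typing import Callable, Dict, Iterable, List, Set, Union

Key = Union[str, Callable]
KeyTable = Dict[str, Set[str]]  # key name -> candidate class names


def key_name(key: Key) -> str:
    """Resolve a key to a name: strings pass through, callables use __name__."""
    return key if isinstance(key, str) else key.__name__


def upsert_keys_sketch(
    table: KeyTable, keys: Iterable[Key], candidate_classes: List[str]
) -> None:
    """Insert each key if missing, then add the candidate classes to it."""
    for key in keys:
        table.setdefault(key_name(key), set()).update(candidate_classes)


def drop_keys_sketch(
    table: KeyTable, keys: Iterable[Key], candidate_classes: List[str]
) -> None:
    """Detach the candidate classes from each key; drop keys left empty."""
    for key in keys:
        name = key_name(key)
        if name not in table:
            continue
        table[name] -= set(candidate_classes)
        if not table[name]:
            del table[name]


def lf_example(candidate) -> int:
    return 1


table: KeyTable = {}
upsert_keys_sketch(table, [lf_example, "lf_other"], ["PartTemp", "PartVolt"])
drop_keys_sketch(table, ["lf_other"], ["PartTemp", "PartVolt"])  # key removed
drop_keys_sketch(table, [lf_example], ["PartVolt"])              # key kept
```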