nlpretext.augmentation module
Bases:
ValueError
-
nlpretext.augmentation.text_augmentation.are_entities_in_augmented_text(entities: list, augmented_text: str) → bool
Given a list of entities, check whether the word associated with each entity is still present in the augmented text.
- Parameters
entities (list) – entities associated with the initial text, in the following format:
    [
        {'entity': str, 'word': str, 'startCharIndex': int, 'endCharIndex': int},
        {…}
    ]
augmented_text (str) – text produced by the augmentation
- Returns
True if all entities are present in the augmented text, False otherwise
- Return type
bool
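The check described above can be illustrated with a minimal sketch. It is not the library's implementation: the helper name `entities_survive` is hypothetical, and the sketch assumes a plain substring test on each entity's 'word' field.

```python
from typing import Dict, List


def entities_survive(entities: List[Dict], augmented_text: str) -> bool:
    """Return True only if every entity's 'word' still occurs in the text."""
    # Assumption: presence is tested with a simple substring match.
    return all(entity["word"] in augmented_text for entity in entities)


entities = [
    {"entity": "CITY", "word": "Paris", "startCharIndex": 10, "endCharIndex": 15},
]
kept = entities_survive(entities, "I reside in Paris")
lost = entities_survive(entities, "I reside in the capital")
```

Here `kept` is True because "Paris" is still in the text, and `lost` is False because the augmentation replaced it.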
-
nlpretext.augmentation.text_augmentation.augment_text(text: str, method: str, stopwords: Optional[List[str]] = None, entities: Optional[list] = None) → Tuple[str, list]
Given a text with or without associated entities, generate a new text by modifying some words of the initial one. The modifications depend on the chosen method (substitution with synonym, addition, deletion). If entities are given as input, they remain unchanged. To keep words other than entities unchanged as well, list them in the stopwords argument.
- Parameters
text (str) – initial text
method ({'wordnet_synonym', 'aug_sub_bert'}) – augmenter to use
stopwords (list, optional) – list of words to freeze throughout the augmentation
entities (list, optional) – entities associated with the text, if any, in the following format:
    [
        {'entity': str, 'word': str, 'startCharIndex': int, 'endCharIndex': int},
        {…}
    ]
- Returns
Augmented text and optional augmented entities
- Return type
Tuple[str, list]
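To make the "frozen words" idea concrete, here is a toy sketch that does not call nlpretext or nlpaug. The synonym table `TOY_SYNONYMS` and the function `toy_augment` are hypothetical stand-ins for a real augmenter. The sketch only shows how stopwords and entity words can be excluded from substitution, and it returns the text alone rather than the documented tuple.

```python
from typing import Dict, List, Optional

# Hypothetical synonym table standing in for a real augmenter such as WordNet.
TOY_SYNONYMS: Dict[str, str] = {"big": "large", "quick": "fast", "house": "home"}


def toy_augment(text: str, stopwords: Optional[List[str]] = None,
                entities: Optional[List[dict]] = None) -> str:
    """Swap words for synonyms, leaving stopwords and entity words untouched."""
    frozen = set(stopwords or [])
    for entity in entities or []:
        # Every token of an entity word is frozen.
        frozen.update(entity["word"].split())
    return " ".join(
        word if word in frozen else TOY_SYNONYMS.get(word, word)
        for word in text.split()
    )


text = "the big house in Paris"
entities = [{"entity": "CITY", "word": "Paris", "startCharIndex": 17, "endCharIndex": 22}]
augmented = toy_augment(text, stopwords=["house"], entities=entities)
```

With "house" passed as a stopword, only "big" is substituted; without it, "house" would be replaced as well.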
-
nlpretext.augmentation.text_augmentation.check_interval_included(element1: dict, element2: dict) → Optional[Tuple[dict, dict]]
Compare the start and end positions of two entities to determine whether they are nested.
- Parameters
element1 (dict) –
element2 (dict) – both elements must be in the following format:
    {'entity': str, 'word': str, 'startCharIndex': int, 'endCharIndex': int}
- Returns
If one of the two entities should be removed, a tuple (element to remove, element to keep); otherwise None
- Return type
Optional[Tuple[dict, dict]]
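A minimal sketch of such a nesting test follows. It is an illustration under stated assumptions, not the library's code: the name `nested_pair` is hypothetical, only full containment of one span in the other is treated as nesting, and partially overlapping spans return None.

```python
from typing import Optional, Tuple


def nested_pair(element1: dict, element2: dict) -> Optional[Tuple[dict, dict]]:
    """Return (element to remove, element to keep) if one span contains the other."""
    def contains(outer: dict, inner: dict) -> bool:
        return (outer["startCharIndex"] <= inner["startCharIndex"]
                and inner["endCharIndex"] <= outer["endCharIndex"])

    if element1 == element2:
        return None  # identical entities are not considered nested
    if contains(element2, element1):
        return element1, element2
    if contains(element1, element2):
        return element2, element1
    return None


long_entity = {"entity": "LOC", "word": "New York City", "startCharIndex": 0, "endCharIndex": 13}
short_entity = {"entity": "LOC", "word": "York", "startCharIndex": 4, "endCharIndex": 8}
other = {"entity": "LOC", "word": "London", "startCharIndex": 18, "endCharIndex": 24}
result = nested_pair(long_entity, short_entity)
```

In this example `result` is `(short_entity, long_entity)`: the shorter span is flagged for removal regardless of argument order.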
-
nlpretext.augmentation.text_augmentation.clean_sentence_entities(text: str, entities: list) → list
Check entities pairwise to remove nested entities; the longest entity is kept.
- Parameters
text (str) – augmented text
entities (list) – entities associated with the augmented text, in the following format:
    [
        {'entity': str, 'word': str, 'startCharIndex': int, 'endCharIndex': int},
        {…}
    ]
- Returns
Cleaned entities
- Return type
list
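The "longest entity is kept" rule can be sketched as below. This is one possible way to obtain that result, not necessarily how the library does it: the name `drop_nested` is hypothetical, the sketch sorts by span length instead of comparing pairs, and it does not need the text argument.

```python
from typing import List


def drop_nested(entities: List[dict]) -> List[dict]:
    """Keep only entities whose span is not contained in an already kept span."""
    def length(entity: dict) -> int:
        return entity["endCharIndex"] - entity["startCharIndex"]

    kept: List[dict] = []
    # Visit the longest spans first so a containing entity is always kept first.
    for entity in sorted(entities, key=length, reverse=True):
        covered = any(
            k["startCharIndex"] <= entity["startCharIndex"]
            and entity["endCharIndex"] <= k["endCharIndex"]
            for k in kept
        )
        if not covered:
            kept.append(entity)
    # Return the survivors in reading order.
    return sorted(kept, key=lambda e: e["startCharIndex"])


text = "I left New York City for London"
entities = [
    {"entity": "LOC", "word": "York", "startCharIndex": 11, "endCharIndex": 15},
    {"entity": "LOC", "word": "New York City", "startCharIndex": 7, "endCharIndex": 20},
    {"entity": "LOC", "word": "London", "startCharIndex": 25, "endCharIndex": 31},
]
cleaned = drop_nested(entities)
```

"York" lies inside "New York City", so only the longer entity and "London" remain in `cleaned`.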
-
nlpretext.augmentation.text_augmentation.get_augmented_entities(sentence_augmented: str, entities: list) → list
Get entities with updated positions (start and end) in the augmented text.
- Parameters
sentence_augmented (str) – augmented text
entities (list) – entities associated with the initial text, in the following format:
    [
        {'entity': str, 'word': str, 'startCharIndex': int, 'endCharIndex': int},
        {…}
    ]
- Returns
Entities with updated positions relative to the augmented text
- Return type
list
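Updating positions could look like the following sketch. It is illustrative only: the name `relocate_entities` is hypothetical, and the sketch assumes each entity word is located by its first occurrence in the augmented text, which may not match the library's behaviour for repeated words.

```python
from typing import List


def relocate_entities(sentence_augmented: str, entities: List[dict]) -> List[dict]:
    """Recompute start/end offsets of each entity word in the augmented text."""
    relocated = []
    for entity in entities:
        start = sentence_augmented.find(entity["word"])
        if start == -1:
            continue  # the entity word is no longer present, so skip it
        relocated.append({
            "entity": entity["entity"],
            "word": entity["word"],
            "startCharIndex": start,
            "endCharIndex": start + len(entity["word"]),
        })
    return relocated


# Offsets 10-15 refer to the initial text "I live in Paris".
entities = [{"entity": "CITY", "word": "Paris", "startCharIndex": 10, "endCharIndex": 15}]
augmented = "I reside in Paris"
updated = relocate_entities(augmented, entities)
```

After augmentation "Paris" starts two characters later, so `updated` carries offsets 12 and 17 instead of 10 and 15.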
-
nlpretext.augmentation.text_augmentation.get_augmenter(method: str, stopwords: Optional[List[str]] = None) → nlpaug.augmenter.word.synonym.SynonymAug
Initialize an augmenter depending on the given method.
- Parameters
method (str) – augmenter to use; supported methods are 'wordnet_synonym' and 'aug_sub_bert'
stopwords (list) – list of words to freeze throughout the augmentation
- Returns
Initialized nlpaug augmenter
- Return type
nlpaug.augmenter.word.synonym.SynonymAug
-
nlpretext.augmentation.text_augmentation.process_entities_and_text(entities: list, text: str, augmented_text: str)
Given a list of initial entities, verify that they have not been altered by the data augmentation operation and are still present in the augmented text.
- Parameters
entities (list) – entities associated with the text, in the following format:
    [
        {'entity': str, 'word': str, 'startCharIndex': int, 'endCharIndex': int},
        {…}
    ]
text (str) – initial text
augmented_text (str) – new text resulting from the data augmentation operation
- Returns
Augmented text and entities with their updated positions in the augmented text
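The verify-then-update flow described for this function can be sketched end to end. Everything named here is hypothetical: `verify_and_relocate` is not the library function, and `EntitiesAlteredError` is an illustrative ValueError subclass. The page does not state how an altered entity is reported, so raising an error is an assumption.

```python
from typing import List, Tuple


class EntitiesAlteredError(ValueError):
    """Hypothetical error for augmentations that modified an entity word."""


def verify_and_relocate(entities: List[dict], text: str,
                        augmented_text: str) -> Tuple[str, List[dict]]:
    """Check that entity words survived, then recompute their offsets."""
    updated = []
    for entity in entities:
        # Read the word from the initial text using the original offsets.
        word = text[entity["startCharIndex"]:entity["endCharIndex"]]
        start = augmented_text.find(word)
        if start == -1:
            raise EntitiesAlteredError(f"entity {word!r} was altered")
        updated.append({**entity, "word": word, "startCharIndex": start,
                        "endCharIndex": start + len(word)})
    return augmented_text, updated


text = "I live in Paris"
entities = [{"entity": "CITY", "word": "Paris", "startCharIndex": 10, "endCharIndex": 15}]
new_text, new_entities = verify_and_relocate(entities, text, "I reside in Paris")
```

If the augmented text no longer contains "Paris", the sketch raises instead of returning entities with stale offsets.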