nlpretext.social module

nlpretext.social.preprocess.convert_emoji_to_text(text, code_delimiters=(':', ':')) → str

Convert emoji to their CLDR short names, following the Unicode convention (http://www.unicode.org/emoji/charts/full-emoji-list.html), e.g. 😀 –> :grinning_face:

Parameters
  • text (str) –

  • code_delimiters (tuple) – symbols placed around the emoji code, e.g. (':', ':') –> :grinning_face:

Returns

text in which each emoji is replaced by its delimited short name

Return type

str
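
To illustrate what the code_delimiters pair does, here is a minimal standard-library sketch. It is not the library's implementation: emoji_to_text_sketch is a hypothetical name, and Unicode character names are used as a stand-in for CLDR short names (the two agree for 😀 but can differ for other emoji).

```python
import unicodedata


def emoji_to_text_sketch(text, code_delimiters=(":", ":")):
    """Hypothetical stand-in: wrap each symbol's Unicode name in delimiters."""
    opening, closing = code_delimiters
    pieces = []
    for ch in text:
        # Assumption: category "So" (Symbol, other) approximates "is an emoji".
        # It also matches non-emoji symbols and misses multi-code-point emoji.
        name = unicodedata.name(ch, "") if unicodedata.category(ch) == "So" else ""
        if name:
            pieces.append(opening + name.lower().replace(" ", "_") + closing)
        else:
            pieces.append(ch)
    return "".join(pieces)


# "\U0001F600" is the grinning face emoji.
print(emoji_to_text_sketch("hi \U0001F600"))              # hi :grinning_face:
print(emoji_to_text_sketch("hi \U0001F600", ("[", "]")))  # hi [grinning_face]
```

The first element of the tuple is placed before the name and the second after it, so asymmetric pairs such as ('[', ']') work the same way as the default.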

nlpretext.social.preprocess.extract_emojis(text) → list

Extract the emojis from a text and convert them to their text codes, e.g. “I take care of my skin 😀 :(” –> [“:grinning_face:”]

Parameters

text (str) –

Returns

list of all emojis found in the text, converted to their Unicode short names

Return type

list

nlpretext.social.preprocess.extract_hashtags(text) → list

Extract the words preceded by a ‘#’, e.g. “I take care of my skin #selfcare#selfestim” –> [“selfcare”, “selfestim”]

Parameters

text (str) –

Returns

list of all hashtags

Return type

list
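
A regular-expression sketch of this kind of extraction is shown below. The pattern and the name extract_hashtags_sketch are assumptions made for illustration, not the library's own code. The sketch follows the documented example in returning tag names without the leading ‘#’; whether the library itself keeps the prefix is not confirmed here.

```python
import re

# Assumed pattern: '#' followed by word characters; the group drops the '#'.
HASHTAG_SKETCH = re.compile(r"#(\w+)")


def extract_hashtags_sketch(text):
    """Return the tag names in order of appearance (hypothetical helper)."""
    return HASHTAG_SKETCH.findall(text)


tags = extract_hashtags_sketch("I take care of my skin #selfcare#selfestim")
print(tags)  # ['selfcare', 'selfestim']
```

Because ‘#’ is not a word character, two hashtags written back to back are still split into separate entries.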

nlpretext.social.preprocess.extract_mentions(text) → list

Extract the words preceded by a ‘@’, e.g. “I take care of my skin with @thisproduct” –> [“@thisproduct”]

Parameters

text (str) –

Returns

list of all mentions

Return type

list
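
The documented example keeps the ‘@’ prefix in each result. A minimal sketch that reproduces it, using an assumed pattern and the hypothetical name extract_mentions_sketch, could look like this:

```python
import re

# Assumed pattern: '@' followed by word characters, prefix included in the match.
MENTION_SKETCH = re.compile(r"@\w+")


def extract_mentions_sketch(text):
    """Return every '@word' token in order of appearance (hypothetical helper)."""
    return MENTION_SKETCH.findall(text)


mentions = extract_mentions_sketch("I take care of my skin with @thisproduct")
print(mentions)  # ['@thisproduct']
```

A pattern this simple also matches the domain part of an e-mail address (for example “@example” in “me@example.com”), so real inputs may need a stricter rule.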

nlpretext.social.preprocess.remove_emoji(text) → str

Remove emoji from a string by stripping every character that falls in the emoji ranges defined by the Unicode convention: http://www.unicode.org/emoji/charts/full-emoji-list.html

Parameters

text (str) –

Returns

text without emoji

Return type

str

nlpretext.social.preprocess.remove_hashtag(text) → str

Remove the words preceded by a ‘#’, e.g. “I take care of my skin #selfcare#selfestim” –> “I take care of my skin”

Parameters

text (str) –

Returns

text of a post without hashtags

Return type

str
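
The documented example also drops the trailing space left behind by the removed tags. The sketch below reproduces that result under two assumptions: a simple ‘#’ plus word-characters pattern, and whitespace normalisation after the substitution. remove_hashtags_sketch is a hypothetical name, not the library's function.

```python
import re


def remove_hashtags_sketch(text):
    """Delete '#word' tokens, then collapse leftover whitespace (hypothetical)."""
    without_tags = re.sub(r"#\w+", "", text)
    # split() with no arguments drops leading, trailing and repeated whitespace.
    return " ".join(without_tags.split())


cleaned = remove_hashtags_sketch("I take care of my skin #selfcare#selfestim")
print(cleaned)  # I take care of my skin
```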

nlpretext.social.preprocess.remove_html_tags(text) → str

Remove HTML tags, i.e. the text enclosed between < and >.

Parameters

text (str) –

Returns

text without HTML tags

Return type

str
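
Removing everything between < and > can be sketched with a single substitution. The pattern and the name remove_html_tags_sketch are assumptions made for illustration; the library may handle whitespace or malformed markup differently.

```python
import re

# Assumed pattern: '<', any run of characters other than '>', then '>'.
TAG_SKETCH = re.compile(r"<[^>]*>")


def remove_html_tags_sketch(text):
    """Strip tag-like spans and keep the text between them (hypothetical)."""
    return TAG_SKETCH.sub("", text)


print(remove_html_tags_sketch("<p>Hello <b>world</b></p>"))  # Hello world
```

A regular expression is not an HTML parser: text such as “a < b and c > d” would lose the span between the two comparison signs, so this approach suits short social-media snippets rather than full documents.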

nlpretext.social.preprocess.remove_mentions(text) → str

Remove the words preceded by a ‘@’.

Parameters

text (str) –

Returns

text without mentions

Return type

str