diggrtoolbox.standardize package

Submodules

diggrtoolbox.standardize.standardize module

diggrtoolbox.standardize.standardize.remove_bracketed_text(s)

Removes text in brackets from string :s:.
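
The docstring does not say which bracket types are covered or how leftover whitespace is handled. The following is a minimal sketch of the idea, not the diggrtoolbox implementation. It assumes that round and square brackets are meant, that groups are not nested, and that surplus whitespace is collapsed afterwards. The name `remove_bracketed_text_sketch` is hypothetical.

```python
import re

# Hypothetical stand-in, NOT the diggrtoolbox implementation.
# Assumption: "brackets" means non-nested (...) and [...] groups.
_BRACKETED = re.compile(r"\([^()]*\)|\[[^\[\]]*\]")


def remove_bracketed_text_sketch(s):
    """Drop every (...) and [...] group, then collapse leftover whitespace."""
    without_groups = _BRACKETED.sub(" ", s)
    # Assumption: whitespace left behind by a removed group is collapsed.
    return " ".join(without_groups.split())


print(remove_bracketed_text_sketch("Super Mario Bros. (Japan) [Rev A]"))
```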

diggrtoolbox.standardize.standardize.remove_html(s)

Removes HTML tags from string :s:.
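
One way to strip tags with the standard library alone is to feed the string to `html.parser.HTMLParser` and keep only the text nodes. This is a sketch under that assumption. The library may well use a regular expression or another parser instead, and `remove_html_sketch` is a hypothetical name.

```python
from html.parser import HTMLParser


class _TextOnly(HTMLParser):
    """Collects text nodes and ignores every tag."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def remove_html_sketch(s):
    """Hypothetical stand-in, NOT the diggrtoolbox implementation."""
    parser = _TextOnly()
    parser.feed(s)
    parser.close()  # flush any text still buffered by the parser
    return "".join(parser.parts)


print(remove_html_sketch("<p>Final <b>Fantasy</b> VII</p>"))
```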

diggrtoolbox.standardize.standardize.remove_punctuation(s)

Removes punctuation from string :s:.
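
The docstring leaves open which characters count as punctuation and whether they are deleted or replaced. A minimal sketch, assuming the ASCII set in `string.punctuation` and plain deletion, could look like this. `remove_punctuation_sketch` is a hypothetical name and the real function may behave differently, for example on Unicode punctuation.

```python
import string

# Translation table that deletes every ASCII punctuation character.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def remove_punctuation_sketch(s):
    """Hypothetical stand-in, NOT the diggrtoolbox implementation."""
    # Assumption: punctuation is deleted, not replaced by a space.
    return s.translate(_PUNCT_TABLE)


print(remove_punctuation_sketch("Pac-Man: Championship Edition"))
```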

diggrtoolbox.standardize.standardize.std(s, lower=True, rm_punct=True, rm_bracket=True, rm_spaces=False, rm_strings=None)

Combined string standardization function.

:lower: convert to lower case
:rm_punct: remove punctuation
:rm_bracket: remove brackets () []
:rm_spaces: remove white space
:rm_strings: list of substrings to be removed from the string before comparison
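
To illustrate how the flags might combine, here is a self-contained sketch that mirrors the documented signature. The order of the steps, the handling of bracketed text, and the whitespace collapsing are all assumptions, since the docstring does not specify them. `std_sketch` is a hypothetical name and does not call into diggrtoolbox.

```python
import re
import string


def std_sketch(s, lower=True, rm_punct=True, rm_bracket=True,
               rm_spaces=False, rm_strings=None):
    """Hypothetical stand-in, NOT the diggrtoolbox implementation."""
    # Assumed order: substrings, brackets, case, punctuation, whitespace.
    for sub in rm_strings or []:
        s = s.replace(sub, "")  # case-sensitive, runs before lowercasing
    if rm_bracket:
        # Assumption: the bracketed text goes, not only the bracket characters.
        s = re.sub(r"\([^()]*\)|\[[^\[\]]*\]", " ", s)
    if lower:
        s = s.lower()
    if rm_punct:
        s = s.translate(str.maketrans("", "", string.punctuation))
    if rm_spaces:
        s = re.sub(r"\s+", "", s)
    else:
        # Assumption: runs of whitespace collapse to a single space.
        s = " ".join(s.split())
    return s


print(std_sketch("The Legend of Zelda: Ocarina of Time (USA)"))
print(std_sketch("Tetris (Rev 1)", rm_spaces=True))
```

Normalizing both sides of a comparison with the same flags is the usual reason for such a function: two title variants that differ only in case, punctuation, or a bracketed suffix then compare equal.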

diggrtoolbox.standardize.standardize.std_url(url)

Standardizes URLs by removing the protocol and the final slash.
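
The two steps named in the docstring can be sketched as follows. This assumes that "protocol" means a leading scheme such as `http://` or `https://` and that at most one trailing slash is removed. `std_url_sketch` is a hypothetical name, not the library function.

```python
import re

# Matches a leading URL scheme such as "http://" or "https://".
_SCHEME = re.compile(r"^[a-zA-Z][a-zA-Z0-9+.-]*://")


def std_url_sketch(url):
    """Hypothetical stand-in, NOT the diggrtoolbox implementation."""
    without_scheme = _SCHEME.sub("", url.strip())
    # Assumption: only a single final slash is dropped.
    if without_scheme.endswith("/"):
        without_scheme = without_scheme[:-1]
    return without_scheme


print(std_url_sketch("https://example.com/games/"))
```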

Module contents

diggrtoolbox.standardize.remove_html(s)

Removes HTML tags from string :s:.

diggrtoolbox.standardize.remove_bracketed_text(s)

Removes text in brackets from string :s:.

diggrtoolbox.standardize.remove_punctuation(s)

Removes punctuation from string :s:.

diggrtoolbox.standardize.std_url(url)

Standardizes URLs by removing the protocol and the final slash.

diggrtoolbox.standardize.std(s, lower=True, rm_punct=True, rm_bracket=True, rm_spaces=False, rm_strings=None)

Combined string standardization function.

:lower: convert to lower case
:rm_punct: remove punctuation
:rm_bracket: remove brackets () []
:rm_spaces: remove white space
:rm_strings: list of substrings to be removed from the string before comparison