diggrtoolbox.linking package

Submodules

diggrtoolbox.linking.config module

diggrtoolbox.linking.helpers module

The diggrlink helpers module contains helper functions used for dataset linking.

diggrtoolbox.linking.helpers.extract_all_numbers(a)

Returns all numbers (roman and arabic) found in string :a:.
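The idea can be sketched as follows. This is a minimal illustration, not the diggrtoolbox source: the name `extract_all_numbers_sketch`, the two regular expressions, and the choice to return a list of strings are all assumptions.

```python
import re

# Hypothetical sketch; the real extract_all_numbers may behave differently.
# Roman numerals are validated with the classic "strict" pattern (I to MMMCMXCIX).
ROMAN_RE = re.compile(r"M{0,3}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})")
# Split the input into runs of digits and runs of ASCII letters.
TOKEN_RE = re.compile(r"\d+|[A-Za-z]+")


def extract_all_numbers_sketch(a):
    """Return every arabic or upper-case roman number in string a, in order."""
    numbers = []
    for token in TOKEN_RE.findall(a):
        # Tokens are never empty, so fullmatch cannot succeed on "".
        if token.isdigit() or ROMAN_RE.fullmatch(token):
            numbers.append(token)
    return numbers


print(extract_all_numbers_sketch("Final Fantasy VII 2"))  # ['VII', '2']
```

A known limitation of this sketch is that ordinary words made only of roman numeral letters (for example "MIX" or the pronoun "I") are reported as numbers.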

diggrtoolbox.linking.helpers.load_excluded_titles()

Loads the list of excluded titles from the resource file.

diggrtoolbox.linking.helpers.load_series()

Loads the list of series to remove from titles.

diggrtoolbox.linking.helpers.remove_numbers(a)

Removes all numbers (arabic and roman) from string :a:.
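A possible way to do this is shown below. It is a hedged sketch rather than the library's implementation: the name `remove_numbers_sketch`, the patterns, and the whitespace collapsing at the end are assumptions.

```python
import re

# Hypothetical sketch; the real remove_numbers may behave differently.
ROMAN_RE = re.compile(r"M{0,3}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})")
TOKEN_RE = re.compile(r"\d+|[A-Za-z]+")


def remove_numbers_sketch(a):
    """Drop arabic and upper-case roman numbers from string a."""

    def drop_if_number(match):
        token = match.group(0)
        is_number = token.isdigit() or ROMAN_RE.fullmatch(token) is not None
        return "" if is_number else token

    stripped = TOKEN_RE.sub(drop_if_number, a)
    # Assumption: collapse the gaps left behind by the removed tokens.
    return " ".join(stripped.split())


print(remove_numbers_sketch("Final Fantasy VII"))  # Final Fantasy
```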

diggrtoolbox.linking.helpers.remove_tm(a)

Removes trademark symbols from string :a:
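As an illustration, trademark stripping can be done with a translation table. The set of symbols below (™, ® and ℠) and the name `remove_tm_sketch` are assumptions; the documentation does not say which symbols the library function handles.

```python
# Hypothetical sketch; the symbols covered by the real remove_tm are not documented.
TRADEMARK_SYMBOLS = "\u2122\u00ae\u2120"  # trade mark, registered, service mark

# Mapping a code point to None tells str.translate to delete that character.
_DELETE_TABLE = {ord(symbol): None for symbol in TRADEMARK_SYMBOLS}


def remove_tm_sketch(a):
    """Return string a without trademark-style symbols."""
    return a.translate(_DELETE_TABLE)


print(remove_tm_sketch("Tetris\u2122"))  # Tetris
```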

diggrtoolbox.linking.helpers.std(a)

Standardizes string :a: (removes punctuation, blanks, and macrons; converts the string to lower case).
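The four steps named in the description could look roughly like this. The sketch assumes that "punctuation" means ASCII punctuation, that "blanks" means all whitespace, and that macrons are removed by dropping the combining macron (U+0304); the name `std_sketch` is hypothetical.

```python
import string
import unicodedata

COMBINING_MACRON = "\u0304"


def std_sketch(a):
    """Hypothetical standardization: no punctuation, blanks or macrons, lower case."""
    # Decompose so that a letter with a macron becomes letter + U+0304.
    decomposed = unicodedata.normalize("NFD", a)
    without_macrons = decomposed.replace(COMBINING_MACRON, "")
    # Recompose so that other accents (for example the acute in "e") are kept intact.
    recomposed = unicodedata.normalize("NFC", without_macrons)
    kept = [
        ch for ch in recomposed
        if ch not in string.punctuation and not ch.isspace()
    ]
    return "".join(kept).lower()


print(std_sketch("\u014ckami: HD"))  # okamihd
```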

diggrtoolbox.linking.helpers.word_before_after(a, sep)

Returns the word before and the word after :sep: in string :a:.
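A minimal sketch of this lookup, assuming that only the first occurrence of the separator counts and that missing words are reported as `None` (neither detail is confirmed by the documentation; `word_before_after_sketch` is a hypothetical name):

```python
def word_before_after_sketch(a, sep):
    """Return (word before sep, word after sep) for the first sep in string a."""
    before, found, after = a.partition(sep)
    if not found:
        # Assumption: signal a missing separator with a pair of None values.
        return None, None
    before_words = before.split()
    after_words = after.split()
    word_before = before_words[-1] if before_words else None
    word_after = after_words[0] if after_words else None
    return word_before, word_after


print(word_before_after_sketch("Zelda II: The Adventure of Link", ":"))
# ('II', 'The')
```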

diggrtoolbox.linking.rules module

This module contains general matching rules.

diggrtoolbox.linking.rules.first_letter_rule(a, b)

Checks whether the first letters of strings :a: and :b: match; the rule applies when each string contains at most one word.
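One reading of this rule is sketched below. The return convention (`True`/`False` when the rule applies, `None` when it does not), the case-insensitive comparison, and the name `first_letter_rule_sketch` are all assumptions; the documentation does not state what the library function returns.

```python
def first_letter_rule_sketch(a, b):
    """Compare first letters of a and b if both are single-word strings."""
    words_a, words_b = a.split(), b.split()
    # Assumption: multi-word and empty strings are outside the rule's scope.
    if len(words_a) != 1 or len(words_b) != 1:
        return None
    return words_a[0][0].lower() == words_b[0][0].lower()


print(first_letter_rule_sketch("Tetris", "tetris"))      # True
print(first_letter_rule_sketch("Tetris", "Pong"))        # False
print(first_letter_rule_sketch("Super Mario", "Sonic"))  # None
```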

diggrtoolbox.linking.rules.numbering_rule(a, b)

Checks two strings for a number at the end, or in between followed by a colon. If a number is found in both strings and the numbers do not match, returns a penalty value.
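The rule can be illustrated with a simplified sketch. It handles arabic numbers only, and the penalty value, the neutral return value of `0.0`, and the name `numbering_rule_sketch` are hypothetical; the actual values used by the library are not documented here.

```python
import re

# A number directly before a colon, or at the very end of the string.
NUMBER_RE = re.compile(r"\b(\d+)\s*(?::|$)")

DEFAULT_PENALTY = -1.0  # hypothetical value, not taken from diggrtoolbox


def numbering_rule_sketch(a, b, penalty=DEFAULT_PENALTY):
    """Return penalty if both strings carry a number and the numbers differ."""
    match_a = NUMBER_RE.search(a)
    match_b = NUMBER_RE.search(b)
    if match_a and match_b and int(match_a.group(1)) != int(match_b.group(1)):
        return penalty
    # Assumption: no penalty is expressed as 0.0.
    return 0.0


print(numbering_rule_sketch("Street Fighter 2: Turbo", "Street Fighter 3"))  # -1.0
print(numbering_rule_sketch("Tekken 3", "Tekken 3"))                          # 0.0
```

Returning a numeric penalty instead of a boolean lets a caller add the result to a similarity score, which is one plausible way such rules are combined.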

Module contents