8000 GitHub - opensanctions/rigour: Data cleaning and validation functions for names, languages, identifiers, etc.
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

Data cleaning and validation functions for names, languages, identifiers, etc.

License

Notifications You must be signed in to change notification settings

opensanctions/rigour

Repository files navigation

rigour

Data cleaning and validation functions for processing various types of text emanating and describing the business world. This applies to human and company names, language, territory and country codes, corporate and tax identifiers, etc.

The underlying idea is that handling these sorts of descriptors is easy on first glance, but reveals a dizzying set of complexity when carried into production. This is why rigour consolidates implementations that have already met some edge cases and are well-tested.

Installing rigour

You can grab the latest release from PyPI:

pip install -U rigour

Usage & documentation

See: https://opensanctions.github.io/rigour/

Acknowledgements

The address formatting database contained in rigour/data/addresses/formats.yml is derived from worldwide.yml in the OpenCageData address-formatting repository. It is used to format addresses according to customs in the country that is been encoded.

rigour consolidates and includes a set of older Python libraries into a single codebase: languagecodes, pantomime, fingerprints. The development of these libraries was funded by OCCRP as part of the Aleph software project.

License

MIT. See LICENSE.

About

Data cleaning and validation functions for names, languages, identifiers, etc.

Resources

License

Stars

Watchers

Forks

Packages

No packages published
0