shamork/TwinFinder: fuzzy data matching
forked from JohnnyBravo75/TwinFinder


A library for fuzzy string matching. It can be used to find duplicates ("doublets") or similar patterns in strings. It was inspired by the original Java SimMetrics library and dates from around 2010, before a C# port existed, but I never released it until now.

string metrics/distances

  • Levenshtein
  • DamerauLevenshtein
  • Jaccard
  • ExtendedJaccard
  • JaroWinkler
  • DiceCoefficent
  • Editex
  • ExtendedEditex
  • LongestCommonSubsequence
  • MongeElkan
  • NGramDistance
  • SmithWaterman
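
To give a feel for what a string distance in this list computes, here is a minimal standalone sketch of the classic Levenshtein edit distance, with a normalized similarity on top. The class and method names (`LevenshteinSketch`, `Distance`, `Similarity`) are hypothetical and chosen for illustration; they are not TwinFinder's API, and the library's own implementation may differ in signature and normalization.

```csharp
using System;

// Hypothetical helper for illustration only; not part of TwinFinder's API.
public static class LevenshteinSketch
{
    // Minimum number of single-character insertions, deletions,
    // and substitutions needed to turn a into b.
    public static int Distance(string a, string b)
    {
        // Only two rows of the dynamic-programming table are kept.
        var previous = new int[b.Length + 1];
        var current = new int[b.Length + 1];
        for (int j = 0; j <= b.Length; j++) previous[j] = j;

        for (int i = 1; i <= a.Length; i++)
        {
            current[0] = i;
            for (int j = 1; j <= b.Length; j++)
            {
                int cost = a[i - 1] == b[j - 1] ? 0 : 1;
                current[j] = Math.Min(
                    Math.Min(current[j - 1] + 1, previous[j] + 1),
                    previous[j - 1] + cost);
            }
            // Swap rows so the row just filled becomes "previous".
            (previous, current) = (current, previous);
        }
        return previous[b.Length];
    }

    // Assumed normalization: 1.0 for identical strings, 0.0 when every
    // position of the longer string would have to be edited.
    public static double Similarity(string a, string b)
    {
        int longest = Math.Max(a.Length, b.Length);
        return longest == 0 ? 1.0 : 1.0 - (double)Distance(a, b) / longest;
    }
}
```

For example, `LevenshteinSketch.Distance("kitten", "sitting")` is 3 (two substitutions and one insertion), so the similarity under this normalization is 1 - 3/7.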

phonetic codecs

  • Soundex
  • DoubleMetaphone
  • Phonix
  • EditexKey
  • SimpleTextKey
  • DaitchMokotoff (seems to be buggy)
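
A phonetic codec maps words that sound alike to the same key, so the keys can be compared instead of the raw strings. The sketch below implements the common American Soundex rules as a standalone example. `SoundexSketch` and its members are hypothetical names, not TwinFinder's API, and the library's Soundex may follow a different variant.

```csharp
using System;
using System.Text;

// Hypothetical helper for illustration only; not part of TwinFinder's API.
public static class SoundexSketch
{
    // Maps a letter to its Soundex digit; '0' means "not coded"
    // (vowels, Y, H, W).
    private static char DigitFor(char c)
    {
        switch (c)
        {
            case 'B': case 'F': case 'P': case 'V': return '1';
            case 'C': case 'G': case 'J': case 'K':
            case 'Q': case 'S': case 'X': case 'Z': return '2';
            case 'D': case 'T': return '3';
            case 'L': return '4';
            case 'M': case 'N': return '5';
            case 'R': return '6';
            default: return '0';
        }
    }

    // Returns a four-character key: first letter plus three digits.
    public static string Encode(string word)
    {
        // Keep only the letters A-Z, upper-cased.
        var letters = new StringBuilder();
        foreach (char c in word.ToUpperInvariant())
            if (c >= 'A' && c <= 'Z') letters.Append(c);
        if (letters.Length == 0) return string.Empty;

        var code = new StringBuilder();
        code.Append(letters[0]);
        char lastDigit = DigitFor(letters[0]);

        for (int i = 1; i < letters.Length && code.Length < 4; i++)
        {
            char c = letters[i];
            // H and W are skipped and do not separate equal digits.
            if (c == 'H' || c == 'W') continue;

            char digit = DigitFor(c);
            // A vowel is not coded, but it separates equal digits.
            if (digit != '0' && digit != lastDigit) code.Append(digit);
            lastDigit = digit;
        }
        return code.ToString().PadRight(4, '0');
    }
}
```

With these rules, `SoundexSketch.Encode("Robert")` and `SoundexSketch.Encode("Rupert")` both give `R163`, so the two names would be treated as phonetic matches.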

string tokenizer

  • NGramTokenizer
  • FirstNCharsTokenizer
  • WhiteSpaceTokenizer
  • WordTokenizer
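
Tokenizers feed the set-based metrics: a string is split into tokens, and the token sets are then compared. As a rough illustration, the following standalone sketch splits strings into character n-grams and scores two strings with the Jaccard coefficient (shared tokens divided by all distinct tokens). `NGramSketch`, `Tokenize`, and `Jaccard` are hypothetical names, not TwinFinder's API. Details such as padding or case handling are assumptions and are left out here.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration only; not part of TwinFinder's API.
public static class NGramSketch
{
    // Splits text into overlapping character n-grams, without padding.
    // A non-empty text shorter than n is returned as a single token.
    public static List<string> Tokenize(string text, int n)
    {
        if (n <= 0) throw new ArgumentOutOfRangeException(nameof(n));

        var tokens = new List<string>();
        if (text.Length < n)
        {
            if (text.Length > 0) tokens.Add(text);
            return tokens;
        }
        for (int i = 0; i + n <= text.Length; i++)
            tokens.Add(text.Substring(i, n));
        return tokens;
    }

    // Jaccard coefficient over the two n-gram sets:
    // |A intersect B| / |A union B|, defined as 1.0 when both sets are empty.
    public static double Jaccard(string a, string b, int n)
    {
        var setA = new HashSet<string>(Tokenize(a, n));
        var setB = new HashSet<string>(Tokenize(b, n));
        if (setA.Count == 0 && setB.Count == 0) return 1.0;

        var intersection = new HashSet<string>(setA);
        intersection.IntersectWith(setB);
        int unionCount = setA.Count + setB.Count - intersection.Count;
        return (double)intersection.Count / unionCount;
    }
}
```

For "night" and "nacht" with n = 2, the bigram sets are {ni, ig, gh, ht} and {na, ac, ch, ht}. They share one of seven distinct bigrams, so the score is 1/7.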

algorithms

  • SortedNeigborhood
  • Blocking
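
Both algorithms exist to avoid comparing every record with every other record. The sorted neighborhood idea, in its textbook form, sorts the records by a key and then compares only records that fall within a small sliding window. The sketch below shows that general technique as a standalone example. `SortedNeighborhoodSketch`, `CandidatePairs`, and the parameters `sortKey` and `windowSize` are hypothetical and do not describe TwinFinder's actual classes.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for illustration only; not part of TwinFinder's API.
public static class SortedNeighborhoodSketch
{
    // Sorts records by sortKey and pairs each record with the
    // next (windowSize - 1) records in sorted order.
    public static List<(string Left, string Right)> CandidatePairs(
        IEnumerable<string> records,
        Func<string, string> sortKey,
        int windowSize)
    {
        if (windowSize < 2) throw new ArgumentOutOfRangeException(nameof(windowSize));

        var sorted = records.OrderBy(sortKey, StringComparer.Ordinal).ToList();
        var pairs = new List<(string Left, string Right)>();

        for (int i = 0; i < sorted.Count; i++)
        {
            // The window covers positions i .. i + windowSize - 1.
            int end = Math.Min(i + windowSize, sorted.Count);
            for (int j = i + 1; j < end; j++)
                pairs.Add((sorted[i], sorted[j]));
        }
        return pairs;
    }
}
```

A possible use: call `CandidatePairs` with a key such as the first three characters of each record and a window of 2, then run a string metric only on the returned pairs. Four records yield at most three candidate pairs this way instead of six. The trade-off is that true duplicates whose keys sort far apart are missed, which is why the choice of key and window size matters.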

Different aggregators and cost functions are also implemented.

License

MIT

Languages

  • C# 100.0%