mz-lisec/duplicate-checker: A simple duplicate files/codes checker.

Duplicate code checker

A simple duplicate code checker based on Levenshtein distance (edit distance).

For each pair of files $S$ and $T$, the comparison runs in $O(|S| \times |T|)$ time and $O(|T|)$ memory, where $|S|$ and $|T|$ denote the lengths of the two files.
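The $O(|T|)$ memory bound follows from keeping only two rows of the dynamic-programming table at a time. The Python sketch below illustrates that idea; it is not the repository's implementation, and the function name `levenshtein` is chosen here for illustration.

```python
def levenshtein(s: str, t: str) -> int:
    """Edit distance between s and t, keeping only two rows of len(t) + 1 cells."""
    # prev[j] is the distance between the current prefix of s and t[:j].
    # For the empty prefix of s, that is j insertions.
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        # Turning s[:i] into the empty string takes i deletions.
        curr = [i]
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            curr.append(min(
                prev[j] + 1,         # delete cs
                curr[j - 1] + 1,     # insert ct
                prev[j - 1] + cost,  # substitute cs with ct, or keep a match
            ))
        prev = curr
    return prev[-1]
```

The outer loop runs $|S|$ times and the inner loop $|T|$ times, which gives the $O(|S| \times |T|)$ running time, while `prev` and `curr` together hold $2(|T| + 1)$ integers.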

Usage

make clean
make
./dupcheck [dirname]

An example directory is provided as testdata/. Running ./dupcheck testdata writes a similarity score matrix to result.txt and an id-to-filename mapping to name.txt.
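This README does not state how a score is derived from the edit distance. One common choice, assumed here purely for illustration, is to normalize the distance by the length of the longer input so that scores fall in [0, 1]. The names `edit_distance`, `similarity` and `similarity_matrix` are hypothetical, and the format of result.txt is not modeled.

```python
from itertools import combinations


def edit_distance(s: str, t: str) -> int:
    """Two-row Levenshtein distance (O(len(t)) memory)."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i]
        for j, ct in enumerate(t, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (cs != ct)))
        prev = curr
    return prev[-1]


def similarity(s: str, t: str) -> float:
    """Assumed normalization: 1.0 for identical inputs, 0.0 for fully different ones."""
    longest = max(len(s), len(t))
    if longest == 0:
        return 1.0  # two empty inputs are treated as identical
    return 1.0 - edit_distance(s, t) / longest


def similarity_matrix(texts: list[str]) -> list[list[float]]:
    """Symmetric matrix of pairwise scores; row and column k refer to texts[k]."""
    n = len(texts)
    matrix = [[1.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        matrix[i][j] = matrix[j][i] = similarity(texts[i], texts[j])
    return matrix
```

Because the score is symmetric, only the pairs above the diagonal need to be computed, which halves the number of comparisons.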

TODO features

  • Winnowing
  • Cosine similarity
