A simple duplicate code checker based on Levenshtein distance (edit distance).
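For readers unfamiliar with the metric, here is a minimal sketch of Levenshtein distance between two strings, using the textbook dynamic-programming recurrence. It is an illustration only, not the code shipped in this repository: the function names levenshtein and similarity are hypothetical, and the normalization in similarity (one minus distance divided by the longer length) is an assumption about how a score could be derived, not necessarily how dupcheck computes it.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Levenshtein distance between s and t (hypothetical helper, not from dupcheck).
// Uses the standard recurrence but keeps only two rows of the DP table.
std::size_t levenshtein(const std::string& s, const std::string& t) {
    std::vector<std::size_t> prev(t.size() + 1), curr(t.size() + 1);

    // Distance from the empty prefix of s to each prefix of t: j insertions.
    for (std::size_t j = 0; j <= t.size(); ++j) prev[j] = j;

    for (std::size_t i = 1; i <= s.size(); ++i) {
        curr[0] = i;  // i deletions turn s[0..i) into the empty string
        for (std::size_t j = 1; j <= t.size(); ++j) {
            const std::size_t cost = (s[i - 1] == t[j - 1]) ? 0 : 1;
            curr[j] = std::min({prev[j] + 1,          // delete s[i-1]
                                curr[j - 1] + 1,      // insert t[j-1]
                                prev[j - 1] + cost}); // match or substitute
        }
        std::swap(prev, curr);
    }
    return prev[t.size()];
}

// ASSUMPTION: one possible way to map a distance to a score in [0, 1].
// The actual scoring used by dupcheck may differ.
double similarity(const std::string& s, const std::string& t) {
    const std::size_t longest = std::max(s.size(), t.size());
    if (longest == 0) return 1.0;  // two empty inputs are treated as identical
    return 1.0 - static_cast<double>(levenshtein(s, t)) /
                     static_cast<double>(longest);
}
```

With this sketch, levenshtein("kitten", "sitting") evaluates to 3. The sketch takes time proportional to the product of the two input lengths and memory proportional to the length of the second input; the tool's real implementation may be organized differently.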
For the comparison between each pair of files S and T, the algorithm runs with a time complexity of
make clean
make
./dupcheck [dirname]
An example directory is provided as testdata/. The command ./dupcheck testdata will generate a similarity score matrix in result.txt and an id-to-filename mapping in name.txt.
- Winnowing
- Cosine similarity