silence2014/simmetrics

SimMetrics

A Java library of similarity and distance metrics, e.g. Levenshtein distance and cosine similarity. All similarity metrics return normalized values (between 0 and 1) rather than unbounded similarity scores. Distance metrics return non-negative, unbounded scores.
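
To make the difference between an unbounded distance and a normalized similarity concrete, here is a minimal plain-Java sketch that does not use the library. The class name LevenshteinSketch and the normalization 1 - distance / max(length) are assumptions chosen for illustration; the library's own implementation may differ.

```java
// Illustrative sketch only; not part of SimMetrics.
final class LevenshteinSketch {

    // Edit distance via dynamic programming over two rows.
    // The result is non-negative and grows with the input length (unbounded).
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning "" into b[0..j) takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning a[0..i) into "" takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(
                        Math.min(curr[j - 1] + 1, prev[j] + 1),
                        prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Assumed normalization: 1 - distance / max(length), which lies in [0, 1].
    static float similarity(String a, String b) {
        int max = Math.max(a.length(), b.length());
        return max == 0 ? 1.0f : 1.0f - (float) distance(a, b) / max;
    }
}
```

With this sketch, distance("kitten", "sitting") is 3, while similarity("kitten", "sitting") is 1 - 3/7, roughly 0.571.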

Usage

For quick and easy use, StringMetrics and StringDistances provide a collection of well-known similarity and distance metrics.

	String str1 = "This is a sentence. It is made of words";
	String str2 = "This sentence is similar. It has almost the same words";
	
	StringMetric metric = StringMetrics.cosineSimilarity();
	
	float result = metric.compare(str1, str2); // 0.4767

StringMetricBuilder and StringDistanceBuilder are convenience tools for building string similarity and distance metrics. Any class implementing Metric or Distance, respectively, can serve as the starting point. The builders support simplification, tokenization, token filtering, token transformation, and caching. For usage, see the examples section.

For a terse syntax, statically import the builder's entry point: import static org.simmetrics.builders.StringMetricBuilder.with;

	String str1 = "This is a sentence. It is made of words";
	String str2 = "This sentence is similar. It has almost the same words";

	StringMetric metric =
			with(new CosineSimilarity<String>())
			.simplify(Simplifiers.toLowerCase(Locale.ENGLISH))
			.simplify(Simplifiers.replaceNonWord())
			.tokenize(Tokenizers.whitespace())
			.build();

	float result = metric.compare(str1, str2); // 0.5720
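
The following plain-Java sketch illustrates what a simplify, tokenize, compare pipeline of this kind computes conceptually. It does not use the library: the class CosinePipelineSketch and its methods are hypothetical, and it assumes that tokens are split on whitespace, that cosine similarity is taken over token counts, and that non-word characters are replaced by spaces. Under those assumptions it gives roughly 0.4767 for the raw sentences and roughly 0.572 for the simplified ones.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Illustrative sketch only; not the SimMetrics implementation.
final class CosinePipelineSketch {

    // Simplification step (assumed behavior): lower-case the input and
    // replace every non-word character with a space.
    static String simplify(String input) {
        return input.toLowerCase(Locale.ENGLISH).replaceAll("\\W", " ");
    }

    // Tokenization step: split on runs of whitespace and count each token.
    static Map<String, Integer> tokenCounts(String input) {
        Map<String, Integer> counts = new HashMap<>();
        for (String token : input.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Cosine of the angle between two token-count vectors.
    static float cosine(Map<String, Integer> a, Map<String, Integer> b) {
        if (a.isEmpty() && b.isEmpty()) {
            return 1.0f; // two empty inputs are treated as identical
        }
        if (a.isEmpty() || b.isEmpty()) {
            return 0.0f;
        }
        double dot = 0, normA = 0, normB = 0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            normA += e.getValue() * e.getValue();
            dot += e.getValue() * b.getOrDefault(e.getKey(), 0);
        }
        for (int count : b.values()) {
            normB += count * count;
        }
        return (float) (dot / (Math.sqrt(normA) * Math.sqrt(normB)));
    }

    // Tokenize only, then compare.
    static float compareRaw(String a, String b) {
        return cosine(tokenCounts(a), tokenCounts(b));
    }

    // Simplify first, then tokenize and compare.
    static float compareSimplified(String a, String b) {
        return cosine(tokenCounts(simplify(a)), tokenCounts(simplify(b)));
    }
}
```

Simplification raises the score here because "sentence." and "sentence" become the same token once punctuation is removed, so the two token-count vectors share one more entry.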
