A fusion of algorithms in near duplicate document detection
Recommendations
Online duplicate document detection: signature reliability in a dynamic retrieval environment
CIKM '03: Proceedings of the twelfth international conference on Information and knowledge management
As online document collections continue to expand, both on the Web and in proprietary environments, the need for duplicate detection becomes more critical. Few users wish to retrieve search results consisting of sets of duplicate documents, whether ...
Near duplicate detection in an academic digital library
DocEng '13: Proceedings of the 2013 ACM symposium on Document engineering
The detection and potential removal of duplicates is desirable for a number of reasons, such as to reduce the need for unnecessary storage and computation, and to provide users with uncluttered search results. This paper describes an investigation into ...
Constructing a text corpus for inexact duplicate detection
SIGIR '04: Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval
As online document collections continue to expand, both on the Web and in proprietary environments, the need for duplicate detection becomes more critical. The goal of this work is to facilitate (a) investigations into the phenomenon of near duplicates ...
Information
Published In
Sponsors
- amazon
- Sugon
- MySQL
- ORACLE
- Lenovo
Publisher
Springer-Verlag
Berlin, Heidelberg
Qualifiers
- Article