Paper
16 January 2006
Document clustering: applications in a collaborative digital library
Fuad Rahman, Aman Kumar, Yuilya Tarnikova, Hassan Alam
Proceedings Volume 6067, Document Recognition and Retrieval XIII; 60670K (2006) https://doi.org/10.1117/12.650161
Event: Electronic Imaging 2006, San Jose, California, United States
Abstract
This paper introduces a document clustering method for FileShare(R), a commercial collaborative digital library that lets groups of people working on common projects share and access documents through a standard web browser (e.g. Microsoft(R) Internet Explorer(R), Netscape(R) or Opera(R)). As the number of documents in a digital library grows, displaying them in this environment becomes a major challenge. The proposed method uses a modified version of the traditional K-Means algorithm to categorize the documents in the FileShare(R) repository by theme using lexical chaining. The algorithm is unsupervised and has shown very high accuracy in a typical experimental setup.
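For readers unfamiliar with the baseline, the following is a minimal sketch of the traditional K-Means loop applied to documents. It is not the modified algorithm the abstract refers to and it does not implement lexical chaining: it assumes plain bag-of-words term-frequency vectors, cosine similarity, and hand-picked seed documents. All function names and the sample documents are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def tf_vector(text):
    """L2-normalised bag-of-words vector (an assumed feature choice,
    standing in for the lexical-chain features the abstract mentions)."""
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {t: c / norm for t, c in counts.items()} if norm else {}


def cosine(a, b):
    """Cosine similarity of two unit-length sparse vectors (a dot product)."""
    if len(a) > len(b):
        a, b = b, a
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def centroid(vectors):
    """Mean direction of a group of vectors, re-normalised to unit length."""
    total = Counter()
    for v in vectors:
        for t, w in v.items():
            total[t] += w
    norm = math.sqrt(sum(w * w for w in total.values()))
    return {t: w / norm for t, w in total.items()} if norm else {}


def kmeans(docs, seeds, max_iter=20):
    """Plain K-Means: assign each document to its most similar centroid,
    recompute centroids, and stop when the assignment no longer changes.
    `seeds` lists the indices of the documents used as initial centroids."""
    vectors = [tf_vector(d) for d in docs]
    centroids = [vectors[i] for i in seeds]
    labels = None
    for _ in range(max_iter):
        new_labels = [
            max(range(len(centroids)), key=lambda j: cosine(v, centroids[j]))
            for v in vectors
        ]
        if new_labels == labels:
            break
        labels = new_labels
        for j in range(len(centroids)):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties out
                centroids[j] = centroid(members)
    return labels


# Hypothetical sample documents with two themes.
docs = [
    "budget forecast revenue quarter",
    "revenue budget quarter report",
    "server network outage router",
    "router network server upgrade",
]
print(kmeans(docs, seeds=[0, 2]))  # prints [0, 0, 1, 1]
```

The seeds are fixed here so the result is reproducible. Ordinary K-Means is sensitive to initialization and to the choice of k.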
© (2006) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Fuad Rahman, Aman Kumar, Yuilya Tarnikova, and Hassan Alam "Document clustering: applications in a collaborative digital library", Proc. SPIE 6067, Document Recognition and Retrieval XIII, 60670K (16 January 2006); https://doi.org/10.1117/12.650161
KEYWORDS
Digital libraries
Databases
Distance measurement
Internet
Genetic algorithms
Human-machine interfaces
Visualization