A Visual Analytics Approach for Interactive Document Clustering

Authors:

Ehsan Sherkat,

Evangelos E. Milios,

Rosane Minghim

ACM Transactions on Interactive Intelligent Systems (TiiS), Volume 10, Issue 1

Article No.: 6, Pages 1 - 33

https://doi.org/10.1145/3241380

Published: 09 August 2019

Abstract

Document clustering is a necessary step in various analytical and automated activities. When guided by the user, clustering algorithms can be tailored to imprint a perspective on the clustering process that reflects the user's understanding of the dataset. Beyond allowing customized adjustment of the clusters, a visual analytics approach provides tools for the user to draw new insights from the collection. While contributing his or her perspective, the user also acquires a deeper understanding of the dataset. To that end, we propose a novel visual analytics system for interactive document clustering. We built our system on top of clustering algorithms that can adapt to the user's feedback. In the proposed system, an initial clustering is created based on the user-defined number of clusters and the selected clustering algorithm. A set of coordinated visualizations allows the user to examine the dataset and the clustering results. The visualizations highlight individual documents and show how the documents evolve over the time period to which they relate. Users then interact with the process by changing the key-terms that drive it, according to their knowledge of the documents' domain. In key-term-based interaction, the user assigns a set of key-terms to each target cluster to guide the clustering algorithm. We have improved that process with a novel algorithm for choosing proper seeds for the clustering. Results demonstrate that the system considerably improves not only clustering precision but also effectiveness in document-based decision making. A set of quantitative experiments and a user study were conducted to show the advantages of the approach for clustering-based document analytics. We also report on the use of the framework in a real decision-making scenario that relates users' email discussions to decision making for improving patient care. Results show that the framework is useful even for more complex datasets such as email conversations.
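The abstract does not spell out the seed-selection algorithm. The sketch below only illustrates the general idea of key-term-based interaction. The user's key-terms for each target cluster determine that cluster's initial seed, and a k-means-style loop using cosine similarity then refines the clusters. Several pieces are hypothetical assumptions chosen for illustration, not the authors' method: the seeding rule (averaging the term-frequency vectors of documents that contain a cluster's key-terms), the raw term-frequency representation, and all function names.

```python
import math

def tf_vector(tokens):
    """Sparse raw term-frequency vector {term: count} for a token list."""
    vec = {}
    for t in tokens:
        vec[t] = vec.get(t, 0.0) + 1.0
    return vec

def cosine(a, b):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def mean_vector(vectors):
    """Component-wise mean of a non-empty list of sparse vectors."""
    total = {}
    for v in vectors:
        for t, w in v.items():
            total[t] = total.get(t, 0.0) + w
    return {t: w / len(vectors) for t, w in total.items()}

def keyterm_seeds(docs, doc_vecs, keyterms):
    """Hypothetical seeding rule: a cluster's seed is the mean vector of the
    documents containing at least one of its key-terms; if none match,
    the key-terms themselves form the seed."""
    seeds = []
    for terms in keyterms:
        wanted = set(terms)
        matched = [v for d, v in zip(docs, doc_vecs) if wanted & set(d)]
        seeds.append(mean_vector(matched) if matched else tf_vector(terms))
    return seeds

def nearest(vec, centroids):
    """Index of the centroid most cosine-similar to vec (first wins on ties)."""
    return max(range(len(centroids)), key=lambda k: cosine(vec, centroids[k]))

def keyterm_kmeans(docs, keyterms, max_iter=10):
    """k-means-style clustering with cosine similarity, seeded from key-terms.
    Returns one cluster label (index into keyterms) per document."""
    doc_vecs = [tf_vector(d) for d in docs]
    centroids = keyterm_seeds(docs, doc_vecs, keyterms)
    labels = [nearest(v, centroids) for v in doc_vecs]
    for _ in range(max_iter):
        # Recompute each centroid from its members; keep the old one if empty.
        for k in range(len(centroids)):
            members = [v for v, lab in zip(doc_vecs, labels) if lab == k]
            if members:
                centroids[k] = mean_vector(members)
        new_labels = [nearest(v, centroids) for v in doc_vecs]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
    return labels

docs = [
    "patient care nurse ward".split(),
    "nurse shift ward staff".split(),
    "budget cost funding grant".split(),
    "grant funding proposal deadline".split(),
]
keyterms = [["nurse", "ward"], ["funding", "grant"]]
labels = keyterm_kmeans(docs, keyterms)
print(labels)  # [0, 0, 1, 1]
```

Seeding from the user's key-terms, rather than from randomly chosen documents, is what lets the choice of terms decide which topic each target cluster captures. Swapping in TF-IDF weighting or a different seed rule would change only `tf_vector` or `keyterm_seeds`, not the refinement loop.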
