Abstract
A cluster labeling algorithm is proposed that creates generic titles based on external resources such as WordNet. Our method first extracts category-specific terms as cluster descriptors. These descriptors are then mapped to generic terms by a hypernym search algorithm. The method has been evaluated on a patent document collection and on a subset of the Reuters-21578 collection, and the experimental results show that it performs as anticipated. Real-case applications of these generic terms show promise in helping humans interpret the clustered topics. The method is general enough that it can easily be extended to other hierarchical resources for adaptable label generation.
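The two steps described above (selecting category-specific descriptors, then searching for a shared hypernym) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the correlation-coefficient term score, the distance-discounted hypernym scoring, the coverage-first ranking, and the toy `HYPERNYM` dictionary standing in for WordNet are all hypothetical choices made for this example.

```python
import math
from collections import Counter

# HYPOTHETICAL toy hierarchy standing in for WordNet (term -> direct hypernym).
# Assumes one sense and one parent per term; real WordNet lookups would need
# word-sense selection and would return several hypernym paths.
HYPERNYM = {
    "wafer": "semiconductor_device",
    "transistor": "semiconductor_device",
    "semiconductor_device": "device",
    "laser": "device",
    "device": "artifact",
    "stock": "security",
    "bond": "security",
    "security": "asset",
    "asset": "possession",
}


def correlation_coefficient(a, b, c, d):
    """Term/cluster correlation from a 2x2 contingency table (assumed scoring).

    a: in-cluster docs with the term     b: out-of-cluster docs with the term
    c: in-cluster docs without the term  d: out-of-cluster docs without it
    """
    n = a + b + c + d
    denom = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return 0.0 if denom == 0 else math.sqrt(n) * (a * d - b * c) / denom


def cluster_descriptors(docs, labels, cluster, k=3):
    """Return up to k (term, score) pairs most positively correlated with a cluster."""
    in_cluster = sum(1 for lab in labels if lab == cluster)
    scores = {}
    for term in set().union(*docs):
        a = sum(1 for doc, lab in zip(docs, labels) if lab == cluster and term in doc)
        b = sum(1 for doc, lab in zip(docs, labels) if lab != cluster and term in doc)
        c = in_cluster - a
        d = len(docs) - in_cluster - b
        scores[term] = correlation_coefficient(a, b, c, d)
    # Sort by descending score, breaking ties alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(t, s) for t, s in ranked[:k] if s > 0]


def ancestors(term):
    """Yield (hypernym, distance) pairs walking up the toy hierarchy."""
    dist = 1
    while term in HYPERNYM:
        term = HYPERNYM[term]
        yield term, dist
        dist += 1


def generic_label(descriptors, min_cover=2):
    """Choose the hypernym covering the most descriptors; nearer ones score higher."""
    score, cover = Counter(), Counter()
    for term, weight in descriptors:
        for hyper, dist in ancestors(term):
            score[hyper] += weight / dist  # discount distant (more generic) hypernyms
            cover[hyper] += 1
    candidates = [h for h in cover if cover[h] >= min_cover]
    if not candidates:
        return None  # no shared hypernym: caller can fall back to the descriptors
    return max(candidates, key=lambda h: (cover[h], score[h]))


# Hypothetical toy corpus: each document is a set of terms, labeled by cluster.
DOCS = [
    {"wafer", "transistor", "etch"},
    {"wafer", "laser"},
    {"transistor", "laser", "wafer"},
    {"stock", "bond", "market"},
    {"stock", "market"},
    {"bond", "yield"},
]
LABELS = [0, 0, 0, 1, 1, 1]

if __name__ == "__main__":
    for cid in (0, 1):
        desc = cluster_descriptors(DOCS, LABELS, cid)
        print(cid, [t for t, _ in desc], generic_label(desc))
```

On this toy data, cluster 0's descriptors (wafer, laser, transistor) generalize to `device`, the nearest hypernym shared by all three, and cluster 1's descriptors (bond, market, stock) generalize to `security`; `market` has no entry in the toy hierarchy and simply contributes nothing. Ranking candidates by coverage before score favors a label that describes the whole cluster over a narrower one (here `semiconductor_device`) that fits only some of its descriptors.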
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Tseng, YH., Lin, CJ., Chen, HH., Lin, YI. (2006). Toward Generic Title Generation for Clustered Documents. In: Ng, H.T., Leong, MK., Kan, MY., Ji, D. (eds) Information Retrieval Technology. AIRS 2006. Lecture Notes in Computer Science, vol 4182. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11880592_12
DOI: https://doi.org/10.1007/11880592_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-45780-0
Online ISBN: 978-3-540-46237-8
eBook Packages: Computer Science, Computer Science (R0)