Toward Generic Title Generation for Clustered Documents

  • Conference paper
Information Retrieval Technology (AIRS 2006)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4182)

Abstract

We propose a cluster labeling algorithm that creates generic titles from external resources such as WordNet. The method first extracts category-specific terms as cluster descriptors. It then maps these descriptors to generic terms using a hypernym search algorithm. We evaluated the method on a patent document collection and on a subset of the Reuters-21578 collection, and the experimental results show that it performs as anticipated. In real-case applications, these generic terms show promise in helping humans interpret the clustered topics. The method is general enough that it can easily be extended to other hierarchical resources for adaptable label generation.
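The abstract outlines a two-step pipeline: first select category-specific descriptor terms for a cluster, then climb a hypernym hierarchy to find a more generic label. The sketch below illustrates that pipeline under stated assumptions and is not the authors' algorithm. Several choices here are hypothetical and do not come from the paper. Terms are scored with a signed correlation coefficient, because the abstract does not name the weighting. A tiny hand-written `HYPERNYM` table stands in for WordNet. The label is the closest hypernym shared by at least a `min_fraction` of the descriptors.

```python
import math
from collections import Counter

# Hypothetical child -> parent table standing in for WordNet's noun hypernym links.
HYPERNYM = {
    "car": "motor_vehicle",
    "truck": "motor_vehicle",
    "motor_vehicle": "vehicle",
    "vehicle": "conveyance",
    "conveyance": "artifact",
    "engine": "machine",
    "machine": "device",
    "device": "artifact",
}


def hypernym_chain(term):
    """Return [term, parent, grandparent, ...] up to the root of HYPERNYM."""
    chain = [term]
    while chain[-1] in HYPERNYM:
        chain.append(HYPERNYM[chain[-1]])
    return chain


def correlation(a, b, c, d):
    """Signed correlation coefficient of a term with a cluster (assumed scoring).

    a: cluster docs containing the term    b: other docs containing it
    c: cluster docs lacking the term       d: other docs lacking it
    """
    n = a + b + c + d
    denom = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return 0.0 if denom == 0 else math.sqrt(n) * (a * d - b * c) / denom


def top_descriptors(cluster, others, k=3):
    """Step 1: rank the cluster's terms by how specific they are to the cluster."""
    vocab = set().union(*cluster)
    scores = {}
    for t in vocab:
        a = sum(t in doc for doc in cluster)
        b = sum(t in doc for doc in others)
        scores[t] = correlation(a, b, len(cluster) - a, len(others) - b)
    # Highest score first; ties broken alphabetically for a stable result.
    return sorted(vocab, key=lambda t: (-scores[t], t))[:k]


def generic_title(descriptors, min_fraction=0.5):
    """Step 2: pick the closest hypernym shared by enough descriptors."""
    coverage = Counter()  # hypernym -> number of descriptors it generalizes
    dist_sum = Counter()  # hypernym -> summed path length from those descriptors
    for t in descriptors:
        for depth, anc in enumerate(hypernym_chain(t)[1:], start=1):
            coverage[anc] += 1
            dist_sum[anc] += depth
    needed = max(1, math.ceil(min_fraction * len(descriptors)))
    candidates = [h for h in coverage if coverage[h] >= needed]
    if not candidates:
        return None
    # Prefer the most specific hypernym (shortest average path), then wider coverage.
    return min(candidates, key=lambda h: (dist_sum[h] / coverage[h], -coverage[h], h))


# Toy documents represented as term sets (hypothetical data).
cluster = [{"car", "engine", "road"}, {"truck", "engine", "road"}, {"car", "truck", "fuel"}]
others = [{"bank", "loan", "road"}, {"bank", "rate", "fuel"}]
labels = top_descriptors(cluster, others)
print(labels)                 # ['car', 'engine', 'truck']
print(generic_title(labels))  # motor_vehicle
```

Raising `min_fraction` trades specificity for coverage. With `min_fraction=1.0`, the same descriptors map to the much broader `artifact`, because that is the only node above all three.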


Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Tseng, YH., Lin, CJ., Chen, HH., Lin, YI. (2006). Toward Generic Title Generation for Clustered Documents. In: Ng, H.T., Leong, MK., Kan, MY., Ji, D. (eds) Information Retrieval Technology. AIRS 2006. Lecture Notes in Computer Science, vol 4182. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11880592_12

  • DOI: https://doi.org/10.1007/11880592_12

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-45780-0

  • Online ISBN: 978-3-540-46237-8

  • eBook Packages: Computer Science, Computer Science (R0)
