DOI: 10.1145/1066677.1066924

Topic activation analysis for document streams based on document arrival rate and relevance

Published: 13 March 2005

Abstract

With recent advances in network technology, the dissemination and exchange of large volumes of documents has become commonplace, and content analysis techniques are correspondingly more important. Topic analysis in large-scale document streams such as e-mail and news articles is an important research issue. This paper addresses techniques for "topic activation analysis" in document streams. For example, when news articles strongly related to a given topic arrive frequently in a news stream, we can regard the activation level of that topic as high. Kleinberg [1] proposed a method for analyzing document streams. Although its main objective was to detect bursts of topics, it can also be used for topic activation analysis. The method has two serious limitations, however. First, it considers only the arrival rate of documents and ignores the degree of relevance of each document. Second, it is "batch-oriented." This paper first proposes a novel topic activation analysis scheme that incorporates both document arrival rate and relevance, addressing the first limitation. It then presents an incremental scheme better suited to a document streaming environment. The proposed schemes are validated by experiments on real CNN news articles.
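
The abstract does not give the formulas of the proposed schemes, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes (hypothetically) that a topic's activation level can be modeled as an exponentially decayed sum of per-document relevance scores, which makes the level depend on both arrival rate and relevance and lets it be updated incrementally as each document arrives. The class name `ActivationTracker`, the `half_life` parameter, and the decay model itself are all assumptions introduced for illustration.

```python
import math


class ActivationTracker:
    """Toy incremental activation level for one topic (illustrative assumption,
    not the scheme from the paper): an exponentially decayed sum of relevance."""

    def __init__(self, half_life):
        if half_life <= 0:
            raise ValueError("half_life must be positive")
        # Decay constant chosen so a contribution halves after `half_life` time units.
        self.decay = math.log(2) / half_life
        self.level = 0.0
        self.last_time = None

    def update(self, time, relevance):
        """Fold in one arriving document and return the new activation level."""
        if self.last_time is not None:
            if time < self.last_time:
                raise ValueError("documents must arrive in time order")
            # Age the accumulated level by the time elapsed since the last arrival.
            self.level *= math.exp(-self.decay * (time - self.last_time))
        # A more relevant document contributes more to the activation level.
        self.level += relevance
        self.last_time = time
        return self.level


def final_level(stream, half_life=1.0):
    """Run a list of (time, relevance) pairs through a fresh tracker."""
    tracker = ActivationTracker(half_life)
    level = 0.0
    for time, relevance in stream:
        level = tracker.update(time, relevance)
    return level


# Frequent, highly relevant arrivals versus sparse, weakly relevant ones.
frequent_relevant = [(0.0, 0.9), (0.5, 0.8), (1.0, 0.9), (1.5, 0.9)]
sparse_weak = [(0.0, 0.2), (3.0, 0.1)]
print(final_level(frequent_relevant), final_level(sparse_weak))
```

Under these assumptions each update needs only the previous level and the previous timestamp, so no past documents have to be stored or rescanned, which is the property that separates an incremental scheme from a batch-oriented one.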

References

[1] J. Kleinberg, "Bursty and Hierarchical Structure in Streams," Proc. ACM SIGKDD, 2002.
[2] J. Allan, R. Papka, and V. Lavrenko, "On-line New Event Detection and Tracking," Proc. SIGIR Intl. Conf. Information Retrieval, 1998.
[3] J. Allan, J. G. Carbonell, G. Doddington, J. Yamron, and Y. Yang, "Topic Detection and Tracking Pilot Study: Final Report," Proc. DARPA Broadcast News Transcription and Understanding Workshop, 1998.
[4] F. Walls, H. Jin, S. Sista, and R. Schwartz, "Topic Detection in Broadcast News," Proc. DARPA Broadcast News Workshop, 1999.
[5] J. M. Schultz and M. Liberman, "Topic Detection and Tracking using Idf-Weighted Cosine Coefficient," Proc. DARPA Broadcast News Workshop, 1999.
[6] Y. Yang, T. Ault, T. Pierce, and C. W. Lattimer, "Improving Text Categorization Methods for Event Tracking," Proc. SIGIR Intl. Conf. Information Retrieval, 2000.
[7] H. Li and K. Yamanishi, "Topic Analysis using Finite Mixture Model," Information Processing and Management, Vol. 39, 2003.
[8] L. Rabiner, "A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition," Proc. IEEE 77, 1989.
[9] Y. Ishikawa, Y. Chen, and H. Kitagawa, "An On-Line Document Clustering Method Based on Forgetting Factors," Proc. 5th European Conference on Research and Advanced Technology for Digital Libraries (ECDL 2001), September 2001.
[10] B. K. Yi, et al., "Online Data Mining for Co-Evolving Time Sequences," Proc. 16th International Conference on Data Engineering, 2000.
[11] G. Salton, "Automatic Text Processing," Addison Wesley, 1989.
[12] R. Kumar, J. Novak, P. Raghavan, and A. Tomkins, "On the Bursty Evolution of Blogspace," Proc. The 12th International World Wide Web Conference, 2003.
[13] T. Fujiki, T. Nanno, Y. Suzuki, and M. Okumura, "Identification of Bursts in a Document Stream," Proc. First International Workshop on Knowledge Discovery in Data Streams, 2004.

      Published In

      SAC '05: Proceedings of the 2005 ACM symposium on Applied computing
      March 2005
      1814 pages
      ISBN:1581139640
      DOI:10.1145/1066677
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. document stream
      2. topic activation analysis
      3. topic detection

      Qualifiers

      • Article

      Conference

      SAC05: The 2005 ACM Symposium on Applied Computing
      March 13 - 17, 2005
      Santa Fe, New Mexico

      Acceptance Rates

      Overall Acceptance Rate 1,650 of 6,669 submissions, 25%

      Cited By

      • (2018) Indices of novelty for emerging topic detection. Information Processing and Management: an International Journal 48:2 (303-325). DOI: 10.1016/j.ipm.2011.07.006. Online publication date: 29-Dec-2018
      • (2018) A Novelty-based Clustering Method for On-line Documents. World Wide Web 11:1 (1-37). DOI: 10.1007/s11280-007-0018-9. Online publication date: 25-Dec-2018
      • (2014) Causal Analysis for Supporting Users' Understanding of Investment Trusts. Proceedings of the 16th International Conference on Information Integration and Web-based Applications & Services (524-528). DOI: 10.1145/2684200.2684364. Online publication date: 4-Dec-2014
      • (2014) Organizing Sightseeing Tweets Based on Content Relatedness and Sharability. Web-Age Information Management (510-521). DOI: 10.1007/978-3-319-08010-9_57. Online publication date: 2014
      • (2012) Trip Tweets Search by Considering Spatio-temporal Continuity of User Behavior. Database and Expert Systems Applications (141-155). DOI: 10.1007/978-3-642-32597-7_13. Online publication date: 2012
      • (2010) Maximizing the reliability of two-state automaton for burst feature detection in news streams. 2010 IEEE International Conference on Progress in Informatics and Computing (229-233). DOI: 10.1109/PIC.2010.5687459. Online publication date: Dec-2010
      • (2009) Research intelligence involving information retrieval - An example of conferences and journals. Expert Systems with Applications: An International Journal 36:10 (12151-12166). DOI: 10.1016/j.eswa.2009.03.015. Online publication date: 1-Dec-2009
      • (2007) Feature Extraction from Microarray Expression Data by Integration of Semantic Knowledge. Proceedings of the Sixth International Conference on Machine Learning and Applications (606-611). DOI: 10.1109/ICMLA.2007.49. Online publication date: 13-Dec-2007
