DOI: 10.1145/3397271.3401306
short-paper

How UMass-FSD Inadvertently Leverages Temporal Bias

Published: 25 July 2020

Abstract

First Story Detection is the task of identifying new events in a stream of documents. The UMass-FSD system is known for its strong performance in First Story Detection competitions, and it has recently been used frequently as a high-accuracy baseline in research publications. We are the first to discover that UMass-FSD inadvertently leverages temporal bias. Interestingly, the discovered bias contrasts with previously known biases and performs significantly better. Our analysis reveals an increased contribution of temporally distant documents, resulting from an unusual way of handling incremental term statistics. We show that this form of temporal bias is also applicable to other well-known First Story Detection systems, where it improves the detection accuracy. To provide a more generalizable conclusion and to demonstrate that the observed bias is not merely an artefact of a particular implementation, we present a model that intentionally leverages a bias on temporal distance. Our model significantly improves the detection effectiveness of state-of-the-art First Story Detection systems.
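
For readers unfamiliar with the task, the following is a minimal sketch of a generic first story detector with incrementally updated term statistics. It is not the UMass-FSD implementation and not the model presented in the paper: the class name `IncrementalFSD`, the smoothed IDF formula, the whitespace tokenizer, and the `threshold` value are all assumptions chosen for illustration. A document is scored by one minus its highest cosine similarity to any earlier document, and document frequencies grow as the stream is consumed.

```python
import math
from collections import Counter


class IncrementalFSD:
    """Toy first story detector (hypothetical, for illustration only).

    Novelty of a document = 1 - max cosine similarity to earlier documents.
    """

    def __init__(self, threshold=0.8):
        self.threshold = threshold  # assumed novelty cut-off, not from the paper
        self.doc_freq = Counter()   # term -> number of documents containing it so far
        self.num_docs = 0
        self.history = []           # raw term counts of earlier documents

    def _idf(self, term):
        # Smoothed IDF from the statistics collected so far (always positive).
        return math.log((self.num_docs + 1) / (self.doc_freq[term] + 0.5))

    def _vector(self, counts):
        return {t: tf * self._idf(t) for t, tf in counts.items()}

    @staticmethod
    def _cosine(a, b):
        dot = sum(w * b.get(t, 0.0) for t, w in a.items())
        na = math.sqrt(sum(w * w for w in a.values()))
        nb = math.sqrt(sum(w * w for w in b.values()))
        return dot / (na * nb) if na and nb else 0.0

    def process(self, text):
        counts = Counter(text.lower().split())
        # Incremental update: the statistics include the incoming document.
        self.num_docs += 1
        self.doc_freq.update(counts.keys())
        vec = self._vector(counts)
        # Earlier documents are re-weighted with the current statistics here;
        # this is one possible choice, not necessarily what any real system does.
        best = max(
            (self._cosine(vec, self._vector(old)) for old in self.history),
            default=0.0,
        )
        self.history.append(counts)
        novelty = 1.0 - best
        return novelty, novelty >= self.threshold


detector = IncrementalFSD(threshold=0.8)
for doc in ["earthquake hits city", "earthquake hits city", "election results announced"]:
    novelty, is_first_story = detector.process(doc)
    print(f"{novelty:.2f} {is_first_story} {doc}")
```

In this sketch the vectors of earlier documents are recomputed with the current statistics on every comparison. A system could instead freeze each document's weights at arrival time; which documents dominate the nearest-neighbour search can depend on such choices, which is the kind of implementation detail the abstract refers to.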

Published In

SIGIR '20: Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval
July 2020
2548 pages
ISBN:9781450380164
DOI:10.1145/3397271

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. LSH-FSD
  2. UMass-FSD
  3. first story detection
  4. temporal bias
  5. topic detection and tracking

Qualifiers

  • Short-paper

Conference

SIGIR '20

Acceptance Rates

Overall Acceptance Rate 792 of 3,983 submissions, 20%

Article Metrics

  • Total Citations: 0
  • Total Downloads: 65
  • Downloads (Last 12 months): 7
  • Downloads (Last 6 weeks): 1
Reflects downloads up to 05 Jan 2025
