DOI: 10.1145/2247596.2247636
research-article

See what's enBlogue: real-time emergent topic identification in social media

Published: 27 March 2012

Abstract

With the increasing popularity of Web 2.0 streams, people become overwhelmed by the available information. This is partly countered by tagging blog posts and tweets, so that users can filter messages according to their tags. However, this is insufficient for detecting newly emerging topics that are not reflected by a single tag but are instead expressed by unusual tag combinations. This paper presents enBlogue, an approach for automatically detecting such emergent topics. EnBlogue uses a time-sliding window to compute statistics about tags and tag-pairs. These statistics are then used to identify unusual shifts in correlations, which are most often caused by real-world events. We analyze the strength of these shifts and measure the degree of unpredictability they include, which is used to rank tag-pairs expressing emergent topics. Additionally, this "indicator of surprise" is carried over to subsequent time points, as user interests do not abruptly vanish from one moment to the next. To avoid monitoring all tag-pairs, we can also select a subset of tags, e.g., the most popular or most volatile ones, to be used as seed-tags for subsequent pair-wise correlation computations. The system is fully implemented and publicly available on the Web, processing live Twitter data. We present experimental studies based on real-world datasets, demonstrating both the prediction quality, by means of a user study, and the efficiency of enBlogue.
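The pipeline described in the abstract (windowed tag and tag-pair statistics, detection of correlation shifts, and a surprise score that carries over in time) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The class name `PairShiftTracker`, the choice of Jaccard overlap as the correlation measure, the use of the previous value as a naive prediction, and the `decay` carry-over factor are all assumptions made for illustration; the abstract does not specify any of them. Seed-tag selection is left out.

```python
from collections import Counter, deque
from itertools import combinations


def jaccard(pair_count, count_a, count_b):
    """Jaccard overlap of two tags, computed from document counts."""
    union = count_a + count_b - pair_count
    return pair_count / union if union else 0.0


class PairShiftTracker:
    """Ranks tag pairs by upward shifts in their windowed correlation.

    Hypothetical sketch: the correlation measure, the naive
    "previous value" prediction and the decay rule are assumptions.
    """

    def __init__(self, window_size=3, decay=0.5):
        self.window = deque()       # one list of tag sets per time slice
        self.window_size = window_size
        self.decay = decay          # assumed carry-over factor per step
        self.prev_corr = None       # correlations at the previous step
        self.score = {}             # current surprise score per pair

    def _correlations(self):
        """Count tags and tag pairs over the whole window."""
        tags, pairs = Counter(), Counter()
        for time_slice in self.window:
            for doc_tags in time_slice:
                tags.update(doc_tags)
                pairs.update(combinations(sorted(doc_tags), 2))
        return {p: jaccard(c, tags[p[0]], tags[p[1]])
                for p, c in pairs.items()}

    def advance(self, docs):
        """Slide the window by one time slice and return ranked pairs.

        docs is an iterable of per-document tag collections. The first
        call only establishes a baseline and returns an empty ranking.
        """
        self.window.append([set(d) for d in docs])
        if len(self.window) > self.window_size:
            self.window.popleft()
        corr = self._correlations()
        if self.prev_corr is None:
            self.prev_corr = corr
            return []
        new_score = {}
        for pair in set(corr) | set(self.score):
            # Shift = how far the correlation rose above its last value.
            shift = max(0.0, corr.get(pair, 0.0) - self.prev_corr.get(pair, 0.0))
            # Carry over a decayed share of the earlier surprise.
            value = max(shift, self.decay * self.score.get(pair, 0.0))
            if value > 0:
                new_score[pair] = value
        self.prev_corr, self.score = corr, new_score
        return sorted(new_score.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    tracker = PairShiftTracker(window_size=1, decay=0.5)
    tracker.advance([["volcano", "iceland"], ["airtraffic", "delay"]])
    ranked = tracker.advance([["volcano", "airtraffic"],
                              ["volcano", "airtraffic"]])
    print(ranked)
```

In this sketch a pair such as ("airtraffic", "volcano") rises to the top when the two tags suddenly start co-occurring, and it stays in the ranking with a decaying score for a few steps after the co-occurrence stops. A closer approximation of the described approach would replace the naive prediction with a proper forecast and restrict pair counting to a set of seed tags.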



Information & Contributors

Information

Published In

EDBT '12: Proceedings of the 15th International Conference on Extending Database Technology
March 2012
643 pages
ISBN: 9781450307901
DOI: 10.1145/2247596

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. emergent topics
  2. web 2.0 streams

Qualifiers

  • Research-article

Conference

EDBT '12

Acceptance Rates

Overall Acceptance Rate 7 of 10 submissions, 70%


Cited By

  • (2022) Outlier and Trend Detection Using Approximate Median and Median Absolute Deviation. 2022 5th International Conference on Computational Intelligence and Networks (CINE), pages 01-06. DOI: 10.1109/CINE56307.2022.10037489. Online publication date: 1-Dec-2022
  • (2022) Extreme events management using multimedia social networks. Future Generation Computer Systems, 94:C, pages 444-452. DOI: 10.1016/j.future.2018.11.035. Online publication date: 18-Apr-2022
  • (2021) Hot topic detection and evaluation of multi-relation effects. Proceedings of the 2021 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, pages 416-422. DOI: 10.1145/3487351.3490972. Online publication date: 8-Nov-2021
  • (2020) Microblog topic identification using Linked Open Data. PLOS ONE, 15:8, e0236863. DOI: 10.1371/journal.pone.0236863. Online publication date: 11-Aug-2020
  • (2020) Topic Detection for Online Course Feedback Using LDA. Emerging Technologies for Education, pages 133-142. DOI: 10.1007/978-3-030-38778-5_16. Online publication date: 15-Feb-2020
  • (2019) Towards Reproducible Research of Event Detection Techniques for Twitter. 2019 6th Swiss Conference on Data Science (SDS), pages 69-74. DOI: 10.1109/SDS.2019.000-5. Online publication date: Jun-2019
  • (2018) Towards Detecting Social Events by Mining Geographical Patterns with VGI Data. ISPRS International Journal of Geo-Information, 7:12, article 481. DOI: 10.3390/ijgi7120481. Online publication date: 17-Dec-2018
  • (2018) Exploring Entity-centric Networks in Entangled News Streams. Companion Proceedings of the The Web Conference 2018, pages 555-563. DOI: 10.1145/3184558.3188726. Online publication date: 23-Apr-2018
  • (2018) Social Network Monitoring for Bursty Cascade Detection. ACM Transactions on Knowledge Discovery from Data, 12:4, pages 1-24. DOI: 10.1145/3178048. Online publication date: 16-Apr-2018
  • (2018) The Method for Prediction the Distribution of Information in Social Networks Based on the Attributes. Digital Transformation and Global Society, pages 503-515. DOI: 10.1007/978-3-030-02843-5_42. Online publication date: 10-Nov-2018
