Real-Time Top-R Topic Detection on Twitter with Topic Hijack Filtering

Research article. DOI: 10.1145/2783258.2783402
Published: 10 August 2015

Abstract

Twitter is a "what's-happening-right-now" tool that enables interested parties to follow thoughts and commentary of individual users in nearly real-time. While it is a valuable source of information for real-time topic detection and tracking, Twitter data are not clean because of noisy messages and users, which significantly diminish the reliability of obtained results.
In this paper, we integrate the extraction of meaningful topics with the filtering of messages over the Twitter stream. We develop a streaming algorithm for a sequence of document-frequency tables; the algorithm enables real-time monitoring of the top-10 topics from approximately 25% of all Twitter messages while automatically filtering out noisy and meaningless topics. We apply the algorithm to the Japanese Twitter stream and demonstrate that, compared with other online nonnegative matrix factorization methods, our framework tracks real-world events with high accuracy, as measured by perplexity, and simultaneously eliminates irrelevant topics.
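The abstract does not spell out the streaming algorithm itself, so the following is only an illustrative sketch of the underlying idea: factorizing a nonnegative term-by-document frequency table into a small number of topics. It uses the classic batch multiplicative updates for nonnegative matrix factorization rather than the paper's online method, and it does no hijack filtering. The toy matrix `X`, the vocabulary, and the helper names (`nmf`, `top_terms`, `frob_error`) are assumptions made for the example, not taken from the paper.

```python
import random


def matmul(A, B):
    """Multiply two matrices stored as lists of rows (plain Python, no NumPy)."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def transpose(A):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]


def frob_error(X, W, H):
    """Squared Frobenius reconstruction error ||X - WH||^2."""
    WH = matmul(W, H)
    return sum((x - y) ** 2 for rx, ry in zip(X, WH) for x, y in zip(rx, ry))


def nmf(X, r, iters=200, seed=0, eps=1e-9):
    """Batch NMF with multiplicative updates: X (terms x docs) ~ W (terms x r) H (r x docs)."""
    rng = random.Random(seed)
    n_terms, n_docs = len(X), len(X[0])
    # Strictly positive random initialisation keeps every entry nonnegative.
    W = [[rng.random() + 0.1 for _ in range(r)] for _ in range(n_terms)]
    H = [[rng.random() + 0.1 for _ in range(n_docs)] for _ in range(r)]
    for _ in range(iters):
        # H <- H * (W^T X) / (W^T W H)
        Wt = transpose(W)
        num, den = matmul(Wt, X), matmul(matmul(Wt, W), H)
        H = [[h * a / (b + eps) for h, a, b in zip(hr, nr, dr)]
             for hr, nr, dr in zip(H, num, den)]
        # W <- W * (X H^T) / (W H H^T)
        Ht = transpose(H)
        num, den = matmul(X, Ht), matmul(W, matmul(H, Ht))
        W = [[w * a / (b + eps) for w, a, b in zip(wr, nr, dr)]
             for wr, nr, dr in zip(W, num, den)]
    return W, H


def top_terms(W, vocab, k):
    """For each topic (column of W), list the k terms with the largest weights."""
    topics = []
    for j in range(len(W[0])):
        ranked = sorted(range(len(vocab)), key=lambda i: W[i][j], reverse=True)
        topics.append([vocab[i] for i in ranked[:k]])
    return topics


# Hypothetical toy table: rows are terms, columns are documents (e.g. time slices).
vocab = ["earthquake", "tsunami", "goal", "match"]
X = [
    [2, 4, 0, 0],
    [1, 2, 0, 0],
    [0, 0, 3, 6],
    [0, 0, 1, 2],
]
W, H = nmf(X, r=2)
print(top_terms(W, vocab, k=2))
```

In a streaming setting one would presumably update the factors incrementally as each new document-frequency table arrives instead of refitting from scratch; the batch version here is meant only to make the factorization step concrete.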

Supplementary Material

MP4 File (p417.mp4)

    Published In

    KDD '15: Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
    August 2015
    2378 pages
    ISBN: 9781450336642
    DOI: 10.1145/2783258

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. noise filtering
    2. nonnegative matrix factorization
    3. streaming algorithm
    4. topic detection
    5. twitter

    Qualifiers

    • Research-article

    Conference

    KDD '15

    Acceptance Rates

    KDD '15 Paper Acceptance Rate 160 of 819 submissions, 20%;
    Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

    Cited By

    • (2022) Real-time event detection in social media streams through semantic analysis of noisy terms. Journal of Big Data, 9:1. DOI: 10.1186/s40537-022-00642-y. Online publication date: 12-Jul-2022
    • (2022) Multi-Task Learning Framework for Detecting Hashtag Hijack Attack in Mobile Social Networks. 2022 IEEE 19th International Conference on Mobile Ad Hoc and Smart Systems (MASS), pages 90-98. DOI: 10.1109/MASS56207.2022.00020. Online publication date: Oct-2022
    • (2018) Emerging Product Topics Prediction in Social Media without Social Structure Information. Companion Proceedings of the The Web Conference 2018, pages 1661-1668. DOI: 10.1145/3184558.3191625. Online publication date: 23-Apr-2018
    • (2018) A Comprehensive Study on Social Network Mental Disorders Detection via Online Social Media Mining. IEEE Transactions on Knowledge and Data Engineering, 30:7, pages 1212-1225. DOI: 10.1109/TKDE.2017.2786695. Online publication date: 1-Jul-2018
    • (2018) Topic Detection with Danmaku: A Time-Sync Joint NMF Approach. Database and Expert Systems Applications, pages 428-435. DOI: 10.1007/978-3-319-98812-2_39. Online publication date: 9-Aug-2018
    • (2017) Let's See Your Digits. Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 977-986. DOI: 10.1145/3097983.3098101. Online publication date: 13-Aug-2017
    • (2017) Scalable Twitter user clustering approach boosted by Personalized PageRank. International Journal of Data Science and Analytics, 6:4, pages 297-309. DOI: 10.1007/s41060-017-0089-3. Online publication date: 29-Dec-2017
    • (2017) Detecting cooperative and organized spammer groups in micro-blogging community. Data Mining and Knowledge Discovery, 31:3, pages 573-605. DOI: 10.1007/s10618-016-0479-5. Online publication date: 1-May-2017
    • (2017) Scalable Twitter User Clustering Approach Boosted by Personalized PageRank. Advances in Knowledge Discovery and Data Mining, pages 472-485. DOI: 10.1007/978-3-319-57454-7_37. Online publication date: 23-Apr-2017
    • (2016) Expected tensor decomposition with stochastic gradient descent. Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, pages 1919-1925. DOI: 10.5555/3016100.3016167. Online publication date: 12-Feb-2016
