Abstract
When searching for a query on a microblogging platform, a user would typically receive an archive of tweets, for example as part of a retrospective piece on the impact of social media. To make the retrieved tweets easier to understand, it is useful to produce a summarized timeline about the given topic. However, tweet timeline generation is challenging because of the noisy and temporal nature of microblogs. In this paper, we propose a graph-based dynamic greedy clustering approach that considers the coverage, relevance and novelty of the tweet timeline. First, tweet embedding representations are learned in order to construct a tweet semantic graph. Based on this graph, we estimate the coverage of the timeline from the graph connectivity. Furthermore, we integrate a noisy-tweet elimination component that removes noisy tweets using lexical and semantic features based on relevance and novelty. Experimental results on the public Text Retrieval Conference (TREC) Twitter corpora demonstrate the effectiveness of the proposed approach.
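To make the coverage/relevance/novelty interplay concrete, here is a minimal sketch of a generic graph-based greedy selection. It is not the authors' algorithm. It assumes tweet embeddings are already available as plain vectors (toy 2-D vectors here), that each tweet has a precomputed relevance score, and that noise elimination is a simple relevance floor. Two tweets are joined in the semantic graph when their cosine similarity reaches a threshold. The greedy step then repeatedly picks the uncovered tweet that covers the most still-uncovered nodes, meaning the tweet itself plus its neighbours. Everything in the sketch is hypothetical and chosen for illustration: the names `greedy_timeline` and `build_semantic_graph`, the parameters `k`, `sim_threshold` and `min_relevance`, and the max-coverage rule itself.

```python
import math
from typing import Dict, List, Sequence, Set, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two dense vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return 0.0 if nu == 0.0 or nv == 0.0 else dot / (nu * nv)


def build_semantic_graph(
    emb: Dict[str, Sequence[float]], sim_threshold: float
) -> Dict[str, Set[str]]:
    """Undirected tweet graph: an edge joins two tweets whose similarity >= sim_threshold."""
    ids = sorted(emb)  # sorted ids keep iteration order (and tie-breaking) deterministic
    graph: Dict[str, Set[str]] = {t: set() for t in ids}
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            if cosine(emb[a], emb[b]) >= sim_threshold:
                graph[a].add(b)
                graph[b].add(a)
    return graph


def greedy_timeline(
    emb: Dict[str, Sequence[float]],
    relevance: Dict[str, float],
    k: int,
    sim_threshold: float = 0.8,   # hypothetical edge threshold
    min_relevance: float = 0.3,   # hypothetical noise floor
) -> Tuple[List[str], float]:
    """Select up to k tweets by greedy graph coverage; return (timeline, coverage)."""
    # 1. Noise elimination (assumed rule): drop tweets below a relevance floor.
    kept = {t: v for t, v in emb.items() if relevance[t] >= min_relevance}
    graph = build_semantic_graph(kept, sim_threshold)
    covered: Set[str] = set()
    timeline: List[str] = []
    while len(timeline) < k:
        # 2. Novelty: a tweet already covered by a selected neighbour is skipped.
        candidates = [t for t in graph if t not in covered]
        if not candidates:
            break
        # 3. Pick the largest coverage gain, then the highest relevance; max() keeps
        #    the first maximal item, so remaining ties fall to the smallest tweet id.
        best = max(
            candidates,
            key=lambda t: (len(({t} | graph[t]) - covered), relevance[t]),
        )
        timeline.append(best)
        covered |= {best} | graph[best]
    # Coverage = fraction of (non-noise) graph nodes represented by the timeline.
    coverage = len(covered) / len(graph) if graph else 0.0
    return timeline, coverage


if __name__ == "__main__":
    # Toy 2-D "embeddings"; real tweet embeddings would come from a learned model.
    emb = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0],
           "d": [0.1, 0.9], "n": [1.0, 1.0]}
    rel = {"a": 0.9, "b": 0.8, "c": 0.7, "d": 0.6, "n": 0.1}
    print(greedy_timeline(emb, rel, k=2))  # (['a', 'c'], 1.0)
```

In the toy data, "a"/"b" and "c"/"d" form two near-duplicate pairs. The low-relevance tweet "n" is filtered out as noise. The sketch therefore selects one representative per pair. A real timeline would additionally be ordered by posting time, which this sketch does not model.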
Acknowledgments
The work reported in this paper was supported by the National Natural Science Foundation of China Grant 61370116. We thank the anonymous reviewers for their helpful comments.
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Fan, F., Qiang, R., Lv, C., Zhao, W.X., Yang, J. (2015). Tweet Timeline Generation via Graph-Based Dynamic Greedy Clustering. In: Zuccon, G., Geva, S., Joho, H., Scholer, F., Sun, A., Zhang, P. (eds) Information Retrieval Technology. AIRS 2015. Lecture Notes in Computer Science, vol. 9460. Springer, Cham. https://doi.org/10.1007/978-3-319-28940-3_24
DOI: https://doi.org/10.1007/978-3-319-28940-3_24
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-28939-7
Online ISBN: 978-3-319-28940-3
eBook Packages: Computer Science, Computer Science (R0)