
The Impacts of Structural Difference and Temporality of Tweets on Retrieval Effectiveness

Published: 01 November 2013

Abstract

To explore information-seeking behaviors in the microblogosphere, the microblog track at TREC 2011 introduced a real-time ad-hoc retrieval task that aims at ranking relevant tweets in reverse-chronological order. We study this problem via a two-phase approach: 1) retrieving tweets in an ad-hoc way; 2) utilizing the temporal information of tweets to enhance retrieval effectiveness. Tweets can be categorized into two types. One type consists of short messages that do not contain any URL of a Web page. The other type has at least one URL of a Web page in addition to a short message. These two types of tweets have different structures. In the first phase, to address the structural difference of tweets, we propose a method to rank tweets using a divide-and-conquer strategy. Specifically, we first rank the two types of tweets separately, which produces two rankings, one for each type. Then we merge these two rankings into one ranking. In the second phase, we first categorize queries into several types by exploring the temporal distributions of their top-retrieved tweets from the first phase; then we calculate the time-related relevance scores of tweets according to the classified types of queries; finally we combine the time scores with the IR scores from the first phase to produce a ranking of tweets. Experimental results obtained with the TREC 2011 and TREC 2012 queries over the TREC Tweets2011 collection show that: (i) ranking the two types of tweets separately and then merging them yields better retrieval effectiveness than ranking them simultaneously; (ii) incorporating temporal information into the retrieval process yields further improvements; and (iii) our method compares favorably with state-of-the-art methods in retrieval effectiveness.
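The two-phase pipeline described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the two rankings are merged or how the time scores are computed, so everything concrete here is an assumption made for illustration: the `Tweet` fields, the toy `term_overlap` scorer, min-max normalization as the merge step, and linear interpolation with a hypothetical `weight` parameter as the combination step. It is not the authors' method, only one simple way to realize "rank separately, merge, then mix in a time score".

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Tweet:
    # Hypothetical record layout, not taken from the paper.
    tweet_id: str
    text: str
    has_url: bool     # True if the tweet links to at least one Web page
    timestamp: float  # posting time, e.g. seconds since the epoch


Scorer = Callable[[str, Tweet], float]


def term_overlap(query: str, tweet: Tweet) -> float:
    """Toy IR score (assumption): number of query terms found in the tweet."""
    words = set(tweet.text.lower().split())
    return float(sum(1 for term in query.lower().split() if term in words))


def min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1] so that two separate rankings are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {key: 1.0 for key in scores}
    return {key: (value - lo) / (hi - lo) for key, value in scores.items()}


def phase_one(query: str, tweets: List[Tweet],
              score_plain: Scorer, score_url: Scorer) -> Dict[str, float]:
    """Score the two tweet types separately, then merge into one table."""
    plain = {t.tweet_id: score_plain(query, t) for t in tweets if not t.has_url}
    linked = {t.tweet_id: score_url(query, t) for t in tweets if t.has_url}
    merged = min_max(plain)          # normalize within each type ...
    merged.update(min_max(linked))   # ... then pool the normalized scores
    return merged


def phase_two(ir_scores: Dict[str, float], time_scores: Dict[str, float],
              weight: float = 0.3) -> List[Tuple[str, float]]:
    """Interpolate IR and time scores; sort by combined score, then by id."""
    combined = {
        tweet_id: (1.0 - weight) * score + weight * time_scores.get(tweet_id, 0.0)
        for tweet_id, score in ir_scores.items()
    }
    return sorted(combined.items(), key=lambda item: (-item[1], item[0]))
```

In this sketch `score_url` is the place where a scorer could also take the content of the linked page into account, and `time_scores` stands in for whatever query-type-dependent temporal scoring is used; both are left as inputs because the abstract does not define them.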

      Reviews

      Xiannong Meng

      A novel method of evaluating tweets is introduced in this paper. Most retrieval algorithms do not differentiate the structure of tweets. The authors show convincingly that it does have an impact on retrieval effectiveness. In their study, two types of tweets are evaluated separately, those containing only plain text, and those containing any URLs. The key is that in ranking the tweets with URLs, the ranker considers the content of the page(s) pointed to by the URLs in addition to the tweets. After ranking the two types of tweets, a support vector machine-based classifier with 18 features is used to evaluate the relevance between the tweets and the query. If a tweet is time-sensitive, the temporal information of both the tweet and its parent is taken into consideration. Data from TREC 2011, TREC 2012, and TREC Tweets 2011 are used to evaluate the algorithm. The results indicate that (1) it is more effective to rank the two types of tweets separately and then merge them; (2) incorporating temporal information yields further improvements; and (3) the proposed "method compares favorably with state-of-the-art methods in retrieval effectiveness." The novelty of the proposed method is that it offers the ability to rank the tweets with and without URLs separately and to incorporate the temporal information in ranking the tweets. The paper is well written and self-contained. Many examples illustrate the concepts discussed. The readers can explore the topic further using the abundant references provided.

      Online Computing Reviews Service

      Published In

      ACM Transactions on Information Systems, Volume 31, Issue 4
      November 2013
      192 pages
      ISSN:1046-8188
      EISSN:1558-2868
      DOI:10.1145/2536736
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 01 November 2013
      Accepted: 01 July 2013
      Revised: 01 April 2013
      Received: 01 September 2012
      Published in TOIS Volume 31, Issue 4

      Author Tags

      1. Ad-hoc retrieval of tweets
      2. learning to rank
      3. query temporal categorization

      Qualifiers

      • Research-article
      • Research
      • Refereed

      Cited By

      • (2024) An overview of aggregation methods for social networks analysis. Knowledge and Information Systems. DOI: 10.1007/s10115-024-02296-z. Online publication date: 12-Dec-2024
      • (2019) A Deep Learning-based Ranking Approach for Microblog Retrieval. Procedia Computer Science 159, 352-362. DOI: 10.1016/j.procs.2019.09.190. Online publication date: 2019
      • (2017) Microblog Retrieval Using Ensemble of Feature Sets through Supervised Feature Selection. IEICE Transactions on Information and Systems E100.D:4, 793-806. DOI: 10.1587/transinf.2016DAP0032. Online publication date: 2017
      • (2015) Combining temporal and content aware features for microblog retrieval. 2015 2nd International Conference on Advanced Informatics: Concepts, Theory and Applications (ICAICTA), 1-6. DOI: 10.1109/ICAICTA.2015.7335353. Online publication date: Aug-2015
