DOI: 10.1145/2806416.2806457
Research article

What Users Ask a Search Engine: Analyzing One Billion Russian Question Queries

Published: 17 October 2015

Abstract

We analyze question queries submitted to a large commercial web search engine to gain insight into what people ask and to better tailor search results to users' needs. Based on a dataset of about one billion question queries submitted during 2012, we investigate askers' querying behavior with the support of automatic query categorization. While the importance of question queries is likely to increase, at present they make up only 3-4% of the total search traffic.
Since questions are such a small part of the query stream, and are more likely to be unique than shorter queries, click-through information is typically rather sparse. Thus, query categorization methods based on the categories of clicked web documents do not work well for questions. As an alternative, we propose a robust question query classification method that uses the labeled questions from a large community question answering (CQA) platform as a training set. The resulting classifier is then transferred to the web search questions. Even though questions on CQA platforms tend to differ from web search questions, our categorization method proves competitive with strong baselines with respect to classification accuracy.
To show the scalability of our proposed method, we apply the classifiers to about one billion question queries and discuss the trade-offs between performance and accuracy that different classification models offer.
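
As a rough illustration of the transfer setup described above, the following sketch trains a topic classifier on labeled CQA questions and applies it unchanged to search-engine question queries. The abstract does not specify the classification models, features, or category scheme, so everything here is an assumption: the multinomial Naive Bayes model, the whitespace tokenizer, the class name `NaiveBayesQuestionClassifier`, and the toy English questions and category labels are hypothetical stand-ins.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a placeholder for real Russian tokenization)."""
    return text.lower().split()


class NaiveBayesQuestionClassifier:
    """Multinomial Naive Bayes with add-one smoothing.

    An illustrative model choice; the abstract does not name the models used.
    """

    def fit(self, questions, categories):
        self.class_counts = Counter(categories)   # questions per category
        self.word_counts = defaultdict(Counter)   # token counts per category
        self.vocab = set()
        for text, category in zip(questions, categories):
            tokens = tokenize(text)
            self.word_counts[category].update(tokens)
            self.vocab.update(tokens)
        self.total = sum(self.class_counts.values())
        return self

    def predict(self, text):
        # Tokens never seen in training carry no evidence and are skipped.
        tokens = [t for t in tokenize(text) if t in self.vocab]
        best_category, best_score = None, -math.inf
        for category, n_questions in self.class_counts.items():
            score = math.log(n_questions / self.total)  # log prior
            denom = sum(self.word_counts[category].values()) + len(self.vocab)
            for t in tokens:
                score += math.log((self.word_counts[category][t] + 1) / denom)
            if score > best_score:
                best_category, best_score = category, score
        return best_category


# Hypothetical labeled CQA questions (training set).
cqa_data = [
    ("how to cook borscht at home", "food"),
    ("how long to boil eggs", "food"),
    ("how to install windows on a laptop", "computers"),
    ("why is my laptop so slow", "computers"),
    ("how to treat a cold quickly", "health"),
    ("what to do about a headache", "health"),
]

# Hypothetical web search question queries (transfer target, unlabeled).
search_queries = [
    "how to boil potatoes",
    "why laptop does not turn on",
    "how to treat a sore throat",
]

classifier = NaiveBayesQuestionClassifier().fit(
    [question for question, _ in cqa_data],
    [category for _, category in cqa_data],
)
predictions = [classifier.predict(query) for query in search_queries]
for query, category in zip(search_queries, predictions):
    print(f"{category:10s} {query}")
```

Because the category labels come with the CQA questions, training in this setup needs no click-through data, which is the property the abstract relies on for sparse question queries.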

    Information & Contributors

    Information

    Published In

    CIKM '15: Proceedings of the 24th ACM International on Conference on Information and Knowledge Management
    October 2015
    1998 pages
    ISBN:9781450337946
    DOI:10.1145/2806416

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. community question answering (cqa)
    2. query classification
    3. query log analysis
    4. question queries

    Qualifiers

    • Research-article

    Acceptance Rates

    CIKM '15 paper acceptance rate: 165 of 646 submissions (26%)
    Overall acceptance rate: 1,861 of 8,427 submissions (22%)

    Cited By

    • (2023) Better Understanding Procedural Search Tasks: Perceptions, Behaviors, and Challenges. ACM Transactions on Information Systems 42:3 (1-32). DOI: 10.1145/3630004. Online publication date: 29-Dec-2023
    • (2023) Understanding Procedural Search Tasks “in the Wild”. Proceedings of the 2023 Conference on Human Information Interaction and Retrieval (24-33). DOI: 10.1145/3576840.3578302. Online publication date: 19-Mar-2023
    • (2023) Types of domain and task‐solving information in media scholars' data interaction. Journal of the Association for Information Science and Technology. DOI: 10.1002/asi.24863. Online publication date: 26-Dec-2023
    • (2022) Procedural Knowledge Search by Intelligence Analysts. Proceedings of the 2022 Conference on Human Information Interaction and Retrieval (169-179). DOI: 10.1145/3498366.3505810. Online publication date: 14-Mar-2022
    • (2022) Identifying Argumentative Questions in Web Search Logs. Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval (2393-2399). DOI: 10.1145/3477495.3531864. Online publication date: 6-Jul-2022
    • (2021) Misbeliefs and Biases in Health-Related Searches. Proceedings of the 30th ACM International Conference on Information & Knowledge Management (2894-2899). DOI: 10.1145/3459637.3482141. Online publication date: 26-Oct-2021
    • (2020) Providing Direct Answers in Search Results: A Study of User Behavior. Proceedings of the 29th ACM International Conference on Information & Knowledge Management (1635-1644). DOI: 10.1145/3340531.3412017. Online publication date: 19-Oct-2020
    • (2020) DaNetQA: A Yes/No Question Answering Dataset for the Russian Language. Analysis of Images, Social Networks and Texts (57-68). DOI: 10.1007/978-3-030-72610-2_4. Online publication date: 15-Oct-2020
    • (2020) RuBQ: A Russian Dataset for Question Answering over Wikidata. The Semantic Web – ISWC 2020 (97-110). DOI: 10.1007/978-3-030-62466-8_7. Online publication date: 2-Nov-2020
    • (2020) Topic Modeling in Russia: Current Approaches and Issues in Methodology. The Palgrave Handbook of Digital Russia Studies (409-426). DOI: 10.1007/978-3-030-42855-6_23. Online publication date: 16-Dec-2020
