DOI: 10.1145/3184558.3191597
research-article

Characterising Dataset Search Queries

Published: 23 April 2018

Abstract

The amount of data generated and published on the web is increasing rapidly, but searching for structured data on the web still presents challenges. In this paper we explore dataset search by analysing queries generated specifically for this work through a crowdsourcing experiment and comparing them with a search log analysis of queries on data portals. The change in search environment, together with the task we gave people, altered the generated queries. We found that queries issued in our experiment were much longer than search queries for datasets on data portals. They also contained seven times more mentions of geospatial and temporal information and were more likely to be structured as questions. These insights can be used to tailor search functionalities to the particular information needs and characteristics of dataset search.
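
The query characteristics named in the abstract (length, temporal mentions, question structure) can be illustrated with a minimal sketch. It is not the method used in the paper: the function name `characterise`, the question-word list, and the four-digit-year pattern are assumptions chosen for illustration only. Geospatial mentions are left out because detecting them would need a gazetteer or an entity recogniser.

```python
import re

# Hypothetical heuristics for illustration; not taken from the paper.
QUESTION_WORDS = {"what", "which", "who", "where", "when", "why", "how"}
# Assumption: a temporal mention is approximated by a year between 1800 and 2099.
YEAR_PATTERN = re.compile(r"\b(1[89]\d{2}|20\d{2})\b")


def characterise(query: str) -> dict:
    """Return simple surface features of a single query string."""
    words = query.split()
    first_word = words[0].lower() if words else ""
    return {
        # Query length measured in whitespace-separated words.
        "length_in_words": len(words),
        # True if the query contains something that looks like a year.
        "mentions_year": bool(YEAR_PATTERN.search(query)),
        # True if the query ends with "?" or starts with a question word.
        "is_question": query.rstrip().endswith("?") or first_word in QUESTION_WORDS,
    }


if __name__ == "__main__":
    # Made-up example queries, one keyword-style and one question-style.
    for q in ["uk road accidents 2015",
              "How many people cycled to work in London in 2016?"]:
        print(q, "->", characterise(q))
```

Applying such a function to two sets of queries and comparing the averages of each feature would give a rough, surface-level comparison of the kind the abstract summarises.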

Information & Contributors

Information

Published In

WWW '18: Companion Proceedings of the The Web Conference 2018
April 2018
2023 pages
ISBN: 9781450356404
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

  • IW3C2: International World Wide Web Conference Committee

Publisher

International World Wide Web Conferences Steering Committee

Republic and Canton of Geneva, Switzerland

Author Tags

  1. dataset search
  2. query generation
  3. search log analysis

Qualifiers

  • Research-article

Funding Sources

  • European Union Horizon 2020 program under the Marie Skłodowska-Curie grant agreement

Conference

WWW '18
Sponsor:
  • IW3C2
WWW '18: The Web Conference 2018
April 23 - 27, 2018
Lyon, France

Acceptance Rates

Overall Acceptance Rate 1,899 of 8,196 submissions, 23%

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months): 184
  • Downloads (Last 6 weeks): 35
Reflects downloads up to 17 Jan 2025

Cited By

  • (2024) ACORDAR 2.0: A Test Collection for Ad Hoc Dataset Retrieval with Densely Pooled Datasets and Question-Style Queries. Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 303-312. DOI: 10.1145/3626772.3657866. Online publication date: 10-Jul-2024
  • (2024) Retrievability in an integrated retrieval system: an extended study. International Journal on Digital Libraries 25:2, pages 287-301. DOI: 10.1007/s00799-023-00363-4. Online publication date: 1-Jun-2024
  • (2023) A Taxonomy of Dataset Search. Advances on Intelligent Computing and Data Science, pages 562-573. DOI: 10.1007/978-3-031-36258-3_50. Online publication date: 17-Aug-2023
  • (2022) Studying retrievability of publications and datasets in an integrated retrieval system. Proceedings of the 22nd ACM/IEEE Joint Conference on Digital Libraries, pages 1-9. DOI: 10.1145/3529372.3530931. Online publication date: 20-Jun-2022
  • (2022) VisGNN: Personalized Visualization Recommendation via Graph Neural Networks. Proceedings of the ACM Web Conference 2022, pages 2810-2818. DOI: 10.1145/3485447.3512001. Online publication date: 25-Apr-2022
  • (2021) BANDAR: Benchmarking Snippet Generation Algorithms for (RDF) Dataset Search. IEEE Transactions on Knowledge and Data Engineering, pages 1-1. DOI: 10.1109/TKDE.2021.3095309. Online publication date: 2021
  • (2021) PCSG: Pattern-Coverage Snippet Generation for RDF Datasets. The Semantic Web – ISWC 2021, pages 3-20. DOI: 10.1007/978-3-030-88361-4_1. Online publication date: 24-Oct-2021
  • (2021) Genuine Information Needs of Social Scientists Looking for Data. Proceedings of the Association for Information Science and Technology 58:1, pages 292-302. DOI: 10.1002/pra2.457. Online publication date: 13-Oct-2021
  • (2020) Characteristics of Dataset Retrieval Sessions: Experiences from a Real-Life Digital Library. Digital Libraries for Open Knowledge, pages 185-193. DOI: 10.1007/978-3-030-54956-5_14. Online publication date: 25-Aug-2020
  • (2019) Open Data Chatbot. The Semantic Web: ESWC 2019 Satellite Events, pages 111-115. DOI: 10.1007/978-3-030-32327-1_22. Online publication date: 10-Oct-2019
