
research-article


Characterising Dataset Search Queries

Authors:

Emilia Kacprzak,

Elena Simperl

WWW '18: Companion Proceedings of the The Web Conference 2018

Pages 1485 - 1488

https://doi.org/10.1145/3184558.3191597

Published: 23 April 2018

Abstract

The amount of data generated and published on the web is increasing rapidly, but searching for structured data on the web still presents challenges. In this paper we explore dataset search by analysing queries generated specifically for this work through a crowdsourcing experiment and comparing them to a search log analysis of queries on data portals. The change in search environment, together with the task we gave people, altered the generated queries. We found that queries issued in our experiment were much longer than search queries for datasets on data portals. They also contained seven times more mentions of geospatial and temporal information and were more likely to be structured as questions. These insights can be used to tailor search functionalities to the particular information needs and characteristics of dataset search.
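The query characteristics named in the abstract (length, temporal mentions, question form) can be illustrated with a small sketch. This is not the authors' method: the function name `characterise_query`, the question-word list, and the four-digit-year pattern are assumptions chosen for illustration only, and geospatial mentions are left out because detecting them would need a place-name resource that the abstract does not describe.

```python
import re

# Assumption: these heuristics are illustrative and are not taken from the paper.
QUESTION_WORDS = {"what", "which", "who", "where", "when", "why", "how"}
# Assumption: a temporal mention is approximated by a four-digit year (1900-2099).
YEAR_PATTERN = re.compile(r"\b(?:19|20)\d{2}\b")


def characterise_query(query: str) -> dict:
    """Return a few surface features of a single search query."""
    words = query.lower().split()
    first_word = words[0] if words else ""
    return {
        # Query length measured in whitespace-separated words.
        "length_in_words": len(words),
        # Treated as a question if it starts with a question word or ends with "?".
        "is_question": first_word in QUESTION_WORDS or query.strip().endswith("?"),
        # True if the query contains something that looks like a year.
        "mentions_year": bool(YEAR_PATTERN.search(query)),
    }


if __name__ == "__main__":
    # Hypothetical example queries, not drawn from the study's data.
    for q in ["uk population", "What was the population of Lyon in 2015?"]:
        print(q, characterise_query(q))
```

Applied to a portal log and to a set of crowdsourced queries, averages of such features would allow the kind of comparison the abstract reports, though the thresholds and patterns would need tuning for real data.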



Index Terms

Characterising Dataset Search Queries
1. Information systems
  1. Information retrieval
    1. Information retrieval query processing
      1. Query log analysis


Published In

WWW '18: Companion Proceedings of the The Web Conference 2018

April 2018

2023 pages

ISBN: 9781450356404

General Chairs: Pierre-Antoine Champin (Université Claude Bernard Lyon 1, France), Fabien Gandon (Inria, Université Côte d'Azur, CNRS, I3S, France), Lionel Médini (Université Claude Bernard Lyon 1, CNRS, LIRIS, France)

Program Chairs: Mounia Lalmas (Spotify, UK), Panagiotis G. Ipeirotis (New York University, USA)

Copyright © 2018 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

IW3C2: International World Wide Web Conference Committee

In-Cooperation

SIGWEB: ACM Special Interest Group on Hypertext, Hypermedia, and Web

Publisher

International World Wide Web Conferences Steering Committee

Republic and Canton of Geneva, Switzerland

Funding Sources

European Union Horizon 2020 program under the Marie Skłodowska-Curie grant agreement

Conference

WWW '18

Sponsor:

IW3C2

WWW '18: The Web Conference 2018

April 23 - 27, 2018

Lyon, France

Acceptance Rates

Overall Acceptance Rate 1,899 of 8,196 submissions, 23%

Bibliometrics

Total Citations: 10
Total Downloads: 767
Downloads (Last 12 months): 184
Downloads (Last 6 weeks): 35

Reflects downloads up to 17 Jan 2025

Citations

Cited By

Chen Q, Luo W, Huang Z, Lin T, Wang X, Soylu A, Ell B, Zhou B, Kharlamov E, Cheng G, Hui Yang G, Wang H, Han S, Hauff C, Zuccon G, Zhang Y (2024). ACORDAR 2.0: A Test Collection for Ad Hoc Dataset Retrieval with Densely Pooled Datasets and Question-Style Queries. Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, 303-312. DOI: 10.1145/3626772.3657866. Online publication date: 10-Jul-2024.
https://dl.acm.org/doi/10.1145/3626772.3657866
Roy D, Carevic Z, Mayr P (2024). Retrievability in an integrated retrieval system: an extended study. International Journal on Digital Libraries 25:2, 287-301. DOI: 10.1007/s00799-023-00363-4. Online publication date: 1-Jun-2024.
https://dl.acm.org/doi/10.1007/s00799-023-00363-4
Almuntashiri A, Ibáñez L, Chapman A (2023). A Taxonomy of Dataset Search. Advances on Intelligent Computing and Data Science, 562-573. DOI: 10.1007/978-3-031-36258-3_50. Online publication date: 17-Aug-2023.
https://doi.org/10.1007/978-3-031-36258-3_50
Roy D, Carevic Z, Mayr P, Aizawa A, Mandl T, Carevic Z, Hinze A, Mayr P, Schaer P (2022). Studying retrievability of publications and datasets in an integrated retrieval system. Proceedings of the 22nd ACM/IEEE Joint Conference on Digital Libraries, 1-9. DOI: 10.1145/3529372.3530931. Online publication date: 20-Jun-2022.
https://dl.acm.org/doi/10.1145/3529372.3530931
Ojo F, Rossi R, Hoffswell J, Guo S, Du F, Kim S, Xiao C, Koh E (2022). VisGNN: Personalized Visualization Recommendation via Graph Neural Networks. Proceedings of the ACM Web Conference 2022, 2810-2818. DOI: 10.1145/3485447.3512001. Online publication date: 25-Apr-2022.
https://doi.org/10.1145/3485447.3512001
Wang X, Cheng G, Pan J, Kharlamov E, Qu Y (2021). BANDAR: Benchmarking Snippet Generation Algorithms for (RDF) Dataset Search. IEEE Transactions on Knowledge and Data Engineering, 1-1. DOI: 10.1109/TKDE.2021.3095309. Online publication date: 2021.
https://doi.org/10.1109/TKDE.2021.3095309
Wang X, Cheng G, Lin T, Xu J, Pan J, Kharlamov E, Qu Y (2021). PCSG: Pattern-Coverage Snippet Generation for RDF Datasets. The Semantic Web – ISWC 2021, 3-20. DOI: 10.1007/978-3-030-88361-4_1. Online publication date: 24-Oct-2021.
https://dl.acm.org/doi/10.1007/978-3-030-88361-4_1
Papenmeier A, Krämer T, Friedrich T, Hienert D, Kern D (2021). Genuine Information Needs of Social Scientists Looking for Data. Proceedings of the Association for Information Science and Technology 58:1, 292-302. DOI: 10.1002/pra2.457. Online publication date: 13-Oct-2021.
https://doi.org/10.1002/pra2.457
Carevic Z, Roy D, Mayr P (2020). Characteristics of Dataset Retrieval Sessions: Experiences from a Real-Life Digital Library. Digital Libraries for Open Knowledge, 185-193. DOI: 10.1007/978-3-030-54956-5_14. Online publication date: 25-Aug-2020.
https://dl.acm.org/doi/10.1007/978-3-030-54956-5_14
Keyner S, Savenkov V, Vakulenko S (2019). Open Data Chatbot. The Semantic Web: ESWC 2019 Satellite Events, 111-115. DOI: 10.1007/978-3-030-32327-1_22. Online publication date: 10-Oct-2019.
https://doi.org/10.1007/978-3-030-32327-1_22
