DOI: 10.1145/3576840.3578275
research-article
Open access

Direct, Orienting, and Scenic Paths: How Users Navigate Search in a Research Data Archive

Published: 20 March 2023

Abstract

Social scientists increasingly share data so others can evaluate, replicate, and extend their research. To understand the process of data discovery as a precursor to data use, we study prospective users’ interactions with archived data. We gathered data for 98,000 user sessions initiated at a large social science data archive, the Inter-university Consortium for Political and Social Research (ICPSR). Our data reflect four years (2012-16) of users’ interactions with archival resources, including a data catalog, study-level metadata, variables, and publications that cite nearly 10,000 datasets. We constructed a network of user interactions linking website landing (e.g., site entrances) to exit pages, from which we identified three types of paths that users take through the research data archive: direct, orienting, and scenic. We also interpreted points of failure (e.g., drop-offs) and recurring behaviors (e.g., sensemaking) that support or impede data discovery along search paths. We articulate strategies that users adopt as they navigate data search and suggest ways to enhance the accessibility of data, metadata, and the systems that organize each.
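
The network construction described in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the session data, the page-type labels (`home`, `search`, `study`, `variables`, `download`), and the function name `build_network` are all hypothetical, chosen only to show how ordered page visits can be turned into weighted transitions and landing-to-exit pairs.

```python
from collections import Counter

# Hypothetical sessions: each is the ordered list of page types one user visited.
# These labels are illustrative assumptions, not categories taken from the paper.
sessions = [
    ["search", "study", "download"],
    ["home", "search", "study", "variables", "study", "download"],
    ["study"],
    ["search", "study", "download"],
]


def build_network(sessions):
    """Count page-to-page transitions (edges) and landing-to-exit pairs."""
    edges = Counter()
    landing_exit = Counter()
    for pages in sessions:
        if not pages:
            continue  # skip empty sessions
        # The first page is the landing page; the last page is the exit page.
        landing_exit[(pages[0], pages[-1])] += 1
        # Each consecutive pair of pages is one directed transition.
        for src, dst in zip(pages, pages[1:]):
            edges[(src, dst)] += 1
    return edges, landing_exit


edges, landing_exit = build_network(sessions)
```

The edge counts give a weighted directed graph of user movement, and the landing-to-exit counts summarize where sessions start and end. How such paths would then be grouped into direct, orienting, and scenic types is not specified in the abstract, so the sketch stops short of that step.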

Published In

CHIIR '23: Proceedings of the 2023 Conference on Human Information Interaction and Retrieval
March 2023
520 pages
ISBN: 9798400700354
DOI: 10.1145/3576840
Editors: Jacek Gwizdka, Soo Young Rieh

This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery, New York, NY, United States

Author Tags

  1. data reuse
  2. information access
  3. information seeking
  4. user behaviors
  5. web analytics

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

CHIIR '23

Acceptance Rates

Overall Acceptance Rate 55 of 163 submissions, 34%

Article Metrics

  • Downloads (last 12 months): 297
  • Downloads (last 6 weeks): 35

Reflects downloads up to 18 Jan 2025.

Cited By

  • (2024) Exploratory and directed search strategies at a social science data archive. IASSIST Quarterly 48:1. DOI: 10.29173/iq1087. Online publication date: 28-Mar-2024.
  • (2024) Re-use of research data in the social sciences. Use and users of digital data archive. PLOS ONE 19:5 (e0303190). DOI: 10.1371/journal.pone.0303190. Online publication date: 10-May-2024.
  • (2024) Does the use of unusual combinations of datasets contribute to greater scientific impact? Proceedings of the National Academy of Sciences 121:41. DOI: 10.1073/pnas.2402802121. Online publication date: 2-Oct-2024.
  • (2024) Comparative Analysis: User Interactions in Public and Private Digital Libraries Datasets. Linking Theory and Practice of Digital Libraries (162-172). DOI: 10.1007/978-3-031-72440-4_16. Online publication date: 24-Sep-2024.
  • (2023) DataChat: Prototyping a Conversational Agent for Dataset Search and Visualization. Proceedings of the Association for Information Science and Technology 60:1 (586-591). DOI: 10.1002/pra2.820. Online publication date: 22-Oct-2023.
