Research article · DOI: 10.1145/3477495.3531748 · ACM SIGIR Conference Proceedings

From Little Things Big Things Grow: A Collection with Seed Studies for Medical Systematic Review Literature Search

Published: 07 July 2022

Abstract

Medical systematic review query formulation is a highly complex task done by trained information specialists. Complexity comes from the reliance on lengthy Boolean queries, which express a detailed research question. To aid query formulation, information specialists use a set of exemplar documents, called 'seed studies', prior to query formulation. Seed studies help verify the effectiveness of a query prior to the full assessment of retrieved studies. Beyond this use of seeds, specific IR methods can exploit seed studies for guiding both automatic query formulation and new retrieval models. One major limitation of work to date is that these methods exploit 'pseudo seed studies' through retrospective use of included studies (i.e., relevance assessments). However, we show pseudo seed studies are not representative of real seed studies used by information specialists. Hence, we provide a test collection with real-world seed studies used to assist with the formulation of queries. To support our collection, we provide an analysis, previously not possible, on how seed studies impact retrieval and perform several experiments using seed-study-based methods to compare the effectiveness of using seed studies versus pseudo seed studies. We make our test collection and the results of all of our experiments and analysis available at http://github.com/ielab/sysrev-seed-collection.




Published In

SIGIR '22: Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval
July 2022, 3569 pages
ISBN: 9781450387323
DOI: 10.1145/3477495

Publication rights licensed to ACM. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of a national government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

Publisher

Association for Computing Machinery, New York, NY, United States


Author Tags

1. information retrieval evaluation
2. seed studies
3. systematic reviews creation
4. test collection


Funding Sources

• Australian Research Council

Conference

SIGIR '22

Acceptance Rates

Overall acceptance rate: 792 of 3,983 submissions, 20%

Article Metrics

• Downloads (last 12 months): 68
• Downloads (last 6 weeks): 8

Reflects downloads up to 12 Jan 2025.

Cited By
• (2024) A Reproducibility and Generalizability Study of Large Language Models for Query Generation. In Proceedings of the 2024 Annual International ACM SIGIR Conference on Research and Development in Information Retrieval in the Asia Pacific Region, 186–196. DOI: 10.1145/3673791.3698432. Online publication date: 8 Dec 2024.
• (2024) Dense Retrieval with Continuous Explicit Feedback for Systematic Review Screening Prioritisation. In Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2357–2362. DOI: 10.1145/3626772.3657921. Online publication date: 10 Jul 2024.
• (2023) Generating Natural Language Queries for More Effective Systematic Review Screening Prioritisation. In Proceedings of the Annual International ACM SIGIR Conference on Research and Development in Information Retrieval in the Asia Pacific Region, 73–83. DOI: 10.1145/3624918.3625322. Online publication date: 26 Nov 2023.
• (2023) pybool_ir: A Toolkit for Domain-Specific Search Experiments. In Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, 3190–3194. DOI: 10.1145/3539618.3591819. Online publication date: 19 Jul 2023.
• (2023) Smooth Operators for Effective Systematic Review Queries. In Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, 580–590. DOI: 10.1145/3539618.3591768. Online publication date: 19 Jul 2023.
• (2023) SciMine: An Efficient Systematic Prioritization Model Based on Richer Semantic Information. In Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, 205–215. DOI: 10.1145/3539618.3591764. Online publication date: 19 Jul 2023.
• (2023) Can ChatGPT Write a Good Boolean Query for Systematic Review Literature Search? In Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, 1426–1436. DOI: 10.1145/3539618.3591703. Online publication date: 19 Jul 2023.
• (2023) MeSH Suggester: A Library and System for MeSH Term Suggestion for Systematic Review Boolean Query Construction. In Proceedings of the Sixteenth ACM International Conference on Web Search and Data Mining, 1176–1179. DOI: 10.1145/3539597.3573025. Online publication date: 27 Feb 2023.
• (2023) Phage Therapy in the Management of Urinary Tract Infections: A Comprehensive Systematic Review. PHAGE, Vol. 4, 3, 112–127. DOI: 10.1089/phage.2023.0024. Online publication date: 1 Sep 2023.
• (2022) Neural Rankers for Effective Screening Prioritisation in Medical Systematic Review Literature Search. In Proceedings of the 26th Australasian Document Computing Symposium, 1–10. DOI: 10.1145/3572960.3572980. Online publication date: 15 Dec 2022.
