DOI: 10.1145/3084226.3084243
research-article

A Machine Learning Approach for Semi-Automated Search and Selection in Literature Studies

Published: 15 June 2017

Abstract

Background. Search and selection of primary studies in Systematic Literature Reviews (SLRs) are labour intensive, and hard to replicate and update. Aims. We explore a machine learning approach to support semi-automated search and selection in SLRs to address these weaknesses. Method. We 1) train a classifier on an initial set of papers, 2) extend this set of papers by automated search and snowballing, 3) have the researcher validate the top paper selected by the classifier, and 4) update the set of papers and iterate the process until a stopping criterion is met. Results. We demonstrate with a proof-of-concept tool that the proposed automated search and selection approach generates valid search strings and that, for subsets of primary studies, its performance can reduce the manual work by half. Conclusions. The approach is promising, and the demonstrated advantages include cost savings and replicability. The next steps include further tool development and evaluation of the approach on a complete SLR.
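The four steps of the method can be sketched as a small loop. This is only an illustrative sketch under several assumptions, not the authors' proof-of-concept tool: the corpus `PAPERS`, the citation links `CITATIONS`, the word-count scorer standing in for the classifier, and the `patience` stopping rule (stop after a number of consecutive rejections) are all hypothetical, since the abstract does not specify them. Automated search-string generation is not modelled; only snowballing extends the candidate pool.

```python
import math
from collections import Counter

# Hypothetical toy corpus (paper id -> title); illustrative data only.
PAPERS = {
    "p1": "machine learning for systematic literature review screening",
    "p2": "active learning to reduce screening workload in systematic reviews",
    "p3": "snowballing versus database search in literature studies",
    "p4": "deep learning for image segmentation in medical imaging",
    "p5": "compiler optimization for embedded processors",
    "p6": "text mining for citation screening in systematic reviews",
}
# Hypothetical citation links (paper id -> ids it cites), used for snowballing.
CITATIONS = {"p1": ["p2", "p4"], "p2": ["p3", "p6"], "p4": ["p5"]}


def tokenize(text):
    return text.lower().split()


def train(included, excluded):
    """Step 1: a toy stand-in classifier built from word counts."""
    rel = Counter(w for pid in included for w in tokenize(PAPERS[pid]))
    irr = Counter(w for pid in excluded for w in tokenize(PAPERS[pid]))
    return rel, irr


def score(model, pid):
    """Smoothed log-ratio of word counts in included vs. excluded papers."""
    rel, irr = model
    return sum(math.log((rel[w] + 1) / (irr[w] + 1)) for w in tokenize(PAPERS[pid]))


def snowball(pid):
    """Step 2: papers linked to pid in either citation direction."""
    forward = CITATIONS.get(pid, [])
    backward = [p for p, refs in CITATIONS.items() if pid in refs]
    return forward + backward


def semi_automated_selection(seed_ids, validate, patience=3):
    included, excluded, screened, pool = list(seed_ids), [], [], []
    seen = set(seed_ids)

    def extend(pid):
        for neighbour in snowball(pid):
            if neighbour not in seen:
                seen.add(neighbour)
                pool.append(neighbour)

    for pid in seed_ids:
        extend(pid)

    misses = 0
    # Step 4: iterate until the (assumed) stopping criterion is met.
    while pool and misses < patience:
        model = train(included, excluded)
        top = max(pool, key=lambda pid: score(model, pid))
        pool.remove(top)
        screened.append(top)
        # Step 3: the researcher validates the top-ranked paper.
        if validate(top):
            included.append(top)
            misses = 0
            extend(top)  # only accepted papers are snowballed further
        else:
            excluded.append(top)
            misses += 1
    return included, screened


def is_relevant(pid):
    """Hypothetical researcher decision, hard-coded for the toy corpus."""
    return pid in {"p1", "p2", "p3", "p6"}


if __name__ == "__main__":
    included, screened = semi_automated_selection(["p1"], is_relevant)
    print("included:", included)
    print("screened:", screened)
```

In this toy run the researcher is never asked about `p5`, because it is reachable only through a rejected paper. That is the kind of saving in manual work the abstract refers to, although the size of any saving on real data depends on the classifier and the stopping criterion.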

Published In

EASE '17: Proceedings of the 21st International Conference on Evaluation and Assessment in Software Engineering
June 2017
405 pages
ISBN:9781450348041
DOI:10.1145/3084226

In-Cooperation

  • School of Computing, BTH: Blekinge Institute of Technology - School of Computing

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Automation
  2. Machine learning
  3. Reinforcement learning
  4. Research identification
  5. Study selection
  6. Systematic literature review

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

EASE'17

Acceptance Rates

Overall Acceptance Rate 71 of 232 submissions, 31%

Bibliometrics

  • Downloads (last 12 months): 149
  • Downloads (last 6 weeks): 21

Reflects downloads up to 21 Dec 2024.

Cited By

  • (2024) Combining Semantic Matching, Word Embeddings, Transformers, and LLMs for Enhanced Document Ranking: Application in Systematic Reviews. Big Data and Cognitive Computing 8:9 (110). DOI: 10.3390/bdcc8090110. Online publication date: 4-Sep-2024
  • (2024) The SAFE procedure: a practical stopping heuristic for active learning-based screening in systematic reviews and meta-analyses. Systematic Reviews 13:1. DOI: 10.1186/s13643-024-02502-7. Online publication date: 1-Mar-2024
  • (2024) Discovery of alkaline laccases from basidiomycete fungi through machine learning-based approach. Biotechnology for Biofuels and Bioproducts 17:1. DOI: 10.1186/s13068-024-02566-6. Online publication date: 11-Sep-2024
  • (2024) A systematic review of peer support interventions to improve psychosocial functioning among cancer survivors: can findings be translated to survivors with a rare cancer living rurally? Orphanet Journal of Rare Diseases 19:1. DOI: 10.1186/s13023-024-03477-3. Online publication date: 20-Dec-2024
  • (2024) Learning from Very Little Data: On the Value of Landscape Analysis for Predicting Software Project Health. ACM Transactions on Software Engineering and Methodology 33:3 (1-22). DOI: 10.1145/3630252. Online publication date: 14-Mar-2024
  • (2024) Approaches, enablers and barriers to govern the private sector in health in low- and middle-income countries: a scoping review. BMJ Global Health 8:Suppl 5 (e015771). DOI: 10.1136/bmjgh-2024-015771. Online publication date: 13-Nov-2024
  • (2024) How to Best Measure Academic Dishonesty in Students. European Journal of Psychological Assessment. DOI: 10.1027/1015-5759/a000861. Online publication date: 18-Nov-2024
  • (2024) Digital Solutions to Optimize Guideline-Directed Medical Therapy Prescriptions in Heart Failure Patients: Current Applications and Future Directions. Current Heart Failure Reports 21:2 (147-161). DOI: 10.1007/s11897-024-00649-x. Online publication date: 16-Feb-2024
  • (2024) Screening Smarter, Not Harder: A Comparative Analysis of Machine Learning Screening Algorithms and Heuristic Stopping Criteria for Systematic Reviews in Educational Research. Educational Psychology Review 36:1. DOI: 10.1007/s10648-024-09862-5. Online publication date: 8-Feb-2024
  • (2024) Towards the automation of systematic reviews using natural language processing, machine learning, and deep learning: a comprehensive review. Artificial Intelligence Review 57:8. DOI: 10.1007/s10462-024-10844-w. Online publication date: 9-Jul-2024
