Optimal protospacer sequences recommended by ensemble deep learning for high-efficiency base editing

Published: 16 December 2024

Abstract

Identifying an optimal protospacer within a disease-associated gene is critical for designing a high-efficiency base editor that edits the gene with minimal bystander effect. Current machine learning methods tend to overestimate the editing efficiencies of low-efficiency editors, which leads to incorrect recommendations of high-efficiency protospacers. They also predict editing efficiency and outcome proportion separately with two independent models, which produces inconsistent performance between the two predictions and confounds the identification of optimal protospacers. We propose an ensemble of paired convolutional neural networks that accurately predicts outcome proportions, and we derive editing efficiencies directly from those proportions. Our method significantly reduces the performance inconsistency between editing-efficiency and outcome-proportion predictions caused by the two-model approach, and it generalizes well across a range of editing platforms. Furthermore, our method recommends optimal protospacers by ranking candidates according to a newly defined picking score, a harmonic index that integrates on-target editing efficiency and bystander editing effect.
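To show why deriving efficiency from predicted proportions avoids the two-model inconsistency, here is a minimal sketch. It assumes that the outcome list includes the unedited (wild-type) outcome, that ensemble members are combined by simple averaging, and that editing efficiency equals one minus the wild-type proportion. The function names and the numbers are hypothetical, and the arrays stand in for the outputs of the paired CNNs, which are not modelled here. This is not the authors' implementation.

```python
import numpy as np


def ensemble_proportions(member_preds):
    """Average the outcome-proportion vectors predicted by several ensemble members.

    member_preds: array-like of shape (n_models, n_outcomes). Each row is one
    member's predicted distribution over the same ordered list of outcomes.
    """
    preds = np.asarray(member_preds, dtype=float)
    mean = preds.mean(axis=0)
    # Renormalise so the averaged proportions still sum to 1.
    return mean / mean.sum()


def efficiency_from_proportions(proportions, wild_type_index):
    """Editing efficiency taken as 1 minus the proportion of unedited reads.

    Assumption: the outcome list contains the unedited, wild-type outcome
    at position `wild_type_index`.
    """
    return 1.0 - float(proportions[wild_type_index])


# Hypothetical predictions from three ensemble members for one protospacer.
# Outcome 0 is unedited; outcomes 1 and 2 are two different edited alleles.
members = [
    [0.60, 0.30, 0.10],
    [0.50, 0.35, 0.15],
    [0.55, 0.35, 0.10],
]
p = ensemble_proportions(members)
eff = efficiency_from_proportions(p, wild_type_index=0)
print(p.round(3), round(eff, 3))
```

Because the efficiency is read off the same distribution as the outcome proportions, the two quantities cannot contradict each other. Two independently trained models, by contrast, can predict a high efficiency for a protospacer whose predicted outcomes are almost entirely unedited.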
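The abstract describes the picking score only as a harmonic index that combines on-target efficiency and bystander effect. The sketch below illustrates that idea under a loudly labelled assumption: the score is taken as the harmonic mean of editing efficiency and "purity", defined here as the share of edited reads that carry exactly the intended edit and no bystander edit. The `Candidate` class, `picking_score`, `rank_candidates`, and all values are hypothetical and serve only as illustration. They are not the paper's formula.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str                # hypothetical label for a candidate protospacer
    efficiency: float        # predicted fraction of reads with any edit (0..1)
    desired_fraction: float  # predicted fraction of reads with exactly the intended edit


def picking_score(c: Candidate) -> float:
    """Harmonic mean of editing efficiency and bystander-free purity.

    Assumption (illustrative only): purity = desired_fraction / efficiency,
    i.e. the share of edited reads that carry no bystander edit.
    """
    if c.efficiency <= 0.0:
        return 0.0
    purity = min(c.desired_fraction / c.efficiency, 1.0)
    # efficiency > 0 here, so the denominator is strictly positive.
    return 2.0 * c.efficiency * purity / (c.efficiency + purity)


def rank_candidates(candidates):
    """Return candidates sorted from highest to lowest picking score."""
    return sorted(candidates, key=picking_score, reverse=True)


candidates = [
    Candidate("spacer_A", efficiency=0.80, desired_fraction=0.40),  # efficient, many bystanders
    Candidate("spacer_B", efficiency=0.50, desired_fraction=0.45),  # moderate, clean
    Candidate("spacer_C", efficiency=0.20, desired_fraction=0.20),  # clean, inefficient
]
for c in rank_candidates(candidates):
    print(c.name, round(picking_score(c), 3))
```

A harmonic mean penalises candidates that are strong on only one axis. In this example, spacer_A has the highest efficiency but half of its edited reads carry bystander edits, so it ranks below spacer_B.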

Supplemental Material

ZIP File - Optimal protospacer sequences recommended by ensemble deep learning for high-efficiency base editing

Published In

BCB '24: Proceedings of the 15th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics
November 2024
614 pages
ISBN: 9798400713026
DOI: 10.1145/3698587
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

1. Base editing
2. Ensemble Paired Convolutional Neural Network
3. Picking score

Qualifiers

• Research-article
• Research
• Refereed limited

Funding Sources

• start-up funding grant at Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences
• National Innovation Fellow Program of the MOST of China
• MOE AcRF Tier 2 award
• MOE AcRF Tier 1 award

Conference

BCB '24

Acceptance Rates

Overall Acceptance Rate 254 of 885 submissions, 29%
