DOI: 10.1145/3620665.3640417
research-article

An Encoding Scheme to Enlarge Practical DNA Storage Capacity by Reducing Primer-Payload Collisions

Published: 27 April 2024

Abstract

Deoxyribonucleic acid (DNA), with its ultra-high storage density and long durability, is a promising medium for long-term archival storage and is attracting considerable attention. A DNA storage system encodes digital data into synthetic DNA sequences for storage and decodes those sequences back into digital data via sequencing. Many encoding schemes have been proposed to enlarge DNA storage capacity by increasing encoding density. However, increasing encoding density alone is insufficient, because enlarging DNA storage capacity is a multifaceted problem.
This paper assumes that random access is necessary for practical DNA archival storage. We identify all factors affecting DNA storage capacity under current technologies and systematically investigate the practical DNA storage capacity achieved by several popular encoding schemes. The investigation shows that collisions between primers and DNA payload sequences are a major factor limiting DNA storage capacity. Based on this finding, we design a new encoding scheme, Collision Aware Code (CAC), which trades some encoding density for a reduction in primer-payload collisions. Compared with the best result among the five existing encoding schemes, CAC frees 120% more primers from collisions and increases the DNA tube capacity from 211.96 GB to 295.11 GB. We also evaluate CAC's recoverability from DNA storage errors; the results show that it is comparable to that of existing encoding schemes.
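To make the idea of a primer-payload collision concrete, the following is a minimal Python sketch. It is not the paper's method: the abstract does not define a collision precisely, so the sketch assumes a primer collides with a payload when some payload window of the primer's length is within `max_mismatches` Hamming distance of the primer or its reverse complement. The function names, the threshold parameter, and the example sequences are all hypothetical.

```python
# Illustrative sketch only. The collision rule below (Hamming distance to the
# primer or its reverse complement) is an assumption, not the paper's definition.

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence over A, C, G, T."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def hamming(a: str, b: str) -> int:
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def collides(primer: str, payload: str, max_mismatches: int = 0) -> bool:
    """Check whether any payload window is close to the primer or its reverse complement."""
    targets = (primer, reverse_complement(primer))
    k = len(primer)
    for i in range(len(payload) - k + 1):
        window = payload[i:i + k]
        if any(hamming(window, t) <= max_mismatches for t in targets):
            return True
    return False


def usable_primers(primers, payloads, max_mismatches=0):
    """Keep only the primers that collide with none of the payloads."""
    return [
        p for p in primers
        if not any(collides(p, s, max_mismatches) for s in payloads)
    ]


if __name__ == "__main__":
    payloads = ["TTACGTTGAA"]
    print(usable_primers(["ACGTTG", "GGATCC"], payloads))
```

Under this assumed rule, every primer that collides with a stored payload is unavailable for random access, which is why reducing collisions can raise usable capacity even at a lower encoding density.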

      Published In

      ASPLOS '24: Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2
      April 2024
      1299 pages
      ISBN: 9798400703850
      DOI: 10.1145/3620665
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. DNA storage
      2. DNA encoding scheme
      3. primer-payload collision

      Qualifiers

      • Research-article

      Conference

      ASPLOS '24

      Acceptance Rates

      Overall Acceptance Rate 535 of 2,713 submissions, 20%

      Bibliometrics

      Article Metrics

      • Total Citations: 0
      • Total Downloads: 220
      • Downloads (Last 12 months): 220
      • Downloads (Last 6 weeks): 26
      Reflects downloads up to 11 Dec 2024