A performance analysis of genome search by matching whole targeted reads on different environments

Jaehee Jung¹ & Gangman Yi²

Abstract

The growth of next-generation sequencing (NGS) data, driven by advances in computing power, has made automated analysis systems increasingly desirable. Automatically predicting genes in unknown sequences requires several pipeline steps. The first step is the acquisition of NGS fragment reads, followed by assembly of fragment reads of 100 bp to 10 Kbp. When fragment reads of sufficient size can be assembled accurately, a de novo assembler is used to construct the whole genome. When the reads are inaccurate or short, however, they are assembled on the basis of overlaps with reference sequences instead of with a de novo assembler. The next step is the prediction of genes in the assembled contigs. Genes are annotated by matching candidate sequences against reference sequences. Each processing step requires inputs and outputs in a different format; hence, data files of several formats must be managed. To reduce these redundant processes, we herein propose an approach referred to as the genome search system. This system automatically identifies genes from assembled sequences and reference amino acid sequences. A challenge with this approach is that running BLAST and analyzing the results for each gene are computationally intensive; hence, the system reduces the hardware resources used to process whole assembled reads. This helps improve performance and shorten the execution time needed to identify genes. Based on this result, this study reviews this approach to identifying genes and compares its performance in different system environments.
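The matching step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation; it assumes that reference amino acid sequences were searched against the assembled contigs with BLAST and that the results were saved in the standard 12-column tabular format (`-outfmt 6`). The names `Hit` and `best_hits` and the default E-value cutoff of `1e-5` are hypothetical choices made for illustration only.

```python
from typing import Dict, Iterable, NamedTuple


class Hit(NamedTuple):
    gene: str        # query ID: reference amino acid sequence (assumed role)
    contig: str      # subject ID: assembled contig (assumed role)
    identity: float  # percent identity (column 3)
    evalue: float    # expectation value (column 11)
    bitscore: float  # bit score (column 12)


def best_hits(lines: Iterable[str], max_evalue: float = 1e-5) -> Dict[str, Hit]:
    """Keep, for each reference gene, the single hit with the highest bit score.

    `lines` are rows of BLAST tabular output (-outfmt 6). The E-value cutoff
    is an illustrative assumption, not a value taken from the article.
    """
    best: Dict[str, Hit] = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment rows
        fields = line.split("\t")
        if len(fields) < 12:
            continue  # not a complete 12-column row
        hit = Hit(
            gene=fields[0],
            contig=fields[1],
            identity=float(fields[2]),
            evalue=float(fields[10]),
            bitscore=float(fields[11]),
        )
        if hit.evalue > max_evalue:
            continue  # too weak to count as a match
        current = best.get(hit.gene)
        if current is None or hit.bitscore > current.bitscore:
            best[hit.gene] = hit
    return best
```

Calling `best_hits(open("results.tsv"))` on a results file (the file name is hypothetical) returns one entry per reference gene that has at least one hit passing the cutoff. Because rows are processed one at a time, memory use depends on the number of reference genes rather than on the size of the BLAST output.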

Acknowledgements

This work was supported by National Research Foundation of Korea (NRF) grants funded by the Korea government (MSIP) (NRF-2016R1C1B1007929, NRF-2016R1D1A1A09919318), the Hongik University Research Fund of 2018, and the Dongguk University Research Fund of 2016.

Author information

Authors and Affiliations

Department of General Education, Hongik University, Seoul, 04066, Korea
Jaehee Jung
Department of Multimedia Engineering, Dongguk University, Seoul, 04620, Korea
Gangman Yi

Corresponding author

Correspondence to Gangman Yi.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Additional information

Communicated by X. Wang, A.K. Sangaiah, M. Pelillo.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Jung, J., Yi, G. A performance analysis of genome search by matching whole targeted reads on different environments. Soft Comput 23, 9153–9160 (2019). https://doi.org/10.1007/s00500-018-3573-3

Published: 16 October 2018
Issue Date: October 2019
DOI: https://doi.org/10.1007/s00500-018-3573-3