Sardaraz et al., 2016 - Google Patents

Advances in high throughput DNA sequence data compression

Sardaraz et al., 2016

Document ID: 2042151735877119476
Author: Sardaraz M; Tahir M; Ikram A
Publication year: 2016
Publication venue: Journal of bioinformatics and computational biology

External Links

Cited by

Snippet

Advances in high throughput sequencing technologies and reduction in cost of sequencing have led to exponential growth in high throughput DNA sequence data. This growth has posed challenges such as storage, retrieval, and transmission of sequencing data. Data …

Continue reading at www.worldscientific.com (other versions)

238000007906 compression 0 title abstract description 143

Classifications

- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/30286—Information retrieval; Database structures therefor; File system structures therefor in structured data stores
- G06F17/30312—Storage and indexing structures; Management thereof
- G06F17/30321—Indexing structures
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/30067—File systems; File servers
- G06F17/30129—Details of further file system functionalities
- G06F17/3015—Redundancy elimination performed by the file system
- G06F17/30156—De-duplication implemented within the file system, e.g. based on file segments
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F19/00—Digital computing or data processing equipment or methods, specially adapted for specific applications
- G06F19/10—Bioinformatics, i.e. methods or systems for genetic or protein-related data processing in computational molecular biology
- G06F19/22—Bioinformatics, i.e. methods or systems for genetic or protein-related data processing in computational molecular biology for sequence comparison involving nucleotides or amino acids, e.g. homology search, motif or SNP [Single-Nucleotide Polymorphism] discovery or sequence alignment
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/30286—Information retrieval; Database structures therefor; File system structures therefor in structured data stores
- G06F17/30289—Database design, administration or maintenance
- G06F17/30303—Improving data quality; Data cleansing
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/30067—File systems; File servers
- G06F17/30129—Details of further file system functionalities
- G06F17/3015—Redundancy elimination performed by the file system
- G06F17/30153—Redundancy elimination performed by the file system using compression, e.g. sparse files
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/3061—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F17/30613—Indexing
- G06F17/30619—Indexing indexing structures
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F19/00—Digital computing or data processing equipment or methods, specially adapted for specific applications
- G06F19/10—Bioinformatics, i.e. methods or systems for genetic or protein-related data processing in computational molecular biology
- G06F19/14—Bioinformatics, i.e. methods or systems for genetic or protein-related data processing in computational molecular biology for phylogeny or evolution, e.g. evolutionarily conserved regions determination or phylogenetic tree construction
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F19/00—Digital computing or data processing equipment or methods, specially adapted for specific applications
- G06F19/10—Bioinformatics, i.e. methods or systems for genetic or protein-related data processing in computational molecular biology
- G06F19/28—Bioinformatics, i.e. methods or systems for genetic or protein-related data processing in computational molecular biology for programming tools or database systems, e.g. ontologies, heterogeneous data integration, data warehousing or computing architectures
- H—ELECTRICITY
- H03—BASIC ELECTRONIC CIRCUITRY
- H03M—CODING; DECODING; CODE CONVERSION IN GENERAL
- H03M7/00—Conversion of a code where information is represented by a given sequence or number of digits to a code where the same information or similar information or a subset of information is represented by a different sequence or number of digits
- H03M7/30—Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
- H03M7/40—Conversion to or from variable length codes, e.g. Shannon-Fano code, Huffman code, Morse code
- H03M7/4031—Fixed length to variable length coding
- H—ELECTRICITY
- H03—BASIC ELECTRONIC CIRCUITRY
- H03M—CODING; DECODING; CODE CONVERSION IN GENERAL
- H03M7/00—Conversion of a code where information is represented by a given sequence or number of digits to a code where the same information or similar information or a subset of information is represented by a different sequence or number of digits
- H03M7/30—Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
- H03M7/3084—Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction using adaptive string matching, e.g. the Lempel-Ziv method

Similar Documents

Publication	Publication Date	Title
Wandelt et al.	2014	Trends in genome compression
Sardaraz et al.	2016	Advances in high throughput DNA sequence data compression
Wandelt et al.	2013	FRESCO: Referential compression of highly similar sequences
Kuruppu et al.	2011	Iterative dictionary construction for compression of large DNA data sets
Deorowicz et al.	2011	Robust relative compression of genomes with random access
Kuruppu et al.	2011	Optimized relative Lempel-Ziv compression of genomes
Bonfield et al.	2013	Compression of FASTQ and SAM format sequencing data
Wandelt et al.	2013	RCSI: Scalable similarity search in thousand (s) of genomes
Bakr et al.	2013	DNA lossless compression algorithms
Saha et al.	2016	NRGC: a novel referential genome compression algorithm
Najam et al.	2019	Pattern matching for DNA sequencing data using multiple bloom filters
Sardaraz et al.	2014	SeqCompress: An algorithm for biological sequence compression
Yao et al.	2019	HRCM: an efficient hybrid referential compression method for genomic big data
Grabowski et al.	2022	MBGC: multiple bacteria genome compressor
Elnady et al.	2022	Hadc: A hybrid compression approach for dna sequences
Selva et al.	2013	SRComp: short read sequence compression using burstsort and Elias omega coding
Pizzi et al.	2008	Fast profile matching algorithms—A survey
Law	2019	Application of signal processing for DNA sequence compression
Gilmary et al.	2021	Compression techniques for dna sequences: A thematic review
Kumar et al.	2018	WBFQC: A new approach for compressing next-generation sequencing data splitting into homogeneous streams
Kumar et al.	2023	A new efficient referential genome compression technique for FastQ files
Zhang et al.	2020	Efficient Search Over Genomic Short Read Data
Gupta et al.	2011	A novel approach for compressing DNA sequences using semi-statistical compressor
Neha et al.	2019	Towards context-aware DNA sequence compression algorithms
Sun et al.	2023	PMFFRC: a large-scale genomic short reads compression optimizer via memory modeling and redundant clustering