Deorowicz et al., 2022 - Google Patents
AGC: Compact representation of assembled genomes
- Document ID
- 16550133715318327664
- Author
- Deorowicz S
- Danek A
- Li H
- Publication year
- 2022
- Publication venue
- bioRxiv
Snippet
High-quality sequence assembly is the ultimate representation of complete genetic information of an individual. Several ongoing pangenome projects are producing collections of high-quality assemblies of various species. Here, we show how to represent the …
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/21—Text processing
- G06F17/22—Manipulating or registering by use of codes, e.g. in sequence of text characters
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/30286—Information retrieval; Database structures therefor; File system structures therefor in structured data stores
- G06F17/30312—Storage and indexing structures; Management thereof
- G06F17/30321—Indexing structures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/3061—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F17/30613—Indexing
- G06F17/30619—Indexing indexing structures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F19/00—Digital computing or data processing equipment or methods, specially adapted for specific applications
- G06F19/10—Bioinformatics, i.e. methods or systems for genetic or protein-related data processing in computational molecular biology
- G06F19/22—Bioinformatics, i.e. methods or systems for genetic or protein-related data processing in computational molecular biology for sequence comparison involving nucleotides or amino acids, e.g. homology search, motif or SNP [Single-Nucleotide Polymorphism] discovery or sequence alignment
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/30067—File systems; File servers
- G06F17/30129—Details of further file system functionalities
- G06F17/3015—Redundancy elimination performed by the file system
- G06F17/30153—Redundancy elimination performed by the file system using compression, e.g. sparse files
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F19/00—Digital computing or data processing equipment or methods, specially adapted for specific applications
- G06F19/10—Bioinformatics, i.e. methods or systems for genetic or protein-related data processing in computational molecular biology
- G06F19/28—Bioinformatics, i.e. methods or systems for genetic or protein-related data processing in computational molecular biology for programming tools or database systems, e.g. ontologies, heterogeneous data integration, data warehousing or computing architectures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/30861—Retrieval from the Internet, e.g. browsers
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F19/00—Digital computing or data processing equipment or methods, specially adapted for specific applications
- G06F19/10—Bioinformatics, i.e. methods or systems for genetic or protein-related data processing in computational molecular biology
- G06F19/24—Bioinformatics, i.e. methods or systems for genetic or protein-related data processing in computational molecular biology for machine learning, data mining or biostatistics, e.g. pattern finding, knowledge discovery, rule extraction, correlation, clustering or classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F19/00—Digital computing or data processing equipment or methods, specially adapted for specific applications
- G06F19/10—Bioinformatics, i.e. methods or systems for genetic or protein-related data processing in computational molecular biology
- G06F19/18—Bioinformatics, i.e. methods or systems for genetic or protein-related data processing in computational molecular biology for functional genomics or proteomics, e.g. genotype-phenotype associations, linkage disequilibrium, population genetics, binding site identification, mutagenesis, genotyping or genome annotation, protein-protein interactions or protein-nucleic acid interactions
Similar Documents
Publication | Title |
---|---|
Pierce et al. | Large-scale sequence comparisons with sourmash |
Johnson et al. | Re-assembly, quality evaluation, and annotation of 678 microbial eukaryotic reference transcriptomes |
Keegan et al. | MG-RAST, a metagenomics service for analysis of microbial community structure and function |
Zou et al. | HAlign: Fast multiple similar DNA/RNA sequence alignment based on the centre star strategy |
Zhao et al. | RAPSearch2: a fast and memory-efficient protein similarity search tool for next-generation sequencing data |
Huang et al. | Short read alignment with populations of genomes |
Ciccarelli et al. | Toward automatic reconstruction of a highly resolved tree of life |
Zhu et al. | High-throughput DNA sequence data compression |
Li et al. | SOAP2: an improved ultrafast tool for short read alignment |
Giongo et al. | PANGEA: pipeline for analysis of next generation amplicons |
Deorowicz et al. | Genome compression: a novel approach for large collections |
Dannemiller et al. | Fungal high‐throughput taxonomic identification tool for use with next‐generation sequencing (FHiTINGS) |
Layer et al. | Efficient genotype compression and analysis of large genetic-variation data sets |
Liu et al. | Index suffix–prefix overlaps by (w, k)-minimizer to generate long contigs for reads compression |
Saha et al. | ERGC: an efficient referential genome compression algorithm |
Bose et al. | BIND–An algorithm for loss-less compression of nucleotide sequence data |
Deorowicz et al. | AGC: compact representation of assembled genomes with fast queries and updates |
CN105760706A (en) | Compression method for next generation sequencing data |
Shi et al. | High efficiency referential genome compression algorithm |
Deorowicz et al. | AGC: Compact representation of assembled genomes |
Wertenbroek et al. | XSI—a genotype compression tool for compressive genomics in large biobanks |
Rossi et al. | MONI: A pangenomics index for finding MEMs |
Tatusova | Update on genomic databases and resources at the national center for biotechnology information |
Tang et al. | Sketch distance-based clustering of chromosomes for large genome database compression |
Ahmed et al. | Spumoni 2: Improved pangenome classification using a compressed index of minimizer digests |