Deorowicz et al., 2022 - Google Patents

AGC: Compact representation of assembled genomes

Document ID
16550133715318327664
Author
Deorowicz S
Danek A
Li H
Publication year
2022
Publication venue
bioRxiv

Snippet

High-quality sequence assembly is the ultimate representation of complete genetic information of an individual. Several ongoing pangenome projects are producing collections of high-quality assemblies of various species. Here, we show how to represent the …

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/20 Handling natural language data
    • G06F17/21 Text processing
    • G06F17/22 Manipulating or registering by use of codes, e.g. in sequence of text characters
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/30 Information retrieval; Database structures therefor; File system structures therefor
    • G06F17/30286 Information retrieval; Database structures therefor; File system structures therefor in structured data stores
    • G06F17/30312 Storage and indexing structures; Management thereof
    • G06F17/30321 Indexing structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/30 Information retrieval; Database structures therefor; File system structures therefor
    • G06F17/3061 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F17/30613 Indexing
    • G06F17/30619 Indexing indexing structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F19/00 Digital computing or data processing equipment or methods, specially adapted for specific applications
    • G06F19/10 Bioinformatics, i.e. methods or systems for genetic or protein-related data processing in computational molecular biology
    • G06F19/22 Bioinformatics, i.e. methods or systems for genetic or protein-related data processing in computational molecular biology for sequence comparison involving nucleotides or amino acids, e.g. homology search, motif or SNP [Single-Nucleotide Polymorphism] discovery or sequence alignment
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/30 Information retrieval; Database structures therefor; File system structures therefor
    • G06F17/30067 File systems; File servers
    • G06F17/30129 Details of further file system functionalities
    • G06F17/3015 Redundancy elimination performed by the file system
    • G06F17/30153 Redundancy elimination performed by the file system using compression, e.g. sparse files
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F19/00 Digital computing or data processing equipment or methods, specially adapted for specific applications
    • G06F19/10 Bioinformatics, i.e. methods or systems for genetic or protein-related data processing in computational molecular biology
    • G06F19/28 Bioinformatics, i.e. methods or systems for genetic or protein-related data processing in computational molecular biology for programming tools or database systems, e.g. ontologies, heterogeneous data integration, data warehousing or computing architectures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/30 Information retrieval; Database structures therefor; File system structures therefor
    • G06F17/30861 Retrieval from the Internet, e.g. browsers
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F19/00 Digital computing or data processing equipment or methods, specially adapted for specific applications
    • G06F19/10 Bioinformatics, i.e. methods or systems for genetic or protein-related data processing in computational molecular biology
    • G06F19/24 Bioinformatics, i.e. methods or systems for genetic or protein-related data processing in computational molecular biology for machine learning, data mining or biostatistics, e.g. pattern finding, knowledge discovery, rule extraction, correlation, clustering or classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F19/00 Digital computing or data processing equipment or methods, specially adapted for specific applications
    • G06F19/10 Bioinformatics, i.e. methods or systems for genetic or protein-related data processing in computational molecular biology
    • G06F19/18 Bioinformatics, i.e. methods or systems for genetic or protein-related data processing in computational molecular biology for functional genomics or proteomics, e.g. genotype-phenotype associations, linkage disequilibrium, population genetics, binding site identification, mutagenesis, genotyping or genome annotation, protein-protein interactions or protein-nucleic acid interactions

Similar Documents

Publication Publication Date Title
Pierce et al. Large-scale sequence comparisons with sourmash
Johnson et al. Re-assembly, quality evaluation, and annotation of 678 microbial eukaryotic reference transcriptomes
Keegan et al. MG-RAST, a metagenomics service for analysis of microbial community structure and function
Zou et al. HAlign: Fast multiple similar DNA/RNA sequence alignment based on the centre star strategy
Zhao et al. RAPSearch2: a fast and memory-efficient protein similarity search tool for next-generation sequencing data
Huang et al. Short read alignment with populations of genomes
Ciccarelli et al. Toward automatic reconstruction of a highly resolved tree of life
Zhu et al. High-throughput DNA sequence data compression
Li et al. SOAP2: an improved ultrafast tool for short read alignment
Giongo et al. PANGEA: pipeline for analysis of next generation amplicons
Deorowicz et al. Genome compression: a novel approach for large collections
Dannemiller et al. Fungal high‐throughput taxonomic identification tool for use with next‐generation sequencing (FHiTINGS)
Layer et al. Efficient genotype compression and analysis of large genetic-variation data sets
Liu et al. Index suffix–prefix overlaps by (w, k)-minimizer to generate long contigs for reads compression
Saha et al. ERGC: an efficient referential genome compression algorithm
Bose et al. BIND–An algorithm for loss-less compression of nucleotide sequence data
Deorowicz et al. AGC: compact representation of assembled genomes with fast queries and updates
CN105760706A (en) Compression method for next generation sequencing data
Shi et al. High efficiency referential genome compression algorithm
Deorowicz et al. AGC: Compact representation of assembled genomes
Wertenbroek et al. XSI—a genotype compression tool for compressive genomics in large biobanks
Rossi et al. MONI: A pangenomics index for finding MEMs
Tatusova Update on genomic databases and resources at the national center for biotechnology information
Tang et al. Sketch distance-based clustering of chromosomes for large genome database compression
Ahmed et al. Spumoni 2: Improved pangenome classification using a compressed index of minimizer digests