When less is more: sketching with minimizers in genomics
Ndiaye et al., 2024
- Document ID
- 1342497440134114666
- Author
- Ndiaye M
- Prieto-Baños S
- Fitzgerald L
- Yazdizadeh Kharrazi A
- Oreshkov S
- Dessimoz C
- Sedlazeck F
- Glover N
- Majidian S
- Publication year
- 2024
- Publication venue
- Genome Biology
Snippet
The exponential increase in sequencing data calls for conceptual and computational advances to extract useful biological insights. One such advance, minimizers, allows for reducing the quantity of data handled while maintaining some of its key properties. We …
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/30286—Information retrieval; Database structures therefor; File system structures therefor in structured data stores
- G06F17/30386—Retrieval requests
- G06F17/30424—Query processing
- G06F17/30533—Other types of queries
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F19/00—Digital computing or data processing equipment or methods, specially adapted for specific applications
- G06F19/10—Bioinformatics, i.e. methods or systems for genetic or protein-related data processing in computational molecular biology
- G06F19/22—Bioinformatics, i.e. methods or systems for genetic or protein-related data processing in computational molecular biology for sequence comparison involving nucleotides or amino acids, e.g. homology search, motif or SNP [Single-Nucleotide Polymorphism] discovery or sequence alignment
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/30286—Information retrieval; Database structures therefor; File system structures therefor in structured data stores
- G06F17/30312—Storage and indexing structures; Management thereof
- G06F17/30321—Indexing structures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/30286—Information retrieval; Database structures therefor; File system structures therefor in structured data stores
- G06F17/30587—Details of specialised database models
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/3061—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F17/30613—Indexing
- G06F17/30619—Indexing indexing structures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F19/00—Digital computing or data processing equipment or methods, specially adapted for specific applications
- G06F19/10—Bioinformatics, i.e. methods or systems for genetic or protein-related data processing in computational molecular biology
- G06F19/28—Bioinformatics, i.e. methods or systems for genetic or protein-related data processing in computational molecular biology for programming tools or database systems, e.g. ontologies, heterogeneous data integration, data warehousing or computing architectures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F19/00—Digital computing or data processing equipment or methods, specially adapted for specific applications
- G06F19/10—Bioinformatics, i.e. methods or systems for genetic or protein-related data processing in computational molecular biology
- G06F19/24—Bioinformatics, i.e. methods or systems for genetic or protein-related data processing in computational molecular biology for machine learning, data mining or biostatistics, e.g. pattern finding, knowledge discovery, rule extraction, correlation, clustering or classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/30943—Information retrieval; Database structures therefor; File system structures therefor details of database functions independent of the retrieved data type
- G06F17/30946—Information retrieval; Database structures therefor; File system structures therefor details of database functions independent of the retrieved data type indexing structures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F19/00—Digital computing or data processing equipment or methods, specially adapted for specific applications
- G06F19/10—Bioinformatics, i.e. methods or systems for genetic or protein-related data processing in computational molecular biology
- G06F19/18—Bioinformatics, i.e. methods or systems for genetic or protein-related data processing in computational molecular biology for functional genomics or proteomics, e.g. genotype-phenotype associations, linkage disequilibrium, population genetics, binding site identification, mutagenesis, genotyping or genome annotation, protein-protein interactions or protein-nucleic acid interactions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F19/00—Digital computing or data processing equipment or methods, specially adapted for specific applications
- G06F19/10—Bioinformatics, i.e. methods or systems for genetic or protein-related data processing in computational molecular biology
- G06F19/14—Bioinformatics, i.e. methods or systems for genetic or protein-related data processing in computational molecular biology for phylogeny or evolution, e.g. evolutionarily conserved regions determination or phylogenetic tree construction
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F19/00—Digital computing or data processing equipment or methods, specially adapted for specific applications
- G06F19/10—Bioinformatics, i.e. methods or systems for genetic or protein-related data processing in computational molecular biology
- G06F19/20—Bioinformatics, i.e. methods or systems for genetic or protein-related data processing in computational molecular biology for hybridisation or gene expression, e.g. microarrays, sequencing by hybridisation, normalisation, profiling, noise correction models, expression ratio estimation, probe design or probe optimisation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F7/00—Methods or arrangements for processing data by operating upon the order or content of the data handled
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Rautiainen et al. | | GraphAligner: rapid and versatile sequence-to-graph alignment |
US11702708B2 (en) | | Systems and methods for analyzing viral nucleic acids |
US20240096450A1 (en) | | Systems and methods for adaptive local alignment for graph genomes |
Chikhi et al. | | Data structures to represent a set of k-long DNA sequences |
Canzar et al. | | Short read mapping: an algorithmic tour |
Harris | | Improved pairwise alignment of genomic DNA |
Mäkinen et al. | | Genome-scale algorithm design |
Marçais et al. | | Sketching and sublinear data structures in genomics |
US10346551B2 (en) | | Systems, methods and computer-accessible mediums for utilizing pattern matching in stringomes |
Hoffmann et al. | | Fast mapping of short sequences with mismatches, insertions and deletions using index structures |
Snir et al. | | Quartets MaxCut: a divide and conquer quartets algorithm |
US20160019339A1 (en) | | Bioinformatics tools, systems and methods for sequence assembly |
Ndiaye et al. | | When less is more: sketching with minimizers in genomics |
Marchet et al. | | A resource-frugal probabilistic dictionary and applications in bioinformatics |
Mäkinen et al. | | Genome-scale algorithm design: bioinformatics in the era of high-throughput sequencing |
Vaddadi et al. | | Read mapping on genome variation graphs |
Vasimuddin et al. | | Identification of significant computational building blocks through comprehensive investigation of NGS secondary analysis methods |
Chen et al. | | CGAP-align: a high performance DNA short read alignment tool |
Esmat et al. | | A parallel hash‐based method for local sequence alignment |
Ekim et al. | | Minimizer-space de Bruijn graphs |
Aydın | | Whole Genome Alignment via Alternating Lyndon Factorization Tree Traversal |
Weese et al. | | DNA-Seq Error Correction Based on Substring Indices |
Ekim | | Scalable sketching and indexing algorithms for large biological datasets |
Mohamadi | | Parallel algorithms and software tools for high-throughput sequencing data |
Turner | | Discovering genetic variation in populations using next generation sequencing and de novo assembly |