This is a continuously updated list of todos. If you have a suggestion, comment on the issue. I will update the list if the suggestion fits in.
Please note that none of the items is guaranteed to be implemented. If you want to make sure something gets done, consider contributing the code to Rust-Bio; any help is welcome! Of course you will be listed as one of the authors.
If you want to implement an item, please post it in this thread and I will mark it in the list so that we don't duplicate work.
Changes to current code
- make the SAIS implementation generic over the alphabet data type and dynamically choose the smallest suitable type depending on the number of characters. This will further reduce memory usage.
- investigate the #[no_mangle] attribute to allow easy integration of Rust-Bio into e.g. Python via ctypes.
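To illustrate the ctypes item above, here is a minimal sketch of what an exported symbol could look like. The function name gc_count and its pointer-plus-length signature are hypothetical and not part of Rust-Bio; the sketch only assumes that a C-compatible, unmangled symbol is what ctypes needs.

```rust
/// Hypothetical example export (not a Rust-Bio function): counts G/C bases
/// in a byte buffer handed over from C or Python.
///
/// `#[unsafe(no_mangle)]` is the spelling accepted since Rust 1.82 (and
/// required in the 2024 edition); older toolchains write `#[no_mangle]`.
#[unsafe(no_mangle)]
pub extern "C" fn gc_count(seq: *const u8, len: usize) -> u64 {
    // A null pointer is treated as an empty sequence.
    if seq.is_null() {
        return 0;
    }
    // SAFETY: the caller must guarantee that `seq` points to `len` readable bytes.
    let bytes = unsafe { std::slice::from_raw_parts(seq, len) };
    bytes
        .iter()
        .filter(|&&b| matches!(b, b'G' | b'C' | b'g' | b'c'))
        .count() as u64
}
```

Built as a cdylib, a symbol like this should be loadable from Python with ctypes.CDLL, passing the sequence as a bytes object together with its length. Whether this scales to the richer Rust-Bio types is exactly what would need investigating.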
New code
- add an indexed FASTA reader based on the seek method of the current FASTA reader and https://github.com/BurntSushi/rust-csv
- allow calling the current pairwise alignment implementation without traceback for distance computation (hamming, edit, affine)
- add common score matrices (e.g. PAM250, BLOSUM62 for amino acids) [@rilut]
- add (gapped) q-gram/kmer implementation for counting and index building
- add bed reader
- add gff3 reader
- add gtf/gff !=3 reader based on https://github.com/BurntSushi/rust-csv [@natir]
- Hidden Markov Models (Viterbi, Baum-Welch, ...)
- add Myers Algorithm for approximate pattern matching
- basic NGS statistics (qual distribution, nucleotide distribution)?
- add amino acid alphabet to bio::alphabets
- add multiple sequence alignment computation (partial order alignment for now) [@bnbowman]
- provide an interval tree, segment tree or clustertree implementation
- add parallelization examples
- Add speed benchmarks for all algorithms (right now we have them for pattern matching and SAIS).
- Add mate-pair FASTQ reader (using the zip method).
- SDP algorithms for sparse alignment [@bnbowman]
- Add tabix interface to Rust-Htslib
- de Bruijn Graph implementation and related algorithms [@rob-p]
- Overlap/String graphs
- graph genome support (see work by Veli Mäkinen, Richard Durbin et al.)
- BCALM algorithm [@bryceperkins]
- rewrite gff/gtf parser (and other parsers) using nom in order to support comments (see GTF parser breaks on GTF file from GENCODE #115). Look at the ideas of BioJulia.
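For the "alignment without traceback" item above, a rough sketch of the idea: if only the distance is needed, the dynamic programming matrix can be reduced to two rows and no traceback matrix has to be stored. The functions below are standalone illustrations with hypothetical names, not the existing Rust-Bio alignment API, and they cover only Hamming and unit-cost edit distance (affine gap costs are left out).

```rust
/// Hamming distance between two equal-length sequences.
/// Returns None if the lengths differ.
pub fn hamming(a: &[u8], b: &[u8]) -> Option<usize> {
    if a.len() != b.len() {
        return None;
    }
    Some(a.iter().zip(b).filter(|(x, y)| x != y).count())
}

/// Unit-cost edit (Levenshtein) distance computed with two DP rows,
/// i.e. O(|b|) memory and no traceback information.
pub fn edit_distance(a: &[u8], b: &[u8]) -> usize {
    // prev[j] holds the distance between the processed prefix of `a` and b[..j].
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    let mut curr = vec![0; b.len() + 1];
    for (i, &x) in a.iter().enumerate() {
        curr[0] = i + 1; // deleting the first i + 1 symbols of `a`
        for (j, &y) in b.iter().enumerate() {
            let substitution = prev[j] + usize::from(x != y);
            let deletion = prev[j + 1] + 1;
            let insertion = curr[j] + 1;
            curr[j + 1] = substitution.min(deletion).min(insertion);
        }
        std::mem::swap(&mut prev, &mut curr);
    }
    prev[b.len()]
}
```

In the actual implementation the same effect could probably be achieved by making the traceback storage optional in the existing aligner rather than adding separate functions; that design choice is open.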