A curated list of bioinformatics tools, frameworks, libraries, etc. Feel free to contribute anything.
- parallel: GNU parallel is a shell tool for executing jobs in parallel using one or more computers. Here are some examples.
- xsv: A fast CSV command line toolkit written in Rust
- ripgrep: combines the usability of The Silver Searcher with the raw speed of grep
- seqtk: Toolkit for processing sequences in FASTA/Q formats
- bedtools2: The swiss army knife for genome arithmetic
- fastp: An ultra-fast all-in-one FASTQ preprocessor (QC/adapters/trimming/filtering/splitting/merging...)
- fastqc: A quality control analysis tool for high throughput sequencing data
- bwa: Burrows-Wheeler Aligner for short-read alignment
- minimap2: A versatile pairwise aligner for genomic and spliced nucleotide sequences
- hisat2: Graph-based alignment (Hierarchical Graph FM index)
- STAR: RNA-seq aligner
- TopHat: Aligns RNA-Seq reads to a genome in order to identify exon-exon splice junctions.
- Bowtie: An ultrafast, memory-efficient short read aligner
- freebayes: Bayesian haplotype-based polymorphism discovery and genotyping.
- GATK: Variant Discovery in High-Throughput Sequencing Data
- samtools/bcftools/htslib: A suite of tools for manipulating next-generation sequencing data
- delly: Structural variant discovery by integrated paired-end and split-read analysis.
- lumpy: A general probabilistic framework for structural variant discovery.
- manta: Structural variant and indel caller for mapped sequencing data
- crest: Maps somatic structural variation in cancer genomes with base-pair resolution. Here is the paper.
- breakdancer: Genome-wide detection of structural variants from next generation paired-end sequencing reads. Here is the paper.
- cnvkit: Copy number variant detection from targeted DNA sequencing.
- control-freec: Prediction of copy numbers and allelic content using deep-sequencing data.
- nextflow: A fluent DSL modelled around the UNIX pipe concept, that simplifies writing parallel and scalable pipelines in a portable manner.
- snakemake: Create reproducible and scalable data analyses. Workflows are described via a human readable, Python based language.
- WDL: The Workflow Description Language (WDL) is a way to specify data processing workflows with a human-readable and writeable syntax.
- CWL: An open standard for describing analysis workflows and tools in a way that makes them portable and scalable across a variety of software and hardware environments, from workstations to cluster, cloud, and high performance computing (HPC) environments.
- cromwell: A Workflow Management System geared towards scientific workflows.
- galaxy: A popular open-source, web-based platform for data-intensive biomedical research, with features ranging from data analysis and workflow management to visualization tools.
- Awesome-Pipeline: A curated list of awesome pipeline toolkits.
- awesome-nextflow: A curated list of nextflow based pipelines.
- nf-core: A collection of high quality Nextflow pipelines.
- pysam: Python wrapper for samtools.
- rust-htslib: HTSlib bindings and a high level Rust API for reading and writing BAM files.
- bam: Rust crate for reading and writing BAM and BGZIP files.
- d3: Bring data to life with SVG, Canvas and HTML.
- echarts: A powerful, interactive charting and visualization library.
- matplotlib: Matplotlib is a comprehensive library for creating static, animated, and interactive visualizations in Python
- ggplot2: Create Elegant Data Visualisations Using the Grammar of Graphics. Here is a book
- plotly: Modern Analytic Apps for the Enterprise
- bokeh: Publish Sophisticated Dashboards
- antv: Liven Data Lively
- circos: Perl package for circular plots, which are well suited for genomic rearrangements.
- mongodb: A general purpose, document-based, distributed database built for modern applications
- mysql: Open-Source Relational Database Management System.
- leveldb: LevelDB is a fast key-value storage library written at Google that provides an ordered mapping from string keys to string values.
- An Introduction to Applied Bioinformatics: An Introduction to Applied Bioinformatics (or IAB) is a free, open source interactive text that introduces readers to core concepts of bioinformatics in the context of their implementation and application.
- bioinformatics: Path to a free self-taught education in Bioinformatics! This is a solid path for those who want to complete a bioinformatics course on their own time, for free, with courses from the best universities in the world. The curriculum gives preference to MOOC (Massive Open Online Course) style courses because these courses were created with this style of learning in mind. To become a bioinformatician, you have to learn quite a lot of science, so be ready for subjects such as biology and chemistry.
- MultiQC: Aggregate results from bioinformatics analyses across many samples into a single report.