MAGmax is a tool to maximize the yield of Metagenome-Assembled Genomes (MAGs) through bin Merging and reAssembly.
magmax -b binsdir -m mapid_dir -r readdir -f fasta -t 24
magmax -b binsdir -m mapid_dir -r readdir -f fasta -t 24 -q quality_report.tsv # if CheckM2 result is already available
magmax -b binsdir -m mapid_dir -r readdir -f fasta -t 24 --split # if input bins are not already split by sample id
magmax -b test/bins -m test/mapids -r test/reads -t 24 -q test/quality_report.tsv
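Before launching a run like the ones above, it can help to confirm that every mapid file in the -m directory has matching read files in the -r directory. The following is a minimal sketch and not part of magmax: the function name check_sample_files is hypothetical, and it assumes mapid files are named <sampleid>_mapids and reads are uncompressed files named <sampleid>.fastq (single-end) or <sampleid>_1.fastq and <sampleid>_2.fastq (paired-end).

```shell
# Hypothetical pre-flight check (not part of magmax).
# Usage: check_sample_files <mapid_dir> <read_dir>
# Prints one line per sample without reads; returns 1 if any are missing.
check_sample_files() {
    mapdir=$1
    readdir=$2
    status=0
    for mapfile in "$mapdir"/*_mapids; do
        # An unmatched glob stays literal, so test that the file exists.
        if [ ! -e "$mapfile" ]; then
            echo "no *_mapids files found in $mapdir"
            return 1
        fi
        # Assumption: the sample ID is the file name without "_mapids".
        sample=$(basename "$mapfile" _mapids)
        if [ -f "$readdir/$sample.fastq" ]; then
            continue    # single-end reads found
        fi
        if [ -f "$readdir/${sample}_1.fastq" ] &&
           [ -f "$readdir/${sample}_2.fastq" ]; then
            continue    # paired-end reads found
        fi
        echo "missing reads for sample: $sample"
        status=1
    done
    return "$status"
}
```

With the test data, this could be called as check_sample_files test/mapids test/reads before starting magmax.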
- Rust: Follow the official instructions (https://www.rust-lang.org/tools/install) to install Rust.
- Conda: You can install Conda via Miniconda or Anaconda.
Option 1: Use conda package
conda install -c bioconda magmax
or
mamba install -c bioconda magmax # faster installation
Option 2: Use the pre-built executable.
# For x86_64 Linux (glibc-based systems)
wget https://github.com/soedinglab/MAGma/releases/download/v1.0.0/magmax-linux.tar.gz
tar -xzf magmax-linux.tar.gz
cd magmax-linux/bin
chmod +x magmax
./magmax -h
sudo cp magmax /usr/local/bin/ # to access globally
To use this option, CheckM2, skani, SPAdes and MEGAHIT must already be installed and available in your PATH. Alternatively, use environment.yml to create a conda environment and activate it before running magmax.
conda env create -f environment.yml
conda activate magmax_env
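If you rely on tools installed outside the conda environment, a quick PATH check can catch a missing dependency before a long run. This is a minimal sketch with a hypothetical helper name (check_tools); it only tests whether a command can be found, not whether its version is suitable.

```shell
# Hypothetical helper: report which of the given commands are missing
# from PATH. Returns 0 when all are found, 1 otherwise.
check_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "not found in PATH: $tool"
            missing=1
        fi
    done
    return "$missing"
}
```

For example, check_tools checkm2 skani spades.py megahit would test the four dependencies listed above. Those executable names are assumptions based on how the tools are commonly packaged, so adjust them to match your installation.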
Option 3: Build from source
git clone https://github.com/soedinglab/MAGma.git
cd MAGma
conda env create -f environment.yml
conda activate magmax_env
cargo install --path .
magmax -h
-b, --bindir <BINDIR>
Directory containing fasta files of bins
-i, --ani <ANI>
ANI for clustering bins (%) [default: 99]
-c, --completeness <COMPLETENESS_CUTOFF>
Minimum completeness of bins (%) [default: 50]
-p, --purity <PURITY_CUTOFF>
Minimum purity (1 - contamination) of bins (%) [default: 95]
-m, --mapdir <MAPDIR>
Directory containing mapids files
-r, --readdir <READDIR>
Directory containing read files
-f, --format <FORMAT>
Bin file extension [default: fasta]
-t, --threads <THREADS>
Number of threads to use [default: 8]
--split
Split clusters into sample-wise bins before processing
-q, --qual <QUAL>
Quality file produced by CheckM2 (quality_report.tsv)
--assembler <ASSEMBLER>
assembler choice for reassembly step (spades|megahit) [default: spades, recommended]
-h, --help
Print help
-V, --version
Print version
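To get a feel for the -c and -p cutoffs, you can preview which bins in an existing CheckM2 report would pass them. The sketch below is not how magmax applies the cutoffs internally. It assumes that quality_report.tsv is tab-separated with a header row containing the columns Name, Completeness and Contamination, that purity is computed as 100 minus contamination (in percent), and that the cutoffs are inclusive. The function name filter_bins is hypothetical.

```shell
# Hypothetical helper: list bins that meet the completeness and purity cutoffs.
# Usage: filter_bins <quality_report.tsv> [min_completeness] [min_purity]
filter_bins() {
    awk -F '\t' -v minc="${2:-50}" -v minp="${3:-95}" '
        NR == 1 {
            # Map column names to positions so column order does not matter.
            for (i = 1; i <= NF; i++) col[$i] = i
            next
        }
        {
            completeness = $col["Completeness"] + 0
            purity = 100 - $col["Contamination"]
            # Assumption: cutoffs are inclusive.
            if (completeness >= minc + 0 && purity >= minp + 0)
                print $col["Name"]
        }
    ' "$1"
}
```

Running filter_bins quality_report.tsv uses the default cutoffs from the option list (50 and 95); filter_bins quality_report.tsv 70 98 would apply stricter ones.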
- Input contig IDs must be prefixed with the sample ID, separated by 'C' (i.e., <sampleid>C<contigid>, for example sampleidCcontig1_id). Perform mapping and binning on contig files with these updated contig IDs.
- Mapid files can be generated using aligner2counts (https://github.com/soedinglab/binning_benchmarking/tree/main/util#aligner2counts) with the only-mapids option. File name: <sampleid>_mapids. Example content:

    read1_id sampleidCcontig1_id
    read2_id sampleidCcontig2_id
    read2_id sampleidCcontig4_id
    read3_id sampleidCcontig2_id
    read4_id sampleidCcontig3_id
    read4_id sampleidCcontig4_id
- If input bins are not separated by sample ID, such as when using MetaBAT2 or COMEBin on a concatenated set of contigs, use the --split option to automatically separate input bins by sample ID.
- Make sure that headers in the read fastq files have the read ID separated from the other sequencer details by a space or tab (not by '.'). This is important for seqtk to fetch reads correctly.

    Correct format: @SRR25448374.1 A00214R:157:HLMVMDSXY:1:1101:19868:1016:N:0.length=151#0/1
    Wrong format: @SRR25448374.1.A00214R:157:HLMVMDSXY:1:1101:19868:1016:N:0.length=151#0/1
When read IDs are not separated by a space in the headers, run the command below and use the updated read file for mapping.
sed -i -E 's/^(@[^.]+\.[^.]+)\./\1 /' read.fastq
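The sed command above edits the file in place and is applied to every line that starts with '@'. If you prefer to keep the original file and touch header lines only, the following sketch is one alternative. It assumes standard 4-line FASTQ records (no wrapped sequence lines) and read IDs of the form @<run>.<number>; the function name fix_fastq_headers is hypothetical.

```shell
# Hypothetical helper: on header lines only (every 4th line, starting at
# line 1), replace the dot that follows "@<run>.<number>" with a space.
# Reads FASTQ on stdin and writes the result to stdout.
fix_fastq_headers() {
    awk '
        NR % 4 == 1 && match($0, /^@[^. \t]+\.[^. \t]+\./) {
            # Replace the dot that ends the matched prefix with a space.
            $0 = substr($0, 1, RLENGTH - 1) " " substr($0, RLENGTH + 1)
        }
        { print }
    '
}
```

Usage: fix_fastq_headers < read.fastq > read.fixed.fastq. Headers that already have a space or tab after the read ID are left unchanged.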
MAGmax works with both paired-end reads (in separate files, e.g., SRR25448374_1.fastq and SRR25448374_2.fastq) and single-end reads.
- Sample IDs must be part of the fastq and mapid file names (e.g., SRR25448374_1.fastq and SRR25448374_2.fastq, or SRR25448374.fastq, together with SRR25448374_mapids).
- We recommend SPAdes for reassembly, as it produces bins with higher purity than those assembled with MEGAHIT.
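The contig ID convention described earlier (sample ID, then 'C', then the original contig ID) can be applied to an assembly before mapping and binning. The following is a minimal sketch rather than an official preprocessing step: the function name prefix_contig_ids is hypothetical, and it assumes a plain-text FASTA file in which every header line starts with '>'.

```shell
# Hypothetical helper: prefix every contig ID in a FASTA stream with
# "<sampleid>C". Reads FASTA on stdin, writes the result to stdout.
# Usage: prefix_contig_ids <sampleid> < contigs.fasta > renamed.fasta
prefix_contig_ids() {
    awk -v sample="$1" '
        /^>/ { print ">" sample "C" substr($0, 2); next }   # header line
        { print }                                           # sequence line
    '
}
```

For example, with sample ID SRR25448374, a header ">contig1" becomes ">SRR25448374Ccontig1". Run it once per sample on the original assembly, since applying it twice would add the prefix twice.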