MAGmax

MAGmax is a tool to maximize the yield of Metagenome-Assembled Genomes (MAGs) through bin Merging and reAssembly.

Example run

magmax -b binsdir -m mapid_dir -r readdir -f fasta -t 24
magmax -b binsdir -m mapid_dir -r readdir -f fasta -t 24 -q quality_report.tsv # if CheckM2 result is already available
magmax -b binsdir -m mapid_dir -r readdir -f fasta -t 24 --split # if input bins are not already split by sample ID

Test run

magmax -b test/bins -m test/mapids -r test/reads -t 24 -q test/quality_report.tsv

Install

Prerequisites

Rust: Follow the instructions here to install Rust.
Conda: You can install Conda via Miniconda or Anaconda.

Dependencies

CheckM2: Install CheckM2, download the CheckM2 database, and set the CHECKM2DB variable correctly. CheckM2 must be installed and accessible in your PATH regardless of which option you use to install MAGmax.

Option 1: Use conda package

conda install -c bioconda magmax
or
mamba install -c bioconda magmax # faster installation

Option 2: Use the pre-built executable.

# For x86_64 Linux (glibc-based systems)
wget https://github.com/soedinglab/MAGma/releases/download/v1.0.0/magmax-linux.tar.gz
tar -xzf magmax-linux.tar.gz # unpack the archive before entering it
cd magmax-linux/bin
chmod +x magmax
./magmax -h
sudo cp magmax /usr/local/bin/ # to access globally

To use this option, skani, SPAdes, seqtk, and (optionally) MEGAHIT must be installed and available in your PATH, in addition to CheckM2. Alternatively, create a conda environment from environment.yml and activate it to run magmax.

conda env create -f environment.yml
conda activate magmax_env

Option 3: Build from source

git clone https://github.com/soedinglab/MAGmax.git
cd MAGmax
conda env create -f environment.yml
conda activate magmax_env
cargo install --path .
magmax -h

Options

    -b, --bindir <BINDIR>
            Directory containing fasta files of bins
    -i, --ani <ANI>
            ANI for clustering bins (%) [default: 99]
    -c, --completeness <COMPLETENESS_CUTOFF>
            Minimum completeness of bins (%) [default: 50]
    -p, --purity <PURITY_CUTOFF>
            Minimum purity (1 - contamination) of bins (%) [default: 95]
    -m, --mapdir <MAPDIR>
            Directory containing mapids files
    -r, --readdir <READDIR>
            Directory containing read files
    -f, --format <FORMAT>
            Bin file extension [default: fasta]
    -t, --threads <THREADS>
            Number of threads to use [default: 8]
        --split
            Split clusters into sample-wise bins before processing
    -q, --qual <QUAL>
            Quality file produced by CheckM2 (quality_report.tsv)
        --assembler <ASSEMBLER>
            assembler choice for reassembly step (spades|megahit) [default: spades, recommended]
    -h, --help
            Print help
    -V, --version
            Print version
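
The completeness and purity cutoffs above can be illustrated with a small filter over a CheckM2 report. This is a minimal sketch, not MAGmax's own code: it assumes the report is tab-separated with `Name`, `Completeness`, and `Contamination` columns, that purity is computed as 100 minus contamination, and that both cutoffs are inclusive. The function name `passing_bins` is hypothetical.

```python
import csv
import io


def passing_bins(report_text, min_completeness=50.0, min_purity=95.0):
    """Return names of bins that meet both cutoffs (assumed inclusive)."""
    # Assumption: tab-separated report with Name, Completeness, Contamination.
    reader = csv.DictReader(io.StringIO(report_text), delimiter="\t")
    keep = []
    for row in reader:
        completeness = float(row["Completeness"])
        # Purity as described in the options: 1 - contamination, in percent.
        purity = 100.0 - float(row["Contamination"])
        if completeness >= min_completeness and purity >= min_purity:
            keep.append(row["Name"])
    return keep


# Made-up values, for illustration only.
example = (
    "Name\tCompleteness\tContamination\n"
    "bin_a\t92.5\t1.2\n"
    "bin_b\t48.0\t0.5\n"
    "bin_c\t80.0\t7.5\n"
)
print(passing_bins(example))
```

With the default cutoffs, only `bin_a` is kept: `bin_b` falls below 50% completeness and `bin_c` has a purity of 92.5%.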

Notes

Contig IDs in the input must be prefixed with the sample ID, separated by 'C' (for example, sampleidCcontig1_id). Perform mapping and binning on contig files that carry these updated contig IDs.
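
One way to add the sample ID prefix to an assembly before mapping and binning is sketched below. The helper `prefix_contig_ids` is hypothetical and not part of MAGmax; it assumes plain FASTA input and leaves everything after the original contig ID untouched.

```python
def prefix_contig_ids(fasta_lines, sample_id, sep="C"):
    """Yield FASTA lines with '<sample_id><sep>' added to every header."""
    for line in fasta_lines:
        if line.startswith(">"):
            # Header line: insert the prefix right after '>'.
            yield ">" + sample_id + sep + line[1:]
        else:
            # Sequence line: pass through unchanged.
            yield line


fasta = [">contig1 len=1200\n", "ACGT\n", ">contig2\n", "GGCC\n"]
print("".join(prefix_contig_ids(fasta, "SRR25448374")), end="")
```

For a file on disk, the same generator can be fed an open file handle and its output written to a new FASTA file, one per sample.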

Mapid files can be generated using aligner2counts (https://github.com/soedinglab/binning_benchmarking/tree/main/util#aligner2counts) with the only-mapids option.

File name: <sampleid>_mapids

read1_id    sampleidCcontig1_id
read2_id    sampleidCcontig2_id
read2_id    sampleidCcontig4_id
read3_id    sampleidCcontig2_id
read4_id    sampleidCcontig3_id
read4_id    sampleidCcontig4_id
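
A quick sanity check of a mapid file in the layout shown above can catch naming problems early. The sketch below is an assumption-laden helper, not MAGmax's parser: it treats the two columns as whitespace-separated, takes the sample ID from the caller (for example from the file name), and the name `read_mapids` is hypothetical.

```python
def read_mapids(lines, sample_id, sep="C"):
    """Map each read ID to the list of contigs it was aligned to."""
    prefix = sample_id + sep
    hits = {}
    for number, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 2:
            raise ValueError(f"line {number}: expected 2 columns, got {len(fields)}")
        read_id, contig_id = fields
        if not contig_id.startswith(prefix):
            raise ValueError(f"line {number}: contig '{contig_id}' lacks prefix '{prefix}'")
        # A read may appear on several lines when it maps to several contigs.
        hits.setdefault(read_id, []).append(contig_id)
    return hits


mapids = [
    "read1_id    sampleidCcontig1_id",
    "read2_id    sampleidCcontig2_id",
    "read2_id    sampleidCcontig4_id",
]
print(read_mapids(mapids, "sampleid"))
```

A contig ID without the expected `<sampleid>C` prefix raises a `ValueError` that names the offending line.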

If input bins are not separated by sample IDs, such as when using MetaBAT2 or COMEBin on a concatenated set of contigs, use the --split option to automatically separate input bins by sample ID.

Make sure that headers in the read fastq files have the read_id separated from the other sequencer details by a space or tab (not by .). This is important for seqtk to fetch reads correctly.

Correct format: @SRR25448374.1 A00214R:157:HLMVMDSXY:1:1101:19868:1016:N:0.length=151#0/1

Wrong format: @SRR25448374.1.A00214R:157:HLMVMDSXY:1:1101:19868:1016:N:0.length=151#0/1

When read IDs are not separated by a space in the headers, run the command below and use the updated read file for mapping.

sed -i -E 's/^(@[^.]+\.[^.]+)\./\1 /' read.fastq
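
The sed expression is applied to every line that starts with @, which can include quality lines, and it also rewrites headers that are already in the correct format. A more conservative alternative is sketched below. It is not part of MAGmax and assumes four-line FASTQ records (no wrapped sequences) and read IDs of the form `accession.number`, as in the example headers; the name `fix_fastq_headers` is hypothetical.

```python
import re

# '@<accession>.<number>.' with no whitespace before the second dot.
_FUSED = re.compile(r"^(@[^.\s]+\.[^.\s]+)\.")


def fix_fastq_headers(lines):
    """Yield FASTQ lines, separating the read ID from the rest by a space."""
    for index, line in enumerate(lines):
        # Assumption: 4 lines per record, so headers sit at 0, 4, 8, ...
        if index % 4 == 0:
            line = _FUSED.sub(r"\1 ", line, count=1)
        yield line


record = [
    "@SRR25448374.1.A00214R:157:HLMVMDSXY:1:1101:19868:1016:N:0.length=151#0/1\n",
    "ACGT\n",
    "+\n",
    "@F.F\n",  # quality line that happens to start with '@'
]
print("".join(fix_fastq_headers(record)), end="")
```

Only the header line changes; the quality line is left intact even though it begins with @ and contains dots. Headers that already have a space after the read ID do not match the pattern and pass through unchanged.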

MAGmax works with paired-end reads (in separate files: SRR25448374_1.fastq and SRR25448374_2.fastq) and with single-end read files.

Sample IDs must be part of the file names of the fastq and mapid files (e.g., SRR25448374_1.fastq and SRR25448374_2.fastq, or SRR25448374.fastq, together with SRR25448374_mapids).

We recommend SPAdes for reassembly, which produces bins with higher purity than bins assembled using MEGAHIT.
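
The file-naming rule for fastq and mapid files can be checked before a run. The sketch below covers only the name patterns shown in these notes (`<sampleid>_1.fastq`, `<sampleid>_2.fastq`, `<sampleid>.fastq`, `<sampleid>_mapids`); whether MAGmax accepts other extensions is not stated here, and both helper names are hypothetical.

```python
import re


def sample_id_from_name(filename):
    """Return the sample ID encoded in a read or mapid file name, else None."""
    if filename.endswith("_mapids"):
        return filename[: -len("_mapids")]
    # Optional _1/_2 suffix marks the two files of a paired-end sample.
    match = re.fullmatch(r"(.+?)(?:_[12])?\.fastq", filename)
    return match.group(1) if match else None


def samples_without_reads(mapid_files, read_files):
    """List sample IDs that have a mapid file but no matching read file."""
    with_reads = {sample_id_from_name(name) for name in read_files}
    with_mapids = {sample_id_from_name(name) for name in mapid_files}
    return sorted(s for s in with_mapids - with_reads if s is not None)


mapid_files = ["SRR25448374_mapids", "SRR25448375_mapids"]
read_files = ["SRR25448374_1.fastq", "SRR25448374_2.fastq"]
print(samples_without_reads(mapid_files, read_files))
```

In this example the second sample is reported because no read file carries its ID.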
