amromics/panta: Pan-genome Analysis

Pan-genome pipeline

Installation

Dependencies:

  • bedtools
  • CD-HIT
  • BLAST
  • DIAMOND
  • MCL
  • pandas
  • Biopython (SeqIO)

The simplest installation method is via conda:

  1. Download and install a conda distribution, such as Miniconda.

  2. Create a conda environment with all the necessary dependencies. From the repository directory, run:

conda create -y -c conda-forge -c defaults --name panta python=3.10 mamba

conda activate panta

mamba install -y -c conda-forge -c bioconda -c anaconda -c defaults  --file requirements.txt

pip install .

Alternatively, pull the Docker image:

docker pull amromics/panta:latest

To build the image yourself, run from the repository directory:

docker build -t amromics/panta:latest .
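After a conda installation, a quick way to confirm that the dependencies listed above are reachable is to look them up from Python. This is a minimal sketch, not part of panta: the executable names (`bedtools`, `cd-hit`, `blastp`, `makeblastdb`, `diamond`, `mcl`) are assumptions about how the tools are named on your PATH, and `check_dependencies` is a hypothetical helper.

```python
import importlib.util
import shutil

# Assumed executable names for the command-line dependencies;
# BLAST+ is checked through its blastp and makeblastdb binaries.
DEFAULT_TOOLS = ["bedtools", "cd-hit", "blastp", "makeblastdb", "diamond", "mcl"]
# Python modules: pandas, and Biopython (the "Bio" package provides Bio.SeqIO).
DEFAULT_MODULES = ["pandas", "Bio"]


def check_dependencies(tools=DEFAULT_TOOLS, modules=DEFAULT_MODULES):
    """Return the names of executables and modules that cannot be found."""
    missing = [tool for tool in tools if shutil.which(tool) is None]
    missing += [mod for mod in modules if importlib.util.find_spec(mod) is None]
    return missing


if __name__ == "__main__":
    missing = check_dependencies()
    if missing:
        print("Missing dependencies:", ", ".join(missing))
    else:
        print("All dependencies found.")
```

Run it inside the activated `panta` environment; an empty result means every listed name was found, not that the versions are compatible.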

Usage

Activate the conda environment:

conda activate panta

Main pipeline: run pan-genome analysis for the first time

usage: panta main [-h] [-g [GFF ...]] [-f TSV] -o OUTDIR [-s] [-b {diamond,blast}] [-i IDENTITY] [--LD LD] [--AL AL] [--AS AS] [-e EVALUE]
                          [-t THREADS] [--table TABLE] [-a [{nucleotide,protein} ...]]

Main pipeline: run pan-genome analysis for the first time

options:
  -h, --help            show this help message and exit
  -g [GFF ...], --gff [GFF ...]
                        gff input files (default: None)
  -f TSV, --tsv TSV     tsv input file (default: None)
  -o OUTDIR, --outdir OUTDIR
                        output directory (default: None)
  -s, --dont-split      dont split paralog clusters (default: False)
  -b {diamond,blast}, --blast {diamond,blast}
                        method for all-against-all alignment (default: diamond)
  -i IDENTITY, --identity IDENTITY
                        minimum percentage identity (default: 0.7)
  --LD LD               length difference cutoff between two sequences (default: 0.7)
  --AL AL               alignment coverage for the longer sequence (default: 0)
  --AS AS               alignment coverage for the shorter sequence (default: 0)
  -e EVALUE, --evalue EVALUE
                        Blast evalue (default: 1e-06)
  -t THREADS, --threads THREADS
                        number of threads to use, 0 for all (default: 0)
  --table TABLE         codon table (default: 11)
  -a [{nucleotide,protein} ...], --alignment [{nucleotide,protein} ...]
                        run alignment for each gene cluster (default: None)
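The `--LD`, `--AL` and `--AS` descriptions closely match the wording of CD-HIT's `-s`, `-aL` and `-aS` options, so one plausible reading of the pairwise cutoffs is sketched below. This is an assumption, not panta's actual filtering code: `Hit` and `passes_cutoffs` are hypothetical names, the comparison operators are guesses, and identity is treated as a fraction because the default of 0.7 suggests 70% rather than 0.7%.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """Hypothetical summary of one all-against-all alignment hit."""
    identity: float  # fraction of identical aligned residues, e.g. 0.85
    evalue: float
    qlen: int        # query sequence length
    slen: int        # subject sequence length
    aln_len: int     # alignment length


def passes_cutoffs(hit, identity=0.7, ld=0.7, al=0.0, as_=0.0, evalue=1e-6):
    """Apply -i, --LD, --AL, --AS and -e style cutoffs (assumed CD-HIT-like semantics)."""
    longer = max(hit.qlen, hit.slen)
    shorter = min(hit.qlen, hit.slen)
    return (
        hit.identity >= identity
        and hit.evalue <= evalue
        and shorter / longer >= ld        # --LD: shorter is at least ld * longer
        and hit.aln_len / longer >= al    # --AL: coverage of the longer sequence
        and hit.aln_len / shorter >= as_  # --AS: coverage of the shorter sequence
    )
```

For example, a hit between a 300-residue and a 150-residue protein fails the default `--LD 0.7` under this reading, because 150 / 300 = 0.5.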

Add pipeline: add samples to a previous collection

usage: panta add [-h] [-g [GFF ...]] [-f TSV] -c COLLECTION_DIR [-s] [-b {diamond,blast}] [-i IDENTITY] [--LD LD] [--AL AL] [--AS AS]
                         [-e EVALUE] [-t THREADS] [--table TABLE] [-a [{nucleotide,protein} ...]]

Add pipeline: add sample into previous collection

options:
  -h, --help            show this help message and exit
  -g [GFF ...], --gff [GFF ...]
                        gff input files (default: None)
  -f TSV, --tsv TSV     tsv input file (default: None)
  -c COLLECTION_DIR, --collection-dir COLLECTION_DIR
                        previous collection directory (default: None)
  -s, --dont-split      dont split paralog clusters (default: False)
  -b {diamond,blast}, --blast {diamond,blast}
                        method for all-against-all alignment (default: diamond)
  -i IDENTITY, --identity IDENTITY
                        minimum percentage identity (default: 0.7)
  --LD LD               length difference cutoff between two sequences (default: 0.7)
  --AL AL               alignment coverage for the longer sequence (default: 0)
  --AS AS               alignment coverage for the shorter sequence (default: 0)
  -e EVALUE, --evalue EVALUE
                        Blast evalue (default: 1e-06)
  -t THREADS, --threads THREADS
                        number of threads to use, 0 for all (default: 0)
  --table TABLE         codon table (default: 11)
  -a [{nucleotide,protein} ...], --alignment [{nucleotide,protein} ...]
                        run alignment for each gene cluster (default: None)

Example

Basic:

panta main -o examples/test/output -g examples/test/main/*.gff
panta add -c examples/test/output -g examples/test/add/*.gff
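The basic example can also be driven from a script, for instance when the sample lists are produced by another step. A minimal sketch, assuming `panta` is on the PATH of the active environment; it uses only the `-o`, `-c`, `-t` and `-g` flags documented above, and `main_cmd`, `add_cmd` and `run_example` are hypothetical helper names.

```python
import glob
import subprocess


def main_cmd(outdir, gff_files, threads=0):
    """Build the argument list for `panta main` (first run of a collection)."""
    return ["panta", "main", "-o", outdir, "-t", str(threads), "-g", *gff_files]


def add_cmd(collection_dir, gff_files, threads=0):
    """Build the argument list for `panta add` (add samples to a collection)."""
    return ["panta", "add", "-c", collection_dir, "-t", str(threads), "-g", *gff_files]


def run_example(root="examples/test", threads=0):
    """Run the basic example: `main` on the first batch, then `add` the second."""
    outdir = f"{root}/output"
    # sorted() gives a reproducible sample order; glob order is arbitrary.
    main_gffs = sorted(glob.glob(f"{root}/main/*.gff"))
    add_gffs = sorted(glob.glob(f"{root}/add/*.gff"))
    if not main_gffs or not add_gffs:
        raise FileNotFoundError(f"no .gff files found under {root}")
    # check=True raises if a step fails, so `add` never runs after a failed `main`.
    subprocess.run(main_cmd(outdir, main_gffs, threads), check=True)
    subprocess.run(add_cmd(outdir, add_gffs, threads), check=True)
```

Calling `run_example()` from the repository directory mirrors the two shell commands above. Passing an argument list instead of a shell string means the `*.gff` pattern is expanded by `glob` in Python rather than by the shell.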

Via Docker, from the repository directory, run:

docker run -it --rm -v $PWD:/tmp amromics/panta:latest panta main -o examples/test/output -g examples/test/main/*.gff
