This repository was archived by the owner on Apr 16, 2025. It is now read-only.

appliedbinf/el_gato


el_gato

Epidemiology of Legionella: Genome-bAsed Typing

This repo will be archived. All future updates will be found at: https://github.com/CDCgov/el_gato

el_gato is a bioinformatics tool that takes either Illumina paired-end reads (.fastq) or a genome assembly (.fasta) as input and derives the Legionella pneumophila Sequence Type (ST) from a database. This contrasts with the original method, which relied on Sanger sequences.

ST is used to describe the relatedness of L. pneumophila isolates. The sequence of a portion of seven L. pneumophila genes (flaA, pilE, asd, mip, mompS, proA, and neuA/neuAh) is compared to a curated database of alleles and STs maintained by the European Society of Clinical Microbiology and Infectious Diseases Study Group for Legionella Infections (ESGLI), in which each unique allele is denoted by an allele number. The combination of allele numbers for all seven genes, reported in order, corresponds to an allelic profile. The allelic profile, in turn, denotes a unique ST.
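The relationship between allele numbers, allelic profile, and ST can be sketched as a simple table lookup. This is a toy illustration, not el_gato's implementation: the allele numbers and ST labels are invented placeholders rather than ESGLI database entries, and `lookup_st` is a hypothetical helper name.

```python
# Toy illustration only: the allele numbers and ST labels below are invented
# placeholders, not entries from the ESGLI database.

# The seven loci, in the order used to report an allelic profile.
LOCI = ("flaA", "pilE", "asd", "mip", "mompS", "proA", "neuA")

# Hypothetical profile table: allelic profile (in locus order) -> ST label.
PROFILE_TO_ST = {
    (1, 2, 3, 4, 5, 6, 7): "ST-A",
    (1, 2, 3, 4, 5, 6, 8): "ST-B",
}


def lookup_st(alleles):
    """Return the ST for a dict of locus -> allele number.

    Returns None when the profile is not present in the table.
    """
    profile = tuple(alleles[locus] for locus in LOCI)
    return PROFILE_TO_ST.get(profile)


# Two isolates that differ at a single locus (neuA) get different STs.
sample = dict(zip(LOCI, (1, 2, 3, 4, 5, 6, 8)))
print(lookup_st(sample))
```

Because the whole ordered profile is the lookup key, a change at any one of the seven loci yields a different key and therefore a different (or unknown) ST.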

Codebase stage: Maintenance
Developers, maintainers, and testers: Alan Collins, Will Overholt, Jenna Hamlin

Previous developers and maintainers: Dev Mashruwala, Andrew Conley, Lavanya Rishishwar, Emily T. Norris, Anna Gaines, Vasanta Chivukula

Installation

Method 1: Using Conda

# Create an environment, here named elgato, and install el_gato.py
# along with all dependencies
conda create -n elgato -c bioconda -c conda-forge el_gato

# Activate the environment to use el_gato.py
conda activate elgato

Method 2: Using pip

Note: This method requires you to install all dependencies yourself (see Dependencies).

# Download el_gato by cloning the git repository
git clone https://github.com/appliedbinf/el_gato.git

# Move into the el_gato directory and install with pip
cd el_gato/
python3 -m pip install .

Dependencies

Usage

Quickstart Guide

The following is an example of a basic run using either paired-end reads or an assembly as input. We recommend using reads whenever available, as read-based sequence typing is more reliable (see input for more information).

# Paired-end:
el_gato.py --read1 read1.fastq.gz --read2 read2.fastq.gz --out output_folder/

# Assembly:
el_gato.py --assembly assembly_file.fna --out output_folder/
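When many isolates need typing, the paired-end command above can be generated once per sample. The following is a minimal sketch, assuming read files are named `<sample>_R1.fastq.gz` and `<sample>_R2.fastq.gz`; that naming scheme, the `<sample>_out/` folder name, and the helpers `pair_reads` and `build_command` are all hypothetical. Only the `--read1`, `--read2`, `--out`, and `--sample` flags come from the help text in this README.

```python
import re
from pathlib import Path


def pair_reads(filenames):
    """Group read files by sample name.

    Assumes files are named <sample>_R1.fastq.gz / <sample>_R2.fastq.gz
    (a hypothetical naming scheme). Samples missing either mate are skipped.
    """
    found = {}
    for name in filenames:
        match = re.fullmatch(r"(.+)_R([12])\.fastq\.gz", Path(name).name)
        if match:
            found.setdefault(match.group(1), {})[match.group(2)] = name
    return {
        sample: (mates["1"], mates["2"])
        for sample, mates in sorted(found.items())
        if len(mates) == 2
    }


def build_command(sample, read1, read2):
    """Build the el_gato.py argument list for one paired-end sample."""
    return [
        "el_gato.py",
        "--read1", read1,
        "--read2", read2,
        "--out", f"{sample}_out/",
        "--sample", sample,
    ]


files = [
    "isolateA_R1.fastq.gz",
    "isolateA_R2.fastq.gz",
    "isolateB_R1.fastq.gz",  # no R2 mate, so this sample is skipped
]
for sample, (r1, r2) in pair_reads(files).items():
    print(" ".join(build_command(sample, r1, r2)))
```

Each argument list can be passed to `subprocess.run(cmd, check=True)` once el_gato is installed; the sketch only prints the commands so they can be reviewed first.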

All available arguments

Legionella in silico SBT script. 
    Requires paired-end reads files (preferred) or a genome assembly.

    Notes on arguments:
    (1) If only reads are provided, SBT is called using a mapping/alignment approach.
    (2) If only an assembly is provided, a BLAST and in silico PCR based approach is adopted. 

Input files:
  Please specify either reads files and/or a genome assembly file

  --read1 Read 1 file, -1 Read 1 file
                        Input Read 1 (forward) file
  --read2 Read 2 file, -2 Read 2 file
                        Input Read 2 (reverse) file
  --assembly Assembly file, -a Assembly file
                        Input assembly fasta file

Optional arguments:
  --help, -h            Show this help message and exit
  --version, -v         Print the version
  --threads THREADS, -t THREADS
                        Number of threads to run the programs (default: 1)
  --depth DEPTH, -d DEPTH
                        Specify the minimum depth used to identify loci in paired-end reads (default: 10)
  --kmer-size KMER_SIZE, -k KMER_SIZE
                        Specify the kmer sized used for mapping by minimap2. Max acceptable: 28. (default: 21)
  --out OUT, -o OUT     Output folder name (default: out)
  --sample SAMPLE, -n SAMPLE
                        Sample name (default: <Inferred from input file>)
  --overwrite, -w       Overwrite output directory (default: False)
  --sbt SBT, -s SBT     Database containing SBT allele and ST mapping files (default: /scicomp/home-pure/ptx4/el_gato/el_gato/db)
  --profile PROFILE, -p PROFILE
                        Name of allele profile to ST mapping file (default: /scicomp/home-pure/ptx4/el_gato/el_gato/db/lpneumophila.txt)
  --verbose             Print what the script is doing (default: False)
  --header, -e          Include column headers in the output table (default: False)
  --length LENGTH, -l LENGTH
                        Specify the BLAST hit length threshold for identifying multiple loci in assembly (default: 0.3)
  --sequence SEQUENCE, -q SEQUENCE
                        Specify the BLAST hit percent identity threshold for identifying multiple loci in assembly (default: 95.0)
  --samfile, -m         Specify whether or not the SAM file is included in the output directory (default: False)
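A wrapper script might sanity-check option values before launching a long run. The sketch below is hypothetical and does not reflect how el_gato validates its own arguments. The defaults and the kmer-size maximum of 28 are taken from the help text above; the other bounds are assumptions, including the reading of `--length` as a fraction between 0 and 1 and `--sequence` as a percentage.

```python
def check_options(threads=1, depth=10, kmer_size=21, length=0.3, sequence=95.0):
    """Return a list of problems found in the given option values.

    Defaults mirror the help text. Only the kmer-size maximum (28) is
    documented; every other bound here is an assumed sanity range.
    """
    problems = []
    if threads < 1:
        problems.append("--threads must be at least 1")
    if depth < 1:
        problems.append("--depth must be at least 1")
    if not 1 <= kmer_size <= 28:
        problems.append("--kmer-size must be between 1 and 28")
    if not 0 < length <= 1:
        # Assumption: the length threshold is a fraction of the locus length.
        problems.append("--length must be in the range (0, 1]")
    if not 0 <= sequence <= 100:
        # Assumption: the identity threshold is a percentage.
        problems.append("--sequence must be between 0 and 100")
    return problems


# The documented defaults pass; an oversized kmer is reported.
print(check_options())
print(check_options(kmer_size=31))
```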

Acknowledgements

We greatly appreciate the United Kingdom Health Security Agency (UKHSA) for curating and sharing the L. pneumophila database. You can learn more about UKHSA here: https://www.gov.uk/government/organisations/uk-health-security-agency. Please contact legionella-sbt@ukhsa.gov.uk for enquiries about the database.
