dbAPIS is a database of anti-prokaryotic immune system proteins. This repository (azureycy/dbAPIS) contains the code and scripts used to generate and maintain the database.

dbAPIS website: https://bcb.unl.edu/dbAPIS

Yan, Y., Zheng, J., Zhang, X., & Yin, Y. (2023). dbAPIS: a database of anti-prokaryotic immune system genes. Nucleic Acids Research. https://doi.org/10.1093/nar/gkad932

Tools and databases

BLAST+: sequence comparison against a database.
MMseqs2: sequence search and clustering.
MAFFT: multiple sequence alignment.
HMMER: sequence analysis using profile HMMs.
hh-suite: remote protein homology detection suite.
Foldseek: protein structure comparison.
DIAMOND: sequence aligner for protein and translated DNA searches.
Pfam: protein domain family database.
PHROG: Prokaryotic Virus Remote Homologous Groups database.
ColabFold: protein structure prediction with AlphaFold2.
clinker: gene cluster comparison figure generator.

Database content processing

Create APIS protein families and add newly curated proteins

BLASTP homology search: blast_seed.sh
MMseqs2 clustering/searching: create_family_and_update.sh

Build APIS protein family HMMs

family_msa_hmm.sh

Searching homologous families using HHsearch

hhsearch_homolog_family.sh

Protein function annotation

Pfam and PHROG annotation: phrog_pfam_annotation.sh

Protein structure prediction

protein_structure_predict.sh

Searching protein structure homologs using Foldseek

foldseek_homolog_structure.sh

Genomic context visualization using jbrowse

generate_jbrowse.sh

Gene cluster comparison using clinker

clinker_gene_loci_plot.sh

Run APIS protein annotation with DIAMOND and HMMscan locally

Run HMMscan on your local server

Download the APIS protein family profile HMMs

wget https://bcb.unl.edu/dbAPIS/downloads/dbAPIS.hmm

Prepare a profile database by constructing binary compressed data files

hmmpress dbAPIS.hmm

Four files are created: dbAPIS.hmm.h3m, dbAPIS.hmm.h3i, dbAPIS.hmm.h3f, and dbAPIS.hmm.h3p.
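
Before running hmmscan, it can help to confirm that all four index files are present. The snippet below is a minimal sketch; `check_pressed` is a hypothetical helper name and is not part of dbAPIS or HMMER.

```shell
# check_pressed (hypothetical helper): succeed only if all four hmmpress
# index files exist and are non-empty next to the given HMM file.
check_pressed() {
    local hmm="$1" ext missing=0
    for ext in h3m h3i h3f h3p; do
        if [ ! -s "${hmm}.${ext}" ]; then
            echo "missing or empty: ${hmm}.${ext}" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example usage: rebuild the index only when it is incomplete.
# check_pressed dbAPIS.hmm || hmmpress -f dbAPIS.hmm
```

The `-f` flag tells hmmpress to overwrite any existing index files.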

Run hmmscan for your amino acid sequences

hmmscan --domtblout hmmscan.out --noali dbAPIS.hmm your_sequence.faa

The --domtblout option writes a space-separated table of domain hits, with one line per domain. The --noali option omits the alignment section from the main output to reduce its size. For more information on hmmscan, see the HMMER user guide.
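
The raw domain table can be filtered directly with awk. The sketch below assumes the standard HMMER3 --domtblout layout (column 1 target name, column 4 query name, column 12 conditional E-value, column 14 domain score, columns 18 and 19 alignment coordinates on the query). `filter_domtbl` and the 1e-5 cutoff are illustrative choices, not values taken from the dbAPIS pipeline.

```shell
# filter_domtbl (hypothetical helper): keep domain hits whose conditional
# E-value is at or below a cutoff.
# Output columns (tab-separated): query, family, c-Evalue, domain score,
# query alignment start, query alignment end.
filter_domtbl() {
    local cutoff="$1" file="$2"
    awk -v cutoff="$cutoff" '
        BEGIN { OFS = "\t" }
        /^#/ { next }                     # skip comment and header lines
        ($12 + 0) <= (cutoff + 0) {       # force a numeric comparison
            print $4, $1, $12, $14, $18, $19
        }
    ' "$file"
}

# Example usage (cutoff chosen for illustration only):
# filter_domtbl 1e-5 hmmscan.out > hmmscan.filtered.tsv
```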

Run DIAMOND on your local server

Download the APIS protein sequences

wget https://bcb.unl.edu/dbAPIS/downloads/anti_defense.pep

Build diamond database with APIS protein sequences

diamond makedb --in anti_defense.pep -d APIS_db

Run diamond for your amino acid sequences

diamond blastp --db APIS_db -q your_sequence.faa -f 6 qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore qlen slen -o diamond.out --max-target-seqs 10000

The -f 6 option produces tab-separated output (equivalent to BLAST's -outfmt 6) consisting of the custom fields listed after it. The --max-target-seqs option sets the maximum number of target sequences to report alignments for. For more details, see the DIAMOND tutorial.
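
Because the command above also requests qlen and slen, query and subject coverage can be derived from the raw table. The following is a sketch using the common definition of coverage (aligned span divided by sequence length); it is not necessarily the formula used by the dbAPIS parser script, and `diamond_coverage` is a hypothetical helper name.

```shell
# diamond_coverage (hypothetical helper): compute coverage from the
# 14-column table produced by the diamond blastp command above.
# Columns used: 7 qstart, 8 qend, 9 sstart, 10 send, 13 qlen, 14 slen.
# Output columns (tab-separated): qseqid, sseqid, qcov (%), scov (%).
diamond_coverage() {
    awk '
        BEGIN { FS = "\t" }
        {
            qcov = ($8 - $7 + 1) / $13 * 100
            scov = ($10 - $9 + 1) / $14 * 100
            printf "%s\t%s\t%.1f\t%.1f\n", $1, $2, qcov, scov
        }
    ' "$1"
}

# Example usage:
# diamond_coverage diamond.out > diamond.coverage.tsv
```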

Parse annotation output

Download the family member mapping table and parser script

wget https://bcb.unl.edu/dbAPIS/downloads/seed_family_mapping.tsv
wget https://bcb.unl.edu/dbAPIS/downloads/parse_annotation_result.sh

Run script to parse annotation output files

bash parse_annotation_result.sh hmmscan.out diamond.out

This generates one parsed output file each for hmmscan and DIAMOND: hmmscan.out.parsed.tsv and diamond.out.parsed.tsv.

hmmscan.out.parsed.tsv contains 13 columns:

Query: query sequence ID
Query len: query sequence length
Hit family: hit family ID
Defense type: type of defense system inhibited by the hit family
Hit CLAN: hit clan ID
Hit CLAN defense type: inferred (predicted) type of defense system inhibited by the hit clan
Family len: length of the target family profile
Domain c-evalue: the “conditional E-value”, a permissive measure of how reliable this particular domain may be
Domain score: the bit score for this domain
Query from: query start position
Query to: query end position
HMM from: the start of the MEA alignment of this domain with respect to the profile
HMM to: the end of the MEA alignment of this domain with respect to the profile
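
One way to use these columns is to require that a hit covers a minimum fraction of the family profile. The sketch below assumes that the file is tab-separated with the 13 columns in the order listed above, and that it may start with a header row; neither assumption is confirmed by the source. `hmm_coverage_filter` and the 0.5 threshold are illustrative.

```shell
# hmm_coverage_filter (hypothetical helper): keep hits whose aligned HMM
# span covers at least min_cov (a fraction from 0 to 1) of the profile.
# Columns used: 1 Query, 3 Hit family, 7 Family len, 12 HMM from, 13 HMM to.
# Output columns (tab-separated): query, family, HMM coverage.
hmm_coverage_filter() {
    local min_cov="$1" file="$2"
    awk -v min_cov="$min_cov" '
        BEGIN { FS = "\t" }
        $7 !~ /^[1-9][0-9]*$/ { next }    # skip a header row or bad length
        {
            cov = ($13 - $12 + 1) / $7
            if (cov >= min_cov + 0)
                printf "%s\t%s\t%.2f\n", $1, $3, cov
        }
    ' "$file"
}

# Example usage (threshold chosen for illustration only):
# hmm_coverage_filter 0.5 hmmscan.out.parsed.tsv
```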

diamond.out.parsed.tsv contains 12 columns:

qseqid: query sequence ID
famid: hit family ID
Defense type: type of defense system inhibited by the hit family
Hit CLAN: hit clan ID
Hit CLAN defense type: inferred (predicted) type of defense system inhibited by the hit clan
seqid: hit sequence ID
pident: the percentage of identical amino acid residues that were aligned
align length: the total length of the alignment, including matching, mismatching and gap positions of query and subject
evalue: the expected value of the hit
bitscore: a scoring matrix independent measure of the (local) similarity of the two aligned sequences, with higher numbers meaning more similar
qcov: query coverage, the percentage of the query sequence that aligned
scov: subject coverage, the percentage of the hit sequence that aligned
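
With --max-target-seqs set high, a query can have many hits, so a common follow-up is to keep only the top-scoring hit per query. The sketch below assumes the parsed file is tab-separated with the 12 columns in the order listed above and may start with a header row; `best_hit_per_query` is a hypothetical helper name.

```shell
# best_hit_per_query (hypothetical helper): for each query, keep the hit
# with the highest bitscore (column 10).
# Output columns (tab-separated): qseqid, famid, bitscore; sorted by qseqid.
best_hit_per_query() {
    awk '
        BEGIN { FS = OFS = "\t" }
        NR == 1 && $10 !~ /^[0-9]/ { next }   # skip a header row, if present
        !($1 in best) || ($10 + 0) > best[$1] {
            best[$1] = $10 + 0
            line[$1] = $1 OFS $2 OFS $10
        }
        END { for (q in line) print line[q] }
    ' "$1" | sort
}

# Example usage:
# best_hit_per_query diamond.out.parsed.tsv > diamond.best_hits.tsv
```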
