8000 GitHub - Hua-CM/TEtools: My own transposon element(TE) analysis scripts
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

Hua-CM/TEtools

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

1 Commit
 
 
 
 
 
 
 
 

Repository files navigation

Transponson Element

My own transponson element analysis pipeline

transposon element identification

Using EDTA identification

singularity exec -B /mnt:/mnt /home/assembly/tools/EDTA.sif EDTA.pl \
--genome/path/to/genome.fasta \
--cds /path/to/genome.cds.fa \
--sensitive 1 --anno 1

Get the RepeatMasker output from EDTA out put,the path is *.mod.EDTA.final/*.mod.EDTA.raw.fa.preFriJul152239272022.RMoutput/*.mod.EDTA.raw.fa.out

awk '!/\*/' RM.out > RM_nostar.out

using RM_TRIPS clean RM results and you will get RM_tidy.out

TE classification (optional)

If you have lots of unknown TEs, I recommend use DeepTE to reannotate your TE sequences(genome.fasta.mod.EDTA.TElib.fa in EDTA results)

DeepTE.py -d working_dir -o output_dir -i TElib.fa -sp F -m_dir /path/to/fungi/model

if your organism is plant, I also recommend usng TEsorter

Genomic distribution

Calculate LTR genomic distribution. I split the genomic region into four parts:intergenetic, upstream of downstream 2kb of gene, upstream of downstream 2~5kb of gene, inside gene.

python ./TE/Distribution.py -i RM_tidy.out -o RM_tidy_distribution.out -g /path/to/genomic.gff

ps: Make ensure the gff file is correct

Insert time

Only intact LTR could calculate insert time. Using EDTA results.

python Insert_time.py -i /path/to/EDTA/genome.fasta.mod.EDTA.raw/genome.fasta.mod.LTR.intact.gff3 -o insert_time.lst

Intact, truncated and 6070 solo LTR

Please refer to reference 1 if you don't know what is intact, truncated and solo LTR and why calculate their ration. You need download sequence from GyDB. The url for the download data is https://gydb.org/extensions/Collection/collection/db/GyDB_collection.zip.
Requierment: BLAST and silix

unzip GyDB_collection.zip
cat GyDB_collection/consensus/GAG_* > gag.fa
python solo_detect.py -g genome.fasta -i ../genome.fasta.mod.EDTA.raw/genome.fasta.mod.LTR.intact.fa --gag gag.fa -t 10 -o out.tsv

ps: EDTA will release a version that can calculate solo LTR directly soon.

Reference

  1. Lyu, H., He, Z., Wu, C.-I. and Shi, S. (2018), Convergent adaptive evolution in marginal environments: unloading transposable elements as a common strategy among mangrove genomes. New Phytol, 217: 428-438. https://doi.org/10.1111/nph.14784
  2. Miele, V., Penel, S. & Duret, L. Ultra-fast sequence clustering from similarity networks with SiLiX. BMC Bioinformatics 12, 116 (2011). https://doi.org/10.1186/1471-2105-12-116
  3. Ou S., Su W., Liao Y., Chougule K., Agda J. R. A., Hellinga A. J., Lugo C. S. B., Elliott T. A., Ware D., Peterson T., Jiang N.✉, Hirsch C. N.✉ and Hufford M. B.✉ (2019). Benchmarking Transposable Element Annotation Methods for Creation of a Streamlined, Comprehensive Pipeline. Genome Biol. 20(1): 275.
  4. Zhang RG, Li GL, Wang XL et. al. TEsorter: an accurate and fast method to classify LTR retrotransposons in plant genomes. Horticulture Research, 2022, 9: uhac017 https://doi.org/10.1093/hr/uhac017
  5. Llorens C, Futami R, Covelli L et. al. The Gypsy Database (GyDB) of mobile genetic elements: release 2.0. Nucleic Acids Research, 2011, 39: 70–74 https://doi.org/10.1093/nar/gkq1061
  6. Bell, E. A., Butler, C. L., Oliveira, C., Marburger, S., Yant, L., & Taylor, M. I. (2022). Transposable element annotation in non-model species: The benefits of species-specific repeat libraries using semi-automated EDTA and DeepTE de novo pipelines. Molecular Ecology Resources, 22, 823– 833. https://doi.org/10.1111/1755-0998.13489

About

My own transposon element(TE) analysis scripts

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

0