Polyadenylation plays a crucial role in transcript maturation, and it is widespread in eukaryotic mRNA. We designed the PyPAD - a command-line tool that detects polyadenylation in the available RNA-seq sequencing data sets. The tool combined Python script with commonly used genomic tools (Hisat2, samtools, bamtools). We assumed that there were transcripts in the RNAseq data that would not map to the genome due to the presence of a polyA tail. We have extracted reads that have a polyA tail from the unmapped reads. Then, we have cut nucleotides at the 3 ' end of the RNA one by one and remapped reads to obtain a pool of reads having polyA tails.
- Requirenments
- Usage
- Authors
- Founding
To run PyPAD, please save PyPAD.py in your local directory where you have fastq file to analyse. In the directory, you should prepare folder 'reference' containing built index for HiSat2. (Feel free to use another aligner. To do it, you should change the code in PyPAD.py carefully.)
- Build Hisat2 index according to documentation, and save your reference in "reference" file
- Select unmapped reads -- we recommend to preprocessed data in a common way (quality control, trimming adaptors), and than do alignment, and select unmapped reads. We extracted unmapped reads from bam file using samtools:
$ samtools view -b -f 4 input.bam > output_unmapped.bam
- Run the code:
$ python PyPolyADetector.py [optional arguments]
- Help & Options
usage: PyPolyADetector.py [-h] [--strandness {forward,reverse}]
[--selectminNnucteotides SELECTMINNNUCTEOTIDES]
[--mintail {re.compile'[AT]{6,}$'),re.compile('^[AT]{6,}')}]
[--pattern_loop {re.compile('[AT]{1}$'),re.compile('^[AT]{1}'),re.compile('[A]{1}$'),re.compile('^[A]{1}'}]
[--infile_path INFILE_PATH]
[--outfile_path OUTFILE_PATH]
Welcome to PyPAD - a tool that detects polyadenylation in the available RNA- seq sequencing data. Maintained at https://github.com/igib-rna-tails/PyPAD_PolyA-detector.
optional arguments:
-h, --help show this help message and exit
--strandness {forward,reverse}
Forward reads (R1, reading from 5' to 3') or reverse
reads (R2, reading from 3' to 5'
--selectminNnucteotides SELECTMINNNUCTEOTIDES
Option in PyPAD to pre-select reads having eg 6 nt in
the tail before the proceduce of triming one by one
nucleotide from the tail, and realign fastq file.
--mintail {re.compile('[AT]{6,}$'),re.compile('^[AT]{6,}')}
Option of pattern to preselect by
--selectminNnucteotides.
--pattern_loop {re.compile('[AT]{1}$'),re.compile('^[AT]{1}'),re.compile('[A]{1}$'),re.compile('^[A]{1}')}
Pattern of nucleotides cut one by one from 3' tail
--infile_path INFILE_PATH
Path to the fastq file with unmapped reads to analyse.
Please prepared data before the analysis with PyPAD
--outfile_path OUTFILE_PATH
Path to the output fastq file after PyPAD analysis.
The fastq file contains reads with precise polyA tail
sequence in the header.
Lidia Lipińska-Zubrycka, Maciej Grochowski, Michał Małecki (Institute of Genetics and Biotechnology, University of Warsaw, Poland)
Work was supported by Foundation for Polish Science (grant no. POIR.04.04.00-00-4316/17-00) and National Science Centre (grant no. 2019/03/X/NZ2/00787).