PyPAD (Python PolyA Detector)

Detection of polyadenylation in RNA-seq data

Polyadenylation plays a crucial role in transcript maturation and is widespread in eukaryotic mRNA. We designed PyPAD, a command-line tool that detects polyadenylation in available RNA-seq data sets. The tool combines a Python script with commonly used genomic tools (HISAT2, samtools, bamtools). We assumed that some transcripts in RNA-seq data do not map to the genome because of the presence of a polyA tail. We therefore extract reads that have a polyA tail from the unmapped reads. Then we cut nucleotides from the 3' end of the RNA one by one and remap the reads to obtain a pool of reads having polyA tails.

Simplified scheme of PyPAD

Table of contents

Requirements
Usage
Authors
Funding

Requirements

Usage

To run PyPAD, save PyPolyADetector.py in the local directory that contains the fastq file you want to analyse. In the same directory, prepare a folder named 'reference' containing a prebuilt HISAT2 index. (Feel free to use another aligner, but you will need to adapt the code in PyPolyADetector.py carefully.)

Build the HISAT2 index according to the HISAT2 documentation, and save it in the "reference" folder.
Select unmapped reads. We recommend preprocessing the data in the usual way (quality control, adapter trimming), then aligning the reads and selecting the unmapped ones. We extracted unmapped reads from the BAM file using samtools:

$  samtools view -b -f 4 input.bam > output_unmapped.bam
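For readers unfamiliar with SAM flags: `-f 4` keeps only alignments whose FLAG field has bit 0x4 ("segment unmapped") set. The short sketch below illustrates that bit test; the helper name `is_unmapped` is hypothetical and is not part of PyPAD or samtools.

```python
UNMAPPED = 0x4  # SAM FLAG bit meaning "segment unmapped"


def is_unmapped(flag: int) -> bool:
    """Return True if a SAM FLAG value has the unmapped bit set."""
    return bool(flag & UNMAPPED)


# 0: mapped forward, 4: unmapped, 16: mapped reverse,
# 77 and 141: first and second read of a pair in which both mates are unmapped.
for flag in (0, 4, 16, 77, 141):
    print(flag, is_unmapped(flag))
```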

Run the code:

$  python PyPolyADetector.py [optional arguments]
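If you prefer to launch the script from Python, for example inside a larger pipeline, the call could be assembled as in the sketch below. It uses only options listed under Help & Options; the helper `build_pypad_command` and the file names `unmapped.fastq` and `polya_reads.fastq` are assumptions made for illustration.

```python
import shlex
import sys
from typing import List, Optional


def build_pypad_command(
    infile: str,
    outfile: str,
    strandness: Optional[str] = None,
    script: str = 'PyPolyADetector.py',
) -> List[str]:
    """Assemble the argument list for a PyPolyADetector.py run."""
    cmd = [sys.executable, script,
           '--infile_path', infile,
           '--outfile_path', outfile]
    if strandness is not None:
        if strandness not in ('forward', 'reverse'):
            raise ValueError("strandness must be 'forward' or 'reverse'")
        cmd += ['--strandness', strandness]
    return cmd


# Hypothetical file names; replace them with your own paths.
cmd = build_pypad_command('unmapped.fastq', 'polya_reads.fastq',
                          strandness='forward')
print(shlex.join(cmd))
# To actually run it: subprocess.run(cmd, check=True)
```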

Help & Options

usage: PyPolyADetector.py [-h] [--strandness {forward,reverse}]
                          [--selectminNnucteotides SELECTMINNNUCTEOTIDES]
                          [--mintail {re.compile('[AT]{6,}$'),re.compile('^[AT]{6,}')}]
                          [--pattern_loop {re.compile('[AT]{1}$'),re.compile('^[AT]{1}'),re.compile('[A]{1}$'),re.compile('^[A]{1}')}]
                          [--infile_path INFILE_PATH]
                          [--outfile_path OUTFILE_PATH]

Welcome to PyPAD - a tool that detects polyadenylation in the available RNA-seq sequencing data. Maintained at https://github.com/igib-rna-tails/PyPAD_PolyA-detector.

optional arguments:
  -h, --help            show this help message and exit
  --strandness {forward,reverse}
                        Forward reads (R1, reading from 5' to 3') or reverse
                        reads (R2, reading from 3' to 5'
  --selectminNnucteotides SELECTMINNNUCTEOTIDES
                        Option in PyPAD to pre-select reads having eg 6 nt in
                        the tail before the proceduce of triming one by one
                        nucleotide from the tail, and realign fastq file.
  --mintail {re.compile('[AT]{6,}$'),re.compile('^[AT]{6,}')}
                        Option of pattern to preselect by
                        --selectminNnucteotides.
  --pattern_loop {re.compile('[AT]{1}$'),re.compile('^[AT]{1}'),re.compile('[A]{1}$'),re.compile('^[A]{1}')}
                        Pattern of nucleotides cut one by one from 3' tail
  --infile_path INFILE_PATH
                        Path to the fastq file with unmapped reads to analyse.
                        Please prepared data before the analysis with PyPAD
  --outfile_path OUTFILE_PATH
                        Path to the output fastq file after PyPAD analysis.
                        The fastq file contains reads with precise polyA tail
                        sequence in the header.
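To make the --mintail choices concrete, the sketch below applies the two patterns from the help text to a few made-up reads. The read names, the sequences, and the helper `select` are invented for illustration, and the sketch does not reproduce how PyPAD itself applies the patterns.

```python
import re

TAIL_AT_END = re.compile('[AT]{6,}$')    # run of at least 6 A/T at the read end
TAIL_AT_START = re.compile('^[AT]{6,}')  # run of at least 6 A/T at the read start

# Made-up example reads.
reads = {
    'r1': 'GGCTTACGGATCCAAAAAAA',   # seven trailing A
    'r2': 'TTTTTTTGGATCCGTAAGCC',   # seven leading T
    'r3': 'GGCTTACGGATCCGTAAGCC',   # no A/T run at either end
    'r4': 'GGCTTACGGATCCAAAAA',     # only five trailing A
}


def select(reads, pattern):
    """Return the names of reads whose sequence matches the pattern."""
    return [name for name, seq in reads.items() if pattern.search(seq)]


print(select(reads, TAIL_AT_END))    # ['r1']
print(select(reads, TAIL_AT_START))  # ['r2']
```

Note that the character class `[AT]` also accepts mixed runs such as `ATATAT`, so these patterns select A/T-rich ends rather than pure polyA or polyT stretches.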

Authors

Lidia Lipińska-Zubrycka, Maciej Grochowski, Michał Małecki (Institute of Genetics and Biotechnology, University of Warsaw, Poland)

Funding

This work was supported by the Foundation for Polish Science (grant no. POIR.04.04.00-00-4316/17-00) and the National Science Centre (grant no. 2019/03/X/NZ2/00787).
