EAP: an integrated R toolkit that aims to facilitate the epitranscriptome data analysis in plants

EAP is an integrated R toolkit that aims to facilitate the epitranscriptome data analysis in plants. This toolkit contains a comprehensive set of functions for read mapping, CMR (chemical modifications of RNA) calling from epitranscriptome sequencing data, CMR prediction at the transcriptome scale, and CMR annotation (location distribution analysis, motif scanning and discovery, and gene functional enrichment analysis)

Version and download

Version 1.0 -First version released on september, 21th, 2017

Dependency

R environment

R (>= 3.3.1)
randomForst (>= 0.6)
seqinr (>= 3.4-5)
stringr (>= 1.2.0)
snowfall (>= 1.84-6.1)
ggplot2 (>= 2.2.1)
bigmemory (>= 4.5.19)
Rsamtools (>= 1.28.0)
motifRG (>= 1.21.0)
devtools (>= 1.13.3)

Global software environment

Tophat/Tophat2:Reads mapping
Bowtie/Bowtie2:Reads mapping
Hisat/Hisat2:Reads mapping

Python environment

macs2:Peak calling

Dependency installation

## Install R Dependency
dependency.packages <- c("randomForst", "seqinr", "stringr", "snowfall",
                          "ggplot2", "bigmemory", "Rsamtools", "motifRG",
                          "devtools")
install.pacakages(dependency.packages)

## Install Tophat/Tophat2
sudo apt-get update
sudo apt-get install tophat or sudo apt-get install tophat2
## Install Bowtie/Bowtie2
sudo apt-get update
sudo apt-get install bowtie or sudo apt-get install bowtie2
## Install Hisat/Hisat2
sudo apt-get update
sudo apt-get install hisat or sudo apt-get install hisat2

pip install macs2

pip

Installation

install.package("Download path/EAP_1.0.tar.gz",repos = NULL, type = "source")

Quick start

More details please see user manual

1.CMR calling

1.1 Arabidopsis thaliana m6A sequencing datasets

## download and convert example data
# Here, supposing that ‘sratoolkit’ has been downloaded in your device in the directory: /home/malab14/, then the following command will convert the sra format to fastq format
/home/malab14/sratoolkit/bin/fastq-dump --split-3 SRR1508369.sra 
#Otherwise, you can also download the fastqc toolkit to perform quality control for fastq formatted files.
#Downloading the fastqc
wget http://www.bioinformatics.babraham.ac.uk/projects/fastqc/fastqc_v0.11.5.zip
#decompressing
unzip fastqc_v0.11.5.zip
#Running fastq
./FastQC/fastqc SRR1508369.fastq

1.2 Read mapping

input.fq <- "/home/malab14/input.fastq"  
RIP.fq <- "/home/malab14/RIP.fastq"  
referenceGenome <- "/home/malab14/tair10.fa"  
GTF <- "/home/malab14/Arabidopsis.gtf"  
input.bam <- readsMapping(alignment = "tophat", fq = input.fq,   
                          refGenome = referenceGenome, paired = F,
                          bowtie1 = NULL, p = 5, G = GTF)
RIP.bam <- readsMapping(alignment = "tophat", fq = RIP.fq,   
                        refGenome = referenceGenome, paired = F,
                        bowtie1 = NULL, p = 5, G = GTF)

1.3 CMR calling from read-alignment files

################m6A peak calling through "SlidingWindow"##################  
cmrMat <- CMRCalling(CMR = "m6A", method = "SlidingWindow",  
                     IPBAM = RIP.bam, inputBAM = input.bam)

2.Transcriptome-level CMR prediction

2.1 Arabidopsis m6A benchmark dataset construction

cDNA <- "/home/malab14/tair10_cDNA.fa"  
GTF <- "/home/malab14/Arabidopsis.gtf"  
###Convert genomic position to cDNA position  
peaks <- G2T(bedPos = cmrMat, GTF = GTF)  
###Search consensus motif in cDNA sequence  
motifPos <- searchMotifPos(RNAseq = cDNA)  
posSamples <- findConfidentPosSamples(peaks = peaks,  
                                      motifPos = motifPos)  
unlabelSamples <- findUnlabelSamples(cDNAID = posSamples$cDNAID,   
                                     motifPos = motifPos,   
                                     posSamples = posSamples$positives)

2.2 Sample vectorization with three feature encoding schemes

#########################Extract sequence#################################  
posSeq <- extractSeq(RNAseq = cDNA, sampleMat = posSamples, seqLen = 101)  
unlabelSeq <- extractSeq(RNAseq = cDNA, sampleMat = unlabelSamples, 
                         seqLen = 101)  
#########################Feature encoding#################################  
posFeatureMat <- featureEncoding(posSeq)  
unlabelFeatureMat <- featureEncoding(unlabelSeq<
6BDE
/span>) 
featureMat <- rbind(posFeatureMat, unlabelFeatureMat)

3.3 m6A predictor construction using ML-based PSOL algorithm

###Setting the psol directory and running the PSOL-based ML classification###  
PSOLResDic <- "/home/malab14/psol/"  
psolResults <- PSOL(featureMatrix = featureMat, positives = positives,   
                    unlabels = unlabels, PSOLResDic = PSOLResDic, cpus = 5) 

####Ten-fold cross-validation and ROC curve analysis.
cvRes <- cross_validation(featureMat = featureMat,   
                          positives = rownames(posFeatureMat),  
                          negatives = rownames(unlabelFeatureMat),  
                          cross = 10, cpus = 1)

3.CMR annotation

3.1 CMR location distribution

GTF <- "/home/malab14/Arabidopsis_tair10.gtf"  
#####Extract the UTR position information from GTF file and perform CMR location distribution analysis.  
UTRMat <- getUTR(GTF = GTF)  
results <- CMRAnnotation(cmrMat = cmrMat, SNR = T, UTRMat = UTRMat)

3.2 Motif scanning and discovery

RNAseq <- "/home/malab14/tair10.fa"  
testSeqs <- extractSeqs(cmrMat = cmrMat, RNAseq = RNAseq)  
results <- motifScan(sequence = testSeqs, motif = "[AG][AG]AC[ACT]")
motifs <- motifDetect (sequence = testSeqs)

3.3 Functional enrichment analysis of CMR corresponded genes

enrichements <- runTopGO(geneID = geneID,   
                         dataset = "athaliana_eg_gene",  
                         topNodes = 20)

Ask questions

Please use EAP/issues for how to use DeepGS and reporting bugs.

Name		Name	Last commit message	Last commit date
Latest commit History 7 Commits
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

EAP: an integrated R toolkit that aims to facilitate the epitranscriptome data analysis in plants

Version and download

Dependency

R environment

Global software environment

Python environment

Dependency installation

Installation

Contents

CMR calling

Transcriptome-level CMR prediction

CMR annotation

Quick start

1.CMR calling

2.Transcriptome-level CMR prediction

3.CMR annotation

Ask questions

About

Uh oh!

Releases

Packages

Uh oh!

qchengray/EAP

Folders and files

Latest commit

History

Repository files navigation

EAP: an integrated R toolkit that aims to facilitate the epitranscriptome data analysis in plants

Version and download

Dependency

R environment

Global software environment

Python environment

Dependency installation

Installation

Contents

CMR calling

Transcriptome-level CMR prediction

CMR annotation

Quick start

1.CMR calling

2.Transcriptome-level CMR prediction

3.CMR annotation

Ask questions

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Packages