Open
Description of the bug
Hi nf-core team,
We recently analyzed Mus musculus RNA-seq data using the default parameters and identified differentially expressed genes that are currently deprecated or retired in the Ensembl database. After investigating, we noticed that the GTF used by the pipeline appears to be outdated. We used gffcompare to compare it with the Ensembl GTF (Mus_musculus.GRCm38.102.chr.gtf.gz). I made sure to compare the genome.filtered.gtf produced by the pipeline.
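For anyone who wants to check which gene IDs are affected, here is a minimal sketch of how the two annotations could be compared by gene ID. It is not what we ran (we used gffcompare); the function names are hypothetical, and it assumes both files are standard tab-separated GTFs with a gene_id attribute in the ninth column and that IDs are written the same way in both files (for example, no version suffix in only one of them).

```python
import gzip
import re

# Matches the gene_id attribute in the ninth GTF column, e.g. gene_id "ENSMUSG...";
GENE_ID_RE = re.compile(r'gene_id "([^"]+)"')


def gene_ids(lines):
    """Collect gene_id values from an iterable of GTF lines."""
    ids = set()
    for line in lines:
        if line.startswith("#"):
            continue  # header/comment lines carry no features
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a complete GTF record
        match = GENE_ID_RE.search(fields[8])
        if match:
            ids.add(match.group(1))
    return ids


def read_gtf_gene_ids(path):
    """Read gene IDs from a plain or gzip-compressed GTF file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return gene_ids(handle)


def only_in_first(first_ids, second_ids):
    """Return the sorted gene IDs present in the first set but not the second."""
    return sorted(first_ids - second_ids)
```

Calling only_in_first(read_gtf_gene_ids("genome.filtered.gtf"), read_gtf_gene_ids("Mus_musculus.GRCm38.102.chr.gtf.gz")) would list gene IDs that appear only in the pipeline GTF, which could then be checked against the retired IDs in Ensembl.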
# gffcompare v0.12.9 | Command line was:
#./gffcompare-0.12.9.Linux_x86_64/gffcompare -r genome.filtered.gtf -o gtfcmp_new genome.filtered_updated.gtf
#
#= Summary for dataset: genome.filtered_updated.gtf
# Query mRNAs : 142604 in 53448 loci (115576 multi-exon transcripts)
# (20372 multi-transcript loci, ~2.7 transcripts per locus)
# Reference mRNAs : 109160 in 44274 loci (89147 multi-exon)
# Super-loci w/ reference transcripts: 43571
#-----------------| Sensitivity | Precision |
Base level: 99.0 | 79.4 |
Exon level: 98.7 | 85.2 |
Intron level: 99.5 | 88.6 |
Intron chain level: 97.9 | 75.5 |
Transcript level: 79.9 | 61.2 |
Locus level: 59.9 | 49.4 |
Matching intron chains: 87258
Matching transcripts: 87258
Matching loci: 26524
Missed exons: 924/377955 ( 0.2%)
Novel exons: 35793/447306 ( 8.0%)
Missed introns: 562/253399 ( 0.2%)
Novel introns: 18019/284759 ( 6.3%)
Missed loci: 348/44274 ( 0.8%)
Novel loci: 9842/53448 ( 18.4%)
Total union super-loci across all input datasets: 53413
142604 out of 142604 consensus transcripts written in gtfcmp_new.annotated.gtf (0 discarded as redundant)
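As a sanity check on how to read the summary above, the sketch below recomputes a few of the reported percentages from the counts in the same output. It assumes the usual definitions (sensitivity = matches / reference total, precision = matches / query total); the helper name is hypothetical. The locus-level precision is left out because it is not obvious from the output which denominator gffcompare uses for it.

```python
def percent(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


# Counts copied from the gffcompare summary above.
reference_transcripts = 109160   # Reference mRNAs (genome.filtered.gtf)
query_transcripts = 142604       # Query mRNAs (genome.filtered_updated.gtf)
reference_multi_exon = 89147     # multi-exon reference transcripts
query_multi_exon = 115576        # multi-exon query transcripts
matching_transcripts = 87258
matching_intron_chains = 87258
reference_loci = 44274
query_loci = 53448

# Transcript level: reported as 79.9 (sensitivity) and 61.2 (precision).
transcript_sensitivity = percent(matching_transcripts, reference_transcripts)
transcript_precision = percent(matching_transcripts, query_transcripts)

# Intron chain level: reported as 97.9 and 75.5.
chain_sensitivity = percent(matching_intron_chains, reference_multi_exon)
chain_precision = percent(matching_intron_chains, query_multi_exon)

# Missed and novel loci: reported as 0.8% and 18.4%.
missed_loci = percent(348, reference_loci)
novel_loci = percent(9842, query_loci)

print(transcript_sensitivity, transcript_precision)
print(chain_sensitivity, chain_precision)
print(missed_loci, novel_loci)
```

Read this way, the high sensitivity and lower precision suggest that the query annotation contains nearly everything in the reference plus a substantial number of additional transcripts and loci.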
Is it possible to update the default GTF for Mus musculus (and perhaps other species too)?
Command used and terminal output
Relevant files
No response
System information
No response