Description
Dear Authors,
First of all, thank you very much for the great tool.
I have been using Pangolin with the default masking option and noticed that sometimes it masks the positions where no splice sites are annotated in GENCODE. It happens when there is more than one annotated gene on the same strand. I think that the reason for this is that the masking that was done on the gain and loss arrays for the first gene leaks to the subsequent genes on the strand.
To reproduce this issue, I used this variant: chr9:37521865:C>T (hg38). The variant affects two genes on the reverse strand: ENSG00000147912 and ENSG00000256966. For ENSG00000147912, position 37521838 is annotated as a splice site, so the gain score at this position gets masked, as expected. For ENSG00000256966, this position is not annotated as a splice site, yet the gain score at this position is still masked for this gene. As a result, the maximum gain score reported for ENSG00000256966 for this variant is affected.
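
To illustrate the kind of leak I suspect, here is a minimal toy sketch. It is not Pangolin's actual code: the function names, the gene labels, the scores, and the simplified rule "set gain to 0 at annotated sites" are all assumptions made for illustration. It only shows how masking a shared array in place for the first gene can carry over to the next gene, and how masking a per-gene copy avoids that.

```python
# Toy illustration only: names, scores, and masking rule are hypothetical,
# not taken from the Pangolin source.

def mask_gain(gain, annotated_sites):
    """Set the gain score to 0 at annotated splice sites (modifies `gain` in place)."""
    for pos in annotated_sites:
        gain[pos] = 0.0
    return gain

def max_gain_leaky(gain, genes):
    """Suspected bug pattern: the same list is masked in place for every gene."""
    results = {}
    for gene, sites in genes.items():
        masked = mask_gain(gain, sites)  # mutates the shared list
        results[gene] = max(masked)
    return results

def max_gain_isolated(gain, genes):
    """Possible fix: each gene masks its own copy of the unmasked scores."""
    results = {}
    for gene, sites in genes.items():
        masked = mask_gain(list(gain), sites)  # copy first, then mask
        results[gene] = max(masked)
    return results

# Three toy positions; index 1 is an annotated splice site only in gene_A.
raw_gain = [0.1, 0.8, 0.3]
genes = {"gene_A": {1}, "gene_B": set()}

leaky = max_gain_leaky(list(raw_gain), genes)
isolated = max_gain_isolated(list(raw_gain), genes)

print(leaky)     # gene_B inherits gene_A's mask
print(isolated)  # gene_B keeps its unmasked score
```

In the leaky version, gene_B reports the same maximum as gene_A even though it has no annotated site at index 1, which matches the behaviour I see for ENSG00000256966.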