Description
Hi Readman Chiu,
I am interested in using straglr to genotype ONT reads, and I ran a test on a few loci on chr22. However, I found that some of the input filters do not seem to filter anything out of the results.
Here is my example BED locus, extracted from simple_repeats.bed:
chr22 10706266 10706584 CTAAAACAAGAATATGTAGACAGTAGCTACATTGGATTTCTGTGGCTAAAGAGTTGTTCATTTCTCTGGTAATGTGTAGAATTACTAAAATGCAGCATGACAACTGTTTCTCTGTAATAGTGATCCAACATGACGTGTAGTATTACACAGGGTTG
and my command is:
time straglr.py ${BAM} ${REF} ${OUTPREFIX} --loci ${BED} --genotype_in_size --min_support 2 --max_str_len 10 --max_num_clusters 2 --max_cov 30
The repeat motif length is 155 and the coverage should be above 200 (judging from allele1:support). I assumed that --max_str_len 10 and --max_cov 30 would make straglr discard this locus and skip genotyping it, but I still got results for it in the VCF/TSV/BED outputs. So I am wondering how these input filters work.
I think these filters are quite important for genome-wide genotyping, because such long motifs and high coverage greatly increase the run time, and regions with much higher coverage than expected usually indicate mapping issues.
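In the meantime, one workaround for the motif-length part would be to pre-filter the BED file before passing it to --loci. Below is a minimal sketch, assuming the motif sits in the fourth whitespace-separated column as in my example line above. The function name filter_loci_by_motif_length and the file names in the usage note are hypothetical and not part of straglr, and this does nothing about coverage.

```python
def filter_loci_by_motif_length(lines, max_motif_len):
    """Return the BED lines whose motif (4th column) has at most max_motif_len bases.

    Assumption: columns are chrom, start, end, motif, separated by whitespace.
    Lines with fewer than four columns are skipped.
    """
    kept = []
    for line in lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # not a locus line in the assumed layout
        if len(fields[3]) <= max_motif_len:
            kept.append(line.rstrip("\n"))
    return kept


def filter_bed_file(in_path, out_path, max_motif_len):
    """Read a BED file, keep only short-motif loci, and write them to out_path."""
    with open(in_path) as handle:
        kept = filter_loci_by_motif_length(handle, max_motif_len)
    with open(out_path, "w") as handle:
        for line in kept:
            handle.write(line + "\n")
    return len(kept)
```

For example, filter_bed_file("simple_repeats.bed", "short_motifs.bed", 10) would write only loci with motifs of 10 bases or fewer, and the resulting file could then be given to --loci. This only works around the motif length; I would still like to understand what --max_str_len and --max_cov are supposed to do.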
I tested on both 1.5.3 and 1.5.0. I also tested some other examples and I found --min_support is working normally.
Thanks
Best,
xbwdk