Expanded allele not detected · Issue #56 · bcgsc/straglr · GitHub

Expanded allele not detected #56

Open
stfacc opened this issue Nov 25, 2024 · 4 comments

Comments

@stfacc
stfacc commented Nov 25, 2024

Is there a reason why the longest allele is not detected for this locus:

reads.txt

Running with these options:

straglr.py \
    $bam \
    $reference \
    $prefix \
    --min_ins_size 20 --min_str_len 3 --max_str_len 6 \
    --nprocs 80 \
    --trf_args 2 5 5 80 10 10 6 \
    --debug

Version: 1.5.2

@readmanchiu
Collaborator

The clustering step cannot group the 3 kb size with the other sizes.
This seems to be a rather complex locus, with both CAA and CGG repeats detected, each with some big and some small alleles. Have you confirmed this is correct?
Also, the coordinates suggest this is not a human sample?
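
To illustrate the clustering remark above, here is a minimal sketch of how a lone, very large read can fail to form an allele. It is not Straglr's actual clustering code: the gap-based grouping, the function name `cluster_sizes`, the `max_gap` and `min_cluster_size` parameters, and the read sizes are all assumptions made up for this example.

```python
from typing import List


def cluster_sizes(sizes: List[int], max_gap: int, min_cluster_size: int) -> List[List[int]]:
    """Group repeat sizes into clusters, then drop clusters with too few reads.

    Hypothetical illustration only: sizes are sorted and a new cluster is
    started whenever the gap to the previous size exceeds max_gap.
    """
    clusters: List[List[int]] = []
    for size in sorted(sizes):
        if clusters and size - clusters[-1][-1] <= max_gap:
            clusters[-1].append(size)
        else:
            clusters.append([size])
    # A cluster supported by fewer than min_cluster_size reads is discarded.
    return [c for c in clusters if len(c) >= min_cluster_size]


# Made-up per-read repeat sizes in bp (not taken from reads.txt).
sizes = [60, 63, 66, 120, 126, 129, 3000]
alleles = cluster_sizes(sizes, max_gap=30, min_cluster_size=2)
print(alleles)
```

With these made-up values the single 3000 bp read sits far from every other size, so it ends up alone and is dropped once a minimum read support per cluster is required. Lowering that requirement to 1 would keep it, at the cost of reporting every outlier read as an allele.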

@stfacc
Author
stfacc commented Nov 27, 2024

Thanks for the quick answer.

This is a human sample; it's an expansion in NUTM2B-AS1.

A similar targeted analysis with TRGT shows the following structure (CAA probably a sequencing artifact):

S1 NUTM2B-AS1 wf

I appreciate that it's not easy to cluster; my only concern is that this would be completely missed in a genome-wide scan to detect novel expansions.

For a similar sample (with better depth), the same expansion is identified:

reads_s2.txt

S2 NUTM2B-AS1 wf

@readmanchiu
Collaborator

So TRGT seems to support Straglr's TSV report. The CAA repeat cannot be regarded as a sequencing artifact.
From the pictures it's clear there is an expansion going on at this locus, yet the expansion seems mosaic and the genotype cannot be summarized concisely. I also need to think about how to represent it in VCF.
Thanks very much for providing this example. Some overhauls to the code will be needed to make this change, which will hopefully be incorporated into the next release.
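
As a rough illustration of the point above about mosaic loci lacking a concise genotype, the sketch below summarizes read-level sizes directly instead of relying on allele clusters. It is not the change planned for Straglr: the function name `summarize_sizes`, the `fold` threshold, and the read sizes are hypothetical and chosen only for this example.

```python
from statistics import median
from typing import Dict, List


def summarize_sizes(sizes: List[int], fold: float = 3.0) -> Dict[str, float]:
    """Summarize per-read repeat sizes at one locus without clustering.

    Hypothetical illustration only: a read counts as expanded when its
    size exceeds fold times the median size of all reads at the locus.
    """
    med = median(sizes)
    expanded = [s for s in sizes if s > fold * med]
    return {
        "n_reads": len(sizes),
        "median": med,
        "max": max(sizes),
        "n_expanded": len(expanded),
        "frac_expanded": len(expanded) / len(sizes),
    }


# Made-up per-read repeat sizes in bp for a mosaic-looking locus.
sizes = [60, 63, 66, 120, 126, 129, 900, 1500, 3000]
summary = summarize_sizes(sizes)
print(summary)
```

A summary like this keeps a broad spread of large reads visible in a genome-wide scan even when they do not form a tight cluster, although the threshold would need tuning per dataset.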

@HLHsieh
HLHsieh commented Apr 11, 2025

Hi,

I would like to follow up on this issue and add a few questions. I am currently analyzing a sample with mosaic repeat expansion.

I wonder if Straglr is capable of handling mosaic repeat loci. Specifically:
1. Would adjusting the parameter --max_num_clusters help in resolving mosaic alleles?
2. I noticed that Straglr includes some assessment of mosaicism — could you please clarify how this is done?
3. Do you have any recommendations or best practices for analyzing mosaic repeat expansions using Straglr?

Any guidance or insights would be greatly appreciated. Thank you!

Best regards,
Hsin
