Analysis of Short Tandem Repeat Expansions in a Cohort of 12,496 Exomes from Patients with Neurological Diseases Reveals Variable Genotyping Rate Dependent on Exome Capture Kits
Figure 1. Schematic overview of the study workflow.
Figure 2. Cohort overview and study design. The map illustrates the global distribution of 12,496 cases included in the cohort, with participant numbers represented by coloured circles: Europe (N = 8649, blue), East Asia (N = 1602, yellow), Africa (N = 404, red), America (N = 334, dark red), and South Asia (N = 68, green). The right panel provides the demographic information and diagnostic categories included in the analysis. The study design is summarised in the blue boxes at the bottom.
Figure 3. Total number of repeat expansions identified by EH, visual inspection and PCR validation. (A) 365 repeat expansions identified by EH with the visual inspection outcome. Loci are divided into three groups: coding, intron and UTR. Green bars represent calls that passed visual inspection, yellow bars are for calls that were categorised in the “borderline” group and red bars indicate samples that failed visual inspection. Loci that do not have a bar next to them did not have any expanded calls predicted by EH. (B) The outcome of PCR-tested samples. The light blue bars indicate samples that tested positive for PCR, while the pink bars represent samples that tested negative. Stripes indicate cases that were in the visual inspection “Pass” category, whereas dots represent cases that were “borderline” after visual inspection.
Figure 4. Pedigree of SCA3 family and MRI scan of proband. The red arrow shows the proband. (A) Square = male; circle = female; black filled symbol = affected individual; white symbols = unaffected individuals; diagonal line = deceased individual. Double lines indicate consanguinity. (B) MRI scan of patient IV.8. The red arrow indicates cerebellar atrophy.
Figure 5. Targeted loci and coverage according to the four most used exome sequencing kits in this cohort. (A) The RED loci are categorised based on their genomic location: coding, intron and UTR. Target (purple): the specific region of the gene is targeted by the exome kit. Not target (yellow): the region of interest is not covered by the exome kit. The percentage indicates how much of the region is not covered. For example, in ATN1, 60% of the region of interest is not covered by the SureSelect V4 kit. When not specified, the percentage of target or not target is 0%. The exome sequencing kits are represented by different bars: SureSelect V6, SureSelect V4, Nextera and TruSeq. The dashed lines under each group indicate the total number of RED loci analysed in each category: 12 coding, 7 intronic and 8 UTRs. (B) Heatmap showing the coverage of the analysed RED loci across different genomic regions. Coverage is represented by the number of sequencing reads mapping to each locus, as indicated by the colour scale. (C) 3D plots of the genotyping rate for EH-generated calls by read length and sequencing kit. The three plots show EH calls in coding, intron and UTR loci. In each plot, calls are divided by locus and read length. The four different colours represent the different exome capture kits used.
Abstract
1. Introduction
2. Methods
2.1. Cohort
2.2. Repeat Genotyping
2.3. Visual Inspection
2.4. PCR Validation
3. Results
Clinical Description of the SCA3 Case
4. Discussion
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
Gene | Disease | Repeat Motif | Cutoff (Repeat Units) | Cutoff in bp | Genomic Location |
---|---|---|---|---|---|
AR | Spinal and bulbar muscular atrophy | CAG | 36 | 108 | Coding |
ATN1 | Dentatorubral–pallidoluysian atrophy | CAG | 35 | 105 | Coding |
ATXN10 | Spinocerebellar ataxia 10 | ATTCT | 33 | 165 | Intron |
ATXN1 | Spinocerebellar ataxia 1 | CAG | 39 | 117 | Coding |
ATXN2 | Spinocerebellar ataxia 2 | CAG | 32 | 96 | Coding |
ATXN3 | Spinocerebellar ataxia 3 | CAG | 45 | 135 | Coding |
PHOX2B | Congenital central hypoventilation syndrome | GCN | NA | NA | Coding |
ATXN7 | Spinocerebellar ataxia 7 | CAG | 33 | 99 | Coding |
ATXN8OS | Spinocerebellar ataxia 8 | CAG | 40 | 120 | 3′UTR |
C9orf72 | Frontotemporal dementia and/or amyotrophic lateral sclerosis | GGGGCC | 30 | 180 | Intron |
CACNA1A | Spinocerebellar ataxia 6 | CAG | 19 | 57 | Coding |
CNBP | Myotonic dystrophy 2 | CCTG | 27 | 108 | Intron |
DMPK | Myotonic dystrophy 1 | CTG | 36 | 108 | 3′UTR |
FMR1 | FMR1-related disorders | CGG | 55 | 165 | 5′UTR |
FXN | Friedreich ataxia | GAA | 34 | 102 | Intron |
HTT | Huntington disease | CAG | 35 | 105 | Coding |
JPH3 | Huntington disease-like 2 | CTG | 49 | 147 | Exon |
NOP56 | Spinocerebellar ataxia 36 | GGCCTG | 15 | 90 | Intron |
PPP2R2B | Spinocerebellar ataxia 12 | CAG | 33 | 99 | 5′UTR |
TBP | Spinocerebellar ataxia 17 | CAG | 43 | 129 | Coding |
NIPA1 | Hereditary spastic paraplegia type 6 | GCG | NA | NA | 5′UTR |
NOTCH2NL | Neuronal intranuclear inclusion disease | GGC | 55 | 165 | 5′UTR |
RFC1 | Cerebellar ataxia, neuropathy, and vestibular areflexia syndrome | AAGGG | 0 | 5 | Intron |
PABPN1 | Oculopharyngeal muscular dystrophy | GCN | NA | NA | Coding |
CSTB | Progressive myoclonic epilepsy 1A | CCCCGCCCCGCG | 4 | 48 | Intron |
GLS | Global developmental delay, progressive ataxia, and elevated glutamine | GCA | 30 | 90 | 5′UTR |
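For rows with a numeric cutoff, the "Cutoff in bp" column appears to be the cutoff in repeat units multiplied by the motif length. The following is a minimal sketch of that arithmetic for a few rows of the table; the function name `cutoff_in_bp` and the `LOCI` dictionary are hypothetical and are not part of the study's pipeline.

```python
# Illustrative only: reproduces the "Cutoff in bp" column from the
# motif length and the cutoff in repeat units for selected rows.

def cutoff_in_bp(motif: str, cutoff_units: int) -> int:
    """Return the cutoff length in base pairs (motif length x repeat units)."""
    return len(motif) * cutoff_units


# (motif, cutoff in repeat units) copied from the table above.
LOCI = {
    "HTT": ("CAG", 35),
    "ATXN10": ("ATTCT", 33),
    "C9orf72": ("GGGGCC", 30),
    "CSTB": ("CCCCGCCCCGCG", 4),
}

for gene, (motif, units) in LOCI.items():
    print(f"{gene}: {units} x {len(motif)} bp = {cutoff_in_bp(motif, units)} bp")
```

The same check can be run over every numeric row as a quick consistency test on a locus catalogue before genotyping.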
Gene | Repeat Genotype | Sex | Ethnicity | Clinical Details |
---|---|---|---|---|
ATN1 | 16/>80 | Female | South Asian | Epilepsy with developmental delays. |
ATXN2 | 23/38 | Male | European | Slowly progressive cerebellar ataxia syndrome with evidence of weakness in the lower limbs and mildly increased spastic tone. No extrapyramidal signs. |
ATXN3 | 23/>81 | Female | South Asian | Speech regression, wasting of muscles, motor axonal polyneuropathy and seizures. |
ATXN3 | 27/60 | Male | American | Ataxia. |
DMPK | 8/60 | Female | South Asian | Myotonic dystrophy as well as motor and sensory neuropathy. |
DMPK | 12/>150 | Male | South Asian | Hereditary peripheral neuropathy and myotonic features; 46 y at examination. |
DMPK | 9/>150 | Male | Unknown | Affected brother. Myotonia atrophica and hereditary motor and sensory neuropathy; 40 y at examination. |
DMPK | 12/60 | Male | Unknown | Hereditary peripheral neuropathy; 42 y at examination. |
DMPK | 9/>150 | Male | European | Myotonia atrophica and hereditary motor and sensory neuropathy; 50 y at examination. |
DMPK | 8/>150 | Male | European | Myotonia atrophica and hereditary motor and sensory neuropathy. Affected nephew; 35 y at examination. |
DMPK | 9/>150 | Female | European | Myotonia atrophica and hereditary motor and sensory neuropathy. Affected niece; 29 y at examination. |
HTT | 29/53 | Female | European | Ataxia, hyperreflexia, chorea, neurodevelopmental delays, no extra-ocular or sphincter involvement, cerebellar and brain stem atrophy on MRI. |
TBP | 38/57 | Male | Unknown | Familial SCA, early death. |
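Genotypes in this table are written as two allele sizes in repeat units, with a `>` prefix where only a lower bound is reported (for example `16/>80`). The sketch below shows one way such strings could be parsed and compared with the per-locus cutoffs from the loci table. All names here (`Allele`, `parse_genotype`, `exceeds_cutoff`) are hypothetical, and treating an allele as expanded only when it is strictly larger than the cutoff is an assumption, not a rule stated in the source.

```python
from typing import NamedTuple


class Allele(NamedTuple):
    repeats: int
    is_lower_bound: bool  # True for values written like ">80"


def parse_genotype(genotype: str) -> list[Allele]:
    """Split a genotype such as '16/>80' into its alleles."""
    alleles = []
    for part in genotype.split("/"):
        part = part.strip()
        lower_bound = part.startswith(">")
        alleles.append(Allele(int(part.lstrip(">")), lower_bound))
    return alleles


def exceeds_cutoff(allele: Allele, cutoff: int) -> bool:
    """Assumption: expanded means strictly more repeat units than the cutoff."""
    if allele.is_lower_bound:
        # The true size is above the reported value, so reaching the
        # cutoff is enough to be strictly above it.
        return allele.repeats >= cutoff
    return allele.repeats > cutoff


# Cutoffs in repeat units, copied from the loci table.
CUTOFFS = {"ATN1": 35, "ATXN2": 32, "HTT": 35, "TBP": 43}

# A few (gene, genotype) pairs copied from the table above.
ROWS = [("ATN1", "16/>80"), ("ATXN2", "23/38"), ("HTT", "29/53"), ("TBP", "38/57")]

for gene, genotype in ROWS:
    flags = [exceeds_cutoff(a, CUTOFFS[gene]) for a in parse_genotype(genotype)]
    print(gene, genotype, flags)
```

Under this assumption each of the four rows has one allele below and one above its cutoff, which matches the heterozygous pattern shown in the table.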
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Rocca, C.; Murphy, D.; Clarkson, C.; Zanovello, M.; Gagliardi, D.; Genomics, Q.S.; Kaiyrzhanov, R.; Alvi, J.; Maroofian, R.; Efthymiou, S.; et al. Analysis of Short Tandem Repeat Expansions in a Cohort of 12,496 Exomes from Patients with Neurological Diseases Reveals Variable Genotyping Rate Dependent on Exome Capture Kits. Genes 2025, 16, 169. https://doi.org/10.3390/genes16020169
APA Style: Rocca, C., Murphy, D., Clarkson, C., Zanovello, M., Gagliardi, D., Genomics, Q. S., Kaiyrzhanov, R., Alvi, J., Maroofian, R., Efthymiou, S., Sultan, T., Vandrovcova, J., Polke, J., Labrum, R., Houlden, H., & Tucci, A. (2025). Analysis of Short Tandem Repeat Expansions in a Cohort of 12,496 Exomes from Patients with Neurological Diseases Reveals Variable Genotyping Rate Dependent on Exome Capture Kits. Genes, 16(2), 169. https://doi.org/10.3390/genes16020169