Abstract
Spinal muscular atrophy (SMA) is a genetic disorder that causes progressive degeneration of lower motor neurons and the subsequent loss of muscle function throughout the body. It is the second most common recessive disorder in individuals of European descent and is present in all populations. Accurate tools exist for diagnosing SMA from short-read and long-read genome sequencing data. However, there are no publicly available tools for GRCh38-aligned data from panel or exome sequencing assays, which continue to be used as first-line tests for neuromuscular disorders. We therefore developed and extensively validated a new tool, SMA Finder, that can diagnose SMA not only in genome samples but also in exome and targeted sequencing samples aligned to GRCh37, GRCh38, or T2T-CHM13. It evaluates aligned reads that overlap the c.840 position of SMN1 and SMN2 to detect the most common molecular causes of SMA. We applied SMA Finder to 16,626 exomes and 3,911 genomes from heterogeneous rare disease cohorts sequenced at the Broad Institute Center for Mendelian Genomics (CMG), as well as to 1,157 exomes and 8,762 targeted sequencing samples from Tartu University Hospital. SMA Finder correctly identified all 16 known SMA cases and reported nine novel diagnoses that have since been confirmed by clinical testing, with another four novel diagnoses undergoing validation. Notably, of the 29 total SMA-positive cases, 21 had an initial clinical diagnosis of muscular dystrophy, congenital myasthenic syndrome, or congenital myopathy. This underscores how frequently SMA can be misdiagnosed as other neuromuscular disorders and demonstrates the utility of using SMA Finder to reanalyze phenotypically diverse neuromuscular disease cohorts.
Finally, we evaluated SMA Finder on 198,868 individuals who had both exome and genome sequencing data in the UK Biobank (UKBB) and found that SMA Finder's overall false positive rate was less than 1 in 200,000 exome samples and its positive predictive value (PPV) was 96%. We also observed 100% concordance between UKBB exome and genome calls. This analysis showed that, even though it is located within a segmental duplication, the most common causal variant for SMA can be detected with accuracy comparable to that of monogenic disease variants in non-repetitive regions. Together, the high PPV demonstrated by SMA Finder, the existence of SMA treatments for which early diagnosis is imperative for therapeutic benefit, and the widespread availability of clinical confirmatory testing for SMA may warrant adding SMN1 to the ACMG list of genes with reportable secondary findings from genome and exome sequencing.
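To make the c.840 idea concrete, the sketch below shows one simple way such a check could work. SMN1 carries a C at c.840 and SMN2 carries a T, so a sample with essentially no C-supporting reads is consistent with having no functional SMN1 copy (for example, homozygous SMN1 deletion or SMN1-to-SMN2 gene conversion). This is a toy illustration under stated assumptions, not SMA Finder's implementation. The names `C840Counts` and `classify_c840` and the thresholds `min_depth` and `max_c_fraction` are hypothetical. Reads are pooled across the SMN1 and SMN2 c.840 positions on the assumption that reads from these near-identical paralogs may align to either locus. The tool's actual model, constants, and formulas are described in the supplementary section "Detailed description of the SMA Finder algorithm" and may differ.

```python
from dataclasses import dataclass


@dataclass
class C840Counts:
    """Hypothetical read counts at c.840, pooled across the SMN1 and SMN2 loci.

    SMN1 carries a C at c.840 and SMN2 carries a T, so C reads are treated
    as SMN1-derived and T reads as SMN2-derived.
    """
    c_reads: int  # reads supporting C (SMN1-like)
    t_reads: int  # reads supporting T (SMN2-like)


def classify_c840(counts: C840Counts,
                  min_depth: int = 14,
                  max_c_fraction: float = 0.05) -> str:
    """Toy classifier. min_depth and max_c_fraction are illustrative
    assumptions, not SMA Finder's actual constants."""
    total = counts.c_reads + counts.t_reads
    if total < min_depth:
        # Too few reads to distinguish "no SMN1 copies" from sampling noise.
        return "insufficient coverage"
    c_fraction = counts.c_reads / total
    if c_fraction <= max_c_fraction:
        # Essentially no SMN1-specific C reads (a small allowance absorbs
        # sequencing errors): consistent with SMN1 loss.
        return "possible SMA"
    return "not SMA"


if __name__ == "__main__":
    print(classify_c840(C840Counts(c_reads=0, t_reads=42)))   # possible SMA
    print(classify_c840(C840Counts(c_reads=18, t_reads=20)))  # not SMA
    print(classify_c840(C840Counts(c_reads=0, t_reads=6)))    # insufficient coverage
```

A fixed fraction threshold is the simplest possible rule. A probabilistic model that accounts for total depth and expected SMN1/SMN2 copy numbers would be more robust at low coverage, which matters for exome and targeted panels.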
Competing Interest Statement
HLR receives research funding from Microsoft and previously received funding from Illumina to support rare disease gene discovery and diagnosis. AODL has consulted for Tome Biosciences and Ono Pharma USA Inc, and is a member of the scientific advisory boards for Congenica Inc and the Simons Foundation SPARK for Autism study. PBK has received research support from ML Bio and Sarepta Therapeutics, and has consulted for Lupin, Neurogene, NS Pharma, and Teneofour.
Funding Statement
The Broad CMG sequencing and analysis were funded by the National Human Genome Research Institute (NHGRI), the National Eye Institute, and the National Heart, Lung and Blood Institute grant UM1HG008900, and by NHGRI grants U01HG011755 and R01HG009141. GH is supported by the GREGoR Consortium, and research in this publication was supported by the National Human Genome Research Institute of the National Institutes of Health under Award Number U24HG011746. This publication has also been made possible in part by CZI grants 2019-19927, 2020-224274, and 2022-309464 (https://doi.org/10.37921/236582yuakxy) from the Chan Zuckerberg Initiative DAF, an advised fund of Silicon Valley Community Foundation (funder DOI 10.13039/100014989). The Tartu University Hospital cohort analysis was funded by the Estonian Research Council grants PSG774 and PRG471. VSG was supported by NIH NHGRI grant T32HG010464. Work in CGB's laboratory is supported by intramural funds of NINDS/NIH.
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
IRB of Mass General Brigham gave ethical approval for this work
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.
Yes
Footnotes
Figure 3 panels C and D were added to visualize the Tartu University Hospital samples in the same way as the CMG (Figure 3A,B) and UKBB (Figure 3E,F) samples. The supplementary section "Detailed description of the SMA Finder algorithm" was updated to clarify key assumptions, constants, and formulas.
Data Availability
Genomic and phenotypic data for the GREGoR and CMG cohorts are available via dbGaP accession numbers phs003047 and phs001272. Access is managed by a data access committee designated by dbGaP and is based on the intended use of the requester and the allowed use of the data submitter, as defined by consent codes. The Tartu University Hospital cohort data are not publicly available. UKBB data are accessible to researchers who obtain Tier 3 access to the UK Biobank.