Introduction

Restless legs syndrome (RLS) is a common sensorimotor disorder that is known to impact quality of life and health1,2. The prevalence ranges from 5 to 18.8% in European populations3,4,5 with approximately 2 to 3% of the general population thought to benefit from medical treatments that ameliorate symptoms5,6,7. RLS symptoms include uncomfortable sensations predominantly localized in the legs that are experienced as pain in at least one-third of subjects, which elicit a strong urge to move for symptomatic relief. The symptoms increase in the evening and at night. Consequently, the onset and maintenance of sleep are negatively impacted in most RLS patients, which in turn, is thought to impair daytime cognition and mental well-being8. The majority of RLS patients experience involuntary leg movements at transitions to sleep, and during sleep (periodic leg movements in sleep (PLMS)). Many also have social activities and work productivity interrupted by RLS symptoms2.

One of the underlying pathophysiological mechanisms of RLS involves impaired re-uptake of synaptic dopamine and reduced D2 receptor density, explaining why the disorder can sometimes be treated with dopamine-based therapies9. It is hypothesized that the re-uptake of synaptic dopamine is affected by brain iron level9. Supporting this, in RLS patients low brain iron has been found in the substantia nigra and the striatum, whose roles in regulating reward, motivation, and movement are well established10,11,12.

Moreover, a variety of modifiable health and lifestyle risk factors that accompany or aggravate RLS have been reported, including obesity, smoking, high alcohol intake, and sedentary lifestyle3,13. The prevalence is greater in individuals with reduced iron reserves14. Even though iron supplementation can be effective in relieving symptoms, especially in patients with iron deficiency, there are currently limited treatment options for RLS15,16, which also appears to be underdiagnosed17. Existing treatments address symptoms rather than the underlying cause of the disease. A fundamental reason for this is our relatively limited knowledge of the pathogenesis of the disorder. One way to increase our understanding of RLS is to expand knowledge of the genetic architecture of the disorder, which is complex and polygenic in nature6. Genome-wide association studies (GWAS) of European ancestry populations have yielded 20 single nucleotide polymorphisms (SNPs) in 19 loci that associate with RLS6,18,19,20,21,22,23,24.

The aim of the present study was to search for additional RLS-associated loci that might provide new insights into the disease pathophysiology and be useful in the discovery of new drugs or repurposing of existing drugs for RLS treatment. To this end, a meta-analysis of GWAS of RLS including 480,982 adults of European ancestry (recruited from Iceland, Denmark, the United Kingdom (UK), the Netherlands and the United States (US)) was conducted. Following this, novel findings were tested for replication in two additional case-control sets of European ancestry, the EU-RLS-GENE and RBC-Omics cohorts. Subsequently, all cohorts were meta-analyzed. Finally, to search for traits associated with RLS, we calculated polygenic risk scores for RLS (RLS-PRS) for the UK Biobank subjects and tested associations between RLS-PRS and 12,075 traits (binary and quantitative). The UK Biobank is one of the largest and most widely used resources for studying health and well-being. The biobank sample is population-based, and the 500,000 volunteer participants provide health information to approved researchers by allowing the UK Biobank to link to existing health records, such as those from general practice and hospitals25,26. This study confirms 19 of the 20 previously reported RLS sequence variants at 19 loci and identifies three novel RLS-associated variants. Cis-eQTL analysis indicates a potential causal impact on gene expression at four of the 22 RLS loci. Finally, by investigating traits associated with the polygenic risk score for RLS, this study confirms and adds to prior epidemiological findings by implicating, among other factors, obesity, smoking and high alcohol intake as lifestyle risk factors for RLS.

Results

Genome-wide association study: discovery and replication

The discovery meta-analysis confirmed 19 of the 20 previously reported RLS variants6 (Fig. 1 and Supplementary Tables 1–3). The remaining SNP, rs12962305-T, had an effect size that was significantly smaller than in previously reported meta-analyses (Table 1). The P-values of association for five sequence variants at loci not previously associated with RLS were below 5 × 10−8 in the discovery sample, and these variants were tested in a follow-up sample comprising the EU-RLS-GENE cohort (6228 cases and 10,992 controls) and the RBC-Omics cohort (423 cases and 7334 controls) (Supplementary Table 1; see Supplementary Figs. 1–5 for regional association plots). Three of the tested variants surpassed genome-wide significance in the meta-analysis of all samples27,28 (Table 1). The novel RLS-associated sequence variants are: rs10068599-T in an intron of RANBP17 on 5q35.1 (OR = 1.09, P = 6.9 × 10−10, 95% CI: 1.06–1.12), rs112716420-G in close proximity to MICALL2 on 7p22.3 (OR = 1.25, P = 1.5 × 10−18, 95% CI: 1.19–1.31) and rs10769894-A near LMO1 and STK33 on 11p15.4 (OR = 0.90, P = 9.4 × 10−14, 95% CI: 0.88–0.93) (Table 1).
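As a rough consistency check on the reported odds ratios, a z-score and two-sided P-value can be approximated from an OR and its 95% CI. The sketch below assumes a normal approximation on the log-odds scale; the function name is illustrative and this is not the authors' pipeline. Because the published ORs and CIs are rounded to two decimals, the recomputed P-values agree with the reported ones only approximately.

```python
import math

def z_and_p_from_or(odds_ratio, ci_low, ci_high, z_crit=1.959964):
    """Approximate log-OR, SE, z and two-sided P from an OR and its 95% CI."""
    beta = math.log(odds_ratio)
    # The 95% CI spans 2 * z_crit standard errors on the log scale.
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return beta, se, z, p

# OR and 95% CI for the three novel variants, as reported in the text.
variants = {
    "rs10068599-T": (1.09, 1.06, 1.12),
    "rs112716420-G": (1.25, 1.19, 1.31),
    "rs10769894-A": (0.90, 0.88, 0.93),
}

results = {name: z_and_p_from_or(*vals) for name, vals in variants.items()}
for name, (beta, se, z, p) in results.items():
    print(f"{name}: beta={beta:+.3f} se={se:.4f} z={z:+.2f} p={p:.1e}")
```

Under these assumptions, all three variants remain below the conventional 5 × 10−8 threshold.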

Fig. 1: Manhattan plot displaying results from the RLS discovery meta-analysis for N = 480,982 independent biological samples.

Variants labeled orange are previously reported variants. Variants labeled blue and green are novel variants (five) that were tested in a follow-up sample. Of the five novel variants, three were confirmed (green diamond shape) in the follow-up analysis and met the genome-wide significance threshold27,28, whereas two did not (Table 1). See Supplementary Table 1 for details and Supplementary Figs. 1–5 for regional Manhattan plots displaying the five novel RLS-associated variants.

Table 1 Sequence variants associated with RLS.

Cis-co-localization analysis of RLS variants using GTEx

To identify the RLS variants that act as cis-expression quantitative trait loci (cis-eQTL) and share a signal with the top eQTL of the respective gene and tissue, we performed stepwise pairwise co-localization analysis. We investigated cis-eQTL of RLS variants in 54 tissues reported in the GTEx database. Of the 23 tested RLS variants (20 previously reported and three novel), we found cis-eQTL data for 11, impacting 17 genes (Supplementary Tables 4 and 5). Of the 11 with data, 10 strongly associate with cis-gene expression (P < 3.3 × 10−06, Supplementary Table 6). Six of these 10 variants are in LD (r2 > 0.3) with the top eQTL for the respective gene (Supplementary Table 4). To ascertain that RLS variants and top eQTLs share the same signal, we further evaluated these six variants by two-way approximate conditional analysis, as implemented in COJO29. Conditional analysis using RLS effect sizes showed that four RLS variants and eQTLs share the same signal (Supplementary Table 5). Conditional analysis using GTEx effect sizes also confirmed these as the same associated signals (Supplementary Table 6). Hence, four RLS variants (rs10068599-T, rs10653756-CACAG, rs12450895-A, and rs3784709-T) co-localize with top eQTLs for five genes (RANBP17, CASC16, HOXB2, MAP2K5, and SKOR1) (Fig. 2) (for all RLS-associated variants see Supplementary Fig. 2).

Fig. 2: Cis-co-localization of RLS variants using 54 GTEx tissues. Displaying eQTL variants.

We found cis-eQTL data for 11 of the 23 RLS variants, impacting 17 genes. Figure 2 displays the four variants that are significantly associated with cis-gene expression in at least one tissue tested, are in linkage disequilibrium (LD) (r2 > 0.30) with the top eQTL variant of the respective gene, and share the same causal signal with it, as confirmed through approximate conditional analysis (results for the remaining variants are displayed in Supplementary Fig. 6). Normalized cis-eQTL effect estimates are provided; those that share the same causal signal with the eQTL (COJO conditional analysis; results are displayed in Supplementary Table 5) and are Bonferroni significant (P < 3.3 × 10−06) are labeled with an asterisk.

rs10068599-T is associated with lower expression of RANBP17 in brain subcortical regions, mainly in the basal ganglia, and in the liver, thyroid and left ventricle of the heart. rs3784709-T is associated with lower expression of SKOR1 in pituitary, pancreas, and mammary tissues, and is also associated with lower expression of MAP2K5 in the left ventricle of the heart. Moreover, rs10653756-CACAG appears to be associated with a specific effect on CASC16 expression in testes. rs12450895-A lowers the expression of HOXB2 in suprapubic skin, fibroblast cells, and the omentum (visceral adipose tissue) (Fig. 2).

Genetic risk and LD regression analysis

We used RLS-PRS to predict RLS clinical cases (N = 1916 with the ICD10:G25.8 diagnostic code) in UK Biobank data. The analysis showed that RLS-PRS explains 0.97% of the phenotypic variance (Supplementary Fig. 7). A one-SD increase in RLS-PRS increases the odds of RLS 1.40-fold over that in population controls (P = 4.4 × 10−46, OR = 1.40, 95% CI: 1.35–1.45). Receiver operating characteristic curve and area under the curve analyses show that the risk for RLS increases across ascending RLS-PRS quartiles (Supplementary Table 7 and Supplementary Fig. 8). RLS-PRS was then used to identify traits associated with the score in the UK Biobank. Our analysis showed that a higher RLS-PRS burden is negatively associated with educational attainment (P = 2.7 × 10−25, regression coefficient (β, continuous trait) = −0.02, standard error (SE): 0.002), cognitive performance (P = 4.4 × 10−07, β = −0.01, SE: 0.002) and age at first birth (P = 5.9 × 10−16, β = −0.02, SE: 0.003). The RLS-PRS furthermore associates positively with neuroticism (P = 8.0 × 10−23, β = 0.01, SE: 0.002), as well as with fat percentage in the legs (P = 1.4 × 10−10, β = 0.01, SE: 0.002) and in the whole body (P = 4.7 × 10−07, β = 0.008, SE: 0.002) (Supplementary Tables 8 and 9). Results from LD score regression30 and PRS-association analysis are consistent with each other (Supplementary Tables 10 and 11). The gene-set enrichment/pathway analysis using MAGMA31 on a molecular signature database32 resource did not reveal any significant associations after correction for multiple testing (Supplementary Table 12).

Discussion

Several sequence variants have been shown to associate with RLS, although causal variants at the associated loci and their functional relevance remain largely unknown. In a previous meta-analysis of RLS, 20 sequence variants at 19 loci were associated with RLS6. Here, we confirm associations with 19 of the 20 variants and report three novel associations, bringing the number of RLS-associated variants to 23 at 22 loci. The three novel variants are rs112716420-G, rs10068599-T, and rs10769894-A.

The known protein-coding genes closest to rs112716420-G on chromosome 7 are MICALL2 and UNCX. Variants in these genes are associated with red blood cell count and volume (i.e., hematocrit values), hemoglobin concentration and glomerular filtration rate33,34,35. rs112716420-G, however, does not associate significantly with these phenotypes in our samples. Hence, it does not appear that rs112716420-G impacts iron homeostasis, which is thought to be involved in the pathogenesis of RLS11. It is known that peripheral iron deficiency affects brain iron availability, although the specific mechanisms explaining how iron moves between the periphery and the nervous system remain unclear9. Moreover, the homeobox-containing transcription factor Uncx4.1 has been found to be expressed in glutamatergic, GABAergic and dopaminergic neurons in the mouse midbrain36.

rs10068599-T is in an intron of RANBP17 (Ran-binding protein 17) on chromosome 5, a protein-coding gene of the exportin family. The cis-gene expression analysis showed that rs10068599-T lowers the expression of RANBP17 mainly in the basal ganglia and in the cerebral cortex. Previous studies have found that variants in RANBP17 are associated with visceral fat37, body mass index (BMI)38, high-density lipoprotein (HDL) cholesterol39, smoking status40 and alcohol consumption41.

The closest protein-coding gene to rs10769894-A on chromosome 11 is LMO1. This gene encodes the protein rhombotin-1, which is normally expressed in neural lineage cells42,43. Variants in LMO1 have been associated with BMI44 and neuroblastoma and T-cell leukemia45,46, which is of interest since the strongest genetic predictor for RLS is a variant in MEIS1 that affects cancers such as leukemia and neuroblastoma47,48,49.

By integrating association statistics with gene expression data, we identified potential causal variants and genes affected at four of the 22 loci. As mentioned, the variant rs10068599-T lowers the expression of RANBP17 in brain subcortical regions. rs3784709-T lowers the expression of SKOR1 in pituitary, pancreas and mammary tissues; MEIS1 is considered an upstream activator of SKOR150. rs12450895-A lowers the expression of HOXB2 in adipose tissue and skin. Finally, we found that rs10653756-CACAG affects the expression of CASC16 in testis. Hence, these variants may exert their causal effects through their impact on gene expression.

Our analysis showed that RLS-PRS, the aggregated genetic predisposition for RLS, correlates negatively with years of education and performance on cognitive tests but positively with neuroticism score. The RLS-PRS also correlates negatively with age at first birth and positively with several anthropometric measures, including whole body fat, percentage fat in trunk, legs and arms and waist-to-hip ratio. These findings extend prior epidemiological studies3 and both confirm and extend those of Schormair et al.6 who searched for diseases and other traits associating with RLS-PRS. RLS has consistently been associated with modifiable lifestyles broadly considered to be unhealthy. In a prospective cohort study including 55,540 US adults, for example, RLS prevalence was lower among individuals who had a normal body weight, who were physically active, who were non-smokers, and who had an alcohol intake below the medium amount13.

RLS is a complex polygenic sensorimotor disorder strongly influenced by lifestyle. This study increases the number of known independent RLS-associated variants to 23 at 22 loci, and cis-eQTL analysis highlights genes at four of the loci, giving more insight into RLS etiology. Future studies investigating the effect on RLS of drugs targeting the implicated physiological pathways, and of behavioral lifestyle changes as a therapeutic regimen, may provide valuable knowledge on the pathophysiology and the most prudent treatment modalities for RLS.

Methods

RLS status in the discovery samples

The GWAS meta-analysis included 480,982 adults of European ancestry (10,257 cases and 470,725 controls). Mean ages in the included cohorts were: Iceland, 47.2 (SD, 14.06); Denmark, 41.1 (SD, 12.3); the UK (INTERVAL), 43.3 (SD, 14.1); the UK Biobank, 60.0 (SD, 8.70); the Netherlands, 45.0 (SD, 14.0); and the US, 56.5 (SD, 16.6). In total, the analysis comprised 14,084 subjects from deCODE Genetics (Iceland) (2636 cases and 11,448 screened controls)51, 26,565 subjects from The Danish Blood Donor Study (DBDS) (Denmark) (1379 cases)52,53, 27,988 subjects from the INTERVAL study (UK) (3065 cases)54, 408,565 subjects from the UK Biobank (UK) (1916 cases)55, 2363 subjects from the Donor InSight-III cohort (The Netherlands) (565 cases)56 and 1417 subjects from the Department of Neurology and Program in Sleep at Emory University (Emory cohort) (US) (696 cases) (Fig. 3).

Fig. 3: Overview of cohorts included in this study and the study scheme.

Displays the number of cases and controls of each cohort included in the present study—both in the Discovery meta-analysis (N = 480,982 independent biological samples), the follow-up analysis (N = 24,977 independent biological samples) and in the meta-analysis combining Discovery and Follow-up samples (N = 505,959 independent biological samples).
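The sample sizes quoted for the discovery, follow-up and combined analyses can be reproduced from the per-cohort counts given in the text. The short check below is only an arithmetic illustration; the dictionary layout and variable names are assumptions made for this example.

```python
# (total subjects, cases) per discovery cohort, as reported in the text.
discovery = {
    "deCODE (Iceland)": (14_084, 2_636),
    "DBDS (Denmark)": (26_565, 1_379),
    "INTERVAL (UK)": (27_988, 3_065),
    "UK Biobank (UK)": (408_565, 1_916),
    "Donor InSight-III (Netherlands)": (2_363, 565),
    "Emory (US)": (1_417, 696),
}

# (cases, controls) per follow-up cohort, as reported in the text.
follow_up = {
    "EU-RLS-GENE": (6_228, 10_992),
    "RBC-Omics": (423, 7_334),
}

n_discovery = sum(total for total, _ in discovery.values())
n_cases = sum(cases for _, cases in discovery.values())
n_controls = n_discovery - n_cases
n_follow_up = sum(cases + controls for cases, controls in follow_up.values())
n_combined = n_discovery + n_follow_up

print(n_discovery, n_cases, n_controls, n_follow_up, n_combined)
```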

We used clinical diagnosis or questionnaire data to assess RLS status in the participants, applying either questions based on the International RLS Study Group (IRLSSG) diagnostic criteria for RLS57,58 or the Cambridge-Hopkins RLS questionnaire (CH-RLSq), which is also based on these criteria. Definite and probable RLS cases were combined into one group59,60 (questionnaires are displayed in “Questionnaires used to assess RLS” on page 4 in Supplementary material). For subjects in the UK Biobank, the clinical diagnostic code ICD10:G25.8 was used to determine affection status, whereas for the Emory cohort, gold-standard diagnosis derived from face-to-face clinical evaluations by RLS specialists was used, and controls were defined as those lacking symptoms and signs associated with RLS.

Discovery meta-analysis

In total, we tested 15,838,848 sequence variants (1000 Genomes phase 3 panel markers) for association with RLS (for a more detailed description of the included cohorts, see section “Cohorts included in the discovery meta-analysis” on page 2 in Supplementary material; for a detailed description of the methods, see section “Genotyping, imputation, and association analysis of cohorts included in the discovery meta-analysis” on page 7). The GWAS results from the six cohorts (Iceland, Denmark, UK INTERVAL, UK Biobank, US Emory, and the Netherlands) were combined using a fixed-effect inverse-variance model61, i.e., based on the effect estimates and standard errors, which allows for different allele frequencies in each population. Moreover, to test for heterogeneity in the effects of the markers across populations, we used a likelihood ratio test (Cochran’s Q) and evaluated the resulting test statistics.
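To make the fixed-effect inverse-variance combination concrete, the following minimal sketch pools per-cohort effect estimates for a single variant and computes Cochran's Q. It is a textbook formulation rather than the software used in the study; the function name and the example inputs are hypothetical. Q is returned with its degrees of freedom and would be compared against a chi-square distribution with k − 1 degrees of freedom.

```python
import math

def fixed_effect_meta(betas, ses):
    """Inverse-variance fixed-effect pooling of per-cohort log-odds estimates.

    Returns (pooled beta, pooled SE, z, two-sided P, Cochran's Q, Q df).
    """
    if len(betas) != len(ses) or not betas:
        raise ValueError("betas and ses must be non-empty and equal in length")
    weights = [1.0 / se ** 2 for se in ses]  # w_i = 1 / SE_i^2
    w_sum = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / w_sum
    se = math.sqrt(1.0 / w_sum)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    # Cochran's Q: weighted squared deviations from the pooled estimate.
    q = sum(w * (b - beta) ** 2 for w, b in zip(weights, betas))
    return beta, se, z, p, q, len(betas) - 1

# Hypothetical per-cohort estimates for one variant (not taken from the study).
beta, se, z, p, q, df = fixed_effect_meta([0.10, 0.08, 0.12], [0.02, 0.03, 0.04])
print(f"beta={beta:.4f} se={se:.4f} z={z:.2f} p={p:.1e} Q={q:.3f} df={df}")
```

Cohorts with smaller standard errors receive larger weights, so the pooled estimate is dominated by the most precisely estimated effects.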

Before conducting the meta-analysis, variants in each dataset were mapped to NCBI Genome Reference Consortium Build 38 (GRCh38) positions and matched to the Icelandic variants based on position and alleles. We included variants that were properly imputed in all datasets and that had a minor allele frequency >0.1% in more than one cohort. For the suggestive associations, we used the conventional genome-wide P-value threshold of P < 5 × 10−08 to find lead associations and to test those for replication. To claim a novel genome-wide association, the sequence variants used in the meta-analysis (N = 15,838,848) were split into five classes based on their genome annotation, and the weighted significance threshold for each class was used28 (for QQ-plot see Supplementary Fig. 9, and for principal component analysis plots see Supplementary Figs. 10 and 11).

Replication of novel variants

Novel variants identified in the discovery phase of our study were tested for association in two replication datasets consisting of subjects of European ancestry, the EU-RLS-GENE consortium6 (6228 cases and 10,992 controls) and the RBC-Omics cohort (423 cases and 7334 controls)62. In both replication tests, analyses were adjusted for age, sex, and the first 10 principal components of ancestry in a logistic regression model (for a more detailed description of the included cohorts, see section “Cohorts used for follow-up/replication analysis” on page 6 in Supplementary material) (Fig. 3). For the suggestive associations, we used the conventional genome-wide threshold (P < 5 × 10−08) to find lead associations, which were tested for replication. To claim a novel genome-wide association, the sequence variants used in the meta-analysis (N = 15,838,848) were split into five classes based on their genome annotation, and the weighted significance threshold for each class was used28.

Gene expression

We assessed cis-eQTL effects of the variants associated with RLS. RNA sequencing data from 54 human tissues was obtained from the Genotype-Tissue Expression (GTEx) portal63. We tested all genes in a one Mb window centered on the 23 variants. In total 15,153 tests were performed, and Bonferroni threshold was applied to the P-value. Therefore, P < 0.05/15,153 = 3.3 × 10−06 was considered statistically significant.
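The Bonferroni threshold quoted above is simply the nominal significance level divided by the number of tests. A minimal sketch, with illustrative function names:

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under Bonferroni correction."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests

def is_significant(p_value, alpha, n_tests):
    """True if p_value passes the Bonferroni-corrected threshold."""
    return p_value < bonferroni_threshold(alpha, n_tests)

# 15,153 gene-tissue tests at a nominal alpha of 0.05, as stated in the text.
eqtl_threshold = bonferroni_threshold(0.05, 15_153)
print(f"{eqtl_threshold:.1e}")  # 3.3e-06
```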

Genetic risk

To assess the impact conferred by the confluence of common RLS variants, we calculated an RLS-PRS for each of the 500,000 UK Biobank subjects. The RLS-PRSs were calculated using summary statistics from a subset of the RLS-GWAS meta-analysis (UK participants from the INTERVAL study and the UK Biobank excluded). Briefly, to generate the RLS-PRS for the UK Biobank sample, we used 630,000 informative SNPs across the genome and constructed locus allele-specific weightings by applying LDpred to the summary data from the subset meta-analysis GWAS64. Using these weightings, we calculated an aggregated score of genetic susceptibility for RLS for all included individuals. Subsequently, we assessed the impact of RLS-PRS on 12,075 traits (binary and quantitative), resulting in a Bonferroni-corrected significance threshold of P < 0.05/12,075 = 4.14 × 10−06.
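Conceptually, a polygenic risk score of this kind is a weighted sum of effect-allele dosages, which is then typically standardized so that effects can be expressed per SD. The sketch below illustrates that idea only: the weights, dosages and identifiers are made-up toy values, and the LDpred step that derives the weights in the study is not reproduced here.

```python
import statistics

def polygenic_score(dosages, weights):
    """Weighted sum of effect-allele dosages (each 0 to 2) across SNPs."""
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(d * w for d, w in zip(dosages, weights))

def standardize(scores):
    """Convert raw scores to z-scores (mean 0, sample SD 1)."""
    mean = statistics.fmean(scores)
    sd = statistics.stdev(scores)
    return [(s - mean) / sd for s in scores]

# Hypothetical per-SNP weights and per-person dosages (toy data, three SNPs).
weights = [0.05, -0.02, 0.10]
cohort = {
    "person_a": [0, 1, 2],
    "person_b": [1, 1, 0],
    "person_c": [2, 0, 1],
}

raw = {pid: polygenic_score(d, weights) for pid, d in cohort.items()}
z_scores = dict(zip(raw, standardize(list(raw.values()))))

# Bonferroni threshold for the 12,075-trait scan described in the text.
prs_threshold = 0.05 / 12_075
print(raw, z_scores, f"{prs_threshold:.2e}")
```

Standardizing the score is what allows a statement such as "per one-SD increase in RLS-PRS" to be interpreted on a common scale.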

URLs

GTEx (The Genotype-Tissue Expression project), https://www.gtexportal.org/.

COJO, https://cnsgenomics.com/software/gcta/#Overview.

SHAPEIT, https://mathgen.stats.ox.ac.uk/genetics_software/shapeit/shapeit.html.

PLINK2, https://www.cog-genomics.org/plink/2.0/

IMPUTE 2, https://mathgen.stats.ox.ac.uk/impute/impute_v2.html#download

Ethics

All sample identifiers were encrypted in accordance with the regulations of the Icelandic Data Protection Authority and written informed consent was collected from all study participants. The deCODE dataset was approved by the National Bioethics Committee of Iceland. The DBDS dataset was approved by The Scientific Ethical Committee of Central Denmark (M-20090237) and by the Danish Data Protection Agency (30-0444). GWAS studies in DBDS were approved by the National Ethical Committee (NVK-1700407). The INTERVAL dataset was approved by the National Research Ethics Service Committee East of England - Cambridge East (Research Ethics Committee (REC): 11/EE/0538). The Emory dataset was approved by an institutional review board at Emory University, Atlanta, Georgia, US (HIC ID 133-98). The Donor InSight-III dataset was approved by the Medical Ethical Committee of the Academic Medical Center (AMC) in the Netherlands, and Sanquin’s Ethical Advisory Board approved DIS-III; all participants gave their written informed consent. UK Biobank is approved by the North West Multi-center Research Ethics Committee, the Patient Information Advisory Group, the National Information Governance Board for Health and Social Care, and the Community Health Index Advisory Group. UK Biobank also holds a Human Tissue Authority license65.

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.