Main

Clinical laboratory measurements (e.g., blood test results) are powerful intermediate phenotypes that can be used to diagnose and monitor human diseases. Elucidating their underlying genetics, and inferring their genetic relationships to diseases and implicated cell types, can provide clues about disease biology. To this end, GWASs have been conducted to investigate various quantitative traits, including anthropometric1,2,3, metabolic4,5, kidney-related6,7, hematological8,9, and blood pressure traits10,11,12. The interplay between the genetics of quantitative traits and diseases has been assessed by several approaches, such as pleiotropy13,14, genetic correlation15,16, and Mendelian randomization17. For example, recent large-scale studies of body mass index (BMI), a key measure for assessing obesity, revealed shared genetic effects on metabolic traits and the involvement of the central nervous system2 and immune cells3 in obesity susceptibility. However, previous studies primarily examined subjects of European ancestry, and each study focused on only a few quantitative traits. Creating a comprehensive landscape will require additional studies of non-European populations that simultaneously investigate a wide range of clinical measurements and extensively interrogate their relevance to complex diseases.

Here we report a GWAS of 58 quantitative traits in 162,255 Japanese individuals from the BioBank Japan Project (BBJ)18,19, one of the largest non-European single-descent biobanks with detailed phenotypes, to broaden the current knowledge and understanding of the genetics and biology of these traits. Moreover, we incorporated additional GWASs of complex diseases and traits in Japanese subjects, and evaluated pleiotropy, genetic correlation, and cell-type specificity with respect to the quantitative traits. Our study provides many insights into the genetic basis of various quantitative traits and illuminates the complex genetic links among clinical measurements, complex diseases, and relevant cell types.

Results

Genome-wide association analysis of 58 quantitative traits

We tested 5,961,600 autosomal variants and 147,353 X-chromosome variants (imputed with the 1000 Genomes Project Phase 1 reference panel (ref. 20); Methods) for association with 58 quantitative traits in 162,255 Japanese individuals. The studied traits covered a wide range of clinical measurements, grouped into nine distinct categories (Table 1): metabolic (n = 6), serum protein (n = 4), kidney-related (n = 4), electrolyte (n = 5), liver-related (n = 6), other biochemical (n = 6), hematological (n = 13), blood pressure (n = 4), and echocardiographic (n = 9). The study design is illustrated in Supplementary Fig. 1, and the detailed characteristics of the subjects, phenotype source, and exclusion criteria are described in Supplementary Tables 1 and 2.

Table 1 Overview of the studied quantitative traits

Overall, we identified 1,407 trait-associated loci for 53 quantitative traits that satisfied a genome-wide significance threshold of P = 5.0 × 10⁻⁸ (Methods). Of these, 679 loci were novel (Table 1 and Supplementary Table 3). When we additionally corrected for the number of studied traits, 943 trait-associated loci for 51 traits remained significant (P < 5.0 × 10⁻⁸/58 = 8.6 × 10⁻¹⁰), of which 372 loci were novel. Stepwise conditional analysis of each trait-associated locus identified 267 additional independent signals at 158 trait-associated loci for 39 traits (Supplementary Table 4). Multiple additional independent signals were observed at 49 trait-associated loci, with a maximum of 11 independent signals at 11q13.1 for uric acid (top associated signal: rs57633992 at NRXN2; P = 7.30 × 10⁻⁸⁴⁵) (Supplementary Fig. 2). Although the genomic inflation factors (λGC) of several traits showed considerable inflation (mean λGC = 1.11), linkage disequilibrium (LD) score regression21 analysis indicated no substantial confounding bias for any trait (mean intercept: 1.04), as shown in Supplementary Table 5. Given the large sample sizes of our GWASs, these statistics suggest that most of the inflation was due to polygenic effects and that population stratification and other potential biases were well controlled3,21. Manhattan, quantile–quantile, and LD score plots are provided in Supplementary Dataset 1, and detailed regional plots for each locus are provided in Supplementary Dataset 2.
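The contrast above between λGC and the LD score regression intercept depends on how λGC is defined. As a minimal sketch (in Python, not the R code used in the study), λGC is the median observed 1-df chi-square statistic divided by its expected null median (about 0.455), and the study-wide threshold is the genome-wide threshold divided by the 58 traits. The function names are illustrative; the LD score regression intercept itself requires LD scores and is not reproduced here.

```python
from statistics import NormalDist, median

# Median of the 1-df chi-square distribution: (z at 0.75)^2, about 0.4549
CHI2_1DF_MEDIAN = NormalDist().inv_cdf(0.75) ** 2


def genomic_inflation(pvalues):
    """Genomic inflation factor (lambda_GC) from two-sided P values.

    Each P value is mapped back to its 1-df chi-square statistic (the squared
    z score), and the observed median is divided by the null median.
    """
    std_normal = NormalDist()
    # inv_cdf(p / 2) is the negative z score; squaring removes the sign and
    # avoids evaluating inv_cdf(1.0) when p is extremely small
    chi2 = [std_normal.inv_cdf(p / 2) ** 2 for p in pvalues]
    return median(chi2) / CHI2_1DF_MEDIAN


def study_wide_threshold(genome_wide=5.0e-8, n_traits=58):
    """Bonferroni-style study-wide threshold: genome-wide threshold / number of traits."""
    return genome_wide / n_traits
```

Under the null, uniformly distributed P values give λGC close to 1; values above 1 can arise from polygenicity as well as from confounding, which is why the LD score regression intercept is used to separate the two.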

Trans-ethnic comparison of the allele frequencies of the identified loci between East Asians and Europeans showed an overall shared allelic spectrum across populations (r = 0.687; Supplementary Fig. 3). The novel loci tended to have higher allele frequencies in East Asians than in Europeans: 60 novel loci (8.8%) were common (minor allele frequency ≥ 5%) in East Asians but rare (≤1%) in Europeans. Notably, the associated single-nucleotide polymorphisms (SNPs) at 15 unique loci (for example, ALDH2, EGF, and SUFU) were monomorphic in Europeans but had frequencies of ≥10% in East Asians. These observations suggest a contribution of population-specific factors, such as evolutionary selection pressure, to the identified loci. On average, the significant loci explained 2.84% of the variance of each trait (Supplementary Table 6): loci previously known in Europeans explained 1.92%, all known loci explained 2.03%, and newly identified loci explained 0.84%. The SNP-based heritability captured by genome-wide common SNPs was 8.60% on average, which is comparable to previous findings in Europeans (Supplementary Table 5).
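The frequency categories and the variance explained by the significant loci reduce to simple per-variant arithmetic. The Python sketch below is illustrative only: it applies the frequency thresholds quoted above and the formula 2f(1 − f)β² given in the Methods, assuming that β is the additive effect on the normalized (unit-variance) trait so that the sum is a fraction of the phenotypic variance. Function names and category labels are hypothetical.

```python
def frequency_class(maf_eas, maf_eur):
    """Classify a lead variant by its minor allele frequency (MAF) in
    East Asians (EAS) and Europeans (EUR), using the thresholds in the text."""
    if maf_eur == 0.0 and maf_eas >= 0.10:
        return "monomorphic in EUR, >=10% in EAS"
    if maf_eas >= 0.05 and maf_eur <= 0.01:
        return "common in EAS, rare in EUR"
    return "other"


def variance_explained(lead_variants):
    """Total variance explained by independent lead variants.

    Each entry is (f, beta): allele frequency and additive effect on the
    normalized trait. Each variant contributes 2 * f * (1 - f) * beta**2.
    """
    return sum(2 * f * (1 - f) * beta ** 2 for f, beta in lead_variants)
```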

Pleiotropy of top associated quantitative trait loci

Pleiotropy, defined here as the sharing of risk alleles across multiple traits, is a key concept in investigating cross-phenotype relationships and can help to decipher a shared genetic etiology underlying complex genetic architectures13,14. To identify major pleiotropic loci, we assessed pleiotropy at the single-locus level across 763 unique loci (derived from the 1,407 trait-associated loci for 53 quantitative traits described above; Methods). We identified 313 pleiotropic loci among the quantitative traits, approximately 41% of the unique loci (Fig. 1 and Supplementary Table 7). Of these, 88 loci showed pleiotropy across traits in multiple trait categories (intercategorical pleiotropy), whereas the other 225 loci showed pleiotropy across traits within a single category (intracategorical pleiotropy).
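Once trait-associated loci have been merged into unique loci, the distinction between intercategorical and intracategorical pleiotropy depends only on how many trait categories are represented at each locus. A minimal Python sketch, assuming each unique locus is given as the set of traits mapped to it and that a trait-to-category mapping (as in Table 1) is available; the function names are hypothetical.

```python
from collections import Counter


def pleiotropy_type(traits, trait_category):
    """Classify one unique locus by the traits mapped to it.

    traits: set of trait names whose trait-associated loci map to the locus.
    trait_category: dict from trait name to its trait category.
    """
    if len(traits) < 2:
        return "non-pleiotropic"
    categories = {trait_category[t] for t in traits}
    return "intercategorical" if len(categories) > 1 else "intracategorical"


def summarize(unique_loci, trait_category):
    """Count unique loci by pleiotropy type."""
    return Counter(pleiotropy_type(traits, trait_category) for traits in unique_loci)
```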

Fig. 1: Overview of the identified loci and their pleiotropy.

a, The number of identified trait-associated loci for each trait, grouped by type of pleiotropy. b, Identified loci and their pleiotropy. Each dot corresponds to a trait-associated locus; larger dots highlight pleiotropic associations. Each radial line connects all dots for an intercategorical pleiotropic locus and is labeled with the locus symbol. c, The number of associated traits for each intercategorical pleiotropic locus. Quantitative traits within each category are ordered as in Table 1.

We observed the most extensive intercategorical pleiotropy at ALDH2 (12q24.12), which was associated with 21 traits in seven categories (Supplementary Fig. 4). The most significant associations at this locus were at rs79105258 (top associated trait: γ-glutamyl transferase (GGT); P = 9.98 × 10⁻¹⁰⁰), which has a high minor allele frequency in East Asians (0.24) but is monomorphic in other ancestral populations20. Other loci showing intercategorical pleiotropy included GCKR (2p23.3), associated with 18 traits in seven categories (rs1260326 for triglyceride; P = 1.69 × 10⁻⁹⁴); ABO (9q34.2), associated with 15 traits in six categories (rs2519093 for alkaline phosphatase; P = 2.02 × 10⁻⁸⁸⁷); and RGS12 (4p16.3), associated with nine traits in six categories (rs4690095 for albumin; P = 1.63 × 10⁻²²). Although RGS12 (4p16.3) has received less attention as a pleiotropic locus than the other loci mentioned13, it has been associated with several traits and diseases, including serum lipids4 and inflammatory bowel disease22. Our results extend its associations to additional traits, including kidney function, serum calcium, GGT, and platelet count (Plt).

Polygenic correlations across quantitative traits

Another approach to inferring genetic overlap across traits is to estimate a genetic correlation, that is, the correlation of causal effect sizes at a genome-wide level15,16. Unlike the single-locus pleiotropy analysis, this polygenic approach takes the consistency of effect directions into account, which helps to disentangle the polygenic architecture of the studied traits. To gain a broader perspective on quantitative traits, we incorporated additional GWAS results for two anthropometric traits, BMI (ref. 3) and adult height, obtained from ongoing studies in the BBJ (Supplementary Note 1). We carried out bivariate LD score regression15 to estimate pairwise genetic correlations across the 59 quantitative traits (we excluded the E/A ratio, a marker of heart function, owing to its small sample size; Methods). We found 173 significant genetic correlations (false discovery rate (FDR) < 0.05), 100 (58%) of which were intercategorical (Supplementary Fig. 5 and Supplementary Table 8).

BMI showed the greatest number of significant intercategorical genetic correlations, correlating significantly with 22 quantitative traits in seven trait categories (the most significant correlation was with mean arterial pressure; P = 9.83 × 10⁻¹⁷). Total protein and height had the second highest numbers of correlated categories (n = 6), followed by triglycerides, non-albumin protein, and Plt (n = 5). Although some of the significant intercategorical genetic correlations had been suggested previously (for example, BMI and serum lipids in Europeans15), most were newly identified. Notably, most of these links were consistent with observations in epidemiological studies (Supplementary Table 8), supporting the robustness of genetics-based approaches and their potential to elucidate the biological and medical architecture of human traits without prior knowledge. For example, a negative correlation between white blood cell (WBC) count and total bilirubin had been suggested by an epidemiological study23; our study corroborated this correlation on a genetic basis, providing empirical support for the hypothesized anti-inflammatory activity of bilirubin24.

Genetic correlations among quantitative traits and diseases

Given that clinical measurements are informative as intermediate phenotypes for the assessment of complex human diseases, we reasoned that additional exploration of genetic correlations between quantitative traits and diseases would provide more empirical corroboration of shared genetic architecture, which could illuminate the underlying etiology and pathogenesis. To this end, we additionally incorporated 30 case–control GWAS results for complex diseases in Japanese individuals (Table 2 and Supplementary Note 1)25,26,27,28,29,30, including cardiometabolic (n = 6), immune-related (n = 6), hematologic (n = 1), psychiatric (n = 2), and musculoskeletal diseases (n = 2); cancer (n = 7); and other diseases (n = 6).

Table 2 Summary of the additional case–control GWASs of the 30 complex diseases

We then estimated pairwise genetic correlations across the 59 quantitative traits and 30 diseases. We identified 68 significant genetic correlations (FDR < 0.05), supporting the biological relevance of associations between clinical measurements and complex diseases (Fig. 2 and Supplementary Table 8; full results are presented in Supplementary Fig. 6 and Supplementary Table 9). Of the 68 significant correlations, 52 (76.5%) involved cardiometabolic diseases, which correlated with quantitative traits in seven categories. Type 2 diabetes showed the greatest number of significant correlations with quantitative traits (n = 15), and its most significant genetic correlation was with hemoglobin A1c (rg = 0.724; P = 2.54 × 10⁻²²). Other significant correlations included those between ischemic stroke and uric acid (rg = 0.254; P = 5.74 × 10⁻⁵) and between myocardial infarction and albumin/globulin ratio (rg = −0.174; P = 1.06 × 10⁻³). Among the remaining 16 significant genetic correlations (those not involving cardiometabolic diseases), the most significant was between asthma and eosinophil count (rg = 0.348; P = 3.76 × 10⁻⁴). Others included those between urolithiasis and systolic blood pressure (rg = 0.272; P = 7.22 × 10⁻⁴), asthma and systolic blood pressure (rg = 0.214; P = 8.84 × 10⁻⁴), and colorectal cancer and height (rg = 0.164; P = 2.92 × 10⁻³).

Fig. 2: Genetic correlations between the 59 quantitative traits and 30 diseases.

Pairwise genetic correlations (n = 3,916) were estimated via bivariate LD score regression. Only correlations between the 59 quantitative traits and 30 diseases (n = 1,770) are shown (full results are shown in Supplementary Fig. 6). Positive genetic correlations are shown in blue, and negative correlations are in red. FDR was calculated via the Benjamini–Hochberg method to correct for multiple testing of all 3,916 correlations. Larger squares correspond to more significant FDRs. Significant correlations (FDR < 0.05) are indicated by asterisks. The bars at the right and bottom of the figure show the number of significant genetic correlations per disease and quantitative trait, respectively. Abbreviations are defined as in Tables 1 and 2.

In addition to genetic correlations previously suggested in Europeans (type 2 diabetes and BMI; triglycerides, blood sugar, and hemoglobin A1c; coronary artery disease and BMI; and high-density-lipoprotein cholesterol and triglycerides)15, we empirically corroborated novel genetic correlations that had been implicated in Mendelian randomization analyses (e.g., type 2 diabetes and alanine aminotransferase31; atrial fibrillation and height32; asthma and eosinophil count9; and colorectal cancer and height33) and in epidemiological studies (e.g., ischemic stroke and uric acid34; myocardial infarction and albumin/globulin ratio35; peripheral artery disease and total bilirubin36; and urolithiasis and blood pressure37) (Supplementary Table 8). We therefore investigated causal relationships between the significantly correlated pairs of quantitative traits and diseases using a Mendelian randomization approach (Methods). We identified 24 significant causal associations (P < 9.43 × 10⁻⁴ (= 0.05/53)), 15 of which had not previously been suggested by genetic causal inference (Supplementary Fig. 7 and Supplementary Table 10). To assess bias due to pleiotropy, we further applied MR-Egger regression38 as a sensitivity test and confirmed the robustness of the identified causal relationships (P > 0.05 for the intercept after Bonferroni correction).
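The Mendelian randomization Methods are not part of this section. As a hedged illustration, the Python sketch below shows two standard summary-statistic estimators: the fixed-effect inverse-variance weighted (IVW) estimate and MR-Egger regression, whose intercept captures directional pleiotropy. It assumes per-variant effects on the exposure (beta_x) and outcome (beta_y) with outcome standard errors (se_y); it is not necessarily the exact estimator the authors used, and the significance test of the Egger intercept is omitted.

```python
import math


def ivw(beta_x, beta_y, se_y):
    """Fixed-effect inverse-variance weighted causal estimate and its SE."""
    w = [1.0 / s ** 2 for s in se_y]
    num = sum(wi * bx * by for wi, bx, by in zip(w, beta_x, beta_y))
    den = sum(wi * bx ** 2 for wi, bx in zip(w, beta_x))
    return num / den, math.sqrt(1.0 / den)


def mr_egger(beta_x, beta_y, se_y):
    """MR-Egger: weighted linear regression of beta_y on beta_x with an intercept.

    Alleles are first oriented so that every beta_x is positive. Returns
    (intercept, slope); the slope is the pleiotropy-robust causal estimate,
    and a non-zero intercept indicates directional pleiotropy.
    """
    x = [abs(bx) for bx in beta_x]
    y = [by if bx >= 0 else -by for bx, by in zip(beta_x, beta_y)]
    w = [1.0 / s ** 2 for s in se_y]
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    slope = sxy / sxx
    return my - slope * mx, slope


# Bonferroni threshold quoted in the text (0.05/53)
MR_THRESHOLD = 0.05 / 53
```

When the variant-outcome effects carry a constant pleiotropic offset, IVW absorbs it into the causal estimate, whereas MR-Egger moves it into the intercept.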

To facilitate understanding of the complex inter-relations between clinical measurements and diseases, we constructed a network from the genetic correlation matrix (Fig. 3). In the network, the distance between correlated phenotypes is determined by weighting the magnitudes of their correlations (Methods). Although we constructed the network without prior biological knowledge of cross-phenotype relationships, we observed distinctive clusters of biologically related phenotypes. The largest cluster was composed of cardiometabolic diseases and their biomarkers, interconnected with various clinical measurements, such as kidney-related, liver-related, and hematological traits. The network also depicted cross-disease interplay, including positive correlations between autoimmune diseases (rheumatoid arthritis and Graves’ disease) and between chronic inflammatory diseases (asthma and chronic obstructive pulmonary disease), as well as a negative correlation between glaucoma and Graves’ disease. These results suggest that the polygenic landscapes of traits reflect their biological backgrounds and thus could be used to elucidate the unknown etiology of diseases.
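The network construction can be sketched under stated assumptions: edges are kept for pairs with FDR < 0.05 (as in the Fig. 3 legend), each edge stores the signed rg so that a layout algorithm can weight it by |rg| and color it by sign, and separate subnetworks are read off as connected components. The force-directed layout and the authors' exact weighting scheme are not reproduced, and the function names are hypothetical.

```python
from collections import defaultdict


def build_network(correlations, fdr_threshold=0.05):
    """Undirected genetic-correlation network.

    correlations: iterable of (phenotype_a, phenotype_b, rg, fdr).
    Only pairs with FDR below the threshold become edges; each edge keeps the
    signed rg so that a layout can weight it by |rg| and color it by sign.
    """
    graph = defaultdict(dict)
    for a, b, rg, fdr in correlations:
        if fdr < fdr_threshold:
            graph[a][b] = rg
            graph[b][a] = rg
    return graph


def clusters(graph):
    """Connected components of the network (phenotypes without edges are absent)."""
    seen, components = set(), []
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            component.add(node)
            for neighbor in graph[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        components.append(component)
    return components
```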

Fig. 3: Genetic correlation network across the 59 quantitative traits and 30 diseases.

Genetically correlated traits and diseases are clustered close together. Each circle represents a trait, and each edge represents a significant genetic correlation (FDR < 0.05). Pairwise genetic correlations (n = 3,916) were estimated via bivariate LD score regression, and their P values were corrected via the Benjamini–Hochberg method for multiple testing of all 3,916 correlations. Positive and negative genetic correlations are indicated by color according to the key. Thicker edges correspond to more significant FDRs. The dashed line separates unconnected plots. Definitions of trait abbreviations can be found in Tables 1 and 2.

Shared cell-type specificity among human complex traits

The identification of trait-relevant cell types is essential for fine-mapping candidate causal variants, identifying potential therapeutic targets, and, ultimately, fully understanding disease biology39,40,41. To assess the cell-type specificity of human traits and diseases on the basis of heritability enrichment, we applied stratified LD score regression39 to the GWAS results for the 59 quantitative traits and 30 diseases using 220 cell-type-specific annotations for histone modifications (H3K4me1, H3K4me3, H3K9ac, and H3K27ac) constructed from the Roadmap Epigenomics Project dataset39,42.

To create a broad picture of trait-relevant cell types, we first assessed heritability enrichment in ten major cell-type groups, defined as unions of the 220 cell-type-specific annotations and representing system- or organ-level structures39. We observed 72 significant heritability enrichments (FDR < 0.05) in the cell-type groups for 44 quantitative traits and diseases (Fig. 4a and Supplementary Table 11). The top significant enrichments in each quantitative trait category included connective or bone for height (P = 4.89 × 10⁻⁹), kidney for estimated glomerular filtration rate (P = 2.59 × 10⁻⁷), liver for GGT (P = 2.54 × 10⁻⁶), immune or hematopoietic for mean corpuscular volume (P = 6.46 × 10⁻⁶), and skeletal muscle for creatine kinase (P = 2.77 × 10⁻⁵), consistent with known biology (Fig. 4b and Supplementary Fig. 8). The same held true for the diseases; for example, immune or hematopoietic cell types were significantly enriched for rheumatoid arthritis (P = 9.19 × 10⁻⁶) and Graves’ disease (P = 3.81 × 10⁻⁵).

Fig. 4: Heritability enrichment in the ten cell-type groups.

a, Significance of heritability enrichment in the ten cell-type groups for the 59 quantitative traits and 30 diseases estimated via stratified LD score regression. b, Selected quantitative traits in five quantitative trait categories. FDR was calculated via the Benjamini–Hochberg method. Vertical dashed lines indicate FDR = 0.05. Definitions for trait and disease abbreviations can be found in Tables 1 and 2.

Although the cell-type-group-level analysis successfully identified a trait-relevant group for most of the quantitative traits and diseases, we hypothesized that a more detailed assessment at the cell-type level would distinguish trait-relevant cell types within each group. We therefore assessed heritability enrichment in each of the 220 cell-type-specific annotations and identified 384 significant heritability enrichments (FDR < 0.05) for 50 quantitative traits and diseases (Supplementary Table 12). To further explore the complex systems of trait-relevant cell types, we carried out hierarchical clustering based on the resulting profiles of heritability enrichment for the 59 quantitative traits and 30 diseases across the 220 cell-type annotations (Fig. 5a).
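Hierarchical clustering of enrichment profiles can be illustrated with a small agglomerative routine. This Python sketch assumes that each phenotype is represented by its vector of −log10(P) enrichment values across the annotations, and it uses Euclidean distance with average linkage; the section does not state which distance or linkage the authors used, so both are assumptions. Passing the transposed profiles (one vector per annotation) clusters the cell types instead.

```python
import math


def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def average_linkage(profiles):
    """Agglomerative hierarchical clustering with average linkage.

    profiles: dict mapping a phenotype name to its list of -log10(P)
    enrichment values across the same ordered set of annotations.
    Returns the merge history as (members_a, members_b, distance) tuples.
    """
    clusters = {i: [name] for i, name in enumerate(profiles)}
    next_id = len(clusters)
    merges = []
    while len(clusters) > 1:
        best = None
        ids = list(clusters)
        for x in range(len(ids)):
            for y in range(x + 1, len(ids)):
                a, b = clusters[ids[x]], clusters[ids[y]]
                # Average linkage: mean distance over all cross-cluster pairs
                d = sum(euclidean(profiles[p], profiles[q]) for p in a for q in b) / (len(a) * len(b))
                if best is None or d < best[0]:
                    best = (d, ids[x], ids[y])
        d, i, j = best
        merges.append((tuple(clusters[i]), tuple(clusters[j]), d))
        clusters[next_id] = clusters.pop(i) + clusters.pop(j)
        next_id += 1
    return merges
```

The naive search is cubic in the number of items, which is acceptable for 89 phenotypes or 220 annotations.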

Fig. 5: Hierarchical clustering of heritability enrichments in the 220 cell-type-specific annotations.

a, The significance of heritability enrichment among the 59 quantitative traits and 30 diseases, estimated via stratified LD score regression. Horizontal color bars correspond to the quantitative trait category for each column. Vertical color bars correspond to the cell-type groups for each row. b, A selected cluster of immune or hematopoietic cell types and various relevant traits and diseases. Another cluster highlighted in a is shown in Supplementary Fig. 9. FDR was calculated via the Benjamini–Hochberg method. Tmem, memory T cell; Treg, regulatory T cell; TH, helper T cell; TH17, IL-17-producing helper T cell; TH1, type 1 helper T cell; TH2, type 2 helper T cell; TH0, naive T cell; stim, stimulatory. Full trait and disease abbreviations can be found in Tables 1 and 2.

We observed several distinct clusters, each comprising related traits and cell types. The most distinct cluster involved the great majority of immune or hematopoietic cell types, which were enriched for hematological traits and for autoimmune, allergic, and infectious diseases, representing a wide range of immune-related diseases and traits (Fig. 5b). The most significant heritability enrichment was for mean corpuscular hemoglobin in mobilized CD34 primary cells (P = 2.01 × 10⁻⁹; H3K4me1). All CD34-related epigenetic annotations also showed significant enrichment for red blood cell-, WBC-, and Plt-related hematological traits. Because CD34 is a marker of hematopoietic progenitor cells43, our findings suggest that variants in regulatory regions of CD34+ primary cells affect hematopoietic cell differentiation and the numbers of hematopoietic cells.

Finally, to highlight cell types shared among human diseases and traits, we constructed a directed network of cell-type-specific heritability enrichments (Fig. 6; details are presented in the Methods). We identified several independent networks of cell-type specificity. The largest network was composed of three major clusters connected via significant enrichment in adipose nuclei for (i) WBC count, (ii) lymphocyte count, and (iii) height. In addition to the contribution of CD34, we observed heritability enrichment in regulatory regions of CD14+ and CD15+ primary cells for WBC count and WBC subtypes (i.e., monocytes and neutrophils), reflecting their specificity for myeloid lineages (CD14 for monocytes and macrophages44, and CD15 for granulocyte-series cells45). Primary cells expressing CD19 and CD20, surface markers of B cells46, also showed enrichment for non-albumin protein and albumin/globulin ratio, potentially reflecting the immunoglobulin-synthesis function of B cells. Moreover, various CD4+ and CD8+ T cells showed enrichment for autoimmune diseases such as Graves’ disease and rheumatoid arthritis. The enrichment of regulatory T cells (Treg cells) in Graves’ disease, an autoimmune thyroid disease, is concordant with the finding that Treg-cell-depleted mice develop thyroiditis47. Other observed links, between allergic diseases (atopic dermatitis and asthma) and helper T cells and between fetal or chondrogenic tissues and height, were also consistent with biological and medical findings.

Fig. 6: Cell-type-specificity network across the 59 quantitative traits and 30 diseases in the 220 cell-type-specific annotations.

Significant heritability enrichments connect cell types to phenotypes. Each circle represents a trait, and each square represents a cell type. Arrows denote significant heritability enrichments for the indicated traits. The significance of heritability enrichment was estimated via stratified LD score regression, and FDR was calculated via the Benjamini–Hochberg method. Thicker edges/arrows correspond to more significant FDRs. The dashed line separates unconnected plots. Only highly significant enrichments (FDR < 0.01) are shown. Definitions for abbreviations are as in Fig. 5 and Tables 1 and 2.
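The cell-type-specificity network is a directed bipartite graph from annotations to phenotypes. A minimal Python sketch, assuming the enrichment results are available as (cell type, phenotype, FDR) triples: it keeps edges with FDR < 0.01, as in the legend above, and lists cell types linked to two or more phenotypes, which are the nodes that join phenotypes into a shared network. All names are hypothetical.

```python
from collections import defaultdict


def cell_type_edges(enrichments, fdr_threshold=0.01):
    """Directed edges cell type -> phenotype for highly significant enrichments.

    enrichments: iterable of (cell_type, phenotype, fdr) triples.
    Returns a dict mapping each cell type to the set of phenotypes it points to.
    """
    targets = defaultdict(set)
    for cell_type, phenotype, fdr in enrichments:
        if fdr < fdr_threshold:
            targets[cell_type].add(phenotype)
    return dict(targets)


def linking_cell_types(targets, min_phenotypes=2):
    """Cell types enriched for several phenotypes; these join phenotypes into one network."""
    return {ct: ph for ct, ph in targets.items() if len(ph) >= min_phenotypes}


def shared_cell_types(targets, phenotype_a, phenotype_b):
    """Cell types with edges to both phenotypes."""
    return {ct for ct, ph in targets.items() if phenotype_a in ph and phenotype_b in ph}
```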

These results demonstrate that analysis at the individual cell-type level can recapture the biology of human traits without relying on prior knowledge from the consolidated cell-type-group-level analysis. The cell-type-specificity networks pinpoint potential causal cell types that cooperatively affect human phenotypes, providing a promising resource for identifying novel therapeutic targets. Overall, integrating cell-type specificity with polygenic genetic correlations expanded the current knowledge of cross-phenotype relationships and of the genetic mechanisms underlying diseases.

Discussion

We have presented one of the largest non-European GWASs of quantitative traits to date, identifying 1,407 trait-associated loci for 53 traits in 162,255 Japanese individuals. By incorporating additional GWAS results for 30 complex diseases and two anthropometric traits in Japanese individuals, we further identified numerous pleiotropic loci, wide-ranging genetic correlations, and distinct cell-type specificity among the quantitative traits and diseases, which confirmed or expanded our current understanding of biology.

Our findings suggest that there are complex inter-relations between clinical measurements and diseases, demonstrating the value of GWASs for a variety of traits in a single large-scale cohort with detailed clinical information. We report novel genetic correlations, some of which are consistent with the results of epidemiological studies. These findings substantially expand the knowledge of genetic relationships across clinical measurements and diseases. We also highlight shared cell-type specificity by linking cell types to diseases. These results shed light on the underlying genetic mechanisms, revealing shared etiology and pathogenesis of complex diseases by using clinical measurements as an intermediate phenotype.

Although our work provides various insights into the genetics of clinical measurements in Japanese subjects, this study has several limitations. First, we did not have a replication cohort to validate the identified loci, although the majority of the trait-associated loci had been reported previously (n = 728; 51.7%). This issue partly reflects a dilemma of the present study: covering extensive phenotypes simultaneously makes replication more challenging. Second, the subjects for each trait largely overlapped. Although bivariate LD score regression explicitly models sample overlap and the resulting phenotypic correlation15, such overlap might still bias the interpretation of genetic overlaps upward. Third, although we adopted a linear regression model for unrelated subjects, applying a linear mixed model to both related and unrelated subjects could have increased statistical power48. Fourth, our causal inference of clinical measurements for complex diseases could be limited because it relied on a single cohort; further application of Mendelian randomization17 in independent validation cohorts is warranted. Finally, our cell-type analysis was inevitably limited by the available cell-type-specific annotations, in terms of both the variety of cell types and the epigenetic marks covered. Acquiring and integrating more cell-type-specific resources would further facilitate the exploration of causal cell types in human diseases.

In conclusion, we conducted a large-scale GWAS of 58 quantitative traits in Japanese individuals and demonstrated complex inter-relations with human diseases via pleiotropy, genetic correlation, and cell-type-specificity analyses. We further visualized the results as networks, depicting the genetic links among clinical measurements, human diseases, and relevant cell types. Our findings will contribute to future studies and serve as a fundamental resource for understanding the genetics and biology underlying clinical measurements and human diseases.

URLs

BBJ, https://biobankjp.org/english/index.html; JENGER, http://jenger.riken.jp/en/; 1000 Genomes Project, http://www.1000genomes.org/; GWAS catalog, https://www.ebi.ac.uk/gwas/; PLINK 1.9, https://www.cog-genomics.org/plink2; ldsc, https://github.com/bulik/ldsc/; LD score, http://data.broadinstitute.org/alkesgroup/LDSCORE/; MACH, http://csg.sph.umich.edu//abecasis/MaCH/; Minimac, https://genome.sph.umich.edu/wiki/Minimac; ANNOVAR, http://annovar.openbioinformatics.org/en/latest/; R, https://www.r-project.org/; Locuszoom, http://locuszoom.sph.umich.edu/locuszoom/; Circos, http://circos.ca/; NBDC Human Database, https://humandbs.biosciencedbc.jp/en/.

Methods

Subjects

All subjects in this study were enrolled through the BioBank Japan Project (BBJ). The BBJ is a multi-institutional hospital-based registry that collected DNA, serum, and clinical information from approximately 200,000 patients at 66 hospitals affiliated with 12 medical institutes between fiscal years 2003 and 2007. All study participants had been diagnosed with one or more of 47 target diseases by physicians at the cooperating hospitals, as described previously18,19. Written informed consent was obtained from all participants, as approved by the ethics committees of RIKEN Center for Integrative Medical Sciences and the Institute of Medical Sciences, the University of Tokyo. Detailed characteristics of the subjects for each trait are shown in Supplementary Table 1.

Phenotype

BBJ collected baseline clinical information through interviews and reviews of medical records using a standardized questionnaire. Among the quantitative traits included in this study, age, height, and weight were obtained from the self-reported questionnaire for all participants. Laboratory measurements were retrieved from the medical records of routine laboratory examinations for all participants. Dyslipidemia and diabetes were the most common diseases registered in the BBJ, and 41.8% of the study participants had these diseases. Echocardiographic traits were retrieved from medical records only for subjects with cardiovascular diseases, dyslipidemia, or diabetes. The measured values of each quantitative trait (log10-transformed where required to achieve normality) were adjusted in a linear regression model for age, sex, the top ten principal components of genetic ancestry, disease status (affected versus non-affected) for the 47 target diseases in the BBJ, and any necessary trait-specific covariates. We then normalized the resulting residuals by applying an appropriate trait-specific transformation (Z-score or rank-based inverse normal transformation), as detailed in Supplementary Table 2.
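The covariate adjustment and normalization steps can be sketched in pure Python. The sketch assumes covariates are supplied as numeric columns (for example, age, sex coded 0/1, ancestry principal components, and 0/1 disease indicators), fits ordinary least squares through the normal equations, and provides both normalizations named above. The Blom-type offset (c = 3/8) and average-rank handling of ties in the rank-based inverse normal transformation are assumptions, since the text does not specify them; function names are hypothetical.

```python
from statistics import NormalDist, mean, pstdev


def ols_residuals(y, covariates):
    """Residuals of y after ordinary least squares on an intercept plus covariates.

    covariates: list of columns, each a list the same length as y.
    Solves the normal equations (X'X) b = X'y by Gauss-Jordan elimination.
    """
    n = len(y)
    X = [[1.0] + [col[r] for col in covariates] for r in range(n)]
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)]
         + [sum(X[r][i] * y[r] for r in range(n))] for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))  # partial pivoting
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    beta = [A[i][k] / A[i][i] for i in range(k)]
    return [y[r] - sum(b * x for b, x in zip(beta, X[r])) for r in range(n)]


def z_score(values):
    """Standardize to mean 0 and (population) standard deviation 1."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def rank_inverse_normal(values, c=3 / 8):
    """Rank-based inverse normal transformation with a Blom-type offset."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    start = 0
    while start < n:
        end = start
        # Extend over tied values so that they share their average rank
        while end + 1 < n and values[order[end + 1]] == values[order[start]]:
            end += 1
        for t in range(start, end + 1):
            ranks[order[t]] = (start + end) / 2 + 1
        start = end + 1
    nd = NormalDist()
    return [nd.inv_cdf((r - c) / (n - 2 * c + 1)) for r in ranks]
```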

Genotyping and imputation

We genotyped samples with the Illumina HumanOmniExpressExome BeadChip or a combination of the Illumina HumanOmniExpress and HumanExome BeadChips. We excluded samples with (i) sample call rate < 0.98, (ii) closely related individuals identified by identity-by-descent analysis, and (iii) non–East Asian outliers identified by principal component analysis of the studied samples and the three major reference populations (Africans, Europeans, and East Asians) in the International HapMap Project49. We then applied standard quality-control criteria for variants, excluding those with (i) SNP call rate < 0.99, (ii) minor allele frequency < 1%, and (iii) Hardy–Weinberg equilibrium P value ≤ 1.0 × 10⁻⁶. We prephased the genotypes with MACH50 and imputed dosages with minimac and the 1000 Genomes Project Phase 1 (version 3) East Asian reference haplotypes20. For the X chromosome, we performed prephasing and imputation separately for females and males. Imputed SNPs with an imputation quality Rsq < 0.7 were excluded from the subsequent association analysis.
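The sample- and variant-level filters above translate directly into threshold checks. A minimal Python sketch using the thresholds quoted in the text; relatedness and principal-component outlier removal, which require genotype-level computations, are not shown, and the Variant record is a hypothetical container.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    """Hypothetical per-variant QC summary."""
    call_rate: float  # fraction of samples with a genotype call
    maf: float        # minor allele frequency
    hwe_p: float      # Hardy-Weinberg equilibrium test P value


def keep_sample(call_rate: float) -> bool:
    # Samples with call rate < 0.98 are excluded
    return call_rate >= 0.98


def keep_genotyped_variant(v: Variant) -> bool:
    # Exclude SNP call rate < 0.99, MAF < 1%, or HWE P <= 1.0e-6
    return v.call_rate >= 0.99 and v.maf >= 0.01 and v.hwe_p > 1.0e-6


def keep_imputed_variant(rsq: float) -> bool:
    # Imputed SNPs with Rsq < 0.7 are excluded
    return rsq >= 0.7
```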

Genome-wide association analysis

For each quantitative trait, we conducted a GWAS using a linear regression model under the assumption of additive allelic effects of the SNP dosages, as implemented in mach2qtl50. We set the genome-wide significance threshold at P = 5.0 × 10⁻⁸ (ref. 51) and the study-wide significance threshold at P = 8.6 × 10⁻¹⁰ (= 5.0 × 10⁻⁸/58), applying a Bonferroni correction for the number of studied traits. We defined independent associated loci as those whose genomic positions were at least 1 Mb apart. We call these independent loci for each trait ‘trait-associated loci’; loci of different traits can overlap (i.e., multiple trait-associated loci can map to one unique locus). We considered a trait-associated locus novel when it was (i) located >500 kb from the nearest previously reported locus of the same quantitative trait and (ii) not in LD (r² < 0.1) with any such locus in either East Asians or Europeans. For the X chromosome, we conducted GWASs separately for females and males and meta-analyzed the association results. We performed stepwise conditional analysis to identify additional independent signals around associated loci (each region ± 500 kb), adjusting for the most significant variant of the region at each step until no variant met the genome-wide significance threshold. For extremely significant variants with P < 1.0 × 10⁻³⁰⁰, we calculated P values in R (ver. 3.3.1) with the Rmpfr package. We calculated the genomic inflation factor λGC in R. We estimated the variance explained by each significantly associated SNP as 2f(1 − f)β², where f is the allele frequency and β is the additive effect, and summed these values to obtain the total variance explained by the significant SNPs for each of the 53 quantitative traits with at least one genome-wide significant locus. We carried out LD score regression21 with ldsc (v. 1.0.0; commit 23a94fc) to estimate confounding bias and the heritability explained by genome-wide high-quality common SNPs present in the HapMap 3 reference panel. We generated regional plots with LocusZoom52 (v. 1.3) and R.
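Several quantities in the paragraph above are short calculations: the Bonferroni threshold, λGC, the summed variance explained, and the distance-based definition of loci. The Python sketch below illustrates each under stated assumptions. It treats β as measured in trait standard-deviation units (so that 2f(1 − f)β² is a fraction of variance), computes λGC as the median observed 1-df chi-square divided by its expected median, and uses a greedy "most significant variant first, absorb anything within 1 Mb" rule, which is one plausible reading of "at least 1 Mb apart" rather than the authors' exact procedure. All function names are hypothetical.

```python
from statistics import NormalDist, median

GENOME_WIDE_P = 5.0e-8
N_TRAITS = 58
# Bonferroni correction over the number of traits: 5.0e-8 / 58 ≈ 8.6e-10
STUDY_WIDE_P = GENOME_WIDE_P / N_TRAITS

_STD_NORMAL = NormalDist()


def chisq1_from_p(p: float) -> float:
    """1-df chi-square statistic matching a two-sided P value."""
    z = _STD_NORMAL.inv_cdf(p / 2.0)
    return z * z


def lambda_gc(pvalues) -> float:
    """Median observed chi-square divided by the median of chi-square(1 df)."""
    expected_median = _STD_NORMAL.inv_cdf(0.75) ** 2  # about 0.455
    return median(chisq1_from_p(p) for p in pvalues) / expected_median


def variance_explained(freqs, betas) -> float:
    """Sum of 2f(1 - f)beta^2 over significant SNPs (beta in trait SD units)."""
    return sum(2.0 * f * (1.0 - f) * b * b for f, b in zip(freqs, betas))


def define_loci(hits, min_gap=1_000_000):
    """Greedy distance-based loci; hits are (chrom, pos, p) tuples.

    The most significant remaining variant becomes a lead variant; any
    variant within min_gap bp of an existing lead on the same chromosome
    is absorbed into that lead's locus.
    """
    leads = []
    for chrom, pos, p in sorted(hits, key=lambda h: h[2]):
        if all(c != chrom or abs(pos - lp) >= min_gap for c, lp, _ in leads):
            leads.append((chrom, pos, p))
    return leads
```

For P values drawn uniformly under the null, `lambda_gc` returns a value close to 1; values well above 1 point to inflation from polygenicity or confounding, which LD score regression is then used to tell apart.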

Pleiotropy analysis

We assessed pleiotropy at a unique locus using the following criteria: the top-associated variants of different quantitative traits were (i) in LD (r² ≥ 0.5) or (ii) closely located (within 25 kb of each other). We calculated r² between two variants using PLINK 1.90 (ref. 53) and the 1000 Genomes Project Phase 3 (version 5) East Asian dataset20. We used Circos54 to visualize the results.
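The two pleiotropy criteria can be applied pairwise to the top variants of different traits. The Python sketch below assumes that pairwise r² values have already been computed (for example with PLINK on the East Asian reference) and stored in a dictionary keyed by variant pairs. Merging pairwise matches transitively with a union–find structure is our assumption about how overlapping matches form one unique locus; the names `TopVariant` and `group_pleiotropic` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TopVariant:
    trait: str
    variant_id: str
    chrom: str
    pos: int


def share_locus(a, b, r2_lookup, r2_min=0.5, max_dist=25_000):
    """True if two top variants meet either criterion (close, or in LD)."""
    if a.chrom != b.chrom:
        return False
    if abs(a.pos - b.pos) <= max_dist:
        return True
    r2 = r2_lookup.get(frozenset((a.variant_id, b.variant_id)), 0.0)
    return r2 >= r2_min


def group_pleiotropic(variants, r2_lookup):
    """Union-find grouping; returns groups that span more than one trait."""
    parent = list(range(len(variants)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(variants)):
        for j in range(i + 1, len(variants)):
            a, b = variants[i], variants[j]
            if a.trait != b.trait and share_locus(a, b, r2_lookup):
                parent[find(i)] = find(j)

    groups = {}
    for i, v in enumerate(variants):
        groups.setdefault(find(i), []).append(v)
    return [g for g in groups.values() if len({v.trait for v in g}) > 1]
```

A missing r² entry is treated as 0, so two distant variants are only grouped when their LD has actually been measured.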

Additional GWAS results for anthropometric traits and diseases in Japanese subjects

We additionally obtained GWAS results for two anthropometric quantitative traits (BMI3 and height) and case–control GWAS results for 30 complex diseases in the Japanese population, from both published25,26,27,28,29,30 and unpublished studies in the BioBank Japan Project (Table 2). For the two anthropometric traits, most of the subjects overlapped with those in the present study (n = 152,667 (94.1%) and 153,456 (94.6%) for BMI and height, respectively). For the 30 complex diseases, cases of 26 diseases were recruited through BBJ, whereas cases of rheumatoid arthritis, bipolar disorder, schizophrenia, and adolescent idiopathic scoliosis were recruited by collaborators as described elsewhere26,28,29. The controls were drawn from three population-based cohorts (the Tohoku Medical Megabank Organization, the Japan Public Health Center–based Prospective Study, and the Japan Multi-Institutional Collaborative Cohort Study) or from a mixture of cases in BBJ, as detailed in Supplementary Note 1. We combined these additional GWAS results with those for the 58 quantitative traits in the subsequent analyses.

Genetic correlation

We conducted bivariate LD score regression15 to quantify genetic correlations across the 59 quantitative traits and 30 complex diseases in the Japanese population. To maintain sufficient statistical power15, we excluded one GWAS result (the E/A ratio, an echocardiographic trait) whose sample size was far below 10,000, which left 59 of the 60 available quantitative traits (the 58 traits studied here plus BMI and height). For the regression, we used East Asian LD scores and the summary statistics of high-quality common SNPs present in the HapMap 3 reference panel for each trait or disease. We excluded SNPs in the major histocompatibility complex (MHC) region (chromosome 6: 25–34 Mb) because of its complex LD structure27,39,55,56. We defined significant genetic correlations as those with FDR < 0.05, calculated with the Benjamini–Hochberg method to correct for multiple testing across all 3,916 pairwise correlations among the 59 quantitative traits and 30 diseases.
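The number of tests and the FDR correction can be checked directly: 89 phenotypes give C(89, 2) = 89 × 88 / 2 = 3,916 unordered pairs. The Python sketch below shows the standard Benjamini–Hochberg step-up adjustment applied to such a set of P values. It is an illustrative reimplementation, not the code used in the study, and the function names are hypothetical.

```python
from math import comb

N_QUANT_TRAITS = 59
N_DISEASES = 30
# Every unordered pair of the 89 phenotypes: 89 * 88 / 2 = 3,916 tests.
N_PAIRS = comb(N_QUANT_TRAITS + N_DISEASES, 2)


def benjamini_hochberg(pvals):
    """Benjamini–Hochberg adjusted P values (FDR), returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotone adjusted values.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvals[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def significant_at_fdr(pvals, alpha=0.05):
    """Flags pairs whose BH-adjusted P value falls below alpha."""
    return [q < alpha for q in benjamini_hochberg(pvals)]
```

For example, `benjamini_hochberg([0.01, 0.04, 0.03, 0.005])` yields adjusted values of 0.02, 0.04, 0.04, and 0.02, so all four would pass an FDR threshold of 0.05.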

For network visualization, we constructed a network from the genetic correlation matrix of the 59 traits and 30 diseases. Specifically, each phenotype was represented as a node, and the nodes were connected by edges if they were genetically correlated. We assigned a weight to each edge based on the magnitude of the corresponding genetic correlation. To highlight biological patterns in the network and to prevent it from becoming too dense, we used only significant genetic correlations (FDR < 0.05). Node layout was determined by the Fruchterman–Reingold algorithm given edge weights, with strongly correlated phenotypes placed closer together. We used R (ver. 3.3.1) with the igraph package for this network analysis.
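The network construction described above amounts to turning the correlation matrix into a weighted edge list filtered at FDR < 0.05. The Python sketch below assumes symmetric matrices of rg and BH-adjusted FDR values, and uses |rg| as the edge weight, which is our reading of "based on the magnitude". The phenotype labels are placeholders, and the layout step itself is left to a graph library (the study used R's igraph, whose `layout_with_fr()` implements Fruchterman–Reingold).

```python
def build_rg_network(names, rg, fdr, fdr_cutoff=0.05):
    """Undirected weighted edges (a, b, |rg|, rg_is_positive).

    names: phenotype labels; rg and fdr: symmetric matrices (lists of lists)
    of genetic correlations and their BH-adjusted FDR values.
    """
    edges = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if fdr[i][j] < fdr_cutoff:
                # Weight is |rg| (assumed); the sign is kept separately.
                edges.append((names[i], names[j], abs(rg[i][j]), rg[i][j] > 0))
    return edges


def adjacency(edges):
    """Symmetric adjacency dict {node: {neighbor: weight}}."""
    adj = {}
    for a, b, weight, _positive in edges:
        adj.setdefault(a, {})[b] = weight
        adj.setdefault(b, {})[a] = weight
    return adj
```

Keeping the sign alongside the weight lets positive and negative correlations be drawn differently while the layout uses only the magnitude.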

Mendelian randomization

Given the 68 significant genetic correlations between clinical measurements and complex diseases, we carried out a Mendelian randomization analysis for each such pair to evaluate potential causal effects of clinical measurements on complex diseases. Because most of the samples in the disease GWASs overlapped with those in the present study, for each pair we excluded disease cases for whom the clinical measurement was available, to avoid potential bias. We selected 53 pairs on the basis of the following criteria: (i) raw genotypes of disease cases were available (i.e., the cases were recruited through BBJ (Table 2 and Supplementary Note 1)), (ii) more than three loci were identified in the clinical measurement GWAS, and (iii) non-overlapping samples remained after the removal of overlapping samples. We note that this sample exclusion might have reduced statistical power relative to that of the original disease GWAS. For each pair, we calculated a weighted genetic risk score by summing, over the identified variants influencing the clinical measurement, the product of each risk-allele dosage and its effect size. We quantified the association between the genetic risk score and the disease with a logistic regression model. To further test for pleiotropy, we applied MR-Egger regression38 as a sensitivity analysis. We used R (ver. 3.3.1) with the MendelianRandomization package57.
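The two core computations above are the weighted genetic risk score (a dosage-weighted sum of effect sizes) and the MR-Egger regression. The Python sketch below shows both. It omits the logistic regression of disease on the score, and for MR-Egger it returns only point estimates, without the standard errors and tests that the MendelianRandomization package reports. Following the usual MR-Egger convention, variants are oriented so that every exposure effect is positive before a weighted regression with an intercept is fitted, with weights 1/se². A nonzero intercept then indicates directional pleiotropy. Function names are hypothetical.

```python
def weighted_grs(dosages, effect_sizes):
    """Per-individual weighted genetic risk score.

    dosages: one list per individual of risk-allele dosages (0-2), one per variant.
    effect_sizes: per-variant effect of the risk allele on the clinical measurement.
    """
    return [sum(d * b for d, b in zip(row, effect_sizes)) for row in dosages]


def mr_egger(beta_exposure, beta_outcome, se_outcome):
    """MR-Egger point estimates (intercept, slope) from summary statistics."""
    xs, ys, ws = [], [], []
    for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome):
        # Orient each variant so that its exposure effect is positive.
        sign = 1.0 if bx >= 0 else -1.0
        xs.append(sign * bx)
        ys.append(sign * by)
        ws.append(1.0 / se ** 2)  # inverse-variance weights
    sw = sum(ws)
    mx = sum(w * x for w, x in zip(ws, xs)) / sw
    my = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
    slope = sxy / sxx  # causal-effect estimate under the InSIDE assumption
    intercept = my - slope * mx  # average directional pleiotropic effect
    return intercept, slope
```

With exposure effects [0.1, 0.2, 0.3, 0.4] and outcome effects generated as 0.02 + 0.5 × exposure, `mr_egger` recovers an intercept of 0.02 and a slope of 0.5 regardless of the weights.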

Partitioning heritability

We carried out stratified LD score regression39 to partition heritability into multiple functional categories. We used the 220 cell-type-specific and 10 cell-type-group-specific annotations constructed from Roadmap Epigenomics Project42 data and available at the authors’ website (see “URLs”). Because only European LD score references are provided for partitioned heritability analysis, we generated an East Asian LD score reference for each annotation from the 1000 Genomes Project Phase 3 (version 5) East Asian reference haplotypes20, following the standard procedures. For each annotation, we calculated the P value of its regression coefficient τc. We defined significant heritability enrichments as those with FDR < 0.05, calculated with the Benjamini–Hochberg method.
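The P value for τc is derived from the coefficient and its standard error. As we understand ldsc's cell-type-specific mode, it reports a one-sided test of τc > 0 (enrichment beyond the baseline annotations), and the Python sketch below assumes that convention. The input dictionary of (τc, standard error) pairs and the function names are hypothetical.

```python
from statistics import NormalDist


def tau_p_value(tau: float, tau_se: float) -> float:
    """One-sided P value for enrichment (tau_c > 0) from the z score tau/se."""
    # cdf(-z) equals 1 - cdf(z) but avoids cancellation for large z.
    return NormalDist().cdf(-tau / tau_se)


def annotation_p_values(coefficients):
    """{annotation: (tau_c, se)} -> {annotation: P}, most significant first."""
    ps = {name: tau_p_value(t, se) for name, (t, se) in coefficients.items()}
    return dict(sorted(ps.items(), key=lambda kv: kv[1]))
```

The resulting P values across annotations would then be passed to a Benjamini–Hochberg correction to obtain the FDR values used above.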

We performed hierarchical clustering on the matrix of enrichment significance for the 59 quantitative traits and 30 diseases across the 220 cell-type-specific annotations, using Spearman’s correlation distance and the group average method. We also constructed a network from this matrix to represent heritability enrichments linking cell types and phenotypes. We represented each phenotype and each cell type as a node, and linked a cell type and a phenotype with an arrow if the phenotype’s heritability was significantly enriched in that cell type. We assigned a weight to each arrow on the basis of the corresponding enrichment significance. For clarity, we used only highly significant enrichments (FDR < 0.01). Node layout was determined with the Fruchterman–Reingold algorithm given edge weights, so that significantly enriched phenotype–cell type pairs were placed closer together.
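Clustering with Spearman's correlation distance and the group average method means using 1 − ρ between phenotype profiles and merging, at each step, the two clusters with the smallest mean pairwise distance (UPGMA). The dependency-free Python sketch below assumes each phenotype is represented by a list of enrichment-significance values across cell types (for example −log10 P). The labels and function names are hypothetical. In R, `hclust(..., method = "average")` on such a distance matrix performs the same linkage.

```python
from itertools import combinations


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman_distance(x, y):
    """1 - Spearman's rho (undefined for constant profiles)."""
    return 1.0 - pearson(average_ranks(x), average_ranks(y))


def average_linkage(profiles):
    """Group-average (UPGMA) clustering of {phenotype: profile}.

    Returns merges as (cluster_a, cluster_b, distance) in merge order.
    """
    dist = {
        frozenset((a, b)): spearman_distance(profiles[a], profiles[b])
        for a, b in combinations(profiles, 2)
    }
    clusters = {name: (name,) for name in profiles}  # label -> members

    def group_average(c1, c2):
        ds = [dist[frozenset((a, b))] for a in clusters[c1] for b in clusters[c2]]
        return sum(ds) / len(ds)

    merges = []
    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda p: group_average(*p))
        d = group_average(c1, c2)
        merged = clusters.pop(c1) + clusters.pop(c2)
        clusters[f"({c1},{c2})"] = merged
        merges.append((c1, c2, d))
    return merges
```

Because Spearman's ρ depends only on ranks, two phenotypes whose enrichment profiles order the cell types identically sit at distance 0 even if the magnitudes differ.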

Life Sciences Reporting Summary

Further information on experimental design is available in the Life Sciences Reporting Summary.

Data availability

GWAS summary statistics of the 58 quantitative traits are publicly available at our website (JENGER; see “URLs”) and the National Bioscience Database Center (NBDC) Human Database (Research ID: hum0014) as open data without any access restrictions. GWAS genotype data from the subjects were deposited at the NBDC Human Database (Research ID: hum0014).