Introduction

Circulating lipid concentrations are among the strongest modifiable risk factors for coronary artery disease (CAD). The wide inter-individual variation observed in lipid concentrations is influenced by both environmental and genetic factors. Large genome-wide association studies (GWAS) have identified >150 loci associated with concentrations of one or more lipid traits.1, 2 Nevertheless, only 10–15% of overall variation in lipid concentrations can be attributed to genetic variants identified, despite an estimated heritability of 40–50%.3, 4

Understanding the genetic architecture for lipid traits is critical for developing risk prediction approaches to CAD. Most genetic studies have focused on Caucasian populations and limited efforts have been made to generalize Caucasian findings to populations of African ancestry such as African–Americans (AAs). Earlier studies suggested that less than half of lipid-associated genetic findings replicated in African populations.5 The Global Lipids Genetics Consortium (GLGC) attempted to replicate their Caucasian meta-analyses findings in AAs;2 only one locus for high-density lipoprotein cholesterol (HDL-C), CETP, three loci for low-density lipoprotein cholesterol (LDL-C), SORT1, LDLR and APOE, and none for triglycerides (TG) replicated. There has been little additional work to further replicate and fine-map LDL-C, HDL-C, TG and total cholesterol (TC) loci in AA populations.

Rare (<1%) and low-frequency (1–5%) variants are inadequately represented on common GWAS genotyping platforms. Given that rare variants play a dominant role in the etiology of Mendelian disorders, they may also convey larger risk than common variants for common phenotypes. The increased genetic diversity in the AA population provides an opportunity to study low-frequency variants related to lipid metabolism and the potential to identify novel drug targets to lower CAD risk. One such recent example is variation in the proprotein convertase subtilisin/kexin type 9 (PCSK9) gene. By fine-mapping the PCSK9 gene locus in an AA cohort, Cohen et al.6 identified loss-of-function variants associated with markedly decreased LDL-C levels and with decreased CAD risk.6, 7, 8 This finding triggered the development of a new class of lipid-lowering drugs,6, 7, 8 the PCSK9 inhibitors, which have recently been approved by the FDA.9, 10, 11 In Caucasians, efforts have been made to identify and understand low-frequency variants for LDL-C;12 however, less is known in AAs, particularly for other lipid traits such as HDL-C, TC and TG, although they are associated with CAD risk.13

Non-HDL cholesterol (non-HDL-C) is a measure of cholesterol containing all atherogenic particles. Observations from several large longitudinal cohorts show that non-HDL-C is a better predictor of CAD than LDL-C.14, 15, 16, 17 In the third Adult Treatment Panel (ATPIII) guidelines of the US National Cholesterol Education Program, non-HDL-C was introduced as a secondary target of therapy in patients with TG >200 mg dl−1. By sequencing Caucasians, LDLR and ASGR1 variants were associated with both non-HDL-C and CAD risk.18, 19 However, no genome-wide scale analysis has been done to understand genetic predictors for non-HDL-C in AAs. Another lipid trait, remnant cholesterol, is a measure of cholesterol content of triglyceride-rich lipoproteins and includes very low-density lipoprotein, intermediate-density lipoprotein and chylomicron remnants.20, 21 Elevated remnant cholesterol levels have been associated with increased risk of CAD independent of HDL-C.22 However, little is known about the genetic predictors of remnant cholesterol level in AAs.

In this study of AAs, we set out to trans-ethnically replicate genetic variants reported by the GLGC to be associated with lipid traits in Caucasians,1, 2 followed by fine-mapping of those loci using all available variants on the MetaboChip, a platform designed for metabolic traits23, 24 and enriched for rare and low-frequency variants.

Materials and methods

Study population

The study was approved by the Institutional Review Board of Vanderbilt University. The study cohort consisted of third party-identified AAs older than 18 years of age who had lipid measurements and genotyping information available in the Vanderbilt DNA biobank (BioVU). BioVU accrues DNA samples from blood drawn for routine clinical testing after these samples are scheduled to be discarded. BioVU and sample handling have been previously described.25 Samples and existing genotypes in BioVU are linked to a de-identified version of each individual’s electronic health record (EHR).

Phenotyping: extraction of lipid measurements from electronic health records

Lipid measurements were extracted from each individual’s EHR, and median LDL-C, HDL-C, TC and TG were calculated. Lipid measurements after statin exposure were excluded from the analyses. We manually reviewed charts for individuals with extreme lipid levels to exclude data entry errors. We further calculated non-HDL-C for each individual by subtracting HDL-C from TC, and remnant cholesterol by subtracting both HDL-C and LDL-C from TC.
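The two derived traits can be illustrated with a minimal sketch. The function name and the example values are hypothetical, and applying the subtraction to the per-individual medians (rather than to each measurement) is an assumption, since the text does not specify the order.

```python
from statistics import median

def derived_lipids(tc, hdl, ldl):
    """Return (non-HDL-C, remnant cholesterol) in the same units as the inputs."""
    non_hdl = tc - hdl        # non-HDL-C = TC - HDL-C
    remnant = tc - hdl - ldl  # remnant cholesterol = TC - HDL-C - LDL-C
    return non_hdl, remnant

# Hypothetical pre-statin measurements (mg/dl) for one individual; not study data
tc_values = [180.0, 200.0, 190.0]
hdl_values = [50.0, 55.0, 45.0]
ldl_values = [110.0, 120.0, 115.0]

# Assumption: derived traits are computed from the per-individual medians
tc_med = median(tc_values)
hdl_med = median(hdl_values)
ldl_med = median(ldl_values)
non_hdl_c, remnant_c = derived_lipids(tc_med, hdl_med, ldl_med)
```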

Genotyping

Genotyping was performed using the Illumina MetaboChip,23, 24 a custom BeadChip targeting 196 725 genetic variants. The genotyping data were curated for quality control using PLINK.26 We removed SNPs with a call rate <95% and SNPs that deviated significantly from Hardy–Weinberg equilibrium (HWE) (P<1.0 × 10−6). We removed samples: (1) with per-individual call rate <95%; (2) with mismatch between genetic and EHR sex; (3) with a cryptic relationship closer than a third-degree relative. Twenty-nine individuals were removed from the analysis.
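The call-rate filters can be sketched as follows. This is an illustration only and not the PLINK procedure used in the study: the function name is hypothetical, the order of filtering (SNPs first, then samples) is an assumption, and the HWE, sex-mismatch and relatedness checks are not covered.

```python
import numpy as np

def call_rate_filter(genotypes, snp_min=0.95, sample_min=0.95):
    """genotypes: samples x SNPs array of 0/1/2 with np.nan for missing calls.
    Returns boolean masks (keep_samples, keep_snps)."""
    called = ~np.isnan(genotypes)
    # Per-SNP call rate: fraction of samples with a non-missing genotype
    keep_snps = called.mean(axis=0) >= snp_min
    # Per-sample call rate, computed on the SNPs that passed (assumed order)
    keep_samples = called[:, keep_snps].mean(axis=1) >= sample_min
    return keep_samples, keep_snps

# Tiny hypothetical matrix; thresholds relaxed to 0.75 so the toy data are informative
toy = np.array([
    [0.0, 1.0, np.nan],
    [1.0, 2.0, np.nan],
    [2.0, 0.0, 1.0],
    [np.nan, 1.0, np.nan],
])
keep_samples, keep_snps = call_rate_filter(toy, snp_min=0.75, sample_min=0.75)
```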

Candidate SNPs

The GLGC previously identified genetic variants associated with four lipid traits.1, 2 We extracted genotype information for these variants, including 63 variants for HDL-C, 46 variants for LDL-C, 56 variants for TC and 34 variants for TG. We also extracted all available genetic variants (from transcription start to transcription end) for GLGC-identified genes, including 3939 variants for HDL-C, 1995 variants for LDL-C, 2430 variants for TC and 3888 variants for TG.

Statistical analyses

The data were adjusted for population stratification using principal component analyses implemented in EIGENSOFT4.2.27, 28 BioVU contains observer-reported ancestry information obtained at the time of clinical visit. This information has been confirmed to achieve high accuracy.29

Genetic association analysis was performed using PLINK v1.07.26 Median lipid levels were natural log transformed. We tested the association between genetic variants and lipid traits (HDL-C, LDL-C, TC, TG, non-HDL-C and remnant cholesterol). An additive inheritance model was assumed and tested using a linear regression model. Analyses were adjusted for age (at the time of median lipid measurement), sex, and six principal components (PCs) for ancestry. We conducted three tiers of analyses: (1) testing lipid-associated candidate SNPs reported by the GLGC; (2) testing SNPs within the lipid-associated genes identified by the GLGC, where a gene locus was defined as the range of the gene transcript; and (3) testing all available SNPs on the MetaboChip. The associations were further conditioned on the lead SNPs within the regions. Specifically, we conditioned on rs28362286 for the PCSK9 region, rs7412 for the APOE region, rs34065661 for the CETP region and rs4389957 for the LPL region. All analyses were adjusted for multiple testing accordingly. The regional association plots were generated using LocusZoom30 (http://locuszoom.sph.umich.edu/locuszoom/). With adjustment for multiple testing, we defined levels of significance as follows: (1) for candidate SNPs: 8.9 × 10−4 for TC, 7.9 × 10−4 for HDL-C, 1.1 × 10−3 for LDL-C and 1.5 × 10−3 for TG; (2) for SNPs in candidate gene regions: 2.1 × 10−5 for TC, 1.3 × 10−5 for HDL-C, 2.5 × 10−5 for LDL-C and 1.3 × 10−5 for TG; (3) for the entire MetaboChip analyses: 2.8 × 10−7 for all lipid levels.
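The additive model described above can be sketched as an ordinary least-squares regression of the log-transformed lipid level on allele dosage plus covariates. This is not the PLINK implementation; the function name and the simulated data are hypothetical, and only age and sex are included as covariates for brevity (the study also adjusted for six PCs).

```python
import numpy as np

def additive_association(lipid, dosage, covariates):
    """OLS of ln(lipid) on allele dosage (0/1/2) plus covariates.
    Returns (beta, standard error, t statistic) for the dosage term."""
    y = np.log(np.asarray(lipid, dtype=float))
    g = np.asarray(dosage, dtype=float)
    C = np.asarray(covariates, dtype=float)
    X = np.column_stack([np.ones_like(g), g, C])  # intercept, dosage, covariates
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    dof = len(y) - X.shape[1]
    sigma2 = resid @ resid / dof                  # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)         # covariance of the estimates
    se = np.sqrt(cov[1, 1])
    return coef[1], se, coef[1] / se

# Simulated, hypothetical data (not study data)
rng = np.random.default_rng(0)
n = 500
dosage = rng.binomial(2, 0.2, size=n)
age = rng.normal(50, 10, size=n)
sex = rng.integers(0, 2, size=n)
ln_lipid = 4.7 - 0.10 * dosage + 0.002 * age + rng.normal(0, 0.2, size=n)
beta, se, t_stat = additive_association(
    np.exp(ln_lipid), dosage, np.column_stack([age, sex])
)
```

In this sketch, conditioning on a lead SNP would amount to appending that SNP's dosage as one more covariate column.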

Heritability analyses

We used Genome-wide Complex Trait Analysis (GCTA, version 1.24.7)31 to estimate the polygenic variance attributable to all genotyped MetaboChip SNPs. We excluded SNPs: (1) with MAF less than 0.01, (2) with HWE P<0.000001, (3) with SNP call rate <0.02. We also removed individuals with call rate <0.95. We utilized the identified set of SNPs (n=114 451) to calculate a genetic relatedness matrix. Heritability estimates were adjusted for age, sex and 20 PCs.
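For orientation, a genetic relatedness matrix of the kind used in such analyses can be sketched as the average, over SNPs, of products of standardized genotypes. This follows the commonly used definition; whether it matches the exact GCTA options used here is an assumption, and the function name and simulated genotypes are hypothetical.

```python
import numpy as np

def genetic_relatedness_matrix(genotypes):
    """genotypes: samples x SNPs array of allele counts (0/1/2), no missing
    values and no monomorphic SNPs. Returns the samples x samples matrix."""
    G = np.asarray(genotypes, dtype=float)
    p = G.mean(axis=0) / 2.0                               # sample allele frequencies
    Z = (G - 2.0 * p) / np.sqrt(2.0 * p * (1.0 - p))       # standardize each SNP
    return Z @ Z.T / G.shape[1]                            # average over SNPs

# Simulated, hypothetical genotypes (not study data)
rng = np.random.default_rng(1)
freqs = rng.uniform(0.2, 0.5, size=300)
genotypes = rng.binomial(2, freqs, size=(100, 300))
grm = genetic_relatedness_matrix(genotypes)
```

The MAF filter described above matters for this calculation, because the standardization divides by 2p(1−p) and becomes unstable as p approaches zero.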

Results

Lipid levels in AAs genotyped on the MetaboChip were available for TC (n=2778), LDL-C (n=2438), HDL-C (n=2550), TG (n=2690), non-HDL-C (n=2468), and remnant cholesterol (n=2262). Demographic characteristics are summarized in Table 1.

Table 1 Cohort characteristics

Replication of GLGC reported genetic variants

First, we set out to trans-ethnically replicate the GLGC variants1, 2 in the BioVU AA cohort. After correcting for multiple testing, we replicated one of 56 SNPs for TC (rs6511720 in LDLR, P=2.15 × 10−8), one of 63 SNPs for HDL-C (rs3764261 in CETP, P=1.13 × 10−5), two of 46 SNPs for LDL-C (rs629301 in CELSR2/SORT1, P=1.11 × 10−5 and rs6511720 in LDLR, P=2.47 × 10−5) and one of 34 SNPs for TG (rs645040 in MSL2L1, P=4.29 × 10−4) (Table 2, Supplementary Table 1). These variants were all associated with predicted lipid changes in the same direction as observed in Caucasians.1 Variants in SORT1, CETP and LDLR had similar effect size in AAs as in Caucasians; however, rs645040 in MSL2L1 was associated with larger changes in TG in AAs than in Caucasians—the minor allele was associated with −8.56 mg dl−1 TG change in AAs, but only −2.2 mg dl−1 change in Caucasians1 (Table 2).
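The candidate-SNP significance levels given in the statistical methods are numerically consistent with dividing 0.05 by the number of candidate SNPs per trait. The text does not name the correction, so treating it as a Bonferroni adjustment at α=0.05 is an assumption. Under that assumption, the reported P-values can be checked against the thresholds:

```python
ALPHA = 0.05  # assumed family-wise error rate; not stated in the text

# Number of GLGC candidate SNPs tested per trait (from the Candidate SNPs section)
n_candidate_snps = {"TC": 56, "HDL-C": 63, "LDL-C": 46, "TG": 34}
thresholds = {trait: ALPHA / n for trait, n in n_candidate_snps.items()}

# Replicated associations as reported in the text: (trait, SNP, P-value)
reported = [
    ("TC", "rs6511720", 2.15e-8),
    ("HDL-C", "rs3764261", 1.13e-5),
    ("LDL-C", "rs629301", 1.11e-5),
    ("LDL-C", "rs6511720", 2.47e-5),
    ("TG", "rs645040", 4.29e-4),
]

# True where the reported P-value falls below the trait-specific threshold
passes = {(trait, snp): p < thresholds[trait] for trait, snp, p in reported}
```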

Table 2 Association between Global Lipid Genetics Consortium SNPs and lipid traits

Fine-mapping of GLGC loci significantly associated with lipid traits

Second, we tested all genetic variants present on the MetaboChip within loci identified by the GLGC as associated with lipids. Locus fine-mapping yielded additional associations (Table 3), which were not previously reported as leading SNPs from GLGC analyses in Caucasians.

  1. Associations with TC. One additional PCSK9 variant, rs28362286, associated with TC (Table 3, P=1.50 × 10−10). A well-characterized APOE variant, rs7412, was also associated with TC with genome-wide significance (P=3.09 × 10−22).

  2. Associations with HDL-C. Six additional variants were associated with HDL-C in AAs: one in LPL and five in CETP. Two SNPs in CETP achieved genome-wide significance: rs7499892 (P=1.51 × 10−10) was previously reported in several Caucasian GWAS studies,32, 33 and rs34065661 (P=1.53 × 10−13), a missense variant, was the variant most significantly associated with HDL-C in our analyses (Table 3). We queried the lead CETP SNP (rs34065661) and the linked SNPs in the Genotype-Tissue Expression (GTEx) database. Rs711752, which is linked with rs34065661 (D′=1.0, R2=0.192, Supplementary Table 2), was found to be an eQTL variant, and carriers of the variant had significantly lower CETP expression (P=1.6 × 10−5) compared to non-carriers (Figure 1a).

    Figure 1. Association plots for gene expression and genetic variants. The plots show the associations between gene expression and genetic variants, including rs711752 in CETP gene (a) and rs7412 in APOE gene (b). Reference in the tables.32, 33, 37, 38, 39, 40 A full color version of this figure is available at the Journal of Human Genetics journal online.

  3. Associations with LDL-C. Three variants were associated with LDL-C: rs28362261 (P=2.08 × 10−5) and rs28362286 (P=1.99 × 10−13) in the PCSK9 locus, and rs7412 (P=2.48 × 10−44) in the APOE locus. Rs7412 accounted for a −22.08 mg dl−1 change in LDL-C, comparable to the −22.52 mg dl−1 reported from NHANES III.34 According to the GTEx database, rs7412 is predicted to significantly alter APOE expression (P=7.3 × 10−6) (Figure 1b).

  4. No additional associations were observed for TG.

Table 3 Additional variants associated with lipid traits by fine-mapping candidate loci

Associations between variants in the entire MetaboChip and GLGC lipid traits

Third, by analyzing the entire MetaboChip, we identified additional variants significantly associated with lipids (Supplementary Table 3). For each identified gene, we defined a gene region as the gene transcript±50 kb, and generated regional association plots (Figure 2). We further identified eight SNPs in the CETP region associated with HDL-C and seven SNPs in the APOE region associated with LDL-C (Supplementary Table 3). The eight CETP variants are in strong linkage disequilibrium (Supplementary Figure 1). Most associations were attributed to their linkage with the lead SNPs in the regions (Supplementary Table 3). After conditioning on the lead SNPs, two variants remained significantly associated with HDL-C (rs4783961 in the CETP region and rs4389957 in the LPL/SLC18A2 region, Table 4), and one variant remained significantly associated with LDL-C (rs611917 in the CELSR2/SORT1 region) (Table 4).

Figure 2. Regional association plots of the genome-wide significant associations with lipid traits in the AA cohort. The plots show the genome-wide significant loci in the BioVU AA cohort (generated using LocusZoom, http://locuszoom.sph.umich.edu/locuszoom/), including the APOE locus in association with TC (a), LDL-C (c) and non-HDL-C (d), and the CETP locus in association with HDL-C (b). The RefSeq genes in the region are shown in the lower panel. The red line represents the genome-wide significance cutoff for the MetaboChip (2.7 × 10−7). P-values were generated using linear regression analysis. A full color version of this figure is available at the Journal of Human Genetics journal online.

Table 4 Additional variants associated with lipid traits by scanning entire MetaboChip

No additional associated variant was identified for TC and TG.

Genetic predictors for non-HDL-C

We tested the association between all MetaboChip variants and non-HDL-C levels, and identified two loci that contained five SNPs significantly associated with non-HDL-C levels (Table 5). All five SNPs were also associated with either TC or LDL-C in previous analyses (Table 5).

Table 5 Genetic variants associated with non-HDL cholesterol

Genetic predictors for remnant cholesterol

We further tested the association with remnant cholesterol, and observed no association in AAs (data not shown).

Heritability of lipid traits explained by MetaboChip

By estimating the percentage of heritability explained by available genotypes on the MetaboChip, we found that additive genetic components explained 22.52±8.86% of TC (P=0.0048), 19.22±9.59% of HDL-C (P=0.023), 28.51±9.74% of LDL-C (P=0.0013) and 34.35±10.9% of non-HDL-C (P=0.00063), but only 8.25±8.51% of TG (P=0.159).

Discussion

In this study, we sought to replicate variants associated with lipid traits identified by the GLGC and to fine-map significantly associated loci in ~2800 AAs. There are two major findings of the study: (1) relatively few lipid-associated variants identified by the GLGC replicated in AAs; and (2) we identified additional variants associated with TC (APOE), HDL-C (LPL and CETP) and LDL-C (APOE), and two loci significantly associated with non-HDL-C (APOE/APOC1/TOMM40 and PCSK9).

The GLGC previously identified 157 loci associated with one or more lipid traits from a cohort of predominantly Caucasians; only a few variants replicated in AAs. Fine-mapping and analyses using the entire MetaboChip identified a few additional genetic associations with GLGC lipid traits in AAs and also five SNPs associated with non-HDL-C.

In addition to the complexity of the genetic architecture in individuals of African ancestry, several reasons could explain why so few GLGC variants replicated in AAs. Because the 157 loci account for only a small portion of overall inter-individual variability of lipids in Caucasians, the contribution of those variants is likely to be even harder to detect if their frequency is lower in AAs than in Caucasians. Nevertheless, we were able to confirm the role of APOE, CETP, PCSK9 and LPL in regulating lipid levels in the AA population:

Rs7412 in APOE is one of the most promising predictors of LDL-C and TC levels in AAs. APOE is a ligand for LDL receptor, and therefore is involved in the removal of LDL from the circulation. The rs7412 SNP changes the amino acid at position 158 in APOE from Arg to Cys and in the GTEx database, rs7412 carriers have lower APOE expression compared to non-carriers. Previous reports from NHANES III suggested that rs7412 was associated with 22.52 mg dl−1 lower LDL-C concentrations and 20.68 mg dl−1 lower TC concentrations per minor allele.34 In AAs, rs7412 was associated with 22.08 mg dl−1 lower LDL-C, similar to findings in other populations.

Human cholesteryl ester transfer protein (CETP) plays a crucial role in lipid metabolism by mediating the transfer of cholesteryl esters from HDL to apolipoprotein (apo) B rich lipoproteins in exchange for TG. High CETP activity contributes to an unfavorable plasma lipoprotein profile by lowering HDL-C and increasing LDL-C. In the current analyses, variants in the CETP region were identified with genome-wide significance. Further efforts to sequence CETP in AAs may provide additional insight into the gene function and its genetic structure.

PCSK9 is an enzyme that regulates LDL-C levels by promoting the degradation of LDL receptors that are responsible for the removal of LDL particles from the circulation into the liver. Rs28362286 is a well-characterized African-specific variant in PCSK9 and has been associated with altered PCSK9 function, lower LDL-C concentrations, and reduced CAD.7 The variant results in a truncated PCSK9 protein with reduced activity that is therefore less efficient in degrading LDL receptors; consequently, more LDL is removed from the circulation. In the current cohort, the variant was significantly associated with TC, LDL-C and non-HDL-C concentrations. It is possible that additional ancestry-specific PCSK9 variants contribute further to inter-individual variation in lipid concentrations but high-density genotyping or sequencing will be needed to fully elucidate the function of ancestry-specific variants.

In addition to LDL-C, HDL-C, TC and TG, we further sought to identify the genetic determinants of non-HDL-C. A previous study identified a splice region variant in LDLR associated with lower non-HDL-C in a Caucasian population.18 We failed to identify any LDLR or other novel variant that predicted non-HDL-C levels in AAs. We did observe that variants in PCSK9 and APOE associated with other lipid traits were also significantly associated with non-HDL-C. This is not unexpected since non-HDL-C contains LDL-C, TC and remnant cholesterol. The extent to which genetic associations with non-HDL-C are driven by association with LDL-C or TC, or by an unidentified relationship with remnant cholesterol, is unclear.

Furthermore, no genetic predictor was identified for remnant cholesterol. A large-scale Mendelian randomization study suggested that remnant cholesterol is a causal risk factor for ischemic heart disease independent of HDL levels.22 However, no genetic factor has been reported to affect remnant cholesterol levels exclusively and in clinical practice, remnant cholesterol varies more than other lipid traits over years;22 therefore, identification of its genetic predictors will be challenging.

Although we replicated some genetic associations for TC, HDL-C and LDL-C, no significant association was observed for TG, even after scanning the entire MetaboChip. Compared to other lipid traits, TG is affected more by environmental factors such as diet and medications.35 The GLGC previously estimated that known genetic predictors explained ~15% of overall variance for TC, LDL-C and HDL-C, and ~11% for TG.1, 2 Compared to the other lipid traits, less TG heritability was explained by genotypes on the MetaboChip. However, given that the MetaboChip was designed to capture most loci associated with metabolic traits in Caucasians, it may not be the best tool to estimate overall heritability of lipid traits in AAs.

We acknowledge several limitations: (1) using the MetaboChip, we only fine-mapped regions known to be GWAS-significant regions for metabolic traits. Because most of the early GWAS that informed the construction of the MetaboChip were conducted in Caucasians, African-exclusive loci will be underrepresented. Pilot studies from the African Genome Variation Project have identified a substantial proportion of unshared (11–23%) and novel (16–24%) variants in populations of African ancestry.36 (2) Power to detect associations with rare variants was limited. With a cohort of ~2800 AA individuals, there was adequate power to detect associations for variants with >1% frequency and an effect size larger than 5 mg dl−1. Nevertheless, power was limited for rare or private variants. Sequencing the whole exome or whole genome in an adequate number of individuals of African ancestry would assist in understanding the genetic architecture of lipid metabolism in African populations.

In conclusion, we identified both known and novel genetic variants that were significantly associated with lipid traits in AAs. These observations will contribute to understanding the genetic architecture of lipid traits, especially in minority populations.