Research paper
Kernel machine SNP set analysis provides new insight into the association between obesity and polymorphisms located on the chromosomal 16q.12.2 region: Tehran Lipid and Glucose Study
Introduction
Obesity is a serious health problem with an upward trend across many populations (Uzogara, 2017). This complex disease can lead to low quality of life as well as early mortality by increasing the risk of cancers (Cao and Giovannucci, 2016; Jochem and Leitzmann, 2016; Michaud, 2016), cardiovascular diseases, Alzheimer's disease (Emmerzaal et al., 2015) and other non-communicable diseases (Uzogara, 2017; Jochem and Leitzmann, 2016; Emmerzaal et al., 2015; Claussnitzer et al., 2015; Hill et al., 2012). Furthermore, the consequences of obesity include depression (Sherman, 2016), low self-esteem, reduced fertility in women (Uzogara, 2017; van der Steeg et al., 2008), fatty liver, bleeding problems and so on (Uzogara, 2017). Taking BMI >30 as a non-invasive proxy for adiposity, worldwide obesity among adults increased by about 8% from 1980 to 2013 (Ng et al., 2014). The prevalence of obesity among children and adolescents has also increased in developed (1.8%) and developing (4.9%) countries (Ng et al., 2014). Given the high heritability of BMI, many genetic association studies and meta-analyses of GWAS have been conducted to assess the association between BMI and genetic variants (Clifton et al., 2017; Locke et al., 2015). Recent meta-analyses of 82 genome-wide association studies reported 97 genetic variants associated with BMI, including 56 new loci (Clifton et al., 2017; Locke et al., 2015). These studies showed that the 97 genetic variants account for 2.7% of BMI variation, so most of the genetic variability in BMI remains unexplained (Clifton et al., 2017; Locke et al., 2015).
Genome-wide association studies (GWAS) are known as powerful tools for discovering genetic markers that are associated with increased risk of chronic disease (Scherer and Christensen, 2016; Bush and Moore, 2012). In this kind of association analysis, as in candidate locus association analysis, a common strategy is one-SNP-at-a-time regression analysis (Wu et al., 2010; Schifano et al., 2012). Although such individual SNP analysis has been found useful for common variants, it has been shown that single SNP analysis does not have enough power for rare variant analysis (Ionita-Laza et al., 2013). In addition, the SNPs genotyped on a given GWAS platform may not include the true causal SNP, and SNPs that are only in linkage disequilibrium (LD) with the causal SNP will show moderate effects (Wu et al., 2010). Therefore, to increase the power of the association test, it is useful to form SNP sets such that each set consists of SNPs that are highly correlated (measured by r² or D′) with each other, and probably with the causal SNP, and to conduct the association analysis for each set separately (Wu et al., 2010; Kwee et al., 2008). This approach helps not only to increase the power of the association test but also to accommodate epistasis effects (Wu et al., 2010; Schifano et al., 2012). SNP sets can be analyzed with the kernel machine regression (KMR) method (Wu et al., 2010; Schifano et al., 2012). The method has the advantage of handling high-dimensional problems and testing the combined effect of many SNPs, as well as their interactions, on the phenotype (Wu et al., 2010; Schifano et al., 2012). One popular kernel function is the identity by state (IBS) kernel, which measures the fraction of alleles that two individuals share purely by state.
Moreover, this kernel is powerful in detecting epistasis effects between SNPs in a set (Wu et al., 2010; Schifano et al., 2012; Kwee et al., 2008; Adeyemo et al., 2010; Wessel and Schork, 2006). IBS kernel machine regression introduces a pairwise similarity matrix into the regression model to account for unknown cluster structure among individuals (Kwee et al., 2008; Wessel and Schork, 2006). In addition, such analysis is useful for adjusting for confounding variables that break the independence between individuals in a population-based study (Kwee et al., 2008; Wessel and Schork, 2006). Another advantage of the kernel machine is that different weights can be applied to the kernel function to reflect different prior assumptions about rare and common variants, which improves the power of the association test (Wu et al., 2010; Wu et al., 2011). For example, weighting the IBS kernel by allele frequency makes it possible to assign more similarity to individuals who share rare alleles than to individuals who share common alleles (Wessel and Schork, 2006; Wu et al., 2011; Madsen and Browning, 2009). We use such methods to identify subgroups of the population according to the observed genotype data (Wessel and Schork, 2006).
According to our literature review, SNPs located in the 16q12.2 region show association signals with obesity (Claussnitzer et al., 2015; Clifton et al., 2017; Adeyemo et al., 2010; Belo et al., 2013; Van Hul and Lijnen, 2008; Sattari et al., 2017; Wing et al., 2009). This region contains the FTO (fat mass and obesity associated) gene (Brunkwall et al., 2013). Various GWAS and meta-analyses of GWAS have shown that FTO is associated with BMI and that it also affects dietary intake and preference for certain energy-dense foods (Clifton et al., 2017; Locke et al., 2015; Brunkwall et al., 2013; Kamura et al., 2016; Loos and Yeo, 2014).
In the present study, we considered BMI as a quantitative measure of obesity and investigated the association between BMI and 986 SNPs, including rare and common variants, located in the 16q12.2 region, using the SNP set IBS kernel machine regression method with Madsen and Browning weights in the adult population of Tehran, the capital city of Iran.
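To make the IBS kernel and its allele-frequency weighting concrete, the following is a minimal sketch and not the authors' implementation. It assumes genotypes are coded as 0/1/2 allele counts, and it uses a simplified Madsen-Browning-style weight, 1/sqrt(q(1 - q)), as an illustrative assumption; the exact weight definition used in the study is not given in this excerpt. The function names are hypothetical.

```python
import numpy as np

def allele_frequencies(G):
    """Frequency of the counted allele per SNP, from 0/1/2 genotype counts."""
    return np.asarray(G, dtype=float).mean(axis=0) / 2.0

def madsen_browning_weights(G, eps=1e-8):
    """Simplified (assumed) weight 1/sqrt(q(1-q)): rarer variants get larger weights."""
    q = allele_frequencies(G)
    return 1.0 / np.sqrt(np.clip(q * (1.0 - q), eps, None))

def ibs_kernel(G, weights=None):
    """Weighted IBS similarity between every pair of individuals.

    K[i, j] = sum_s w_s * (2 - |g_is - g_js|) / (2 * sum_s w_s)
    With unit weights this is the fraction of alleles shared by state.
    """
    G = np.asarray(G, dtype=float)
    n, p = G.shape
    w = np.ones(p) if weights is None else np.asarray(weights, dtype=float)
    # Alleles shared by state for each pair and SNP; builds an (n, n, p) array,
    # so this direct form is only suitable for small SNP sets.
    shared = 2.0 - np.abs(G[:, None, :] - G[None, :, :])
    return (shared * w).sum(axis=2) / (2.0 * w.sum())

# Toy data: 3 individuals, 3 SNPs (rows are individuals).
G = np.array([[0, 1, 2],
              [0, 1, 2],
              [2, 1, 0]])
K = ibs_kernel(G)                               # unweighted IBS kernel
Kw = ibs_kernel(G, madsen_browning_weights(G))  # frequency-weighted IBS kernel
```

In this toy example the first two individuals have identical genotypes, so their similarity is 1, while the first and third share alleles only at the middle SNP.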
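The SNP set test described above can be sketched as a variance-component score statistic computed from the residuals of a null model without the SNP set. This is only an illustration under stated assumptions: the statistic form Q = r'Kr / (2σ²) follows the general kernel machine literature, the p-value here comes from permuting residuals (a simple stand-in for the analytic mixture-of-chi-squares approximation commonly used), and the simulated data, covariates and function names are all hypothetical.

```python
import numpy as np

def kmr_score_statistic(y, X, K):
    """Score statistic Q = r' K r / (2 * sigma2) with r from the null fit y ~ X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ beta
    sigma2 = r @ r / (len(y) - X.shape[1])
    return float(r @ K @ r / (2.0 * sigma2)), r, sigma2

def permutation_pvalue(y, X, K, n_perm=999, seed=0):
    """Approximate p-value by permuting null-model residuals (assumed shortcut)."""
    rng = np.random.default_rng(seed)
    q_obs, r, sigma2 = kmr_score_statistic(y, X, K)
    hits = 0
    for _ in range(n_perm):
        rp = rng.permutation(r)
        if rp @ K @ rp / (2.0 * sigma2) >= q_obs:
            hits += 1
    return q_obs, (hits + 1) / (n_perm + 1)

# Simulated example: 60 individuals, one SNP set of 5 SNPs, no true effect.
rng = np.random.default_rng(1)
n, p = 60, 5
G = rng.binomial(2, 0.3, size=(n, p)).astype(float)
# Unweighted IBS kernel for the set (fraction of alleles shared by state).
K = (2.0 - np.abs(G[:, None, :] - G[None, :, :])).sum(axis=2) / (2.0 * p)
# Null-model design: intercept plus one hypothetical covariate.
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([22.0, 0.5]) + rng.normal(size=n)  # BMI-like outcome
q, pval = permutation_pvalue(y, X, K, n_perm=199)
```

In a region-wide analysis this test would be repeated for each SNP set, followed by a multiple-testing correction across sets.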
Subjects and data
In this study, data from participants in the Tehran Cardio-metabolic Genetic Study (TCGS) were used (Daneshpour and Fallah, 2017); TCGS is part of the Tehran Lipid and Glucose Study, an ongoing cohort study in urban district 13 of Tehran (Azizi et al., 2009). Subjects signed consent forms and were interviewed to obtain demographic data before being referred to trained physicians and a laboratory for blood sampling. The details are presented elsewhere (Azizi et al., 2009). In
Subjects information
Information from 6928 individuals aged between 20 and 96 years was used in the present study. The mean ± standard deviation of age was 59.5 ± 13.6 years for males and 53.9 ± 14.1 years for females. Of the 6928 individuals, 2821 (~40%) were male and 20,402 individuals had BMI >30 kg/m². Table 1 shows the anthropometric and clinical information of study subjects categorized by sex and BMI value (i.e., a non-obese group with BMI ≤ 30 and an obese group with BMI > 30). According to
Discussion
We investigated the association between 367 SNP sets and BMI in 2968 unrelated subjects. This was the first study in the Iranian population to assess the association between BMI and 982 SNPs located in genic and intergenic loci of the 16q12.2 region. The kernel machine regression results revealed that significant SNP sets were distributed almost randomly along the 16q12.2 region. However, they were denser in some regions, including the intron of the AKTIP gene, the intron of the FTO gene and nearby the
Conclusion
In conclusion, we used kernel machine regression to assess the association between BMI and 986 SNPs located in the 16q12.2 region in the Iranian population. Significantly associated sets were distributed throughout the region, with higher density around the introns of the FTO, AKTIP and MMP2 genes. They were also near the LINC02140 and IRX3 genes.
There are some studies on the linkage between variants on FTO and expression of AKTIP and IRX3 as well as some association studies between MMP2 and obesity
Acknowledgments
This paper was part of a Ph.D. thesis approved by Hamadan University of Medical Sciences (grant No. 9506233687). The authors would like to express their gratitude to the patients participating in the Tehran Lipid and Glucose Study. Special thanks also go to the deCODE genetics company for performing the genetic screening. We also appreciate Prof. Jurg Ott, Rockefeller University, New York, and Dr. Marcella Devoto, The Children's Hospital of Philadelphia and University of Pennsylvania, Philadelphia, PA, for
Funding
In this study, we used a data set funded by the Research Institute for Endocrine Sciences, Shahid Beheshti University of Medical Sciences (Tehran, Iran; ethics committee code “IR.SBMU.ENDOCRINE.REC.1395.366”), with the scientific and financial support of the deCODE genetics company (Reykjavik, Iceland). The Iranian Molecular Medicine Network supported the genomic bank.
References (44)
- et al. Second-generation PLINK: rising to the challenge of larger and richer datasets. Gigascience (2015)
- et al. Sequence kernel association tests for the combined effect of rare and common variants. Am. J. Hum. Genet. (2013)
- et al. A powerful and flexible multilocus association test for quantitative traits. Am. J. Hum. Genet. (2008)
- et al. Global, regional, and national prevalence of overweight and obesity in children and adults during 1980–2013: a systematic analysis for the Global Burden of Disease Study 2013. Lancet (2014)
- et al. Partition-ligation–expectation-maximization algorithm for haplotype inference with single-nucleotide polymorphisms. Am. J. Hum. Genet. (2002)
- et al. Association between matrix metaloproteinases 2-1306C/T polymorphism and the risk of coronary artery disease in Iranian population. Pathophysiology (2017)
- et al. A functional role of gelatinase A in the development of nutritionally induced obesity in mice. J. Thromb. Haemost. (2008)
- et al. Generalized genomic distance–based regression methodology for multilocus association analysis. Am. J. Hum. Genet. (2006)
- et al. Powerful SNP-set analysis for case-control genome-wide association studies. Am. J. Hum. Genet. (2010)
- et al. Rare-variant association testing for sequencing data with the sequence kernel association test. Am. J. Hum. Genet. (2011)