Brief Communication
Generating SNP barcode to evaluate SNP–SNP interaction of disease by particle swarm optimization
Introduction
Single-nucleotide polymorphisms (SNPs) are the most common type of DNA sequence variation in the human genome (Brookes, 1999). Association studies of polygenic human diseases or cancers, which arise from the combined contribution of multiple independently acting and/or interacting polymorphic genes, remain challenging because of the huge number of SNPs involved (Hirschhorn and Daly, 2005; Thornton-Wells et al., 2004).
A combinational approach that balances the specific strengths of individual methods may be the optimal way to investigate gene–gene interactions in human data (Musani et al., 2007). Many approaches for handling the computational burden of associating disease with multiple SNPs simultaneously have been reviewed (Musani et al., 2007). More recently, the disease risk evaluation of multifactor dimensionality reduction (MDR) (Ritchie et al., 2001) for each combination of genotypes was improved by introducing an odds ratio calculation (Chung et al., 2007).
Here, we introduce discrete binary particle swarm optimization (DBPSO) to solve the optimization problems associated with SNP–SNP interactions. The “SNP barcode” used in this study is a combination of SNPs with their genotypes, e.g. TT, TC, and CC for a SNP with a T/C polymorphism. DBPSO can determine the best SNP barcode without evaluating each combination separately, and it provides the SNP barcodes with the maximal difference between controls and cases. The odds ratio of each combination of genotypes, i.e., each SNP barcode, is regarded as a quantitative measure of disease risk. We therefore propose the odds ratio-based DBPSO (OR-DBPSO) method to generate SNP barcodes of genotypes for statistically predicting disease susceptibility.
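The odds ratio used as the risk measure for a barcode can be illustrated with a minimal sketch. It assumes that genotypes are coded as integers 1 to 3 per SNP and that a barcode is a mapping from SNP index to the required genotype. The names `Barcode`, `barcode_counts`, and `odds_ratio`, and the 0.5 correction for empty cells, are illustrative assumptions and not the authors' implementation.

```python
from typing import Dict, Sequence, Tuple

# Assumed representation: a barcode maps a 0-based SNP index to the genotype
# code required at that SNP, e.g. {0: 3, 4: 3} = "SNP1 genotype 3 and SNP5 genotype 3".
Barcode = Dict[int, int]


def matches(sample: Sequence[int], barcode: Barcode) -> bool:
    """Return True if the sample carries every genotype listed in the barcode."""
    return all(sample[snp] == genotype for snp, genotype in barcode.items())


def barcode_counts(
    samples: Sequence[Sequence[int]],
    labels: Sequence[int],
    barcode: Barcode,
) -> Tuple[int, int, int, int]:
    """Build the 2x2 table (a, b, c, d) for one barcode.

    a: cases with the barcode      b: controls with the barcode
    c: cases without the barcode   d: controls without the barcode
    Labels are 1 for cases and 0 for controls.
    """
    a = b = c = d = 0
    for sample, is_case in zip(samples, labels):
        hit = matches(sample, barcode)
        if hit and is_case:
            a += 1
        elif hit:
            b += 1
        elif is_case:
            c += 1
        else:
            d += 1
    return a, b, c, d


def odds_ratio(a: int, b: int, c: int, d: int, correction: float = 0.5) -> float:
    """Odds ratio (a*d)/(b*c).

    If any cell is zero, add `correction` to every cell so the ratio stays
    finite (an assumption of this sketch, not taken from the article).
    """
    cells = (a, b, c, d)
    if 0 in cells:
        a, b, c, d = (x + correction for x in cells)
    return (a * d) / (b * c)
```

For example, `odds_ratio(*barcode_counts(samples, labels, {0: 3}))` scores the single-SNP barcode "SNP1 genotype 3"; an odds ratio above 1 means the barcode is more frequent among cases than among controls.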
Section snippets
Method
We propose the OR-DBPSO method to generate the best SNP barcodes of genotypes to predict susceptibility to diseases such as osteoporosis in post-menopausal women. The procedure is implemented in two stages. Stage 1 is the DBPSO method. We retrieved data for eleven SNPs from our previous study (Lin et al., 2008). DBPSO converges quickly, and optimal solutions can be found in a short time within a wide solution space, meaning that we can look for the combination of
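The snippet above does not give the particle encoding or update rule, so the following is only a generic sketch of a discrete binary particle swarm search, in which each bit is set to 1 with probability given by the sigmoid of its velocity. The two-bits-per-SNP encoding in `decode`, the parameter defaults, and the function names are assumptions made for illustration. The fitness function is left to the caller and could, for instance, be a barcode's odds ratio.

```python
import math
import random
from typing import Callable, Dict, List, Optional, Tuple

Bits = List[int]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def decode(bits: Bits) -> Dict[int, int]:
    """Assumed encoding: two bits per SNP; 0 = SNP unused, 1..3 = genotype code."""
    barcode = {}
    for k in range(len(bits) // 2):
        code = 2 * bits[2 * k] + bits[2 * k + 1]
        if code:
            barcode[k] = code
    return barcode


def binary_pso(
    fitness: Callable[[Bits], float],
    n_bits: int,
    n_particles: int = 20,
    n_iter: int = 50,
    w: float = 1.0,
    c1: float = 2.0,
    c2: float = 2.0,
    v_max: float = 4.0,
    seed: Optional[int] = None,
) -> Tuple[Bits, float]:
    """Maximize `fitness` over bit strings; return (best bits, best fitness)."""
    rng = random.Random(seed)
    pos = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(n_particles)]
    vel = [[0.0] * n_bits for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]

    for _ in range(n_iter):
        for i in range(n_particles):
            for j in range(n_bits):
                # Velocity is pulled toward the personal and global best bits.
                v = (w * vel[i][j]
                     + c1 * rng.random() * (pbest[i][j] - pos[i][j])
                     + c2 * rng.random() * (gbest[j] - pos[i][j]))
                vel[i][j] = max(-v_max, min(v_max, v))  # clamp to [-v_max, v_max]
                # The sigmoid of the velocity is the probability that the bit is 1.
                pos[i][j] = 1 if rng.random() < sigmoid(vel[i][j]) else 0
            fit = fitness(pos[i])
            if fit > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], fit
                if fit > gbest_fit:
                    gbest, gbest_fit = pos[i][:], fit
    return gbest, gbest_fit
```

With eleven SNPs under this assumed encoding, a call such as `binary_pso(score, n_bits=22)` would search over barcodes, where `score` is a hypothetical user-supplied function that decodes the bits and returns the quantity to maximize.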
Example
The dataset was obtained from our previous osteoporosis association study (Lin et al., 2008). In this paper, we focus only on selecting the best combination of SNPs with genotypes using DBPSO. Information on the 11 SNPs in the dataset is given in Table 1; the complete original data set is available on the website (http://bioinfo.kmu.edu.tw/OS-original-meno-only.xls).
Identification of the Best SNP Barcode with Maximal Difference Between Cases and Controls
Among the combinations, as shown in Table 2, the SNP1-genotype 3 and SNP5-genotype 3
Conclusions
The DBPSO methodology presented in this study places no limit on the number of combined SNPs. However, the optimal sample size depends on many factors and needs to be determined experimentally. Multiple testing is a commonly encountered inference problem in large-scale genetic association studies because many hypotheses are tested simultaneously (Musani et al., 2007). To control errors likely to result from stochastic variations and correlation between tests is highly suggested
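The snippet does not say which error-control procedure is intended. As one common and conservative possibility, a Bonferroni correction can be sketched as follows; choosing Bonferroni and the name `bonferroni` are assumptions of this example, and other procedures may suit correlated tests better.

```python
from typing import List, Sequence


def bonferroni(p_values: Sequence[float], alpha: float = 0.05) -> List[bool]:
    """Flag each test as significant if its p-value is at most alpha / m,
    where m is the number of tests performed."""
    m = len(p_values)
    if m == 0:
        return []
    threshold = alpha / m
    return [p <= threshold for p in p_values]
```

With three barcodes tested at `alpha = 0.05`, each p-value is compared against 0.05 / 3, so only sufficiently small p-values remain significant.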
Acknowledgements
This work was partly supported by the National Science Council in Taiwan under grants NSC96-2311-B037-002, NSC96-2622-E214-004-CC3, NSC95-2221-E214-087, NSC94-2622-E-151-025-CC3, NSC94-2311-B037-001, and NSC93-2213-E-214-037, and by grant KMU-EM-97-1.1a.
References (9)
Brookes, The essence of SNPs, Gene (1999)
Ritchie et al., Multifactor-dimensionality reduction reveals high-order interactions among estrogen-metabolism genes in sporadic breast cancer, Am. J. Hum. Genet. (2001)
Thornton-Wells et al., Genetics, statistics and human disease: analytical retooling for complexity, Trends Genet. (2004)
Chung et al., Odds ratio based multifactor-dimensionality reduction method for detecting gene–gene interactions, Bioinformatics (2007)
Cited by (17)
A comparative analysis of chaotic particle swarm optimizations for detecting single nucleotide polymorphism barcodes
2016, Artificial Intelligence in Medicine
Citation Excerpt: Recent studies have focused on establishing associations between genetic loci in various diseases [4,5]. SNP barcodes were introduced as the combination of SNPs with genotypes to represent the association between genes [6]. SNP barcodes could suppress or increase the genetic effect of particular genes.
Discovering SNP Interactions Associated with Breast Cancer Using Evolutionary Algorithms
2016, Procedia Computer Science
A new technique for generating pathogenic barcodes in breast cancer susceptibility analysis
2015, Journal of Theoretical Biology
Citation Excerpt: Therefore, individuals carrying high-risk barcodes are potential patients. Many approaches have been designed for generating SNP barcodes using intelligent algorithm such as DBPSO (Chang et al., 2009) and IGA (Yang et al., 2013) or exact algorithm such as branch and bound method IBBFS (Chuang et al., 2013). These methods have some advantages.
Preventive SNP-SNP interactions in the mitochondrial displacement loop (D-loop) from chronic dialysis patients
2013, Mitochondrion
Citation Excerpt: However, the evaluation of SNP–SNP interactions is complex as it involves rigorous interactions of many SNPs (Moore et al., 2010), and has thus remained a challenge. Recently, many computational methodologies and algorithms have been developed to analyse SNP–SNP interactions (Chang et al., 2009; Chuang et al., 2012a, 2012b; Li et al., 2009; Lin et al., 2012; Lucas et al., 2012; Winham et al., 2012; Yang et al., 2009, 2011a, 2011b); however, they have not yet been applied in the chronic dialysis association study. A genetic algorithm (GA) is an efficient and powerful population-based stochastic search technique for complex and difficult problems over a continuous problem space.