Generating SNP barcode to evaluate SNP–SNP interaction of disease by particle swarm optimization

doi:10.1016/j.compbiolchem.2008.07.029

Computational Biology and Chemistry

Volume 33, Issue 1, February 2009, Pages 114-119

Abstract

Genome-wide association analysis involving data on many single-nucleotide polymorphisms (SNPs) is mathematically and computationally challenging. We therefore propose the odds ratio-based discrete binary particle swarm optimization (OR-DBPSO) method, which uses the odds ratio (OR) as a quantitative measure of disease risk among the many combinations of SNPs with genotypes, called “SNP barcodes”. DBPSO is applied to generate the SNP barcode with the maximal difference of occurrence between the case and control groups, in order to predict susceptibility to diseases such as osteoporosis. Different SNP barcode patterns may occur several times in either the low or the high bone mineral density (BMD) group. Our results show that DBPSO can effectively identify a specific SNP barcode with an optimized fitness value; SNP barcodes with a low fitness value are naturally discarded from the population. A representative SNP barcode with a variable number of SNPs is then subjected to OR analysis to determine statistically the maximum difference between the low and high BMD groups. This paper thus introduces a powerful procedure for analyzing disease-associated SNP–SNP interactions across genome-wide genes.

Introduction

Single-nucleotide polymorphisms (SNPs) are the most common type of DNA sequence variation in the human genome (Brookes, 1999). Association studies of polygenic human diseases or cancers, which arise from the combined contribution of multiple independently acting and/or interacting polymorphic genes, are still challenging because of the huge number of SNPs involved (Hirschhorn and Daly, 2005; Thornton-Wells et al., 2004).

A combinational approach that balances the specific strengths of individual methods may be the optimal way to investigate gene–gene interactions in human data (Musani et al., 2007). Many approaches for handling the computational burden of associating disease with multiple SNPs simultaneously have been reviewed (Musani et al., 2007). Recently, the disease risk evaluation of multifactor-dimensionality reduction (MDR) (Ritchie et al., 2001) for each combination of genotypes was improved by introducing the odds ratio calculation into MDR (Chung et al., 2007).

Here, we introduce discrete binary particle swarm optimization (DBPSO) to solve the optimization problems associated with SNP–SNP interactions. A “SNP barcode” in this study is a combination of SNPs with their genotypes, e.g. TT, TC, and CC for a SNP with a T/C polymorphism. DBPSO can determine the best SNP barcode without calculating each combination separately, and it provides the SNP barcodes with the maximal difference between controls and cases. The odds ratio of each combination of genotypes, i.e., each SNP barcode, is regarded as a quantitative measure of disease risk. We therefore propose the odds ratio-based DBPSO (OR-DBPSO) method to generate SNP barcodes of genotypes for statistically predicting disease susceptibility.
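As an illustration of how an odds ratio can be attached to a single SNP barcode, the following minimal Python sketch counts barcode carriers in each group and forms the usual 2x2 odds ratio. The data layout (one list of genotype codes 1 to 3 per subject), the names `Barcode`, `count_matches`, and `odds_ratio`, the toy data, and the 0.5 continuity correction for empty cells are all assumptions made for this example; they are not taken from the paper.

```python
from typing import Dict, Sequence

# Hypothetical representation: SNP column index -> genotype code (1, 2 or 3).
Barcode = Dict[int, int]


def matches(sample: Sequence[int], barcode: Barcode) -> bool:
    """True if the subject carries every genotype listed in the barcode."""
    return all(sample[snp] == genotype for snp, genotype in barcode.items())


def count_matches(group: Sequence[Sequence[int]], barcode: Barcode) -> int:
    """Number of subjects in the group that carry the barcode."""
    return sum(matches(sample, barcode) for sample in group)


def odds_ratio(cases, controls, barcode: Barcode) -> float:
    """Odds ratio of the 2x2 table (barcode present/absent by case/control)."""
    a = count_matches(cases, barcode)      # cases with the barcode
    b = count_matches(controls, barcode)   # controls with the barcode
    c = len(cases) - a                     # cases without the barcode
    d = len(controls) - b                  # controls without the barcode
    if 0 in (a, b, c, d):
        # Assumed continuity correction so that the ratio stays finite.
        a, b, c, d = (x + 0.5 for x in (a, b, c, d))
    return (a * d) / (b * c)


# Made-up toy data: 4 cases and 4 controls, 3 SNPs each.
cases = [[3, 1, 3], [3, 2, 3], [1, 1, 3], [3, 3, 3]]
controls = [[1, 1, 3], [2, 2, 1], [3, 1, 3], [1, 3, 2]]
barcode = {0: 3, 2: 3}  # first SNP with genotype 3 and third SNP with genotype 3

print(odds_ratio(cases, controls, barcode))  # 9.0
```

In this toy data three of four cases and one of four controls carry the barcode, so the odds ratio is (3 × 3) / (1 × 1) = 9.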

Method

We propose the OR-DBPSO method to generate the best SNP barcodes of genotypes for predicting susceptibility to diseases such as osteoporosis in post-menopausal women. The method is implemented in two stages. Stage 1 is the DBPSO method. We retrieved eleven SNP data sets obtained from our previous study (Lin et al., 2008). DBPSO converges quickly, and optimal solutions can be found in a short time out of a wide solution space, meaning that we can look for the combination of
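The snippet above does not give the details of Stage 1, so the following is only a generic sketch of a discrete binary PSO (velocities passed through a sigmoid to give bit probabilities) applied to a barcode search. The 2-bits-per-SNP encoding, the fitness function `occurrence_difference` (count of cases carrying the barcode minus count of controls carrying it), the parameter values, and the toy data are all assumptions for illustration and may differ from the authors' implementation.

```python
import math
import random
from typing import Callable, Dict, List, Sequence, Tuple


def decode(bits: Sequence[int], n_snps: int) -> Dict[int, int]:
    """Assumed encoding: 2 bits per SNP; 00 = SNP unused, 01/10/11 = genotype 1/2/3."""
    barcode = {}
    for i in range(n_snps):
        genotype = 2 * bits[2 * i] + bits[2 * i + 1]
        if genotype:
            barcode[i] = genotype
    return barcode


def occurrence_difference(bits, cases, controls) -> float:
    """Assumed fitness: carriers among cases minus carriers among controls."""
    barcode = decode(bits, len(cases[0]))
    if not barcode:  # an empty barcode matches everyone and carries no information
        return 0.0
    hit = lambda s: all(s[i] == g for i, g in barcode.items())
    return float(sum(hit(s) for s in cases) - sum(hit(s) for s in controls))


def dbpso(fitness: Callable[[List[int]], float], n_bits: int, n_particles: int = 20,
          n_iter: int = 50, w: float = 0.9, c1: float = 2.0, c2: float = 2.0,
          v_max: float = 4.0, seed: int = 0) -> Tuple[List[int], float]:
    """Maximize `fitness` over bit strings with a binary PSO (illustrative parameters)."""
    rng = random.Random(seed)
    pos = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(n_particles)]
    vel = [[0.0] * n_bits for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = max(range(n_particles), key=pbest_fit.__getitem__)
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    for _ in range(n_iter):
        for k in range(n_particles):
            for j in range(n_bits):
                v = (w * vel[k][j]
                     + c1 * rng.random() * (pbest[k][j] - pos[k][j])
                     + c2 * rng.random() * (gbest[j] - pos[k][j]))
                vel[k][j] = max(-v_max, min(v_max, v))  # clamp the velocity
                # The sigmoid of the velocity is the probability that the bit is 1.
                prob_one = 1.0 / (1.0 + math.exp(-vel[k][j]))
                pos[k][j] = 1 if rng.random() < prob_one else 0
            f = fitness(pos[k])
            if f > pbest_fit[k]:
                pbest[k], pbest_fit[k] = pos[k][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = pos[k][:], f
    return gbest, gbest_fit


# Made-up toy data: 4 cases and 4 controls, 3 SNPs with genotype codes 1 to 3.
cases = [[3, 1, 3], [3, 2, 3], [1, 1, 3], [3, 3, 3]]
controls = [[1, 1, 3], [2, 2, 1], [3, 1, 3], [1, 3, 2]]
fitness = lambda bits: occurrence_difference(bits, cases, controls)
best_bits, best_fit = dbpso(fitness, n_bits=2 * 3)
print(decode(best_bits, 3), best_fit)
```

Because the swarm only evaluates the barcodes its particles visit, the search does not enumerate every SNP-genotype combination; the price is that the result is a heuristic optimum that can depend on the random seed and on the parameter choices.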

Example

The dataset was obtained from our previous osteoporosis association study (Lin et al., 2008). In this paper, we focus only on selecting the best combination of SNPs with genotypes using DBPSO. Information on the 11 SNPs in the dataset is given in Table 1; the complete original data set is available online (http://bioinfo.kmu.edu.tw/OS-original-meno-only.xls).

Identification of the Best SNP Barcode with Maximal Difference Between Cases and Controls

Among the combinations, as shown in Table 2, the SNP1-genotype 3 and SNP5-genotype 3

Conclusions

The DBPSO methodology presented in this study allows an unlimited number of SNPs to be combined. However, the optimal sample size depends on many factors and needs to be determined experimentally. Multiple testing is a commonly encountered inference problem in large-scale genetic association studies, arising from the simultaneous testing of multiple hypotheses (Musani et al., 2007). Controlling the errors likely to result from stochastic variation and correlation between tests is highly recommended
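The snippet does not say which error-control procedure the authors have in mind. As one common and simple possibility, the sketch below applies a Bonferroni correction, which compares each p-value with the significance level divided by the number of tests. The function name `bonferroni_reject` and the p-values are made up for illustration.

```python
from typing import List, Sequence


def bonferroni_reject(p_values: Sequence[float], alpha: float = 0.05) -> List[bool]:
    """Flag the tests that remain significant after Bonferroni correction."""
    if not p_values:
        return []
    threshold = alpha / len(p_values)  # per-test significance level
    return [p <= threshold for p in p_values]


# Made-up p-values for three candidate SNP barcodes.
p_values = [0.001, 0.02, 0.04]
flags = bonferroni_reject(p_values)
print(flags)  # [True, False, False]
```

Bonferroni correction tends to be conservative when the tests are correlated, as they typically are for overlapping SNP combinations, so a less strict procedure may be preferable in practice.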

Acknowledgements

This work was partly supported by the National Science Council in Taiwan under grants NSC96-2311-B037-002, NSC96-2622-E214-004-CC3, NSC95-2221-E214-087, NSC94-2622-E-151-025-CC3, NSC94-2311-B037-001, and NSC93-2213-E-214-037, and by grant KMU-EM-97-1.1a.

References (9)

A.J. Brookes. The essence of SNPs. Gene (1999).
M.D. Ritchie et al. Multifactor-dimensionality reduction reveals high-order interactions among estrogen-metabolism genes in sporadic breast cancer. Am. J. Hum. Genet. (2001).
T.A. Thornton-Wells et al. Genetics, statistics and human disease: analytical retooling for complexity. Trends Genet. (2004).
Y. Chung et al. Odds ratio based multifactor-dimensionality reduction method for detecting gene–gene interactions. Bioinformatics (2007).

There are more references available in the full text version of this article.

Brief Communication