doi:10.1016/j.future.2006.09.001
Copyright © 2006 Elsevier Ltd All rights reserved.
A parallel hybrid genetic algorithm for protein structure prediction on the computational grid
a Laboratoire d’Informatique Fondamentale de Lille, LIFL/CNRS UMR 8022, DOLPHIN Project - INRIA Futurs, Cité Scientifique, 59655 - Villeneuve d’Ascq Cedex, France
b CNRS UMR 8576, Université des Sciences et Technologies de Lille, Bâtiment C9, Cité Scientifique, 59655 - Villeneuve d’Ascq Cedex, France
Received 2 February 2006; revised 5 August 2006; accepted 7 September 2006.
Available online 1 November 2006.
Abstract
Solving the structure prediction problem for complex proteins is difficult and computationally expensive. In this paper, we propose a bicriterion parallel hybrid genetic algorithm (GA) to deal with the problem efficiently using the computational grid. The use of a near-optimal metaheuristic, such as a GA, significantly reduces the number of potential structures explored. However, the complexity of the problem remains prohibitive for large proteins, which makes parallel computing on the computational grid essential for solving it efficiently. A conjugate gradient-based hill climbing local search is combined with the GA to intensify the search in the neighborhood of the configurations the GA provides. We consider two molecular complexes: the tryptophan-cage protein (Brookhaven Protein Data Bank ID 1L2Y) and α-cyclodextrin. The experimental results obtained on a computational grid show the effectiveness of the approach.
Keywords: Protein structure prediction; Genetic algorithm; Hill climbing; Parallel computing; Grid computing
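The hybridization described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the energy function `toy_energy` is a hypothetical stand-in for a force field, a simple coordinate-wise hill climbing replaces the conjugate gradient-based local search, a single objective replaces the two criteria, and everything runs sequentially rather than on a grid. All function names and parameter values are assumptions made for illustration. The chromosome is taken to be a list of torsion angles, and the locally improved angles are written back into the population (a Lamarckian scheme).

```python
import math
import random
from typing import Callable, List

Chromosome = List[float]  # one torsion angle (radians) per gene; assumed encoding


def toy_energy(angles: Chromosome) -> float:
    """Hypothetical stand-in for a force field: 0.0 when every angle is 0."""
    return sum(1.0 - math.cos(a) for a in angles)


def hill_climb(angles: Chromosome, energy: Callable[[Chromosome], float],
               step: float = 0.1, iters: int = 20) -> Chromosome:
    """Coordinate-wise hill climbing: keep a perturbation only if energy drops."""
    best = list(angles)
    best_e = energy(best)
    for _ in range(iters):
        for i in range(len(best)):
            for delta in (step, -step):
                trial = list(best)
                trial[i] += delta
                e = energy(trial)
                if e < best_e:
                    best, best_e = trial, e
    return best


def hybrid_ga(energy: Callable[[Chromosome], float], n_genes: int = 4,
              pop_size: int = 10, generations: int = 15, seed: int = 0) -> Chromosome:
    rng = random.Random(seed)
    pop = [[rng.uniform(-math.pi, math.pi) for _ in range(n_genes)]
           for _ in range(pop_size)]
    for _ in range(generations):
        # Lamarckian step: the locally improved angles replace the original genes.
        pop = [hill_climb(ind, energy) for ind in pop]
        pop.sort(key=energy)
        parents = pop[: pop_size // 2]          # elitist truncation selection
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_genes)
            child = a[:cut] + b[cut:]           # one-point crossover
            i = rng.randrange(n_genes)
            child[i] += rng.gauss(0.0, 0.3)     # Gaussian mutation of one angle
            children.append(child)
        pop = parents + children
    return min(pop, key=energy)


best = hybrid_ga(toy_energy)
```

Because the parents survive each generation unchanged, the best energy in the population never increases in this sketch; a real setting would use the paper's energy model and local search instead of these placeholders.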
Fig. 1. x1 dominates x2; x1 and x3 are mutually non-dominated, as are x2 and x3.
Fig. 2. Pareto front formed of: x1,x2,x3,x4; supported points (points located on the convex hull enclosing the entire set of solutions): x1,x2,x4; non-supported point (point at the interior of the convex hull): x3; dominated point: x5.
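The dominance relation and Pareto front named in the captions of Figs. 1 and 2 can be sketched as follows. This is a minimal illustration that assumes both objectives are minimized (as would be the case for two energy terms); the coordinates of `x1`, `x2`, and `x3` are hypothetical values chosen only to reproduce the relations stated for Fig. 1, not data from the article.

```python
from typing import List, Sequence

Point = Sequence[float]  # one value per objective, e.g. (bonded, non-bonded) energy


def dominates(a: Point, b: Point) -> bool:
    """a dominates b if a is no worse in every objective and strictly
    better in at least one (minimization assumed)."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def pareto_front(points: List[Point]) -> List[Point]:
    """Keep the points that no other point dominates, in input order."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical objective vectors reproducing the relations of Fig. 1.
x1, x2, x3 = (1.0, 2.0), (2.0, 3.0), (3.0, 1.0)

front = pareto_front([x1, x2, x3])  # x2 is dropped because x1 dominates it
```

Distinguishing supported from non-supported points, as in Fig. 2, would additionally require testing membership on the convex hull of the solution set, which this sketch does not attempt.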
Fig. 3. Chromosome encoding based on specifying the backbone torsional angles.
Fig. 4. Energy surface for α-cyclodextrin. High energy points are depicted in light colors, the low energy points being identified by the dark areas. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
Fig. 5. The bonded atom derived energy component is represented by the blue grid. The non-bonded atom derived energy component is given by the smoother surface, with red grid lines. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
Fig. 6. Energy surface obtained after applying a Lamarck local search on the initial set of conformations.
Fig. 7. The two components of the energy surface for the conformations obtained after applying the Lamarck local search. The upper surface corresponds to the non-bonded atom derived energy and the lower surface to the bonded atom derived energy.
Fig. 8. A layered architecture of ParadisEO-CMW.
Fig. 9. GRID5000 centers are marked in grey; the colored disks around them give visual feedback on the status of their associated workstations. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
Fig. 10. Speed-up for the tryptophan-cage protein (red rectangles) and α-cyclodextrin (blue triangles). (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
Fig. 11. 1L2Y Pareto front. Zero-energy conformation: 46.446 (non-bonded energy: 34.230, bonded energy: 12.216).
Fig. 12. α-cyclodextrin Pareto front. Zero-energy conformation: 242.157 (non-bonded energy: 216.579, bonded energy: 25.578).
Fig. 13. An improvement in the value of one function generally entails a degradation in the value of the other.
Table 1. Active elements for the performed experiments

Table 2. Execution times for the performed experiments
