Regular article
Development and validation of a genetic algorithm for flexible docking¹

https://doi.org/10.1006/jmbi.1996.0897

Abstract

Prediction of small molecule binding modes to macromolecules of known three-dimensional structure is a problem of paramount importance in rational drug design (the “docking” problem). We report the development and validation of the program GOLD (Genetic Optimisation for Ligand Docking). GOLD is an automated ligand docking program that uses a genetic algorithm to explore the full range of ligand conformational flexibility with partial flexibility of the protein, and satisfies the fundamental requirement that the ligand must displace loosely bound water on binding. Numerous enhancements and modifications have been applied to the original technique, resulting in a substantial increase in the reliability and the applicability of the algorithm. The advanced algorithm has been tested on a dataset of 100 complexes extracted from the Brookhaven Protein Data Bank. When used to dock the ligand back into the binding site, GOLD achieved a 71% success rate in identifying the experimental binding mode.

Introduction

Protein binding sites exhibit highly selective recognition of small organic molecules, in that evolution has equipped them with a complex three-dimensional “lock” into which only specific “keys” will fit. This has been exploited by medicinal chemists in the design of molecules that selectively augment or retard biochemical pathways and so exhibit a clinical effect. X-ray crystallography has revealed the structure of a significant number of these binding sites. In the computer-aided design of therapeutic molecules, it would be advantageous to be able to predict and explain the binding mode of novel chemical entities (the “docking” problem) when the active site geometry is known.

Any solution to the docking problem requires both a powerful search technique to explore the conformation space available to the protein and the ligand and a good understanding of the process of molecular recognition to devise scoring functions that can reliably predict binding modes. Furthermore, since many putative dockings will require evaluation before elucidating the binding mode, any scoring function must be rapid in operation.

There are currently many different approaches to solving the docking problem (Blaney & Dixon, 1993; Jones & Willett, 1995). Early approaches to ligand docking considered both protein and ligand to be rigid, as typified by the DOCK program (Kuntz et al., 1982). Since the bioactive conformation of a bound ligand rarely corresponds to the isolated ligand X-ray structure (Nicklaus et al., 1995), recent techniques have dealt with the issue of conformational flexibility. Deterministic approaches include the FLOG system of Miller et al. (1994) and FLEXX of Rarey et al. (1996). The latter algorithm is very efficient and has been verified on 19 protein-ligand complexes. Alternative, stochastic sampling techniques include genetic algorithms (Jones et al., 1995a; Judson et al., 1994; Oshiro et al., 1995), simulated annealing (Goodsell & Olson, 1990) and evolutionary programming (Gehlhaar et al., 1995).

Inspection of the X-ray crystallographic structures of proteins with associated high-affinity ligands reveals that the ligands appear to conform closely to the shape of the binding cavity, maximising the hydrophobic contribution to binding, and to interact at a number of hydrogen bonding sites. The optimal binding mode may thus involve the ligand forming hydrogen bonds at key hydrogen-bonding sites, accompanied by hydrophobic surface area burial. The most significant contributions to apolar surface area burial are likely to be dispersive interactions between protein and ligand atoms together with an entropic contribution from the displacement of ordered water from the active site into the solvent. Sufficiently accurate simulation of many of these interactions may be enough to predict the binding mode of the majority of high-affinity ligands.

We have reported the use of a genetic algorithm (hereinafter GA; Davis, 1991; Goldberg, 1989; Holland, 1992) to perform protein docking (Jones et al., 1995a), where an evolutionary strategy is employed to explore the conformational variability of a flexible ligand while simultaneously sampling available binding modes into a partially flexible protein active site. The GA provides a search paradigm that enables the rapid identification of good, though not necessarily optimal, solutions to combinatorial optimisation problems. Of particular interest is the use of GAs in performing conformational analysis of both small molecules (Jones et al., 1995b; Brodmeier & Pretsch, 1994; Clark et al., 1994) and macromolecules (Dandekar & Argos, 1996; Sun, 1993).

Here, we describe a docking program called GOLD (Genetic Optimisation for Ligand Docking) that is based on the algorithm described by Jones et al. (1995a). GOLD performs automated docking with full acyclic ligand flexibility, partial cyclic ligand flexibility and partial protein flexibility in the neighbourhood of the protein active site. In order to search the space of available binding modes efficiently, hydrogen bond motifs have been directly encoded into the GA. A simple scoring function was used to rank generated binding modes. This comprised a term for hydrogen bonding (which took account of the fundamental requirement that water must be displaced from both donor and acceptor before a bond is formed); a pairwise dispersion potential that was able to describe a significant contribution to the hydrophobic energy of binding; and a molecular mechanics term for the internal energy of the ligand. The original algorithm has now been substantially enhanced, as detailed in Materials and Methods. The resulting algorithm has been tested on a number of complex ligands, and the result of docking NADPH into dihydrofolate reductase (DHFR) is reported here as an example of the power of this technique. In order to probe the strengths and weaknesses of GOLD in a more rigorous manner, 100 protein-ligand complexes were selected from the Protein Data Bank (PDB; Bernstein et al., 1977). These complexes were selected on the basis of pharmacological interest and whether or not the ligands involved were “drug like”. The result was a varied and demanding test set of complexes. We report here the results obtained by using GOLD to predict the binding modes for these test complexes and compare these predictions against the crystallographically observed binding modes.
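The three-term scoring function described above (hydrogen bonding, pairwise dispersion, ligand internal energy) can be pictured with a minimal Python sketch. Everything specific in it is an assumption made for illustration and is not taken from GOLD: the function names, the linear distance ramp for hydrogen bonds, the 12-6 pair potential used as a stand-in for the dispersion term, the numerical parameters, and the way the terms are combined. The water-displacement requirement of the hydrogen-bond term is not modelled.

```python
import math


def hbond_score(distance, d_full=3.0, d_max=3.5):
    """Score one donor-acceptor pair from its distance in angstroms.

    Hypothetical form: full credit up to d_full, fading linearly to zero
    at d_max. Both cut-offs are illustrative values, not GOLD parameters.
    """
    if distance <= d_full:
        return 1.0
    if distance >= d_max:
        return 0.0
    return (d_max - distance) / (d_max - d_full)


def pair_dispersion(r, r0=4.0, eps=0.2):
    """Stand-in 12-6 pair energy with its minimum of -eps at r == r0.

    The functional form is an assumption; r must be greater than zero.
    """
    x = (r0 / r) ** 6
    return eps * (x * x - 2.0 * x)


def docking_fitness(ligand_atoms, protein_atoms, hbond_pairs, internal_energy):
    """Combine the three terms so that a higher value means a better pose.

    ligand_atoms, protein_atoms: lists of (x, y, z) coordinates.
    hbond_pairs: list of (donor_xyz, acceptor_xyz) coordinate pairs.
    internal_energy: ligand strain energy supplied by the caller.
    """
    hbond = sum(hbond_score(math.dist(d, a)) for d, a in hbond_pairs)
    dispersion = sum(
        pair_dispersion(math.dist(l, p))
        for l in ligand_atoms
        for p in protein_atoms
    )
    # Energies are subtracted because favourable energies are negative.
    return hbond - dispersion - internal_energy
```

Because each term is a simple sum over atom pairs, a function of this shape is cheap to evaluate, which matters when many putative dockings must be scored during a search.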

Results

The GA described in Materials and Methods required as input the approximate size and location of the active site, together with coordinates of the protein and a ligand conformation. As GOLD used a cavity detection procedure to further define the active site, the size and location input by the user was not critical. Although the determination of the active site is not currently automated, there are techniques available that are capable of predicting the location of the active site with

Discussion

Here, we have described the development of GOLD, a GA for flexible ligand docking. The effectiveness of the approach has been illustrated by the docking of NADPH to DHFR. The method has been verified by testing the program on a set of 100 complexes selected from the PDB. During this process GOLD achieved a 71% success rate in reproducing the experimentally observed binding mode. While this was a very encouraging result, an analysis of the results was performed in order to determine common

Genetic algorithms

A GA is a computer program that mimics the process of evolution by manipulating a collection of data structures called chromosomes. Each of these structures encodes a possible solution (i.e. a possible ligand orientation within the protein binding site) to the docking problem and may be assigned a fitness score based on the relative merit of that solution. A steady-state operator-based GA was used to explore conformation space and ligand binding modes (Davis, 1991). This GA is illustrated in
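As a rough illustration of the steady-state, operator-based scheme outlined above, the following Python sketch evolves a population of chromosomes one operator application at a time, with children replacing the least fit members. It is a toy under stated assumptions and not the GOLD implementation: the chromosome is simply a list of angles in degrees, parents are chosen by a two-way tournament, the operators are one-point crossover and Gaussian mutation, and `toy_fitness` with its `TARGET` angles is a hypothetical stand-in for a docking score.

```python
import random


def wrap_angle(a):
    """Map an angle in degrees into the interval [-180, 180)."""
    return (a + 180.0) % 360.0 - 180.0


def steady_state_ga(fitness, n_genes, pop_size=50, n_ops=2000,
                    crossover_prob=0.5, seed=0):
    """Return (best_score, best_chromosome) after n_ops operator applications."""
    rng = random.Random(seed)
    # Random initial population; each chromosome encodes one candidate solution.
    pop = [[rng.uniform(-180.0, 180.0) for _ in range(n_genes)]
           for _ in range(pop_size)]
    # Keep (score, chromosome) pairs sorted from fittest to least fit.
    scored = sorted(((fitness(c), c) for c in pop),
                    key=lambda t: t[0], reverse=True)

    def pick_parent():
        # Two-way tournament: an assumed selection scheme for this sketch.
        a, b = rng.sample(scored, 2)
        return a[1] if a[0] >= b[0] else b[1]

    for _ in range(n_ops):
        if n_genes > 1 and rng.random() < crossover_prob:
            # One-point crossover produces two children from two parents.
            p1, p2 = pick_parent(), pick_parent()
            cut = rng.randrange(1, n_genes)
            children = [p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]]
        else:
            # Mutation perturbs a single gene of a copied parent.
            child = list(pick_parent())
            i = rng.randrange(n_genes)
            child[i] = wrap_angle(child[i] + rng.gauss(0.0, 30.0))
            children = [child]
        for child in children:
            score = fitness(child)
            # Steady state: a child enters only by displacing the worst member.
            if score > scored[-1][0]:
                scored[-1] = (score, child)
                scored.sort(key=lambda t: t[0], reverse=True)
    return scored[0]


# Hypothetical target angles standing in for an "ideal" binding geometry.
TARGET = [60.0, -120.0, 170.0]


def toy_fitness(chromosome):
    """Higher is better: negative squared angular deviation from TARGET."""
    return -sum(wrap_angle(a - t) ** 2 for a, t in zip(chromosome, TARGET))
```

Calling `steady_state_ga(toy_fitness, 3)` returns the fittest chromosome found and its score. Since a child is admitted only when it beats the current worst member, the best score in the population can never decrease as operators are applied.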

Acknowledgements

We thank J.C. Cole and J.P.M. Lommerse for useful discussions, the Biotechnology and Biological Sciences Research Council, the Cambridge Crystallographic Data Centre, the Department of Trade and Industry, Glaxo Wellcome Ltd. and the Medical Research Council for financial support and Tripos Inc. for the provision of software.

References (87)

  • D Ghosh et al.

    Mechanism of inhibition of 3α,20β-hydroxysteroid dehydrogenase by a licorice-derived steroidal inhibitor

    Structure

    (1994)
  • J.P Glusker

    Structural aspects of metal liganding to functional groups in proteins

    Advan. Protein Chem.

    (1991)
  • J.A Hamilton et al.

    The X-ray crystal structure refinements of normal human transthyretin and the amyloidogenic Val30 → Met variant to 1.7 Å resolution

    J. Biol. Chem.

    (1993)
  • O Herzberg

    Refined crystal structure of β-lactamase from Staphylococcus aureus PC1 at 2.0 Å resolution

    J. Mol. Biol.

    (1991)
  • G Jones et al.

    Docking small molecule ligands into active sites

    Curr. Opin. Biotechnology

    (1995)
  • G Jones et al.

    Molecular recognition of receptor sites using a genetic algorithm with a description of desolvation

    J. Mol. Biol.

    (1995)
  • R.S Judson et al.

    A genetic algorithm method for docking flexible molecules

    J. Mol. Struct.

    (1994)
  • G Klebe

    Mapping common molecular fragments in crystal structures to explore conformation and configuration space under the conditions of a molecular environment

    J. Mol. Struct.

    (1994)
  • I.D Kuntz et al.

    A geometric approach to macromolecule-ligand interactions

    J. Mol. Biol.

    (1982)
  • A.G.W Leslie

    Refined crystal structure of type III chloramphenicol acetyltransferase at 1.75 Å resolution

    J. Mol. Biol.

    (1990)
  • D.A Matthews et al.

    Refined crystal structures of Escherichia coli and chicken liver dihydrofolate reductase containing bound trimethoprim

    J. Biol. Chem.

    (1985)
  • J.B.O Mitchell et al.

    On the relative strengths of amide...amide and amide...water hydrogen bonds

    Chem. Phys. Letters

    (1991)
  • K.H.M Murthy et al.

    The crystal structures at 2.2-Å resolution of hydroxyethylene-based inhibitors bound to human immunodeficiency virus type 1 protease show that the inhibitors are present in two distinct orientations

    J. Biol. Chem.

    (1992)
  • M.C Nicklaus et al.

    Conformational changes of small molecules binding to proteins

    Bioorg. Med. Chem.

    (1995)
  • A.W.R Payne et al.

    Molecular recognition using a binary genetic search algorithm

    J. Mol. Graph.

    (1993)
  • E.A Padlan et al.

    On the specificity of antibody-antigen interactions: phosphocholine binding to MCPC603 and the correlation of three-dimensional structure and sequence data

    Ann. Immunol. (Paris)

    (1985)
  • K.P Peters et al.

    The automatic search for ligand binding sites in proteins of known three-dimensional structure using only geometric criteria

    J. Mol. Biol.

    (1996)
  • M Rarey et al.

    Predicting receptor-ligand interactions by an incremental construction algorithm

    J. Mol. Biol.

    (1996)
  • L Tong et al.

    Crystal structures of HIV-2 protease in complex with inhibitors containing the hydroxyethylamine dipeptide isostere

    Structure

    (1995)
  • G.D Van Duyne et al.

    Atomic structures of human immunophilin FKBP-12 complexes with FK506 and rapamycin

    J. Mol. Biol.

    (1993)
  • A Zhang et al.

    Structure determination of antiviral compound SCH 38057 complexed with human rhinovirus 14

    J. Mol. Biol.

    (1993)
  • F.H Allen et al.

    The development of versions 3 and 4 of the Cambridge Structural Database system

    J. Chem. Inform. Comput. Sci.

    (1991)
  • F.H Allen et al.

    Correlation of the hydrogen-bond acceptor properties of nitrogen with the geometry of the Nsp2 → Nsp3 transition in R1(X=)C-N R2R3 substructures: reaction pathway for the protonation of nitrogen

    Acta Crystallog. sect. B

    (1995)
  • Ajay et al.

    Computational methods to predict binding free energy in ligand-receptor complexes

    J. Med. Chem.

    (1995)
  • J Badger et al.

    Structural analysis of antiviral agents that interact with the capsid of human rhinoviruses

    Proteins: Struct. Funct. Genet.

    (1989)
  • J.M Blaney et al.

    A good ligand is hard to find: automated docking methods

    Perspect. Drug Discov. Res.

    (1993)
  • Z Böcskei et al.

    Pheromone binding to two rodent urinary proteins revealed by X-ray crystallography

    Nature

    (1992)
  • B Borah et al.

    Nuclear magnetic resonance and neutron-diffraction studies of the complex of ribonuclease-A with uridine vanadate, a transition-state analog

    Biochemistry

    (1985)
  • H Brandstetter et al.

    Refined 2.3 Angstrom X-ray crystal structure of bovine thrombin complexes formed with the benzamidine and arginine-based thrombin inhibitors NAPAP, 4-TAPAP and MQPA: a starting point for improving antithrombotics

    J. Mol. Biol.

    (1992)
  • T Brodmeier et al.

    Application of genetic algorithms in molecular modelling

    J. Comput. Chem.

    (1994)
  • C Bystroff et al.

    Crystal structure of unliganded Escherichia coli dihydrofolate reductase-ligand-induced conformational changes and cooperativity in binding

    Biochemistry

    (1991)
  • D.E Clark et al.

    Pharmacophoric pattern matching in files of three-dimensional structures: comparison of conformational-searching algorithms for flexible searching

    J. Chem. Inform. Comput. Sci.

    (1994)
  • M Clark et al.

    Validation of the general-purpose TRIPOS 5.2 force field

    J. Comput. Chem.

    (1989)

1 Edited by F. E. Cohen

2 Present address: R. C. Glen, Tripos Inc., 1699 South Hanley Road, St Louis, MO 63144, USA