Paper The following article is Open access

Exploring the role of water in molecular recognition: predicting protein ligandability using a combinatorial search of surface hydration sites

, and

Published 1 July 2016 © 2016 IOP Publishing Ltd
, , Special issue on molecular scale theories and simulations of fluid systems: from microscopic to macroscopic effects Citation Sinisa Vukovic et al 2016 J. Phys.: Condens. Matter 28 344007 DOI 10.1088/0953-8984/28/34/344007

0953-8984/28/34/344007

Abstract

The interaction between any two biological molecules must compete with their interaction with water molecules. This makes water the most important molecule in medicine, as it controls the interactions of every therapeutic with its target. A small molecule binding to a protein is able to recognize a unique binding site on a protein by displacing bound water molecules from specific hydration sites. Quantifying the interactions of these water molecules allows us to estimate the potential of the protein to bind a small molecule. This is referred to as ligandability. In the study, we describe a method to predict ligandability by performing a search of all possible combinations of hydration sites on protein surfaces. We predict ligandability as the summed binding free energy for each of the constituent hydration sites, computed using inhomogeneous fluid solvation theory. We compared the predicted ligandability with the maximum observed binding affinity for 20 proteins in the human bromodomain family. Based on this comparison, it was determined that effective inhibitors have been developed for the majority of bromodomains, in the range from 10 to 100 nM. However, we predict that more potent inhibitors can be developed for the bromodomains BPTF and BRD7 with relative ease, but that further efforts to develop inhibitors for ATAD2 will be extremely challenging. We have also made predictions for the 14 bromodomains with no reported small molecule Kd values by isothermal titration calorimetry. The calculations predict that PBRM1(1) will be a challenging target, while others such as TAF1L(2), PBRM1(4) and TAF1(2), should be highly ligandable. As an outcome of this work, we assembled a database of experimental maximal Kd that can serve as a community resource assisting medicinal chemistry efforts focused on BRDs. Effective prediction of ligandability would be a very useful tool in the drug discovery process.

Export citation and abstract BibTeX RIS

Original content from this work may be used under the terms of the Creative Commons Attribution 3.0 licence. Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.

Introduction

Water molecules at protein surfaces can be weakly or strongly bound. Therefore, the binding of a small molecule to a particular region of the surface is affected by the existing network of hydration sites in the absence of the small molecule. Quantifying the propensity of a protein to bind a small molecule provides an estimate of ligandability in drug design. The druggability of a protein target is defined as the relative ease or difficulty of developing a small molecule that can effectively modulate the protein's activity in vivo [1]. However, there are many complex pharmacodynamics and pharmacokinetic factors influencing the ability of a small molecule to be effective as a drug [2, 3]. For these reasons predicting druggability is an extremely difficult task [4, 5]. However, a good indicator of protein druggability is protein ligandability. The ligandability of a protein is defined as the relative ease or difficulty of developing a small molecule that can inhibit the protein in vitro [6]. Predicting ligandability is somewhat more straightforward and, since it is necessary for druggability (though not sufficient), it is of great interest in drug discovery. For this reason, computational prediction of ligandability is a useful goal. In many cases, ligandability is phrased as the maximal affinity of a ligand for the given target [79]. There are a number of computational methods to predict ligandability including MAPPOD [9], SiteMap [10], fpocket [11], DLID [12], DrugFEATURE [13], LIGSITE [14], Q-SiteFinder [15], i-SITE [16], DrugPred [17], JEDI [18]. All of these methods apply large-scale data analysis to structural, and physicochemical data that is empirically derived using a training sets for ligands, proteins, and protein–ligand complexes.

Methods such as FTMap take a different approach, namely to predict ligandability without any dependence on training set data [19, 20]. FTMap tests different molecular fragments against the entire surface of the protein for locations with the strongest interactions. Here we report a similar approach that uses water molecule instead of fragments. The basis for this approach is the idea that the displacement of water molecules from a protein surface is a key contributor to protein–ligand binding affinity [2129]. A set of strongly bound waters yields a high desolvation penalty and makes it difficult to bind a ligand with high affinity. The task of predicting ligandability is then the task of identifying the combination of water molecules on the surface of the protein that is the easiest to displace. Whilst quantifying the displaceability of individual water molecules has been reported previously using inhomogeneous fluid solvation theory (IFST) [23, 3034], quantifying the displaceability of combinations of water molecules has not been reported. This is a challenging combinatorial problem due to the large number of hydration sites surrounding a protein. Approximation methods cannot be applied in this case because selectivity is based on the preference of one single binding site over all other possible sites, such that all possible combinations must be evaluated. Moreover, proteins are dynamic systems and constant changes in their structure are reflected in responsive changes of the solvent molecules on the surface.

In this work, we describe a new computational tool to quantify the binding free energy of combinations of water molecules from protein surfaces. We apply the method to the bromodomains (BRDs) family of proteins. BRDs represent a good test case due to the large number of available crystal structures, the limited structural flexibility, and experimental data on small-molecule binding. BRDs are very well structurally characterized. They were identified for the first time in the brahma gene of Drosophila melanogaster [35] and classified as epigenetic readers of covalent post-translational modifications of chromatin [36, 37]. In particular, the great majority of BRDs are capable of selectively recognizing ε-N-acetylated lysine residues (Kac) in histone tails [3840]. BRD possess a structurally conserved interaction module about 110 amino acid long characterized by a left-handed four-helix bundle (referred to as the BRD fold: αZ, αA, αB and αC) [41] shown in figure 1. The binding site for acetylated-lysine is a hydrophobic pocket made by helices and inter-helical αZ-αA and αB-αC loops (ZA and BC loops, respectively). Within the binding pocket lies a highly-conserved asparagine residue at the beginning of BC loop (at the top of the pocket) and four or five conserved water molecules (at the bottom of the pocket) that participate in recognition of Kac [22]. In the last two decades, the binding motif between BRDs and Kac histone has been established [42, 43] and extensively reviewed [4446]. Based on this protein-protein interaction motif, many inhibitors have been developed [4750]. Knowing the structure of a protein is very useful in the process of drug discovery and The Structural Genomics Consortium reveals a new BRD structure almost monthly [51].

Figure 1.

Figure 1. The bromodomain binding pocket (left) is formed from the conserved fold of helices αZ, αA, αB and αC as well as the connecting loops ZA, AB and BC. It is bounded at the base by five strongly-conserved water molecules and at the top by a conserved asparagine residue. Small molecule inhibitors such as (+)JQ1 (PDB ID 4FLP) bind in this pocket (right).

Standard image High-resolution image

There are 61 BRDs found in the human proteome appearing within 41 different proteins [43] that, based on their structure, can be divided in eight distinctive subfamilies [52]. Inhibitors have been developed for each of the subfamilies, from those deemed druggable such as BRD4 [53, 54] to those deemed difficult to drug such as BAZ2B [55], including inhibitors with mixed BRD pharmacology [56]. However most BRDs, excluding BRD4, have not yet been studied extensively and thus their biological role and potential for ligandability is not fully understood. Thus, we apply our method to the BRDs in order to predict their ligandability. Because extensive lead-optimization campaigns have not been prosecuted against many of the BRDs, and in some cases no inhibitors have been reported, we can make prospective predictions on the relative ease or difficulty of developing potent small molecule inhibitors.

We developed a suite of programs to quantify the binding free-energy of combinations of hydration sites from protein surfaces through a combinatorial search. We process information from molecular dynamics (MD) simulation and convert the locations and interaction energies of explicit water molecules into their statistical representation over time, referred to as hydration sites [30]. We then identify which combination of hydration sites is most weakly bound by summing the binding energies of hydration sites in each combination. Once determined, such a hot spot (the location and predicted ligandability) on the surface of one protein can be compared with hot spots on the other proteins. In this work we present details of the procedure for generating individual hydration site and assessing combinations of hydration sites as well as comparing the predicted and experimentally available results for BRDs. We have also compiled a list of the maximal reported affinities for each BRD, reported as Kd values measured using isothermal calorimetry (ITC), and discuss the potential for improvement in each case as well as the potential of BRD family members with no reported data. We hope that this tool will prove useful, quite early in the drug discovery process, in distinguishing proteins that are ligandable from those where developing a potent small molecule inhibitor will be challenging.

It is important to state that analysis of hydration sites around a protein only looks at part of the story for small molecule binding. Obviously, one must also consider direct interaction of a small molecule with the protein and the ligand in solution. In a hydrophobic pocket, solvent molecules can be displaced from the surface of the protein with little cost and the corresponding non-polar interactions formed with the ligand are of modest size. However, strongly bound water molecules near polar protein groups are much harder to displace, but the corresponding polar interactions formed with the ligand allow much greater energetic compensation. This work does not consider these compensation effects, but relies on the relationship between surface hydration sites and ligand displaceability established previously [30].

Methods

We identify the location of the combination of hydration sites with the weakest binding affinity with the ligand-binding hotspot and the summed binding free energy of the hydration sites with the potential for ligandability [30]. The procedure of locating such hotspots on the surface of a protein is performed in two steps. Firstly, hydration sites around the protein are generated from MD simulation. Secondly, combinations of these hydration sites are compared to identify the combination with the weakest total binding free energy. The method of generating hydration sites and calculating binding free energies using IFST has been reported before [30, 5759]. However, the evaluation of all possible combinations of hydration is reported here for the first time. Details of both steps of the method are given below. In terms of computer time, the first stage of hydration site analysis can be completed within 24 h on a 16 CPU machine and the second stage of assessing all combinations can be completed within a few minutes or a few days on a 4 CPU machine, depending on the number of hydration sites in total and the number of hydration sites per combination. We apply the method to the BRD family. The list of BRDs whose structures have been processed are given in tables 1 and 2.

Table 1. The values of ΔGopt for combinations of 18 hydration sites and the best values for Kd measured by ITC.

BRD ΔGopt (N  =  18) (kcal mol−1) Max pKd Max Kd (ITC) (nM) Ligand MW Kd ref.
ATAD2 −38.05 5.82 1500 350.40 [83]
BRPF1 −28.06 7.89 13 380.45 [84]
BRD4(1) −26.83 7.59 26 423.17 [85]
TIF1α −26.35 7.51 31 643.00 [86]
BRD3(1) −24.91 7.72 19 423.17 [85]
BRD1 −24.87 6.97 108 383.42 [87]
EP300 −24.53 7.42 38 508.20 [88]
BAZ2B −24.22 6.87 136 371.00 [89]
CREBBP −24.16 7.68 21 508.20 [88]
BRD4(2) −23.93 7.60 25 414.50 [90]
BRD3(2) −23.77 7.72 19 423.17 [85]
SMARCA4 −22.94 7.05 89 321.40 [91]
BAZ2A −22.78 6.96 109 357.42 [92]
BRD2(1) −22.44 7.34 46 423.17 [85]
PBRM1(5) −21.66 7.32 48 321.40 [91]
BRD2(2) −21.53 7.28 52 423.17 [85]
BPTF −21.50 5.55 2800 187.00 [93]
CECR2 −19.63 7.10 80 495.70 [94]
BRD9 −19.07 7.85 14 353.40 [95]
BRD7 −16.25 6.62 239 353.40 [95]

Table 2. The best values for ΔGopt of combinations of 18 hydration sites for BRDs whose values for Kd measured by ITC were not reported.

BRD ΔGopt (N  =  18) (kcal mol−1) BRD ΔGopt (N  =  18) (kcal mol−1)
PBRM1(1) −33.66 PBRM1(2) −23.65
PHIP(2) −26.93 TAF1(1) −23.40
ATAD2B −26.49 PBRM1(3) −23.29
ASH1L −26.17 KAT2B −22.50
KAT2A −24.73 TAF1L(2) −21.10
SP100 −24.57 PBRM1(4) −19.87
SMARCA2 −24.56 TAF1(2) −19.60

Protein structure preparation

To begin, downloaded protein structures from the Protein Databank [60] were prepared for analysis. We selected only apo protein structures for this analysis. We did not select crystal structures in complex with small molecules, because we are interested in applying the method to potential drug targets lacking existing inhibitors. Downloaded structures were initially processed by Schrodinger's Preparation Wizard [61] by changing seleno-methionines to methionines, adding side chains when they were missing, checking for orientations of the asparagine, glutamine and histidine residues, checked for protonation state of all ionizable residues and removing all metal, buffer and solvent molecules. Proteins were solvated by SOLVATE4 and we used the TIP4P-2005 water model [62]. Overall neutrality of the system was preserved by adding chloride and sodium ions placed at least 10.0 Å from the protein.

Molecular dynamics simulations

Prepared structures were then subjected to MD simulation in water. Simulations were performed using NAMD version 2.9 [63]. Force field parameters were from the CHARMM27 force field [64, 65] with heavy-atom restraints of 1.0 kcal mol−1 Å−2 for the protein and ions. Equilibration was performed for 1.0 ns in an NPT ensemble at 300 K using Langevin temperature control [66]. It was checked that the system was equilibrated by verifying that energy fluctuations were stable. MD simulations were performed using a time step of 2.0 fs, saving coordinates every 200 steps (400 fs), for a total run of 3.6 ns. Throughout the equilibration and dynamics runs, electrostatic interactions were modeled with a uniform dielectric and a dielectric constant of 1.0. Van der Waals interactions were truncated at 11 Å with switching from 9 Å. The electrostatic model was the particle mesh Ewald method [67]. The entire systems, protein plus solvent, was treated using rhombic dodecahedral periodic boundary conditions which extended at least 10 Å from the protein surface.

Hydration site identification

Since water molecules tend to cluster at protein surfaces and concentrate at distinct locations [57, 69], IFST is usefully applied to small subvolumes referred to as hydration sites [3134, 7074]. Hydration sites are then individual spherical regions with a high number density of water. Individual hydration sites are identified by clustering water molecules from the MD simulation. In this case, the clustering procedure was applied to a radius of 16 Å around the BRD binding site with a hydration site radius of 1.2 Å. Hydration sites were determined iteratively, starting with the water molecule in the region of highest water density. No hydration site was selected that was 2.4 Å within of an already selected hydration site. Detailed procedures for clustering have been described previously [30, 57].

IFST calculations

Trajectories from the simulation were then analyzed using IFST to obtain the of hydration site free energies. IFST calculations are performed on simulation data from equilibrium MD trajectories and thus the chemical potential of water is constant throughout the solution. The free energy change calculated by IFST is the solvation free energy of the solute. An IFST calculation on the whole system volume yields the total solvation free energy. An IFST calculation on a hydration site yields the contribution of water molecules in that hydration site to the solvation free energy. In the context of IFST, the end states are a system of bulk water with N solvent particles and a solution with N solvent particles. Thus, the calculations can be considered to capture the perturbing effect of the solute on the water molecules [58, 59, 68]. The binding free energy of a hydration site (ΔGIFST) is defined as the difference in free energy between the water molecules in the presence of the protein and the same number of water molecules in pure bulk solution. Unlike alchemical free-energy calculations such as double decoupling, IFST calculations never consider a state with an artificial cavity. ΔGIFST is calculated from the energetic (ΔEIFST) and entropic (ΔSIFST) contributions to free energy:

Equation (1)

ΔEIFST is the contribution of water molecules in a hydration site to the total hydration energy and can be calculated from average solute–water (Esw), water–water (Eww), and bulk solution (Ebulk) energies:

Equation (2)

ΔSIFST is the contribution of water molecules in a hydration site to the total hydration entropy. The entropy is based on a multiparticle expansion of the molecular distribution functions [75, 76]. IFST is based on the two-particle approximation and thus the entropy expansion is truncated at the two-particle level [58]. We use a truncation of Wallace's expansion for the entropy in the canonical ensemble [77]. ΔSIFST can then be calculated from solute–water (Ssw), water–water (Sww), and bulk solution (Sbulk) entropy terms:

Equation (3)

It is important to note that the excess energy and the excess entropy of solution depend on the ensemble, but that the ensemble dependent terms cancel in the expression for the solute chemical potential. Thus, the chemical potential is equal for insertion at constant volume and constant pressure [78, 79]. We have shown previously that Sww is negligible for ordered water molecules surrounding a protein and thus we set this term to zero [80]. We exploit a k-nearest neighbours approach to calculate Ssw using the first nearest neighbour (k  =  1) in all cases [29, 72, 80]:

Equation (4)

Equation (5)

Equation (6)

Equation (7)

R is the gas constant, F is the number of frames sampled, Vi is the volume of the system in frame i, and γ is Euler's constant, which corrects for the asymptotic bias. The nearest neighbour distance in translational space (dtrans) between two water molecules is the Euclidean norm between the Cartesian coordinates of water molecule j in frame i and its nearest neighbour water molecule k in frame l. The nearest neighbour distance in orientational space (dorient) between two water molecules is the distance between the rotations required to bring the two orientations to the same reference orientation. The correct distance metric for the rotation group is twice the geodesic distance on the unit sphere. This quaternion representations of the rotations for water molecule j in frame i and its nearest neighbour water molecule k in frame l are denoted by qij and qkl. The total solute–water distance (dtotal) is estimated by combining the translational and orientational distances.

For an individual hydration site, ΔGIFST is a predominantly negative number between 0 and  −15.0 kcal mol−1 [80] and tends to zero with increasing distance from the protein (e.g. for water molecules 7–8 Å from the protein) where interactions become more like those in bulk solvent. For the TIP4P-2005 water model ΔEbulk is  −11.57 kcal mol−1 and ΔSbulk is 15.5097 cal K−1 mol−1 [80]. Hydration sites free energies at the surface of CECR2 are shown in figure 2.

Figure 2.

Figure 2. Values of ΔGIFST in kcal mol−1 for hydration sites above the surface of the binding site of CECR2 (PDB ID 3NXB).

Standard image High-resolution image

For each protein, 9000 snapshots from molecular dynamics runs were processed in order to obtain ΔGIFST for the hydration sites.

Generation of combinations of hydration sites

Combinations of interest are those that include hydration sites on the surface of the protein. Therefore, the cut off distance was applied at 3.6 Å from the surface of the protein (using heavy atom to heavy atom distance) and was extended to 4.1 Å when the hydration site was over a hydrophobic residue. The reason for the extended cut off was that hydration sites are found at a further distance from the protein for hydrophobic residues. Choosing combinations of hydration sites is in principle very simple (i.e. take combinations of hydration sites like those in figure 2). However, the technical implementation of generating, evaluating, and storing such combinations is a challenging problem. Small proteins like BRDs have a surface covered by about 2000 hydration sites (by the definition above). If one is to take all possible combinations of combinations of hydration sites then the number of combinations is in excess of 1040, which is not possible to process. This number can be significantly smaller if one only considers combinations that enclose hydration sites within a contiguous volume (since a single small molecule inhibitor must be contiguous). In order to avoid generating all possible combinations and then checking if each combination of hydration sites can be enclosed by a contiguous volume, a different approach was taken. Prior to generation of combinations, every hydration site was evaluated for the number of its neighboring hydration sites. The cut off distance for two hydration sites to be considered neighbors was taken to be 3.38 Å. This distance is a reflection of the hydration site being a statistical representation of water molecules. In figure 3 it is shown that this distance is the cut off for O–O distance in the first solvation shell for TIP4P-2005 water model and that at this distance two hydration sites cannot be considered neighbors since the third one can be placed in-between them (which happens because hydration sites are never closer than 2.4 Å from each other). The consequence of this algorithm was that the resulting combinations were constituted of hydration sites that each had at least one neighbor (which guaranteed that a combination constituted a contiguous volume). For BRDs, the total number of such combinations did not exceed 1014. The number is still large yet cannot be smaller because in the search for the hot spot it is important not to miss a single combination. The reason for such exhaustiveness is to mimic proteins' ability to have only one hot spot for the substrate among all possible spots on its surface. The method must then identify this site.

Figure 3.

Figure 3. The distance cutoff between two neighboring hydration sites is 3.38 Å. Two hydration sites cannot be considered neighbors if there is the third hydration site between them.

Standard image High-resolution image

One can perform the search for combinations containing any number of hydration sites. Combinations of 4–6 hydration sites would be more appropriate for screening of fragments, while combinations of 30–40 hydration sites would correspond to small peptides. BRDs were first investigated using combinations from 4 to 20 hydration sites. To identify an appropriate number of hydration sites for binding of a drug-like molecule, we calculated the volumes of hydration site combinations of increasing size and compared them with the volume of a molecule of molecular weight 500 au (figure 4). The linear correlation between the molecular weight and the volume of a drug (R2  =  0.85) yielded an average volume of 1414 A3 for a molecule of molecular weight 500 au. The number of hydration sites per combination that resulted in the volume closest to that size was for a combination with 18 hydration sites. Thus, we use combinations of 18 hydration sites for the final analysis reported here, since we estimate that this is the number of water molecules that a molecule of molecular weight 500 au would displace on average. It is interesting to note that a linear correlation between molecular weight and surface accessible solvent area has been reported previously [9].

Figure 4.

Figure 4. A plot of the molecular weight against the volume enclosed by the solvent accessible surface area for 250 000 structures from the BindingDB [81]. Molecules are shown as black dots. An approximation of the distribution of volumes available to combinations of hydration sites of a given size is shown in blue (for 17, 18 and 19 hydration sites).

Standard image High-resolution image

The binding free energy of every combination (ΔGsum) was calculated as the sum of free energies for the 18 constituent hydration sites. ΔGsum for a set of N hydration sites is the contribution of water molecules in these N hydration site to the hydration free energy of the protein. The binding free energy of the optimal combination (ΔGopt) is then the maximal value of ΔGsum amongst all combinations.

Equation (8)

Equation (9)

Results

For each of 34 BRDs whose structures we studied (165 protein chains), we enumerated all possible combinations of 18 contiguous hydration sites. In each case, the combination with the weakest binding energy is referred to as the hot spot. Thus, every protein has a hot spot, but the question is how ligandable is it. As an illustration, the hot spot for the BRD of CECR2 is shown in figure 5 (green surface in the left-hand panel). Importantly, a correspondence between the locations of the predicted hot spots (blue spheres in the right-hand panel of figure 5), and the actual Kac binding sites (shown in figure 1) was observed for all 165 structures. This observation provides a basic qualitative validation of the approach.

Figure 5.

Figure 5. The predicted hot spot (green surface enclosing a combination of 18 hydration sites) for CECR2 from PDB ID 3NXB (left) and superimposed hot spots (blue; 18 hydration sites per combination) for all 165 chains from 34 BRD structures (right).

Standard image High-resolution image

We calculated ΔGopt for each of the BRDs and compared this with the maximum reported pKd value obtained by ITC [48, 82]. This data is summarized in table 1.

A graphical representation of the data can be seen in figure 6.

Figure 6.

Figure 6. Comparison between predicted ligandability calculated as ΔGopt (kcal mol−1) for combinations of size N  =  18 (circles) and the maximum observed pKd values measured by ITC (squares).

Standard image High-resolution image

The majority of BRDs are predicted to have similar ligandability and correspondingly most BRDs have experimental Kd values in the range of 10–100 nM. However, BPTF and BRD7 (green circles and squares) are different, having experimental Kd values greater than 100 nM. Based on the calculated ΔGopt, the prediction is that more potent inhibitors could be developed for these BRDs without significant difficulty. In particular, BRD7 is predicted to be a highly ligandable target. Conversely, ATAD2 (red circle and square) is predicted to have a relatively low ligandability and the prediction is that significant effort will be required to develop inhibitors approaching 10 nM. As a product of this work, we offer a dataset of maximal pKd values, hoping it will serve as a community resource in drug discovery efforts focused on BRDs (www.tcm.phy.cam.ac.uk/~sv375/BRDs/table/Kd_for_BRDs.html). We also considered 14 BRDs that have protein structures deposited in the PDB but do not have reported Kd values from ITC data. This data is summarized in table 2.

A graphical representation of the data can be seen in figure 7.

Figure 7.

Figure 7. Predictions for ligandability of BRDs with no Kd values measured by ITC (orange circles) and the correspondence for BRDs whose Kd values are known (dotted line).

Standard image High-resolution image

The trend observed for BRDs that do have reported Kd values from ITC data (figure 7 dotted line) suggests that identifying inhibitors with Kd in the range of 10–100 nM will not be a significant challenge for most other BRDs. However, four of these BRDs are notably different. PBRM1(1) is predicted to be a relatively difficult target (figure 7 red region), whereas TAF1L(2), PBRM1(4), and TAF1(2) are predicted to be a relatively ligandable targets (figure 7 green region). In addition to predicting ligandability, we also considered the five 'conserved' water molecules at the bottom of the Kac binding pocket (figure 1). Our analysis placed hydration sites that reproduced the positions of these conserved water molecules for each BRD. The ΔGIFST values are also strongly negative, suggesting that they are tightly bound. An example is shown for CECR2 in figure 8. The search for hot spots tends not to favor selection of these hydration sites.

Figure 8.

Figure 8. The crystal structure for CECR2 (PDB ID 3NXB), showing five crystallographic water molecules (dark blue) overlapping with predicted hydration sites (light blue). Hydration sites are labeled by their ΔGIFST values.

Standard image High-resolution image

Discussion

The primary technical challenge in this work is the enumeration of the trillions of viable combinations of hydration sites to identify the most weakly bound. This is illustrated for CECR2 in figure 9. The most weakly bound combination (shown in green) is located in the binding site, while the most strongly bound (red) is outside the binding site. This is the case for all 165 BRDs structures tested and is an encouraging result.

Figure 9.

Figure 9. Combinations of hydration sites on CECR2 (PDB ID 3NXB) with the lowest (green), intermediate (yellow) and highest (red) ΔGopt values. The most weakly bound combination (the hot spot) is in the Kac binding site.

Standard image High-resolution image

It is also important to discuss the assumptions, advantages, and limitations of the method. We chose combinations of 18 hydration sites because the volume of such combinations corresponds to a volume of a drug with molecular weight of 500 (figure 4). However, we studied whether the number of hydration sites influences the predicted ligandability. Trends observed with other number of hydration sites have also been computed and are presented in figure 10.

Figure 10.

Figure 10. ΔGopt for BRDs as the number of hydration sites changes (number of hydration sites N  =  4, 9, 14, 18).

Standard image High-resolution image

We observe that, in general, trends in relative ligandability (values of ΔGopt) are almost unchanged as the size of the combination changes. We observe that hot spots tend to be built around a small group of hydration sites that are very weakly bound. For example, ATAD2 always has the lowest and BRD7 always has the highest ΔGopt value. However the relative ordering does change for smaller combinations (N  <  10). The interpretation of this observation is that success in fragment screening might not always correspond to ligandability for a 500 MW inhibitor. The method described here can detect such scenarios and inform decision making in fragment-based drug design. It can also inform decision making in the hit-to-lead and lead-optimization stages of drug design. Even if a protein is ligandable and worth pursuing as a therapeutic target, major challenges remain in identifying an inhibitor in vitro and then optimizing it for use in vivo. An additional advantage of the method described here is that the results present a template for molecular design, suggesting the position, shape and characteristics of the eventual inhibitor. This information embedded in the results of the method is very useful in approaching the first of these challenges. An example is shown for BRDT in figure 11, with an inhibitor overlaying the most weakly bound combination of hydration sites. The position of the hot spot can also be used to predict when allosteric inhibition might be effective, while the 3D shape of the hot spot can be used to deduce vector poses for de novo drug design.

Figure 11.

Figure 11. Volume overlap for the predicted hot spot (purple) and the (+)-JQ1 ligand in BRDT (PDB ID 4FLP).

Standard image High-resolution image

Whilst the BRDs are adequate for testing a proof of concept, the applicability of the method to more diverse structures of protein families is yet to be established. BRDs are relatively new therapeutic target and most of those tested did not undergo systematic lead optimization campaigns. Thus, the currently available maximum Kd values are not reflective of the true potential. Testing this method against the well-studied set of protein targets [9] would therefore be a logical progression, and is the subject of our current work. It will also be important to study the compensating effect of ligand-protein interactions noted above, as this was not included in the current analysis. In addition, it would be interesting to quantitatively assess the two-particle approximation used in IFST, to develop more accurate models. However, these are useful goals, because knowing the potential of a protein to be ligandable helps focus drug discovery efforts on targets that can be inhibited effectively thus saving time, effort and money.

Conclusion

We have presented a new method capable of generating and evaluating all possible combinations of hydration sites on the surface of a protein. Based on the identification and quantification of the most weakly bound combination of hydration sites, we predicted the ligandability of a set of BRDs. The majority of BRDs are predicted to have similar ligandability. This is in line with the maximal observed Kd values determined by ITC for BRDs, which are mostly in the 10–100 nM range. The best target for binding a small molecule was predicted to be BRD9 and the worst ATAD2. This is in good agreement with the maximal observed Kd values determined by ITC for BRD9 (14 nM) and ATAD2 (1500 nM). Based on the maximal observed Kd values, it was concluded that the majority of BRDs have good inhibitors already developed. There are two notable cases where the maximal observed Kd values are lower than might be expected based on their predicted ligandability. Thus we predict that inhibitors for BPTF and BRD7 in the 10–100 nM range will be developed without significant difficulty. We also predicted the ligandability of the 14 BRDs with protein structures available in the PDB but with no Kd values determined by ITC. As with the majority of other BRDs, we expect that it will not be particularly challenging to develop inhibitors in the 10–100 nM range. However, we predict that significant effort will be required to develop inhibitors of PBRM1(1). Conversely, TAF1L(2), PBRM1(4) and TAF1(2) are expected to be highly ligandable targets. As a final result, we have assembled an experimental dataset of maximal binding affinities for the BRDs, which we hope will serve as a community resource to assist in medicinal chemistry efforts against this target family (www.tcm.phy.cam.ac.uk/~sv375/BRDs/table/Kd_for_BRDs.html).

Acknowledgments

We would like to thank S Knapp and O Fedorov for helpful discussions and the Cambridge High Performance Computing Service for technical support. Work in the DJH laboratory is supported by the Medical Research Council under grant ML/L007266/1. All calculations were performed using the Darwin Supercomputer of the University of Cambridge High Performance Computing Service (www.hpc.cam.ac.uk/) provided by Dell Inc. using Strategic Research Infrastructure Funding from the Higher Education Funding Council for England, and were funded by the EPSRC under grants EP/F032773/1 and EP/J017639/1.

Footnotes

  • SOLVATE is written by Helmut Grubmüller Theoretical Biophysics Group Institut für Medizinische Optik Ludwig-Maximilians-Universität München München Germany. Copyright © 1996–2013. All rights reserved.

Please wait… references are loading.