Author manuscript; available in PMC: 2023 Dec 8.
Published in final edited form as: J Chem Inf Model. 2020 Oct 14;60(10):4594–4602. doi: 10.1021/acs.jcim.0c00542

Guiding Conventional Protein–Ligand Docking Software with Convolutional Neural Networks

Huaipan Jiang 1, Mengran Fan 2, Jian Wang 3, Anup Sarma 4, Shruti Mohanty 5, Nikolay V Dokholyan 6, Mehrdad Mahdavi 7, Mahmut T Kandemir 8
PMCID: PMC10706896  NIHMSID: NIHMS1945411  PMID: 33100014

Abstract

High-performance computational techniques have brought significant benefits to drug discovery efforts in recent decades. One of the most challenging problems in drug discovery is protein–ligand binding pose prediction. To predict the most stable structure of the complex, conventional structure-based molecular docking methods depend heavily on the accuracy of scoring or energy functions (as approximations of affinity) for each pose of the protein–ligand complex to guide the search effectively through an exponentially large solution space. However, due to the heterogeneity of molecular structures, existing scoring methods are either tailored to a particular data set or fail to exhibit high accuracy. In this paper, we propose a convolutional neural network (CNN)-based model that learns to predict the stability of a protein–ligand complex and demonstrates the ability of CNNs to improve existing docking software. Evaluation on the PDBbind data set indicates that our approach reduces the execution time of the traditional docking-based method while improving its accuracy. Our code, experiment scripts, and pretrained models are available at https://github.com/j9650/MedusaNet.

Graphical Abstract


INTRODUCTION

Modern drug development is significantly bottlenecked by the expensive and time-consuming process of discovering biologically active compounds. It takes several years and millions of dollars to develop a new drug,1–4 and the number of new drugs brought to market by the global biotechnology and pharmaceutical industries has declined fairly steadily even with modern automated high-throughput screening technology, a phenomenon known as Eroom's law.5 Computational docking6–8 provides an effective and relatively inexpensive way of identifying lead compounds and estimating their relative binding affinity and binding mode. With computational docking, the bottleneck in the drug discovery pipeline shifts from experimentally analyzing a limited number of known protein structures to finding small molecules that bind with high affinity to protein targets. Improving the docking accuracy of binding affinity and mode is therefore paramount for the success rate of virtual drug screening in the drug discovery pipeline.9,10

The two key steps involved in any computational docking mechanism are sampling and scoring. Although recent research efforts have led to significant improvements in sampling and scoring algorithms,11,12 improving the accuracy of docking methods remains a key challenge for an efficient computational screening process. The first step of the docking approach involves sampling the conformational space of the docking molecule. Since sampling the whole astronomical conformational space is impractical, various sampling algorithms have been explored in the past. Most well-known docking software such as ProDock,13 MCDOCK,14 and MedusaDock15–19 adopt Monte Carlo (MC20)-based algorithms. GOLD21 and AutoDock22 follow a genetic algorithm (GA23)-based approach, which excels at finding a raw initial pose but still fails to identify the final, precise pose. Different from the GA-based methods, FlexX24 employs a fragment-based method,25 which yields more hydrophilic hits but covers a more limited chemical space.

Apart from sampling, docking accuracy is predominantly dictated by the evaluation of docking energy, widely known as the scoring function.26 The goal of a scoring function is to estimate the free energy of the receptor–ligand complex in order to predict its binding affinity. The scoring function can be used to determine the binding mode and site of a ligand, predict binding affinity, and identify potential drug leads for a given protein target. Consequently, a precise and reliable scoring function is critical to the success of any docking method.9 Traditional scoring functions27 can be categorized as physically based28–34 or knowledge-based.35–39 Physically based scoring functions, designed to quantify interatomic interactions and predict the affinity of the ligand–receptor complex during molecular docking, rely strongly on force field parameterization. Accurate descriptions of interatomic interactions are computationally expensive and prohibitive for a sizable virtual screening campaign. On the other hand, knowledge-based scoring functions rely on the inverse Boltzmann statistical principle to derive the desired pairwise potentials; their main limitation is the difficulty of defining the reference state. The framework considered in this work, MedusaDock,15–18 employs a physically based scoring function, MedusaScore, which consists of a bonding model, a van der Waals (VDW40) interaction model, an explicit hydrogen-bonding model, and a solvent model.

In this work, we propose a new machine learning (ML)-based scoring function to predict protein–ligand binding affinity, in contrast to traditional physically based and knowledge-based scoring functions. ML has contributed to multiple fields of science and technology with broad and disruptive impact.41–46 In particular, convolutional neural networks (CNNs) have been used for protein–ligand docking in recent studies.47–52 For example, AtomNet48 trains a CNN model on protein–ligand binding poses to predict how likely a given pose is to be stable. In general, most existing CNN models for docking take a binding pose as the input, extract features among the atoms in the protein–ligand complex, and output the likelihood of pose stability.

Although these CNN-based docking approaches can achieve reasonable docking accuracy, their main bottleneck is generating the docking poses. For example, some approaches such as AtomNet exhaustively sample all possible binding poses for each protein in the DUD-E53 and ChEMBL-2054 data sets, which is extremely time consuming. Another CNN-based approach, KDeep,52 utilizes the native binding pose in PDBbind55 as input for evaluating the binding affinity but cannot work with negative binding data. In other words, a CNN model cannot directly propose a good binding pose for a given protein–ligand complex; it can only predict whether a given pose is good for a given protein–ligand pair, rather than proposing the best binding poses. Motivated by these observations, in this work we show how to integrate a CNN model with traditional docking software to identify the best binding poses directly. This combined approach improves both execution time and docking accuracy for energy-based traditional docking software.

More specifically, we first propose a three-dimensional (3D) CNN model (MedusaNet), shown in Figure 1, to predict how likely a pose is to be a good binding pose. This 3D CNN model includes the MedusaDock energy as a feature. We first evaluate the performance of our CNN model on a PDBbind-derived data set. We next demonstrate the ability of our CNN model to accelerate the docking process and improve the docking accuracy of MedusaDock. Finally, we characterize the performance of our CNN model with different root-mean-square deviation (RMSD) thresholds.56 We want to emphasize that our approach is general: although we use MedusaDock to generate candidate poses, the framework can be integrated with any applicable docking software and CNN model.

Figure 1.

Workflow of MedusaNet. The CNN model takes the 3D grid of a protein–ligand complex as the input. The first several convolution layers extract features including bonds and interactions between nonbonded atoms. The fully connected layers reorganize the features and calculate the probability that the input binding pose is a stable binding pose. The first convolution layer has 5 × 5 × 5 filters, while all other convolution layers contain 3 × 3 × 3 filters. The fully connected layers have a size of 1024, and the last layer has two neurons indicating the binding probability. Rectified linear unit (ReLU) activation is applied to each convolution layer and fully connected layer. A dropout layer with a dropout ratio of 0.5 is applied after each fully connected layer to reduce overfitting. The output of the CNN model is a value between 0 and 1, which indicates the probability that the pose is a good binding pose.

METHODS

In this section, we first discuss the data set used in the evaluation and then expound on how we design and train our CNN model to improve the performance of MedusaDock.

Data Set.

We train our CNN model on the refined set of the PDBbind database,55 further curated by Wang et al.16 Complexes with more than one ligand and complexes with fewer than two rotatable bonds were removed, since the latter lack flexibility in the ligand. After this curation, the data set contains 3875 protein–ligand complexes with experimentally solved 3D structures.

To feed the CNN with training data, we first obtain the center of the binding site by averaging the coordinates of all ligand atoms in the native-state complex. We then crop a cube of 20 × 20 × 20 Å3 around the binding site (first step in Figure 1). All atoms (both protein and ligand atoms) inside this cube are retained for training. In the default case, the input grid has a resolution of 20 × 20 × 20, meaning that each voxel represents the atoms inside a 1 × 1 × 1 Å3 volume.

To train our CNN model, we randomly select 3100 of the 3875 proteins as training data and the remaining 775 proteins as testing data. For each protein, the data consist of three parts: (1) the ground-truth data, (2) the positive data, and (3) the negative data. The ground-truth data are the X-ray crystallographic structures of the protein–ligand complexes. For the positive and negative data, we run MedusaDock for 11 attempts with different random seeds, which generates a total of 230 826 poses for training and 56 098 poses for testing. We consider a pose "positive" if its RMSD from the crystallographic structure (ground-truth pose) is less than 2 Å. In our computational experiments, we only include the atom types used by MedusaDock; there are 10 distinct atom types among all generated poses in the MedusaDock output. We distinguish protein atom types from ligand atom types, except for hydrogen. Metals are not included in our atom types. All atom types are listed in Table S1. As one would expect, each training/testing sample is formed as a four-dimensional (4D) tensor, where the first three dimensions encode the spatial information (X, Y, Z) and the last dimension indicates the atom type in each small cube (1 × 1 × 1 Å3).
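
As a concrete illustration of the labeling rule, the following minimal sketch labels a docked pose against the crystal pose (the helper names are ours; it assumes both poses list the same heavy atoms in the same order and omits symmetry corrections):

```python
import numpy as np

def pose_rmsd(pose_xyz: np.ndarray, native_xyz: np.ndarray) -> float:
    """Heavy-atom RMSD (in Angstroms) between a docked pose and the crystal pose,
    assuming both arrays have shape (n_atoms, 3) with atoms in identical order."""
    diff = pose_xyz - native_xyz
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))

def label_pose(pose_xyz: np.ndarray, native_xyz: np.ndarray, threshold: float = 2.0) -> int:
    """Return 1 ("positive" pose) if the RMSD to the crystal structure is below the
    2 Angstrom threshold, and 0 ("negative" pose) otherwise."""
    return int(pose_rmsd(pose_xyz, native_xyz) < threshold)
```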

Input Grid Generation.

The 3D grid in Figure 1 is typically implemented as a 3D array, where the value of each element indicates how many atoms lie in the corresponding cube of the 3D grid. As an example, assume that we have a 3D grid that represents the space between [−10 Å, −10 Å, −10 Å] and [10 Å, 10 Å, 10 Å], with the origin at the center of the binding site. We first discretize this 3D space into a 3D grid with a resolution of [20 × 20 × 20], where each voxel represents a volume of [1 Å × 1 Å × 1 Å]. Subsequently, we place each atom within this 3D space into the grid based on its 3D coordinate. More specifically, if an atom has the coordinate (0.2 Å, 3.4 Å, 1.5 Å), we first calculate its index in the 3D grid (here, [10, 13, 11]) and then set the value of the 3D array to 1 at that entry. Additionally, using a 4D array, one can distinguish the types of atoms in the protein–ligand complex; the fourth dimension indicates the type of the atoms in that cube.
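
The following sketch illustrates this grid construction (function and parameter names are ours; the occupancy convention follows the description above, with one channel per atom type):

```python
import numpy as np

def voxelize(atom_coords, atom_types, n_types, center, box=20.0, resolution=1.0):
    """Place atoms into a 4D occupancy grid centered on the binding site.

    atom_coords : (n_atoms, 3) array of Cartesian coordinates in Angstroms.
    atom_types  : (n_atoms,) array of integer type indices in [0, n_types).
    center      : (3,) coordinates of the binding-site center.
    Returns a grid of shape (n_bins, n_bins, n_bins, n_types).
    """
    n_bins = int(box / resolution)                     # 20 voxels per axis by default
    grid = np.zeros((n_bins, n_bins, n_bins, n_types), dtype=np.float32)
    # Shift coordinates so the grid spans [-box/2, +box/2) around the center.
    shifted = np.asarray(atom_coords) - np.asarray(center) + box / 2.0
    idx = np.floor(shifted / resolution).astype(int)
    for (i, j, k), t in zip(idx, atom_types):
        if 0 <= i < n_bins and 0 <= j < n_bins and 0 <= k < n_bins:
            grid[i, j, k, t] = 1.0                     # mark occupancy for this atom type
    return grid
```

With the example above (center at the origin), the atom at (0.2 Å, 3.4 Å, 1.5 Å) is shifted to (10.2, 13.4, 11.5) and lands in voxel [10, 13, 11], matching the text.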

3D CNN Model.

Traditional energy-based docking approaches suffer from both accuracy and efficiency limitations. Although they usually propose several candidate poses for each protein based on some scoring function, the scoring-function threshold for deciding whether a pose is good differs across protein–ligand complexes. Hence, to find the best binding poses, docking attempts need to be performed a large number of times, or even exhaustively, to search all possible binding poses in the search area and select the most promising ones. Note that selecting a few poses among thousands of proposed poses is intractable. On the other hand, a CNN model can learn patterns from the 3D grid of the poses and predict whether each pose is good or bad. Recent studies47–51 reveal that CNNs perform well on this binary classification task.

Our CNN model (MedusaNet), shown in Figure 2, takes the 3D grid of a protein–ligand complex as the input. The first several convolution layers extract features describing bonds and other interactions between nearby atoms. The following fully connected layers reorganize the features and calculate the probability that the input binding pose is a stable binding pose. The first convolution layer has 5 × 5 × 5 filters, while all other convolution layers contain 3 × 3 × 3 filters; the first layer requires a larger filter to collect more spatial information. The fully connected layers have a size of 1024, and the last layer has two neurons, which collectively capture the probability of binding or not binding. ReLU activation is applied to each convolution layer and fully connected layer. A dropout layer with a dropout ratio of 0.5 is applied after each fully connected layer to reduce the probability of overfitting. The output of the CNN is a value between 0 and 1, which indicates the probability that the pose is a good binding pose. We set the cutoff threshold to 0.5, so poses with an output probability of less than 0.5 are considered "bad poses," whereas the others are considered "good poses." Compared to previous neural networks, our proposed CNN model does not employ pooling layers, to avoid resolution loss during training.57 We also employ the MedusaDock energy score as a feature in our CNN model, injecting it at the second fully connected layer, because the convolution operators cannot capture features that require high-dimensional or exponential calculations, whereas the MedusaDock energy score encodes such nonlinear features.
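
A compact sketch of this architecture is given below (assuming PyTorch; the per-layer filter counts, lack of padding, and the softmax readout are our assumptions, since the text specifies only the kernel sizes, the absence of pooling, the 1024-wide fully connected layers, the dropout ratio, and the injection point of the energy feature):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MedusaNetSketch(nn.Module):
    """Sketch of the described network: six 3D conv layers (5x5x5 then 3x3x3, no pooling),
    three fully connected layers of width 1024 with dropout 0.5, and the MedusaDock
    energy score injected at the second fully connected layer."""

    def __init__(self, in_channels: int = 10, grid: int = 20):
        super().__init__()
        chans = [32, 64, 64, 128, 128, 128]            # assumed filter counts (not from the paper)
        convs, prev, size = [], in_channels, grid
        for i, c in enumerate(chans):
            k = 5 if i == 0 else 3                     # 5x5x5 first layer, 3x3x3 afterwards
            convs.append(nn.Conv3d(prev, c, kernel_size=k))
            size = size - k + 1                        # spatial size after this (unpadded) layer
            prev = c
        self.convs = nn.ModuleList(convs)
        self.fc1 = nn.Linear(chans[-1] * size ** 3, 1024)
        self.fc2 = nn.Linear(1024 + 1, 1024)           # +1 for the MedusaDock energy feature
        self.fc3 = nn.Linear(1024, 2)                  # two output neurons: bad / good pose
        self.drop = nn.Dropout(0.5)

    def forward(self, voxels: torch.Tensor, energy: torch.Tensor) -> torch.Tensor:
        # voxels: (batch, channels, 20, 20, 20); energy: (batch, 1)
        x = voxels
        for conv in self.convs:
            x = F.relu(conv(x))
        x = x.flatten(start_dim=1)
        x = self.drop(F.relu(self.fc1(x)))
        x = torch.cat([x, energy], dim=1)              # inject the energy score at the second FC layer
        x = self.drop(F.relu(self.fc2(x)))
        return F.softmax(self.fc3(x), dim=1)[:, 1]     # probability that the pose is a good binding pose
```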

Figure 2.

Structure of our MedusaNet model (right) and how it is used to improve MedusaDock (left). The MedusaNet model includes six convolution layers, followed by three fully connected layers. The energy score of MedusaDock is added as a feature at the second fully connected layer. In the combined framework, MedusaDock takes a protein–ligand complex and generates candidate poses with energy scores. MedusaNet evaluates each of the poses generated by MedusaDock. The framework stops if MedusaNet determines that MedusaDock has generated a sufficient number of good poses; otherwise, MedusaDock is executed again with a new random seed.

MedusaDock.

MedusaDock starts by generating a stochastic rotamer library of the ligand and then shifts and rotates the rotamers into different poses. Based on the Monte Carlo algorithm and its scoring function, MedusaDock searches for the best-fit poses within the docking boundary box. During this search process, both the protein side chains and the ligand are repacked to imitate a realistic protein–ligand docking procedure. Additionally, to optimize the search and reduce the cost of evaluating the scoring function, MedusaDock separates the search into coarse docking and fine docking. In coarse docking, MedusaDock quickly identifies several good poses, which are then carefully checked during the fine docking step. To search the docking poses of a protein–ligand complex with MedusaDock, we have to keep running MedusaDock with different random seeds until we can no longer find a pose with lower energy. For some proteins, we need to run MedusaDock thousands of times, even if some good poses have already been found.

Using CNN to Improve MedusaDock.

In general, to solve protein–ligand docking problems with CNNs, the first step is to convert the protein and ligand molecule files into a 3D coordinate representation. In other words, the 3D coordinate of each atom is translated to an offset within a 3D grid, and each atom is placed into the corresponding cube. After the 3D/4D tensor is generated as the input to the CNN, the network applies multiple convolution layers to capture the energetics of the bonds and other interactions between the atoms. This information is captured by the convolution operations and extracted as features, which are then used by the fully connected layers to predict the stability of the pose. Recall that, since the 3D/4D array encodes the position of each and every atom, the pose of the protein–ligand complex is already implied by the input tensor.

Although previous CNN-based docking work achieves good accuracy on the docking-stability prediction problem, to our knowledge none of the existing approaches can generate the best binding poses directly. To solve the pose prediction problem for a protein–ligand complex, some previous works use traditional energy-based docking software58–60 to sample a large number of possible binding poses and employ a CNN to predict, for each pose, whether it constitutes a stable docking. As a result, such CNN-based approaches cannot explicitly improve the docking process. Hence, in this paper, in addition to proposing a CNN model (MedusaNet), we use it to accelerate the execution and improve the accuracy of pose prediction.

In the original MedusaDock, to find the best binding pose for a protein–ligand complex, we run MedusaDock for multiple iterations until the global minimum energy converges.15–18 More specifically, we record the energy of the output pose from each MedusaDock run and update the global minimum energy whenever we find a pose with lower energy. We keep running MedusaDock with different random seeds until we can no longer find a lower-energy pose, at which point the best binding pose is believed to have been found. For some complex proteins, this can take up to 2000 MedusaDock runs, which corresponds to extremely long execution times. To reduce the execution time, we terminate the process early by checking the output poses with our CNN model. As shown in Figure 2, after each MedusaDock run, we send all of the candidate poses to MedusaNet, which predicts whether each pose is a good docking pose. The search terminates early when MedusaNet determines that k good poses have been found by MedusaDock, instead of running until the global minimum energy converges. We show in the Results section that this early-termination strategy reduces the execution time significantly while finding good binding poses for more proteins.
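
The control loop of this early-termination strategy can be sketched as follows (all interfaces are hypothetical placeholders for a MedusaDock wrapper and a trained MedusaNet; the batch size, cutoff, and final energy-ranking of the accepted poses are illustrative assumptions):

```python
def cnn_guided_docking(run_medusadock, score_pose, k=8, batch=64,
                       max_attempts=2000, cutoff=0.5):
    """Run docking attempts in batches and stop once the CNN has accepted k poses.

    run_medusadock(seed) -> list of (pose, energy) candidates from one attempt.
    score_pose(pose, energy) -> CNN probability that the pose is a good binding pose.
    """
    good_poses, attempts = [], 0
    while attempts < max_attempts and len(good_poses) < k:
        for seed in range(attempts, attempts + batch):       # one batch of docking attempts
            for pose, energy in run_medusadock(seed):
                if score_pose(pose, energy) >= cutoff:        # CNN accepts the pose
                    good_poses.append((pose, energy))
        attempts += batch
    # Return k accepted poses, ranked by MedusaDock energy (ranking choice is ours).
    return sorted(good_poses, key=lambda pe: pe[1])[:k], attempts
```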

RESULTS

We evaluate the performance of our CNN model using the following steps. First, we evaluate the scoring performance of our CNN model on a PDBbind-derived data set.55 Next, we utilize the CNN model to boost the docking process and increase the docking accuracy. Finally, we evaluate our framework by employing different thresholds to define good poses.

Evaluating the Scoring Capability of the CNN Model.

For this experiment, we compile a data set of 3875 protein–ligand complexes from PDBbind v.2016 by retaining the complexes that include only one ligand. Subsequently, the 3875 proteins are randomly divided into training (3100 proteins) and testing (775 proteins) sets. For each protein, we use MedusaDock to perform 11 docking attempts with 11 different random seeds, which generates ~286 K candidate binding poses for those proteins. We calculate the root-mean-square deviation (RMSD) of every candidate pose with respect to the native binding pose. If a pose has an RMSD of less than 2 Å, we consider it a good pose; otherwise, we consider it a bad pose. We train the CNN model with the native poses, good poses, and bad poses of the proteins in the training set and evaluate the model on the poses of the proteins in the testing set. We applied cross-validation, randomly splitting the data set 10 times and repeating the experiment for each split. For all results below, we show the average, minimum, and maximum results across all 10 validations.

We show the performance of our CNN model (MedusaNet) in Table 1 and compare it against three other approaches: the MedusaDock energy score, a random forest (RF) approach with 100 decision trees, and another CNN model, AtomNet.48 We compare MedusaNet with the RF approach to check whether a simple machine learning method can improve the accuracy, and against AtomNet to check whether our model performs better than other CNN models. AtomNet is an existing CNN model for predicting binding stability. It takes the 3D grid (with a resolution of 20 × 20 × 20) as the input feature map and applies four convolution layers: 128 filters of size 5 × 5 × 5, followed by three layers of 256 filters of size 3 × 3 × 3 each. The resulting feature map is then passed to two fully connected layers of size 1024. We use three evaluation metrics in our experiments: accuracy, AUC, and average RMSD. Accuracy is calculated by dividing the sum of true positives and true negatives by the total number of poses; it indicates how many poses we predict correctly. AUC is the area under the receiver-operating characteristic (ROC) curve, which plots the true-positive rate against the false-positive rate at different classification thresholds. A classification model with AUC = 1 is a perfect classifier, whereas a classifier with AUC = 0.5 is essentially a random selector. RMSD is calculated with respect to the crystal structure.
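
For reference, the three metrics can be computed as in the following sketch (assuming scikit-learn and NumPy; the array names are ours):

```python
import numpy as np
from sklearn.metrics import accuracy_score, roc_auc_score

def evaluate_poses(y_true, y_prob, selected_rmsds, cutoff=0.5):
    """Compute accuracy, AUC, and average RMSD for a set of scored poses.

    y_true         : 1 for poses with RMSD < 2 A to the crystal structure, else 0.
    y_prob         : predicted probability that each pose is a good pose.
    selected_rmsds : RMSDs (in Angstroms) of the poses selected by the method.
    """
    y_pred = (np.asarray(y_prob) >= cutoff).astype(int)
    acc = accuracy_score(y_true, y_pred)        # (TP + TN) / total number of poses
    auc = roc_auc_score(y_true, y_prob)         # area under the ROC curve
    avg_rmsd = float(np.mean(selected_rmsds))   # average RMSD w.r.t. the crystal structure
    return acc, auc, avg_rmsd
```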

Table 1.

Evaluation of MedusaNet against Other Approaches in Terms of Accuracy, AUC, and Average RMSDa

evaluation metric        MedusaDock   random forest   AtomNet   MedusaNet
accuracy        avg      N/A          0.964           0.741     0.855
                min      N/A          0.962           0.628     0.705
                max      N/A          0.968           0.872     0.930
AUC             avg      0.474        0.503           0.863     0.893
                min      0.462        0.500           0.849     0.868
                max      0.489        0.508           0.885     0.915
average RMSD    avg      7.366        3.843           5.421     4.813
                min      7.223        1.967           4.945     3.960
                max      7.519        6.387           5.895     5.694
a We evaluate the four approaches on the PDBbind test set with the binding poses of 775 proteins, applying cross-validation 10 times. For each approach and each evaluation metric, we report the average, minimum, and maximum results across all validations.

From Table 1, we observe that MedusaNet outperforms all other approaches in terms of AUC, indicating that MedusaNet has a better pose classification capability. Both CNN models (AtomNet and MedusaNet) generate poses with lower average RMSD than the poses proposed by MedusaDock. The accuracy of MedusaDock is not available since the MedusaDock energy score cannot be used to determine whether a pose is good or not; it only indicates that a pose with a lower MedusaDock energy score potentially has a lower RMSD than a pose with a higher MedusaDock energy score for the same protein. Although the random forest approach yields good accuracy and average RMSD, its AUC is roughly 0.5. In fact, the random forest predicted nearly all poses as negative and selected only a few as good poses. Since most poses in the data set are bad poses, such a selection achieves good accuracy while the AUC remains near 0.5, indicating that the random forest fails at this task.

In the following sections, we show that our proposed CNN model not only achieves better scoring capability for protein–ligand complexes but can also be leveraged to guide traditional energy-based docking software to be more efficient and effective.

CNN-Accelerated MedusaDock Docking with Improved Accuracy.

In MedusaDock, we perform repeated docking attempts with different random seeds until the minimum energy converges, aiming to find the global minimum-energy pose. Specifically, we first perform 64 MedusaDock docking attempts with different random seeds and record the pose with the lowest energy. Next, we perform another 64 docking attempts and check whether any pose with a lower energy is found in the 65th to 128th attempts. In general, we keep doubling the number of attempts as long as we find a pose with a lower energy, occasionally resulting in thousands of docking attempts. After the minimum energy converges, we select the k lowest-energy poses as the docking results; throughout this work, k denotes the number of poses selected per test. This convergence strategy is sometimes inefficient because (1) the number of docking attempts grows significantly if the minimum energy decreases slowly but steadily, and (2) the sampling process can stall at a local minimum, resulting in low docking accuracy. We utilize the CNN model to circumvent both difficulties. Specifically, the CNN judges whether a pose is good or bad: MedusaDock can stop launching new docking attempts once the expected number of good poses has been generated, which markedly curtails the total number of docking attempts, and the CNN can force MedusaDock to explore new ligand poses if the minimum energy converges but the minimum-energy pose is judged to be bad, thus preventing the docking procedure from becoming trapped in a local minimum.
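
A sketch of this convergence-based baseline with the two CNN interventions layered on top might look as follows (the interfaces and batch scheduling are simplified assumptions, not the actual MedusaDock driver):

```python
def guided_convergent_docking(run_batch, cnn_is_good, k=8, batch=64, max_attempts=2048):
    """Baseline: keep running batches of docking attempts while a lower-energy pose is
    still being found.  CNN interventions: (1) stop early once k CNN-approved poses exist;
    (2) keep exploring if the energy has converged but the best pose is judged bad."""
    poses, best_energy, attempts = [], float("inf"), 0
    while attempts < max_attempts:
        improved = False
        for pose, energy in run_batch(attempts, batch):      # one batch of docking attempts
            poses.append((pose, energy))
            if energy < best_energy:
                best_energy, improved = energy, True
        attempts += batch
        good = [(p, e) for p, e in poses if cnn_is_good(p, e)]
        if len(good) >= k:                                   # intervention (1): early stop
            return good[:k], attempts
        best_pose, _ = min(poses, key=lambda pe: pe[1])
        if not improved and cnn_is_good(best_pose, best_energy):
            break                                            # energy converged on an approved pose
        # otherwise, intervention (2) keeps the loop going past apparent convergence
    return sorted(poses, key=lambda pe: pe[1])[:k], attempts
```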

We employ two CNN models (AtomNet and our MedusaNet) to guide MedusaDock. We test both models by combining them with MedusaDock through the aforementioned strategy on a data set of 100 protein–ligand complexes randomly selected from the PDBbind testing set and compare both CNN-augmented versions of MedusaDock (MedusaDock-AtomNet and MedusaDock-MedusaNet) with the original MedusaDock. Recall that we randomly choose the training and testing sets 10 times in the cross-validation; the 100 selected complexes are not used for model training in any validation. We observe that the original MedusaDock performs 745 docking attempts on average across the 10 cross-validation sets to reach a convergent minimum energy (Figure 3). In contrast, MedusaDock-AtomNet and MedusaDock-MedusaNet need only 370 and 557 docking attempts, respectively, when 128 good poses are selected. If only eight good poses are selected, the number of docking attempts drops to 129 and 148, respectively, compared to the original 745 attempts. Out of the 100 protein–ligand complexes, MedusaDock, MedusaDock-AtomNet, and MedusaDock-MedusaNet generate a near-native ligand pose (RMSD < 2 Å) for 44.6, 41.3, and 49.4 complexes on average, respectively, when eight poses are selected. If 128 poses are selected for each complex, MedusaDock, MedusaDock-AtomNet, and MedusaDock-MedusaNet propose at least one good pose for 67.7, 68.7, and 71.7 complexes on average, respectively. Further, for a large number of selected poses (e.g., 128), CNN-guided MedusaDock may need more attempts than the original MedusaDock; for example, on one validation set, MedusaDock-MedusaNet takes 1.27× as many attempts as the original MedusaDock to select 128 good poses per complex. This indicates that, for some proteins, performing docking attempts until the energy converges is not a sufficient termination condition. From Figure 3a, one can also observe that the error bars on the number of docking attempts are very large, indicating that some proteins require many MedusaDock attempts while others require only a few. This is because MedusaDock employs a stochastic (Monte Carlo) search algorithm: for some proteins, MedusaDock finds good poses in early attempts, while for others the good poses only appear in later attempts. Also, for some complex proteins, MedusaDock can find only a few good poses in total. As a result, many more attempts are needed for the "unlucky" proteins than for the "lucky" ones.

Figure 3.

Comparison between the performance of the original MedusaDock and MedusaDock guided by different CNN models. (a) Number of docking attempts performed by different MedusaDock versions. (b) Number of protein–ligand complexes that have good ligand poses generated by different versions of MedusaDock. Good poses have RMSD < 2 Å and bad poses have RMSD > 2 Å. The test set is composed of 100 protein–ligand complexes randomly selected from the PDBbind test set; these 100 proteins are not used for training. We applied cross-validation 10 times. The bars indicate the average results, and the error bars show the minimum/maximum results across the 10 validations.

Investigating the Effect of Different RMSD Thresholds on Docking.

In the previous evaluation, we used 2 Å as the RMSD threshold to determine whether a binding pose is good. A pose with an RMSD greater than 2 Å is considered a bad pose, while poses with RMSD values less than 2 Å are good poses. However, the appropriate RMSD threshold may differ across complexes. In this section, we evaluate the performance of our framework with different RMSD thresholds (Figure 4) and check whether our approach succeeds at all thresholds. More specifically, we set the RMSD threshold to 2.00, 2.25, 2.50, 2.75, and 3.00 Å and perform the convergent docking on the same test set of 100 protein–ligand complexes. Again, we applied cross-validation 10 times and report the average, minimum, and maximum results across the 10 validation sets for each experimental setting.

Figure 4.

Efficiency and accuracy improvement of CNN-guided MedusaDock with different RMSD thresholds. (a) Normalized number of docking attempts performed by CNN-guided MedusaDock when different RMSD thresholds are set and different numbers of candidate poses (k) are selected. (b) Number of successful proteins when different RMSD thresholds are set and different numbers of candidate poses (k) are selected. Blue refers to the number of good poses generated by the original MedusaDock, whereas orange refers to the number of good poses generated by CNN-guided MedusaDock. We applied cross-validation 10 times. The bars show the average results across the 10 validation sets, and the error bars show the minimum/maximum results.

In general, for both approaches, the number of successful proteins increases as the RMSD threshold increases. For nearly all RMSD thresholds and all numbers of selected poses, MedusaDock-MedusaNet performs better than the original MedusaDock. The number of docking attempts for MedusaDock-MedusaNet increases as the number of selected poses increases. In most cases, MedusaDock-MedusaNet performs fewer attempts than the original MedusaDock, except on some validation sets. Therefore, our framework achieves significant acceleration while enabling more protein–ligand complexes to find at least one good pose within the designated number of selected poses.

DISCUSSION

Traditional energy-based docking software has limited success in terms of both accuracy and execution time. In fact, existing approaches can take up to a week, rendering the selection of the best pose from thousands of candidate poses intractable in practice. In this work, we propose a new strategy that leverages a CNN to improve both docking accuracy and efficiency. Although we integrated the CNN into MedusaDock as a proof of concept, our strategy is applicable to any docking software.

Our proposed CNN model (MedusaNet) is trained with the candidate poses generated by MedusaDock. Compared to some previous works, which only utilize ground-truth poses, our training data include both negative and positive poses. We do not employ pooling layers in the network, to avoid the resolution loss reported in previous studies.50,57 Additionally, while standard CNN operators mainly capture features expressible through linear calculations, the MedusaDock energy function contains high-dimensional polynomial, exponential, and even logarithmic terms, suggesting that similarity should be measured on a manifold rather than in a simple Euclidean space. As a result, our 3D CNN model includes the MedusaDock energy as an input feature and receives it at the second fully connected layer.

Finally, previous CNN models cannot propose the best docking poses directly and require large amounts of computation over many possible poses. To address this bottleneck, we integrate our CNN model with MedusaDock to reduce the execution time. Our results indicate that the integrated framework improves docking accuracy by helping more protein–ligand complexes find a good pose while reducing the computation spent on evaluating a large number of possible poses.

Supplementary Material

Supplement

ACKNOWLEDGMENTS

We acknowledge the support from the National Institutes of Health and the National Science Foundation. N.V.D. and J.W. are supported by the National Institutes of Health (2R01 GM114015 and 1R35 GM134864 to N.V.D.) and the Passan Foundation. They are also supported by the National Center for Advancing Translational Sciences, National Institutes of Health, through Grant UL1 TR002014. M.T.K. is supported by NSF grants 1763681, 1629915, 1629129, 1931531, 0821527, 2040667, and 1439057. H.J. and M.F. are also supported by NSF grant 1439057. A.S. is supported by NSF grant 1955815. The content of this paper is solely the responsibility of the authors and does not necessarily represent the official views of the NIH or NSF.

Footnotes

The authors declare no competing financial interest.

ASSOCIATED CONTENT

Supporting Information

The Supporting Information is available free of charge at https://pubs.acs.org/doi/10.1021/acs.jcim.0c00542.

Distinct atom types used in the CNN models; accuracy results among different algorithms for selecting different numbers of best candidate poses; speedup results among different algorithms for selecting different numbers of best candidate poses; number of MedusaDock attempts for different RMSD thresholds; number of proteins for which at least one good pose was successfully found at different RMSD thresholds; and IDs of the proteins in our refined PDBbind data set (PDF)

Contributor Information

Huaipan Jiang, Department of Computer Science and Engineering, Pennsylvania State University, State College 16802, United States.

Mengran Fan, Department of Computer Science and Engineering, Pennsylvania State University, State College 16802, United States.

Jian Wang, Departments of Pharmacology and Biochemistry and Molecular Biology, Pennsylvania State College of Medicine, Hershey 17033, United States.

Anup Sarma, Department of Computer Science and Engineering, Pennsylvania State University, State College 16802, United States.

Shruti Mohanty, Department of Computer Science and Engineering, Pennsylvania State University, State College 16802, United States.

Nikolay V. Dokholyan, Departments of Pharmacology and Biochemistry and Molecular Biology, Pennsylvania State College of Medicine, Hershey 17033, United States; Departments of Chemistry and Biomedical Engineering, Pennsylvania State University, State College 16802, United States

Mehrdad Mahdavi, Department of Computer Science and Engineering, Pennsylvania State University, State College 16802, United States.

Mahmut T. Kandemir, Department of Computer Science and Engineering, Pennsylvania State University, State College 16802, United States

REFERENCES

  • (1).Biopharmaceutical Research & Development: The Process Behind New Medicines; PhRMA, 2015. [Google Scholar]
  • (2).Morgan S; Grootendorst P; Lexchin J; Cunningham C; Greyson D The cost of drug development: a systematic review. Health Policy 2011, 100, 4–17. [DOI] [PubMed] [Google Scholar]
  • (3).Dickson M; Gagnon JP The cost of new drug discovery and development. Discovery Med. 2009, 4, 172–179. [PubMed] [Google Scholar]
  • (4).Mullin R Drug development costs about $1.7 billion. Chem. Eng. News 2003, 81, No. 8. [Google Scholar]
  • (5).Scannell JW; Blanckley A; Boldon H; Warrington B Diagnosing the decline in pharmaceutical R&D efficiency. Nat. Rev. Drug Discovery 2012, 11, 191–200. [DOI] [PubMed] [Google Scholar]
  • (6).Lengauer T; Rarey M Computational methods for biomolecular docking. Curr. Opin. Struct. Biol. 1996, 6, 402–406. [DOI] [PubMed] [Google Scholar]
  • (7).Cherkasov A; Muratov EN; Fourches D; Varnek A; Baskin II; Cronin M; Dearden J; Gramatica P; Martin YC; Todeschini R; et al. QSAR modeling: where have you been? Where are you going to. J. Med. Chem. 2014, 57, 4977–5010. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (8).Golbraikh A; Shen M; Xiao Z; Xiao Y-D; Lee K-H; Tropsha A Rational selection of training and test sets for the development of validated QSAR models. J. Comput.-Aided Mol. Des. 2003, 17, 241–253. [DOI] [PubMed] [Google Scholar]
  • (9).Wang Z; Sun H; Yao X; Li D; Xu L; Li Y; Tian S; Hou T Comprehensive evaluation of ten docking programs on a diverse set of protein-ligand complexes: the prediction accuracy of sampling power and scoring power. Phys. Chem. Chem. Phys. 2016, 18, 12964–12975. [DOI] [PubMed] [Google Scholar]
  • (10).Low Y; Uehara T; Minowa Y; Yamada H; Ohno Y; Urushidani T; Sedykh A; Muratov E; Kuz’min V; Fourches D; et al. Predicting drug-induced hepatotoxicity using QSAR and toxicogenomics approaches. Chem. Res. Toxicol. 2011, 24, 1251–1262. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (11).Sousa SF; Fernandes PA; Ramos MJ Protein-ligand docking: current status and future challenges. Proteins: Struct., Funct., Bioinf. 2006, 65, 15–26. [DOI] [PubMed] [Google Scholar]
  • (12).Kirchmair J; Laggner C; Wolber G; Langer T Comparative analysis of protein-bound ligand conformations with respect to catalyst’s conformational space subsampling algorithms. J. Chem. Inf. Model. 2005, 45, 422–430. [DOI] [PubMed] [Google Scholar]
  • (13).Roisman LC; Piehler J; Trosset JY; Scheraga HA; Schreiber G Structure of the interferon-receptor complex determined by distance constraints from double-mutant cycles and flexible docking. Proc. Natl. Acad. Sci. U.S.A. 2001, 98, 13231–13236. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (14).Liu M; Wang S MCDOCK: a Monte Carlo simulation approach to the molecular docking problem. J. Comput.-Aided Mol. Des. 1999, 13, 435–451. [DOI] [PubMed] [Google Scholar]
  • (15).Ding F; Yin S; Dokholyan NV Rapid flexible docking using a stochastic rotamer library of ligands. J. Chem. Inf. Model. 2010, 50, 1623–1632. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (16).Wang J; Dokholyan NV MedusaDock 2.0: Efficient and Accurate Protein-Ligand Docking With Constraints. J. Chem. Inf. Model. 2019, 59, 2509–2515. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (17).Ding F; Dokholyan NV Incorporating backbone flexibility in MedusaDock improves ligand-binding pose prediction in the CSAR2011 docking benchmark. J. Chem. Inf. Model. 2013, 53, 1871–1879. [DOI] [PubMed] [Google Scholar]
  • (18).Nedumpully-Govindan P; Jemec DB; Ding F CSAR Benchmark of Flexible MedusaDock in Affinity Prediction and Nativelike Binding Pose Selection. J. Chem. Inf. Model. 2016, 56, 1042–1052. [DOI] [PubMed] [Google Scholar]
  • (19).Yin S; Biedermannova L; Vondrasek J; Dokholyan NV MedusaScore: an accurate force field-based scoring function for virtual drug screening. J. Chem. Inf. Model. 2008, 48, 1656–1662. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (20).Hammersley J Monte Carlo Methods; Springer Science & Business Media, 2013. [Google Scholar]
  • (21).Verdonk ML; Cole JC; Hartshorn MJ; Murray CW; Taylor RD Improved protein-ligand docking using GOLD. Proteins: Struct., Funct., Bioinf. 2003, 52, 609–623. [DOI] [PubMed] [Google Scholar]
  • (22).Morris GM; Huey R; Lindstrom W; Sanner MF; Belew RK; Goodsell DS; Olson AJ AutoDock4 and AutoDockTools4: Automated docking with selective receptor flexibility. J. Comput. Chem. 2009, 30, 2785–2791. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (23).Whitley D A genetic algorithm tutorial. Statistics and Computing 1994, 4, 65–85. [Google Scholar]
  • (24).Kramer B; Rarey M; Lengauer T Evaluation of the FLEXX incremental construction algorithm for protein-ligand docking. Proteins: Struct., Funct., Bioinf. 1999, 37, 228–241. [DOI] [PubMed] [Google Scholar]
  • (25).Pellecchia M Fragment-based drug discovery takes a virtual turn. Nat. Chem. Biol. 2009, 5, 274–275. [DOI] [PubMed] [Google Scholar]
  • (26).Jain AN Scoring functions for protein-ligand docking. Curr. Protein Pept. Sci. 2006, 7, 407–420. [DOI] [PubMed] [Google Scholar]
  • (27).Li J; Fu A; Zhang L An overview of scoring functions used for protein-ligand interactions in molecular docking. Interdiscip. Sci.: Comput. Life Sci. 2019, 11, 320–328. [DOI] [PubMed] [Google Scholar]
  • (28).Yang Y; Lightstone FC; Wong SE Approaches to efficiently estimate solvation and explicit water energetics in ligand binding: the use of WaterMap. Expert Opin. Drug Discovery 2013, 8, 277–287. [DOI] [PubMed] [Google Scholar]
  • (29).Michel J; Tirado-Rives J; Jorgensen WL Prediction of the water content in protein binding sites. J. Phys. Chem. B 2009, 113, 13337–13346. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (30).Ross GA; Morris GM; Biggin PC Rapid and accurate prediction and scoring of water molecules in protein binding sites. PLoS One 2012, 7, No. e32036. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (31).Uehara S; Tanaka S AutoDock-GIST: Incorporating thermodynamics of active-site water into scoring function for accurate protein-ligand docking. Molecules 2016, 21, No. 1604. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (32).Kumar A; Zhang KY Investigation on the effect of key water molecules on docking performance in CSARdock exercise. J. Chem. Inf. Model. 2013, 53, 1880–1892. [DOI] [PubMed] [Google Scholar]
  • (33).Sun H; Li Y; Li D; Hou T Insight into crizotinib resistance mechanisms caused by three mutations in ALK tyrosine kinase using free energy calculation approaches. J. Chem. Inf. Model. 2013, 53, 2376–2389. [DOI] [PubMed] [Google Scholar]
  • (34).Chaskar P; Zoete V; Röhrig UF On-the-fly QM/MM docking with attracting cavities. J. Chem. Inf. Model. 2017, 57, 73–84. [DOI] [PubMed] [Google Scholar]
  • (35).Muegge I; Martin YC A general and fast scoring function for protein-ligand interactions: a simplified potential approach. J. Med. Chem. 1999, 42, 791–804. [DOI] [PubMed] [Google Scholar]
  • (36).Gohlke H; Hendlich M; Klebe G Knowledge-based scoring function to predict protein-ligand interactions. J. Mol. Biol. 2000, 295, 337–356. [DOI] [PubMed] [Google Scholar]
  • (37).Velec HF; Gohlke H; Klebe G DrugScoreCSD knowledge-based scoring function derived from small molecule crystal data with superior recognition rate of near-native ligand poses and better affinity prediction. J. Med. Chem. 2005, 48, 6296–6303. [DOI] [PubMed] [Google Scholar]
  • (38).Neudert G; Klebe G DSX: a knowledge-based scoring function for the assessment of protein-ligand complexes. J. Chem. Inf. Model. 2011, 51, 2731–2745. [DOI] [PubMed] [Google Scholar]
  • (39).Yang C-Y; Wang R; Wang S M-score: a knowledge-based potential scoring function accounting for protein atom mobility. J. Med. Chem. 2006, 49, 5903–5911. [DOI] [PubMed] [Google Scholar]
  • (40).Feinberg G; Sucher J General theory of the van der Waals interaction: A model-independent approach. Phys. Rev. A 1970, 2, 2395. [Google Scholar]
  • (41).Sternberg MJ; King RD; Lewis RA; Muggleton S Application of machine learning to structural molecular biology. Philos. Trans. R. Soc., B 1994, 344, 365–371. [DOI] [PubMed] [Google Scholar]
  • (42).Behler J Perspective: Machine learning potentials for atomistic simulations. J. Chem. Phys. 2016, 145, No. 170901. [DOI] [PubMed] [Google Scholar]
  • (43).Biswas R; Blackburn L; Cao J; Essick R; Hodge KA; Katsavounidis E; Kim K; Kim Y-M; Le Bigot E-O; Lee C-H; et al. Application of machine learning algorithms to the study of noise artifacts in gravitational-wave data. Phys. Rev. D 2013, 88, No. 062003. [Google Scholar]
  • (44).Sinclair C; Pierce L; Matzner S An application of machine learning to network intrusion detection. Proceedings 15th Annual Computer Security Applications Conference (ACSAC’99) 1999, 371–377. [Google Scholar]
  • (45).Zheng W; Tropsha A Novel variable selection quantitative structure-property relationship approach based on the k-nearest-neighbor principle. J. Chem. Inf. Comput. Sci. 2000, 40, 185–194. [DOI] [PubMed] [Google Scholar]
  • (46).Popova M; Isayev O; Tropsha A Deep reinforcement learning for de novo drug design. Sci. Adv. 2018, 4, No. eaap7885. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (47).Ballester PJ; Mitchell JB A machine learning approach to predicting protein-ligand binding affinity with applications to molecular docking. Bioinformatics 2010, 26, 1169–1175. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (48).Wallach I; Dzamba M; Heifets A AtomNet: a deep convolutional neural network for bioactivity prediction in structure-based drug discovery, arXiv:1510.02855. arXiv.org e-Print archive. https://arxiv.org/abs/1510.02855 (submitted Oct 10, 2015). [Google Scholar]
  • (49).Pereira JC; Caffarena ER; dos Santos CN Boosting docking-based virtual screening with deep learning. J. Chem. Inf. Model. 2016, 56, 2495–2506. [DOI] [PubMed] [Google Scholar]
  • (50).Ragoza M; Hochuli J; Idrobo E; Sunseri J; Koes DR Protein-ligand scoring with convolutional neural networks. J. Chem. Inf. Model. 2017, 57, 942–957. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (51).Wójcikowski M; Ballester PJ; Siedlecki P Performance of machine-learning scoring functions in structure-based virtual screening. Sci. Rep. 2017, 7, No. 46710. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (52).Jiménez J; Skalic M; Martinez-Rosell G; De Fabritiis G KDEEP: Protein-ligand absolute binding affinity prediction via 3D convolutional neural networks. J. Chem. Inf. Model. 2018, 58, 287–296. [DOI] [PubMed] [Google Scholar]
  • (53).Mysinger MM; Carchia M; Irwin JJ; Shoichet BK Directory of useful decoys, enhanced (DUD-E): better ligands and decoys for better benchmarking. J. Med. Chem. 2012, 55, 6582–6594. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (54).Mendez D; Gaulton A; Bento AP; Chambers J; De Veij M; Félix E; Magariños MP; Mosquera JF; Mutowo P; Nowotka M; et al. ChEMBL: towards direct deposition of bioassay data. Nucleic Acids Res. 2019, 47, D930–D940. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (55).Liu Z; Su M; Han L; Liu J; Yang Q; Li Y; Wang R Forging the basis for developing protein-ligand interaction scoring functions. Acc. Chem. Res. 2017, 50, 302–309. [DOI] [PubMed] [Google Scholar]
  • (56).Yusuf D; Davis AM; Kleywegt GJ; Schmitt S An alternative method for the evaluation of docking performance: RSR vs RMSD. J. Chem. Inf. Model. 2008, 48, 1411–1422. [DOI] [PubMed] [Google Scholar]
  • (57).Sieg J; Flachsenberg F; Rarey M In need of bias control: evaluating chemical data for machine learning in structure-based virtual screening. J. Chem. Inf. Model. 2019, 59, 947–961. [DOI] [PubMed] [Google Scholar]
  • (58).Koes DR; Baumgartner MP; Camacho CJ Lessons learned in empirical scoring with smina from the CSAR 2011 benchmarking exercise. J. Chem. Inf. Model. 2013, 53, 1893–1904. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (59).Hendlich M; Rippmann F; Barnickel G LIGSITE: automatic and efficient detection of potential small molecule-binding sites in proteins. J. Mol. Graphics Modell. 1997, 15, 359–363. [DOI] [PubMed] [Google Scholar]
  • (60).Trott O; Olson AJ AutoDock Vina: improving the speed and accuracy of docking with a new scoring function, efficient optimization, and multithreading. J. Comput. Chem. 2010, 31, 455–461. [DOI] [PMC free article] [PubMed] [Google Scholar]
