[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Next Article in Journal
Sub-Quantum Thermodynamics as a Basis of Emergent Quantum Mechanics
Previous Article in Journal / Special Issue
In Defense of Gibbs and the Traditional Definition of the Entropy of Distinguishable Particles
You seem to have javascript disabled. Please note that many of the page functionalities won't work as expected without javascript enabled.
 
 
Font Type:
Arial Georgia Verdana
Font Size:
Aa Aa Aa
Line Spacing:
Column Width:
Background:
Article

Entropy and Free Energy of a Mobile Loop Based on the Crystal Structures of the Free and Bound Proteins

by
Mihail Mihailescu
and
Hagai Meirovitch
*
Department of Computational and Systems Biology, University of Pittsburgh School of Medicine, 3059 BST3, Pittsburgh, PA 15260, USA
*
Author to whom correspondence should be addressed.
Entropy 2010, 12(8), 1946-1974; https://doi.org/10.3390/e12081946
Submission received: 12 June 2010 / Revised: 2 August 2010 / Accepted: 17 August 2010 / Published: 25 August 2010
(This article belongs to the Special Issue Configurational Entropy)
Figure 1
<p>An illustration of the HSMD reconstruction process of conformation <span class="html-italic">i</span> of a peptide consisting of three glycine residues, where for simplicity the oxygens and most of the hydrogens are discarded. At step <span class="html-italic">k</span>’ = 6 the TPs related to the “past” atoms <span class="html-italic">k</span>’ = 1…5 (depicted by full spheres) have already been determined and these atoms are kept fixed in their positions at conformation <span class="html-italic">i</span>. At this step (6) one calculates the TPs of bond angle α<span class="html-italic"><sub>k</sub></span> (defined by C’-Cα-N) and dihedral angle α<sub><span class="html-italic">k</span>−1</sub> (defined by C’-Cα-N-C’) which are related to C’, where <span class="html-italic">k</span> = 2<span class="html-italic">k</span>’ and thus <span class="html-italic">k</span> = 12. The TPs are obtained from an MD simulation where the as yet unreconstructed atoms <span class="html-italic">k</span>’ = 6…10 (the “future” atoms) are moved (depicted by empty spheres connected by dashed lines) while the past atoms <span class="html-italic">k</span>’ = 1…5 are kept fixed; notice that the future part should remain within the limits of the microstate and future-past interactions are taken into account. Small bins δα<sub><span class="html-italic">k</span>−1</sub> and δα<span class="html-italic"><sub>k</sub></span> are centered at the values α<sub><span class="html-italic">k</span>−1</sub> and α<span class="html-italic"><sub>k</sub></span> in <span class="html-italic">i</span>. The TP is calculated from the number of visits (<span class="html-italic">n</span><sub>visit</sub> Equation 11) of the future part to δα<sub><span class="html-italic">k</span>−1</sub> and δα<span class="html-italic"><sub>k</sub></span> <span class="html-italic">simultaneously</span> during the simulation. After <math display="inline"> <semantics> <mrow> <msup> <mtext>ρ</mtext> <mrow> <mtext>HS</mtext> </mrow> </msup> <mo stretchy="false">(</mo> <msub> <mi>α</mi> <mrow> <mn>11</mn> </mrow> </msub> <msub> <mi>α</mi> <mrow> <mn>12</mn> </mrow> </msub> <msub> <mrow> <mrow> <mo>|</mo> <mi>α</mi> </mrow> </mrow> <mrow> <mn>10</mn> </mrow> </msub> <mo>,</mo> <mo>⋯</mo> <mo>,</mo> <msub> <mi>α</mi> <mn>1</mn> </msub> <mo stretchy="false">)</mo> </mrow> </semantics> </math> (Equation 11) has been determined the coordinates of C’ (and O) are fixed at their positions in <span class="html-italic">i</span>, <span class="html-italic">i.e.</span>, they become “past” atoms and the process continues.</p> ">
Versions Notes

Abstract

:
A mobile loop changes its conformation from “open” (free enzyme) to “closed” upon ligand binding. The difference in the Helmholtz free energy, ΔFloop between these states sheds light on the mechanism of binding. With our “hypothetical scanning molecular dynamics” (HSMD-TI) method ΔFloop = FfreeFbound where Ffree and Fbound are calculated from two MD samples of the free and bound loop states; the contribution of water is obtained by a thermodynamic integration (TI) procedure. In previous work the free and bound loop structures were both attached to the same “template” which was “cut” from the crystal structure of the free protein. Our results for loop 287−290 of AcetylCholineEsterase agree with the experiment, ΔFloop~ −4 kcal/mol if the density of the TIP3P water molecules capping the loop is close to that of bulk water, i.e., Nwater = 140 − 180 waters in a sphere of a 18 Å radius. Here we calculate ΔFloop for the more realistic case, where two templates are “cut” from the crystal structures, 2dfp.pdb (bound) and 2ace.pdb (free), where Nwater = 40 − 160; this requires adding a computationally more demanding (second) TI procedure. While the results for Nwater ≤ 140 are computationally sound, ΔFloop is always positive (18 ± 2 kcal/mol for Nwater = 140). These (disagreeing) results are attributed to the large average B-factor, 41.6 of 2dfp (23.4 Å2 for 2ace). While this conformational uncertainty is an inherent difficulty, the (unstable) results for Nwater = 160 suggest that it might be alleviated by applying different (initial) structural optimizations to each template.

1. Introduction

In recent years we have developed a new simulation method for calculating the absolute entropy, S and the absolute Helmholtz free energy, F called the hypothetical scanning Monte Carlo (HSMC) (or HSMD), where molecular dynamics (MD) is used [1,2,3,4,5,6,7]. HSMC(D) has been applied to systems of increasing complexity, where the most recent ones have been mobile loops in proteins [8,9,10]. Typically, a mobile loop has an “open” conformation in the unbound protein, which is changed to a “closed” conformation upon ligand binding, i.e., the loop becomes a “lid” which covers the bound ligand. In these studies the main focus has been on entropy and free energy differences, ΔFloop between these two loop conformations in the bound and free proteins, rather than on the absolute values themselves. (As discussed later, ΔFloop not only sheds light on the mechanism of ligand binding, but in some cases it is an (ignored) component of the absolute free energy of binding.) In the present work we develop HSMD further, extending its applicability to more complex models, in particular to more complex models of loops, as explained below.
Before describing HSMC(D) and its planned enhancements in detail, it should be emphasized that calculation of S and F still constitutes a central problem in computer simulation, in spite of the significant progress achieved in the last 50 years [11,12,13,14,15,16,17,18,19,20,21,22,23]. This problem is in particular severe in structural biology due to the flexibility and strong long-range interactions characterizing bio-macromolecules such as proteins. More specifically, the potential energy surface of a protein, E(x) is rugged (x is the 3N-dimensional vector of the Cartesian coordinates of the molecule’s N atoms), i.e., this surface is “decorated” by a tremendous number of localized wells and “wider” ones, defined over regions, Ωm (called microstates) each consisting of many localized wells. A microstate Ωm, (e.g., the α-helical region of a peptide) which typically constitutes only a tiny part of the entire conformational space, Ω, can be represented by a sample (trajectory) generated by a local MD [24,25] simulation. A molecule will visit a localized well only for a very short time [several femtoseconds (fs)] while staying much longer within a microstate [26,27] which is thus of a greater physical significance (for further discussions about microstates and their problematic definition in simulations, see [8,9]).
Typically one is interested in calculating the free energy of the most stable microstates rather than calculating the total free energy (of Ω). Thus, flexible protein segments (e.g., sidechains and surface loops), cyclic peptides and ligands bound to proteins can populate significantly several Ωm in thermodynamic equilibrium, which should be identified and their populations, pm = exp[–Fm/kBT] calculated (kB is Boltzmann constant and T is the absolute temperature). In particular, identifying the global free energy minimum of a protein is the daunting task of protein folding.
As stated above, we are primarily interested in entropy and free energy differences, which are commonly calculated by thermodynamic integration (TI) techniques where the integration can be carried out over physical quantities such as the energy, temperature, and the specific heat, as well as over non-thermodynamic parameters [free energy perturbation (FEP), and histogram analysis methods are also included in this category] [11,12,13,14,15,16,17,18,19,20,21,22,23]. While TI is a robust and highly versatile approach, it has some limitations; for example, if one seeks to calculate ΔFm,n = ∫dF between two microstates m and n with significant structural variance the integration from m to n becomes difficult and for large molecules unfeasible. This drawback of TI can be overcome to a large extent by methods that provide the absolute Fm (and Sm) from a given sample; thus, one is required to carry out (only) two separate local MD simulations of microstates m and n, calculating directly the absolute Fm and Fn hence their difference ΔFmn = FmFn, where the complex TI process is avoided. HSMC(D) (and other techniques [28,29,30,31] such as the quasi-harmonic approximation [32,33,34]) enable one to calculate the absolute F (and S) (for a more extensive discussion on TI and other techniques for calculating differences, ΔF (and ΔS) see [17]).
With HSMC(D) each conformation i of a given MD or MC sample is reconstructed step-by-step (from nothing) using transition probabilities (TPs); the product of these TPs leads to an approximation for the correct Boltzmann probability PiB from which various free energy functionals can be defined. While the TPs of HSMC(D) are stochastic in nature (calculated by MD or MC simulations), all the system interactions are taken into account; in this respect HSMC(D) can be viewed as exact [3], where the only approximation involved is due to insufficient MC (MD) sampling for calculating the TPs. HSMC(D) has unique features: it provides rigorous lower and upper bounds for F, which enable one to determine the accuracy from HSMD results alone without the need to know the correct answer. Furthermore, F can be obtained from a very small sample and in principle even from any single conformation (e.g., see results for argon in [3]).
The HSMC(D) methodology has been developed systematically, as applied to systems of increasing complexity. The initial (HSMC) calculations of liquid argon, TIP3P water [3], self-avoiding walks [5] and polyglycine molecules [6] have verified the validity of the theoretical predictions stated above by comparisons with accurate results obtained by other well established techniques; a subsequent application of HSMD to peptides has led to a 100 times reduction in computer time [7].
As mentioned earlier, HSMD has also been applied to mobile loops calculating ΔFm,n (and ΔSm,n) between the open and closed microstates. First, we treated the 7-residue mobile loop, 304–310 (Gly-His-Gly-Ala-Gly-Gly-Ser) of the enzyme porcine pancreatic α-amylase [8,9], where the system is modeled by the AMBER96 force field [35] alone and by AMBER96 with the GB/SA implicit solvent of Still and coworkers [36]. In this work only protein atoms close to the loop (the template) within a sphere of radius, Rtmpl were considered, where they were kept fixed in their x-ray crystallography positions, while only the loop’s atoms were moved by MD. Later the implicit solvent was replaced by explicit solvent, i.e., the α-amylase loop was capped with 70 TIP3P [37] water molecules (within a sphere of radius Rwater = 14 Å), which (together with the loop’s atoms) were moved in the MD simulation. Because the application of HSMD to water has not been optimized yet, the contribution of water to the free energy was calculated by a TI procedure incorporated within the framework of HSMD. This HSMD-TI process leads to a relatively small statistical error because only the water-loop interactions are involved, where their full values are gradually decreased to zero, for a fixed loop structure.
The number of water molecules used (70) is relatively small relying on a previous paper [38] where the minimal number of water molecules, N water min required to cap a loop was studied. N water min was determined from a stability criterion based on the loop’s root mean square deviation (RMSD) from the Protein Data Bank (pdb) structure during 4 ns MD simulations. Surprisingly, N water min (which depends on the loop’s sequence and environment) was found to be relatively small on average 12 waters per residue, in accord with a study of Steinbach and Brooks [39] For the loop of amylase, which consists of small residues, N water min = 70 seemed to be adequate. Clearly, these conclusions should further be tested and corroborated by free energy calculations.
A detailed such study was carried out in a following paper [10] where HSMD-TI was applied to the loop 287–290 with the bulky residues, Ile, Phe, Arg, and Phe of acetylcholinesterase (AChE) from Torpedo california; reaction of AChE with the inhibitor diisopropylphosphorofluoridate (DFP) leads to a displacement of the loop’s backbone roughly by 4 Å [40,41,42,43,44,45], and experimental evidence suggests that the free energy penalty for the loop displacement is on the order of 4 kcal/mol (i.e., ΔFloop = FfreeFbound ~ −4 kcal/mol) [40,42,46]. Therefore AChE has been an ideal system for checking the performance of HSMD-TI, and in particular for examining the minimal number of water molecules needed to cap the loop. Thus we carried out two sets of calculations for systems of increasing size defined by Rtmpl = 12 and Rwater = 13 Å, and by Rtmpl = 13 and Rwater = 14 Å, for an increasing number of water molecules, Nwater = 80, 100, 120, 180 and 80, 100, 120, …, 220, respectively. We have found that to recover the experimental free energy difference the water density should be close to that of bulk water and our results for the first set, ΔFloop = −3.1 ± 2.5 and −3.6 ± 4 kcal/mol for a sphere containing 160 and 180 waters are equal within error bars to the experimental value [10].
Because the crystal structures of the free (unbound) [47] and bound [40] proteins are similar, in [10], (but also in our previous loop studies [8,9] and studies by others [48,49]) the template was “cut” from the crystal structure of the unbound protein, where the loop conformation in the bound protein was attached to the free template by superimposing the two crystal structures. Obviously, it would be more realistic to carry out the calculations with respect to two templates, which are cut separately from the free and bound x-ray structures. However, this would require defining an additional TI procedure where the water-template interactions are also gradually eliminated. This TI is expected to be computationally demanding since, unlike the original TI where (relatively small) loop-water interactions are considered, the number of template-water interacting atoms is relatively large increasing with the increase of the system size.
AChE is expected to be a suitable system for studying the effect of two different templates because the experimental ΔFloop is known. Therefore, the loop of AChE is treated here again, where the simulations are carried out with respect to two different templates which are cut from the free and bound crystal structures; this requires applying the (relatively) elaborate HSMD-TI procedure mentioned above. While the basic theory is similar to that described in [10], we derive it here in a new and shorter way for better clarity.
The foregoing (mainly technical) discussion did not touch on the relation of ΔFloop to ligand binding. One question of interest in this area is whether the conformational change adopted by a mobile loop upon ligand binding is induced by the ligand (induced fit [50,51]) or alternatively, the free loop (in the apo protein) interconverts among different microstates, one of which is selected upon binding (selected fit [52]). While some recent studies favor the latter mechanism, there is no general consensus, and the adopted mechanism is probably system dependent. The present study is not aimed at solving this problem, which would require to apply the longest simulation possible to the entire AChE protein soaked in explicit water, as done, for example, in [53] for studying loop 6 of TIM. Notice, however, that in the case of induced fit ΔFloop (in the apo protein) should be added to the total absolute free energy of binding of the protein-ligand complex. Thus, if the template remains unchanged (due to binding), calculation of ΔFloop carried out in [10] is expected to be adequate, whereas for a changed template the present calculation should be used. Indeed, in our recent study of the avidin-biotin complex ΔFloop is taken into account (for the first time) in the calculation of the absolute free energy of binding [54].
However, even if the behavior of the loop of AChE is of a selected fit type (i.e., the loop conformations in the apo protein also cover the structural region of the bound loop) the crystal structures and other experimental data suggest that the loop spends appreciable amount of time in its unbound microstate. Therefore, calculating ΔFloop, which measures the difference in stability between the two (possibly) meta-stable microstates is of interest. For this purpose, generating the longest possible MD trajectories might be a disadvantage because the loop is expected to escape from the original microstates during the simulations. Therefore, ΔFloop is calculated from relatively short trajectories (0.5 ns) where the loop remains in its original microstates (bound and free).
In this context it should be pointed out that determining the exact limits of a microstate in conformational space is practically impossible and therefore it is commonly defined by an MC or MD sample initiated from a microstate’s structure. Thus, the microstate’s size typically increases with simulation time, t and estimation of E, S, and F depend on t as well; in previous publications we have developed procedures for checking that the system remains in its microstate during a simulation. Thus, in the case of two microstates one should verify that the differences, ΔEloop(t), ΔSloop(t) and ΔFloop(t) are stable during a long enough time, t; for more details see [8,9].
Finally, while the differences ΔEloop, ΔSloop and ΔFloop between two loop microstates might be small, the fluctuations in E and S increase as N1/2, where N is the number of atoms simulated. Therefore, simulating the entire protein might be prohibitive and to keep the fluctuations manageable the coordinates of the template in our calculations are held fixed. Notice, however, that this is not an inherent limitation as in our present application of HSMD-TI to the avidin-biotin complex [54] part of the template is also moved in the MD simulation since the size of the fluctuations remains under control.

2. Theory and Methodology

2.1. The Loop and the Protein’s Template

As in [10] we study the 4-residue loop, Ile, Phe, Arg, Phe, (287-290) of AChE in two microstates related to the free and bound loop structures. The starting point is the available crystal structures of AChE from the protein data bank (PDB), where the unbound (free) structure 2ace [47] is based on residues 4 to 535 and 2dfp, the structure of AChE with DFP consists of residues 2 to 535 (the bound structure) [40]. To be able to compare these structures 2dfp was trimmed so that both structures consist of the 531 residues, 4–535 and crystal water molecules were retained.
Moreover changing the bound structure also consisted in recovering the catalytic residue Serine (200) of this serine hydrolase (2ace) by remodeling the mono-isoprophyl-phosphoryl residue (MIS) (200) of the aged phosphorylated enzyme (2dfp). Remodeling consisted in removing the d-isopropyl-phosphoryl chain without changing the heavy atoms of the modified Serine residue and adding the corresponding hydrogen atoms.
A close analysis shows that these conformations differ overall by (all heavy atoms) root mean square deviation (RMSD) of 1.22 Å, while the loop conformations differ by RMSD = 1.38, 2.91, and 2.71 Å for the backbone, sidechains, and all loop heavy atoms, respectively. The RMSD of the templates, i.e., all atoms excluding the loop’s atoms, is 1.10 Å. It should be emphasized that the coordinates of the x-ray bound structure (2dfp) appear with significantly larger B-factors than those of the free structure (2ace).
Using the program TINKER [55], and the AMBER force field [35], where Lys, Arg, and His are positively charged and Glu and Asp have a negative charge. Hydrogens were added to the free and bound structures (including the crystal water) and the potential energy was minimized, with all heavy atoms restrained to their crystal positions by harmonic forces (with a force constant of 1 kcal/Å2). Afterwards the loop and (TIP3P) water atoms were allowed to relax in the presence of a fixed template. These minimizations eliminate bad atomic overlaps and strains in the original structures, while keeping the atoms reasonably close to the PDB coordinates.
The relatively weak force constant (1 kcal/Å2) (as compared to 100 kcal/Å2 used in [10] for 2ace) was chosen to allow more extensive changes in the bound structure due to its relatively large B-factors and the chemical change imposed on it by removing the phosphate group; correspondingly, larger structural changes were also allowed to occur in the free structure. To assess the change in the structure during minimization we list the various RMSD values (Å) in pairs, for the beginning and the end of the process: RMSD for all atoms, (0.98, 0.99), backbone atoms, (0.43, 0.41), sidechains, (1.20, 1.21), whole template, (0.85, 0.86), template backbone, (0.35, 032), template sidechains, (1.05, 1.06), whole loop (2.71, 2.87), loop backbone (1.39, 1.45), loop sidechains (2.92, 3.10). Thus, the strongest changes occurred for the loops.
Taking into account the whole protein (of 8,284 atoms) would be computationally prohibitive; therefore (as in [8,9,10]), the template size is reduced to the Ntmpl atoms closest to the loop, where the rest of the protein atoms are ignored. More specifically, the center of mass of the backbone atoms of the free loop (in the 2ace structure) is calculated as a (3D) reference point denoted xcmb and a distance (Rtmpl) is chosen. If the distance of any atom of a residue from xcmb is less than Rtmpl, the entire residue is included in the template; otherwise, the residue is eliminated. We use Rtmpl = 12 Å which was found in [10] to lead to a large enough template consisting of Ntmpl = 944 atoms; however, for the present (somewhat different) structure Ntmpl = 961, and exactly all these atoms were included in the template which was “cut” from the crystal structure of the bound protein (2dfp). We added to each template 33 crystal water molecules selected from those which are closest to the loop (30 in [10]).
After defining the templates, the water molecules and the orientations of the polar hydrogens of the free and bound loops (on different templates) were subjected to an optimization procedure. This procedure (where all the loop’s heavy atoms are kept fixed), was carried out in several rounds, each consisting of a 0.1 ns MD run (1 fs time step) at high temperature (1,250 K) followed by energy minimization; the process is stopped after a fixed number of unsuccessful rounds (typically 100), namely rounds which do not reduce the energy further. Although the heavy atoms are fixed, considerable gain in potential energy is achieved. Next, keeping the template atoms fixed, the energy of each system was minimized where the loop and water molecules were allowed to move freely. Each of the final “free” and “bound” structures constitute starting points for a further analysis. The RMSD values between the final (optimized) loop structure and its crystal structure are relatively small. For the free loop they are 0.18, 0.45, and 0.42 Å, and for the bound loop, 0.27, 0.47, and 0.44 Å for the backbone, sidechains, and all loop atoms, respectively.

2.2. Addition of Water

While our considerations thus far are based on the crystal structures, we seek to simulate the loop in solution. Therefore, it is not clear whether the positions of the crystal waters are relevant for the solution environment. In particular, water molecules that are caged within the crystal structure are expected to stay there during the simulation, and thus can be considered as part of the template. Therefore, the number and arrangement of these waters should be globally optimized, practically by energy minimization, which, however, is a non-trivial problem (see below).
As in [10], our strategy is to add more water molecules to the already existing crystallographic waters. Thus, we define a sphere centered at xcmb with a radius, Rwater (Rwater = Rtmpl + 1 Å) where waters are added at random to the hemisphere oriented towards the exterior of the template. To hold these waters around the loop they are restrained with a flat-welled half-harmonic potential (with a force constant of 10 kcal mol-1Å-2) based on their distance from xcmb. That is, if the distance of a water oxygen from xcmb is greater than Rwater a harmonic restoring force is applied, otherwise the restraining force is zero.
Even though we have found in [10] that for the above Rtmpl and Rwater, Nwater should be 140–180, it is of interest to check this conclusion again for the present case of two different templates; therefore, we carry out calculations for Nwater = 40, 60, 80, 100, 120, 140, and 160.
Each of the two fully hydrated systems is then treated by a sequence of MD/minimization procedure similar to that outlined above, applied only to the hydrogen atoms of the loop and all water molecules restrained within the full sphere of radius Rwater. Again, at the end of this optimization the energy is minimized where the loop’s atoms and the water molecules are free to move. The change in the loops’ conformations from their crystal structures measured by the RMSD are relatively small; for the free loop we obtain 0.31, 1.03, and 0.94 Å for backbone, side chains and all loop atoms, respectively, where the corresponding results for the bound loop are 0.37, 1.06, and 0.98 Å (see the discussion related to Table 1).
For each of the optimized “free” and “bound” structures an MD run at 300 K is performed, where only the loop and water atoms are moved while the template atoms are kept fixed. Thus, 1,000 loop/water configurations are collected by retaining a configuration every 0.5 ps along the 0.5 ns MD trajectory. An equilibration run of 0.5 ns is carried out prior to the production run. The total potential energy Etotal is the sum of partial energies related to the loop and water (the template-template energy is constant and thus is ignored):
Etotal = [Eloop-loop+Eloop-tmpl]+[Ewater-water+Ewater-tmpl+Ewater-loop] = Eloop+Ewater
where Eloop-loop is the intra loop energy, Eloop-tmpl is the energy due to loop-template interactions; these energies define the total loop energy Eloop, and the interactions related to water are defined in a similar way, where their total is defined as Ewater. The reconstruction of the loop structure is carried out in internal coordinates; therefore, the loop conformations simulated by MD are transferred from Cartesians to the dihedral angles φi, ψi, and ωi (i = 1,Nres = 4), the bond angles θi,l (i = 1,Nres, l = 1,3), the side chain angles χ, and the corresponding bond angles. For convenience, all these angles (ordered along the backbone) are denoted by αk, k = 1,37 = K. We have argued in [7,8] that to a good approximation bond stretching can be ignored.

2.3. Statistical Mechanics of a Loop in Internal Coordinates

The theory of HSMD-TI has been developed in previous publications; therefore, the present description is shorter, providing mainly equations that are related directly to the calculations.
The reconstruction of the loop structure (of Nres = 4 residues) is carried out in internal coordinates; therefore, the loop conformations simulated by MD are transferred from Cartesians to the dihedral angles φi, ψi, and ωi (i = 1, Nres), the bond angles θi,l (i = 1, Nres, l = 1, 3), the side chain angles χ, and the corresponding bond angles. For convenience, all these angles (ordered along the backbone) are denoted by αk, k = 1,K; We have argued in [7,8] that to a good approximation bond stretching can be ignored, thus the bond lengths are considered to be constant.
The partition function of the loop/water /template system is:
Z m = m exp [ E ( x loop , x N ) / k B T ] d x loop d x N
where E(xloop,xN) = Etotal is defined in Equation (1), xloop is the Cartesian coordinates of the loop in microstate m and x N is the 9N Cartesian coordinates of the water molecules; Etotal also depends on the coordinates of the “frozen” template which are omitted for simplicity. For the same reason the letter m will be omitted in most of the equations and Nwater will be replaced in the theoretical section by N (N = Nwater). After changing the variables of integration from xloop to internal coordinates, the integral becomes a function of the K dihedral and bond angles, αk, k = 1,K and a Jacobian, cos(θi,l) that depends (only) on each of the bond angles θi,l [27,28,32]:
Z = m exp { [ E loop ( [ α k ] ) E water ( [ α k ] , x N ) ] / k B T } d [ α k ] d x N
where [ α k ] = [ α 1 , ... α K ] and d [ α k ] = d α 1 ... d α K . In Equation (3) we have omitted a factor, which depends on the bond lengths and is assumed to be the same (i.e., constant) for different microstates of the same loop and therefore does not affect entropy differences (e.g., see discussion in [10]). The Jacobian is also omitted for simplicity because we have shown [8] that it cancels out (within the error bars) in entropy and free energy differences. The Boltzmann probability density corresponding to Z [Equation (3)] is:
ρ B ( [ α k ] , x N ) = exp { E ( [ α k ] , x N ) / k B T } / Z
and the exact entropy, S and exact free energy, F (defined up to an additive constant) are:
S = k B m ρ B ( [ α k ] , x N ) ln ρ B ( [ α k ] , x N ) d [ α k ] d x N
and:
F = m ρ B ( [ α k ] , x N ) { E ( [ α k ] , x N ) + k B T ln ρ B ( [ α k ] , x N ) } d [ α k ] d x N
It should be noted that the fluctuation of the exactF is zero [56,57], because by substituting ρ B ( [ α k ] ) [Equation (4)] inside the curly brackets of Equation (6) one obtains, E ( [ α k ] ) + k B T ln ρ B ( [ α k ] ) = k T ln Z = F , i.e., the expression in the curly brackets is constant and equal to F for any set [ α k ] within m. This means that the free energy can be obtained from any single conformation if its Boltzmann probability density is known. [Notice, however, that calculation of ρ B ( [ α k ] , x N ) for a single conformation depends on the entire microstate as is also evident from the HSMC(D) procedure discussed later]. Still, the fluctuation of an approximate free energy (i.e., which is based on an approximate probability density) is finite and is expected to decrease as the approximation improves [3,4,5,56,57]. Because HSMC(D) provides an approximation for ρ B ( [ α k ] , x N ) , one can, in principle, estimate the free energy of the system from any single structure [3,4,5]. This is the reason why in practice reliable HSMC(D) results for F (but not necessarily for S and E) can be obtained from a relatively small sample.

2.4. Exact Future Scanning Procedure

It should first be pointed out that MC and MD are exact dynamical methods that enable one to sample system configuration i correctly with its Boltzmann probability, PiB, while the value of PiB is not provided (due to the dynamical character of these methods). [To simplify the discussion we use the discrete probability, PiB rather than the probability density ρ B ( [ α k ] , x N ) defined in Equation (4).] Thus, properties such as the energy that are “written” on i can easily be calculated, while a direct calculation of the absolute S is difficult because lnPiB is unknown (it depend not only on i but on the entire ensemble through the partition function Z, which cannot be obtained from a finite sample).
Unlike the dynamical MC and MD, the exact future scanning method [that constitutes the basis of HSMC(D)] is a growth procedure which enables one (at least in principle) generating any system configuration (from nothing) by determining the positions of the atoms step-by-step with the help of transition probabilities (TPs); the product of these TPs leads to the value of PiB hence to S and F. Practically, a loop/water/template configuration would be generated by initially building a loop structure (in the presence of moving water) followed by the construction of a configuration of the surrounding water molecules (in the presence of a fixed loop conformation). In this way a sample of statistically independent system configurations can be obtained.
For simplicity this construction is described for a loop without side chains, consisting of M Gly residues, i.e., ordered along the chain are the 3M heavy atoms denoted k’ = 1…3M and the 6M dihedral and bond angles denoted αk, 1 ≤ αk ≤ 6M = K with values within microstate m; the loop is surrounded by Nwater water molecules moving within the volume defined by a sphere of radius, Rwater, the template, and the loop. We seek to generate a configuration of the entire system by first generating a loop conformation and then a configuration of the water molecules.
With the scanning procedure the position of the heavy atoms (xloop) is determined step-by step. Thus, the position of the first atom k’ = 1 is defined by the simultaneous determination of the first pair of dihedral and bond angles α1, and α2. The maximum range Δα1Δα2 which will keep the loop within m is defined, and each of Δα1 and Δα2 is divided into nb small bins (of sizes Δα1/nb and Δα2/nb) denoted j1 and j2, j1 = 1…nb, j2 = 1…nb, respectively. A long MD simulation of the whole system (loop + water) is carried out within microstate m, where a conformation is retained every l fs leading to a huge sample of size n; then, the number of conformations nj1j2 that visit simultaneously the (double) bin j1j2 is calculated from which the corresponding TP is obtained, pj1j2 = nj1j2/n (or ρj1j2 = [nj1j2/n]/[Δα1Δα2/nb2]). A double bin is then selected by a random number according to the probabilities pj1j2 which defines the position of atom k’ = 1 (and its hydrogen or oxygen). The position of this atom is not changed in the next steps of the build-up process, i.e., it becomes part of the “past”. The position of the second atom (k’ = 2) is determined in the same manner from a long MD simulation of the future part of the system (i.e., atoms k’ = 2…3M and water) where α3 and α4 are considered, bins Δα3/nb and Δα4/nb are defined, probabilities are calculated and a “lottery” (like above) determines the values of α3 and α4 which define the position of atom k’ = 2; the process continues until the positions of all the loop’s atoms (and their hydrogens or oxygens) have been determined. A configuration of the Nwater molecules is then determined in a similar way step-by-step in the presence of the fixed loop structure previously constructed (for details see [3]). Obviously, the smaller are the bins the higher is the accuracy of the construction process, provided that the statistics is adequate, i.e., that the (future) MD simulations are long enough; this stochastic scanning method becomes exact as the bin size → 0 (nb → ∞) and n→ ∞. Notice that in applications of the (deterministic) scanning method to lattice models, only part of the future has been considered, (i.e., only f steps ahead), where this part has been scanned completely; therefore, the corresponding TPs are approximate but deterministic (rather than stochastic), and accurate results were obtained by using an additional importance sampling procedure [58].
This procedure can be described more formally as follows: at step k’ (k = 2k’), the positions of k’ − 1 atoms have already been determined (from the values of the corresponding k − 2 angles α1, …, αk-2) and they are kept fixed (defining the “past”); αk-1 and αk (which will determine the position of atom k’) are defined with the exact TP density ρ ( α k 1 α k | α k 2 , , α 1 ) :
ρ ( α k 1 α k | α k 2 , , α 1 ) = Z future ( α k α k 1 , , α 1 ) / [ Z future ( α k 2 , , α 1 ) ]
where Z future ( α k , , α 1 ) is a future partition function. The term “future” indicates that the integration defining Zfuture is carried out over the positions of atoms k’ = k/2 + 1…K/2 (which affect angles, α k + 1 , , α K ) and the 9N coordinates x N of the water molecules (which will be determined in future steps of the build-up process). Notice that this integration is carried out in a restrictive way where the corresponding conformations (of the loop) remain within microstate m. Also, in this integration the atoms treated in the past (1… k’ − 1) (which were determined by α 1 α k 2 ) are held fixed in their coordinates. For simplicity the integrations below are written over the angles rather than the Cartesian coordinates (xloop) of the loop atoms, k’ = k/2+1…K/2. Thus:
Z future ( α k , , α 1 ) = m exp [ ( E ( α K , , α 1 , x N ) / k B T ] d α k + 1 d α K d x N
where E (Equation (1)) is the total potential energy of the loop/template/water system, which also imposes the loop closure condition. The product of the TPs (Equation (7)) leads to the (Boltzmann) probability density of the entire loop conformation, ρ loop B ( α K , , α 1 ) . After the loop structure has been constructed a configuration of water molecules is generated step-by-step (in the presence of the constant loop structure where the product of the corresponding TPs leads to the probability density of the water configuration, ρ water B ( α K , , α 1 , x N ) . The probability density ρ B ( [ α k ] , x N ) of the loop/water /template configuration is the product of ρ loop B ( [ α k ] ) and ρ water B ( [ α k ] , x N ) . One can define for m “the loop entropy of mean force”, S loop :
S loop = k B m ρ loop B ( [ α k ] ) ln ρ loop B ( [ α k ] ) d [ α k ]
where Sloop is defined up to an additive constant. Extending the exact scanning procedure to side chains is straightforward, where again the position of a side chain atom is defined by two angles as described above.
However, implementation of the exact scanning procedure (as described prior to Equation (7)) for generating Boltzmann weighted configurations of a large loop/water/template system would be inefficient, due to the need to calculate (at each step) a large set of accurate TPs (i.e., for an extremely large number of small bins); this would require extremely long (future) MD simulations. Also, with long simulations it is difficult to guarantee that the loop will remain in microstate m (see discussion in [8]). However, the exact scanning procedure provides the theoretical basis for HSMC(D). Thus, the exact scanning method is equivalent to any other exact simulation technique (in particular, to MC or MD) in the sense that large samples generated by such methods lead to the same averages and fluctuations (the sample does not carry a memory of the simulation method with which it has been generated). Therefore, one can assume that a given MC or MD sample has rather been generated by the exact scanning method, which enables one to reconstruct each conformation i by calculating the TP densities that hypothetically were used to create it step-by-step. With HSMC(D) the efficiency problems discussed above for the exact (stochastic) scanning procedure are alleviated to a large extent since only a single TP is calculated at each step (rather than many TPs (pj) required with the scanning method), as described below; also, because we are mainly interested in entropy differences, approximations (e.g., ignoring the Jacobians and bond stretching) can be applied without compromising the accuracy of the results.

2.5. Loop Reconstruction by HSMC(D)

The theory of HSMC(D) is described again for a loop consisting of M Gly residues. Notice that while HSMD and the exact scanning method (described in the previous section) have some resemblance, they are different. With the scanning method a loop conformation is generated with PiB whereas with HSMD an already generated structure (by MD) is reconstructed in order to obtain its probability. Thus, one starts by generating an MD sample of the loop in water in microstate m; the configurations of this sample will be reconstructed by HSMD (application of HSMC is similar). First, the conformations are represented in terms of the dihedral and bond angles αk, 1 ≤ αk ≤ 6M = K, and the variability range Δαk is calculated:
Δ α k = α k ( max ) α k ( min )
where αk(max) and αk(min) are the maximum and minimum values of αk found in the sample, respectively. Δαk, αk(max), and αk(min) enable one to verify that the sample has not “escaped” from microstate m. Notice that in the discussion below we define the loop conformation by the set of angles [αk] rather than by the positions of the loop atoms, xloop, which are determined by [αk].
Each of the configurations (frames) ( [ α k ] , x N ) (denoted i for brevity) of the sample is reconstructed in two stages, where the biotin structure is reconstructed first followed by the reconstruction of the water configuration xN. Because the position of atom k’ is defined by a dihedral and a bond angle one has to calculate their TP simultaneously. Thus, at step k’ (k = 2k’) of stage 1, the k−2 angles α k 2 α 1 have already been reconstructed and the TP density of α k 1 α k , ρ ( α k 1 α k | α k 2 , , α 1 ) , is calculated from an MD run, where the entire future of the loop and water is moved [i.e., the loop’s atoms k’,k’ + 1,…K/2 and their connected hydrogens, or oxygens and the water coordinates xN] while the past (loop’s atoms 1,2,…,k’−1 and their connected hydrogens or oxygens) are held fixed at their values in conformation i. By considering a future conformation every 10 fs, a sample of size nf is generated. Two small segments (bins) δαk-1 and δαk are centered at αk-1(i) and αk(i), respectively, and the number of simultaneous visits, nvisit, of the future chain to these two bins during the simulation is calculated; one obtains (see Figure 1):
ρ loop ( α k 1 α k | α k 2 , , α 1 ) ρ HS ( α k 1 α k | α k 2 , , α 1 ) = n visit / [ n f δ α k 1 δ α k ]
where ρ HS ( α k 1 α k | α k 2 , , α 1 ) becomes exact for very large nf (nf→∞) and very small bins (δαk-1, δαk→0). This means that in practice ρ HS ( α k 1 α k | α k 2 , , α 1 ) will be somewhat approximate due to insufficient future sampling (finite nf) and relatively large bin sizes. The same treatment can be applied to side chain atoms; however, notice that the sidechain of Arg is treated here approximately, where at each step three consecutive atoms are considered and their positions are defined only by the corresponding dihedral angles. In [8] we have shown that δαk and δαk+1 can be optimized. Notice that with HSMD the future loop conformations generated by MD at each step k‘ remain in general within the limits of m, which is represented by the analyzed MD sample. The corresponding probability density related to the loop is:
ρ HS ( α K , , α 1 ) = ρ HS ( [ α k ] ) = k = 2 , 2 K ρ HS ( α k 1 α k | α k 2 , , α 1 )
ρ HS ( [ α k ] ) defines an approximate entropy functional, denoted S loop A , which can be shown (using Jensen’s inequality) to constitute a rigorous upper bound for S loop (Equation (9)) [3,59]:
S loop A = k B m ρ loop B ( [ α k ] ) ln ρ HS ( [ α k ] ) d [ α K ]
ρ loop B (Equation (9)) is the Boltzmann probability density of [ α K ] in m. ( S loop A S loop is also known as the Gibbs’ inequality). Being an upper bound suggests that S loop A will decrease as the approximation improves. In practice, however, this inequality is satisfied only if the probabilities are well defined (e.g., they are correctly normalized). This is pointed out because ρ HS ( α K , , α 1 ) is calculated stochastically and its error (and the error in S ¯ loop A , the estimation of S loop A , see Equation (14)) increases with increasing system size (i.e., increasing the number of TPs; see later discussions in 3.3).
Figure 1. An illustration of the HSMD reconstruction process of conformation i of a peptide consisting of three glycine residues, where for simplicity the oxygens and most of the hydrogens are discarded. At step k’ = 6 the TPs related to the “past” atoms k’ = 1…5 (depicted by full spheres) have already been determined and these atoms are kept fixed in their positions at conformation i. At this step (6) one calculates the TPs of bond angle αk (defined by C’-Cα-N) and dihedral angle αk−1 (defined by C’-Cα-N-C’) which are related to C’, where k = 2k’ and thus k = 12. The TPs are obtained from an MD simulation where the as yet unreconstructed atoms k’ = 6…10 (the “future” atoms) are moved (depicted by empty spheres connected by dashed lines) while the past atoms k’ = 1…5 are kept fixed; notice that the future part should remain within the limits of the microstate and future-past interactions are taken into account. Small bins δαk−1 and δαk are centered at the values αk−1 and αk in i. The TP is calculated from the number of visits (nvisit Equation 11) of the future part to δαk−1 and δαk simultaneously during the simulation. After ρ HS ( α 11 α 12 | α 10 , , α 1 ) (Equation 11) has been determined the coordinates of C’ (and O) are fixed at their positions in i, i.e., they become “past” atoms and the process continues.
Figure 1. An illustration of the HSMD reconstruction process of conformation i of a peptide consisting of three glycine residues, where for simplicity the oxygens and most of the hydrogens are discarded. At step k’ = 6 the TPs related to the “past” atoms k’ = 1…5 (depicted by full spheres) have already been determined and these atoms are kept fixed in their positions at conformation i. At this step (6) one calculates the TPs of bond angle αk (defined by C’-Cα-N) and dihedral angle αk−1 (defined by C’-Cα-N-C’) which are related to C’, where k = 2k’ and thus k = 12. The TPs are obtained from an MD simulation where the as yet unreconstructed atoms k’ = 6…10 (the “future” atoms) are moved (depicted by empty spheres connected by dashed lines) while the past atoms k’ = 1…5 are kept fixed; notice that the future part should remain within the limits of the microstate and future-past interactions are taken into account. Small bins δαk−1 and δαk are centered at the values αk−1 and αk in i. The TP is calculated from the number of visits (nvisit Equation 11) of the future part to δαk−1 and δαk simultaneously during the simulation. After ρ HS ( α 11 α 12 | α 10 , , α 1 ) (Equation 11) has been determined the coordinates of C’ (and O) are fixed at their positions in i, i.e., they become “past” atoms and the process continues.
Entropy 12 01946 g001
S loop A can be estimated (for microstate m) from a Boltzmann sample (of size ns) generated by MD using the arithmetic average:
S ¯ loop A ( m ) = k B n s t = 1 n s ln ρ HS ( t , m )
where ρ HS ( t , m ) is the value of ρ HS ( [ α k ] ) obtained for configuration t of the sample of m. S ¯ loop A (with the bar) is an estimation of the ensemble average S loop A (Equation (13)); correspondingly, the ensemble averages of the energy are estimated from a sample of size ns and should appear with a bar as well. However, from now on only estimations will be considered and for simplicity, all of them will appear without the bar, like the energies defined in Equation (1). S loop A (Equations (13) and (14)) constitutes a measure of a pure geometrical character for the loop flexibility, i.e., with no direct dependence on the interaction energy. When the converged or the best value of S loop A is considered, it will be denoted by Sloop; thus, Floop = Eloop−TSloop is defined as the loop’s contribution to the total free energy, where Eloop is defined in Equation (1). In the same way, the difference in the loop entropies between the free and bound microstates obtained for a specific set of parameters is denoted by Δ S loop A while the converged difference is denoted by Δ S loop :
Δ S loop A = S ¯ loop A ( free ) S ¯ loop A ( bound )
where:
Δ S loop = S ¯ loop A ( free ) S ¯ loop A ( bound ) (converged )
One can define a free energy difference for the loop, ΔFloop = ΔEloop−TΔSloop, where Δ E loop is obtained from Equation (1). Notice that (unlike in [10]) ΔSloop and ΔFloop are obtained for different templates; for simplicity their dependence on the templates is omitted.

2.6. Thermodynamic Integration of Water

To reconstruct the water configuration one can use, in principle, the HSMC(D) procedure for fluids mentioned previously [3], which for a fixed loop structure, [αk] would lead to ρ water HS ( [ α k ] , x N , tmpl ) (as an approximation to ρ water B ( [ α k ] , x N , tmpl ) ; see discussion preceding Equation (9)) and then to the contribution of the water configuration to the free energy F water HS ( [ α k ] , x N , tmpl ) = E water ( [ α k ] , x N , tmpl ) + k B T ln ρ water HS ( [ α k ] , x N , tmpl ) , which depends on the free or bound template (tmpl). However, this procedure for fluids has not been optimized yet and it is relatively time consuming.
Alternatively, as in [9,10], one can obtain F water ( [ α k ] , x N , tmpl ) (defined up to an additive constant) by a thermodynamic integration (TI) procedure based on the same reference state for the free and bound structures (i.e., the above constant is the same for both structures). In this state only the water-water interactions and the harmonic restraining force that keeps the waters in the spherical volume are preserved, while the water-template and water-loop interactions [electrostatic and Lennard Jones (LJ)] are switched off, i.e., the fixed loop structure [ α k ] and the fixed template do not “see” the surrounding waters. Thus, ΔFwater (our main interest) can be obtained by a TI procedure (applied to each system) where these interactions are gradually increased (from zero) during an MD simulation of water (while the loop structure and template remain fixed). (Notice that while the templates are structurally different, the template-template interactions are ignored, therefore the templates are considered to be in the same state.)
In practice, however, it is more convenient to carry out the integration in an opposite direction, starting from the current water configuration with full water-loop and water-template interactions, which are gradually decreased to zero. More specifically, the water-loop interactions are integrated (eliminated) first (in the presence of intact water-template interactions), which leads to F water TI ( [ α k ] , m ) . The subsequent template–water integration (that depends only on the template) is denoted F water TI ( tmpl , m ) , where m stands for “free” or “bound”. Each of the free and bound integrations are carried out in two stages, where in the first stage the electrostatic interactions (i.e., the charges) are decreased to zero followed by decreasing to zero the Lennard Jones (LJ) interactions, where the latter is performed by decreasing a parameter λ from 1 to zero (see section 3.3 and Appendix).
The water-loop integration is carried out ns times for each of the loop structures [ α k ] of the sample of size ns. Denoting these [ α k ] of the sample by t and keeping for brevity the letter m (“free” or “bound”) only on the left side of the equation, leads to:
F water TI ( loop , m ) = F water TI ( ch,loop ) + F water TI ( LJ,loop ) = 1 n s t = 1 n s F water TI ( ch , t ) + F water TI ( LJ , t )
Unlike the different [ α k ] in the sample, the template remains constant and therefore, in principle, only a single integration is needed. However, to decrease the statistical fluctuations, ns’ integrations (denoted by t) are carried out, each starts from a different water configuration and a set of velocities (denoted t) thus:
F water TI ( tmpl , m ) = F water TI ( ch,tmpl ) + F water TI ( LJ,tmpl ) = 1 n ' s t = 1 n s ' F water TI ( ch , tmpl, t ) + F water TI ( LJ , tmpl , t )
It should be pointed out again that the template-template energy is not considered in our model, therefore the two templates only differ by their geometry and the corresponding reference states are thus the same.

2.7. The Reconstruction Procedure with HSMD

The HSMD reconstruction procedure needs further discussions. Thus, the MD simulation of the future chain at step k’ starts from the reconstructed conformation i, and every g = 10 fs the current conformation is considered, where the ninit initial considered conformations are discarded for equilibration. For each of the next nf (considered) future conformations the pair (αk-1k) (k = 2k’) are calculated and their contribution to nvisit (Equation (11)) is calculated. An essential issue is how to guarantee an adequate coverage of microstate m, i.e., that the future chains will span its entire region (in particular the side chain rotamers) while avoiding their “overflow” to neighboring microstates, conditions that will occur for a too small and a too large nf, respectively. [Note that even at step k’, where the “past” of the loop (atoms 1…k1) is kept fixed, the (future) unfixed part (atoms k’, k’ + 1…) can leave the microstate during long MD simulations. Such an “overflow” is more likely to happen for small residues such as Gly and for small k’.]
In previous work [6,7,8,9] we developed procedures for keeping the loop in its original microstate and measures for estimating the extent of coverage of the microstate by the reconstructed samples. In this paper these procedures have not been applied because the maximal nf values used are not large and the microstates are concentrated (i.e., the Δαk values of Equation (10) are relatively small; see discussions in 3.2).

3. Results and Discussion

3.1. Simulation Details

The solvated free and bound structures (defined on the corresponding x-ray structures) were initially optimized as described in section 2.1 and section 2.2 leading to two optimized structures. Then, starting from each optimized structure, an 1 ns MD trajectory was generated at 300 K, where the initial 0.5 ns part was used for equilibration and a sample of 1,000 structures was generated from the last (half) part of the trajectory by retaining a structure to a sample every 0.5 ps. From these samples (which represent the free and bound microstates on their different templates) the free energy and entropy are calculated.
All simulations were carried out with the velocity-Verlet algorithm [60] based on a time step of 2 fs, where bonds involving hydrogens (including those of water) were frozen to their ideal values by the RATTLE algorithm [60]; the Berendsen [60] heat bath controlled the temperature. Cut-offs on long-range interactions were not imposed, and in the reconstruction process a structure was added to the sample every g = 10 fs, where the ninit = 250 initial structures (2.5 ps) were discarded for equilibration. The future samples were generated for several bin sizes, δ, where results are presented for δ = Δαk/l, l = 5, 10, 20, 30, 40, and 50, centered at αk (i.e., αk ± δ/2) (Equation (11)). If the counts of the smallest bin are smaller than 50 the bin size is increased to the next size, and if necessary to the next one, etc. In the case of zero counts, nvisit is taken to be 1; however, an event of zero counts is very rare.

3.2. Dihedral Angles for Different Microstates

In Table 1 we present results for αk(min), αk(max) and Δαk (Equation (10)) for the backbone dihedral angles φ and ψ and the sidechains, χ. These values are based on the samples of 1,000 conformations generated for the free and bound microstates for Nwater = 140 (rather than the results for Nwater = 160, which show some inconsistencies, see 3.4). The table reveals that the Δαk values for the backbone of the free and bound microstates are relatively small (in most cases smaller than 80o) and they are comparable to those obtained by other samples (based on different Nwater values). Somewhat larger Δαk values were obtained for χ, where all these results are also comparable to those obtained in [10]; this suggests that ΔSloop (Equation (15b)) would be small as well. These small Δαk values stem from the constraints imposed by the template on the inner loop.
For comparison we also provide in Table 1 the dihedral values of the crystal structures, αk(crystal). These angles enable one to determine whether the samples have escaped from their original microstates. While exact definition of a microstate is practically unfeasible (see discussion in section 1.3 of [8]), we have accepted an “escape” criterion for a dihedral angle when αk(crystal) + 60o is smaller than αk(max) or αk(crystal)60o is larger than αk(min), i.e., if some angle values fall beyond the range αk(crystal) ± 60o; these angles are bold-faced in the table. The table reveals that only one backbone angle, φ(Arg) of both the free and bound microstates has been escaped but with small deviations of 18o and 10o, respectively. The number of escaped sidechain angles is seven for the free microstate and five for the bound one. Thus, the original microstates are retained for the backbone but only partially for the side chains, which might be expected since we model the loops in solution rather in the crystal environment.
Table 1. Minimum and maximum values of dihedral angles, αk(min) and αk(max) and their differences Δαk (in degrees) for the 140 free and bound samples*.
Table 1. Minimum and maximum values of dihedral angles, αk(min) and αk(max) and their differences Δαk (in degrees) for the 140 free and bound samples*.
FreeBound
Residue angleα k(crystal)α k(min)α k(max)Δα kα k(crystal)α k(min)α k(max)Δα k
Ser ψ1651351622717215918627
ω1791561792317714217230
Ile φ−120−150−10941−57−60−852
ψ13813618549−44−93−3162
Phe φ70297950−98−131−6962
ψ433288561307313966
Arg φ−139−217−13384−113−126−4383
ψ136841607613−371653
Phe φ−122−155−8273−88−93−4152
Side Chains
Ile χ1−118−93−3855−82−65−2540
χ238−98−3464−47−62−1745
χ3180−180180360180−179180359
χ2’180319160180147763
Phe χ1−139−174−11955−62−86−4046
χ2−36−58866−51−94−3163
Arg χ1−68−129−5178−72−88−3949
χ2−55−223 −14875−71−228−15078
χ 3−1703010474−50−3690126
χ4−9761176115176−183−51132
χ50−3735720.4−293766
χ60−4242840−393473
χ6’0−3735720−484492
Phe χ1−43−62−134940367640
χ2−80−104−4163−88−132−7953
k(min), αk(max), and Δαk are defined in Equation (10); their values were calculated from samples of 1,000 loop conformations (0.5 ps) generated for the free and bound microstates. The values of αk(crystal) were calculated from the PDB crystal structures, 2ace [47] and 2dfp [40] of the free and bound protein, respectively.
Table 2. HSMD results (in kcal/mol) for the loop entropy, T S loop A and for entropydifferences T S loop A between the free and bound microstates at T = 300*.
Table 2. HSMD results (in kcal/mol) for the loop entropy, T S loop A and for entropydifferences T S loop A between the free and bound microstates at T = 300*.
Nwater = 140
Bin size nf T S loop A T Δ S loop A
FreeBound
Δαk/52,00063.463.30.1
Δαk/102,00061.661.5.0.1
Δαk/202,00061.361.4−0.1
Δαk/30 200 60.8 60.9−0.1
400 61.0 61.1−0.1
1,200 61.3 61.30.0
1,60061.3 61.30.0
2,00061.361.4−0.1
Δαk/40 200 60.860.9−0.1
400 61.0 61.1−0.1
1,200 61.2 61.3−0.1
"1,60061.3 61.4−0.1
2,00061.3 61.4−0.1
Δαk/50 200 60.8 60.9−0.1
"400 61.0 61.1−0.1
"1,200 61.3 61.30.0
"1,60061.3 61.30.0
2,00061.3 61.30.0
Errors ≤ ±0.3±0.30.0
Converged 61.3 ± 0.361.3 ± 0.30.0 ± 0.2
QH10472.1 ± 1.071.5 ± 1.10.6 ± 2.0
*Results are presented for the loop entropy, S loop A (Equations (13) and (14)) and for the differences, T Δ S loop A = T [ S loop A ( free ) S loop A ( bound ) ] (Equation (15a)]; they were obtained by reconstructing 80 loop structures selected homogeneously from larger MD samples (of 1,000 water-loop configurations) of the free and bound microstates obtained for Nwater = 140. The results are calculated as a function of the bin size δ = Δαk/l (Equation (10)) and nf (Equation (11)) the sample size of the future chains used in the reconstruction process.

3.3. Results for the Loop Entropy

Results for the loop entropy, S loop A [Equations (13) and (14)] appear in Table 2 for Nwater = 140 for the free and bound microstates and for their difference T [ S loop A ( free ) S loop A ( bound ) ] = T Δ S loop A [see the discussion preceding Equation (15a)]. These results were obtained by reconstructing ns = 80 loop structures, distributed homogeneously along the entire sample of 1,000 system configurations. The simulated future consists of the future part of the loop including all the surrounding water molecules. The results are presented for several values of nf - the sample size of the future chains [Equation (11)], where nf = 200, 400, 1,200, 1,600, and 2,000; these values of nf are used for pairs of angles, such as a backbone dihedral and the successive bond angle. However, for the sidechains we also reconstruct a single χ angle and triplets of successive χ angles (e.g., for Arg) for which the maximal nf is 1,000 and 4,000 (rather than 2,000), respectively (see Equation (11)). The results are also presented as a function of bin size, δ = Δαk/l (Equations (10) and (11)) where l = 30, 40 and 50, while for nf = 2,000 we also provide results for larger bin sizes defined by l = 5, 10 and 20. The statistical errors were obtained from the fluctuations (standard deviation).
Being an upper bound, T S loop A [Equation (13)] is expected to decrease as the approximation improves, i.e., with decreasing the bin size — an expectation that is fully satisfied. For example, for nf = 2,000, the T S loop A ( free ) values are 63.4, 61.6, 61.3, 61.3, 61.3, and 61.3 (kcal/mol + constant), i.e., they decrease for l = 5, 10, and 20 and converge to 61.3 (±0.3) kcal/mol for l = 30, 40 and 50; a similar behavior is observed for T S loop A ( bound ) . One would also expect T S loop A to decrease as nf increases in each bin. On the other hand, the table reveals that the central values of T S loop A slightly increase in going from nf = 200 to 1,200 and then become constant. While most of these changes are within the error bars, they reflect an insufficient equilibration of 2.5 ps (see discussion following Equation (13)). Thus, eliminating the results for nf = 200 (i.e., increasing the equilibration to 4.5 ps) would lead to the expected decrease among the (new) T S loop A results due to the expected increase in the (new) results for nf = 200 and 1,000 (however, to be consistent with [10] the results in the table are based on a 2.5 ps equilibration). In any case, within the error bars the T S loop A values in Table 2 can be considered as converged; they are only slightly larger (by ~1 kcal/mol) than those of Table 4 of [10] and the errors in both tables are comparable.
We are mostly interested in the difference in entropy between the free and bound microstates, T Δ S loop [Equation (15b)]. Table 2 shows that for all approximations T Δ S loop ~0 kcal/mol, i.e., the entropy of both microstates is the same, while in [10] for 160 water molecules T Δ S loop ( n s = 80 ) = 1.2 ± 0.1 kcal/mol, i.e., the free microstate has slightly higher entropy than the bound one.
The HSMD results for the entropy are compared to those obtained with the quasi-harmonic (QH) method [32], i.e., based on the equation:
S loop QH ( m ) = ( k B / 2 ) { N + ln [ ( 2 π ) N Det( σ ) ] }
where the covariance matrix, σ, is obtained from a local MD sample and N is the number of internal coordinates used in the HSMD procedure. It should be pointed out that in a detailed study of QH Chang et al. [61] have found that the method might be unreliable when used in Cartesian coordinates or applied (in internal coordinates) to several microstates. On the other hand, QH was found suitable for treating a single microstate, while the convergence of the results is slow and large samples are typically needed; also, because QH takes into account only second order correlations S loop QH ( m ) constitutes an upper bound for the correct S. Still, entropy differences Δ S loop QH ( m ) are expected to be reliable (see also [21]). Thus, to obtain reasonable precision, results for S loop QH ( m ) (m is bound or free) were obtained from MD samples of 10,000 loop-water configurations. To avoid the “escape” of a sample from the original microstate, it consists of 10 separate samples of 1,000 configurations (0.5 ns), each started from the same structure with a different set of initial velocities, where the initial trajectory of 0.5 ns is used for equilibration and is thus discarded. The values of T S loop QH exceed the HSMD results for T S loop (for ns = 2,000) by 11–10 kcal/mol for the free and bound microstates. These elevated results are in accord with S loop QH being an upper bound and are comparable to the overestimation values found in our previous studies [6,7,8,9,10]. On the other hand, S loop QH ( free)- S loop QH ( bound) = 0.6 ± 2.0 is in accord (within the error bars) with the HSMD result ΔSloop = 0.
As in [10], the computer time required to reconstruct a loop structure capped with Nwater = 160 is 8.1 hours CPU on a 2.1 GHz Athlon processor, meaning that the entire reconstructions required 1,296 and 1,472 h CPU; this time is smaller for Nwater = 140. However, we have shown that considering only 10% (nf = 200) of the maximal reconstruction samples and using smaller samples of ns = 40 (rather than 80) have led to sufficiently accurate entropy differences, meaning that the total computer time can be reduced to 65 h CPU, respectively. We have generated the relatively large reconstruction samples to verify the convergence of the results.

3.4. The Effect of Water

In Table 3 results are presented for the loop energy, E loop-tmpt , E loop-loop ,and their sum, Eloop (Equation (1)), for the loop-water free energies (calculated by TI), F water TI (ch,loop), F water TI (LJ,loop), and their sum F water TI (loop) [Equation (16)], for the water-template free energies F water TI ( ch,tmpl ) , F water TI ( LJ,tmpl ) and their sum F water TI ( tmpl ) (Equation (17)); we also provide results for Fsum = Eloop + F water TI (loop) + F water TI ( tmpl ) (which is the total free energy without the contribution of the loop entropy, Sloop, Equations (13) and (14)), and for the total energy, Etotal (Equation (1)). For simplicity we have omitted the letter “m” from these quantities; however, each one of them is calculated for the free and bound microstates, and their difference, Δ (free bound) is also provided. These results are obtained for 40 ≤ Nwater≤ 160. Details about the integrations are described in the Appendix.
To check the stability of these results, and assess their statistical errors they were calculated for an increasing sample size (not shown), where the maximal ns is 80 (Nwater ≤ 140) and 40 for Nwater = 160. Statistical errors, s/(ns)1/2, where s is the standard deviation, are provided only for Nwater = 140 and 160, which are larger than those obtained for Nwater < 140. The largest error of 2.5 kcal/mol is for Etotal(Nwater = 160), where the other errors are significantly smaller. The errors in the differences, Δ are again very stable, and smaller than 2.6 kcal/mol for ΔFsum(Nwater = 160); these errors are comparable to those obtained in [10]. It is in particular encouraging that small errors are found for the water-template free energies, F water TI ( ch,tmpl ) , F water TI ( LJ,tmpl ) and their sum F water TI ( tmpl ) , which are based on an integration of a significant number of atoms and are calculated for the first time in this paper.
Based on [10], one would expect to obtain the experimental difference in free energy, ΔFloop = FfreeFbound ~ 4 kcal/mol for water densities close to that of bulk water, which for the present system occur for Nwater ranging from 140 to 180. Still, it is of interest to check this expectation by calculating results also for smaller Nwater. We have seen in 3.2 that for Nwater = 140 ΔSloop~0, where small ΔSloop was found also in [10]. We therefore assume that ΔSloop ~0 for other sets of results (Nwater ≠ 140) meaning that ΔFsum actually constitutes the difference in free energy between the two microstates. However, the results for ΔFsum in the table are positive for all Nwater, even for Nwater = 140 and 160, i.e., they differ significantly from the experimental value; in particular, for Nwater = 140 ΔFsum = 18 ± 2 kcal/mol.
Expecting ΔSloop to be close to zero for all values of Nwater , also means that the total entropy, TΔStotal = ΔEtotal – ΔFsum is mainly due to water. The table shows that in all cases ΔEtotal is positive, i.e., the total energy of the free loop is larger than that of the bound one. Correspondingly, in all cases (besides for Nwater = 100), ΔEtotal is larger than ΔFsum, i.e., the entropy of the free loop (as expected) is also the largest, where TΔStotal ranging from 5 (Nwater = 140) to 42 kcal/mol for Nwater = 160. Thus the entropy contributes significantly to the free energy where it constitutes between 20 and 64% of the total energy.
Before returning to the experimental-computational disagreement related to ΔFloop, it is of interest to discuss the results for Nwater = 160. These results (in particular those for the differences Δ) show an unexpected “jump” as compared to their counterparts for Nwater < 160. Thus, while the results for ΔFsum for 60 ≤ Nwater ≤ 140 are comparable (ranging between 15 and 26 kcal/mol), ΔFsum(Nwater = 160) = 95 kcal/mol. Correspondingly, ΔEtotal increases from 23 (Nwater = 140) to 137 kcal/mol (Nwater = 160). The main contributors for these results are the relatively large values of the sum of the loop energies, ΔEloop = 98 and Δ F water TI ( LJ,tmpl ) = 55 kcal/mol; these values are significantly larger than their counterparts for Nwater < 160. To check this point further we integrated the water-template interactions in our sample for the free microstate generated in [10] for Nwater = 160 to find the still relatively large results, ΔFsum = 43 kcal/mol and ΔEtotal = 134 kcal/mol. In these calculations (which do not appear in the table) ΔEloop = 117 contributes significantly to the large ΔFsum; on the other hand, Δ F water TI ( LJ,tmpl ) = 64 kcal/mol is negative! It has been difficult to identify the configurationl elements that lead to these energetic changes even after examining the individual configurations numerically and by computer graphics. However, since the template is fixed, these results suggest that as the number of waters increase to Nwater = 160 their configuration undergoes a significant change, probably due to the increase of pressure in the sphere. Thus, water molecules that stay in the bulk for Nwater <160 enter the template already in the equilibration stage (i.e., when the water-template interactions are still intact) affecting thereby the results for Δ F water TI ( LJ,tmpl ) and Δ F water TI ( ch,tmpl ) .
These configurational changes of water depend on the template which is different in [10] than in the present study. In this context it should be noted that there is a large cavity in the active site of the bound template as the catalytic residue was trimmed significantly to become a smaller Ser residue (see 2.1). It should be noted, that this effect of water is expected and indeed is found to be less pronounced in the loop-water integrations, F water TI (ch,loop) and F water TI (LJ,loop) (Equation (16)). The problems involving the results for Nwater = 160 are not expected to disappear to a large extent when a single template is used, because changes in the arrangement of water will affect approximately equally the results for the free and bound microstates and the corresponding deviations will mostly be cancelled in differences.
Unlike the non-consistent results for Nwater = 160 discussed above, the results for Nwater < 160 show a consistent behavior, and based on [10] one would expect the simulations with Nwater = 140 to lead to the experimental, ΔFloop = 4 rather than ΔFloop = +18 kcal/mol. This disagreement might stem from the uncertainty in the crystal structure of the bound protein, 2dfp, which is characterized by large B-factors (with an average of 41.6 Å2) whereas for many atoms the values are much larger; for comparison, the average B-factors in 2ace, the crystal structure of free AChE, is 23.4 Å2. The large cavity formed in the active site of the bound template mentioned above might also affect the results for the bound protein. These are inherent problems which might not be easy to overcome completely; however, one might alleviated them by applying different initial optimizations to the two templates. Thus, in this work we have used the same harmonic force constants of 1 kcal/(molÅ2) to the free and bound x-ray structures, while it is plausible to anticipate that the force constant should be commensurate with the quality of the crystal structure, i.e., stronger for well defined structures. The results for Nwater = 160 provide some support for this assumption; thus, Fsum = 43 is better (i.e., smaller) than Fsum = 95 kcal/mol where these values are based on templates optimized by the force constants, 100 ([10]) and 1 kcal/(molÅ2) (here), respectively. However, the optimal force constants are unknown a-priori and their determination probably would require quite a few trials.
In summary: The fact that the experimental value, ΔFloop = 4 is known makes AChE a suitable system for checking the calculation of ΔFloop based on two different templates. As previously discussed, such calculations are of interest, as they mimic better the experimental conditions, and for an enzyme with a mobile loop, ΔFloop might be an important ingredient of the total absolute free energy of binding. Our calculations have revealed some instability in the results for Nwater = 160 but have led to consistent results for Nwater ≤ 140; obviously, it is somewhat disappointing that ΔFloop = 4 kcal/mol has not been recovered for Nwater = 140. However, this is not a failure of HSMD-TI per se but reflects the uncertainty in the crystal structure of the bound protein. One should also bear in mind that no other studies of a loop attached to two templates have been carried out thus far and besides our HSMD-TI papers [8,9,10,62] only few calculations of ΔFloop (based on a single template) are available [48,49]. Thus, the present work should be considered as the first step and a basis for future more extensive studies, which can be pursued in several avenues: for example, HSMD-TI can be applied to a mobile loop with known ΔFloop where the quality of the crystal structures of both, the bound and the free protein is high. As is already stated above, one can try optimizing the templates by changing the force constants in a systematic way for Nwater = 140, and more work can be done to elucidate the problematic behavior of the results for Nwater = 160, which again reflects problems in the simulation itself rather than in the application of HSMD-TI.
Table 3. Results in kcal/mol for energy and free energy components and their difference (Δ) between the free and bound microstates*.
Table 3. Results in kcal/mol for energy and free energy components and their difference (Δ) between the free and bound microstates*.
E loop energyF loop-water (TI)F template-water (TI)
template-looploop-loopsumCHLJsumCHLJsumF sumE total
40 waters ns = 40
Free−93−48−140−608−52−242−12−254−446−432
Bound−173−49−222−13−1−14−246−11−257−493−494
81182−479−383034761
60 waters ns = 40
Free−91−51−142−6012−48−313−2−315−505−675
Bound−174−47−221−8157−305−4−309−523−725
84−479−52−2−55−82−71850
80 waters ns = 40
Free−90−49−139−6745−23−35011−339−500−874
Bound−168−47−215−113019−35429−326−521−928
78−276−5615−415−18−132154
100 waters ns = 80
Free−108−39−147−6651−15−37937−342−504−1075
Bound−168−45−213−64640−39538−357−530−1134
60666−605−5516−115265
120 waters ns = 80
Free−105−39−143−6459−5−42673−353−500−1296
Bound−152−40−192−246340−44279−363−516−1339
47149−40−4−4416−6111538
140 waters ns = 80
Free−89 (0.5)−32 (0.7)−121 (0.9)−75 (0.3)82 (0.8)8 (0.8)−471 (0.4)126 (0.6)−345 (0.6)−459 (1.3)−1491 (1.4)
Bound−128 (0.4)−36 (0.7)−164 (0.8)−44 (0.4)76 (0.6)32 (0.8)−479 (0.6)135 (0.8)−344 (0.9)−477 (1.4)−1515 (1.6)
39 (0.6)4 (1.1)43 (1.2)−30 (1.5)6 (1.0)−24 (1.1)8 (0.6)−9 (1.0)−1 (1.1)18 (1.9)23 (2.2)
160 waters ns = 40
Free−82 (0.7)−28 (0.9)−111 (1.0)−73 (0.6)106 (1.0)33 (1.3)−523 (0.5)332 (1.4)−191 (1.5)−269 (2.1)−1524 (2.5)
Bound−167 (0.8)−42 (1.2)−209 (1.2)−8 (0.7)96 (0.9)88 (0.9)−520 (0.5)276 (1.6)−244 (1.5)−364 (1.6)−1661 (2.4)
84 (0.7)13 (1.2)98 (1.2)−65 (0.7)10 (0.9)−54 (1.0)−2 (0.5)55 (1.7)53 (1.7)95 (2.1)137 (2.6)
*The energies, E loop-tmpl , E loop-loop , their sum, Eloop, and the total energy, Etotal are defined in Equation (1). Floop-water(TI) are F water TI (ch) and F water TI (LJ) (Equation (16)) which are free energies calculated by TI, where the loop-water charges and Lennard-Jones interactions, respectively, are gradually eliminated; their sum is F water TI ( loop ) . Correspondingly, Ftmpl-water(TI) are F water TI ( ch,tmpl ) and F water TI ( LJ,tmpl ) [Equation (17)] which are free energies calculated by TI, where the template-water charges and Lennard-Jones interactions, respectively, are gradually eliminated; their sum is F water TI ( tmpl ) . Fsum = Eloop + F water TI ( loop) + F water TI ( tmpl) is the total free energy (without the contribution of the loop entropy, Sloop). Δ is the difference, free bound. ns is the sample size. Statistical errors, s/(ns)1/2, where s is the standard deviation, are given only for results based of a number of water molecules, Nwater = 140 and 160; the errors for Nwater < 140 are smaller than those obtained for Nwater = 140 and 160.

4. Summary and Conclusions

This is an additional paper in a series of papers where our HSMD-TI method has been extended systematically for mobile loops in proteins. The method provides the absolute entropy and free energy and thus enables one calculating these quantities for microstates m and n with a large conformational variance without the need to carry out a complex thermodynamic integration from m to n. Mobile loops in enzymes can be crucial for the extent of binding, where their open conformation is moved to form a “lid” over the bound active site.
Thus far we have studied the 7-residue mobile loop, 304–310 (Gly-His-Gly-Ala-Gly-Gly-Ser) of the enzyme porcine pancreatic α-amylase, where the system was modeled by the AMBER96 force field alone and by AMBER96 with the GB/SA implicit solvent [8]. Later the implicit solvent was replaced by explicit solvent, i.e., the α-amylase loop was capped with 70 TIP3P [9] water molecules. To determine the minimum size of the template and the minimum number of water molecules required to cap a loop, we have recently carried out a systematic study of the loop, 287–290 (Ile, Phe, Arg, and Phe) of AChE [10]. The conclusion is that to recover the experimental free energy difference the water density should be close to that of bulk water, and the minimum template size is of 1,000 atoms for this loop. In all these studies and in a recent study of a mobile loop of the protein streptavidin [62] the free and bound loop structures were attached to a common template that was “cut” from the crystal structure of the free loop. Obviously, one would seek to treat each loop attached to its own template. This has been the objective of the present paper.
As a first step of this study the heavy atoms of the two crystal structures were harmonically restrained (with a force constant of 1 kcal/Å2) and the energy was minimized. Afterwards the loop and (TIP3P) water atoms were allowed to relax in the presence of a fixed template. The relatively weak force constant (1 kcal/Å2) (as compared to 100 kcal/Å2 used in [10] for the pdb structure 2ace) was chosen to allow more extensive changes in the bound structure due to its relatively large B-factors and the chemical change imposed on it by removing the phosphate group; correspondingly, larger structural changes were also allowed to occur in the free structure. Then two cuts from each crystal structures were performed, which contain exactly the same atoms; these templates and the attached loops were subjected to further optimization steps as described earlier.
The new computational component in this study is the annihilation (for each template) of the water-template interactions using a TI procedure. Unlike the loop-water TI, one would expect the accuracy of the water–template TI to depend strongly on the system size. Thus, for Nwater ≤ 140 the errors are relatively small and the results for the free-bound differences in the total free energy and energy, ΔFsum and ΔEtotal are comparable, while for Nwater = 160 the errors become larger (in particular for the Lennard-Jones part of the water-template TI) and a significant increase in ΔFsum and ΔEtotal occurs. We attribute this jump in the results to changes in the configuration of water stemming from the increase in water pressure in the sphere as the number of waters grows.
Finally, calculation of ΔFloop based on separate templates which are cut from the free and bound crystal structures is more realistic than using a single common template for both loop conformations. ΔFloop should also be considered in the calculation of the absolute free energy of binding when the conformational change of the loop is due to ligand binding is of an induced fit type. Thus, while the present calculations show that recovering the experimental result, ΔFloop = Ffree − Fbound ~ 4 kcal/mol is not straightforward, to achieve improvement more research is needed, which will be based on the conclusions of the initial work carried out here, as specified in the summary of the previous section.

5. Appendix

5.1. Thermodynamic Integration of Water

As described earlier, two TI processes are carried out where the first is applied to the loop and water. Thus, the interaction energy (electrostatic and Lennard Jones (LJ)) between a fixed loop structure and the (moving) water molecules is decreased gradually to zero (rather than increased from zero) at constant T and V; the water-water and water-template potential energy is unchanged. After completing a loop-water TI (i.e., the loop-water interactions are zero), a second TI process is performed where the water-template interactions (LJ and electrostatic) are decreased to zero; this TI process starts from the last water configuration obtained in the loop-water TI. For the (LJ) potential we have used the shifted scaling potential, introduced by Zacharias et al. [63]:
φ ( r i j , λ ) = λ 4 ε [ σ 12 ( r i j 2 + δ ( 1 λ ) ) 6 σ 6 ( r i j 2 + δ ( 1 λ ) ) 3 ]
where the shift parameter, δ = 3 Å2, prevents the divergence of the potential (and its derivative) at small pair separations; a similar scaling function is used for the Coulomb interactions. The free energy derivatives with respect to λ, ∂F/∂λ is:
F λ = E ( x N , λ ) λ λ
where the derivative of the energy is calculated analytically. The integration with respect to λ is carried out by dividing the range [1,0] into 20 equal integration bins Δλi. The (λ = 1→λ = 0) integration of the electrostatic interactions (i.e., charge elimination) is carried out first (in the presence of intact LJ interactions) followed by a ε = λ = 1→0 integration of the LJ interactions. Thus, the entire two-stage process is based on 40 ∂F/∂λi integration steps.
The MD simulation consists of a 2 fs integration step. Each (Δλi) step (window) starts with energy minimization (based on λi) of the last structure obtained in the simulation of the i-1 step, followed by 5 ps MD simulation for equilibration, which is discarded. The next step the production MD simulation, is of 20 ps in the loop-water TI, where a configuration is retained every 0.02 ps, i.e., altogether 1,000 configurations are used for evaluating <∂F/∂λi>. For the template-water TI, the production simulation is longer, of 40 ps (2,000 configurations), due to the larger size of the template. It should be pointed out again that these two integrations are carried out for two different templates. The computer time is discussed in the Appendix of [10].

Acknowledgments

This work was supported by NIH grant 2-R01 GM066090.

References and Notes

  1. White, R.P.; Meirovitch, H. A simulation method for calculating the absolute entropy and free energy of fluids: Application to liquid argon and water. Proc. Natl. Acad. Sci. USA 2004, 101, 9235–9240. [Google Scholar] [CrossRef] [PubMed]
  2. Cheluvaraja, S.; Meirovitch, H. Simulation method for calculating the entropy and free energy of peptides and proteins. Proc. Natl. Acad. Sci. USA 2004, 101, 9241–9246. [Google Scholar] [CrossRef] [PubMed]
  3. White, R.P.; Meirovitch, H. Lower and upper bounds for the absolute free energy by the hypothetical scanning Monte Carlo Method: Application to liquid argon and water. J. Chem. Phys. 2004, 121, 10889–10904. [Google Scholar] [CrossRef] [PubMed]
  4. White, R.P.; Meirovitch, H. Free volume hypothetical scanning molecular dynamics method for the absolute free energy of liquids. J. Chem. Phys. 2006, 124, 204108–204113. [Google Scholar] [CrossRef] [PubMed]
  5. White, R.P.; Meirovitch, H. Calculation of the entropy of random coil polymers with the hypothetical scanning Monte Carlo Method. J. Chem. Phys. 2005, 123, 214908–214911. [Google Scholar] [CrossRef] [PubMed]
  6. Cheluvaraja, S.; Meirovitch, H. Calculation of the entropy and free energy by the hypothetical scanning Monte Carlo Method: Application to peptides. J. Chem. Phys. 2005, 122, 054903–054913. [Google Scholar] [CrossRef] [PubMed]
  7. Cheluvaraja, S.; Meirovitch, H. Calculation of the entropy and free energy of peptides by molecular dynamics simulations using the hypothetical scanning molecular dynamics method. J. Chem. Phys. 2006, 125, 024905–024913. [Google Scholar] [CrossRef] [PubMed]
  8. Cheluvaraja, S.; Meirovitch, H. Stability of the free and bound microstates of a mobile loop of α amylase obtained from the absolute entropy and free energy. J. Chem. Theory Comput. 2008, 2008, 4, 192–208. [Google Scholar] [CrossRef]
  9. Cheluvaraja, S.; Mihailescu, M.; Meirovitch, H. Entropy and free energy of a mobile loop in explicit water. J. Phys, Chem. B 2008, 112, 9512–9522. [Google Scholar] [CrossRef] [PubMed]
  10. Mihailescu, M.; Meirovitch, H. Absolute free energy and entropy of a mobile loop of the enzyme Acetyl Choline Esterase. J. Phys. Chem. B 2009, 113, 7950–7964. [Google Scholar] [CrossRef] [PubMed]
  11. Beveridge, D.L.; DiCapua, F.M. Free energy via molecular simulation: applications to chemical and biomolecular systems. Annu. Rev. Biophys. Chem. 1989, 18, 431–492. [Google Scholar] [CrossRef] [PubMed]
  12. Kollman, P.A. Free energy calculations: applications to chemical and biochemical phenomena. Chem. Rev. 1993, 93, 2395–2417. [Google Scholar] [CrossRef]
  13. Jorgensen, W.L. Free energy calculations: a breakthrough for modeling organic chemistry in solution. Acc. Chem. Res. 1989, 22, 184–189. [Google Scholar] [CrossRef]
  14. Meirovitch, H. Calculation of the free energy and entropy of macromolecular systems by computer simulation. Rev. Compu.t Chem. 1998, 12, 1–74. [Google Scholar]
  15. Gilson, M.K.; Given, J.A.; Bush, B.L.; McCammon, J.A. The statistical thermodynamic basis for computing of binding affinities: A critical review. Biophys. J. 1997, 72, 1047–1069. [Google Scholar] [CrossRef] [PubMed]
  16. Boresch, S.; Tettinger, F.; Leitgeb, M.; Karplus, M. Absolute binding free energies: A qualitative approach for their calculation. J. Phys. Chem. B 2003, 107, 9535–9551. [Google Scholar] [CrossRef]
  17. Meirovitch, H. Recent developments in methodologies for calculating entropy and free energy of biological systems by computer simulation. Curr. Opin. Struct. Biol. 2007, 17, 181–186. [Google Scholar] [CrossRef] [PubMed]
  18. Gilson, M.K.; Zhou, H.X. Calculation of protein-ligand binding affinities. Ann. Rev. Biophys. Biomol. Struct. 2007, 36, 21–42. [Google Scholar] [CrossRef] [PubMed]
  19. Foloppe, N.; Hubbard, R. Towards predictive ligand design with free-energy based computational methods? Curr. Med. Chem. 2007, 13, 3583–3608. [Google Scholar] [CrossRef]
  20. Van Gunsteren, W.F.; Bakowies, D.; Baron, R.; Chandrasekhar, I.; Christen, M.; Daura, X.; Gee, P.J.; Geerke, D.P.; Glättli, A.; Hünenberger, P.H.; Kastenholz, M.A.; Oostenbrink, C.; Schenk, M.; Trzesniak, D.; Van Der Vegt, N.F.A.; Yu, H.B. Biomolecular modeling: Goals, Problems, Perspectives. Angew. Chem. Int. Ed. 2006, 45, 4064–4092. [Google Scholar] [CrossRef] [PubMed]
  21. Singh, N.; Warshel, A. A comprehensive examination of the contributions to the binding entropy of protein-ligand complexes. Proteins 2010, 78, 1724–1735. [Google Scholar] [CrossRef] [PubMed]
  22. Deng, Y.; Roux, B. Computations of standard binding free energies with molecular dynamics simulations. J. Phys. Chem. B 2009, 113, 2234–2246. [Google Scholar] [CrossRef] [PubMed]
  23. Warshel, A; Sharma, P.K.; Kato, M.; Parson, W.W. Modeling electrostatic effects in proteins. Biochim. Biophys. Acta. 2006, 1764, 1647–1676. [Google Scholar]
  24. Alder, B.J.; Wainwright, T.E. Studies of molecular dynamics. I. General method. J. Chem. Phys. 1959, 31, 459–466. [Google Scholar] [CrossRef]
  25. McCammon, J.A.; Gelin, B.R.; Karplus, M. Dynamics of folded proteins. Nature 1977, 267, 585–590. [Google Scholar] [CrossRef] [PubMed]
  26. Elber, R.; Karplus, M. Multiple conformational states of proteins a molecular dynamics analysis of myoglobin. Science 1987, 235, 318–321. [Google Scholar] [CrossRef] [PubMed]
  27. Stillinger, F.H.; Weber, T.A. Packing structures and transitions in liquids and solids. Science 1984, 225, 983–989. [Google Scholar] [CrossRef] [PubMed]
  28. Gō, N.; Scheraga, H.A. Analysis of the contribution of internal vibrations to the statistical weights of equilibrium conformations of macromolecules. J. Chem. Phys. 1969, 51, 4751–4767. [Google Scholar] [CrossRef]
  29. Gō, N.; Scheraga, H.A. On the use of classical statistical mechanics in the treatment of polymer chain conformation. Macromolecules 1976, 9, 535–542. [Google Scholar] [CrossRef]
  30. Hagler, A.T.; Stern, P.S.; Sharon, R.; Becker, J.M.; Naider, F. Computer simulation of the conformational properties of oligopeptides. Comparison of theoretical methods and analysis of experimental results. J. Am. Chem. Soc. 1979, 101, 6842–6852. [Google Scholar] [CrossRef]
  31. Messer, B.M.; Roca, M.; Chu, Z.T.; Vicatos, S.; Kilshtain, A.V.; Warshel, A. Multiscale simulations of protein landscapes: Using coarse-grained models as reference potentials to full explicit models. Proteins 2010, 78, 1212–1227. [Google Scholar]
  32. Karplus, M.; Kushick, J.N. Method for estimating the configurational entropy of macromolecules. Macromolecules 1981, 14, 325–332. [Google Scholar] [CrossRef]
  33. Schlitter, J. Estimation of absolute and relative entropies of macromolecules using the covariance matrix. Chem. Phys. Lett. 1993, 215, 617–621. [Google Scholar] [CrossRef]
  34. Andricioaei, I.; Kaplus, M. On the calculation of entropy from covariance matrices of the atomic fluctuations. J. Chem. Phys. 2001, 115, 6289–6292. [Google Scholar] [CrossRef]
  35. Cornell, W.D.; Cieplak, P.; Bayly, C.I.; Gould, I.R.; Merz, K.M., Jr.; Ferguson, D.M.; Spellmeyer, D.C.; Fox, T.; Caldwell, J.W.; Kollman, P.A. A second generation force field for the simulation of proteins, nucleic acids, and organic molecules. J. Am. Chem. Soc. 1995, 117, 5179–5197. [Google Scholar] [CrossRef]
  36. Qiu, D.; Shenkin, P.S.; Hollinger, F.P.; Still, W.C. The GB/SA continuum model for salvation: A fast analytical method for the calculation approximate Born radii. J. Phys. Chem. 1997, 101, 3005–3014. [Google Scholar] [CrossRef]
  37. Jorgensen, W.L.; Chandrasekhar, J.; Madura, J.D.; Impey, R.W.; Klein, M.L. Comparison of simple potential functions for simulating liquid water. J. Chem. Phys. 1983, 79, 926–935. [Google Scholar] [CrossRef]
  38. White, R.P.; Meirovitch, H. Minimalist explicit solvation models for surface loops in proteins. J. Chem. Theory Comput. 2006, 2, 1135–1151. [Google Scholar] [CrossRef] [PubMed]
  39. Steinbach, P.J.; Brooks, B.R. Protein hydration elucidated by molecular dynamics simulation. Proc. Natl. Acad. Sci. USA. 1993, 90, 9135–9139. [Google Scholar] [CrossRef] [PubMed]
  40. Millard, C.B.; Kryger, G.; Ordentlich, A.; Greenblatt, H.M.; Harel, M.; Raves, M. L.; Segall, Y.; Barak, D.; Shafferman, A.; Silman, I.; Sussman, J.L. Crystal Structures of aged phosphonylated acetylcholinesterase: Nerve agent reaction products at the atomic level. Biochemistry 1999, 38, 7032–7039. [Google Scholar] [CrossRef] [PubMed]
  41. Millard, C.B.; Koellner, G.; Ordentlich, A.; Shafferman, A.; Silman, I.; Sussman, J.L. Reaction products of acetylcholinesterase and VX reveal a mobile histidine in the catalytic triad. J. Am. Chem. Soc. 1999, 121, 9883–9884. [Google Scholar] [CrossRef]
  42. Chiu, Y.C.; Main, A.R.; Dauterman, W.C. Affinity and phosphorylat1on constants of a series of O,O-dialkyl malaoxons and paraoxons with acetylcholinesterase. Biochem. Pharmacol. 1969, 18, 2171–2177. [Google Scholar] [CrossRef]
  43. Hart, G.J.; O’Brien, R.D. Recording spectrophotometric method for determination of dissociation and phosphorylation constants for the inhibition of acetylcholineesterase by organophosphates in the presence of substrate. Biochemistry 1973, 12, 2940–2945. [Google Scholar]
  44. Forsberg, A.; Puu, G. Kinetics for the inhibition of acetylcholinesterase from the electric eel by some organophosphates and carbamates. Eur. J. Biochem. 1984, 140, 153–156. [Google Scholar] [CrossRef] [PubMed]
  45. Ordentlich, A.; Barak, D.; Kronman, C.; Flashner, Y.; Leitner, M.; Segall, Y.; Ariel, N.; Cohen, S.; Velan, B.; Shafferman, A. Dissection of the human acetylcholinesterase active center determinants of substrate specificity. Identification of residues constituting the anionic site, the hydrophobic site, and the acyl pocket. J. Biol. Chem. 1993, 268, 17083–17095. [Google Scholar] [PubMed]
  46. Ordentlich, A.; Kronman, C.; Barak, D.; Stein, D.; Ariel, N.; Marcus, D.; Velan, B.; Shafferman, A. Engineering resistance to 'aging' of phosphylated human. Acetylcholinesterase. role of hydrogen bond network in the active center. FEBS. Lett. 1993, 334, 215–220. [Google Scholar] [CrossRef]
  47. Raves, M.L.; Harel, M.; Pang, Y.P.; Silman, I.; Kozikowski, A.P.; Sussman, J.L. Crystal Structures of aged phosphonylated acetylcholinesterase: Nerve agent reaction products at the atomic level. Nat. Struct. Biol. 1997, 4, 57–63. [Google Scholar] [CrossRef] [PubMed]
  48. Carlacci, L; Millard, C.B.; Olson, M.A. Conformational energy landscape of the acyl pocket loop in acetylcholinesterase: A Monte Carlo-generalized Born model study. Biophys. Chem. 2004, 111, 143–157. [Google Scholar] [CrossRef] [PubMed]
  49. Olson, M.A. Modeling loop reorganization free energies of acetylcholinesterase: A comparison model of explicit and implicit solvent models. Proteins 2004, 57, 645–650. [Google Scholar] [CrossRef] [PubMed]
  50. Getzoff, E.D.; Geysen, H.M.; Rodda, S.J.; Alexander, H.; Tainer, J.A.; Lerner, R.A. Mechanisms of antibody binding to a protein. Science 1987, 235, 1191–1196. [Google Scholar] [CrossRef] [PubMed]
  51. Rini, J.M.; Schulze-Gahmen, U.; Wilson, I.A. Structural evidence for induced fit as a mechanism for antibody- antigen recognition. Science 1992, 255, 959–965. [Google Scholar] [CrossRef] [PubMed]
  52. Constantine, K.L.; Friedrichs, M.S.; Wittekind, M.; Jamil, H.; Chu, C.H.; Parker, R.A.; Goldfarb, V.; Mueller, L.; Farmer, B.T. Backbone and side chain dynamics of uncomplexed human adipocyte and muscle fatty acid-binding proteins. Biochemistry 1998, 37, 7965–7980. [Google Scholar] [CrossRef] [PubMed]
  53. Cansu, S; Doruker, P. Dimerization affects collective dynamics of triosphosphate isomerase. Biochemistry 2008, 47, 1358–1368. [Google Scholar]
  54. General, I.J; Dragomirova, R; Meirovitch, H. Calculation of the Absolute free energy of binding: the effect of a mobile loop on the avidin/biotin complex. J. Phys. Chem. B 2010. in review. [Google Scholar]
  55. Ponder, J.W. TINKER Software Tools for Molecular Design, version 3.9; Department of biochemistry and molecular biophysics, Washington University School of Medicine: St. Louis, MO, USA, 2001. [Google Scholar]
  56. Meirovitch, H.; Alexandrowicz, Z. On the zero fluctuation of the microscopic free energy and its potential use. J. Stat. Phys. 1976, 15, 123–127. [Google Scholar] [CrossRef]
  57. Meirovitch, H. Simulation of a free energy upper bound, based on the anti-correlation between an approximate free energy functional and its fluctuation. J. Chem. Phys. 1999, 111, 7215–7224. [Google Scholar] [CrossRef]
  58. Meirovitch, H. The scanning simulation method for macromolecules. Int. J. Mod. Phys. 1990, 1, 119–145. [Google Scholar] [CrossRef]
  59. Meirovitch, H. Computer simulation of the free energy of polymer chains with excluded volume and with finite interactions. Phys. Rev. A 1985, 32, 3709–3715. [Google Scholar] [CrossRef] [PubMed]
  60. Allen, M.P.; Tildesley, D.J. Computer Simulation of Liquids; Clarenden Press: Oxford, UK, 1987. [Google Scholar]
  61. Chang, C.E.; Chen, W.; Gilson, M.K. Evaluating the accuracy of the quasiharmonic approximation. J. Chem. Theory. Comput. 2005, 1, 1017–1028. [Google Scholar] [CrossRef]
  62. General, I.J.; Meirovitch, H. Relative stability of the open and closed conformations of the active site loop of streptavidin. J. Chem. Phys 2010. in review. [Google Scholar] [CrossRef] [PubMed]
  63. Zacharias, M.; Straatsma, T.P.; McCammon, J.A. Separation-shifted scaling, a new scaling method for Lennard-Jones interactions in thermodynamic integration. J. Chem. Phys. 1994, 100, 9025–9031. [Google Scholar] [CrossRef]

Share and Cite

MDPI and ACS Style

Mihailescu, M.; Meirovitch, H. Entropy and Free Energy of a Mobile Loop Based on the Crystal Structures of the Free and Bound Proteins. Entropy 2010, 12, 1946-1974. https://doi.org/10.3390/e12081946

AMA Style

Mihailescu M, Meirovitch H. Entropy and Free Energy of a Mobile Loop Based on the Crystal Structures of the Free and Bound Proteins. Entropy. 2010; 12(8):1946-1974. https://doi.org/10.3390/e12081946

Chicago/Turabian Style

Mihailescu, Mihail, and Hagai Meirovitch. 2010. "Entropy and Free Energy of a Mobile Loop Based on the Crystal Structures of the Free and Bound Proteins" Entropy 12, no. 8: 1946-1974. https://doi.org/10.3390/e12081946

APA Style

Mihailescu, M., & Meirovitch, H. (2010). Entropy and Free Energy of a Mobile Loop Based on the Crystal Structures of the Free and Bound Proteins. Entropy, 12(8), 1946-1974. https://doi.org/10.3390/e12081946

Article Metrics

Back to TopTop