Summary
We introduce PHASE, a highly flexible system for common pharmacophore identification and assessment, 3D QSAR model development, and 3D database creation and searching. The primary workflows and tasks supported by PHASE are described, and details of the underlying scientific methodologies are provided. Using results from previously published investigations, we compare PHASE directly with other ligand-based software for its ability to identify target pharmacophores, rationalize structure-activity data, and predict activities of external compounds.
References
1. Guner OF (2000) Pharmacophore perception, development, and use in drug design. International University Line, La Jolla, CA
2. Van Drie JH (2003) Curr Pharm Design 9:1649
3. Topliss JG (1983) Quantitative structure-activity relationships of drugs, vol 19. Academic Press, New York
4. Martin YC (1978) Quantitative drug design: a critical introduction. Marcel Dekker, New York
5. Hansch C, Fujita T (1964) J Am Chem Soc 86:1616
6. Gund P, Wipke WT, Langridge R (1974) Computer searching of a molecular structure file for pharmacophoric patterns, vol 3. Elsevier, Amsterdam, pp 33–39
7. Kier LB, Hall LH (1976) Molecular connectivity in chemistry and drug research. Academic Press, London
8. Hansch C, Leo A (1979) Substituent constants for correlation analysis in chemistry and biology. Wiley, New York
9. Hopfinger AJ (1980) J Am Chem Soc 102:7196
10. Van Drie JH, Weininger D, Martin YC (1989) J Comput-Aided Mol Design 3:225
11. Lauri G, Bartlett PA (1994) J Comput-Aided Mol Design 8:51
12. Van Drie JH (1997) J Comput-Aided Mol Design 11:39
13. Chen X, Rusinko A III, Young SS (1998) J Chem Inf Comput Sci 38:1054
14. Chen X, Rusinko A III, Tropsha A, Young SS (1999) J Chem Inf Comput Sci 39:887
15. Greene J, Kahn S, Savoj H, Sprague P, Teig S (1994) J Chem Inf Comput Sci 34:1297
16. Barnum D, Greene J, Smellie A, Sprague P (1996) J Chem Inf Comput Sci 36:563
17. Martin YC (1995) In: Hansch C, Fujita T (eds) Classical and 3D QSAR in agrochemistry. American Chemical Society, Washington, DC, pp 318–329
18. Jones G, Willett P, Glen RC (1995) J Comput-Aided Mol Design 9:532
19. Cramer RD, Patterson DE, Bunce JD (1988) J Am Chem Soc 110:5959
20. Van Drie JH (2000) In: Guner OF (ed) Pharmacophore perception, development, and use in drug design. International University Line, La Jolla, CA, pp 517–530
21. LigPrep 2.0 (2006) Schrödinger, LLC, New York, NY
22. MacroModel 9.1 (2006) Schrödinger, LLC, New York, NY
23. Halgren TA (1996) J Comput Chem 17:520
24. MacroModel 2.0 (2006) User Manual. Schrödinger, LLC, New York, NY
25. Chang G, Guida WC, Still WC (1989) J Am Chem Soc 111:4379
26. Kolossvary I, Guida WC (1996) J Am Chem Soc 118:5011
27. SMARTS: a language for describing molecular patterns. Daylight Chemical Information Systems, Inc., Aliso Viejo, CA
28. Marshall GR, Barry CD, Bosshard HE, Dammkoehler RA, Dunn DA (1979) In: Olson EC, Christoffersen RE (eds) Computer-assisted drug design. American Chemical Society, Washington, DC, pp 205–226
29. Beusen DD, Marshall GR (2000) In: Guner OF (ed) Pharmacophore perception, development, and use in drug design. International University Line, La Jolla, CA, pp 23–45
30. Van Drie JH (1997) J Chem Inf Comput Sci 37:38
31. Patel Y, Gillet VJ, Bravi G, Leach AR (2002) J Comput-Aided Mol Design 16:653
32. Suling WJ, Reynolds RC, Barrow EW, Wilson LN, Piper JR, Barrow WW (1998) J Antimicrob Chemother 42:811
33. Suling WJ, Seitz LE, Pathak V, Westbrook L, Barrow EW, Zywno-Van-Ginkel S, Reynolds RC, Piper JR, Barrow WW (2000) Antimicrob Agents Chemother 44:2784
34. Debnath AK (2002) J Med Chem 45:41
35. Maestro 7.5 (2006) Schrödinger, LLC, New York, NY
36. World Drug Index (2001) Thomson Scientific
37. Wold H (1975) In: Gani J (ed) Perspectives in probability and statistics: papers in honour of M. S. Bartlett on the occasion of his sixty-fifth birthday. Academic Press, London, pp 117–142
38. Wold S, Ruhe A, Wold H, Dunn WJ III (1984) SIAM J Sci Stat Comput 5:735
Appendices
Appendix A: selectivity estimation
In PHASE, the selectivity of a pharmacophore hypothesis H is defined as follows:

\[ \hbox{Selectivity}(\hbox{H}) = -\log_{10}\left[p(\hbox{H})\right] \tag{A1} \]
where p(H) is the probability that a random drug-like molecule will match the hypothesis, irrespective of any activity exhibited by that molecule toward the biological target in question. Given a database of drug-like molecules, it is straightforward to search that database for matches to a hypothesis, and thereby arrive at an estimate of selectivity based on that particular sample population of molecules. However, application of such a procedure is far too time-consuming to be practical when scoring a large number of hypotheses, so a rapid means of estimating selectivity based on the physical characteristics of a hypothesis is sought.
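The brute-force estimate described in this paragraph is simple to state, even though it is too slow for scoring many hypotheses. The sketch below assumes selectivity is reported on a \(-\log_{10}\) scale, the quantity fitted later in this appendix; `sample_selectivity` is a hypothetical helper, not part of PHASE.

```python
import math


def sample_selectivity(num_matches: int, num_molecules: int) -> float:
    """Estimate -log10 p(H) from a brute-force database search.

    p(H) is approximated by the fraction of database molecules that match
    hypothesis H; a hit count of zero has no finite estimate.
    """
    if num_molecules <= 0:
        raise ValueError("the database must contain at least one molecule")
    if not 0 < num_matches <= num_molecules:
        raise ValueError("need 0 < num_matches <= num_molecules")
    return -math.log10(num_matches / num_molecules)


# Hypothetical search outcome: 10 of 1000 molecules match, so p(H) = 0.01.
print(sample_selectivity(10, 1000))  # about 2.0
```

Every hypothesis scored this way would require a full database search, which is exactly the cost the rapid estimate below is designed to avoid.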
Van Drie [12] has shown that selectivities of two-point pharmacophores can be reliably estimated with respect to a given database using pre-tabulated probabilities that cover discrete distance ranges. He went on to show that highly selective three-point pharmacophores can be constructed by combining the two-point pharmacophores with the highest selectivities. This is a natural consequence of the fact that the probability of matching a k-point pharmacophore \(\hbox{H}^{\langle k \rangle}\) is less than or equal to the probability of matching all \(k(k-1)/2\) two-point pharmacophores \(\hbox{H}^{\langle 2 \rangle}_{ij}\) embedded within \(\hbox{H}^{\langle k \rangle}\):

\[ p\left(\hbox{H}^{\langle k \rangle}\right) \le p\left(\bigcap_{i<j} \hbox{H}^{\langle 2 \rangle}_{ij}\right) \tag{A2} \]
Equality does not hold in general, because a given molecule may match each of the two-point pharmacophores individually even if it does not contain a single arrangement of k features that matches \(\hbox{H}^{\langle k \rangle}\). Nevertheless, since matching the two-point pharmacophores is a necessary condition for matching \(\hbox{H}^{\langle k \rangle}\), the right-hand side of Eq. A2 is useful for estimating selectivity.
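Enumerating the two-point pharmacophores embedded in a k-point hypothesis is a simple pairwise loop. In this sketch the function name, the parallel lists of feature types and coordinates, and the choice to sort each type pair (so that A-D and D-A count as the same pair type) are illustrative assumptions.

```python
import math
from itertools import combinations


def two_point_subpharmacophores(features, coords):
    """List every two-point pharmacophore embedded in a k-point one.

    features: feature-type letters, e.g. ["A", "D", "R"]
    coords:   matching (x, y, z) site coordinates in angstroms
    Returns (type_i, type_j, d_ij) for all i < j, with the type letters sorted.
    """
    if len(features) != len(coords):
        raise ValueError("features and coords must have the same length")
    pairs = []
    for i, j in combinations(range(len(features)), 2):
        x, y = sorted((features[i], features[j]))
        pairs.append((x, y, math.dist(coords[i], coords[j])))
    return pairs


sites = ["A", "D", "R", "H"]
xyz = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 4.0, 0.0), (0.0, 0.0, 5.0)]
subs = two_point_subpharmacophores(sites, xyz)
print(len(subs))  # k(k-1)/2 = 6 for k = 4
```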
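If the two-point probabilities are independent, then the following relation holds:

\[ p\left(\bigcap_{i<j} \hbox{H}^{\langle 2 \rangle}_{ij}\right) = \prod_{i<j} p\left(\hbox{H}^{\langle 2 \rangle}_{ij}\right) \tag{A3} \]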
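Further, if sites i and j are separated by a distance \(d_{ij}\), and their pharmacophore feature types are α(i) and α(j), respectively, then Eq. A3 can be rewritten in terms of the probabilities of matching specific inter-feature distances to within a tolerance Δd:

\[ \prod_{i<j} p\left(\hbox{H}^{\langle 2 \rangle}_{ij}\right) = \prod_{i<j} p\left(d_{ij} - \Delta d \le d_{\alpha(i)\alpha(j)} \le d_{ij} + \Delta d\right) \tag{A4} \]

where \(d_{\alpha(i)\alpha(j)}\) denotes the distance between a pair of features of types α(i) and α(j) in a randomly chosen molecule.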
Given a population of drug-like molecules and a pair of feature types x and y, there is a probability density \(p^{*}(d_{xy})\) that describes the distribution of xy pharmacophores within that population. While \(p^{*}(d_{xy})\) may be complex and possibly discontinuous, a simple Gaussian dependence is assumed for purposes of estimating selectivity, so that the probability density may be written as:

\[ p^{*}(d_{xy}) = \frac{1}{\sigma_{xy}\sqrt{2\pi}}\exp\left[-\frac{(d_{xy}-\mu_{xy})^{2}}{2\sigma_{xy}^{2}}\right] \tag{A5} \]
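For small values of Δd, the following approximation can be made:

\[ p\left(d_{ij} - \Delta d \le d_{\alpha(i)\alpha(j)} \le d_{ij} + \Delta d\right) \approx 2\,\Delta d\; p^{*}\!\left(d_{\alpha(i)\alpha(j)} = d_{ij}\right) \tag{A6} \]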
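Under the Gaussian model, the probability of a distance falling inside the tolerance window has a closed form in terms of the error function, so the small-Δd approximation can be checked numerically. The μ, σ, and distance values below are made up for illustration and are not PHASE statistics.

```python
import math


def gaussian_density(d, mu, sigma):
    """Gaussian model for the pair-distance density p*(d)."""
    return math.exp(-((d - mu) ** 2) / (2.0 * sigma ** 2)) / (sigma * math.sqrt(2.0 * math.pi))


def window_probability_exact(d, mu, sigma, delta):
    """P(d - delta <= distance <= d + delta) under the Gaussian model, via erf."""
    scale = sigma * math.sqrt(2.0)
    return 0.5 * (math.erf((d + delta - mu) / scale) - math.erf((d - delta - mu) / scale))


def window_probability_approx(d, mu, sigma, delta):
    """Small-tolerance approximation: 2 * delta * p*(d)."""
    return 2.0 * delta * gaussian_density(d, mu, sigma)


def relative_error(d, mu, sigma, delta):
    exact = window_probability_exact(d, mu, sigma, delta)
    return abs(window_probability_approx(d, mu, sigma, delta) - exact) / exact


# Illustrative values only: mu = 6.0 A, sigma = 2.0 A, evaluated at d = 5.0 A.
for delta in (0.01, 0.1, 1.0):
    print(delta, relative_error(5.0, 6.0, 2.0, delta))
```

The error grows roughly with Δd², so the approximation is only literal for tolerances that are small relative to σ.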
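Substituting Eqs. A5 and A6 into Eq. A4 yields

\[ \prod_{i<j} p\left(\hbox{H}^{\langle 2 \rangle}_{ij}\right) \approx \prod_{i<j} \frac{2\,\Delta d}{\sigma_{\alpha(i)\alpha(j)}\sqrt{2\pi}}\exp\left[-\frac{(d_{ij}-\mu_{\alpha(i)\alpha(j)})^{2}}{2\,\sigma_{\alpha(i)\alpha(j)}^{2}}\right] \tag{A7} \]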
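Taking negative base-10 logarithms,

\[ -\log_{10}\prod_{i<j} p\left(\hbox{H}^{\langle 2 \rangle}_{ij}\right) \approx \sum_{i<j}\left[\log_{10}\frac{\sigma_{\alpha(i)\alpha(j)}\sqrt{2\pi}}{2\,\Delta d} + \frac{(d_{ij}-\mu_{\alpha(i)\alpha(j)})^{2}}{2\,\sigma_{\alpha(i)\alpha(j)}^{2}\ln 10}\right] \tag{A8} \]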
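Although it is certainly possible to estimate the univariate parameters \(\sigma_{\alpha(i)\alpha(j)}\) and \(\mu_{\alpha(i)\alpha(j)}\) for each possible pair of feature types, it is advantageous to treat the right-hand side of Eq. A8 as a general polynomial in \(d_{ij}\) and to fit the associated coefficients to observed probabilities for a large number and variety of pharmacophores:

\[ -\log_{10} p\left(\hbox{H}^{\langle k \rangle}\right) \approx \sum_{i<j}\left[A_{\alpha(i)\alpha(j)} + B_{\alpha(i)\alpha(j)}\,d_{ij} + C_{\alpha(i)\alpha(j)}\,d_{ij}^{2}\right] \tag{A9} \]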
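Expanding the square in the Gaussian exponent shows why the right-hand side of Eq. A8 is a quadratic in the pair distance. Assuming the \(-\log_{10}\) form used throughout this appendix, each site pair contributes a + b·d + c·d², with the coefficients computed below; `quadratic_coefficients` and the numeric values are illustrative only.

```python
import math


def neg_log_window(d, mu, sigma, delta):
    """-log10 of the small-tolerance pair probability 2 * delta * p*(d)."""
    density = math.exp(-((d - mu) ** 2) / (2.0 * sigma ** 2)) / (sigma * math.sqrt(2.0 * math.pi))
    return -math.log10(2.0 * delta * density)


def quadratic_coefficients(mu, sigma, delta):
    """(a, b, c) such that neg_log_window(d, ...) == a + b*d + c*d**2 for every d."""
    k = 1.0 / (2.0 * sigma ** 2 * math.log(10.0))  # from (d - mu)^2 / (2 sigma^2 ln 10)
    a = math.log10(sigma * math.sqrt(2.0 * math.pi) / (2.0 * delta)) + k * mu ** 2
    b = -2.0 * k * mu
    c = k
    return a, b, c


a, b, c = quadratic_coefficients(mu=6.0, sigma=2.0, delta=0.5)  # illustrative values
for d in (2.0, 6.0, 10.0):
    print(d, neg_log_window(d, 6.0, 2.0, 0.5), a + b * d + c * d * d)
```

Fitting a, b, and c directly, rather than μ and σ, lets the data absorb effects the Gaussian picture ignores.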
This treatment can help overcome certain deficiencies in the model, such as the assumption that the two-point probabilities are independent of each other (Eq. A3). In practice, the second-order terms in Eq. A9 add little statistically independent information to the model, and we have found a first-order approximation to be satisfactory:

\[ -\log_{10} p\left(\hbox{H}^{\langle k \rangle}\right) \approx \sum_{i<j}\left[A_{\alpha(i)\alpha(j)} + B_{\alpha(i)\alpha(j)}\,d_{ij}\right] \tag{A10} \]
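Once A and B have been fitted for each feature-type pair, the first-order estimate is a sum over all site pairs of the hypothesis. The coefficient table below is invented purely for illustration; it is not PHASE's fitted parameter set, and the function name is hypothetical.

```python
import math
from itertools import combinations


def estimated_selectivity(features, coords, params):
    """First-order estimate: sum of A + B * d_ij over all site pairs i < j.

    params maps a sorted feature-type pair such as ("A", "R") to (A, B).
    """
    total = 0.0
    for i, j in combinations(range(len(features)), 2):
        a, b = params[tuple(sorted((features[i], features[j])))]
        total += a + b * math.dist(coords[i], coords[j])
    return total


# Invented coefficients for illustration; NOT fitted PHASE values.
PARAMS = {("A", "D"): (0.30, 0.05), ("A", "R"): (0.25, 0.04), ("D", "R"): (0.35, 0.06)}
score = estimated_selectivity(
    ["A", "D", "R"],
    [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 4.0, 0.0)],
    PARAMS,
)
print(round(score, 4))  # 1.51
```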
To determine appropriate values for the A and B parameters, a training set was assembled by randomly selecting 1000 minimized structures from a conformational database of the World Drug Index [36] and then randomly choosing between two and seven pharmacophore sites from each structure. This yielded a training set of 1000 pharmacophores containing varying numbers of sites and different combinations of the features A, D, H, N, P, and R. A sample probability was computed for each pharmacophore \(\hbox{H}_{\lambda}\) by determining the number of structures \(M_{\lambda}\) out of the original 1000 that matched the pharmacophore to within a tolerance of 2.0 Å in all intersite distances:

\[ p\left(\hbox{H}_{\lambda}\right) = \frac{M_{\lambda}}{1000} \tag{A11} \]
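The sampling protocol can be sketched as follows. Representing a structure's sites as a list of labels, using Python's `random` module, and the helper names are assumptions for illustration; the matching step itself (a 2.0 Å tolerance on all intersite distances) is not reproduced here.

```python
import math
import random


def sample_pharmacophore(sites, rng, min_sites=2, max_sites=7):
    """Pick a random subset of between min_sites and max_sites sites."""
    upper = min(max_sites, len(sites))
    if upper < min_sites:
        raise ValueError("structure has too few pharmacophore sites")
    k = rng.randint(min_sites, upper)  # inclusive on both ends
    return rng.sample(sites, k)


def fit_target(num_matches, num_structures=1000):
    """-log10 of the sample probability p(H) = M / num_structures."""
    if not 0 < num_matches <= num_structures:
        raise ValueError("need 0 < num_matches <= num_structures")
    return -math.log10(num_matches / num_structures)


rng = random.Random(0)  # fixed seed so the example is repeatable
# Hypothetical site labels for one structure (feature type + index).
sites = ["A1", "A2", "D1", "H1", "H2", "R1", "R2", "P1"]
print(sample_pharmacophore(sites, rng))
print(fit_target(25))  # 25 of 1000 structures matched
```

Because each pharmacophore is taken from one of the 1000 structures, its match count should presumably be at least 1, so the logarithm stays finite.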
Since the sampled pharmacophores contained six types of features, there were 21 unique feature pairs, requiring a total of 42 adjustable parameters (an A and a B for each pair). No attempt was made to optimize all of these independently, because some pairs of features may be too sparsely represented to support independent estimates. For example, pharmacophores that contain both negative and positive ionizable features tend to be very rare among drug-like structures, so they cannot be expected to be well represented in a relatively small population sample. Therefore, parameter values were determined by applying a partial least-squares (PLS) procedure to fit the \(-\log_{10} p(\hbox{H}_{\lambda})\) values in terms of latent factors constructed from the pool of 42 variables. Details of the PLS algorithm used in PHASE are provided in Appendix B.
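If the first-order model sums A + B·d over site pairs, it is linear in 42 derived variables: for each of the 21 feature-type pairs, the number of site pairs of that type (multiplying A) and the sum of their distances (multiplying B). This reading of the 42 variables is an inference from the model form, not a description taken from PHASE; the names below are hypothetical.

```python
import math
from itertools import combinations, combinations_with_replacement

# acceptor, donor, hydrophobe, negative ionizable, positive ionizable, aromatic ring
FEATURE_TYPES = "ADHNPR"
# 6 feature types give 6 * 7 / 2 = 21 unordered pairs (same-type pairs included).
PAIR_TYPES = list(combinations_with_replacement(FEATURE_TYPES, 2))
PAIR_INDEX = {pair: n for n, pair in enumerate(PAIR_TYPES)}


def descriptor(features, coords):
    """Return [21 pair counts] + [21 summed pair distances] for one pharmacophore."""
    counts = [0.0] * len(PAIR_TYPES)
    dist_sums = [0.0] * len(PAIR_TYPES)
    for i, j in combinations(range(len(features)), 2):
        n = PAIR_INDEX[tuple(sorted((features[i], features[j])))]
        counts[n] += 1.0
        dist_sums[n] += math.dist(coords[i], coords[j])
    return counts + dist_sums


v = descriptor(["A", "D", "R"], [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 4.0, 0.0)])
print(len(PAIR_TYPES), len(v))  # 21 42
```

A dot product of this vector with the stacked A and B coefficients reproduces the first-order estimate, which is what makes a linear method such as PLS applicable.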
To select an appropriate number of PLS factors, predictions were made for a test set of 500 pharmacophores drawn from the same sample population of 1000 WDI structures. As successively more PLS factors were incorporated into the model, test set errors trended downward until reaching a minimum at 23 factors. At this point, the test set RMSE was 0.372 log units and Q² was 0.786, compared with a training set RMSE of 0.343 log units and R² of 0.826. This model has been integrated into PHASE for computation of the Selectivity_Score term that appears in Eq. 7.
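For reference, the two test-set statistics quoted above can be computed as follows. Q² definitions vary in the literature; this sketch takes the total sum of squares about the mean of the observed test values, which may differ from the convention behind the reported value of 0.786. The sample numbers are invented.

```python
import math


def rmse(observed, predicted):
    """Root-mean-square error (log10 units when applied to -log10 p values)."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))


def q_squared(observed, predicted):
    """1 - SS_residual / SS_total, with SS_total about the observed test-set mean."""
    mean = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


obs = [1.0, 2.0, 3.0, 4.0]
pred = [1.1, 1.9, 3.2, 3.8]
print(rmse(obs, pred), q_squared(obs, pred))
```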
It is worth noting that training sets containing as many as 5000 structures were also investigated, and no significant improvement in the test set predictions was observed. The protocol of using 1000 structures was adopted because it is far less computationally demanding, and therefore represents a practical approach for users who wish to calibrate selectivity models based on a different set of structures.
Appendix B: partial least-squares regression
PHASE uses a standard recursive procedure for extracting orthogonal latent factors from a data matrix in a predetermined number of steps. This distinguishes it from the NIPALS algorithm [37, 38], an iterative approach that relies on a user-defined stopping criterion and offers no absolute control over the total number of steps.
Let \({\bf X}\in\mathbb{R}^{n\times m}\) represent the independent-variable data matrix for a training set of n observations and a pool of m variables. Let \({\bf y}\in\mathbb{R}^{n\times 1}\) represent the training set dependent data, which will be estimated using latent factors extracted from \({\bf X}\). Creation of the PLS regression model proceeds as follows:
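Center each column of \({\bf X}\):

\[ \bar{x}_{j} = \frac{1}{n}\sum_{i=1}^{n} X_{ij}, \qquad X_{ij} \leftarrow X_{ij} - \bar{x}_{j}, \qquad j = 1, \ldots, m \]

Center \({\bf y}\):

\[ \bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_{i}, \qquad y_{i} \leftarrow y_{i} - \bar{y} \]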
Determine PLS factors and regression coefficients for up to M PLS factors (M ≤ m):
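A single step of one common PLS1 recursion (weights proportional to \({\bf X}^{\rm T}{\bf y}\), scores, loadings, then deflation of \({\bf X}\) and \({\bf y}\)) is sketched below in plain Python. This is a standard formulation offered as an assumption; the exact PHASE steps are not reproduced in this text, and the function names are hypothetical. The final check confirms that successive score vectors are orthogonal.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def center(X, y):
    """Center each column of X (list of rows) and the vector y."""
    n, m = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(m)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(m)] for row in X]
    yc = [v - y_mean for v in y]
    return Xc, yc


def pls1_step(X, y):
    """Extract one PLS factor from centered X and y, then deflate both."""
    n, m = len(X), len(X[0])
    w = [sum(X[i][j] * y[i] for i in range(n)) for j in range(m)]  # X^T y
    norm = math.sqrt(dot(w, w))
    if norm == 0.0:
        raise ValueError("y has no remaining covariance with X")
    w = [wj / norm for wj in w]                                    # unit weight vector
    t = [dot(row, w) for row in X]                                  # scores t = X w
    tt = dot(t, t)
    p = [sum(X[i][j] * t[i] for i in range(n)) / tt for j in range(m)]  # loadings X^T t / t^T t
    c = dot(y, t) / tt                                              # inner regression coefficient
    X_next = [[X[i][j] - t[i] * p[j] for j in range(m)] for i in range(n)]
    y_next = [y[i] - c * t[i] for i in range(n)]
    return w, t, p, c, X_next, y_next


X = [[1.0, 2.0], [2.0, 1.0], [3.0, 5.0], [4.0, 3.0], [5.0, 7.0]]
y = [1.0, 4.0, 2.0, 6.0, 4.0]
Xc, yc = center(X, y)
w1, t1, p1, c1, X1, y1 = pls1_step(Xc, yc)
w2, t2, p2, c2, X2, y2 = pls1_step(X1, y1)
print(abs(dot(t1, t2)) < 1e-9)  # successive scores are orthogonal
```

Because the number of steps is fixed in advance, this recursion runs exactly M times, in line with the contrast drawn above with NIPALS.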
For a regression with M PLS factors, the estimates \({\hat{{\bf y}}}\) are then given by:
To apply the M-factor PLS model to a new set of \({\tilde{n}}\) observations with data matrix \({\tilde{{\bf X}}}\) , the regression coefficients b must first be translated back to the space of the original X variables:
Define
The coefficients \({\bf b}_{x}\) may then be used to make estimates for the new observations as follows:
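Putting the pieces together, the sketch below fits an M-factor PLS1 model and translates the inner coefficients back to coefficients \({\bf b}_{x}\) on the original variables. It relies on the deflation identity \({\bf X}_{a+1} = {\bf X}_{a}({\bf I} - {\bf w}_{a}{\bf p}_{a}^{\rm T})\), so each score vector equals the centered original data times \({\bf r}_{a} = ({\bf I} - {\bf w}_{1}{\bf p}_{1}^{\rm T})\cdots({\bf I} - {\bf w}_{a-1}{\bf p}_{a-1}^{\rm T})\,{\bf w}_{a}\). This is a standard route to the coefficients, assumed here rather than taken from PHASE; `fit_pls1` and `predict` are hypothetical names.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def fit_pls1(X, y, n_factors):
    """Fit PLS1 with a fixed number of factors; return (x_mean, y_mean, b_x)."""
    n, m = len(X), len(X[0])
    if not 1 <= n_factors <= m:
        raise ValueError("need 1 <= n_factors <= number of variables")
    x_mean = [sum(row[j] for row in X) / n for j in range(m)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(m)] for row in X]
    yc = [v - y_mean for v in y]
    W, P, C = [], [], []
    for _ in range(n_factors):
        w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(m)]
        norm = math.sqrt(dot(w, w))
        if norm == 0.0:
            raise ValueError("no remaining covariance with y; use fewer factors")
        w = [wj / norm for wj in w]
        t = [dot(row, w) for row in Xc]
        tt = dot(t, t)
        p = [sum(Xc[i][j] * t[i] for i in range(n)) / tt for j in range(m)]
        c = dot(yc, t) / tt
        Xc = [[Xc[i][j] - t[i] * p[j] for j in range(m)] for i in range(n)]  # deflate X
        yc = [yc[i] - c * t[i] for i in range(n)]                            # deflate y
        W.append(w)
        P.append(p)
        C.append(c)
    # Back-translate: r_a applies (I - w_b p_b^T) for b = a-1 down to 0, then b_x = sum c_a r_a.
    b_x = [0.0] * m
    for a in range(n_factors):
        r = list(W[a])
        for b in range(a - 1, -1, -1):
            s = dot(P[b], r)
            r = [r[j] - W[b][j] * s for j in range(m)]
        b_x = [b_x[j] + C[a] * r[j] for j in range(m)]
    return x_mean, y_mean, b_x


def predict(model, X_new):
    """Estimate y for new rows: y_mean + (x - x_mean) . b_x."""
    x_mean, y_mean, b_x = model
    return [y_mean + dot([v - mu for v, mu in zip(row, x_mean)], b_x) for row in X_new]


X = [[1.0, 2.0], [2.0, 1.0], [3.0, 5.0], [4.0, 3.0], [5.0, 7.0]]
y = [1.0, 4.0, 2.0, 6.0, 4.0]  # exactly 1 + 2*x1 - x2
model = fit_pls1(X, y, n_factors=2)
print(model[2])                      # close to [2.0, -1.0]
print(predict(model, [[0.0, 0.0]]))  # close to [1.0]
```

With as many factors as variables, PLS1 reproduces ordinary least squares on this noise-free example, so \({\bf b}_{x}\) recovers the generating coefficients; with fewer factors the fit is deliberately shrunk.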
Cite this article
Dixon, S.L., Smondyrev, A.M., Knoll, E.H. et al. PHASE: a new engine for pharmacophore perception, 3D QSAR model development, and 3D database screening: 1. Methodology and preliminary results. J Comput Aided Mol Des 20, 647–671 (2006). https://doi.org/10.1007/s10822-006-9087-6
DOI: https://doi.org/10.1007/s10822-006-9087-6