Abstract
Many properties of a compound depend directly on the dissociation constants of its acidic and basic groups. Significant effort has been invested in computational models to predict these constants. For linear regression models, compounds are often divided into chemically motivated classes, with a separate model for each class. However, sometimes too few measurements are available for a class to build a reasonable model, e.g., when investigating a new compound series. We show that, if data for related classes are available, multi-task learning can improve predictions by utilizing data from these other classes. We investigate the performance of linear Gaussian process regression models (single-task, pooling, and multi-task models) in the low sample size regime, using a published data set (n = 698, mostly monoprotic, in aqueous solution) divided beforehand into 15 classes. A multi-task regression model using the intrinsic model of co-regionalization and incomplete Cholesky decomposition performed best in 85 % of all experiments. The presented approach can be applied to estimate other molecular properties where few measurements are available.
Notes
As opposed to parametric approaches, where the information from the training data is summarized in the parameters of a distribution, non-parametric approaches require the training data for later predictions. This distinction does not prevent non-parametric approaches from having parameters; here, these are the regression weights \(\varvec{\alpha}\) and the hyper-parameters \(\varvec{\theta}\). The parameters \(\varvec{\alpha}\), which belong directly to the model itself, are computed from the data by solving an optimization problem. The hyper-parameters \(\varvec{\theta}\) parameterize the kernel and can be estimated via gradient-based optimization by maximizing the marginal likelihood.
Predictions are technically equivalent to those of kernel ridge regression [28], a regularized form of ordinary regression. Here, we do not use additional features of GPs such as the predictive variance. However, the GP MTL methods used do make use of Bayesian aspects of GPs.
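The relation between the regression weights, the marginal likelihood, and kernel ridge regression can be illustrated with a minimal sketch. It assumes a linear kernel and a fixed noise variance `noise_var`; the function names are hypothetical and are not taken from the article's implementation.

```python
import numpy as np

def linear_kernel(X, Z):
    """Linear kernel: matrix of inner products between rows of X and Z."""
    return X @ Z.T

def fit_alpha(K, y, noise_var):
    """Regression weights alpha = (K + noise_var * I)^{-1} y via Cholesky."""
    n = K.shape[0]
    L = np.linalg.cholesky(K + noise_var * np.eye(n))
    return np.linalg.solve(L.T, np.linalg.solve(L, y))

def predict(K_star, alpha):
    """Predictive mean; K_star holds kernel values between test and training samples."""
    return K_star @ alpha

def log_marginal_likelihood(K, y, noise_var):
    """Log marginal likelihood of a zero-mean GP with Gaussian noise."""
    n = K.shape[0]
    L = np.linalg.cholesky(K + noise_var * np.eye(n))
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    # 0.5 * log det(K + noise_var * I) equals the sum of log-diagonal entries of L
    return -0.5 * y @ alpha - np.sum(np.log(np.diag(L))) - 0.5 * n * np.log(2.0 * np.pi)
```

With a linear kernel, the predictive mean coincides with ordinary ridge regression using regularization strength `noise_var`, which is the equivalence the note refers to. In practice the hyper-parameters would be chosen by maximizing `log_marginal_likelihood`, for example with a gradient-based optimizer.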
Technically, \({\mathbf{K}^\mathbf{t} \otimes \mathbf{K}^\mathbf{x} \in {\mathbb{R}}^{MN \times MN}}\). In our setting, each sample (compound) occurs in one task only. After removing (marginalizing out) rows and columns corresponding to combinations of compounds and tasks that do not occur, the resulting matrix is N × N. In practice, it is not necessary to construct the MN × MN matrix explicitly.
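The reduction from the MN × MN Kronecker product to an N × N matrix can be sketched as follows. The sketch assumes that each compound `i` carries a single task index `task_of[i]`; the names `icm_kernel`, `icm_kernel_explicit`, and `task_of` are hypothetical. The first function avoids the Kronecker product entirely, and the second builds it explicitly only to show that both give the same result.

```python
import numpy as np

def icm_kernel(K_task, K_x, task_of):
    """N x N multi-task kernel: K[i, j] = K_task[t_i, t_j] * K_x[i, j]."""
    task_of = np.asarray(task_of)
    return K_task[np.ix_(task_of, task_of)] * K_x

def icm_kernel_explicit(K_task, K_x, task_of):
    """Same matrix, obtained by selecting rows and columns of the MN x MN product."""
    n = K_x.shape[0]
    full = np.kron(K_task, K_x)  # block (t, s) equals K_task[t, s] * K_x
    idx = np.asarray(task_of) * n + np.arange(n)  # position of pair (t_i, i)
    return full[np.ix_(idx, idx)]
```

The element-wise form needs only O(N²) memory, whereas the explicit form needs O(M²N²), which is why the full product is not constructed in practice.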
Task similarity matrices are positive definite. Their entries thus correspond to evaluations of an inner product in some Hilbert space, which can be converted to Euclidean distances by using \(\|\mathbf{x}-\mathbf{z}\|_{2}^{2} = \sum_{i=1}^d |x_i-z_i|^2 = \langle \mathbf{x}-\mathbf{z}, \mathbf{x}-\mathbf{z} \rangle = \langle \mathbf{x},\mathbf{x} \rangle - 2 \langle \mathbf{x},\mathbf{z} \rangle + \langle \mathbf{z},\mathbf{z} \rangle\).
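Applied to a whole similarity matrix, this identity gives all pairwise distances at once. The following is a minimal sketch; the function name `kernel_to_distance` is hypothetical, and small negative values caused by rounding are clipped to zero before taking the square root.

```python
import numpy as np

def kernel_to_distance(K):
    """Pairwise Euclidean distances implied by a positive (semi-)definite matrix K.

    Uses d(i, j)^2 = K[i, i] - 2 * K[i, j] + K[j, j].
    """
    diag = np.diag(K)
    squared = diag[:, None] - 2.0 * K + diag[None, :]
    return np.sqrt(np.maximum(squared, 0.0))  # clip rounding noise below zero
```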
Comparison is based on Table S2 of the supplement of Ref. [20], using column R’ and third lines from each row of the common tasks.
References
Rupp M, Körner R, Tetko IV (2010) Predicting the pKa of small molecules. Comb Chem High Throughput Screen 14(5):307–327
Lee A, Crippen G (2009) Predicting pKa. J Chem Inf Model 49(9):2013–2033
Fraczkiewicz R (2006) In silico prediction of ionization. In: Testa B, Waterbeemd H (eds) Comprehensive medicinal chemistry II, vol 5, Elsevier, Oxford, pp 603–626
Wan H, Ulander J (2006) High-throughput pKa screening and prediction amenable for ADME profiling. Expert Opin Drug Metab Toxicol 2(1):139–155
Ho J, Coote M (2010) A universal approach for continuum solvent pKa calculations: are we there yet? Theor Chim Acta 125(1–2):3–21
Tehan B, Lloyd E, Wong M, Pitt W, Gancia E, Manallack D (2002) Estimation of pKa using semiempirical molecular orbital methods. Part 2: application to amines, anilines and various nitrogen containing heterocyclic compounds. Quant Struct Act Rel 21(5):473–485
Caruana R (1997) Multi-task learning. Mach Learn 28:41–75
Jacob L, Vert JP (2008) Protein-ligand interaction prediction: an improved chemogenomics approach. Bioinformatics 24(19):2149–2156
Varnek A, Gaudin C, Marcou G, Baskin I, Pandey A, Tetko I (2009) Inductive transfer of knowledge: application of multi-task learning and feature net approaches to model tissue-air partition coefficients. J Chem Inf Model 49(1):133–144
Ning X, Rangwala H, Karypis G (2009) Multi-assay-based structure-activity relationship models: improving structure-activity relationship models by incorporating activity information from related targets. J Chem Inf Model 49(11):2444–2456
Mordelet F, Vert JP (2011) ProDiGe: PRioritization of disease genes with multitask machine learning from positive and unlabeled examples. BMC Bioinf 12:389
Rossotti F, Rossotti H (1961) The determination of stability constants and other equilibrium constants in solution. McGraw-Hill, New York
Hasselbalch KA (1916) Die Berechnung der Wasserstoffzahl des Blutes aus der freien und gebundenen Kohlensäure desselben, und die Sauerstoffbindung des Blutes als Funktion der Wasserstoffzahl. Biochem Z 78:112–144
Clark J, Perrin D (1964) Prediction of the strength of organic bases. Q Rev Chem Soc 18:295–320
Perrin DD, Dempsey B, Serjeant EP (1981) pKa Prediction for organic acids and bases. Chapman and Hall/CRC Press, Boca Raton
Lyman W, Reehl W, Rosenblatt D (eds) (1982) Handbook of chemical property estimation methods: environmental behavior of organic compounds. McGraw-Hill, New York
Livingstone D (2003) Theoretical property predictions. Curr Top Med Chem 3(10):1171–1192
Hammett L (1937) The effect of structure upon the reactions of organic compounds. Benzene derivatives. J Am Chem Soc 59(1):96–103
Ertl P (1997) Simple quantum chemical parameters as an alternative to the Hammett sigma constants in QSAR studies. Quant Struct Act Rel 16(5):377–382
Rupp M, Körner R, Tetko IV (2010) Estimation of acid dissociation constants using graph kernels. Mol Inf 29(10):731–740
Tehan B, Lloyd E, Wong M, Pitt W, Montana J, Manallack D, Gancia E (2002) Estimation of pKa using semiempirical molecular orbital methods. Part 1: application to phenols and carboxylic acids. Quant Struct Act Rel 21(5):457–472
Howard P, Meylan W (1999) Physical/chemical property database (PHYSPROP). Syracuse Research Corporation, Environmental Science Center, 6225 Running Ridge Road, North Syracuse, New York
Fukui K, Yonezawa T, Nagata C (1954) Theory of substitution in conjugated molecules. Bull Chem Soc Jpn 27(7):423–427
Sadowski J, Gasteiger J (1993) From atoms and bonds to three-dimensional atomic coordinates: automatic model builders. Chem Rev 93(7):2567–2581
Stewart J (1997) MOPAC: a general molecular orbital package. Quant Chem Prog Exch 10:86
Sushko I, Novotarskyi S, Körner R, Pandey AK, Rupp M, Teetz W, Brandmaier S, Abdelaziz A, Prokopenko VV, Tanchuk VY, Todeschini R, Varnek A, Marcou G, Ertl P, Potemkin V, Grishina M, Gasteiger J, Schwab C, Baskin II, Palyulin VA, Radchenko EV, Welsh WJ, Kholodovych V, Chekmarev D, Cherkasov A, de Sousa JA, Zhang QY, Bender A, Nigsch F, Patiny L, Williams A, Tkachenko V, Tetko IV (2011) Online chemical modeling environment (OCHEM): web platform for data storage, model development and publishing of chemical information. J Comput Aided Mol Des 25(6):533–554
Rasmussen CE, Williams CK (2005) Gaussian processes for machine learning. MIT Press, Cambridge
Hastie T, Tibshirani R, Friedman J (2009) The elements of statistical learning. Data mining, inference, and prediction, 2nd edn. Springer, New York
Cressie NA (1993) Statistics for spatial data. Wiley, New York
Bonilla E, Chai KM, Williams C (2008) Multi-task Gaussian process prediction. In: Platt J, Koller D, Singer Y, Roweis S (eds) Advances in neural information processing systems 20. MIT Press, Cambridge, pp 153–160
Rebonato R, Jäckel P (1999) The most general methodology for creating a valid correlation matrix for risk management and option pricing purposes. J Risk 2(2):17–27
Skolidis G, Sanguinetti G (2011) Bayesian multitask classification with Gaussian process priors. IEEE Trans Neural Netw 22(12):2011–2021
Wilcoxon F (1945) Individual comparisons by ranking methods. Biometr Bull 1(6):80–83
Manallack D (2007) The pKa distribution of drugs: application to drug discovery. Perspect Med Chem 1:25–38
Liao C, Nicklaus M (2009) Comparison of nine programs predicting pKa values of pharmaceutical substances. J Chem Inf Model 49(12):2801–2812
Acknowledgments
The authors thank Klaus-Robert Müller, Gisbert Schneider, Tiago Rodrigues, and an anonymous referee for helpful suggestions, and David Manallack for the provision of data. M. Rupp and K. Hansen acknowledge partial support by FP7-ICT programme of the European Community (PASCAL2) and DFG (grant MU 987/4-2). M. Rupp acknowledges partial support by FP7 programme of the European Community (Marie Curie IEF 273039). G. Sanguinetti and G. Skolidis acknowledge support from the Engineering and Physical Sciences Research Council (EPSRC, grant EP/F009461/2). G. Sanguinetti is funded by the Scottish government through the SICSA initiative.
Cite this article
Skolidis, G., Hansen, K., Sanguinetti, G. et al. Multi-task learning for pKa prediction. J Comput Aided Mol Des 26, 883–895 (2012). https://doi.org/10.1007/s10822-012-9582-x
DOI: https://doi.org/10.1007/s10822-012-9582-x