Multi-task learning for pK_a prediction

Grigorios Skolidis¹,
Katja Hansen²,⁴,
Guido Sanguinetti³ &
…
Matthias Rupp⁴,⁵

Abstract

Many properties of a compound depend directly on the dissociation constants of its acidic and basic groups. Significant effort has been invested in computational models to predict these constants. For linear regression models, compounds are often divided into chemically motivated classes, with a separate model for each class. However, sometimes too few measurements are available for a class to build a reasonable model, e.g., when investigating a new compound series. We show that, if data for related classes are available, multi-task learning can improve predictions by utilizing data from these other classes. We investigate the performance of linear Gaussian process regression models (single-task, pooling, and multi-task models) in the low sample size regime, using a published data set (n = 698, mostly monoprotic, in aqueous solution) divided beforehand into 15 classes. A multi-task regression model using the intrinsic model of co-regionalization and incomplete Cholesky decomposition performed best in 85% of all experiments. The presented approach can be applied to estimate other molecular properties for which few measurements are available.

Notes

As opposed to parametric approaches, where the information from the training data is summarized in the parameters of a distribution, non-parametric approaches require the training data for later predictions. This distinction does not prevent non-parametric approaches from having parameters, here the regression weights \(\varvec{\alpha}\) and the hyper-parameters \(\varvec{\theta}\). The parameters \(\varvec{\alpha}\), which belong directly to the model itself, are computed from the data by solving an optimization problem. The hyper-parameters \(\varvec{\theta}\) parameterize the kernel and can be estimated by gradient-based maximization of the marginal likelihood.
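For orientation, the objective mentioned in this note can be written out in its standard textbook form for a zero-mean Gaussian process with Gaussian observation noise. The symbols \(\mathbf{K}_{\varvec{\theta}}\) (kernel matrix over the N training compounds), \(\sigma^2\) (noise variance), and \(\mathbf{y}\) (measured values) are notation assumed here for illustration; the note itself does not spell them out.

```latex
\log p(\mathbf{y} \mid \varvec{\theta})
  = -\tfrac{1}{2}\, \mathbf{y}^{\top} (\mathbf{K}_{\varvec{\theta}} + \sigma^2 \mathbf{I})^{-1} \mathbf{y}
    -\tfrac{1}{2} \log \det (\mathbf{K}_{\varvec{\theta}} + \sigma^2 \mathbf{I})
    -\tfrac{N}{2} \log 2\pi ,
\qquad
\varvec{\alpha} = (\mathbf{K}_{\varvec{\theta}} + \sigma^2 \mathbf{I})^{-1} \mathbf{y}
```

Under these assumptions, \(\varvec{\theta}\) would be chosen to maximize the first expression, and \(\varvec{\alpha}\) then follows from a single linear solve.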
Predictions are technically equivalent to those of kernel ridge regression [28], a regularized form of ordinary regression. Here, we do not use additional features of GPs such as the predictive variance. However, the GP MTL methods used here do make use of Bayesian aspects of GPs.
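The equivalence stated in this note can be checked numerically for a linear kernel. The following is a minimal sketch on synthetic data, not the authors' implementation: the function names, the toy data, and the value of the regularization strength (taken equal to the assumed noise variance) are all hypothetical.

```python
import numpy as np

def linear_kernel(A, B):
    """Linear kernel k(x, z) = <x, z> between the rows of A and B."""
    return A @ B.T

def gp_mean_predict(X_train, y_train, X_test, noise_var):
    """GP posterior mean: k(X*, X) (K + noise_var * I)^{-1} y."""
    K = linear_kernel(X_train, X_train)
    alpha = np.linalg.solve(K + noise_var * np.eye(len(y_train)), y_train)
    return linear_kernel(X_test, X_train) @ alpha

def ridge_predict(X_train, y_train, X_test, lam):
    """Primal ridge regression without intercept: w = (X^T X + lam I)^{-1} X^T y."""
    d = X_train.shape[1]
    w = np.linalg.solve(X_train.T @ X_train + lam * np.eye(d), X_train.T @ y_train)
    return X_test @ w

# Synthetic stand-in data (assumption): 20 training and 5 test points, 3 descriptors.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(20, 3))
y_train = X_train @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=20)
X_test = rng.normal(size=(5, 3))

gp_pred = gp_mean_predict(X_train, y_train, X_test, noise_var=0.1)
ridge_pred = ridge_predict(X_train, y_train, X_test, lam=0.1)
print(np.allclose(gp_pred, ridge_pred))
```

The two predictors agree because \(\mathbf{X}_*(\mathbf{X}^\top\mathbf{X} + \lambda\mathbf{I})^{-1}\mathbf{X}^\top\mathbf{y} = \mathbf{X}_*\mathbf{X}^\top(\mathbf{X}\mathbf{X}^\top + \lambda\mathbf{I})^{-1}\mathbf{y}\); only the size of the linear system differs.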
Technically, \({\mathbf{K}^\mathbf{t} \otimes \mathbf{K}^\mathbf{x} \in {\mathbb{R}}^{MN \times MN}}\). In our setting, each sample (compound) occurs in one task only. After removing (marginalizing out) the rows and columns corresponding to combinations of compounds and tasks that do not occur, the resulting matrix is N × N. In practice, it is not necessary to construct the MN × MN matrix explicitly.
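The reduction from MN × MN to N × N can be illustrated with a small sketch. The task assignment, the matrix sizes, and the random matrices standing in for \(\mathbf{K}^\mathbf{t}\) and \(\mathbf{K}^\mathbf{x}\) are hypothetical; the code only demonstrates that selecting the occurring (task, compound) pairs from the Kronecker product gives the same result as an element-wise product that never forms the large matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
M, N = 3, 6                          # hypothetical: 3 tasks, 6 compounds
task = np.array([0, 0, 1, 2, 2, 1])  # task index of each compound (one task each)

# Toy positive semi-definite matrices standing in for the task and compound kernels.
A = rng.normal(size=(M, M))
K_t = A @ A.T
X = rng.normal(size=(N, 4))
K_x = X @ X.T

# Explicit route: build the MN x MN Kronecker product, then keep only the
# rows and columns of the (task, compound) pairs that actually occur.
K_full = np.kron(K_t, K_x)
idx = task * N + np.arange(N)        # position of pair (task[i], i) in K_full
K_explicit = K_full[np.ix_(idx, idx)]

# Direct route: entry (i, j) is K_t[task[i], task[j]] * K_x[i, j].
K_direct = K_t[np.ix_(task, task)] * K_x

print(K_full.shape, K_explicit.shape, np.allclose(K_explicit, K_direct))
```

The direct route needs only N × N storage, which is why the full Kronecker product does not have to be constructed.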
Task similarity matrices are positive definite. Their entries thus correspond to evaluations of an inner product in some Hilbert space, which can be converted to Euclidean distance by using \(\|\mathbf{x}-\mathbf{z}\|_{2}^{2} = \sum_{i=1}^d |x_i-z_i|^2 = \langle \mathbf{x}-\mathbf{z}, \mathbf{x}-\mathbf{z} \rangle = \langle \mathbf{x},\mathbf{x} \rangle - 2 \langle \mathbf{x},\mathbf{z} \rangle + \langle \mathbf{z},\mathbf{z} \rangle\).
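The conversion in this note can be sketched in a few lines. The helper name `kernel_to_sq_dist` and the random vectors are hypothetical; explicit vectors are used here only so that the result can be compared against directly computed distances, whereas in the setting of the note only the similarity matrix would be available.

```python
import numpy as np

def kernel_to_sq_dist(K):
    """Squared distances d^2(i, j) = K[i, i] - 2 K[i, j] + K[j, j]."""
    diag = np.diag(K)
    D2 = diag[:, None] - 2.0 * K + diag[None, :]
    return np.maximum(D2, 0.0)  # clip tiny negative values caused by round-off

rng = np.random.default_rng(0)
V = rng.normal(size=(4, 3))   # hypothetical vectors, one row per task
K = V @ V.T                   # Gram matrix of inner products
D2 = kernel_to_sq_dist(K)

# Reference: squared Euclidean distances computed directly from the vectors.
D2_ref = ((V[:, None, :] - V[None, :, :]) ** 2).sum(axis=-1)
print(np.allclose(D2, D2_ref))
```

Taking the element-wise square root of `D2` gives the distances themselves.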
Comparison is based on Table S2 of the supplement of Ref. [20], using column R’ and third lines from each row of the common tasks.

References

1. Rupp M, Körner R, Tetko IV (2010) Predicting the pK_a of small molecules. Comb Chem High Throughput Screen 14(5):307–327
2. Lee A, Crippen G (2009) Predicting pK_a. J Chem Inf Model 49(9):2013–2033
3. Fraczkiewicz R (2006) In silico prediction of ionization. In: Testa B, Waterbeemd H (eds) Comprehensive medicinal chemistry II, vol 5, Elsevier, Oxford, pp 603–626
4. Wan H, Ulander J (2006) High-throughput pK_a screening and prediction amenable for ADME profiling. Expert Opin Drug Metab Toxicol 2(1):139–155
5. Ho J, Coote M (2010) A universal approach for continuum solvent pK_a calculations: are we there yet? Theor Chim Acta 125(1–2):3–21
6. Tehan B, Lloyd E, Wong M, Pitt W, Gancia E, Manallack D (2002) Estimation of pK_a using semiempirical molecular orbital methods. Part 2: application to amines, anilines and various nitrogen containing heterocyclic compounds. Quant Struct Act Rel 21(5):473–485
7. Caruana R (1997) Multi-task learning. Mach Learn 28:41–75
8. Jacob L, Vert JP (2008) Protein-ligand interaction prediction: an improved chemogenomics approach. Bioinformatics 24(19):2149–2156
9. Varnek A, Gaudin C, Marcou G, Baskin I, Pandey A, Tetko I (2009) Inductive transfer of knowledge: application of multi-task learning and feature net approaches to model tissue-air partition coefficients. J Chem Inf Model 49(1):133–144
10. Ning X, Rangwala H, Karypis G (2009) Multi-assay-based structure-activity relationship models: improving structure-activity relationship models by incorporating activity information from related targets. J Chem Inf Model 49(11):2444–2456
11. Mordelet F, Vert JP (2011) ProDiGe: PRioritization of disease genes with multitask machine learning from positive and unlabeled examples. BMC Bioinf 12:389
12. Rossotti F, Rossotti H (1961) The determination of stability constants and other equilibrium constants in solution. McGraw-Hill, New York
13. Hasselbalch KA (1916) Die Berechnung der Wasserstoffzahl des Blutes aus der freien und gebundenen Kohlensäure desselben, und die Sauerstoffbindung des Blutes als Funktion der Wasserstoffzahl. Biochem Z 78:112–144
14. Clark J, Perrin D (1964) Prediction of the strength of organic bases. Q Rev Chem Soc 18:295–320
15. Perrin DD, Dempsey B, Serjeant EP (1981) pK_a prediction for organic acids and bases. Chapman and Hall/CRC Press, Boca Raton
16. Lyman W, Reehl W, Rosenblatt D (eds) (1982) Handbook of chemical property estimation methods: environmental behavior of organic compounds. McGraw-Hill, New York
17. Livingstone D (2003) Theoretical property predictions. Curr Top Med Chem 3(10):1171–1192
18. Hammett L (1937) The effect of structure upon the reactions of organic compounds. Benzene derivatives. J Am Chem Soc 59(1):96–103
19. Ertl P (1997) Simple quantum chemical parameters as an alternative to the Hammett sigma constants in QSAR studies. Quant Struct Act Rel 16(5):377–382
20. Rupp M, Körner R, Tetko IV (2010) Estimation of acid dissociation constants using graph kernels. Mol Inf 29(10):731–740
21. Tehan B, Lloyd E, Wong M, Pitt W, Montana J, Manallack D, Gancia E (2002) Estimation of pK_a using semiempirical molecular orbital methods. Part 1: application to phenols and carboxylic acids. Quant Struct Act Rel 21(5):457–472
22. Howard P, Meylan W (1999) Physical/chemical property database (PHYSPROP). Syracuse Research Corporation, Environmental Science Center, 6225 Running Ridge Road, North Syracuse, New York
23. Fukui K, Yonezawa T, Nagata C (1954) Theory of substitution in conjugated molecules. Bull Chem Soc Jpn 27(7):423–427
24. Sadowski J, Gasteiger J (1993) From atoms and bonds to three-dimensional atomic coordinates: automatic model builders. Chem Rev 93(7):2567–2581
25. Stewart J (1997) MOPAC: a general molecular orbital package. Quant Chem Prog Exch 10:86
26. Sushko I, Novotarskyi S, Körner R, Pandey AK, Rupp M, Teetz W, Brandmaier S, Abdelaziz A, Prokopenko VV, Tanchuk VY, Todeschini R, Varnek A, Marcou G, Ertl P, Potemkin V, Grishina M, Gasteiger J, Schwab C, Baskin II, Palyulin VA, Radchenko EV, Welsh WJ, Kholodovych V, Chekmarev D, Cherkasov A, de Sousa JA, Zhang QY, Bender A, Nigsch F, Patiny L, Williams A, Tkachenko V, Tetko IV (2011) Online chemical modeling environment (OCHEM): web platform for data storage, model development and publishing of chemical information. J Comput Aided Mol Des 25(6):533–554
27. Rasmussen CE, Williams CK (2005) Gaussian processes for machine learning. MIT Press, Cambridge
28. Hastie T, Tibshirani R, Friedman J (2009) The elements of statistical learning. Data mining, inference, and prediction, 2nd edn. Springer, New York
29. Cressie NA (1993) Statistics for spatial data. Wiley, New York
30. Bonilla E, Chai KM, Williams C (2008) Multi-task Gaussian process prediction. In: Platt J, Koller D, Singer Y, Roweis S (eds) Advances in neural information processing systems 20. MIT Press, Cambridge, pp 153–160
31. Rebonato R, Jäckel P (1999) The most general methodology for creating a valid correlation matrix for risk management and option pricing purposes. J Risk 2(2):17–27
32. Skolidis G, Sanguinetti G (2011) Bayesian multitask classification with Gaussian process priors. IEEE Trans Neural Netw 22(12):2011–2021
33. Wilcoxon F (1945) Individual comparisons by ranking methods. Biometr Bull 1(6):80–83
34. Manallack D (2007) The pK_a distribution of drugs: application to drug discovery. Perspect Med Chem 1:25–38
35. Liao C, Nicklaus M (2009) Comparison of nine programs predicting pK_a values of pharmaceutical substances. J Chem Inf Model 49(12):2801–2812

Acknowledgments

The authors thank Klaus-Robert Müller, Gisbert Schneider, Tiago Rodrigues, and an anonymous referee for helpful suggestions, and David Manallack for the provision of data. M. Rupp and K. Hansen acknowledge partial support by the FP7-ICT programme of the European Community (PASCAL2) and the DFG (grant MU 987/4-2). M. Rupp acknowledges partial support by the FP7 programme of the European Community (Marie Curie IEF 273039). G. Sanguinetti and G. Skolidis acknowledge support from the Engineering and Physical Sciences Research Council (EPSRC, grant EP/F009461/2). G. Sanguinetti is funded by the Scottish government through the SICSA initiative.

Author information

Authors and Affiliations

Department of Statistical Science, University College London, Gower Street, London, WC1E 6BT, UK
Grigorios Skolidis
Theory Department, Fritz Haber Institute of the Max Planck Society, Faradayweg 4-6, 14195, Berlin, Germany
Katja Hansen
School of Informatics, University of Edinburgh, 10 Crichton Street, EH8 9AB, Edinburgh, Scotland
Guido Sanguinetti
Machine Learning Group, TU Berlin, Franklinstr. 28/29, 10587, Berlin, Germany
Katja Hansen & Matthias Rupp
Institute of Pharmaceutical Sciences, ETH Zurich, Wolfgang-Pauli-Str. 10, 8093, Zürich, Switzerland
Matthias Rupp

Corresponding author

Correspondence to Matthias Rupp.

About this article

Cite this article

Skolidis, G., Hansen, K., Sanguinetti, G. et al. Multi-task learning for pK_a prediction. J Comput Aided Mol Des 26, 883–895 (2012). https://doi.org/10.1007/s10822-012-9582-x

Received: 15 November 2011
Accepted: 11 May 2012
Published: 20 June 2012
Issue Date: July 2012
DOI: https://doi.org/10.1007/s10822-012-9582-x
