Abstract
Variable selection is applied frequently in QSAR research. Since the selection process influences the characteristics of the finally chosen model, thorough validation of the selection technique is very important. Here, a validation protocol is presented briefly, and two of the tools that form part of this protocol are introduced in more detail. The first tool, which is based on permutation testing, makes it possible to assess the inflation of internal figures of merit (such as the cross-validated prediction error). The second tool, based on noise addition, can be used to determine the complexity, and with it the stability, of models generated by variable selection. The statistical information obtained is important in deciding whether or not to trust the predictive ability of a specific model. The graphical output of the validation tools is easy to interpret and provides a reliable impression of model performance. Among other applications, the tools were employed to study the influence of leave-one-out and leave-multiple-out cross-validation on model characteristics, and it was confirmed that leave-multiple-out cross-validation yields more stable models. To study the performance of the entire validation protocol, it was applied to eight different QSAR data sets with default settings. In all cases, internal and external model performance was good, indicating that the protocol serves its purpose quite well.
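The idea behind the permutation-based tool can be illustrated with a minimal sketch. It is not the authors' implementation: the greedy forward selection, the ordinary least-squares model, the synthetic data, and all function names (`loo_press`, `forward_select`, `permutation_test`) are assumptions chosen for illustration. The key point it tries to show is that the whole variable selection is repeated for every permuted response, so the resulting distribution of cross-validated q² values reflects how much the selection step alone can inflate an internal figure of merit.

```python
import numpy as np


def loo_press(X, y):
    """Leave-one-out PRESS for ordinary least squares with an intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    H = A @ np.linalg.pinv(A)          # hat matrix
    resid = y - H @ y
    # Shortcut for OLS: LOO residual = residual / (1 - leverage)
    return float(np.sum((resid / (1.0 - np.diag(H))) ** 2))


def q2(press, y):
    """Cross-validated q^2 = 1 - PRESS / total sum of squares."""
    return 1.0 - press / float(np.sum((y - y.mean()) ** 2))


def forward_select(X, y, n_vars):
    """Greedy forward selection (illustrative stand-in for any
    variable selection method), minimizing LOO PRESS."""
    chosen = []
    for _ in range(n_vars):
        candidates = [j for j in range(X.shape[1]) if j not in chosen]
        scores = [loo_press(X[:, chosen + [j]], y) for j in candidates]
        chosen.append(candidates[int(np.argmin(scores))])
    return chosen, q2(loo_press(X[:, chosen], y), y)


def permutation_test(X, y, n_vars=3, n_perm=50, seed=0):
    """Compare q^2 on the real response with q^2 on permuted responses.
    The selection is rerun from scratch for each permutation."""
    rng = np.random.default_rng(seed)
    _, q2_real = forward_select(X, y, n_vars)
    q2_perm = np.array([
        forward_select(X, rng.permutation(y), n_vars)[1]
        for _ in range(n_perm)
    ])
    # Fraction of permuted runs that do at least as well as the real one
    p_value = (1 + int(np.sum(q2_perm >= q2_real))) / (n_perm + 1)
    return q2_real, q2_perm, p_value


# Synthetic example (assumed data): 40 objects, 30 descriptors,
# only the first two descriptors carry signal.
rng = np.random.default_rng(42)
X = rng.normal(size=(40, 30))
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + 0.3 * rng.normal(size=40)

q2_real, q2_perm, p_value = permutation_test(X, y)
print(f"q2 (real response):        {q2_real:.3f}")
print(f"q2 (permuted, mean / max): {q2_perm.mean():.3f} / {q2_perm.max():.3f}")
print(f"permutation p-value:       {p_value:.3f}")
```

If the q² obtained on the real response lies well outside the distribution from the permuted responses, the model is unlikely to be a pure artifact of the selection process. A permuted distribution that sits far above zero would indicate that the internal figure of merit is strongly inflated by the selection.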
Cite this article
Baumann, K., Stiefl, N. Validation tools for variable subset regression. J Comput Aided Mol Des 18, 549–562 (2004). https://doi.org/10.1007/s10822-004-4071-5
DOI: https://doi.org/10.1007/s10822-004-4071-5