Abstract
Inductive databases (IDBs) contain both data and patterns. Inductive queries (IQs) are used to access, generate, and manipulate the patterns in an IDB. An IQ is a conjunction of primitive constraints that the target patterns must satisfy; the available constraints can differ from one type of pattern to another. Constraint-based data mining algorithms are used to answer IQs.
So far, work in the IDB framework has focused mostly on mining frequent patterns: the pattern types considered include frequent itemsets, episodes, Datalog queries, sequences, and molecular fragments. Here we consider constraint-based mining of predictive models, where the data mining task is regression and the models are polynomial equations.
More specifically, we first define the pattern domain of polynomial equations. We then present a complete and a heuristic solver for this domain. We evaluate the use of the heuristic solver on standard regression problems and illustrate its use on a toy problem of reconstructing a biochemical reaction network. Finally, we consider the use of a combination of different pattern domains (molecular fragments and polynomial equations) for practical applications in modeling quantitative structure-activity relationships (QSARs).
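As a rough illustration of what answering an inductive query over polynomial equations can look like, the sketch below enumerates every polynomial structure in a small space and keeps those that satisfy a conjunction of three constraints. It is not the solver from the paper: the constraint names (`max_degree`, `max_terms`, `max_error`), the function names, and the toy data are all assumptions made for this example, and the exhaustive enumeration only stands in for the general idea of a complete search.

```python
from itertools import combinations, product

def monomials(n_vars, max_degree):
    """Exponent tuples of all monomials with total degree 1..max_degree."""
    return [e for e in product(range(max_degree + 1), repeat=n_vars)
            if 1 <= sum(e) <= max_degree]

def evaluate(term, row):
    """Value of one monomial (an exponent tuple) on one data row."""
    value = 1.0
    for exponent, x in zip(term, row):
        value *= x ** exponent
    return value

def solve(A, b):
    """Solve A x = b by Gaussian elimination; None if A is singular."""
    n = len(b)
    M = [A[i][:] + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            return None
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / M[i][i]
    return x

def fit(terms, X, y):
    """Least-squares fit of y = c0 + sum_i c_i * term_i via normal equations.

    Returns (coefficients, mean squared error)."""
    F = [[1.0] + [evaluate(t, row) for t in terms] for row in X]
    k = len(terms) + 1
    AtA = [[sum(f[i] * f[j] for f in F) for j in range(k)] for i in range(k)]
    Aty = [sum(f[i] * yi for f, yi in zip(F, y)) for i in range(k)]
    coeffs = solve(AtA, Aty)
    if coeffs is None:
        return None, float("inf")
    residuals = [sum(c * v for c, v in zip(coeffs, f)) - yi
                 for f, yi in zip(F, y)]
    return coeffs, sum(r * r for r in residuals) / len(y)

def answer_query(X, y, max_degree, max_terms, max_error):
    """Return every (terms, coeffs, mse) satisfying all three constraints.

    The constraints are hypothetical stand-ins for primitive constraints:
    degree <= max_degree AND #terms <= max_terms AND mse <= max_error."""
    candidates = monomials(len(X[0]), max_degree)
    answers = []
    for size in range(1, max_terms + 1):
        for terms in combinations(candidates, size):
            coeffs, mse = fit(terms, X, y)
            if mse <= max_error:
                answers.append((terms, coeffs, mse))
    return answers

# Toy data generated from y = 1 + 2 * x1 * x2 (made up for this example).
X = [(1, 1), (1, 2), (2, 1), (2, 3), (3, 2), (0, 1), (3, 3)]
y = [1 + 2 * x1 * x2 for x1, x2 in X]
answers = answer_query(X, y, max_degree=2, max_terms=1, max_error=1e-9)
for terms, coeffs, mse in answers:
    print(terms, [round(c, 6) for c in coeffs])
```

Exhaustive enumeration grows combinatorially with the number of variables and the degree bound, which is one reason a heuristic search over the same space is attractive for larger problems.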
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Džeroski, S., Todorovski, L., Ljubič, P. (2006). Inductive Queries on Polynomial Equations. In: Boulicaut, J.-F., De Raedt, L., Mannila, H. (eds) Constraint-Based Mining and Inductive Databases. Lecture Notes in Computer Science, vol 3848. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11615576_7
DOI: https://doi.org/10.1007/11615576_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-31331-1
Online ISBN: 978-3-540-31351-9
eBook Packages: Computer Science (R0)