Abstract
Unlike the on-line discretization performed by a number of machine learning (ML) algorithms that build decision trees or decision rules, we propose off-line algorithms for discretizing numerical attributes and for grouping the values of nominal attributes. The number of intervals resulting from discretization depends only on the data; the number of groups corresponds to the number of classes. Since both discretization and grouping are done with respect to the goal classes, the algorithms are suitable only for classification/prediction tasks.
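The abstract does not spell out the procedures, so the following Python block only sketches the general idea of class-driven, off-line preprocessing under simplifying assumptions: a numeric attribute is cut wherever the majority class of neighbouring sorted values changes, and each nominal value is assigned to the group of its majority class. The function names (`majority_class`, `discretize`, `group_values`, `interval_index`) and the tie-breaking rule are hypothetical; this is not the KEX procedure described in the paper.

```python
from bisect import bisect_right
from collections import Counter, defaultdict


def majority_class(labels):
    # Most frequent class; ties go to the smallest label (a hypothetical rule).
    counts = Counter(labels)
    best = max(counts.values())
    return min(c for c, n in counts.items() if n == best)


def _classes_per_value(values, classes):
    # Collect the class labels observed for each distinct attribute value.
    by_value = defaultdict(list)
    for v, c in zip(values, classes):
        by_value[v].append(c)
    return by_value


def discretize(values, classes):
    """Return cut points for a numeric attribute.

    Each distinct value is tagged with its majority class; a cut is placed
    halfway between neighbouring values whose tags differ, so the number of
    intervals (len(cuts) + 1) is determined by the data alone.
    """
    by_value = _classes_per_value(values, classes)
    tagged = [(v, majority_class(by_value[v])) for v in sorted(by_value)]
    return [
        (v_prev + v_next) / 2
        for (v_prev, c_prev), (v_next, c_next) in zip(tagged, tagged[1:])
        if c_prev != c_next
    ]


def group_values(values, classes):
    """Map each nominal value to its majority class: at most one group per class."""
    by_value = _classes_per_value(values, classes)
    return {v: majority_class(cs) for v, cs in by_value.items()}


def interval_index(value, cuts):
    # Index (0 .. len(cuts)) of the interval a value falls into.
    return bisect_right(cuts, value)


if __name__ == "__main__":
    ages = [22, 25, 30, 35, 40, 45, 50, 55]
    labels = ["no", "no", "no", "yes", "yes", "yes", "no", "no"]
    cuts = discretize(ages, labels)
    print(cuts)                     # [32.5, 47.5] -> three intervals
    print(interval_index(41, cuts))  # 1
    print(group_values(["red", "red", "blue", "green", "green"],
                       ["a", "a", "b", "b", "a"]))
    # {'red': 'a', 'blue': 'b', 'green': 'a'}
```

Because every change of majority class creates a new cut, noisy data can produce many small intervals; a practical procedure would add a frequency threshold or a minimum interval size, which this sketch deliberately omits.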
As a side effect of the off-line processing, the number of objects and the number of attributes in a dataset may be reduced.
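One plausible way such a reduction can arise is sketched below in Python, under the assumption that objects with identical discretized/grouped values and class can be merged (keeping a count), and that an attribute left with a single interval or group carries no information and can be dropped. The function `reduce_dataset` and its input layout are hypothetical illustrations, not the paper's implementation.

```python
from collections import Counter


def reduce_dataset(rows, attributes):
    """Drop attributes left with a single value, then merge identical rows.

    `rows` are tuples of already discretized/grouped attribute values followed
    by the class label; `attributes` names the attribute columns only.
    Returns the kept attribute names and a Counter mapping each distinct
    (reduced) row to the number of original objects it stands for.
    """
    keep = [i for i in range(len(attributes)) if len({r[i] for r in rows}) > 1]
    kept_names = [attributes[i] for i in keep]
    merged = Counter(tuple(r[i] for i in keep) + (r[-1],) for r in rows)
    return kept_names, merged


if __name__ == "__main__":
    rows = [
        ("low", "a", "x", "yes"),
        ("low", "a", "x", "yes"),
        ("high", "a", "y", "no"),
        ("high", "a", "y", "no"),
        ("low", "a", "y", "yes"),
    ]
    names, merged = reduce_dataset(rows, ["age", "colour", "region"])
    print(names)        # ['age', 'region']
    print(len(merged))  # 3 distinct objects instead of 5
```

Keeping a count per merged row preserves the original class frequencies, so a learner that works with frequencies still sees the same distribution after the dataset shrinks.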
It should also be mentioned that, although the discretization procedure was originally proposed for the KEX system, the algorithms also perform well together with other machine learning algorithms.
Copyright information
© 1998 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Berka, P., Bruha, I. (1998). Discretization and grouping: Preprocessing steps for data mining. In: Żytkow, J.M., Quafafou, M. (eds) Principles of Data Mining and Knowledge Discovery. PKDD 1998. Lecture Notes in Computer Science, vol 1510. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0094825
DOI: https://doi.org/10.1007/BFb0094825
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-65068-3
Online ISBN: 978-3-540-49687-8
eBook Packages: Springer Book Archive