Discretization and grouping: Preprocessing steps for data mining

Petr Berka¹ &
Ivan Bruha²

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 1510)

Included in the following conference series:

European Symposium on Principles of Data Mining and Knowledge Discovery

Abstract

Unlike on-line discretization performed by a number of machine learning (ML) algorithms for building decision trees or decision rules, we propose off-line algorithms for discretizing numerical attributes and grouping values of nominal attributes. The number of intervals produced by discretization depends only on the data; the number of groups corresponds to the number of classes. Since both discretization and grouping are done with respect to the goal classes, the algorithms are suitable only for classification/prediction tasks.
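To make the idea of class-driven, off-line preprocessing concrete, here is a minimal Python sketch. It is not the authors' algorithm, whose details the abstract does not give. It assumes a simple majority-class rule: each nominal value joins the group of its most frequent class, and a numeric cut point is placed between adjacent distinct values whose majority classes differ. The function names `majority_class`, `group_nominal`, and `discretize_numeric` are hypothetical.

```python
from collections import Counter, defaultdict


def majority_class(labels):
    """Return the most frequent label; ties go to the label seen first."""
    return Counter(labels).most_common(1)[0][0]


def _labels_by_value(values, classes):
    """Collect the class labels observed for each attribute value."""
    by_value = defaultdict(list)
    for value, label in zip(values, classes):
        by_value[value].append(label)
    return by_value


def group_nominal(values, classes):
    """Map each nominal value to a group named after its majority class.

    Assumption (not from the paper): grouping by majority class, so there
    can never be more groups than classes.
    """
    by_value = _labels_by_value(values, classes)
    return {value: majority_class(labels) for value, labels in by_value.items()}


def discretize_numeric(values, classes):
    """Return cut points for a numeric attribute.

    Assumption (not from the paper): a cut is placed halfway between two
    adjacent distinct values whose majority classes differ, so the number
    of intervals is determined by the data alone.
    """
    by_value = _labels_by_value(values, classes)
    ordered = sorted(by_value)
    cuts = []
    for left, right in zip(ordered, ordered[1:]):
        if majority_class(by_value[left]) != majority_class(by_value[right]):
            cuts.append((left + right) / 2)
    return cuts


if __name__ == "__main__":
    colours = ["red", "red", "blue", "green", "green", "green"]
    labels = ["yes", "yes", "no", "yes", "no", "no"]
    print(group_nominal(colours, labels))

    ages = [1.0, 2.0, 3.0, 7.0, 8.0, 9.0]
    age_labels = ["a", "a", "a", "b", "b", "a"]
    print(discretize_numeric(ages, age_labels))
```

Both functions run once over the whole training set before any learner sees the data, which is what distinguishes this off-line style from discretization performed inside a tree or rule learner.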

As a side effect of the off-line processing, the number of objects in the datasets and number of attributes may be reduced.
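The reduction mentioned here can be illustrated with a short Python sketch. It assumes, beyond what the abstract states, that objects count as redundant when they become identical after preprocessing, and that an attribute is dropped when it is left with a single value. The helper name `reduce_dataset` and the row layout (attribute values followed by the class label) are hypothetical.

```python
def reduce_dataset(rows):
    """Drop duplicate objects and single-valued attributes.

    rows: list of equal-length tuples (attr_1, ..., attr_n, class_label),
    already discretized or grouped.
    Returns (reduced_rows, kept_attribute_indices).
    """
    # Objects that became identical after preprocessing are kept once;
    # dict.fromkeys preserves the order of first occurrence.
    unique_rows = list(dict.fromkeys(rows))
    if not unique_rows:
        return [], []

    n_attrs = len(unique_rows[0]) - 1  # last position holds the class
    # An attribute with a single remaining value carries no information.
    kept = [
        j for j in range(n_attrs)
        if len({row[j] for row in unique_rows}) > 1
    ]
    reduced = [tuple(row[j] for j in kept) + (row[-1],) for row in unique_rows]
    # Removing attributes can create new duplicates, so deduplicate again.
    return list(dict.fromkeys(reduced)), kept


if __name__ == "__main__":
    data = [
        ("low", "x", "a"),
        ("low", "x", "a"),
        ("high", "x", "b"),
        ("low", "x", "a"),
    ]
    print(reduce_dataset(data))
```

Rows with the same attribute values but different class labels are kept as distinct objects in this sketch, since the class label is part of each tuple.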

Although the discretization procedure was originally proposed for the Kex system, the algorithms also perform well together with other machine learning algorithms.

Author information

Authors and Affiliations

Laboratory of Intelligent Systems, Prague University of Economics, W. Churchill Sq. 4, CZ-13067, Prague, Czech Republic
Petr Berka
Department of Computer Science and Systems, McMaster University, L8S4K1, Hamilton, Ont., Canada
Ivan Bruha

Editor information

Jan M. Żytkow, Mohamed Quafafou

Cite this paper

Berka, P., Bruha, I. (1998). Discretization and grouping: Preprocessing steps for data mining. In: Żytkow, J.M., Quafafou, M. (eds) Principles of Data Mining and Knowledge Discovery. PKDD 1998. Lecture Notes in Computer Science, vol 1510. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0094825

DOI: https://doi.org/10.1007/BFb0094825
Published: 19 October 2006
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-65068-3
Online ISBN: 978-3-540-49687-8
eBook Packages: Springer Book Archive
