Abstract
Emerging patterns (EPs) are itemsets whose supports change significantly from one dataset to another; they were recently proposed to capture multi-attribute contrasts between data classes, or trends over time. In this paper we propose a new classifier, CAEP, using the following main ideas based on EPs: (i) Each EP can sharply differentiate the class membership of a (possibly small) fraction of instances containing the EP, due to the large difference between its supports in the opposing classes; we define the differentiating power of the EP in terms of the supports and their ratio, on instances containing the EP. (ii) For each instance t, by aggregating the differentiating power of a fixed, automatically selected set of EPs, a score is obtained for each class. The scores for all classes are normalized and the largest score determines t’s class. CAEP is suitable for many applications, even those with large volumes of high-dimensional (e.g. 45-dimensional) data; it does not depend on dimension reduction of the data; and it is usually equally accurate on all classes even if their populations are unbalanced. Experiments show that CAEP has consistently good predictive accuracy, and it almost always outperforms C4.5 and CBA. By using efficient, border-based algorithms (developed elsewhere) to discover EPs, CAEP scales up with data volume and dimensionality. Observing that accuracy on the whole dataset is too coarse a description of classifiers, we also used finer-grained measures, sensitivity and precision, to better characterize the performance of classifiers. CAEP also performs very well under these measures.
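The two ideas in the abstract, per-EP differentiating power and per-class aggregation followed by normalization, can be illustrated with a minimal Python sketch. The abstract does not give the exact formulas, so several things here are assumptions: the differentiating power is taken to be growth/(growth + 1) times the support in the target class, where growth is the ratio of the two supports; normalization is taken to be division by a per-class base score supplied by the caller; and the function names, the toy mushroom-style data, and the hand-picked EP sets are all hypothetical. EP discovery itself (the border-based algorithms) is not shown.

```python
from typing import Dict, FrozenSet, List

Itemset = FrozenSet[str]


def support(pattern: Itemset, instances: List[Itemset]) -> float:
    """Fraction of instances that contain every item of the pattern."""
    if not instances:
        return 0.0
    return sum(1 for t in instances if pattern <= t) / len(instances)


def strength(pattern: Itemset, target: List[Itemset], opposing: List[Itemset]) -> float:
    """Assumed differentiating power: growth/(growth + 1) * support in target.

    With growth = s_target / s_opposing, growth/(growth + 1) equals
    s_target / (s_target + s_opposing), which also covers s_opposing == 0.
    """
    s_target = support(pattern, target)
    s_opposing = support(pattern, opposing)
    if s_target == 0.0:
        return 0.0
    return s_target / (s_target + s_opposing) * s_target


def aggregate_score(instance: Itemset, patterns: List[Itemset],
                    target: List[Itemset], opposing: List[Itemset]) -> float:
    """Sum the strengths of all EPs of one class that the instance contains."""
    return sum(strength(p, target, opposing) for p in patterns if p <= instance)


def classify(instance: Itemset,
             eps_by_class: Dict[str, List[Itemset]],
             data_by_class: Dict[str, List[Itemset]],
             base_scores: Dict[str, float]) -> str:
    """Return the class with the largest normalized aggregate score."""
    normalized = {}
    for label, patterns in eps_by_class.items():
        target = data_by_class[label]
        opposing = [t for other, rows in data_by_class.items()
                    if other != label for t in rows]
        raw = aggregate_score(instance, patterns, target, opposing)
        # Assumed normalization: divide by a caller-supplied base score.
        normalized[label] = raw / base_scores[label]
    return max(normalized, key=normalized.get)


# Hypothetical toy data: each instance is a set of attribute values.
data_by_class = {
    "edible": [frozenset({"smooth", "brown"}), frozenset({"smooth", "white"}),
               frozenset({"smooth", "brown"}), frozenset({"scaly", "brown"})],
    "poisonous": [frozenset({"scaly", "white"}), frozenset({"scaly", "red"}),
                  frozenset({"smooth", "red"}), frozenset({"scaly", "white"})],
}
# Hand-picked EPs per class, standing in for a mined EP set.
eps_by_class = {
    "edible": [frozenset({"smooth"}), frozenset({"brown"})],
    "poisonous": [frozenset({"scaly"}), frozenset({"red"})],
}
base_scores = {"edible": 1.0, "poisonous": 1.0}

print(classify(frozenset({"smooth", "brown"}), eps_by_class, data_by_class, base_scores))
```

In this sketch an EP that never occurs in the opposing class contributes its full target-class support, while an EP that is equally frequent in both classes contributes only half of it. The base scores are left at 1.0 for simplicity; choosing them per class is what would compensate for classes whose raw scores are systematically larger.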
© 1999 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Dong, G., Zhang, X., Wong, L., Li, J. (1999). CAEP: Classification by Aggregating Emerging Patterns. In: Arikawa, S., Furukawa, K. (eds) Discovery Science. DS 1999. Lecture Notes in Computer Science, vol 1721. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-46846-3_4
DOI: https://doi.org/10.1007/3-540-46846-3_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-66713-1
Online ISBN: 978-3-540-46846-2
eBook Packages: Springer Book Archive