Abstract
Decision trees are among the most popular classifiers, used in a wide range of real-world problems, so improving their prediction accuracy is important. Most of the well-known decision tree induction algorithms used in practice are greedy and hence do not consider conditional dependencies among the attributes; as a result, they may generate suboptimal trees. In the literature, genetic programming-based decision tree induction algorithms (genetic programming being a complex variant of the genetic algorithm) have often been proposed to eliminate some of the problems of greedy approaches. However, none of the algorithms proposed so far effectively addresses conditional dependencies among the attributes. In this paper, we propose a new, easy-to-implement genetic algorithm-based decision tree induction technique that is more likely to ascertain conditional dependencies among the attributes. Elaborate experiments are conducted on thirty well-known data sets from the UCI Machine Learning Repository in order to validate the effectiveness of the proposed technique.
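The abstract's core claim is that greedy splitters evaluate attributes one at a time and so can miss conditional dependencies, whereas a genetic algorithm evaluates whole trees at once. The paper's actual encoding and operators are not reproduced here; as a minimal hedged sketch, the idea can be illustrated with a GA whose chromosomes are fixed-depth trees of (attribute, threshold) tests, evolved on a synthetic XOR-style dataset where each attribute is useless in isolation (all names and parameters below are illustrative, not the authors' method):

```python
import random

# Toy dataset: the label depends on an interaction between x0 and x1
# (an XOR-style conditional dependency that single-attribute greedy
# splitting criteria score poorly).
random.seed(0)
DATA = []
for _ in range(200):
    x0, x1 = random.random(), random.random()
    label = 1 if (x0 > 0.5) == (x1 > 0.5) else 0
    DATA.append(((x0, x1), label))

def make_tree():
    # Chromosome: a depth-2 tree = three (attribute index, threshold)
    # tests plus four leaf class labels.
    return {
        "root": (random.randrange(2), random.random()),
        "left": (random.randrange(2), random.random()),
        "right": (random.randrange(2), random.random()),
        "leaves": [random.randrange(2) for _ in range(4)],
    }

def predict(tree, x):
    a, t = tree["root"]
    if x[a] <= t:
        a2, t2 = tree["left"]
        return tree["leaves"][0 if x[a2] <= t2 else 1]
    a2, t2 = tree["right"]
    return tree["leaves"][2 if x[a2] <= t2 else 3]

def fitness(tree):
    # Fitness of a chromosome: training accuracy of the whole tree,
    # so attribute interactions are rewarded jointly, not greedily.
    return sum(predict(tree, x) == y for x, y in DATA) / len(DATA)

def crossover(p, q):
    # Uniform crossover: each internal test and each leaf label is
    # inherited from either parent.
    child = {k: random.choice([p[k], q[k]]) for k in ("root", "left", "right")}
    child["leaves"] = [random.choice([a, b]) for a, b in zip(p["leaves"], q["leaves"])]
    return child

def mutate(tree, rate=0.2):
    if random.random() < rate:  # re-randomize one internal test
        key = random.choice(["root", "left", "right"])
        tree[key] = (random.randrange(2), random.random())
    if random.random() < rate:  # flip one leaf label
        i = random.randrange(4)
        tree["leaves"][i] = 1 - tree["leaves"][i]
    return tree

def evolve(pop_size=40, generations=60):
    pop = [make_tree() for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        elite = pop[: pop_size // 4]  # elitism: keep the best quarter
        children = [mutate(crossover(*random.sample(elite, 2)))
                    for _ in range(pop_size - len(elite))]
        pop = elite + children
    return max(pop, key=fitness)

best = evolve()
```

Because elitism preserves the best tree unchanged across generations, the best fitness is non-decreasing; a greedy induction on the same data would struggle to pick a first split, since neither attribute reduces impurity on its own.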
References
Abellan, J.: Ensembles of decision trees based on imprecise probabilities and uncertainty measures. Inf. Fusion 14, 423–430 (2013)
Adnan, M.N., Islam, M.Z.: ComboSplit: combining various splitting criteria for building a single decision tree. In: Proceedings of the International Conference on Artificial Intelligence and Pattern Recognition, pp. 1–8 (2014)
Adnan, M.N., Islam, M.Z.: Forest CERN: a new decision forest building technique. In: Proceedings of the 20th Pacific Asia Conference on Knowledge Discovery and Data Mining (PAKDD), pp. 304–315 (2016)
Adnan, M.N., Islam, M.Z.: Optimizing the number of trees in a decision forest to discover a subforest with high ensemble accuracy using a genetic algorithm. Knowl.-Based Syst. 110, 86–97 (2016)
Adnan, M.N., Islam, M.Z., Kwan, P.W.H.: Extended space decision tree. In: Wang, X., Pedrycz, W., Chan, P., He, Q. (eds.) ICMLC 2014. CCIS, vol. 481, pp. 219–230. Springer, Heidelberg (2014). https://doi.org/10.1007/978-3-662-45652-1_23
Aitkenhead, M.J.: A co-evolving decision tree classification method. Expert Syst. Appl. 34(1), 18–25 (2008)
Arlot, S.: A survey of cross-validation procedures for model selection. Stat. Surv. 4, 40–79 (2010)
Barros, R.C., Basgalupp, M.P., de Carvalho, A.C.P.L.F., Freitas, A.A.: A survey of evolutionary algorithms for decision tree induction. IEEE Trans. Syst. Man Cybern. - Part C: Appl. Rev. 42(3), 291–312 (2012)
Bishop, C.M.: Pattern Recognition and Machine Learning. Springer, New York (2008)
Breiman, L., Friedman, J., Olshen, R., Stone, C.: Classification and Regression Trees. Wadsworth International Group, Belmont (1984)
Burges, C.J.C.: A tutorial on support vector machines for pattern recognition. Data Min. Knowl. Discov. 2, 121–167 (1998)
Demsar, J.: Statistical comparisons of classifiers over multiple data sets. J. Mach. Learn. Res. 7, 1–30 (2006)
Espejo, P.G., Ventura, S., Herrera, F.: A survey on the application of genetic programming to classification. IEEE Trans. Syst. Man Cybern. - Part C: Appl. Rev. 40(2), 121–144 (2010)
Fu, Z., Golden, B., Lele, S., Raghavan, S., Wasli, E.: Genetically engineered decision trees: population diversity produces smarter trees. Oper. Res. 51(6), 894–907 (2003)
Han, J., Kamber, M.: Data Mining Concepts and Techniques. Morgan Kaufmann Publishers, San Francisco (2006)
Holland, J.H.: Adaptation in Natural and Artificial Systems: An Introductory Analysis with Applications to Biology, Control and Artificial Intelligence. MIT Press, Cambridge (1992)
Hunt, E., Marin, J., Stone, P.: Experiments in Induction. Academic Press, New York (1966)
Kamber, M., Winstone, L., Gong, W., Cheng, S., Han, J.: Generalization and decision tree induction: efficient classification in data mining. In: Proceedings of the International Workshop Research Issues on Data Engineering, pp. 111–120 (1997)
Kataria, A., Singh, M.D.: A review of data classification using k-nearest neighbour algorithm. Int. J. Emerg. Technol. Adv. Eng. 3(6), 354–360 (2013)
Kim, Y.W., Oh, I.S.: Classifier ensemble selection using hybrid genetic algorithms. Pattern Recogn. Lett. 29, 796–802 (2008)
Kurgan, L.A., Cios, K.J.: CAIM discretization algorithm. IEEE Trans. Knowl. Data Eng. 16, 145–153 (2004)
Li, J., Liu, H.: Ensembles of cascading trees. In: Proceedings of the Third IEEE International Conference on Data Mining, pp. 585–588 (2003)
Lichman, M.: UCI machine learning repository. http://archive.ics.uci.edu/ml/datasets.html. Accessed 15 Mar 2016
Lim, T.S., Loh, W.Y., Shih, Y.S.: A comparison of prediction accuracy, complexity, and training time of thirty-three old and new classification algorithms. Mach. Learn. 40, 203–229 (2000)
Liu, Y., Shen, Y., Wu, X.: Automatic clustering using genetic algorithms. Appl. Math. Comput. 218, 1267–1279 (2011)
Mason, R., Lind, D., Marchal, W.: Statistics: An Introduction. Brooks/Cole Publishing Company, New York (1998)
Murthy, S.K.: On growing better decision trees from data. Ph.D. thesis, The Johns Hopkins University, Baltimore, Maryland (1997)
Murthy, S.K.: Automatic construction of decision trees from data: a multi-disciplinary survey. Data Min. Knowl. Discov. 2, 345–389 (1998)
Murthy, S.K., Kasif, S., Salzberg, S.S.: A system for induction of oblique decision trees. J. Artif. Intell. Res. 2, 1–32 (1994)
Quinlan, J.R.: C4.5: Programs for Machine Learning. Morgan Kaufmann Publishers, San Mateo (1993)
Quinlan, J.R.: Improved use of continuous attributes in C4.5. J. Artif. Intell. Res. 4, 77–90 (1996)
Rahman, M.A., Islam, M.Z.: A hybrid clustering technique combining a novel genetic algorithm with k-means. Knowl.-Based Syst. 71, 345–365 (2014)
Shirasaka, M., Zhao, Q., Hammami, O., Kuroda, K., Saito, K.: Automatic design of binary decision trees based on genetic programming. In: Second Asia-Pacific Conference on Simulated Evolution and Learning. Australian Defence Force Academy, Canberra (1998)
Tamon, C., Xiang, J.: On the boosting pruning problem. In: López de Mántaras, R., Plaza, E. (eds.) ECML 2000. LNCS (LNAI), vol. 1810, pp. 404–412. Springer, Heidelberg (2000). https://doi.org/10.1007/3-540-45164-1_41
Tan, P.N., Steinbach, M., Kumar, V.: Introduction to Data Mining. Pearson Education, London (2006)
Tanigawa, T., Zhao, Q.: A study on efficient generation of decision trees using genetic programming. In: Genetic and Evolutionary Computation Conference (GECCO’2000), pp. 1047–1052. Morgan Kaufmann (2000)
Triola, M.F.: Elementary Statistics. Addison Wesley Longman Inc., Reading (2001)
Whitley, D.: A genetic algorithm tutorial. Stat. Comput. 4, 65–85 (1994)
Wilcoxon, F.: Individual comparisons by ranking methods. Biometrics Bull. 1, 80–83 (1945)
Zhang, G.P.: Neural networks for classification: a survey. IEEE Trans. Syst. Man Cybern. 30, 451–462 (2000)
Zhao, H.: A multi-objective genetic programming approach to developing pareto optimal decision trees. Decis. Support Syst. 43(3), 809–826 (2007)
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Adnan, M.N., Islam, M.Z., Akbar, M.M. (2018). On Improving the Prediction Accuracy of a Decision Tree Using Genetic Algorithm. In: Gan, G., Li, B., Li, X., Wang, S. (eds) Advanced Data Mining and Applications. ADMA 2018. Lecture Notes in Computer Science(), vol 11323. Springer, Cham. https://doi.org/10.1007/978-3-030-05090-0_7
Print ISBN: 978-3-030-05089-4
Online ISBN: 978-3-030-05090-0