Abstract
This paper investigates why some companies grow faster than others by data mining a survey of a large number of companies in Flanders (the northern part of Belgium). Faster or slower average growth over a time period is explained by building a classification tree from several categorical variables (both quantitative and qualitative). The technique used, called genAID, splits the population at different levels. It is inspired by the Automatic Interaction Detector (AID) technique, which finds trees that explain the variability in average growth, but it uses a genetic algorithm to overcome some of the drawbacks of AID.
Classical AID and other tree-growing techniques usually generate a single tree for interpretation. This approach has been criticized because artifacts in the data may produce spurious interactions. genAID instead offers the user-analyst a set of trees, namely the best ones found over a number of generations of the genetic algorithm. The user-analyst can then choose a tree by trading off explanatory power against either ease of understanding or conformity with an existing theory.
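To make the idea of "explaining the variability in average growth" concrete, the following is a minimal sketch of the kind of two-way split that classical AID evaluates: the categories of one variable are grouped into two subsets so that the between-group sum of squares of the target is as large as possible. It illustrates the AID splitting criterion only, not genAID or its genetic algorithm, whose details are not given in this abstract. The records, the variable name `sector`, the field name `growth`, and both function names are hypothetical and chosen purely for illustration.

```python
from itertools import combinations
from statistics import mean


def between_group_ss(left, right):
    """Between-group sum of squares for a two-way split of target values."""
    overall = mean(left + right)
    return (len(left) * (mean(left) - overall) ** 2
            + len(right) * (mean(right) - overall) ** 2)


def best_binary_split(records, variable, target="growth"):
    """Try every two-way grouping of the categories of `variable`.

    Returns (score, left_categories) for the grouping with the largest
    between-group sum of squares, or (0.0, None) if no split is possible.
    """
    categories = sorted({r[variable] for r in records})
    best = (0.0, None)
    # Keep the first category on the left so mirror-image splits are not
    # evaluated twice; k < len(rest) keeps the right-hand group non-empty.
    first, rest = categories[0], categories[1:]
    for k in range(len(rest)):
        for extra in combinations(rest, k):
            left_cats = {first, *extra}
            left = [r[target] for r in records if r[variable] in left_cats]
            right = [r[target] for r in records if r[variable] not in left_cats]
            score = between_group_ss(left, right)
            if score > best[0]:
                best = (score, left_cats)
    return best


# Hypothetical survey records (illustrative values, not data from the paper).
records = [
    {"sector": "services", "growth": 0.10},
    {"sector": "services", "growth": 0.12},
    {"sector": "retail", "growth": 0.02},
    {"sector": "retail", "growth": 0.04},
    {"sector": "industry", "growth": 0.03},
    {"sector": "industry", "growth": 0.05},
]

score, left_cats = best_binary_split(records, "sector")
print(sorted(left_cats), round(score, 4))
```

On this toy data the best grouping separates the two slower-growing sectors from the faster-growing one. A full tree would apply such splits recursively to each subgroup. The abstract's point is that a single greedy tree of this kind can pick up spurious interactions, which is why genAID presents a set of candidate trees instead.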
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Janssens, G.K., Sörensen, K., Limère, A., Vanhoof, K. (2005). Analysis of Company Growth Data Using Genetic Algorithms on Binary Trees. In: Ho, T.B., Cheung, D., Liu, H. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2005. Lecture Notes in Computer Science, vol 3518. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11430919_29
DOI: https://doi.org/10.1007/11430919_29
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-26076-9
Online ISBN: 978-3-540-31935-1
eBook Packages: Computer Science, Computer Science (R0)