
Using Correspondence Analysis to Combine Classifiers

Published: 01 July 1999

Abstract

Several effective methods have been developed recently for improving predictive performance by generating and combining multiple learned models. The general approach is to create a set of learned models either by applying an algorithm repeatedly to different versions of the training data, or by applying different learning algorithms to the same data. The predictions of the models are then combined according to a voting scheme. This paper focuses on the task of combining the predictions of a set of learned models. The method described uses the strategies of stacking and Correspondence Analysis to model the relationship between the learning examples and their classification by a collection of learned models. A nearest neighbor method is then applied within the resulting representation to classify previously unseen examples. The new algorithm does not perform worse than, and frequently performs significantly better than, other combining techniques on a suite of data sets.
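
The abstract describes a three-stage combining pipeline: build a stacking-style table of the base models' predictions on the training examples, embed that table with correspondence analysis, and classify unseen examples with a nearest neighbor rule in the embedded space. The sketch below is a minimal illustration of that pipeline, not the paper's exact algorithm: the indicator coding of predictions plus the true class, the projection of test examples as supplementary points, the choice to assign the nearest true-class column point, and the scikit-learn base learners and iris data are all assumptions made for the example.

# Minimal sketch (an illustration, not the paper's exact construction) of
# combining classifiers with stacking + correspondence analysis + a nearest
# neighbor rule. Base learners, data set, and coding choices are assumptions.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

def one_hot(labels, n_classes):
    """Indicator (one-hot) coding of a vector of class labels."""
    out = np.zeros((len(labels), n_classes))
    out[np.arange(len(labels)), labels] = 1.0
    return out

def correspondence_analysis(N, n_dims=2):
    """Row and column principal coordinates of an indicator matrix N."""
    P = N / N.sum()                                       # correspondence matrix
    r, c = P.sum(axis=1), P.sum(axis=0)                   # row and column masses
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))    # standardized residuals
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    F = (U * sv) / np.sqrt(r)[:, None]                    # row (example) coordinates
    G = (Vt.T * sv) / np.sqrt(c)[:, None]                 # column (category) coordinates
    return F[:, :n_dims], G[:, :n_dims], sv[:n_dims]

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
k = len(np.unique(y))

# Level-0 learned models, as in stacking.
models = [DecisionTreeClassifier(random_state=0), GaussianNB(),
          KNeighborsClassifier(3)]
for m in models:
    m.fit(X_tr, y_tr)

# Indicator matrix: one block of columns per model's predictions on the
# training examples, plus a final block coding each example's true class.
N = np.hstack([one_hot(m.predict(X_tr), k) for m in models] + [one_hot(y_tr, k)])
F, G, sv = correspondence_analysis(N)
class_points = G[-k:]                     # column points of the true-class block

# Project test examples as supplementary rows (true class unknown, so that
# block stays zero) via the CA transition formula: profile @ G / singular values.
N_te = np.hstack([one_hot(m.predict(X_te), k) for m in models]
                 + [np.zeros((len(X_te), k))])
F_te = (N_te / N_te.sum(axis=1, keepdims=True)) @ G / sv

# Nearest-neighbor rule in the CA space: assign the closest class point.
dists = np.linalg.norm(F_te[:, None, :] - class_points[None, :, :], axis=2)
print("combined accuracy:", np.mean(dists.argmin(axis=1) == y_te))

Projecting test rows with the correspondence-analysis transition formula (row profile times column coordinates, rescaled by the singular values) keeps training and test examples in the same coordinate system, which is what allows a simple nearest neighbor rule to operate on the combined representation.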

Published In

Machine Learning, Volume 36, Issue 1-2
July-August 1999
132 pages

Publisher

Kluwer Academic Publishers

United States

Publication History

Published: 01 July 1999

Author Tags

  1. classification
  2. combining estimates
  3. correspondence analysis
  4. multiple models

Qualifiers

  • Article

Cited By

  • (2023) Trustable Co-Label Learning From Multiple Noisy Annotators. IEEE Transactions on Multimedia, 25, 1045-1057. DOI: 10.1109/TMM.2021.3137752. Online publication date: 1-Jan-2023.
  • (2023) A hybrid ensemble method with negative correlation learning for regression. Machine Learning, 112(10), 3881-3916. DOI: 10.1007/s10994-023-06364-3. Online publication date: 23-Aug-2023.
  • (2022) A heterogeneous online learning ensemble for non-stationary environments. Knowledge-Based Systems, 188. DOI: 10.1016/j.knosys.2019.104983. Online publication date: 21-Apr-2022.
  • (2020) A-Stacking and A-Bagging. Expert Systems with Applications: An International Journal, 146. DOI: 10.1016/j.eswa.2019.113160. Online publication date: 15-May-2020.
  • (2018) CST-Voting. Journal of Intelligent & Fuzzy Systems: Applications in Engineering and Technology, 35(1), 99-109. DOI: 10.3233/JIFS-169571. Online publication date: 1-Jan-2018.
  • (2018) Heterogeneous classifier ensemble with fuzzy rule-based meta learner. Information Sciences: an International Journal, 422, 144-160. DOI: 10.1016/j.ins.2017.09.009. Online publication date: 1-Jan-2018.
  • (2017) Diversity-induced weighted classifier ensemble learning. 2017 IEEE International Conference on Image Processing (ICIP), 1232-1236. DOI: 10.1109/ICIP.2017.8296478. Online publication date: 17-Sep-2017.
  • (2017) The tech-talk balance. Proceedings of the 10th International Workshop on Cooperative and Human Aspects of Software Engineering, 43-48. DOI: 10.1109/CHASE.2017.8. Online publication date: 20-May-2017.
  • (2016) Random subspace method with class separability weighting. Expert Systems: The Journal of Knowledge Engineering, 33(3), 275-285. DOI: 10.1111/exsy.12149. Online publication date: 1-Jun-2016.
  • (2016) A design framework for hierarchical ensemble of multiple feature extractors and multiple classifiers. Pattern Recognition, 52, 1-16. DOI: 10.1016/j.patcog.2015.11.006. Online publication date: 1-Apr-2016.
