Abstract
Industrial, scientific and business applications manipulate huge amounts of data every day. Most of the time these data come from distributed sources and are analyzed with Data Mining techniques to discover knowledge and recognize patterns. Data classification is a technique that decides whether or not a set of data belongs to a given group of information. It usually requires gathering all the data into one large centralized dataset. Gathering and analyzing this dataset is a very expensive task in terms of time, memory and bandwidth consumption. Architectures for Distributed Data Mining have recently been developed to reduce computing and storage costs. This paper presents an approach to building a distributed data classifier that uses only metadata from the distributed datasets, avoiding full access to the original data. Using only metadata reduces the computing time and bandwidth consumption required to build a data classifier.
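The abstract does not say what the metadata contains or which classifier is built, so the following is only a minimal sketch of the general idea, under stated assumptions. It assumes that each site summarises its local rows as a table of (attribute, value, class) counts, and that a coordinator adds those tables together and picks a decision-tree split by ID3-style information gain. The function names (`local_metadata`, `merge_metadata`, `information_gain`, `best_split`) are hypothetical and are not taken from the chapter.

```python
from collections import Counter
from math import log2


def local_metadata(rows, attributes, label):
    """Run at each site: summarise local rows as (attribute, value, class) counts.

    Assumption: this count table stands in for the "metadata" of the abstract.
    """
    counts = Counter()
    for row in rows:
        for attr in attributes:
            counts[(attr, row[attr], row[label])] += 1
    return counts


def merge_metadata(site_counts):
    """Run at the coordinator: add up the count tables; no raw rows are needed."""
    total = Counter()
    for counts in site_counts:
        total.update(counts)
    return total


def entropy(class_counts):
    """Shannon entropy (in bits) of a collection of class counts."""
    n = sum(class_counts)
    return -sum(c / n * log2(c / n) for c in class_counts if c > 0)


def information_gain(total, attr):
    """Information gain of splitting on attr, computed from merged counts only."""
    by_value = {}
    by_class = Counter()
    for (a, value, cls), c in total.items():
        if a != attr:
            continue
        by_value.setdefault(value, Counter())[cls] += c
        by_class[cls] += c
    n = sum(by_class.values())
    remainder = sum(
        sum(v.values()) / n * entropy(v.values()) for v in by_value.values()
    )
    return entropy(by_class.values()) - remainder


def best_split(total, attributes):
    """Pick the attribute with the highest information gain."""
    return max(attributes, key=lambda a: information_gain(total, a))
```

In this sketch only the count tables travel over the network. Their size depends on the number of distinct attribute values and classes rather than on the number of rows, which is one way the bandwidth saving described in the abstract could arise.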
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Sosa-Sosa, V.J., Lopez-Arevalo, I., Jasso-Luna, O., Fraire-Huacuja, H. (2010). Distributed Implementation of an Intelligent Data Classifier. In: Melin, P., Kacprzyk, J., Pedrycz, W. (eds) Soft Computing for Recognition Based on Biometrics. Studies in Computational Intelligence, vol 312. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15111-8_5
DOI: https://doi.org/10.1007/978-3-642-15111-8_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-15110-1
Online ISBN: 978-3-642-15111-8
eBook Packages: Engineering, Engineering (R0)