Distributed Implementation of an Intelligent Data Classifier

Victor J. Sosa-Sosa⁵,
Ivan Lopez-Arevalo⁵,
Omar Jasso-Luna⁵ &
…
Hector Fraire-Huacuja⁶

Part of the book series: Studies in Computational Intelligence (SCI, volume 312)

Abstract

Industry, science, and business applications must process huge amounts of data every day. These data usually come from distributed sources and are analyzed with Data Mining techniques to discover knowledge and recognize patterns. Data classification is a technique for deciding whether a set of data belongs to a given group. It typically requires gathering all data into one large centralized dataset, and collecting and analyzing that dataset is very expensive in terms of time, memory, and bandwidth. Architectures for Distributed Data Mining have been developed to reduce these computing and storage costs. This paper presents an approach to building a distributed data classifier that uses only metadata from the distributed datasets and avoids full access to the original data. Using only metadata reduces the computing time and bandwidth consumption required to build a data classifier.
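The abstract does not say what the metadata contain or which classifier is built, so the following Python sketch is only an illustration under stated assumptions: the metadata are taken to be per-site counts of (attribute, value, class) combinations, and the classifier is taken to be a decision tree that picks split attributes by information gain. The function names, the example attributes (`outlook`, `windy`, `play`), and the two example sites are all hypothetical. The point is that a coordinator can choose a split from merged counts alone, without ever receiving the raw rows.

```python
from collections import Counter
from math import log2


def local_counts(rows, attributes, label):
    """Run at each site: summarize raw rows as (attribute, value, class) counts.

    ASSUMPTION: this count table is the "metadata"; the chapter's actual
    metadata format is not given in the abstract.
    """
    counts = Counter()
    for row in rows:
        for attr in attributes:
            counts[(attr, row[attr], row[label])] += 1
    return counts


def merge_counts(site_counts):
    """Run at the coordinator: add up the count tables sent by the sites."""
    total = Counter()
    for counts in site_counts:
        total.update(counts)
    return total


def entropy(class_counts):
    """Shannon entropy (in bits) of a collection of class counts."""
    class_counts = list(class_counts)
    n = sum(class_counts)
    return -sum(c / n * log2(c / n) for c in class_counts if c)


def information_gain(total, attr):
    """Information gain of splitting on `attr`, computed from counts only."""
    by_value = {}        # attribute value -> Counter(class -> count)
    by_class = Counter()  # class -> count over all rows
    for (a, value, cls), n in total.items():
        if a == attr:
            by_value.setdefault(value, Counter())[cls] += n
            by_class[cls] += n
    n_rows = sum(by_class.values())
    remainder = sum(
        sum(c.values()) / n_rows * entropy(c.values())
        for c in by_value.values()
    )
    return entropy(by_class.values()) - remainder


def best_split(total, attributes):
    """Attribute with the highest information gain."""
    return max(attributes, key=lambda a: information_gain(total, a))


# Hypothetical data held at two separate sites.
ATTRIBUTES = ["outlook", "windy"]
site_a = [
    {"outlook": "sunny", "windy": "no", "play": "no"},
    {"outlook": "overcast", "windy": "yes", "play": "yes"},
]
site_b = [
    {"outlook": "sunny", "windy": "yes", "play": "no"},
    {"outlook": "overcast", "windy": "no", "play": "yes"},
]

# Only the count tables cross the network, never the rows themselves.
total = merge_counts(
    local_counts(site, ATTRIBUTES, "play") for site in (site_a, site_b)
)
print(best_split(total, ATTRIBUTES))  # prints: outlook
```

Because the counts are additive, the merged table is identical to the one a centralized system would compute over the pooled rows, so the chosen split is the same while each site transmits a table whose size depends on the number of distinct attribute values rather than the number of rows.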

Author information

Authors and Affiliations

Ciudad Victoria, Centro de Investigación y de Estudios Avanzados del IPN (CINVESTAV), Tamaulipas, México, 87130
Victor J. Sosa-Sosa, Ivan Lopez-Arevalo & Omar Jasso-Luna
Instituto Tecnológico de Ciudad Madero, 10 de Mayo S/N, Col. Los Mangos. Ciudad Madero, Tamaulipas, México, 89440
Hector Fraire-Huacuja

Editor information

Editors and Affiliations

Tijuana Institute of Technology, Chula Vista, USA
Patricia Melin
Polish Academy of Sciences, Warsaw, Poland
Janusz Kacprzyk
University of Alberta, Edmonton, Canada
Witold Pedrycz

About this chapter

Cite this chapter

Sosa-Sosa, V.J., Lopez-Arevalo, I., Jasso-Luna, O., Fraire-Huacuja, H. (2010). Distributed Implementation of an Intelligent Data Classifier. In: Melin, P., Kacprzyk, J., Pedrycz, W. (eds) Soft Computing for Recognition Based on Biometrics. Studies in Computational Intelligence, vol 312. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15111-8_5

DOI: https://doi.org/10.1007/978-3-642-15111-8_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-15110-1
Online ISBN: 978-3-642-15111-8
eBook Packages: Engineering, Engineering (R0)
