A New Cluster-based Instance Selection Algorithm

Ireneusz Czarnowski &
Piotr Jędrzejowicz

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 6682)

Included in the following conference series:

KES International Symposium on Agent and Multi-Agent Systems: Technologies and Applications

Abstract

The main contribution of the paper is to propose and evaluate, through a computational experiment, an agent-based population learning algorithm that generates a representative training dataset of the required size. The proposed approach is based on the assumption that prototypes are selected from clusters. Thus, the number of clusters produced directly influences the size of the reduced dataset. Agents within an A-Team execute various local search procedures and cooperate to find a solution to the instance reduction problem, aiming to obtain a compact representation of the dataset. The computational experiment confirmed that the proposed algorithm is competitive with other approaches.
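
The link between the number of clusters and the size of the reduced dataset can be illustrated with a minimal sketch. It is not the agent-based A-Team procedure described in the abstract: it swaps in plain k-means run separately for each class and keeps, from every non-empty cluster, the instance closest to the cluster mean. The function names (`mean_point`, `kmeans_clusters`, `select_prototypes`), the parameter `clusters_per_class`, and the toy data are all hypothetical and chosen only for illustration.

```python
import math
import random
from collections import defaultdict


def mean_point(pts):
    """Coordinate-wise mean of a non-empty list of equal-length tuples."""
    return tuple(sum(coords) / len(pts) for coords in zip(*pts))


def kmeans_clusters(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means (a stand-in, not the paper's method).

    Returns the non-empty clusters as lists of indices into `points`.
    """
    rng = random.Random(seed)
    centroids = [points[i] for i in rng.sample(range(len(points)), k)]
    clusters = []
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for idx, p in enumerate(points):
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(idx)
        for c, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = mean_point([points[i] for i in members])
    return [members for members in clusters if members]


def select_prototypes(X, y, clusters_per_class, seed=0):
    """Pick one prototype per cluster, clustering each class separately.

    The result holds at most (number of classes * clusters_per_class)
    indices, so the cluster count bounds the reduced dataset size.
    """
    by_class = defaultdict(list)
    for idx, label in enumerate(y):
        by_class[label].append(idx)

    selected = []
    for indices in by_class.values():
        pts = [X[i] for i in indices]
        k = min(clusters_per_class, len(pts))
        for members in kmeans_clusters(pts, k, seed=seed):
            center = mean_point([pts[m] for m in members])
            best = min(members, key=lambda m: math.dist(pts[m], center))
            selected.append(indices[best])
    return sorted(selected)


# Toy data (hypothetical): two classes, each made of two separated groups.
X = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 5.2), (5.2, 5.1),
     (0.0, 5.0), (0.1, 5.2), (0.2, 5.1), (5.0, 0.0), (5.1, 0.2), (5.2, 0.1)]
y = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]

selected = select_prototypes(X, y, clusters_per_class=2)
reduced = [(X[i], y[i]) for i in selected]
```

Raising or lowering `clusters_per_class` changes the upper bound on `len(selected)`, which is the sense in which the cluster count controls the size of the reduced training set in this simplified setting.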

Author information

Authors and Affiliations

Department of Information Systems, Gdynia Maritime University, Morska 83, 81-225, Gdynia, Poland
Ireneusz Czarnowski & Piotr Jędrzejowicz

Authors

Ireneusz Czarnowski
Piotr Jędrzejowicz

Editor information

Editors and Affiliations

Manchester Metropolitan University, M1 5GD, Manchester, UK
James O’Shea & Keeley Crockett
Wroclaw University of Technology, Wyb. Wyspianskiego 27, 50-370, Wroclaw, Poland
Ngoc Thanh Nguyen
University of Bournemouth, BH12 5BB, Poole, UK
Robert J. Howlett
University of South Australia, 5095, Mawson Lakes, SA, Australia
Lakhmi C. Jain

About this paper

Cite this paper

Czarnowski, I., Jędrzejowicz, P. (2011). A New Cluster-based Instance Selection Algorithm. In: O’Shea, J., Nguyen, N.T., Crockett, K., Howlett, R.J., Jain, L.C. (eds) Agent and Multi-Agent Systems: Technologies and Applications. KES-AMSTA 2011. Lecture Notes in Computer Science, vol 6682. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-22000-5_45

DOI: https://doi.org/10.1007/978-3-642-22000-5_45
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-21999-3
Online ISBN: 978-3-642-22000-5
eBook Packages: Computer Science, Computer Science (R0)