Analysis of k-means clustering approach on the breast cancer Wisconsin dataset

Ashutosh Kumar Dubey¹,
Umesh Gupta¹ &
Sonal Jain¹

Abstract

Purpose

Breast cancer is one of the most common cancers worldwide and is most frequently found in women. Early detection of breast cancer offers the possibility of a cure; therefore, many studies are currently seeking methods that can detect breast cancer in its early stages. This study aimed to examine the effects of the k-means clustering algorithm under different computational measures, namely centroid, distance, split method, epoch, attribute, and iteration, and to identify the combination of measures with the potential to achieve the highest clustering accuracy.

Methods

The k-means algorithm was used to evaluate the impact of centroid initialization, distance measures, and split methods on clustering. The experiments were performed on the breast cancer Wisconsin (BCW) diagnostic dataset. Foggy and random centroids were used for centroid initialization. In the foggy centroid method, the first centroid was calculated from random values; in the random centroid method, the initial centroid was set to (0, 0).
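To make the moving parts of the method concrete, the following is a minimal sketch of plain Lloyd-style k-means, not the authors' implementation. It assumes a "foggy" initialization in which each centroid coordinate is drawn uniformly from the observed range of that attribute (the function name `foggy_init` and this interpretation are hypothetical), and it shows the two stopping rules named in the results: a fixed epoch budget ("constant epoch") and stopping once centroids no longer move ("same centroid"). The abstract's (0, 0) starting point for the random-centroid option is not modelled here, since the abstract does not say how the remaining centroids are derived from it.

```python
import random
from typing import Callable, List, Optional, Sequence, Tuple

Point = Tuple[float, ...]
Distance = Callable[[Sequence[float], Sequence[float]], float]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Straight-line distance between two attribute vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def foggy_init(data: List[Point], k: int, rng: random.Random) -> List[Point]:
    """HYPOTHETICAL 'foggy' initialization: every centroid coordinate is a
    random value drawn uniformly from that attribute's observed range."""
    dims = len(data[0])
    lows = [min(p[d] for p in data) for d in range(dims)]
    highs = [max(p[d] for p in data) for d in range(dims)]
    return [tuple(rng.uniform(lows[d], highs[d]) for d in range(dims))
            for _ in range(k)]


def assign(data: List[Point], centroids: List[Point], distance: Distance) -> List[int]:
    """Assignment step: label each point with the index of its nearest centroid."""
    return [min(range(len(centroids)), key=lambda j: distance(p, centroids[j]))
            for p in data]


def kmeans(data: List[Point], k: int, distance: Distance = euclidean,
           max_epochs: int = 10, init: Optional[List[Point]] = None,
           seed: int = 0) -> Tuple[List[Point], List[int]]:
    """Plain k-means. Runs at most `max_epochs` passes ('constant epoch') and
    stops early if the centroids stop moving ('same centroid').
    If given, `init` must contain exactly k starting centroids."""
    rng = random.Random(seed)
    centroids = [tuple(c) for c in init] if init is not None else foggy_init(data, k, rng)
    for _ in range(max_epochs):
        labels = assign(data, centroids, distance)
        # Update step: move each centroid to the mean of its member points.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(data, labels) if lab == j]
            if members:
                new_centroids.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centroids.append(centroids[j])  # an empty cluster keeps its centroid
        if new_centroids == centroids:  # 'same centroid' stopping rule
            break
        centroids = new_centroids
    # Recompute labels so they always match the returned centroids.
    return centroids, assign(data, centroids, distance)
```

For example, `kmeans(data, 2, init=[(0.0, 0.0), (10.0, 10.0)])` on two well-separated groups of points converges in two passes. With random initialization the result depends on the seed, and a poorly placed centroid can end up with no members, which is one reason initialization choices matter.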

Results

Results obtained with the k-means algorithm are discussed for several cases with varying parameters. The calculations were based on the centroid (foggy/random), distance (Euclidean/Manhattan/Pearson), split (simple/variance), threshold (constant epoch/same centroid), attribute (2–9), and iteration (4–10) settings. An average positive prediction accuracy of approximately 92 % was obtained with this approach. Better results were found with the same-centroid threshold and the highest-variance split. Euclidean and Manhattan distances gave better results than Pearson correlation.
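The three distance measures compared above, and the accuracy figure reported, can be sketched as follows. This is an illustrative sketch under stated assumptions: Pearson correlation is turned into a distance as 1 − r, and "positive prediction accuracy" is read as positive predictive value, TP / (TP + FP). Both readings are assumptions, not definitions taken from the paper.

```python
import math
from typing import Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean (L2) distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a: Sequence[float], b: Sequence[float]) -> float:
    """Manhattan (L1, city-block) distance."""
    return sum(abs(x - y) for x, y in zip(a, b))


def pearson_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """ASSUMED form 1 - r: 0 for perfectly correlated vectors and 2 for
    perfectly anti-correlated ones. Undefined when either vector is constant."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    sd_a = math.sqrt(sum((x - mean_a) ** 2 for x in a))
    sd_b = math.sqrt(sum((y - mean_b) ** 2 for y in b))
    if sd_a == 0 or sd_b == 0:
        raise ValueError("Pearson correlation is undefined for a constant vector")
    return 1.0 - cov / (sd_a * sd_b)


def positive_predictive_value(y_true: Sequence[int], y_pred: Sequence[int],
                              positive: int = 1) -> float:
    """ASSUMED metric: TP / (TP + FP), the share of points predicted
    positive that are truly positive. Returns 0.0 if nothing is predicted positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t != positive)
    return tp / (tp + fp) if tp + fp else 0.0
```

Because k-means produces unlabeled clusters, computing a value like this requires first mapping each cluster to a diagnosis class (for example, by the majority label of its members); that mapping step is also an assumption here. Any of the three distance functions can be passed as the `distance` argument of a k-means routine that accepts a pluggable metric.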

Conclusions

The findings of this work provide an extensive understanding of the computational parameters that can be used with k-means. The results indicate that k-means has the potential to classify the BCW dataset.

Author information

Authors and Affiliations

JK Lakshmipat University, Near Mahindra SEZ, P.O. Mahapura Ajmer Road, Jaipur, Rajasthan, 302 026, India
Ashutosh Kumar Dubey, Umesh Gupta & Sonal Jain

Corresponding author

Correspondence to Ashutosh Kumar Dubey.

Ethics declarations

Conflict of interest

The authors Ashutosh Kumar Dubey, Umesh Gupta, and Sonal Jain declare that they have no conflict of interest. The manuscript does not contain clinical studies or patient data.

About this article

Cite this article

Dubey, A.K., Gupta, U. & Jain, S. Analysis of k-means clustering approach on the breast cancer Wisconsin dataset. Int J CARS 11, 2033–2047 (2016). https://doi.org/10.1007/s11548-016-1437-9

Received: 15 February 2016
Accepted: 27 May 2016
Published: 16 June 2016
Issue Date: November 2016
DOI: https://doi.org/10.1007/s11548-016-1437-9