Abstract
Cluster analysis of data from DNA microarray hybridization studies is essential for identifying and evaluating biologically significant co-expressed genes. The K-means algorithm is one of the most widely used clustering techniques. It attempts to solve the clustering problem by assigning each gene to a single cluster. In practice, however, and especially with bioinformatics data, a gene can belong to several clusters simultaneously. To address this problem, the Fuzzy C-means (FCM) clustering algorithm is applied to microarray data. Two pattern recognition data sets (IRIS and WBCD) and thirteen microarray data sets are used to evaluate the performance of K-means and Fuzzy C-means. FCM improves clustering accuracy by approximately 30 percent compared with the K-means algorithm. Extensive simulation results show that the FCM clustering algorithm provides higher accuracy and better generalization than the K-means clustering algorithm.
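The contrast between hard and fuzzy assignment described above can be illustrated with a minimal sketch of the standard FCM update rules (Bezdek's formulation), written in plain Python. This is not the authors' implementation: the function names (`fuzzy_memberships`, `update_centers`, `fcm`), the fuzzifier `m = 2.0`, the fixed iteration count, and the toy two-dimensional `profiles` are all assumptions made for illustration and are unrelated to the IRIS, WBCD, or microarray data sets used in the paper.

```python
import math

def fuzzy_memberships(points, centers, m=2.0, eps=1e-9):
    """Return one row per point; row[i] is the degree of membership in cluster i."""
    power = 2.0 / (m - 1.0)
    rows = []
    for p in points:
        # Clamp distances so a point lying exactly on a center does not divide by zero.
        dists = [max(math.dist(p, c), eps) for c in centers]
        rows.append([1.0 / sum((d_i / d_k) ** power for d_k in dists) for d_i in dists])
    return rows

def update_centers(points, rows, m=2.0):
    """Each center is the mean of all points, weighted by membership ** m."""
    dims = len(points[0])
    centers = []
    for i in range(len(rows[0])):
        weights = [row[i] ** m for row in rows]
        total = sum(weights)
        centers.append(tuple(
            sum(w * p[d] for w, p in zip(weights, points)) / total for d in range(dims)
        ))
    return centers

def fcm(points, centers, m=2.0, iters=50):
    """Alternate membership and center updates for a fixed number of iterations."""
    for _ in range(iters):
        centers = update_centers(points, fuzzy_memberships(points, centers, m), m)
    return centers, fuzzy_memberships(points, centers, m)

# Hypothetical toy data: two tight groups plus one profile lying between them.
profiles = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2),
            (4.0, 4.1), (4.2, 3.9), (3.9, 4.0),
            (2.0, 2.1)]

# Assumed initialization: one starting center taken from each group.
centers, memberships = fcm(profiles, centers=[profiles[0], profiles[3]])

# A hard label keeps only the strongest membership and discards the rest.
hard_labels = [row.index(max(row)) for row in memberships]

for profile, row, label in zip(profiles, memberships, hard_labels):
    print(profile, [round(u, 3) for u in row], "hard label:", label)
```

In this sketch the in-between profile receives a substantial membership in both clusters, whereas a hard label forces it into exactly one of them. That discarded information is what the fuzzy approach retains.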
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Dhiraj, K., Rath, S.K., Babu, K.S. (2009). FCM for Gene Expression Bioinformatics Data. In: Ranka, S., et al. Contemporary Computing. IC3 2009. Communications in Computer and Information Science, vol 40. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-03547-0_50
DOI: https://doi.org/10.1007/978-3-642-03547-0_50
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-03546-3
Online ISBN: 978-3-642-03547-0
eBook Packages: Computer Science (R0)