Abstract
The two mature disciplines of Data Mining and Data Warehousing have broadly the same objectives, yet they have developed largely separately, and as a result each relies on different techniques. It has been recognized that mining techniques developed for pattern recognition, such as Clustering and Visualization, can assist in designing a data warehouse schema. However, a suitable methodology is required to integrate mining methods seamlessly into the design of the warehouse schema. In previous work, we presented a methodology that employs hierarchical clustering to derive a tree structure that a data warehouse designer can use to build a schema. We believe that, to strengthen the decision-making process, there is a strong need for a method that automatically extracts the knowledge present at different levels of abstraction in a warehouse. We demonstrate with examples how mining at different levels of a hierarchical warehouse schema can give new insights into the underlying data clusters; these insights not only help in building more meaningful dimensions and facts for data warehouse design but can also improve the decision-making process.
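The abstract does not spell out the clustering procedure, so the following is only a minimal sketch of the general idea of reading one hierarchical clustering at several levels of abstraction. It assumes purely numeric attributes, Euclidean distance and single linkage; the function names `agglomerate` and `partition_at` and the sample records are hypothetical and are not taken from the paper, whose methodology targets mixed numeric and nominal warehouse data.

```python
from itertools import combinations
from math import dist


def agglomerate(points):
    """Single-linkage agglomerative clustering (illustrative sketch).

    Returns a list of partitions: element i holds the clusters (as frozensets
    of point indices) after i merges, from n singletons down to one cluster.
    """
    clusters = [frozenset([i]) for i in range(len(points))]
    levels = [list(clusters)]

    def linkage(a, b):
        # Single linkage: distance between the closest pair of members.
        return min(dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > 1:
        # Merge the two closest clusters (ties go to the first pair found).
        a, b = min(combinations(clusters, 2), key=lambda pair: linkage(*pair))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
        levels.append(list(clusters))
    return levels


def partition_at(levels, k):
    """Cut the hierarchy so that exactly k clusters remain."""
    n = len(levels[0])
    if not 1 <= k <= n:
        raise ValueError(f"k must be between 1 and {n}")
    return levels[n - k]


if __name__ == "__main__":
    # Hypothetical two-attribute records, assumed already scaled.
    records = [(0, 0), (0, 1), (10, 10), (10, 11), (30, 30)]
    levels = agglomerate(records)
    for k in (3, 2):
        print(k, [sorted(c) for c in partition_at(levels, k)])
```

On these sample records, cutting at k = 3 separates the two tight pairs from the outlying record, while cutting at k = 2 merges the pairs into one coarser group. Each cut is a different level of abstraction over the same tree, which is the kind of multi-level view the abstract describes.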
References
Li, C., Biswas, G.: Unsupervised learning with mixed numeric and nominal data. IEEE Transactions on Knowledge and Data Engineering 14(4), 673–690 (2002)
Ahmad, A., Dey, L.: A k-mean clustering algorithm for mixed numeric and categorical data. Data & Knowledge Engineering 63(2), 503–527 (2007)
Rosario, G.E., Rundensteiner, E.A., Brown, D.C., et al.: Mapping nominal values to numbers for effective visualization. Information Visualization 3(2), 80–95 (2004)
Ankerst, M., Berchtold, S., Keim, D.A.: Similarity clustering of dimensions for an enhanced visualization of multidimensional data. In: Proceedings of the IEEE Symposium on Information Visualization (InfoVis), p. 52 (1998)
Fua, Y.H., Ward, M.O., Rundensteiner, E.A.: Hierarchical parallel coordinates for exploration of large datasets, pp. 43–50
Chen, J.X., Wang, S.: Data visualization: parallel coordinates and dimension reduction. Computing in Science & Engineering 3(5), 110–112 (2001)
Artero, A.O., de Oliveira, M.C.F., Levkowitz, H.: Enhanced high dimensional data visualization through dimension reduction and attribute arrangement, pp. 707–712
Dori, D., Feldman, R., Sturm, A.: From conceptual models to schemata: An object-process-based data warehouse construction method. Information Systems 33(6), 567–593 (2008)
Kohavi, R., Becker, B.: UCI repository of machine learning databases (January 20, 2011), http://archive.ics.uci.edu/ml/datasets/Adult
Seo, J., Bakay, M., Zhao, P., et al.: Interactive color mosaic and dendrogram displays for signal/noise optimization in microarray data analysis, pp. 461–464
Ward, M.O.: XmdvTool: Integrating multiple methods for visualizing multivariate data, pp. 326–333
Soni, S., Kurtz, W.: Analysis Services: optimizing cube performance using Microsoft SQL server 2000 Analysis Services. Microsoft SQL Server 2000 Technical Articles (2001)
Milenova, B.L., Campos, M.M.: O-cluster: scalable clustering of large high dimensional data sets, pp. 290–297
Milenova, B.L., Campos, M.M.: Clustering large databases with numeric and nominal values using orthogonal projections
Doring, C., Borgelt, C., Kruse, R.: Fuzzy clustering of quantitative and qualitative data, pp. 84–89
Luo, H., Kong, F., Li, Y.: Clustering mixed data based on evidence accumulation. Advanced Data Mining and Applications 4093, 348–355 (2006)
McCane, B., Albert, M.: Distance functions for categorical and mixed variables. Pattern Recognition Letters 29(7), 986–993 (2008)
Hsu, C.C., Chen, C.L., Su, Y.W.: Hierarchical clustering of mixed data based on distance hierarchy. Information Sciences 177(20), 4474–4492 (2007)
Artero, A.O., de Oliveira, M.C.F., Levkowitz, H.: Uncovering clusters in crowded parallel coordinates visualizations. In: Proceedings of the IEEE Symposium on Information Visualization (InfoVis), pp. 81–88 (2004)
Pardillo, J., Mazón, J.N.: Designing OLAP schemata for data warehouses from conceptual models with MDA. Decision Support Systems (2010)
Palopoli, L., Pontieri, L., Terracina, G., et al.: A novel three-level architecture for large data warehouses. Journal of Systems Architecture 47(11), 937–958 (2002)
Song, I.Y., Khare, R., An, Y., et al.: Samstar: An automatic tool for generating star schemas from an entity-relationship diagram, pp. 522–523
Usman, M., Asghar, S., Fong, S.: A Conceptual Model for Combining Enhanced OLAP and Data Mining Systems. In: 2009 Fifth International Joint Conference on INC, IMS and IDC, pp. 1958–1963 (2009)
Usman, M., Asghar, S., Fong, S.: Integrated Performance and Visualization Enhancement of OLAP Using Growing Self Organizing Neural Networks. Journal of Advances in Information Technology 1(1), 26–37 (2010)
Asghar, S., Alahakoon, D., Hsu, A.: Enhancing OLAP functionality using self-organizing neural networks. Neural, Parallel & Scientific Computations 12(1), 1–20 (2004)
Goil, S., Choudhary, A.: PARSIMONY: An infrastructure for parallel multidimensional analysis and data mining. Journal of Parallel and Distributed Computing 61(3), 285–321 (2001)
Usman, M., Pears, R.: A methodology for integrating and exploiting data mining techniques in the design of data warehouses. In: Proceedings of ICMIA2010 2nd International Conference on Data Mining and Intelligent Information Technology Applications, Seoul (November 2010)
Kohavi, R., Becker, B.: Adult dataset (1996), http://archive.ics.uci.edu/ml/datasets/Adult
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Usman, M., Pears, R. (2011). Multi Level Mining of Warehouse Schema. In: Fong, S. (eds) Networked Digital Technologies. NDT 2011. Communications in Computer and Information Science, vol 136. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-22185-9_34
DOI: https://doi.org/10.1007/978-3-642-22185-9_34
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-22184-2
Online ISBN: 978-3-642-22185-9
eBook Packages: Computer Science, Computer Science (R0)