
High Performance OLAP and Data Mining on Parallel Computers

Published: 01 December 1997

Abstract

On-Line Analytical Processing (OLAP) techniques are increasingly being used in decision support systems to provide analysis of data. Queries posed on such systems are quite complex and require different views of the data. Analytical models need to capture the multidimensionality of the underlying data, a task for which multidimensional databases are well suited. Multidimensional OLAP systems store data in multidimensional arrays on which analytical operations are performed. Knowledge discovery and data mining require complex operations on the underlying data, which can be very expensive in terms of computation time. High performance parallel systems can reduce this analysis time.
Precomputed aggregate calculations in a Data Cube can provide efficient query processing for OLAP applications. In this article, we present algorithms for the construction of data cubes on distributed-memory parallel computers. Data is loaded from a relational database into a multidimensional array. We present two methods, sort-based and hash-based, for loading the base cube and compare their performance. Data cubes are then used to answer the consolidation queries that arise in roll-up operations over dimension hierarchies. Finally, we show how data cubes are used for data mining using Attribute Focusing techniques. We present results for these on the IBM-SP2 parallel machine. Results show that our algorithms and techniques for OLAP and data mining on parallel systems are scalable to a large number of processors, providing a high performance platform for such applications.
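
To make the terms in the abstract concrete, the following is a minimal serial sketch of loading a base cube, precomputing its aggregates, and rolling up along a dimension hierarchy. It is not the article's parallel algorithm: the fact table, the dimension names, the city-to-state hierarchy, and the function names `load_base_cube`, `build_data_cube`, and `roll_up` are all assumptions made for illustration, and a Python dictionary keyed by dimension values stands in for the multidimensional array.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical fact table (product, city, month, sales); not data from the article.
ROWS = [
    ("pen", "Chicago", "Jan", 10),
    ("pen", "Evanston", "Jan", 5),
    ("ink", "Chicago", "Feb", 7),
    ("pen", "Chicago", "Feb", 3),
    ("ink", "Boston", "Jan", 4),
]
DIMS = ("product", "city", "month")

def load_base_cube(rows):
    """Hash-based load: key each cell by its dimension values and sum the measure."""
    base = defaultdict(int)
    for *key, measure in rows:
        base[tuple(key)] += measure
    return dict(base)

def build_data_cube(base, ndims):
    """Aggregate the base cube over every subset of dimensions (2**ndims group-bys)."""
    cube = {}
    for k in range(ndims + 1):
        for kept in combinations(range(ndims), k):
            agg = defaultdict(int)
            for key, value in base.items():
                agg[tuple(key[i] for i in kept)] += value
            cube[kept] = dict(agg)
    return cube

def roll_up(groupby, position, hierarchy):
    """Consolidate one dimension to its parent level using a child -> parent map."""
    out = defaultdict(int)
    for key, value in groupby.items():
        parent = hierarchy[key[position]]
        out[key[:position] + (parent,) + key[position + 1:]] += value
    return dict(out)

# Assumed one-level hierarchy for the city dimension.
CITY_TO_STATE = {"Chicago": "IL", "Evanston": "IL", "Boston": "MA"}

base = load_base_cube(ROWS)
cube = build_data_cube(base, len(DIMS))
by_city = cube[(1,)]                          # group-by on the city dimension only
by_state = roll_up(by_city, 0, CITY_TO_STATE)
print(by_state)
```

Each entry of `cube` is one precomputed group-by, indexed by the positions of the dimensions it keeps, so a consolidation query such as sales by state reads a small aggregate instead of rescanning the fact table. A distributed-memory version would additionally partition the data across processors and combine partial aggregates, which this sketch does not attempt.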

Published In

Data Mining and Knowledge Discovery, Volume 1, Issue 4
December 1997
97 pages

Publisher

Kluwer Academic Publishers

United States

Author Tags

  1. Attribute Focusing
  2. Data Cube
  3. Data mining
  4. High Performance
  5. Parallel Computing

Qualifiers

  • Article

Cited By

  • (2023) Multithreading Heterogeneous Graph Aggregation. IEEE Transactions on Knowledge and Data Engineering 36:6 (2548-2562). DOI: 10.1109/TKDE.2023.3320127. Online publication date: 5-Oct-2023.
  • (2019) Scalable distributed data cube computation for large-scale multidimensional data analysis on a Spark cluster. Cluster Computing 22:1 (2063-2087). DOI: 10.1007/s10586-018-1811-1. Online publication date: 1-Jan-2019.
  • (2015) A linear algebra approach to OLAP. Formal Aspects of Computing 27:2 (283-307). DOI: 10.1007/s00165-014-0316-9. Online publication date: 1-Mar-2015.
  • (2013) A Practice of TPC-DS Multidimensional Implementation on NoSQL Database Systems. Revised Selected Papers of the 5th TPC Technology Conference on Performance Characterization and Benchmarking - Volume 8391 (93-108). DOI: 10.1007/978-3-319-04936-6_7. Online publication date: 26-Aug-2013.
  • (2012) Data guided approach to generate multi-dimensional schema for targeted knowledge discovery. Proceedings of the Tenth Australasian Data Mining Conference - Volume 134 (229-240). DOI: 10.5555/2525373.2525400. Online publication date: 5-Dec-2012.
  • (2012) A New Parallel Data Cube Construction Scheme. International Journal of Grid and High Performance Computing 4:2 (32-45). DOI: 10.4018/jghpc.2012040103. Online publication date: 1-Apr-2012.
  • (2005) Developing high-performance parallel applications using EPAS. Proceedings of the Third international conference on Parallel and Distributed Processing and Applications (431-441). DOI: 10.1007/11576235_46. Online publication date: 2-Nov-2005.
  • (2004) Parallel ROLAP Data Cube Construction on Shared-Nothing Multiprocessors. Distributed and Parallel Databases 15:3 (219-236). DOI: 10.1023/B:DAPD.0000018572.20283.e0. Online publication date: 1-May-2004.
  • (2003) CubiST++. Distributed and Parallel Databases 14:3 (221-254). DOI: 10.1023/A:1025537315785. Online publication date: 1-Nov-2003.
  • (2002) Parallelizing the Data Cube. Distributed and Parallel Databases 11:2 (181-201). DOI: 10.1023/A:1013940219415. Online publication date: 1-Mar-2002.

View Options

View options

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media