[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
research-article
Open access

A linear algebra approach to OLAP

Published: 01 March 2015 Publication History

Abstract

Inspired by the relational algebra of data processing, this paper addresses the foundations of data analytical processing from a linear algebra perspective. The paper investigates, in particular, how aggregation operations such as cross tabulations and data cubes essential to quantitative analysis of data can be expressed solely in terms of matrix multiplication, transposition and the Khatri–Rao variant of the Kronecker product. The approach offers a basis for deriving an algebraic theory of data consolidation, handling the quantitative as well as qualitative sides of data science in a natural, elegant and typed way. It also shows potential for parallel analytical processing, as the parallelization theory of such matrix operations is well acknowledged.

References

References

[1]
Backhouse K and Backhouse RC Safety of abstract interpretations for free, via logical relations and Galois connections Sci Comput Programm 2004 15 1–2 153-196
[2]
Bird R, de Moor O (1997) Algebra of programming. In: Hoare CAR (ed) Series in computer science. Prentice-Hall International, New Jersey
[3]
Bell N, Garland M (2009) Implementing sparse matrix-vector multiplication on throughput-oriented processors. In: Proceedings of the conference on high performance computing networking, storage and analysis, SC’09. ACM, New York, pp 18:1–18:11
[4]
Bird RS (1989) Lecture notes on constructive functional programming, 1989. In: Broy M (ed) CMCS Int. Summer School directed by F.L. Bauer [et al.], vol 55. Springer, NATO Adv. Science Institute (Series F: Comp. and System Sciences), Berlin
[5]
Backhouse RC, Michaelis D (2006) Exercises in quantifier manipulation. In: Uustalu T (ed) MPC’06. LNCS, vol 4014. Springer, Berlin, pp 70–81
[6]
Chaudhuri S and Dayal U An overview of data warehousing and OLAP technology SIGMOD Rec. 1997 26 65-74
[7]
Codd EF A relational model of data for large shared data banks CACM 1970 13 6 377-387
[8]
Desharnais J, Grinenko A, and Möller B Relational style laws and constructs of linear algebra J Logic Algebr Meth Program 2014 83 2 154-168
[9]
Davenport TH, Patil DJ (2012) Data scientist: the sexiest job of the 21st century. Oct Harv Bus Rev
[10]
Datta A and Thomas H The cube data model: a conceptual model and algebra for on-line analytical processing in data warehouses Dec Supp Syst 1999 27 3 289-301
[11]
Eavis T, Dimitrov G, Dimitrov I, Cueva D, Lopez A, and Taleb A Parallel OLAP with the Sidera server Future Gener Comput Syst 2010 26 2 259-266
[12]
Frias MF Fork algebras in algebra, logic and computer science. In: Logic and computer science 2002 Singapore World Scientific Publishing Co.
[13]
Gray J, Bosworth A, Layman A, Pirahesh H (1996) Data cube: a relational aggregation operator generalizing group-by, cross-tab, and sub-total. In: Su SYW (ed) Proceedings of the 12th int. conf. on data engineering, Feb. 26–Mar. 1, 1996, New Orleans, Louisiana. IEEE Computer Society, New York, pp 152–159
[14]
Goil S and Choudhary A High performance OLAP and data mining on parallel computers Data Min Knowl Discov 1997 1 391-417
[15]
Goil S and Choudhary A Parsimony: an infrastructure for parallel multidimensional analysis and data mining J Parallel Distrib Comput 2001 61 3 285-321
[16]
Gray J, Chaudhuri S, Bosworth A, Layman A, Reichart D, Venkatrao M, Pellow F, and Pirahesh H Data cube: a relational aggregation operator generalizing group-by, cross-tab, and sub-totals Data Min Knowl Disc 1997 1 1 29-53
[17]
Gyssens M, Lakshmanan LVS (1997) A foundation for multi-dimensional databases. VLDB J 106–115
[18]
Johnson T, Lakshmanan LV, Ng RT (2000) The 3w model and algebra for unified data mining. VLDB 21–32
[19]
Jensen CS, Pedersen TB, and Thomsen C Multidimensional databases and data warehousing. In: Synthesis Lectures on Data Management 2010 San Rafael Morgan & Claypool Publishers
[20]
Macedo H. (2012) Matrices as arrows—why categories of matrices matter. PhD thesis, University of Minho, October, MAPi PhD programme
[21]
Maier D The theory of relational databases 1983 Rockville Computer Science Press
[22]
Macedo HD, Oliveira JN (2010) Matrices as arrows! A biproduct approach to typed linear algebra. In: MPC, LNCS, vol 6120. Springer, Berlin, pp 271–287
[23]
Macedo HD, Oliveira JN (2011) Do the two middle letters of “OLAP” stand for linear algebra (“LA”)? Technical report TR-HASLab:4:2011, HASLab, U.Minho & INESC TEC, July. http://wiki.di.uminho.pt/twiki/bin/view/DI/FMHAS/TechnicalReports
[24]
Macedo HD, Oliveira JN (2011) Towards linear algebras of components. In: FACS 2010 of LNCS, vol 6921. Springer, Berlin, pp 300–303
[25]
Macedo HD and Oliveira JN Typing linear algebra: a biproduct-oriented approach Sci Comput Program 2013 78 11 2160-2191
[26]
Macedo HD, Oliveira JN (2014) Typed linear algebra for the data scientist (In preparation)
[27]
Ng RT, Wagner A, and Yin Y Iceberg-cube computation with PC clusters SIGMOD Rec. 2001 30 25-36
[28]
Oliveira JN (2009) Extended static checking by calculation using the pointfree transform. LNCS, vol 5520. Springer, Berlin, pp 195–251
[29]
Oliveira JN (2011) Pointfree foundations for (generic) lossless decomposition. Technical report TR-HASLab:3:2011, HASLab, U.Minho & INESC TEC. http://wiki.di.uminho.pt/twiki/bin/view/DI/FMHAS/TechnicalReports.
[30]
Oliveira JN Towards a linear algebra of programming Formal Aspects Comput 2012 24 4–6 433-458
[31]
Oliveira JN Weighted automata as coalgebras in categories of matrices Int J Found Comp Sci 2013 24 06 709-728
[32]
Oliveira JN A relation-algebraic approach to the “Hoare logic” of functional dependencies JLAP 2014 83 2 249-262
[33]
Oliveira JN (2014) Relational algebra for “just good enough" hardware. In: RAMiCS. LNCS, vol 8428. Springer, Berlin, pp 119–138
[34]
O’Neil P (1989) Model 204 architecture and performance. In: Gawlick D, Haynie M, Reuter A (ed) High performance transaction systems. Lecture notes in computer science, vol 359. Springer, Berlin, pp 39–59
[35]
Pedersen TB and Jensen CS Multidimensional database technology Computer 2001 34 40-46
[36]
Park C-S, Kim MH, and Lee Y-J Finding an efficient rewriting of OLAP queries using materialized views in data warehouses Dec Supp Syst 2002 32 4 379-399
[37]
Rao C.R., Rao M.B. (1998) Matrix algebra and its applications to statistics and econometrics. World Scientific Pub Co Inc
[38]
Sorber L, Barel M, Lathauwer L (2014) Tensorlab v2.0: a MATLAB toolbox for tensor computations, January. http://www.tensorlab.net
[39]
Schmidt G (2011) Relational mathematics. Encyclopedia of mathematics and its applications, vol 132, Cambridge U.P.
[40]
Sorjonen S (2012) OLAP query performance in column-oriented databases. Columnar databases seminar, DCS. University of Helsinki. https://www.cs.helsinki.fi/en/courses/58312305/2012/s/s/1.
[41]
Sun J, Tao D, Faloutsos C (2006) Beyond streams and graphs: dynamic tensor analysis. In: KDD’06: proc. of the 12th ACM SIGKDD int. conf. on knowledge discovery and data mining. ACM, New York, pp 374–383
[42]
Sun J, Tao D, Papadimitriou S, Yu PS, Faloutsos C (2008) Incremental tensor analysis: theory and applications. ACM Trans Knowl Discov Data 2:11:1–11:37
[43]
Vassiliadis P and Sellis T A survey of logical models for OLAP databases SIGMOD Rec 1999 28 4 64-69
[44]
Wu K, Otoo EJ, and Shoshani A Optimizing bitmap indices with efficient compression ACM Trans Database Syst 2006 31 1-38
[45]
Williams S, Oliker L, Vuduc R, Shalf J, Yelick K, and Demmel J Optimization of sparse matrix-vector multiplication on emerging multicore platforms Parallel Comput 2009 35 178-194
[46]
Whitehorn M, Zare R, and Pasumansky M Fast track to MDX 2002 Berlin Springer
[47]
Yang G, Jin R, and Agrawal G Implementing data cube construction using a cluster middleware: algorithms, implementation experience, and performance evaluation Future Gener Comput Syst 2003 19 4 533-550
[48]
Yang X, Parthasarathy S, and Sadayappan P Fast sparse matrix-vector multiplication on GPUs: implications for graph mining Proc VLDB Endowment 2011 4 231-242

Cited By

View all
  • (2022)Data Model Design to Support Data-Driven IT Governance ImplementationTechnologies10.3390/technologies1005010610:5(106)Online publication date: 8-Oct-2022
  • (2022)Sparse convolutional array for DOA estimationEURASIP Journal on Advances in Signal Processing10.1186/s13634-022-00904-02022:1Online publication date: 23-Oct-2022
  • (2022)A Survey of Practical Formal Methods for SecurityFormal Aspects of Computing10.1145/352258234:1(1-39)Online publication date: 5-Jul-2022
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image Formal Aspects of Computing
Formal Aspects of Computing  Volume 27, Issue 2
Mar 2015
231 pages
ISSN:0934-5043
EISSN:1433-299X
Issue’s Table of Contents

Publisher

Springer-Verlag

Berlin, Heidelberg

Publication History

Published: 01 March 2015
Accepted: 18 August 2014
Revision received: 14 August 2014
Received: 25 April 2013
Published in FAC Volume 27, Issue 2

Author Tags

  1. Software engineering
  2. Formal methods
  3. Data science

Qualifiers

  • Research-article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)51
  • Downloads (Last 6 weeks)13
Reflects downloads up to 31 Dec 2024

Other Metrics

Citations

Cited By

View all
  • (2022)Data Model Design to Support Data-Driven IT Governance ImplementationTechnologies10.3390/technologies1005010610:5(106)Online publication date: 8-Oct-2022
  • (2022)Sparse convolutional array for DOA estimationEURASIP Journal on Advances in Signal Processing10.1186/s13634-022-00904-02022:1Online publication date: 23-Oct-2022
  • (2022)A Survey of Practical Formal Methods for SecurityFormal Aspects of Computing10.1145/352258234:1(1-39)Online publication date: 5-Jul-2022
  • (2021)AlgorithmicsAdvancing Research in Information and Communication Technology10.1007/978-3-030-81701-5_3(59-98)Online publication date: 4-Aug-2021
  • (2020)Data Warehouse Hybrid Modeling MethodologyData Science Journal10.5334/dsj-2020-03819Online publication date: 13-Oct-2020
  • (2018)An algebraic framework for minimum spanning tree problemsTheoretical Computer Science10.1016/j.tcs.2018.04.012744(37-55)Online publication date: Oct-2018
  • (2018)Verifying minimum spanning tree algorithms with Stone relation algebrasJournal of Logical and Algebraic Methods in Programming10.1016/j.jlamp.2018.09.005101(132-150)Online publication date: Dec-2018
  • (2017)An algebra for OLAPIntelligent Data Analysis10.3233/IDA-16316121:5(1267-1300)Online publication date: 1-Jan-2017
  • (2017)The data cube as a typed linear algebra operatorProceedings of The 16th International Symposium on Database Programming Languages10.1145/3122831.3122834(1-11)Online publication date: 1-Sep-2017
  • (2017)Relations in linear algebraJournal of Logical and Algebraic Methods in Programming10.1016/j.jlamp.2017.05.00391(1-16)Online publication date: Oct-2017
  • Show More Cited By

View Options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Login options

Full Access

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media