DOI: 10.1145/1183614.1183687
Article

Vector and matrix operations programmed with UDFs in a relational DBMS

Published: 06 November 2006

Abstract

In general, a relational DBMS provides limited capabilities for multidimensional statistical analysis, which requires manipulating vectors and matrices. In this work, we study how to extend a DBMS with basic vector and matrix operators by programming User-Defined Functions (UDFs). We carefully analyze UDF features and limitations in order to implement vector and matrix operations commonly used in statistics, machine learning and data mining, paying attention to DBMS, operating system and computer architecture constraints. UDFs provide a C programming interface for defining scalar and aggregate functions that can be used in SQL. They have both advantages and limitations. A UDF allows fast evaluation of arithmetic expressions, memory manipulation, the use of multidimensional arrays, and the use of all C language control statements. Nevertheless, a UDF cannot perform disk I/O, the amount of heap and stack memory it can allocate is small, and its code must account for specific architectural characteristics of the DBMS. We experimentally compare UDFs and SQL with respect to performance, ease of use, flexibility and scalability. We profile UDFs based on call overhead, memory management and interleaved disk access. We show that UDFs are faster than standard SQL aggregations and as fast as SQL arithmetic expressions.
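
To make the scalar versus aggregate distinction concrete, the following is a minimal sketch in plain C, not the paper's code and not the API of any particular DBMS. All names (`vec_dot`, `vec_mean_state`, `vec_mean_init`, `vec_mean_step`, `vec_mean_final`, `MAX_DIM`) are hypothetical. It assumes a scalar function is called once per row, and that an aggregate function is split into initialize, per-row accumulate, and finalize steps over a small fixed-size state, in keeping with the limited heap and stack memory mentioned above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cap on dimensionality: keeps the aggregate state small
   and fixed-size so that no heap allocation is needed. */
#define MAX_DIM 8

/* Scalar-style function: evaluated once per row, returns the dot product. */
double vec_dot(const double *x, const double *y, size_t d)
{
    double s = 0.0;
    for (size_t i = 0; i < d; i++)
        s += x[i] * y[i];
    return s;
}

/* Aggregate-style state: row count plus a running sum per dimension. */
typedef struct {
    size_t d;             /* number of dimensions in use */
    size_t n;             /* number of rows seen so far */
    double sum[MAX_DIM];  /* per-dimension running sum */
} vec_mean_state;

/* Initialize step: called once before the first row. */
void vec_mean_init(vec_mean_state *st, size_t d)
{
    assert(d <= MAX_DIM);
    st->d = d;
    st->n = 0;
    for (size_t i = 0; i < d; i++)
        st->sum[i] = 0.0;
}

/* Accumulate step: called once per row with that row's vector. */
void vec_mean_step(vec_mean_state *st, const double *x)
{
    for (size_t i = 0; i < st->d; i++)
        st->sum[i] += x[i];
    st->n++;
}

/* Finalize step: writes the mean vector (all zeros if no rows were seen). */
void vec_mean_final(const vec_mean_state *st, double *mean)
{
    for (size_t i = 0; i < st->d; i++)
        mean[i] = st->n ? st->sum[i] / (double)st->n : 0.0;
}
```

In a real system the DBMS, not user code, would drive these calls while scanning a table, and the way vectors are passed in (for example, as one column per dimension) depends on the specific UDF interface.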

      Published In

      CIKM '06: Proceedings of the 15th ACM international conference on Information and knowledge management
      November 2006
      916 pages
      ISBN: 1595934332
      DOI: 10.1145/1183614

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. SQL
      2. UDF
      3. matrix
      4. vector

      Qualifiers

      • Article

      Conference

      CIKM06: Conference on Information and Knowledge Management
      November 6 - 11, 2006
      Arlington, Virginia, USA

      Acceptance Rates

      Overall Acceptance Rate 1,861 of 8,427 submissions, 22%

      Cited By

      • (2020) Monitoring Networks with Queries Evaluated by Edge Computing. 2020 IEEE International Conference on Big Data (Big Data), pages 2223-2231. DOI: 10.1109/BigData50022.2020.9377998. Online publication date: 10-Dec-2020
      • (2018) Integrating DBMS and Parallel Data Mining Algorithms for Modern Many-Core Processors. Data Analytics and Management in Data Intensive Domains, pages 230-245. DOI: 10.1007/978-3-319-96553-6_17. Online publication date: 13-Jul-2018
      • (2017) A Tool for Statistical Analysis on Network Big Data. 2017 28th International Workshop on Database and Expert Systems Applications (DEXA), pages 32-36. DOI: 10.1109/DEXA.2017.23. Online publication date: Aug-2017
      • (2017) Integrating the R Language Runtime System with a Data Stream Warehouse. Database and Expert Systems Applications, pages 217-231. DOI: 10.1007/978-3-319-64471-4_18. Online publication date: 2-Aug-2017
      • (2016) Weighted Moore–Penrose generalized matrix inverse: MySQL vs. Cassandra database storage system. Sādhanā, 41:8, pages 837-846. DOI: 10.1007/s12046-016-0523-6. Online publication date: 18-Aug-2016
      • (2012) Dynamic optimization of generalized SQL queries with horizontal aggregations. Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data, pages 637-640. DOI: 10.1145/2213836.2213919. Online publication date: 20-May-2012
      • (2011) Extend core UDF framework for GPU-enabled analytical query evaluation. Proceedings of the 15th Symposium on International Database Engineering & Applications, pages 143-151. DOI: 10.1145/2076623.2076641. Online publication date: 21-Sep-2011
      • (2010) Fast UDFs to compute sufficient statistics on large data sets exploiting caching and sampling. Data & Knowledge Engineering, 69:4, pages 383-398. DOI: 10.1016/j.datak.2009.12.001. Online publication date: 1-Apr-2010
      • (2009) Efficiently support MapReduce-like computation models inside parallel DBMS. Proceedings of the 2009 International Database Engineering & Applications Symposium, pages 43-53. DOI: 10.1145/1620432.1620438. Online publication date: 16-Sep-2009
      • (2009) Efficient computation of PCA with SVD in SQL. Proceedings of the 2nd Workshop on Data Mining using Matrices and Tensors, pages 1-10. DOI: 10.1145/1581114.1581119. Online publication date: 28-Jun-2009
