Article, DOI: 10.5555/2032397.2032399

The architecture of SciDB

Published: 20 July 2011

Abstract

SciDB is an open-source analytical database oriented toward the data management needs of scientists. As such, it mixes statistical and linear algebra operations with data management ones, using a natural nested multidimensional array data model. We have been working on the code for two years, most recently with the help of venture capital backing. Release 11.06 (June 2011) is downloadable from our website (SciDB.org).
This paper presents the main design decisions of SciDB. It focuses on our decisions concerning a high-level, SQL-like query language, the issues facing our query optimizer and executor, and efficient storage management for arrays. The paper also discusses the implementation of features not usually present in DBMSs, including version control, uncertainty, and provenance.
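
To make the idea of mixing data management with statistical and linear algebra operations over a multidimensional array more concrete, here is a minimal Python sketch. It is not SciDB code and does not use SciDB's query language or API. The dimension names (x, y), the attribute names (flux, err), and all three helper functions are hypothetical and chosen only for illustration.

```python
# Illustrative sketch only: NOT SciDB code and NOT its query language.
from statistics import mean

# A 2-D array indexed by (x, y); each cell holds a record of named attributes.
# The dimension and attribute names are hypothetical.
cells = {
    (0, 0): {"flux": 1.0, "err": 0.1},
    (0, 1): {"flux": 3.0, "err": 0.2},
    (1, 0): {"flux": 5.0, "err": 0.1},
    (1, 1): {"flux": 7.0, "err": 0.4},
}


def subarray(array, x_range, y_range):
    """Data management step: keep cells whose coordinates fall in both ranges."""
    return {
        (x, y): cell
        for (x, y), cell in array.items()
        if x in x_range and y in y_range
    }


def aggregate_along_y(array, attribute):
    """Statistical step: mean of one attribute per x, collapsing the y dimension."""
    groups = {}
    for (x, _y), cell in array.items():
        groups.setdefault(x, []).append(cell[attribute])
    return {x: mean(values) for x, values in groups.items()}


def matvec(array, attribute, vector):
    """Linear algebra step: view one attribute as a matrix and multiply by a vector."""
    result = {}
    for (x, y), cell in array.items():
        result[x] = result.get(x, 0.0) + cell[attribute] * vector[y]
    return result
```

In this sketch, selecting a slab with subarray and then calling aggregate_along_y or matvec on the result chains a data management step with an analytical one over the same array, which is the kind of mixing the abstract describes.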

Published In

SSDBM'11: Proceedings of the 23rd international conference on Scientific and statistical database management
July 2011
601 pages
ISBN: 9783642223501

Sponsors

  • Paradigm4 Inc.
  • Microsoft Research
  • Gordon and Betty Moore Foundation
  • eScience Institute

Publisher

Springer-Verlag

Berlin, Heidelberg

Author Tags

  1. linear algebra
  2. multi-dimensional array
  3. scientific data management
  4. statistics
