DOI: 10.1145/2389686.2389690
research-article

SciQL: a query language for unified scientific data processing and management

Published: 02 November 2012

Abstract

Science is increasingly data-driven. This means that a significant part of a scientist's work is dedicated to accessing, visualizing, integrating, and analyzing data from a possibly wide range of heterogeneous sources. In this paper we propose SciQL, a query language that supports scientists in this task and allows them to focus on their main purpose, i.e., on doing research.
SciQL sits between scientists or data processing tools on the one hand and different data sources on the other, in order to decouple users from the technical aspects of accessing data. It allows users to express their data management, refinement, transformation, and processing procedures and their visualizations in SciQL, regardless of the syntax and capabilities of the underlying physical data sources. This way, scientists and client tools deal with only one language to interact with different data sources, e.g., text files, spreadsheets, relational DBMSs, or MapReduce systems. To achieve this, SciQL provides various constructs, among them Schema Definition (e.g., schema design and data transformation), Data Retrieval (connecting to various data sources and formats, filtering, joining, grouping), Data Manipulation (e.g., updating, deleting, versioning, and provenance), and Visualization commands and data structures.
In this paper, we discuss why we believe SciQL is needed, and explain our goals and the steps we intend to take to achieve them.
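The decoupling idea in the abstract, one set of retrieval operations applied to different physical sources, can be illustrated with a minimal Python sketch. This is not SciQL syntax or its implementation, neither of which is given here. The function names (`csv_source`, `sqlite_source`, `select`, `count_by`), the `obs` table, and the sample plot and species data are all hypothetical, chosen only to show a text-file source and a relational source being filtered and grouped through the same calls.

```python
import csv
import io
import sqlite3
from typing import Callable, Dict, Iterable, Iterator, List

Row = Dict[str, str]


def csv_source(text: str) -> Iterator[Row]:
    """Hypothetical adapter for a text-file source: one dict per CSV record."""
    yield from csv.DictReader(io.StringIO(text))


def sqlite_source(conn: sqlite3.Connection, table: str) -> Iterator[Row]:
    """Hypothetical adapter for a relational source: one dict per table row."""
    # The table name is interpolated directly; this sketch assumes it is trusted.
    cursor = conn.execute(f"SELECT * FROM {table}")
    columns = [description[0] for description in cursor.description]
    for values in cursor:
        yield {column: str(value) for column, value in zip(columns, values)}


def select(rows: Iterable[Row], where: Callable[[Row], bool]) -> List[Row]:
    """Source-independent filter: accepts rows from any adapter."""
    return [row for row in rows if where(row)]


def count_by(rows: Iterable[Row], key: str) -> Dict[str, int]:
    """Source-independent grouping: counts rows per distinct value of `key`."""
    counts: Dict[str, int] = {}
    for row in rows:
        counts[row[key]] = counts.get(row[key], 0) + 1
    return counts


def demo() -> Dict[str, int]:
    """Combine a CSV source and an SQLite source, then filter and group them."""
    csv_text = "plot,species\nA,oak\nA,beech\nB,oak\n"

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE obs (plot TEXT, species TEXT)")
    conn.executemany(
        "INSERT INTO obs VALUES (?, ?)", [("B", "pine"), ("C", "oak")]
    )

    combined = list(csv_source(csv_text)) + list(sqlite_source(conn, "obs"))
    oaks = select(combined, lambda row: row["species"] == "oak")
    return count_by(oaks, "plot")


if __name__ == "__main__":
    print(demo())
```

The caller of `select` and `count_by` never sees whether a row came from a file or a database, which is the kind of separation between users and physical sources that the abstract describes. A language such as SciQL would presumably push work down to capable sources rather than materialize all rows in memory as this sketch does.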



Published In

PIKM '12: Proceedings of the 5th Ph.D. workshop on Information and knowledge
November 2012
108 pages
ISBN: 9781450317191
DOI: 10.1145/2389686

Publisher

Association for Computing Machinery, New York, NY, United States


Author Tags

1. data lifecycle management
2. scientific data processing
3. scientific query language


Conference

CIKM'12

Acceptance Rates

Overall Acceptance Rate: 25 of 62 submissions, 40%
