DOI: 10.1145/2926534.2926535
research-article

Cross-system NoSQL data transformations with NotaQL

Published: 26 June 2016

Abstract

The rising adoption of NoSQL technology in enterprises leads to a heterogeneous landscape of data stores. Each store offers distinct advantages and disadvantages, so enterprises need to operate multiple systems for specific purposes. The resulting polyglot persistence is difficult for developers to handle, since some data must be replicated and aggregated both between different stores and within the same store. Currently, there are no uniform tools for performing these data transformations, because every store has its own API and data model. In this paper, we present NotaQL, a transformation language that allows cross-system data transformations. These transformations are output-oriented, meaning that the structure of a transformation script resembles that of its output. In addition, we provide an aggregation-centric approach that makes aggregation operations as easy as possible.
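
To make the terms "output-oriented" and "aggregation-centric" concrete, the following Python sketch may help. It is not NotaQL syntax and not the authors' implementation. The `transform` helper, the `orders` rows, and the field names are hypothetical, chosen only for illustration. The specification passed to `transform` is shaped like one output record, and fields that collect several input values are folded with an aggregate function.

```python
from collections import defaultdict

# Hypothetical input rows, as they might be read from any data store.
orders = [
    {"customer": "alice", "amount": 30},
    {"customer": "bob", "amount": 5},
    {"customer": "alice", "amount": 12},
]

def transform(rows, key, aggregates):
    """Group rows by key(row), then fold each aggregate over its group.

    `aggregates` maps an output field name to a pair
    (aggregate function, value extractor), so the spec mirrors the
    shape of a single output record.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[key(row)].append(row)
    return [
        {
            "_id": group_key,
            **{
                field: agg([extract(r) for r in group])
                for field, (agg, extract) in aggregates.items()
            },
        }
        for group_key, group in groups.items()
    ]

# The specification reads like the desired output: one record per customer
# with a summed total and an order count.
result = transform(
    orders,
    key=lambda r: r["customer"],
    aggregates={
        "total": (sum, lambda r: r["amount"]),
        "orders": (len, lambda r: r),
    },
)
print(result)
```

In this sketch the grouping follows from the choice of output key rather than from a separate grouping clause, which is one way to read the abstract's claim about an aggregation-centric design.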

Cited By

  • (2019) Evolution Management of Multi-model Data. Heterogeneous Data Management, Polystores, and Analytics for Healthcare, pages 139-153. DOI: 10.1007/978-3-030-33752-0_10. Online publication date: 23-Oct-2019
  • (2017) Data transformation as a means towards dynamic data storage and polyglot persistence. International Journal of Network Management, 27:4. DOI: 10.1002/nem.1976. Online publication date: 4-May-2017

Published In

BeyondMR '16: Proceedings of the 3rd ACM SIGMOD Workshop on Algorithms and Systems for MapReduce and Beyond
June 2016
70 pages
ISBN:9781450343114
DOI:10.1145/2926534
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

  • Research-article

Conference

SIGMOD/PODS'16: International Conference on Management of Data
June 26 - July 1, 2016
San Francisco, California

Acceptance Rates

BeyondMR '16 Paper Acceptance Rate 10 of 19 submissions, 53%;
Overall Acceptance Rate 19 of 36 submissions, 53%

Article Metrics

  • Downloads (Last 12 months): 2
  • Downloads (Last 6 weeks): 0
Reflects downloads up to 26 Jan 2025
