SnappyData: A Hybrid Transactional Analytical Store Built On Spark

Research article
DOI: 10.1145/2882903.2899408
Published: 26 June 2016

Abstract

In recent years, our customers have expressed frustration with the traditional approach of combining disparate products to handle their streaming, transactional, and analytical needs. The common practice of stitching heterogeneous environments together in custom ways has caused enormous production woes by increasing development complexity and total cost of ownership. With SnappyData, an open source platform, we propose a unified engine for real-time operational analytics, delivering stream analytics, OLTP, and OLAP in a single integrated solution. We realize this platform through a seamless integration of Apache Spark (as a big data computational engine) with GemFire (as an in-memory transactional store with scale-out SQL semantics). In this demonstration, after presenting a few use case scenarios, we exhibit SnappyData as our in-memory solution for delivering truly interactive analytics (i.e., response times of a couple of seconds) when faced with large data volumes or high-velocity streams. We show that SnappyData can exploit state-of-the-art approximate query processing techniques and a variety of data synopses. Finally, we allow the audience to define various high-level accuracy contracts (HAC) to communicate their accuracy requirements to SnappyData in an intuitive fashion.
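
The abstract does not spell out how approximate query processing or an accuracy contract works, so the following is only a minimal illustrative sketch of the general idea in plain Python, not SnappyData's API or algorithm. It estimates a mean from a uniform random sample and reports a normal-approximation error margin; the names `approximate_mean` and `meets_contract` are hypothetical, and the contract is assumed here to mean "relative error margin below a user-chosen bound at a given confidence level".

```python
import math
import random
import statistics
from statistics import NormalDist


def approximate_mean(values, sample_size, confidence=0.95, seed=0):
    """Estimate the mean of `values` from a uniform random sample.

    Returns (estimate, margin), where margin is the half-width of a
    normal-approximation confidence interval at the given confidence level.
    No finite-population correction is applied.
    """
    if not 2 <= sample_size <= len(values):
        raise ValueError("sample_size must be between 2 and len(values)")
    rng = random.Random(seed)
    sample = rng.sample(values, sample_size)  # sampling without replacement
    estimate = statistics.fmean(sample)
    std_error = statistics.stdev(sample) / math.sqrt(sample_size)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # two-sided critical value
    return estimate, z * std_error


def meets_contract(estimate, margin, max_relative_error):
    """Hypothetical accuracy contract: is the margin a small enough
    fraction of the estimate?"""
    if estimate == 0:
        return margin == 0
    return margin / abs(estimate) <= max_relative_error


if __name__ == "__main__":
    # Synthetic data standing in for a large table column (an assumption).
    data_rng = random.Random(42)
    latencies = [data_rng.gauss(100.0, 15.0) for _ in range(100_000)]
    est, margin = approximate_mean(latencies, sample_size=5_000)
    print(f"mean ~ {est:.2f} +/- {margin:.2f}")
    print("within 1% contract:", meets_contract(est, margin, 0.01))
```

In this sketch the query touches only the sample, which is where the latency saving would come from; if the contract is not met, a caller could retry with a larger sample or fall back to an exact computation.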


Published In

SIGMOD '16: Proceedings of the 2016 International Conference on Management of Data
June 2016
2300 pages
ISBN: 9781450335317
DOI: 10.1145/2882903

Publisher

Association for Computing Machinery
New York, NY, United States

Author Tags

  1. OLAP
  2. OLTP
  3. in-memory database
  4. spark
  5. spark streaming
  6. stream analytics
  7. stream processing

Qualifiers

  • Research-article

Conference

SIGMOD/PODS'16: International Conference on Management of Data
June 26 - July 1, 2016
San Francisco, California, USA

Acceptance Rates

Overall acceptance rate: 785 of 4,003 submissions (20%)

Article Metrics

  • Downloads (last 12 months): 14
  • Downloads (last 6 weeks): 2

Reflects downloads up to 11 Dec 2024.

Cited By

  • (2024) Generalized Measure-Biased Sampling and Priority Sampling. IEEE Transactions on Knowledge and Data Engineering 36:11 (6251-6265). DOI: 10.1109/TKDE.2023.3340673. Online publication date: Nov-2024
  • (2023) Polygon Simplification for the Efficient Approximate Analytics of Georeferenced Big Data. Sensors 23:19 (8178). DOI: 10.3390/s23198178. Online publication date: 29-Sep-2023
  • (2023) No DBA? No Regret! Multi-Armed Bandits for Index Tuning of Analytical and HTAP Workloads With Provable Guarantees. IEEE Transactions on Knowledge and Data Engineering 35:12 (12855-12872). DOI: 10.1109/TKDE.2023.3271664. Online publication date: 1-Dec-2023
  • (2022) ByteHTAP. Proceedings of the VLDB Endowment 15:12 (3411-3424). DOI: 10.14778/3554821.3554832. Online publication date: 1-Aug-2022
  • (2022) A Sketching Approach for Obtaining Real-Time Statistics Over Data Streams in Cloud. IEEE Transactions on Cloud Computing 10:2 (1462-1475). DOI: 10.1109/TCC.2020.2987023. Online publication date: 1-Apr-2022
  • (2022) In-Memory Indexed Caching for Distributed Data Processing. 2022 IEEE International Parallel and Distributed Processing Symposium (IPDPS) (104-114). DOI: 10.1109/IPDPS53621.2022.00019. Online publication date: May-2022
  • (2022) Salvaging failing and straggling queries. 2022 IEEE 38th International Conference on Data Engineering (ICDE) (1382-1395). DOI: 10.1109/ICDE53745.2022.00108. Online publication date: May-2022
  • (2022) Unleashing the power of querying streaming data in a temporal database world. Information Systems 103:C. DOI: 10.1016/j.is.2021.101872. Online publication date: 22-Apr-2022
  • (2022) A Survey of Scheduling Tasks in Big Data: Apache Spark. Micro-Electronics and Telecommunication Engineering (405-414). DOI: 10.1007/978-981-16-8721-1_39. Online publication date: 28-Feb-2022
  • (2021) QoS-Aware Approximate Query Processing for Smart Cities Spatial Data Streams. Sensors 21:12 (4160). DOI: 10.3390/s21124160. Online publication date: 17-Jun-2021
