DOI: 10.1145/2820783.2820869
short-paper

Sphinx: distributed execution of interactive SQL queries on big spatial data

Published: 03 November 2015

Abstract

This paper presents Sphinx, a full-fledged distributed system that uses a standard SQL interface to process big spatial data. Sphinx adds spatial data types, indexes, and query processing inside the code base of Cloudera Impala for efficient processing of spatial data. In particular, Sphinx is composed of four main components, namely the query parser, the indexer, the query planner, and the query executor. The query parser injects spatial data types and functions into the SQL interface of Sphinx. The indexer creates spatial indexes in Sphinx by adopting a two-layered index design. The query planner uses these indexes to construct efficient query plans for range queries and spatial joins. Finally, the query executor carries out these plans on big spatial datasets in a distributed cluster. A system prototype of Sphinx running on real datasets shows up to three orders of magnitude performance improvement over traditional Impala.
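The two-layered index mentioned above can be pictured as a global layer that splits the data into partitions and records each partition's bounding rectangle, plus a local index inside each partition. The minimal Python sketch below shows how a range query could use both layers: the global layer prunes partitions whose bounding rectangle misses the query, and the local layer searches only the survivors. The uniform grid partitioning, the sorted-by-x local index, and all names (`build_index`, `Partition`, `range_query`) are illustrative assumptions, not Sphinx's actual partitioning scheme or code.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass, field

Rect = tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)
Point = tuple[float, float]


def intersects(a: Rect, b: Rect) -> bool:
    """True if two closed rectangles overlap (touching edges count)."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


@dataclass
class Partition:
    """One partition: its bounding rectangle plus a local index."""
    mbr: Rect
    xs: list[float] = field(default_factory=list)      # sorted x keys
    points: list[Point] = field(default_factory=list)  # points sorted by x

    def range_query(self, q: Rect) -> list[Point]:
        # Local layer (assumed): binary search on x, then filter on y.
        lo = bisect_left(self.xs, q[0])
        hi = bisect_right(self.xs, q[2])
        return [p for p in self.points[lo:hi] if q[1] <= p[1] <= q[3]]


def build_index(points: list[Point], cell: float) -> list[Partition]:
    """Global layer (hypothetical): uniform grid with square cells of size `cell`."""
    buckets: dict[tuple[int, int], list[Point]] = {}
    for x, y in points:
        buckets.setdefault((int(x // cell), int(y // cell)), []).append((x, y))
    partitions = []
    for pts in buckets.values():
        pts.sort()
        mbr = (min(p[0] for p in pts), min(p[1] for p in pts),
               max(p[0] for p in pts), max(p[1] for p in pts))
        partitions.append(Partition(mbr, [p[0] for p in pts], pts))
    return partitions


def range_query(index: list[Partition], q: Rect) -> list[Point]:
    # Prune partitions with the global layer; in a cluster, each surviving
    # partition could be searched by a separate worker.
    result: list[Point] = []
    for part in index:
        if intersects(part.mbr, q):
            result.extend(part.range_query(q))
    return sorted(result)
```

For example, with `pts = [(1, 1), (2, 5), (6, 6), (7, 2), (9, 9)]`, `range_query(build_index(pts, 5.0), (0, 0, 6, 6))` returns `[(1, 1), (2, 5), (6, 6)]`; the partition holding only `(7, 2)` is skipped by the global layer without being searched.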
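The abstract states that the query planner also builds spatial join plans from these indexes. One common way to do this is a partition-based join: both inputs are assigned to a shared grid, each grid cell is joined independently, and a rectangle spanning several cells is replicated to all of them. Because replication lets a matching pair meet in more than one cell, the minimal Python sketch below uses the reference-point technique (report a pair only in the cell holding the lower-left corner of the overlap) so each pair is emitted exactly once. The uniform grid, the duplicate-avoidance rule, and every name below are assumptions for illustration; the abstract does not say which join algorithm Sphinx uses.

```python
from itertools import product

Rect = tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)
Cell = tuple[int, int]


def intersects(a: Rect, b: Rect) -> bool:
    """True if two closed rectangles overlap (touching edges count)."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def cells_of(r: Rect, cell: float) -> list[Cell]:
    """All grid cells a rectangle overlaps; it is replicated to each one."""
    xs = range(int(r[0] // cell), int(r[2] // cell) + 1)
    ys = range(int(r[1] // cell), int(r[3] // cell) + 1)
    return list(product(xs, ys))


def partition(rects: list[Rect], cell: float) -> dict[Cell, list[int]]:
    """Hypothetical grid partitioning: map each cell to rectangle ids."""
    grid: dict[Cell, list[int]] = {}
    for i, r in enumerate(rects):
        for c in cells_of(r, cell):
            grid.setdefault(c, []).append(i)
    return grid


def spatial_join(left: list[Rect], right: list[Rect], cell: float) -> list[tuple[int, int]]:
    lgrid, rgrid = partition(left, cell), partition(right, cell)
    result = []
    # Only cells populated on both sides can produce output; each such
    # cell is an independent task that a cluster could run in parallel.
    for c in lgrid.keys() & rgrid.keys():
        for i in lgrid[c]:
            for j in rgrid[c]:
                a, b = left[i], right[j]
                if not intersects(a, b):
                    continue
                # Reference point: lower-left corner of the overlap. It lies
                # in both rectangles, so exactly one shared cell reports it.
                ref = (max(a[0], b[0]), max(a[1], b[1]))
                if (int(ref[0] // cell), int(ref[1] // cell)) == c:
                    result.append((i, j))
    return sorted(result)
```

For example, `spatial_join([(4, 4, 6, 6)], [(3, 3, 7, 7)], cell=5.0)` finds the overlapping pair in all four cells the two rectangles share but reports it once, as `[(0, 0)]`.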



    Published In

    SIGSPATIAL '15: Proceedings of the 23rd SIGSPATIAL International Conference on Advances in Geographic Information Systems
    November 2015
    646 pages
    ISBN: 9781450339674
    DOI: 10.1145/2820783
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]


    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 03 November 2015


    Author Tags

    1. SQL
    2. impala
    3. range query
    4. spatial
    5. spatial join
    6. sphinx

    Qualifiers

    • Short-paper


    Conference

    SIGSPATIAL '15

    Acceptance Rates

    SIGSPATIAL '15 paper acceptance rate: 38 of 212 submissions (18%)
    Overall acceptance rate: 257 of 1,238 submissions (21%)


    Article Metrics

    • Downloads (last 12 months): 164
    • Downloads (last 6 weeks): 18
    Reflects downloads up to 24 Jan 2025.


    Citations

    Cited By

    • (2024) Augmentation Techniques for Balancing Spatial Datasets in Machine and Deep Learning Applications. Proceedings of the 32nd ACM International Conference on Advances in Geographic Information Systems, pp. 91-101. DOI: 10.1145/3678717.3691230. Online publication date: 29-Oct-2024.
    • (2022) A Survey on Spatio-temporal Data Analytics Systems. ACM Computing Surveys 54(10s), pp. 1-38. DOI: 10.1145/3507904. Online publication date: 10-Nov-2022.
    • (2021) High-Level Languages for Geospatial Analysis of Big Data. Interdisciplinary Approaches to Spatial Optimization Issues, pp. 62-81. DOI: 10.4018/978-1-7998-1954-7.ch004. Online publication date: 2021.
    • (2019) Geospatial Information Processing Technologies. Manual of Digital Earth, pp. 191-227. DOI: 10.1007/978-981-32-9915-3_6. Online publication date: 20-Nov-2019.
    • (2018) Confluence: Adaptive Spatiotemporal Data Integration Using Distributed Query Relaxation over Heterogeneous Observational Datasets. 2018 IEEE/ACM 11th International Conference on Utility and Cloud Computing (UCC), pp. 184-193. DOI: 10.1109/UCC.2018.00027. Online publication date: Dec-2018.
    • (2018) ExTCKNN: Expanding Tree-Based Continuous K Nearest Neighbor Query in Road Networks With Traffic Rules. IEEE Access 6, pp. 72594-72608. DOI: 10.1109/ACCESS.2018.2881414. Online publication date: 2018.
    • (2017) The era of big spatial data. Proceedings of the VLDB Endowment 10(12), pp. 1992-1995. DOI: 10.14778/3137765.3137828. Online publication date: 1-Aug-2017.
    • (2017) Distributed processing of big mobility data as spatio-temporal data streams. Geoinformatica 21(2), pp. 263-291. DOI: 10.1007/s10707-016-0264-z. Online publication date: 1-Apr-2017.
    • (2017) Sphinx: Empowering Impala for Efficient Execution of SQL Queries on Big Spatial Data. Advances in Spatial and Temporal Databases, pp. 65-83. DOI: 10.1007/978-3-319-64367-0_4. Online publication date: 22-Jul-2017.
    • (2016) The Era of Big Spatial Data. Foundations and Trends in Databases 6(3-4), pp. 163-273. DOI: 10.1561/1900000054. Online publication date: 28-Dec-2016.
