[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
research-article

Spatial online sampling and aggregation

Published: 01 November 2015 Publication History

Abstract

The massive adoption of smart phones and other mobile devices has generated humongous amount of spatial and spatio-temporal data. The importance of spatial analytics and aggregation is ever-increasing. An important challenge is to support interactive exploration over such data. However, spatial analytics and aggregation using all data points that satisfy a query condition is expensive, especially over large data sets, and could not meet the needs of interactive exploration. To that end, we present novel indexing structures that support spatial online sampling and aggregation on large spatial and spatio-temporal data sets. In spatial online sampling, random samples from the set of spatial (or spatio-temporal) points that satisfy a query condition are generated incrementally in an online fashion. With more and more samples, various spatial analytics and aggregations can be performed in an online, interactive fashion, with estimators that have better accuracy over time. Our design works well for both memory-based and disk-resident data sets, and scales well towards different query and sample sizes. More importantly, our structures are dynamic, hence, they are able to deal with insertions and deletions efficiently. Extensive experiments on large real data sets demonstrate the improvements achieved by our indexing structures compared to other baseline methods.

References

[1]
S. Agarwal, B. Mozafari, A. Panda, H. Milner, S. Madden, and I. Stoica. BlinkDB: queries with bounded errors and bounded response times on very large data. In EuroSys, 2013.
[2]
S. Agarwal, A. Panda, B. Mozafari, A. P. Iyer, S. Madden, and I. Stoica. Blink and it's done: Interactive queries on very large data. In PVLDB, volume 5, 2012.
[3]
L. Arge. The buffer tree: A technique for designing batched external data structures. Algorithmica, 37(1):1--24, 2003.
[4]
L. Arge, M. de Berg, H. Haverkort, and K. Yi. The priority r-tree: A practically efficient and worst-case optimal r-tree. ACM Transactions on Algorithms, 4(1), 2008.
[5]
L. Azevedo, G. Zimbro1, and J. de Souza. Approximate query processing in spatial databases using raster signatures. In Advances in Geoinformatics, pages 69--86. Springer Berlin Heidelberg, 2007.
[6]
A. Belussi, B. Catania, and S. Migliorini. Approximate queries for spatial data. In Advanced Query Processing, volume 36 of Intelligent Systems Reference Library, pages 83--127. Springer Berlin Heidelberg, 2013.
[7]
S. Chaudhuri, R. Motwani, and V. Narasayya. On random sampling over joins. In SIGMOD, 1999.
[8]
A. Das Sarma, H. Lee, H. Gonzalez, J. Madhavan, and A. Halevy. Efficient spatial sampling of large geographical tables. In SIGMOD, New York, New York, USA, 2012.
[9]
R. Gemulla and W. Lehner. Deferred maintenance of disk-based random samples. In EDBT, 2006.
[10]
A. A. Guttman. R-trees: A Dynamic Index Structure for Spatial Searching. In SIGMOD, 1984.
[11]
P. Haas and J. Hellerstein. Ripple joins for online aggregation. In SIGMOD, pages 287--298, 1999.
[12]
P. J. Haas. Large-sample and deterministic confidence intervals for online aggregation. In SSDBM, 1997.
[13]
J. Hellerstein, P. Haas, and H. Wang. Online aggregation. In SIGMOD, 1997.
[14]
X. Hu, M. Qiao, and Y. Tao. Independent range sampling. In PODS, 2014.
[15]
R. Jampani, F. Xu, M. Wu, L. L. Perez, C. Jermaine, and P. J. Haas. The monte carlo database system: Stochastic analysis close to the data. ACM TODS, 36(3):18, 2011.
[16]
P. Jayachandran, K. Tunga, N. Kamat, and A. Nandi. Combining user interaction, speculative query execution and sampling in the DICE system. PVLDB, 7(13):1697--1700, 2014.
[17]
C. Jermaine, S. Arumugam, A. Pol, and A. Dobra. Scalable approximate query processing with the dbo engine. ACM Transactions on Database Systems, 33(4), Article 23, 2008.
[18]
C. Jermaine, A. Datta, and E. Omiecinski. A novel index supporting high volume data waresshouse insertion. In VLDB, pages 235--246, 1999.
[19]
C. Jermaine, A. Pol, and S. Arumugam. Online maintenance of very large random samples. In SIGMOD, 2004.
[20]
S. Joshi and C. Jermaine. Materialized sample views for database approximation. IEEE Transactions on Knowledge and Data Engineering, 20(3):337--351, 2008.
[21]
I. Kamel and C. Faloutsos. Hilbert R-tree: An Improved R-tree using Fractals. In VLDB, 1994.
[22]
A. Klein, R. Gemulla, P. Rösch, and W. Lehner. Derby/s: a DBMS for sample-based query answering. In SIGMOD, 2006.
[23]
R. Motwani and P. Raghavan. Randomized Algorithms. Cambridge University Press, 1995.
[24]
S. Nath and P. B. Gibbons. Online maintenance of very large random samples on flash storage. PVLDB, 1(1):970--983, 2008.
[25]
S. Nirkhiwale, A. Dobra, and C. M. Jermaine. A sampling algebra for aggregate estimation. PVLDB, 6(14):1798--1809, 2013.
[26]
F. Olken. Random Sampling from Databases. PhD thesis, University of California at Berkeley, 1993.
[27]
F. Olken and D. Rotem. Random sampling from B+ trees. In VLDB, 1989.
[28]
F. Olken and D. Rotem. Sampling from spatial databases. In ICDE, 1993.
[29]
OpenStreetMap. OpenStreetMap.org, 2014.
[30]
N. Pansare, V. R. Borkar, C. Jermaine, and T. Condie. Online aggregation for large mapreduce jobs. In PVLDB, volume 4, 2011.
[31]
P. Weber. User-Generated Street Maps. Pervasive Computing, IEEE, 7(4):12--18, 2008.
[32]
F. Xu, C. M. Jermaine, and A. Dobra. Confidence bounds for sampling-based group by estimates. ACM TODS, 33(3), 2008.
[33]
Y. Yan, L. J. Chen, and Z. Zhang. Error-bounded sampling for analytics on big sparse data. PVLDB, 7(13):1508--1519, 2014.
[34]
K. Yi, L. Wang, and Z. Wei. Indexing for summary queries: Theory and practice. ACM Transactions on Database Systems, 39(1), 2014.
[35]
K. Zeng, S. Gao, J. Gu, B. Mozafari, and C. Zaniolo. ABS: a system for scalable approximate queries with accuracy guarantees. In SIGMOD, 2014.
[36]
K. Zeng, S. Gao, B. Mozafari, and C. Zaniolo. The analytical bootstrap: a new method for fast error estimation in approximate query processing. In SIGMOD, 2014.
[37]
Y. Zheng, Q. Li, Y. Chen, X. Xie, and W.-Y. Ma. Understanding mobility based on GPS data. In UbiComp, 2008.

Cited By

View all
  • (2024)$\mathsf {CheetahTraj}$CheetahTraj: Efficient Visualization for Large Trajectory Dataset With Quality GuaranteeIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2024.338748036:11(5737-5752)Online publication date: 26-Apr-2024
  • (2024)Independent Range Sampling on Interval Data2024 IEEE 40th International Conference on Data Engineering (ICDE)10.1109/ICDE60146.2024.00041(449-461)Online publication date: 13-May-2024
  • (2024)An efficient visual exploration approach of geospatial vector big data on the web mapInformation Systems10.1016/j.is.2023.102333121:COnline publication date: 16-May-2024
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image Proceedings of the VLDB Endowment
Proceedings of the VLDB Endowment  Volume 9, Issue 3
November 2015
144 pages
ISSN:2150-8097
Issue’s Table of Contents

Publisher

VLDB Endowment

Publication History

Published: 01 November 2015
Published in PVLDB Volume 9, Issue 3

Qualifiers

  • Research-article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)14
  • Downloads (Last 6 weeks)8
Reflects downloads up to 03 Mar 2025

Other Metrics

Citations

Cited By

View all
  • (2024)$\mathsf {CheetahTraj}$CheetahTraj: Efficient Visualization for Large Trajectory Dataset With Quality GuaranteeIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2024.338748036:11(5737-5752)Online publication date: 26-Apr-2024
  • (2024)Independent Range Sampling on Interval Data2024 IEEE 40th International Conference on Data Engineering (ICDE)10.1109/ICDE60146.2024.00041(449-461)Online publication date: 13-May-2024
  • (2024)An efficient visual exploration approach of geospatial vector big data on the web mapInformation Systems10.1016/j.is.2023.102333121:COnline publication date: 16-May-2024
  • (2024)Online Mapping and VisualizationHigh Performance Geographic Information System10.1007/978-981-97-7170-7_5(179-214)Online publication date: 16-Oct-2024
  • (2023)Efficient Dynamic Weighted Set Sampling and Its ExtensionProceedings of the VLDB Endowment10.14778/3617838.361784017:1(15-27)Online publication date: 1-Sep-2023
  • (2023)dbET: Execution Time Distribution-based Plan SelectionProceedings of the ACM on Management of Data10.1145/35887111:1(1-26)Online publication date: 30-May-2023
  • (2023)Private Data Trading Towards Range Counting Queries in Internet of ThingsIEEE Transactions on Mobile Computing10.1109/TMC.2022.316432522:8(4881-4897)Online publication date: 1-Aug-2023
  • (2023)Generalized Measure-Biased Sampling and Priority SamplingIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2023.334067336:11(6251-6265)Online publication date: 8-Dec-2023
  • (2023)Road-Aware Indexing for Trajectory Range QueriesIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2022.322082235:8(8476-8489)Online publication date: 1-Aug-2023
  • (2023)JanusAQP: Efficient Partition Tree Maintenance for Dynamic Approximate Query Processing2023 IEEE 39th International Conference on Data Engineering (ICDE)10.1109/ICDE55515.2023.00050(572-584)Online publication date: Apr-2023
  • Show More Cited By

View Options

Login options

Full Access

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Figures

Tables

Media

Share

Share

Share this Publication link

Share on social media