More Web Proxy on the site http://driver.im/

research-article

Spatial online sampling and aggregation

Editors: Surajit Chaudhuri, Jayant Haritsa Authors:

Robert Christensen,

Ke YiAuthors Info & Claims

Proceedings of the VLDB Endowment, Volume 9, Issue 3

Pages 84 - 95

https://doi.org/10.14778/2850583.2850584

Published: 01 November 2015 Publication History

Abstract

The massive adoption of smart phones and other mobile devices has generated humongous amount of spatial and spatio-temporal data. The importance of spatial analytics and aggregation is ever-increasing. An important challenge is to support interactive exploration over such data. However, spatial analytics and aggregation using all data points that satisfy a query condition is expensive, especially over large data sets, and could not meet the needs of interactive exploration. To that end, we present novel indexing structures that support spatial online sampling and aggregation on large spatial and spatio-temporal data sets. In spatial online sampling, random samples from the set of spatial (or spatio-temporal) points that satisfy a query condition are generated incrementally in an online fashion. With more and more samples, various spatial analytics and aggregations can be performed in an online, interactive fashion, with estimators that have better accuracy over time. Our design works well for both memory-based and disk-resident data sets, and scales well towards different query and sample sizes. More importantly, our structures are dynamic, hence, they are able to deal with insertions and deletions efficiently. Extensive experiments on large real data sets demonstrate the improvements achieved by our indexing structures compared to other baseline methods.

References

[1]

S. Agarwal, B. Mozafari, A. Panda, H. Milner, S. Madden, and I. Stoica. BlinkDB: queries with bounded errors and bounded response times on very large data. In EuroSys, 2013.

Digital Library

[2]

S. Agarwal, A. Panda, B. Mozafari, A. P. Iyer, S. Madden, and I. Stoica. Blink and it's done: Interactive queries on very large data. In PVLDB, volume 5, 2012.

Digital Library

[3]

L. Arge. The buffer tree: A technique for designing batched external data structures. Algorithmica, 37(1):1--24, 2003.

Digital Library

[4]

L. Arge, M. de Berg, H. Haverkort, and K. Yi. The priority r-tree: A practically efficient and worst-case optimal r-tree. ACM Transactions on Algorithms, 4(1), 2008.

Digital Library

[5]

L. Azevedo, G. Zimbro1, and J. de Souza. Approximate query processing in spatial databases using raster signatures. In Advances in Geoinformatics, pages 69--86. Springer Berlin Heidelberg, 2007.

[6]

A. Belussi, B. Catania, and S. Migliorini. Approximate queries for spatial data. In Advanced Query Processing, volume 36 of Intelligent Systems Reference Library, pages 83--127. Springer Berlin Heidelberg, 2013.

[7]

S. Chaudhuri, R. Motwani, and V. Narasayya. On random sampling over joins. In SIGMOD, 1999.

Digital Library

[8]

A. Das Sarma, H. Lee, H. Gonzalez, J. Madhavan, and A. Halevy. Efficient spatial sampling of large geographical tables. In SIGMOD, New York, New York, USA, 2012.

Digital Library

[9]

R. Gemulla and W. Lehner. Deferred maintenance of disk-based random samples. In EDBT, 2006.

Digital Library

[10]

A. A. Guttman. R-trees: A Dynamic Index Structure for Spatial Searching. In SIGMOD, 1984.

Digital Library

[11]

P. Haas and J. Hellerstein. Ripple joins for online aggregation. In SIGMOD, pages 287--298, 1999.

Digital Library

[12]

P. J. Haas. Large-sample and deterministic confidence intervals for online aggregation. In SSDBM, 1997.

Digital Library

[13]

J. Hellerstein, P. Haas, and H. Wang. Online aggregation. In SIGMOD, 1997.

Digital Library

[14]

X. Hu, M. Qiao, and Y. Tao. Independent range sampling. In PODS, 2014.

Digital Library

[15]

R. Jampani, F. Xu, M. Wu, L. L. Perez, C. Jermaine, and P. J. Haas. The monte carlo database system: Stochastic analysis close to the data. ACM TODS, 36(3):18, 2011.

Digital Library

[16]

P. Jayachandran, K. Tunga, N. Kamat, and A. Nandi. Combining user interaction, speculative query execution and sampling in the DICE system. PVLDB, 7(13):1697--1700, 2014.

Digital Library

[17]

C. Jermaine, S. Arumugam, A. Pol, and A. Dobra. Scalable approximate query processing with the dbo engine. ACM Transactions on Database Systems, 33(4), Article 23, 2008.

Digital Library

[18]

C. Jermaine, A. Datta, and E. Omiecinski. A novel index supporting high volume data waresshouse insertion. In VLDB, pages 235--246, 1999.

Digital Library

[19]

C. Jermaine, A. Pol, and S. Arumugam. Online maintenance of very large random samples. In SIGMOD, 2004.

Digital Library

[20]

S. Joshi and C. Jermaine. Materialized sample views for database approximation. IEEE Transactions on Knowledge and Data Engineering, 20(3):337--351, 2008.

Digital Library

[21]

I. Kamel and C. Faloutsos. Hilbert R-tree: An Improved R-tree using Fractals. In VLDB, 1994.

Digital Library

[22]

A. Klein, R. Gemulla, P. Rösch, and W. Lehner. Derby/s: a DBMS for sample-based query answering. In SIGMOD, 2006.

Digital Library

[23]

R. Motwani and P. Raghavan. Randomized Algorithms. Cambridge University Press, 1995.

Digital Library

[24]

S. Nath and P. B. Gibbons. Online maintenance of very large random samples on flash storage. PVLDB, 1(1):970--983, 2008.

Digital Library

[25]

S. Nirkhiwale, A. Dobra, and C. M. Jermaine. A sampling algebra for aggregate estimation. PVLDB, 6(14):1798--1809, 2013.

Digital Library

[26]

F. Olken. Random Sampling from Databases. PhD thesis, University of California at Berkeley, 1993.

[27]

F. Olken and D. Rotem. Random sampling from B+ trees. In VLDB, 1989.

Digital Library

[28]

F. Olken and D. Rotem. Sampling from spatial databases. In ICDE, 1993.

Digital Library

[29]

OpenStreetMap. OpenStreetMap.org, 2014.

[30]

N. Pansare, V. R. Borkar, C. Jermaine, and T. Condie. Online aggregation for large mapreduce jobs. In PVLDB, volume 4, 2011.

[31]

P. Weber. User-Generated Street Maps. Pervasive Computing, IEEE, 7(4):12--18, 2008.

Digital Library

[32]

F. Xu, C. M. Jermaine, and A. Dobra. Confidence bounds for sampling-based group by estimates. ACM TODS, 33(3), 2008.

Digital Library

[33]

Y. Yan, L. J. Chen, and Z. Zhang. Error-bounded sampling for analytics on big sparse data. PVLDB, 7(13):1508--1519, 2014.

Digital Library

[34]

K. Yi, L. Wang, and Z. Wei. Indexing for summary queries: Theory and practice. ACM Transactions on Database Systems, 39(1), 2014.

Digital Library

[35]

K. Zeng, S. Gao, J. Gu, B. Mozafari, and C. Zaniolo. ABS: a system for scalable approximate queries with accuracy guarantees. In SIGMOD, 2014.

Digital Library

[36]

K. Zeng, S. Gao, B. Mozafari, and C. Zaniolo. The analytical bootstrap: a new method for fast error estimation in approximate query processing. In SIGMOD, 2014.

Digital Library

[37]

Y. Zheng, Q. Li, Y. Chen, X. Xie, and W.-Y. Ma. Understanding mobility based on GPS data. In UbiComp, 2008.

Digital Library

Cited By

Shen QZhang CYan XYang CZeng DZeng WTang B(2024)$\mathsf {CheetahTraj}$CheetahTraj: Efficient Visualization for Large Trajectory Dataset With Quality GuaranteeIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2024.338748036:11(5737-5752)Online publication date: 26-Apr-2024
https://dl.acm.org/doi/10.1109/TKDE.2024.3387480
Amagata D(2024)Independent Range Sampling on Interval Data2024 IEEE 40th International Conference on Data Engineering (ICDE)10.1109/ICDE60146.2024.00041(449-461)Online publication date: 13-May-2024
https://doi.org/10.1109/ICDE60146.2024.00041
Liu ZChen LMa MYang AZhong ZJing N(2024)An efficient visual exploration approach of geospatial vector big data on the web mapInformation Systems10.1016/j.is.2023.102333121:COnline publication date: 16-May-2024
https://dl.acm.org/doi/10.1016/j.is.2023.102333
Show More Cited By

Spatial online sampling and aggregation

Recommendations

An iterative sampling method for online aggregation
Abstract
Online aggregation (OLA) makes it possible to save cost by taking acceptable approximate early answers. Compared to the precise results, computing the approximate ones are more cost effective, especially for large-scale datasets. The user can ...
Different Spatial Sampling Models in Geographical Analysis
ETTANDGRS '08: Proceedings of the 2008 International Workshop on Education Technology and Training & 2008 International Workshop on Geoscience and Remote Sensing - Volume 02

Taking the estimation of area-percent about different land use in certain city as an example, we evaluated the effect of simple random sampling, stratified sampling based on administrator region, stratified sampling based on knowledge and stratified-...
Design-based spatial sampling

Various sampling techniques are widely used in environmental, social and resource surveys. Spatial sampling techniques are more efficient than conventional sampling when surveying spatially distributed targets such as CO2 emissions, soil pollution, a ...

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image Proceedings of the VLDB Endowment

Proceedings of the VLDB Endowment Volume 9, Issue 3

November 2015

144 pages

ISSN:2150-8097

Editors:
Surajit Chaudhuri
Microsoft Research
,
Jayant Haritsa
I.I.Sc. Bangalore

Issue’s Table of Contents

Publisher

VLDB Endowment

Publication History

Published: 01 November 2015

Published in PVLDB Volume 9, Issue 3

Qualifiers

Research-article

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

47
Total Citations
View Citations
243
Total Downloads

Downloads (Last 12 months)14
Downloads (Last 6 weeks)8

Reflects downloads up to 03 Mar 2025

Other Metrics

View Author Metrics

Citations

Cited By

Shen QZhang CYan XYang CZeng DZeng WTang B(2024)$\mathsf {CheetahTraj}$CheetahTraj: Efficient Visualization for Large Trajectory Dataset With Quality GuaranteeIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2024.338748036:11(5737-5752)Online publication date: 26-Apr-2024
https://dl.acm.org/doi/10.1109/TKDE.2024.3387480
Amagata D(2024)Independent Range Sampling on Interval Data2024 IEEE 40th International Conference on Data Engineering (ICDE)10.1109/ICDE60146.2024.00041(449-461)Online publication date: 13-May-2024
https://doi.org/10.1109/ICDE60146.2024.00041
Liu ZChen LMa MYang AZhong ZJing N(2024)An efficient visual exploration approach of geospatial vector big data on the web mapInformation Systems10.1016/j.is.2023.102333121:COnline publication date: 16-May-2024
https://dl.acm.org/doi/10.1016/j.is.2023.102333
Xiong WWu YOuyang XJia QChen HChen LXiong WWu YOuyang XJia QChen HChen L(2024)Online Mapping and VisualizationHigh Performance Geographic Information System10.1007/978-981-97-7170-7_5(179-214)Online publication date: 16-Oct-2024
https://doi.org/10.1007/978-981-97-7170-7_5
Zhang FJiang MWang S(2023)Efficient Dynamic Weighted Set Sampling and Its ExtensionProceedings of the VLDB Endowment10.14778/3617838.361784017:1(15-27)Online publication date: 1-Sep-2023
https://dl.acm.org/doi/10.14778/3617838.3617840
Li YYu XKoudas NLin SSun CChen C(2023)dbET: Execution Time Distribution-based Plan SelectionProceedings of the ACM on Management of Data10.1145/35887111:1(1-26)Online publication date: 30-May-2023
https://dl.acm.org/doi/10.1145/3588711
Cai ZZheng XWang JHe Z(2023)Private Data Trading Towards Range Counting Queries in Internet of ThingsIEEE Transactions on Mobile Computing10.1109/TMC.2022.316432522:8(4881-4897)Online publication date: 1-Aug-2023
https://dl.acm.org/doi/10.1109/TMC.2022.3164325
Chang ZLi FShen Y(2023)Generalized Measure-Biased Sampling and Priority SamplingIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2023.334067336:11(6251-6265)Online publication date: 8-Dec-2023
https://dl.acm.org/doi/10.1109/TKDE.2023.3340673
Wang YLi KLi GTang N(2023)Road-Aware Indexing for Trajectory Range QueriesIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2022.322082235:8(8476-8489)Online publication date: 1-Aug-2023
https://dl.acm.org/doi/10.1109/TKDE.2022.3220822
Liang XSintos SKrishnan S(2023)JanusAQP: Efficient Partition Tree Maintenance for Dynamic Approximate Query Processing2023 IEEE 39th International Conference on Data Engineering (ICDE)10.1109/ICDE55515.2023.00050(572-584)Online publication date: Apr-2023
https://doi.org/10.1109/ICDE55515.2023.00050
Show More Cited By

View Options

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Article

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Figures

Tables

Media

View Issue’s Table of Contents