research-article

Stochastic data acquisition for answering queries as time goes by

Published: 01 November 2016

Abstract

Data and actions are tightly coupled. On one hand, data analysis results trigger decision making and actions. On the other hand, acquiring data is the very first step in the whole data processing pipeline. Data acquisition almost always incurs costs, either monetary or in computing resources such as sensor battery power, network transfers, or I/O. Answering queries with outdated data avoids the acquisition cost, but at the penalty of potentially inaccurate results. Given a sequence of incoming queries over time, we study the sequential decision problem of when to acquire data and when to answer a query from existing versions. We propose two approaches to this problem, using reinforcement learning and tailored locality-sensitive hashing. A systematic empirical study on two real-world datasets shows that our approaches are effective and efficient.
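To make the acquire-or-reuse trade-off concrete, the following is a minimal sketch of a toy decision model, not the method of the paper, whose details the abstract does not give. It assumes a fixed acquisition cost, an inaccuracy penalty that grows linearly with the age of the stored data, and a discount factor, and it solves the resulting small Markov decision process by plain value iteration rather than by reinforcement learning. All function names, parameters, and the cost model are hypothetical.

```python
def action_costs(age, values, acquire_cost, penalty_per_step, discount):
    """Return (cost of acquiring, cost of reusing) when stored data has the given age."""
    max_age = len(values) - 1
    # Acquiring pays a fixed cost; the next query then sees data of age 1.
    q_acquire = acquire_cost + discount * values[min(1, max_age)]
    # Reusing pays a penalty that grows with age (assumed linear), and the data gets older.
    q_reuse = penalty_per_step * age + discount * values[min(age + 1, max_age)]
    return q_acquire, q_reuse


def solve_acquisition_policy(acquire_cost, penalty_per_step,
                             max_age=10, discount=0.9, sweeps=500):
    """Value iteration over data ages 0..max_age (ages are capped at max_age).

    Returns a list where entry `age` is "acquire" or "reuse".
    """
    values = [0.0] * (max_age + 1)  # expected discounted cost per age
    for _ in range(sweeps):
        values = [
            min(action_costs(age, values, acquire_cost, penalty_per_step, discount))
            for age in range(max_age + 1)
        ]
    policy = []
    for age in range(max_age + 1):
        q_acquire, q_reuse = action_costs(
            age, values, acquire_cost, penalty_per_step, discount)
        # Ties go to reuse, so data is acquired only when strictly cheaper.
        policy.append("acquire" if q_acquire < q_reuse else "reuse")
    return policy


if __name__ == "__main__":
    # Hypothetical costs: acquiring costs 5, each step of staleness costs 1.
    for age, action in enumerate(solve_acquisition_policy(5.0, 1.0)):
        print(age, action)
```

Under these assumptions the resulting policy is a threshold rule: existing data is reused while it is young and refreshed once its age makes the expected penalty exceed the cost of acquisition. A very high acquisition cost pushes the threshold past the age cap, so the policy never acquires.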


Cited By

  • (2022) Recursive SQL and GPU-support for in-database machine learning. Distributed and Parallel Databases 40:2-3 (205-259). DOI: 10.1007/s10619-022-07417-7. Online publication date: 1-Sep-2022
  • (2021) In-Database Machine Learning with SQL on GPUs. Proceedings of the 33rd International Conference on Scientific and Statistical Database Management (25-36). DOI: 10.1145/3468791.3468840. Online publication date: 6-Jul-2021
  • (2018) Cost-efficient data acquisition on online data marketplaces for correlation analysis. Proceedings of the VLDB Endowment 12:4 (362-375). DOI: 10.14778/3297753.3297757. Online publication date: 1-Dec-2018


Information

Published In

Proceedings of the VLDB Endowment, Volume 10, Issue 3
November 2016
216 pages
ISSN: 2150-8097

Publisher

VLDB Endowment

Publication History

Published: 01 November 2016
Published in PVLDB Volume 10, Issue 3

Qualifiers

  • Research-article

