[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
10.1145/3299869.3319857acmconferencesArticle/Chapter ViewAbstractPublication PagesmodConference Proceedingsconference-collections
research-article

Top-k Queries over Digital Traces

Published: 25 June 2019 Publication History

Abstract

Recent advances in social and mobile technology have enabled an abundance of digital traces (in the form of mobile check-ins, association of mobile devices to specific WiFi hotspots, etc.) revealing the physical presence history of diverse sets of entities (e.g., humans, devices, and vehicles). One challenging yet important task is to identify k entities that are most closely associated with a given query entity based on their digital traces. We propose a suite of indexing techniques and algorithms to enable fast query processing for this problem at scale. We first define a generic family of functions measuring the association between entities, and then propose algorithms to transform digital traces into a lower-dimensional space for more efficient computation. We subsequently design a hierarchical indexing structure to organize entities in a way that closely associated entities tend to appear together. We then develop algorithms to process top-k queries utilizing the index. We theoretically analyze the pruning effectiveness of the proposed methods based on a mobility model which we propose and validate in real life situations. Finally, we conduct extensive experiments on both synthetic and real datasets at scale, evaluating the performance of our techniques both analytically and experimentally, confirming the effectiveness and superiority of our approach over other applicable approaches across a variety of parameter settings and datasets.

References

[1]
Tenindra Abeywickrama, Muhammad Aamir Cheema, and David Taniar. 2016. K-nearest neighbors on road networks: a journey in experimentation and in-memory implementation. Proceedings of the VLDB Endowment, Vol. 9, 6 (2016), 492--503.
[2]
Charu C Aggarwal and Jiawei Han. 2014. Frequent pattern mining .
[3]
Pritom Ahmed, Mahbub Hasan, Abhijith Kashyap, Vagelis Hristidis, and Vassilis J Tsotras. 2017. Efficient Computation of Top-k Frequent Terms over Spatio-temporal Ranges. In Proceedings of the 2017 ACM SIGMOD International Conference on Management of Data . 1227--1241.
[4]
Ahmed M Aly, Walid G Aref, and Mourad Ouzzani. 2012. Spatial queries with two kNN predicates. Proceedings of the VLDB Endowment, Vol. 5, 11 (2012), 1100--1111.
[5]
Senjuti Basu Roy and Kaushik Chakrabarti. 2011. Location-aware type ahead search on spatial databases: semantics and efficiency. In Proceedings of the 2011 ACM SIGMOD International Conference on Management of data. 361--372.
[6]
Norbert Beckmann, Hans-Peter Kriegel, Ralf Schneider, and Bernhard Seeger. 1990. The R*-tree: an efficient and robust access method for points and rectangles. In Acm Sigmod Record, Vol. 19. 322--331.
[7]
Andrei Z Broder. 1997. On the resemblance and containment of documents. In Compression and complexity of sequences 1997. proceedings. 21--29.
[8]
Yuan-Chi Chang, Lawrence Bergman, Vittorio Castelli, Chung-Sheng Li, Ming-Ling Lo, and John R Smith. 2000. The onion technique: indexing for linear optimization queries. In ACM Sigmod Record, Vol. 29. 391--402.
[9]
Lu Chen, Yunjun Gao, Gang Chen, and Haida Zhang. 2016. Metric all-k-nearest-neighbor search. IEEE Transactions on Knowledge and Data Engineering, Vol. 28, 1 (2016), 98--112.
[10]
Lei Chen, M Tamer Özsu, and Vincent Oria. 2005. Robust and fast similarity search for moving object trajectories. In Proceedings of the 2005 ACM SIGMOD international conference on Management of data . 491--502.
[11]
Zaiben Chen, Heng Tao Shen, Xiaofang Zhou, Yu Zheng, and Xing Xie. 2010. Searching trajectories by locations: an efficiency study. In Proceedings of the 2010 ACM SIGMOD International Conference on Management of data. 255--266.
[12]
Farhana M Choudhury, J Shane Culpepper, Timos Sellis, and Xin Cao. 2016. Maximizing bichromatic reverse spatial and textual k nearest neighbor queries. Proceedings of the VLDB Endowment, Vol. 9, 6 (2016), 456--467.
[13]
Richard Cole. 1988. Parallel merge sort. SIAM J. Comput., Vol. 17, 4 (1988), 770--785.
[14]
Tobias Emrich, Maximilian Franzke, Hans-Peter Kriegel, Johannes Niedermayer, Matthias Renz, and Andreas Züfle. 2014. An extendable framework for managing uncertain spatio-temporal data. In Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data . 1087--1090.
[15]
Ronald Fagin, Ravi Kumar, Mohammad Mahdian, D Sivakumar, and Erik Vee. 2004. Comparing and aggregating rankings with ties. In Proceedings of the twenty-third ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems . 47--58.
[16]
Ronald Fagin, Ravi Kumar, and Dakshinamurthi Sivakumar. 2003 a. Comparing top k lists. SIAM Journal on discrete mathematics, Vol. 17, 1 (2003), 134--160.
[17]
Ronald Fagin, Amnon Lotem, and Moni Naor. 2003 b. Optimal aggregation algorithms for middleware. Journal of computer and system sciences, Vol. 66, 4 (2003), 614--656.
[18]
Yixiang Fang, Reynold Cheng, Wenbin Tang, Silviu Maniu, and Xuan Yang. 2016. Scalable algorithms for nearest-neighbor joins on big trajectory data. IEEE Transactions on Knowledge and Data Engineering, Vol. 28, 3 (2016), 785--800.
[19]
Zhenni Feng and Yanmin Zhu. 2016. A survey on trajectory data mining: techniques and applications. IEEE Access, Vol. 4 (2016), 2056--2067.
[20]
Elias Frentzos, Kostas Gratsias, and Yannis Theodoridis. 2007. Index-based most similar trajectory search. In IEEE 23rd International Conference on Data Engineering (ICDE), 2007. 816--825.
[21]
Junhao Gan, Jianlin Feng, Qiong Fang, and Wilfred Ng. 2012. Locality-sensitive hashing scheme based on dynamic collision counting. In Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data . 541--552.
[22]
Jinyang Gao, Hosagrahar Visvesvaraya Jagadish, Wei Lu, and Beng Chin Ooi. 2014. DSH: data sensitive hashing for high-dimensional k-nnsearch. In Proceedings of the 2014 ACM SIGMOD international conference on Management of data . 1127--1138.
[23]
Ralf Hartmut Güting, Thomas Behr, and Jianqiu Xu. 2010. Efficient k-nearest neighbor search on moving object trajectories. The VLDB Journal The International Journal on Very Large Data Bases, Vol. 19, 5 (2010), 687--714.
[24]
Jiawei Han, Jian Pei, and Yiwen Yin. 2000. Mining frequent patterns without candidate generation. In ACM sigmod record, Vol. 29. 1--12.
[25]
Cheng-Kang Hsieh, Longqi Yang, Honghao Wei, Mor Naaman, and Deborah Estrin. 2016. Immersive recommendation: News and event recommendations using personal digital traces. In Proceedings of the 25th International Conference on World Wide Web. 51--62.
[26]
Maurice G Kendall. 1938. A new measure of rank correlation. Biometrika, Vol. 30, 1/2 (1938), 81--93.
[27]
Jae-Gil Lee, Jiawei Han, and Kyu-Young Whang. 2007. Trajectory clustering: a partition-and-group framework. In Proceedings of the 2007 ACM SIGMOD international conference on Management of data. 593--604.
[28]
Zhenhui Li, Ming Ji, Jae-Gil Lee, Lu-An Tang, Yintao Yu, Jiawei Han, and Roland Kays. 2010. MoveMine: mining moving object databases. In Proceedings of the 2010 ACM SIGMOD International Conference on Management of data. 1203--1206.
[29]
Yunhao Liu, Yiyang Zhao, Lei Chen, Jian Pei, and Jinsong Han. 2012. Mining frequent trajectory patterns for activity monitoring using radio frequency tag arrays. IEEE Transactions on Parallel and Distributed Systems, Vol. 23, 11 (2012), 2138--2149.
[30]
Jiaheng Lu, Ying Lu, and Gao Cong. 2011. Reverse spatial and textual k nearest neighbor search. In Proceedings of the 2011 ACM SIGMOD International Conference on Management of data. 349--360.
[31]
Qin Lv, William Josephson, Zhe Wang, Moses Charikar, and Kai Li. 2007. Multi-probe LSH: efficient indexing for high-dimensional similarity search. In Proceedings of the 33rd international conference on Very large data bases . 950--961.
[32]
Chunyang Ma, Hua Lu, Lidan Shou, and Gang Chen. 2013. KSQ: Top-k similarity query on uncertain trajectories. IEEE Transactions on Knowledge and Data Engineering, Vol. 25, 9 (2013), 2049--2062.
[33]
Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, and Hans-Peter Kriegel. 2013. Probabilistic nearest neighbor queries on uncertain moving object trajectories. Proceedings of the VLDB Endowment, Vol. 7, 3 (2013), 205--216.
[34]
Julien Pilourdault, Vincent Leroy, and Sihem Amer-Yahia. 2016. Distributed evaluation of top-k temporal joins. In Proceedings of the 2016 ACM SIGMOD International Conference on Management of Data. 1027--1039.
[35]
Michalis Potamias, Francesco Bonchi, Aristides Gionis, and George Kollios. 2010. K-nearest neighbors in uncertain graphs. Proceedings of the VLDB Endowment, Vol. 3, 1--2 (2010), 997--1008.
[36]
Tobias Preis, Helen Susannah Moat, Steven R Bishop, Philip Treleaven, and H Eugene Stanley. 2013. Quantifying the digital traces of Hurricane Sandy on Flickr. Scientific reports, Vol. 3 (2013), 3141.
[37]
Hiroaki Sakoe and Seibi Chiba. 1978. Dynamic programming algorithm optimization for spoken word recognition. IEEE transactions on acoustics, speech, and signal processing, Vol. 26, 1 (1978), 43--49.
[38]
Venu Satuluri and Srinivasan Parthasarathy. 2012. Bayesian locality sensitive hashing for fast similarity search. Proceedings of the VLDB Endowment, Vol. 5, 5 (2012), 430--441.
[39]
Zhou Shao, Muhammad Aamir Cheema, David Taniar, and Hua Lu. 2016. Vip-tree: an effective index for indoor spatial queries. Proceedings of the VLDB Endowment, Vol. 10, 4 (2016), 325--336.
[40]
Mehdi Sharifzadeh and Cyrus Shahabi. 2010. Vor-tree: R-trees with voronoi diagrams for efficient processing of spatial nearest neighbor queries. Proceedings of the VLDB Endowment, Vol. 3, 1--2 (2010), 1231--1242.
[41]
Jieming Shi, Dingming Wu, and Nikos Mamoulis. 2016. Top-k relevant semantic place retrieval on spatial RDF data. In Proceedings of the 2016 ACM SIGMOD International Conference on Management of Data . 1977--1990.
[42]
Chaoming Song, Tal Koren, Pu Wang, and Albert-László Barabási. 2010. Modelling the scaling properties of human mobility. Nature Physics, Vol. 6, 10 (2010), 818.
[43]
Jingkuan Song, Yang Yang, Yi Yang, Zi Huang, and Heng Tao Shen. 2013. Inter-media hashing for large-scale retrieval from heterogeneous data sources. In Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data . 785--796.
[44]
Bo Tang, Shi Han, Man Lung Yiu, Rui Ding, and Dongmei Zhang. 2017. Extracting top-k insights from multi-dimensional data. In Proceedings of the 2017 ACM SIGMOD International Conference on Management of Data . 1509--1524.
[45]
Lu-An Tang, Yu Zheng, Xing Xie, Jing Yuan, Xiao Yu, and Jiawei Han. 2011. Retrieving k-nearest neighboring trajectories by a set of point locations. In International Symposium on Spatial and Temporal Databases. 223--241.
[46]
Michail Vlachos, George Kollios, and Dimitrios Gunopulos. 2002. Discovering similar multidimensional trajectories. In IEEE 18th International Conference on Data Engineering (ICDE), 2002. 673--684.
[47]
Hongjian Wang, Daniel Kifer, Corina Graif, and Zhenhui Li. 2016. Crime rate inference with big data. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining . 635--644.
[48]
Haozhou Wang, Kai Zheng, Xiaofang Zhou, and Shazia Sadiq. 2015. Sharkdb: An in-memory storage system for massive trajectory data. In Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data . 1099--1104.
[49]
Sheng Wang, Zhifeng Bao, J Shane Culpepper, Timos Sellis, Mark Sanderson, and Xiaolin Qin. 2017. Answering top-k exemplar trajectory queries. In IEEE 33rd International Conference on Data Engineering (ICDE), 2017. 597--608.
[50]
Dong Xin, Chen Chen, and Jiawei Han. 2006. Towards robust indexing for ranked queries. In Proceedings of the 32nd international conference on Very large data bases. 235--246.
[51]
Bolong Zheng, Nicholas Jing Yuan, Kai Zheng, Xing Xie, Shazia Sadiq, and Xiaofang Zhou. 2015. Approximate keyword search in semantic trajectory database. In IEEE 31st International Conference on Data Engineering (ICDE), 2015. 975--986.
[52]
Kai Zheng, Shuo Shang, Nicholas Jing Yuan, and Yi Yang. 2013. Towards efficient search for activity trajectories. In IEEE 29th International Conference on Data Engineering (ICDE), 2013. 230--241.
[53]
Yuxin Zheng, Qi Guo, Anthony KH Tung, and Sai Wu. 2016. Lazylsh: Approximate nearest neighbor search for multiple distance functions with a single index. In Proceedings of the 2016 ACM SIGMOD International Conference on Management of Data. 2023--2037.
[54]
Erkang Zhu, Ken Q Pu, Fatemeh Nargesian, and Renée J Miller. 2017. Interactive navigation of open data linkages. Proceedings of the VLDB Endowment, Vol. 10, 12 (2017), 1837--1840.

Cited By

View all
  • (2023)Efficient crowdsourced best objects finding via superiority probability based ordering for decision support systemsExpert Systems with Applications: An International Journal10.1016/j.eswa.2023.119893223:COnline publication date: 1-Aug-2023
  • (2021)LES3Proceedings of the VLDB Endowment10.14778/3476249.347626314:11(2073-2086)Online publication date: 27-Oct-2021

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
SIGMOD '19: Proceedings of the 2019 International Conference on Management of Data
June 2019
2106 pages
ISBN:9781450356435
DOI:10.1145/3299869
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 25 June 2019

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. digital trace
  2. hashing
  3. spatial-temporal
  4. top-k

Qualifiers

  • Research-article

Conference

SIGMOD/PODS '19
Sponsor:
SIGMOD/PODS '19: International Conference on Management of Data
June 30 - July 5, 2019
Amsterdam, Netherlands

Acceptance Rates

SIGMOD '19 Paper Acceptance Rate 88 of 430 submissions, 20%;
Overall Acceptance Rate 785 of 4,003 submissions, 20%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)12
  • Downloads (Last 6 weeks)1
Reflects downloads up to 02 Mar 2025

Other Metrics

Citations

Cited By

View all
  • (2023)Efficient crowdsourced best objects finding via superiority probability based ordering for decision support systemsExpert Systems with Applications: An International Journal10.1016/j.eswa.2023.119893223:COnline publication date: 1-Aug-2023
  • (2021)LES3Proceedings of the VLDB Endowment10.14778/3476249.347626314:11(2073-2086)Online publication date: 27-Oct-2021

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Figures

Tables

Media

Share

Share

Share this Publication link

Share on social media