[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
research-article
Public Access

Discovering Interesting Subpaths with Statistical Significance from Spatiotemporal Datasets

Published: 09 January 2020 Publication History

Abstract

Given a path in a spatial or temporal framework, we aim to find all contiguous subpaths that are both interesting (e.g., abrupt changes) and statistically significant (i.e., persistent trends rather than local fluctuations). Discovering interesting subpaths can provide meaningful information for a variety of domains including Earth science, environmental science, urban planning, and the like. Existing methods are limited to detecting individual points of interest along an input path but cannot find interesting subpaths. Our preliminary work provided a Subpath Enumeration and Pruning (SEP) algorithm to detect interesting subpaths of arbitrary length. However, SEP is not effective in avoiding detections that are random variations rather than meaningful trends, which hampers clear and proper interpretations of the results. In this article, we extend our previous work by proposing a significance testing framework to eliminate these random variations. To compute the statistical significance, we first show a baseline Monte-Carlo method based on our previous work and then propose a Dynamic Search-and-Prune (D-SAP) algorithm to improve its computational efficiency. Our experiments show that the significance testing can greatly suppress the noisy detections in the output and D-SAP can greatly reduce the execution time.

References

[1]
2017. About African Monsoon. Retrieved from http://www.clivar.org/african-monsoon.
[2]
Naoki Abe, Yiqun Xie, Shashi Shekhar, Chid Apte, Vipin Kumar, Mitch Tuinstra, and Ranga Raju Vatsavai. 2017. Data science for food, energy and water: A workshop report. ACM SIGKDD Explorations Newsletter 18, 2 (2017), 1--4.
[3]
Saurabh Agrawal, Gowtham Atluri, Anuj Karpatne, William Haltom, Stefan Liess, Snigdhansu Chatterjee, and Vipin Kumar. 2017. Tripoles: A new class of relationships in time series data. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 697--706.
[4]
Samaneh Aminikhanghahi and Diane J. Cook. 2017. A survey of methods for time series change point detection. Knowledge and Information Systems 51, 2 (2017), 339--367.
[5]
S. Banerjee and A. E. Gelfand. 2006. Bayesian wombling: Curvilinear gradient assessment under spatial process models. Journal of the American Statistical Association 101 (2006), 1487--1501.
[6]
Michèle Basseville and Igor V. Nikiforov. 1993. Detection of Abrupt Changes: Theory and Application. Vol. 104. Prentice Hall, Englewood Cliffs, NJ.
[7]
Donald J. Berndt and James Clifford. 1994. Using dynamic time warping to find patterns in time series. In Proceedings of the 3rd International Conference on Knowledge Discovery and Data Mining (AAAIWS’94). AAAI Press, 359--370. Retrieved from http://dl.acm.org/citation.cfm?id=3000850.3000887
[8]
John Canny. 1987. A computational approach to edge detection. Readings in Computer Vision: Issues, Problems, Principles, and Paradigms 184, 87--116 (1987), 86.
[9]
Isla S. Castañeda, Josef P. Werne, and Thomas C. Johnson. 2007. Wet and arid phases in the southeast African tropics since the Last Glacial Maximum. Geology 35, 9 (2007), 823--826.
[10]
Lei Chen, M. Tamer Özsu, and Vincent Oria. 2005. Robust and fast similarity search for moving object trajectories. In Proceedings of the 2005 ACM SIGMOD International Conference on Management of Data. ACM, 491--502.
[11]
C. J. Tucker, J. E. Pinzon, and M. E. Brown. 2006. Global Inventory Modeling and Mapping Studies. Global Land Cover Facility, University of Maryland, College Park, Maryland, 1981--2006.
[12]
Emre Eftelioglu, Shashi Shekhar, Dev Oliver, Xun Zhou, Michael R. Evans, Yiqun Xie, James M. Kang, Renee Laubscher, and Christopher Farah. 2014. Ring-shaped hotspot detection: A summary of results. In 2014 IEEE International Conference on Data Mining. IEEE, 815--820.
[13]
M. C. Fitzpatrick, E. L. Preisser, A. Porter, J. Elkinton, L. A. Waller, B.P. Carlin, and A. M. Ellison. 2010. Ecological boundary detection using Bayesian areal wombling. Ecology 91, 12 (2010), 3448--3455.
[14]
Marie-Josee Fortin. 1994. Edge detection algorithms for two-dimensional ecological data. Ecology 75, 4 (1994), 956--965.
[15]
Joachim Gudmundsson and Marc van Kreveld. 2006. Computing longest duration flocks in trajectory data. In Proceedings of the 14th Annual ACM International Symposium on Advances in Geographic Information Systems. ACM, 35--42.
[16]
Zhe Jiang, Shashi Shekhar, Xun Zhou, Joseph Knight, and Jennifer Corcoran. 2014. Focal-test-based spatial decision tree learning. IEEE Transactions on Knowledge and Data Engineering 27, 6 (2014), 1547--1559.
[17]
Jaya Kawale, Snigdhansu Chatterjee, Dominick Ormsby, Karsten Steinhaeuser, Stefan Liess, and Vipin Kumar. 2012. Testing the significance of spatio-temporal teleconnection patterns. In Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 642--650.
[18]
Eamonn Keogh. 2002. Exact indexing of dynamic time warping. In Proceedings of the 28th International Conference on Very Large Data Bases. VLDB Endowment, 406--417.
[19]
Eamonn Keogh, Selina Chu, David Hart, and Michael Pazzani. 2003. Segmenting time series: A survey and novel approach. Data Mining in Time Series Databases 57 (2003), 1--21.
[20]
Eamonn Keogh, Selina Chu, David Hart, and Michael Pazzani. 2001. An online algorithm for segmenting time series. In Proceedings IEEE International Conference on Data Mining (ICDM’01). IEEE, 289--296.
[21]
Amin Vahedian Khezerlou, Xun Zhou, Lufan Li, Zubair Shafiq, Alex X. Liu, and Fan Zhang. 2017. A traffic flow approach to early detection of gathering events: Comprehensive results. ACM Transactions on Intelligent Systems and Technology (TIST) 8, 6 (2017), 74.
[22]
J. Kucera, P. Barbosa, and P. Strobl. 2007. Cumulative sum charts-a novel technique for processing daily time series of modis data for burnt area mapping in portugal. In International Workshop on the Analysis of Multi-temporal Remote Sensing Images, 2007 (MultiTemp’07). IEEE, 1--6.
[23]
Martin Kulldorff. 1997. A spatial scan statistic. Communications in Statistics-Theory and Methods 26, 6 (1997), 1481--1496.
[24]
S. Liang, S. Banerjee, and B.P. Carlin. 2009. Bayesian wombling for spatial point processes. Biometrics 65, 4 (2009), 1243--1253.
[25]
H. Lu and B.P. Carlin. 2005. Bayesian areal wombling for geographical boundary analysis. Geographical Analysis 37, 3 (2005), 265--285.
[26]
Ronald P. Neilson. 1993. Transient ecotone response to climatic change: Some conceptual and modelling approaches. Ecological Applications 3, 3 (1993), 385--395.
[27]
D. Nikovski and A. Jain. 2010. Fast adaptive algorithms for abrupt change detection. Machine Learning 79, 3 (2010), 283--306.
[28]
I. R. Noble. 1993. A model of the responses of ecotones to climate change. Ecological Applications 3, 3 (1993), 396--403.
[29]
E. S. Page. 1954. Continuous inspection schemes. Biometrika 41, 1/2 (1954), 100--115.
[30]
Sangjun Park, Hesham Rakha, and Feng Guo. 2011. Multi-state travel time reliability model: Impact of incidents on travel time reliability. In 2011 14th International IEEE Conference on Intelligent Transportation Systems (ITSC’11). IEEE, 2106--2111.
[31]
Sushil K. Prasad, Danial Aghajarian, Michael McDermott, Dhara Shah, Mohamed Mokbel, Satish Puri, Sergio J. Rey, Shashi Shekhar, Yiqun Xe, Ranga Raju Vatsavai, Fusheng Wang, Yanhui Liang, Hoang Vo, and Shaowen Wang. 2017. Parallel processing over spatial-temporal datasets from geo, bio, climate and social science communities: A research roadmap. In 2017 IEEE International Congress on Big Data (BigData Congress’17). IEEE, 232--250.
[32]
M. Sharifzadeh, F. Azmoodeh, and C. Shahabi. 2005. Change detection in time series data using wavelet footprints. Advances in Spatial and Temporal Databases, Medeiros C. Bauzer, M. J. Egenhofer, E. Bertino (Eds.). Lecture Notes in Computer Science, Vol. 3633. Springer, Berlin, Heidelberg.
[33]
S. Shekhar and S. Chawla. 2003. Spatial Databases: A Tour.Prentice Hall, 2003 (ISBN 013-017480-7).
[34]
Jun-ichi Takeuchi and Kenji Yamanishi. 2006. A unifying framework for detecting outliers and change points from time series. IEEE Transactions on Knowledge and Data Engineering 18, 4 (2006), 482--492.
[35]
C. J. Tucker, J. E. Pinzón, M. E. Brown, D. A. Slayback, E. W. Pak, R. Mahoney, E. F. Vermote, and N. El Saleous. 2005. An extended AVHRR 8-km NDVI dataset compatible with MODIS and SPOT vegetation NDVI data. International Journal of Remote Sensing 26, 20 (2005), 4485--4498.
[36]
Amin Vahedian, Xun Zhou, Ling Tong, Yanhua Li, and Jun Luo. 2017. Forecasting gathering events through continuous destination prediction on big trajectory data. In Proceedings of the 25th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems. ACM, 34.
[37]
Jan Verbesselt, Rob Hyndman, Glenn Newnham, and Darius Culvenor. 2010. Detecting trend and seasonal changes in satellite image time series. Remote Sensing of Environment 114, 1 (2010), 106--115.
[38]
Yiqun Xie, Han Bao, Shashi Shekhar, and Joseph Knight. 2018. A TIMBER framework for mining urban tree inventories using remote sensing datasets. In 2018 IEEE International Conference on Data Mining (ICDM’18). IEEE, 1344--1349.
[39]
Yiqun Xie, Rahul Bhojwani, Shashi Shekhar, and Joseph Knight. 2018. An unsupervised augmentation framework for deep learning based geospatial object detection: A summary of results. In Proceedings of the 26th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems. ACM, 349--358.
[40]
Yiqun Xie, Emre Eftelioglu, Reem Y. Ali, Xun Tang, Yan Li, Ruhi Doshi, and Shashi Shekhar. 2017. Transdisciplinary foundations of geospatial data science. ISPRS International Journal of Geo-Information 6, 12 (2017), 395--418.
[41]
Yiqun Xie and Shashi Shekhar. 2019. A nondeterministic normalization based scan statistic (NN-scan) towards robust hotspot detection: A summary of results. In SIAM International Conference on Data Mining (SDM’19). SIAM.
[42]
Hui Xiong, Shashi Shekhar, Pang-Ning Tan, and Vipin Kumar. 2004. Exploiting a support-based upper bound of Pearson’s correlation coefficient for efficiently identifying strongly correlated pairs. In Proceedings of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 334--343.
[43]
Zhuoning Yuan, Xun Zhou, and Tianbao Yang. 2018. Hetero-ConvLSTM: A deep learning approach to traffic accident prediction on heterogeneous spatio-temporal data. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery 8 Data Mining. ACM, 984--992.
[44]
Sheng Yue, Paul Pilon, and George Cavadias. 2002. Power of the Mann--Kendall and Spearman’s rho tests for detecting monotonic trends in hydrological series. Journal of Hydrology 259, 1--4 (2002), 254--271.
[45]
Xun Zhou, Shashi Shekhar, and Reem Y. Ali. 2014. Spatiotemporal change footprint pattern discovery: An inter-disciplinary survey. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4, 1 (2014), 1--23.
[46]
Xun Zhou, Shashi Shekhar, Pradeep Mohan, Stefan Liess, and Peter K. Snyder. 2011. Discovering interesting sub-paths in spatiotemporal datasets: A summary of results. In Proceedings of the 19th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems. ACM, 44--53.
[47]
Xun Zhou, Shashi Shekhar, and Dev Oliver. 2013. Discovering persistent change windows in spatiotemporal datasets: A summary of results. In Proceedings of the 2nd ACM SIGSPATIAL International Workshop on Analytics for Big Geospatial Data. ACM, 37--46.

Cited By

View all
  • (2023)Sustainable Road Planning for Trucks in Urbanized Areas of Chinese Cities Using Deep Learning ApproachesSustainability10.3390/su1511876315:11(8763)Online publication date: 29-May-2023
  • (2023)Spatial Data ScienceMachine Learning for Data Science Handbook10.1007/978-3-031-24628-9_18(401-422)Online publication date: 18-Aug-2023
  • (2022)Statistically-Robust Clustering Techniques for Mapping Spatial Hotspots: A SurveyACM Computing Surveys10.1145/348789355:2(1-38)Online publication date: 18-Jan-2022
  • Show More Cited By

Index Terms

  1. Discovering Interesting Subpaths with Statistical Significance from Spatiotemporal Datasets

      Recommendations

      Comments

      Please enable JavaScript to view thecomments powered by Disqus.

      Information & Contributors

      Information

      Published In

      cover image ACM Transactions on Intelligent Systems and Technology
      ACM Transactions on Intelligent Systems and Technology  Volume 11, Issue 1
      February 2020
      304 pages
      ISSN:2157-6904
      EISSN:2157-6912
      DOI:10.1145/3375625
      Issue’s Table of Contents
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 09 January 2020
      Accepted: 01 August 2019
      Revised: 01 June 2019
      Received: 01 February 2019
      Published in TIST Volume 11, Issue 1

      Permissions

      Request permissions for this article.

      Check for updates

      Author Tags

      1. Interesting sub-paths
      2. spatial
      3. statistical significance
      4. temporal

      Qualifiers

      • Research-article
      • Research
      • Refereed

      Funding Sources

      Contributors

      Other Metrics

      Bibliometrics & Citations

      Bibliometrics

      Article Metrics

      • Downloads (Last 12 months)138
      • Downloads (Last 6 weeks)25
      Reflects downloads up to 16 Jan 2025

      Other Metrics

      Citations

      Cited By

      View all
      • (2023)Sustainable Road Planning for Trucks in Urbanized Areas of Chinese Cities Using Deep Learning ApproachesSustainability10.3390/su1511876315:11(8763)Online publication date: 29-May-2023
      • (2023)Spatial Data ScienceMachine Learning for Data Science Handbook10.1007/978-3-031-24628-9_18(401-422)Online publication date: 18-Aug-2023
      • (2022)Statistically-Robust Clustering Techniques for Mapping Spatial Hotspots: A SurveyACM Computing Surveys10.1145/348789355:2(1-38)Online publication date: 18-Jan-2022
      • (2020)A Unified Framework for Robust and Efficient Hotspot Detection in Smart CitiesACM/IMS Transactions on Data Science10.1145/33795621:3(1-29)Online publication date: 14-Sep-2020

      View Options

      View options

      PDF

      View or Download as a PDF file.

      PDF

      eReader

      View online with eReader.

      eReader

      HTML Format

      View this article in HTML Format.

      HTML Format

      Login options

      Full Access

      Media

      Figures

      Other

      Tables

      Share

      Share

      Share this Publication link

      Share on social media