More Web Proxy on the site http://driver.im/

research-article

Public Access

Discovering Interesting Subpaths with Statistical Significance from Spatiotemporal Datasets

Authors:

Shashi ShekharAuthors Info & Claims

ACM Transactions on Intelligent Systems and Technology (TIST), Volume 11, Issue 1

Article No.: 2, Pages 1 - 24

https://doi.org/10.1145/3354189

Published: 09 January 2020 Publication History

All formats PDF

Abstract

Given a path in a spatial or temporal framework, we aim to find all contiguous subpaths that are both interesting (e.g., abrupt changes) and statistically significant (i.e., persistent trends rather than local fluctuations). Discovering interesting subpaths can provide meaningful information for a variety of domains including Earth science, environmental science, urban planning, and the like. Existing methods are limited to detecting individual points of interest along an input path but cannot find interesting subpaths. Our preliminary work provided a Subpath Enumeration and Pruning (SEP) algorithm to detect interesting subpaths of arbitrary length. However, SEP is not effective in avoiding detections that are random variations rather than meaningful trends, which hampers clear and proper interpretations of the results. In this article, we extend our previous work by proposing a significance testing framework to eliminate these random variations. To compute the statistical significance, we first show a baseline Monte-Carlo method based on our previous work and then propose a Dynamic Search-and-Prune (D-SAP) algorithm to improve its computational efficiency. Our experiments show that the significance testing can greatly suppress the noisy detections in the output and D-SAP can greatly reduce the execution time.

References

[1]

2017. About African Monsoon. Retrieved from http://www.clivar.org/african-monsoon.

[2]

Naoki Abe, Yiqun Xie, Shashi Shekhar, Chid Apte, Vipin Kumar, Mitch Tuinstra, and Ranga Raju Vatsavai. 2017. Data science for food, energy and water: A workshop report. ACM SIGKDD Explorations Newsletter 18, 2 (2017), 1--4.

Digital Library

[3]

Saurabh Agrawal, Gowtham Atluri, Anuj Karpatne, William Haltom, Stefan Liess, Snigdhansu Chatterjee, and Vipin Kumar. 2017. Tripoles: A new class of relationships in time series data. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 697--706.

Digital Library

[4]

Samaneh Aminikhanghahi and Diane J. Cook. 2017. A survey of methods for time series change point detection. Knowledge and Information Systems 51, 2 (2017), 339--367.

Digital Library

[5]

S. Banerjee and A. E. Gelfand. 2006. Bayesian wombling: Curvilinear gradient assessment under spatial process models. Journal of the American Statistical Association 101 (2006), 1487--1501.

[6]

Michèle Basseville and Igor V. Nikiforov. 1993. Detection of Abrupt Changes: Theory and Application. Vol. 104. Prentice Hall, Englewood Cliffs, NJ.

[7]

Donald J. Berndt and James Clifford. 1994. Using dynamic time warping to find patterns in time series. In Proceedings of the 3rd International Conference on Knowledge Discovery and Data Mining (AAAIWS’94). AAAI Press, 359--370. Retrieved from http://dl.acm.org/citation.cfm?id=3000850.3000887

Digital Library

[8]

John Canny. 1987. A computational approach to edge detection. Readings in Computer Vision: Issues, Problems, Principles, and Paradigms 184, 87--116 (1987), 86.

[9]

Isla S. Castañeda, Josef P. Werne, and Thomas C. Johnson. 2007. Wet and arid phases in the southeast African tropics since the Last Glacial Maximum. Geology 35, 9 (2007), 823--826.

[10]

Lei Chen, M. Tamer Özsu, and Vincent Oria. 2005. Robust and fast similarity search for moving object trajectories. In Proceedings of the 2005 ACM SIGMOD International Conference on Management of Data. ACM, 491--502.

Digital Library

[11]

C. J. Tucker, J. E. Pinzon, and M. E. Brown. 2006. Global Inventory Modeling and Mapping Studies. Global Land Cover Facility, University of Maryland, College Park, Maryland, 1981--2006.

[12]

Emre Eftelioglu, Shashi Shekhar, Dev Oliver, Xun Zhou, Michael R. Evans, Yiqun Xie, James M. Kang, Renee Laubscher, and Christopher Farah. 2014. Ring-shaped hotspot detection: A summary of results. In 2014 IEEE International Conference on Data Mining. IEEE, 815--820.

Digital Library

[13]

M. C. Fitzpatrick, E. L. Preisser, A. Porter, J. Elkinton, L. A. Waller, B.P. Carlin, and A. M. Ellison. 2010. Ecological boundary detection using Bayesian areal wombling. Ecology 91, 12 (2010), 3448--3455.

[14]

Marie-Josee Fortin. 1994. Edge detection algorithms for two-dimensional ecological data. Ecology 75, 4 (1994), 956--965.

[15]

Joachim Gudmundsson and Marc van Kreveld. 2006. Computing longest duration flocks in trajectory data. In Proceedings of the 14th Annual ACM International Symposium on Advances in Geographic Information Systems. ACM, 35--42.

Digital Library

[16]

Zhe Jiang, Shashi Shekhar, Xun Zhou, Joseph Knight, and Jennifer Corcoran. 2014. Focal-test-based spatial decision tree learning. IEEE Transactions on Knowledge and Data Engineering 27, 6 (2014), 1547--1559.

Digital Library

[17]

Jaya Kawale, Snigdhansu Chatterjee, Dominick Ormsby, Karsten Steinhaeuser, Stefan Liess, and Vipin Kumar. 2012. Testing the significance of spatio-temporal teleconnection patterns. In Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 642--650.

Digital Library

[18]

Eamonn Keogh. 2002. Exact indexing of dynamic time warping. In Proceedings of the 28th International Conference on Very Large Data Bases. VLDB Endowment, 406--417.

Digital Library

[19]

Eamonn Keogh, Selina Chu, David Hart, and Michael Pazzani. 2003. Segmenting time series: A survey and novel approach. Data Mining in Time Series Databases 57 (2003), 1--21.

[20]

Eamonn Keogh, Selina Chu, David Hart, and Michael Pazzani. 2001. An online algorithm for segmenting time series. In Proceedings IEEE International Conference on Data Mining (ICDM’01). IEEE, 289--296.

[21]

Amin Vahedian Khezerlou, Xun Zhou, Lufan Li, Zubair Shafiq, Alex X. Liu, and Fan Zhang. 2017. A traffic flow approach to early detection of gathering events: Comprehensive results. ACM Transactions on Intelligent Systems and Technology (TIST) 8, 6 (2017), 74.

Digital Library

[22]

J. Kucera, P. Barbosa, and P. Strobl. 2007. Cumulative sum charts-a novel technique for processing daily time series of modis data for burnt area mapping in portugal. In International Workshop on the Analysis of Multi-temporal Remote Sensing Images, 2007 (MultiTemp’07). IEEE, 1--6.

[23]

Martin Kulldorff. 1997. A spatial scan statistic. Communications in Statistics-Theory and Methods 26, 6 (1997), 1481--1496.

[24]

S. Liang, S. Banerjee, and B.P. Carlin. 2009. Bayesian wombling for spatial point processes. Biometrics 65, 4 (2009), 1243--1253.

[25]

H. Lu and B.P. Carlin. 2005. Bayesian areal wombling for geographical boundary analysis. Geographical Analysis 37, 3 (2005), 265--285.

[26]

Ronald P. Neilson. 1993. Transient ecotone response to climatic change: Some conceptual and modelling approaches. Ecological Applications 3, 3 (1993), 385--395.

[27]

D. Nikovski and A. Jain. 2010. Fast adaptive algorithms for abrupt change detection. Machine Learning 79, 3 (2010), 283--306.

Digital Library

[28]

I. R. Noble. 1993. A model of the responses of ecotones to climate change. Ecological Applications 3, 3 (1993), 396--403.

[29]

E. S. Page. 1954. Continuous inspection schemes. Biometrika 41, 1/2 (1954), 100--115.

[30]

Sangjun Park, Hesham Rakha, and Feng Guo. 2011. Multi-state travel time reliability model: Impact of incidents on travel time reliability. In 2011 14th International IEEE Conference on Intelligent Transportation Systems (ITSC’11). IEEE, 2106--2111.

[31]

Sushil K. Prasad, Danial Aghajarian, Michael McDermott, Dhara Shah, Mohamed Mokbel, Satish Puri, Sergio J. Rey, Shashi Shekhar, Yiqun Xe, Ranga Raju Vatsavai, Fusheng Wang, Yanhui Liang, Hoang Vo, and Shaowen Wang. 2017. Parallel processing over spatial-temporal datasets from geo, bio, climate and social science communities: A research roadmap. In 2017 IEEE International Congress on Big Data (BigData Congress’17). IEEE, 232--250.

[32]

M. Sharifzadeh, F. Azmoodeh, and C. Shahabi. 2005. Change detection in time series data using wavelet footprints. Advances in Spatial and Temporal Databases, Medeiros C. Bauzer, M. J. Egenhofer, E. Bertino (Eds.). Lecture Notes in Computer Science, Vol. 3633. Springer, Berlin, Heidelberg.

[33]

S. Shekhar and S. Chawla. 2003. Spatial Databases: A Tour.Prentice Hall, 2003 (ISBN 013-017480-7).

[34]

Jun-ichi Takeuchi and Kenji Yamanishi. 2006. A unifying framework for detecting outliers and change points from time series. IEEE Transactions on Knowledge and Data Engineering 18, 4 (2006), 482--492.

Digital Library

[35]

C. J. Tucker, J. E. Pinzón, M. E. Brown, D. A. Slayback, E. W. Pak, R. Mahoney, E. F. Vermote, and N. El Saleous. 2005. An extended AVHRR 8-km NDVI dataset compatible with MODIS and SPOT vegetation NDVI data. International Journal of Remote Sensing 26, 20 (2005), 4485--4498.

[36]

Amin Vahedian, Xun Zhou, Ling Tong, Yanhua Li, and Jun Luo. 2017. Forecasting gathering events through continuous destination prediction on big trajectory data. In Proceedings of the 25th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems. ACM, 34.

Digital Library

[37]

Jan Verbesselt, Rob Hyndman, Glenn Newnham, and Darius Culvenor. 2010. Detecting trend and seasonal changes in satellite image time series. Remote Sensing of Environment 114, 1 (2010), 106--115.

[38]

Yiqun Xie, Han Bao, Shashi Shekhar, and Joseph Knight. 2018. A TIMBER framework for mining urban tree inventories using remote sensing datasets. In 2018 IEEE International Conference on Data Mining (ICDM’18). IEEE, 1344--1349.

[39]

Yiqun Xie, Rahul Bhojwani, Shashi Shekhar, and Joseph Knight. 2018. An unsupervised augmentation framework for deep learning based geospatial object detection: A summary of results. In Proceedings of the 26th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems. ACM, 349--358.

Digital Library

[40]

Yiqun Xie, Emre Eftelioglu, Reem Y. Ali, Xun Tang, Yan Li, Ruhi Doshi, and Shashi Shekhar. 2017. Transdisciplinary foundations of geospatial data science. ISPRS International Journal of Geo-Information 6, 12 (2017), 395--418.

[41]

Yiqun Xie and Shashi Shekhar. 2019. A nondeterministic normalization based scan statistic (NN-scan) towards robust hotspot detection: A summary of results. In SIAM International Conference on Data Mining (SDM’19). SIAM.

[42]

Hui Xiong, Shashi Shekhar, Pang-Ning Tan, and Vipin Kumar. 2004. Exploiting a support-based upper bound of Pearson’s correlation coefficient for efficiently identifying strongly correlated pairs. In Proceedings of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 334--343.

Digital Library

[43]

Zhuoning Yuan, Xun Zhou, and Tianbao Yang. 2018. Hetero-ConvLSTM: A deep learning approach to traffic accident prediction on heterogeneous spatio-temporal data. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery 8 Data Mining. ACM, 984--992.

Digital Library

[44]

Sheng Yue, Paul Pilon, and George Cavadias. 2002. Power of the Mann--Kendall and Spearman’s rho tests for detecting monotonic trends in hydrological series. Journal of Hydrology 259, 1--4 (2002), 254--271.

[45]

Xun Zhou, Shashi Shekhar, and Reem Y. Ali. 2014. Spatiotemporal change footprint pattern discovery: An inter-disciplinary survey. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4, 1 (2014), 1--23.

Digital Library

[46]

Xun Zhou, Shashi Shekhar, Pradeep Mohan, Stefan Liess, and Peter K. Snyder. 2011. Discovering interesting sub-paths in spatiotemporal datasets: A summary of results. In Proceedings of the 19th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems. ACM, 44--53.

[47]

Xun Zhou, Shashi Shekhar, and Dev Oliver. 2013. Discovering persistent change windows in spatiotemporal datasets: A summary of results. In Proceedings of the 2nd ACM SIGSPATIAL International Workshop on Analytics for Big Geospatial Data. ACM, 37--46.

Digital Library

Cited By

Liang HWang SLi HPan JLi XSu CLiu B(2025)AIAM: Adaptive interactive attention model for solving p-Median problem via deep reinforcement learningInternational Journal of Applied Earth Observation and Geoinformation10.1016/j.jag.2025.104454138(104454)Online publication date: Apr-2025
https://doi.org/10.1016/j.jag.2025.104454
Wang HZhao ZMa YWu HBao F(2023)Sustainable Road Planning for Trucks in Urbanized Areas of Chinese Cities Using Deep Learning ApproachesSustainability10.3390/su1511876315:11(8763)Online publication date: 29-May-2023
https://doi.org/10.3390/su15118763
Li YXie YShekhar S(2023)Spatial Data ScienceMachine Learning for Data Science Handbook10.1007/978-3-031-24628-9_18(401-422)Online publication date: 18-Aug-2023
https://doi.org/10.1007/978-3-031-24628-9_18
Show More Cited By

Index Terms

Discovering Interesting Subpaths with Statistical Significance from Spatiotemporal Datasets
1. Information systems
  1. Information systems applications
    1. Data mining
    2. Spatial-temporal systems

Recommendations

Discovering interesting sub-paths in spatiotemporal datasets: a summary of results
GIS '11: Proceedings of the 19th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems

Given a spatiotemporal (ST) dataset and a path in its embedding spatiotemporal framework, the goal is to to identify all interesting sub-paths defined by an interest measure. Sub-path discovery is of fundamental importance for understanding climate ...
A comparison of statistical significance tests for information retrieval evaluation
CIKM '07: Proceedings of the sixteenth ACM conference on Conference on information and knowledge management

Information retrieval (IR) researchers commonly use three tests of statistical significance: the Student's paired t-test, the Wilcoxon signed rank test, and the sign test. Other researchers have previously proposed using both the bootstrap and Fisher's ...
Conducting Laboratory Experiments Properly with Statistical Tools: An Easy Hands-On Tutorial
WSDM '19: Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining

This hands-on half-day tutorial consists of two sessions. Part~I covers the following topics: Preliminaries; Paired and two-sample t-tests, confidence intervals; One-way ANOVA and two-way ANOVA without replication; Familiwise error rate. Part~II covers ...

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Transactions on Intelligent Systems and Technology

ACM Transactions on Intelligent Systems and Technology Volume 11, Issue 1

February 2020

304 pages

ISSN:2157-6904

EISSN:2157-6912

DOI:10.1145/3375625

Editor:
Yu Zheng
JD Finance, China

Issue’s Table of Contents

Copyright © 2020 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 09 January 2020

Accepted: 01 August 2019

Revised: 01 June 2019

Received: 01 February 2019

Published in TIST Volume 11, Issue 1

Permissions

Request permissions for this article.

Request Permissions

Check for updates

Author Tags

Qualifiers

Research-article
Research
Refereed

Funding Sources

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

5
Total Citations
View Citations
646
Total Downloads

Downloads (Last 12 months)162
Downloads (Last 6 weeks)40

Reflects downloads up to 05 Mar 2025

Other Metrics

View Author Metrics

Citations

Cited By

Liang HWang SLi HPan JLi XSu CLiu B(2025)AIAM: Adaptive interactive attention model for solving p-Median problem via deep reinforcement learningInternational Journal of Applied Earth Observation and Geoinformation10.1016/j.jag.2025.104454138(104454)Online publication date: Apr-2025
https://doi.org/10.1016/j.jag.2025.104454
Wang HZhao ZMa YWu HBao F(2023)Sustainable Road Planning for Trucks in Urbanized Areas of Chinese Cities Using Deep Learning ApproachesSustainability10.3390/su1511876315:11(8763)Online publication date: 29-May-2023
https://doi.org/10.3390/su15118763
Li YXie YShekhar S(2023)Spatial Data ScienceMachine Learning for Data Science Handbook10.1007/978-3-031-24628-9_18(401-422)Online publication date: 18-Aug-2023
https://doi.org/10.1007/978-3-031-24628-9_18
Xie YShekhar SLi Y(2022)Statistically-Robust Clustering Techniques for Mapping Spatial Hotspots: A SurveyACM Computing Surveys10.1145/348789355:2(1-38)Online publication date: 18-Jan-2022
https://dl.acm.org/doi/10.1145/3487893
Xie YShekhar S(2020)A Unified Framework for Robust and Efficient Hotspot Detection in Smart CitiesACM/IMS Transactions on Data Science10.1145/33795621:3(1-29)Online publication date: 14-Sep-2020
https://dl.acm.org/doi/10.1145/3379562

View Options

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

HTML Format

View this article in HTML Format.

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Article

Figures

Tables

Media

View Issue’s Table of Contents