Abstract
The success of anomaly detection, like that of other spatial data mining techniques, depends on how the neighborhood is defined. In this paper, we argue that the anomalous behavior of spatial objects in a neighborhood can be truly captured when both (a) spatial autocorrelation (similar behavior of nearby objects due to proximity) and (b) spatial heterogeneity (distinct behavior of nearby objects due to differences in the underlying processes in the region) are taken into consideration in the neighborhood definition. Our approach begins by generating micro neighborhoods around spatial objects, each encompassing all the information about its spatial object. We then selectively merge these micro neighborhoods into macro neighborhoods, using spatial relationships to account for autocorrelation and inferential relationships to account for heterogeneity. Within the macro neighborhoods, we identify (i) spatio-temporal outliers, where individual sensor readings are anomalous, (ii) spatial outliers, where the entire sensor is an anomaly, and (iii) spatio-temporally coalesced outliers, where a group of spatio-temporal outliers in the macro neighborhood is separated by a small time lag, indicating the traversal of the anomaly. We demonstrate the effectiveness of our approach in neighborhood formation and anomaly detection with experimental results on (i) water monitoring and (ii) highway traffic monitoring sensor datasets. We also compare the results of our approach with an existing approach for spatial anomaly detection.
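The selective merging step can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes each sensor is its own micro neighborhood, uses a plain distance threshold as a stand-in for the spatial relationships, and uses the difference of mean readings as a stand-in for the inferential relationships. The names `macro_neighborhoods`, `max_dist`, and `max_diff` are hypothetical.

```python
from itertools import combinations
from math import dist
from statistics import mean


def macro_neighborhoods(sensors, max_dist, max_diff):
    """Group sensors into macro neighborhoods (illustrative only).

    sensors: dict mapping sensor id -> ((x, y), [readings]).
    Two sensors are merged when they are close in space (a proxy for
    spatial autocorrelation) AND their mean readings are similar (a
    proxy for sharing the same underlying process, i.e. no
    heterogeneity between them). Both criteria are assumptions made
    for this sketch, not the measures used in the paper.
    """
    parent = {s: s for s in sensors}  # union-find forest

    def find(s):
        # Follow parent links to the root, halving the path as we go.
        while parent[s] != s:
            parent[s] = parent[parent[s]]
            s = parent[s]
        return s

    for a, b in combinations(sensors, 2):
        (pos_a, readings_a), (pos_b, readings_b) = sensors[a], sensors[b]
        close = dist(pos_a, pos_b) <= max_dist
        similar = abs(mean(readings_a) - mean(readings_b)) <= max_diff
        if close and similar:
            parent[find(a)] = find(b)  # merge the two groups

    groups = {}
    for s in sensors:
        groups.setdefault(find(s), []).append(s)
    return sorted(sorted(g) for g in groups.values())


sensors = {
    "s1": ((0.0, 0.0), [10, 11, 10]),
    "s2": ((1.0, 0.0), [10, 12, 11]),
    "s3": ((1.5, 0.0), [50, 52, 51]),   # nearby, but behaves differently
    "s4": ((10.0, 10.0), [10, 11, 10]),  # behaves alike, but far away
}
result = macro_neighborhoods(sensors, max_dist=2.0, max_diff=3.0)
print(result)
```

In this toy input, `s3` stays separate despite its proximity to `s1` and `s2` because its readings differ, and `s4` stays separate despite similar readings because it is far away. Proximity alone or similarity alone is not enough to merge.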
Additional information
Responsible editor: Sanjay Chawla.
This work is supported in part by the National Science Foundation under grants IIS-0306838 and CNS-0746943.
Cite this article
Janeja, V.P., Adam, N.R., Atluri, V. et al. Spatial neighborhood based anomaly detection in sensor datasets. Data Min Knowl Disc 20, 221–258 (2010). https://doi.org/10.1007/s10618-009-0147-0
DOI: https://doi.org/10.1007/s10618-009-0147-0