[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
10.1145/1014052.1014082acmconferencesArticle/Chapter ViewAbstractPublication PageskddConference Proceedingsconference-collections
Article

Rapid detection of significant spatial clusters

Published: 22 August 2004 Publication History

Abstract

Given an N x N grid of squares, where each square has a count cij and an underlying population pij, our goal is to find the rectangular region with the highest density, and to calculate its significance by randomization. An arbitrary density function D, dependent on a region's total count C and total population P, can be used. For example, if each count represents the number of disease cases occurring in that square, we can use Kulldorff's spatial scan statistic DK to find the most significant spatial disease cluster. A naive approach to finding the maximum density region requires O(N4) time, and is generally computationally infeasible. We present a multiresolution algorithm which partitions the grid into overlapping regions using a novel overlap-kd tree data structure, bounds the maximum score of subregions contained in each region, and prunes regions which cannot contain the maximum density region. For sufficiently dense regions, this method finds the maximum density region in O((N log N)2) time, in practice resulting in significant (20-2000x) speedups on both real and simulated datasets.

References

[1]
R. Agrawal, J. Gehrke, D. Gunopulus, and P. Raghavan. Automatic subspace clustering of high dimensional data for data mining applications. In Proc. ACM-SIGMOD Intl. Conf. on Mgmt. of Data, pages 94--105, 1998.
[2]
K. Deng and A. W. Moore. Multiresolution instance-based learning. In Proc. 12th Intl. Joint Conf. on Artificial Intelligence, pages 1233--1239, 1995.
[3]
S. Goil, H. Nagesh, and A. Choudhary. MAFIA: efficient and scalable subspace clustering for very large data sets. Technical Report CPDC-TR-9906-010, Northwestern University, 1999.
[4]
M. Kulldorff. A spatial scan statistic. Communications in Statistics: Theory and Methods, 26(6):1481--1496, 1997.
[5]
M. Kulldorff. Spatial scan statistics: models, calculations, and applications. In J. Glaz and N. Balakrishnan, editors, Scan Statistics and Applications, pages 303--322. Birkhauser, 1999.
[6]
M. Kulldorff and N. Nagarwalla. Spatial disease clusters: detection and inference. Statistics in Medicine, 14:799--810, 1995.
[7]
D. B. Neill and A. W. Moore. A fast multi-resolution method for detection of significant spatial disease clusters. In S. Thrun, L. Saul, and B. Scholkopf, editors, Advances in Neural Information Processing Systems 16. MIT Press, 2004.
[8]
S. Openshaw, M. Charlton, A. Craft, and J. Birch. Investigation of leukemia clusters by use of a geographical analysis machine. Lancet, 1:272--3, 1988.
[9]
F. Preparata and M. Shamos. Computational Geometry: An Introduction. Springer-Verlag, New York, 1985.
[10]
H. Samet. The Design and Analysis of Spatial Data Structures. Addison-Wesley, Reading, MA, 1990.
[11]
L. Waller, B. Turnbull, L. Clark, and P. Nasca. Spatial analysis to detect disease clusters. In N. Lange, editor, Case Studies in Biometry, pages 3--23. Wiley, 1994.
[12]
W. Wang, J. Yang, and R. Muntz. STING: a statistical information grid approach to spatial data mining. In Proc. 23rd Conference on Very Large Databases, pages 186--195, 1997.

Cited By

View all
  • (2024)Enhanced scan statistic with tightened window for detecting irregularly shaped hotspotsInternational Journal of Geographical Information Science10.1080/13658816.2024.239914439:1(207-230)Online publication date: 5-Sep-2024
  • (2024)Bayesian spatial cluster signal learning with application to adverse event (AE)Journal of Biopharmaceutical Statistics10.1080/10543406.2024.2325148(1-13)Online publication date: 21-Mar-2024
  • (2023)Spatial Data ScienceMachine Learning for Data Science Handbook10.1007/978-3-031-24628-9_18(401-422)Online publication date: 18-Aug-2023
  • Show More Cited By

Index Terms

  1. Rapid detection of significant spatial clusters

    Recommendations

    Comments

    Please enable JavaScript to view thecomments powered by Disqus.

    Information & Contributors

    Information

    Published In

    cover image ACM Conferences
    KDD '04: Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining
    August 2004
    874 pages
    ISBN:1581138881
    DOI:10.1145/1014052
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Sponsors

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 22 August 2004

    Permissions

    Request permissions for this article.

    Check for updates

    Author Tags

    1. biosurveillance
    2. cluster detection
    3. spatial data mining algorithms
    4. spatial scan statistics

    Qualifiers

    • Article

    Conference

    KDD04

    Acceptance Rates

    Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months)23
    • Downloads (Last 6 weeks)2
    Reflects downloads up to 16 Jan 2025

    Other Metrics

    Citations

    Cited By

    View all
    • (2024)Enhanced scan statistic with tightened window for detecting irregularly shaped hotspotsInternational Journal of Geographical Information Science10.1080/13658816.2024.239914439:1(207-230)Online publication date: 5-Sep-2024
    • (2024)Bayesian spatial cluster signal learning with application to adverse event (AE)Journal of Biopharmaceutical Statistics10.1080/10543406.2024.2325148(1-13)Online publication date: 21-Mar-2024
    • (2023)Spatial Data ScienceMachine Learning for Data Science Handbook10.1007/978-3-031-24628-9_18(401-422)Online publication date: 18-Aug-2023
    • (2022)Tail bounds for empirically standardized sumsElectronic Journal of Statistics10.1214/22-EJS199516:1Online publication date: 1-Jan-2022
    • (2022)Statistically-Robust Clustering Techniques for Mapping Spatial Hotspots: A SurveyACM Computing Surveys10.1145/348789355:2(1-38)Online publication date: 18-Jan-2022
    • (2022)Calibrating the scan statistic: Finite sample performance versus asymptoticsJournal of the Royal Statistical Society: Series B (Statistical Methodology)10.1111/rssb.1254984:5(1608-1639)Online publication date: 16-Aug-2022
    • (2021)A New Model to Identify the Reliability and Trust of Internet Banking Users Using Fuzzy Theory and Data-MiningMathematics10.3390/math90909169:9(916)Online publication date: 21-Apr-2021
    • (2021)Foodborne Disease Risk Prediction Using Multigraph Structural Long Short-term Memory Networks: Algorithm Design and Validation StudyJMIR Medical Informatics10.2196/294339:8(e29433)Online publication date: 2-Aug-2021
    • (2021)Significant DBSCAN+: Statistically Robust Density-based ClusteringACM Transactions on Intelligent Systems and Technology10.1145/347484212:5(1-26)Online publication date: 24-Nov-2021
    • (2021)Discovery of arbitrarily shaped significant clusters in spatial point data with noiseApplied Soft Computing10.1016/j.asoc.2021.107452108(107452)Online publication date: Sep-2021
    • Show More Cited By

    View Options

    Login options

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    Media

    Figures

    Other

    Tables

    Share

    Share

    Share this Publication link

    Share on social media