Abstract
Recently, there has been considerable interest in mining spatial colocation patterns from large spatial datasets. Spatial colocation patterns represent the subsets of spatial events whose instances are often located in close geographic proximity. Most studies of spatial colocation mining require the specification of two parameter constraints to find interesting colocation patterns. One is a minimum prevalent threshold of colocations, and the other is a distance threshold to define spatial neighborhood. However, it is difficult for users to decide appropriate threshold values without prior knowledge of their task-specific spatial data. In this paper, we propose a different framework for spatial colocation pattern mining. To remove the first constraint, we propose the problem of finding N-most prevalent colocated event sets, where N is the desired number of colocated event sets with the highest interest measure values per each pattern size. We developed two alternative algorithms for mining the N-most patterns. They reduce candidate events effectively and use a filter-and-refine strategy for efficiently finding colocation instances from a spatial dataset. We prove the algorithms are correct and complete in finding the N-most prevalent colocation patterns. For the second constraint, a distance threshold for spatial neighborhood determination, we present various methods to estimate appropriate distance bounds from user input data. The result can help an user to set a distance for a conceptualization of spatial neighborhood. Our experimental results with real and synthetic datasets show that our algorithmic design is computationally effective in finding the N-most prevalent colocation patterns. The discovered patterns were different depending on the distance threshold, which shows that it is important to select appropriate neighbor distances.
Similar content being viewed by others
Explore related subjects
Discover the latest articles, news and stories from top researchers in related subjects.References
Agarwal R, Srikant R (1994) Fast algorithms for mining association rules. In: Proceedings of the international conference on very large databases (VLDB)
Appice A, Ceci M, Lanza A (2003) Discovery of spatial association rules in geo-referenced census data: a relational mining approach. In: Proceedings of the intelligent data analysis
Arshad MU, Ayyaz MN (2006) Mining N-most interesting itemsets using support-ordered tries. In: IEEE international conference on computer systems and applications
Bailey T, Gaterell A (1995) Interactive spatial data analysis. Longman Scientific & Technical, London
Bembenik R, Rybinski H (2008) Mining spatial association rules with no distance parameter. In: Proceedings of the intelligent information processing and web mining, pp 499–508
Berg M, Kreveld M, Overmars M, Schwarzkopf O (2000) Computational geometry. Springer, Berlin
Castro V, Murray A (1998) Discovering associations in spatial data—an efficient medoid based approach. In: Proceedings of the international Pacific-Asia conference on knowledge discovery and data mining (PAKDD)
Ceci M, Appice A, Malerba D (2004) Spatial associative classification at different levels of granularity: a probabilistic approach. Lecture notes in computer science, vol 3202. Springer, Berlin
Chelghoum N, Zeitouni K (2004) Spatial data mining implementation—alternative and performances. In: GeoInfo
Cheung Y, Fu AW (2004) Mining frequent itemsets without support threshold: with and without item constraints. IEEE Trans Knowl Data Eng 16(9): 1052–1069
Cormen T, Leiserson C, Rivest R, Stein C (2003) Introduction to algorithms. McGraw-Hill Science, New York
Cover T (1995) Nearest neighbor pattern classification. Knowl Based Syst 6(8): 373–389
Cressie N (1993) Statistics for spatial data. Wiley, New York
Crimestat. http://www.icpsr.umich.edu/CrimaStat/
Ding W, Jiamthapthaksin1 R, Parmar R, Jiang D, Stepinski TF, Eick CF (2008) Towards region discovery in spatial datasets. In: Proceedings of the international Pacific-Asia conference on knowledge discovery and data mining (PAKDD)
Easter MJ (1991) Reasoning about binary topological relations. In: Proceedings of the international symposium on advances in spatial databases
Easter M, Kriegel H, Sander J (1999) Knowledge discovery in spatial databases. In: Proceedings of the international conference on artificial intelligence
Eick CF, Parmar R, Ding W, Stepinski TF, Nicot J (2008) Finding regional co-location patterns for sets of continuous variables in spatial datasets. In: Proceedings of the 16th ACM SIGSPATIAL international conference on advances in geographic information systems (ACM–GIS)
ESRI. Arcgis. http://www.esri.com/software/arcgis/index.html
Frank R, Easter M, Knobbe A (2009) A multi-relational approach to spatial classification. In: Proceedings of the ACM international conference on knowledge discovery and data mining (KDD)
Fu AW, Kwong RW, Tang J (2000) Mining N-most interesting itemsets. In: International symposium on methodologies for intelligent systems
Goodchild M (1986) Spatial autocorrelation. Geo Books, Norwich
Griffith D (1987) Spatial autocorrelation: a primier. Resource publications in geography. Association of American Geographers, Washington, DC
Han J, Pei J, Yin Y (2000) Mining frequent patterns without candidate generation. In: Proceedings of the ACM SIGMOD conference on management of data
Hirate Y, Iwahashi E, Yamana H (2004) TF 2 P-growth: an efficient algorithm for mining frequent patterns without any thresholds. In: Proceedings of the IEEE ICDM’04 workshop on alternative techniques for data mining and knowledge discovery
Inokuchi A, Washio T, Motoda H (2000) An apriori-based algorithm for mining frequent substructures from graph data. In: Proceedings of the European conference on principles of data mining and knowledge discovery
Koperski K, Han J (1995) Discovery of spatial association rules in geographic information databases. In: Proceedings of international symposium on large spatial data bases, Maine, pp 47–66
Kuramochi M, Karypis G (2001) Frequent subgraph discovery. In: Proceeding of IEEE international conference on data mining
Li F, Cheng D, Hadjieleftheriou M, Kollios G, Teng S (2005) On trip planning queries in spatial databases. In: Proceedings of the 9th international symposium on advances in spatial and temporal databases (SSTD)
Miller HJ (2006) Geographic data mining and knowledge discovery. In: Wilson JP, Fotheringham AS (eds) Handbook of geographic information science. Blackwell, Oxford
Miller HJ, Han J (2001) Geographic data mining and knowledge discovery. Taylor and Francis, London
Morimoto Y (2001) Mining frequent neighboring class sets in spatial databases. In: Proceedings of the ACM SIGKDD international conference on knowledge discovery and data mining
Papadias D, Theodoridis Y (1997) Spatial relations, minimum bounding rectangles, and spatial data structures. Int J Geogr Inf Sci 11: 111–138
R for Statistical Computing. http://www.r-project.org/index.html
Ripley B (1976) The second-order analysis of stationary point process. J Appl Probab 13: 255–266
Roddick J, Spiliopoulou M (1999) A bibliography of temporal, spatial and spatio-temporal data mining research. Proc SIGKDD Explor 1(1): 34–38
Santos MY, Amaral LA (2005) Geo-spatial data mining in the analysis of a demographic database. Soft Comput Fusion Found Methodol Appl 9(5): 374–384
Schlossberg M (2003) GIS, the US census and neighborhood scale analysis. Plan Pract Res 18(2–3):213–218
Shekhar S, Chawla S (2003) Spatial databases: a tour. Prentice Hall, Upper Saddle River
Shekhar S, Huang Y (2001) Co-location rules mining: a summary of results. In: Proceedings of the international symposium on spatio and temporal database (SSTD)
Shekhar S, Zhang P, Huang Y, Vatsavai R (2004) Trends in spatial data mining. In: Kargupta H, Joshi A, Sivakumar K, Yesha Y (eds) Data mining: next generation challenges and future directions. AAAI/MIT Press, Menlo Park
Spatstat. http://www.spatstat.org/spatstat/
Tobler W (1970) A computer movie simulating urban growth in the detroit region. Econ Geogr 46: 234–240
Toots B, Getis A (1998) Point pattern analysis. Sage Publications, Newbury Park
U. G. Survey. http://www.usgs.gov/
U.S.E.P. Agency. Environmental Interest Type. http://www.epa.gov/enviro/html/frs_demo/interest_types.pdf
U.S. EPA (Environmental Protection Agency) FRS (Facility Registry System) facilities. http://www.epa.gov/enviro/html/frs_demo/geospatial_data/geo_data_state_single.html
Wan Y, Zhou J (2008) Knfcom-t: a k-nearest features-based co-location pattern mining algorithm for large spatial data sets by using t-trees. Int J Bus Intell Data Min 3(4): 375–389
Wang J, Han J, Lu Y, Tzvetkov P (2005) TFP: an efficient algorithm for mining top-k frequent closed itemsets. IEEE Trans Knowl Data Eng 17(5): 652–664
Xiao X, Xie X, Luo Q, Ma W (2008) Density based co-location pattern discovery. In: Proceedings of the 16th ACM SIGSPATIAL international conference on advances in geographic incormation systems (ACM-GIS)
Xiong H, Shekhar S, Huang Y, Kumar V, Ma X, Yoo JS (2004) A framework for discovering co-location patterns in data sets with extended spatial objects. In: Proceedings of the SIAM international conference on data mining (SDM)
Yan X, Han J (2001) gSpan: graph-baseed substructure pattern mining. In: Proceedings of IEEE international conference on data mining
Yoo JS, Bow M (2009) Finding N-most prevalent colocated event sets. In: Proceedings of the international conference on data warehousing and knowledge discovery (DaWak)
Yoo JS, Shekhar S (2004) A partial join approach for mining co-location patterns. In: Proceedings of the ACM international symposium on advances in geographic information systems (ACM-GIS)
Yoo JS, Shekhar S (2006) A join-less approach for mining spatial co-location patterns. IEEE Trans Knowl Data Eng 18(10): 1323–1337
Zhang X, Mamoulis N, Cheung DW, Shouk Y (2004) Fast mining of spatial collocations. In: Proceedings of the ACM international conference on knowledge discovery and data mining (KDD)
Zou L, Chen L, Lu Y (2000) Top-k subgraph matching query in a large graph. In: Proceedings of the conference on information and knowledge management
Zou S, Zhao Y, Guan J, Huang J (2005) A neighborhood-based clustering algorithm. In: Proceedings of the Pacific-Asia conference on knowledge discovery and data mining
Author information
Authors and Affiliations
Corresponding author
Additional information
Responsible editor: M.J. Zaki.
Rights and permissions
About this article
Cite this article
Yoo, J.S., Bow, M. Mining spatial colocation patterns: a different framework. Data Min Knowl Disc 24, 159–194 (2012). https://doi.org/10.1007/s10618-011-0223-0
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10618-011-0223-0