PatZip: Pattern-Preserved Spatial Data Compression

Yu Qian²¹,
Kang Zhang²¹ &
D. T. Huynh²¹

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 3518))

Included in the following conference series:

Pacific-Asia Conference on Knowledge Discovery and Data Mining

2661 Accesses

Abstract

This paper presents a compression method, PatZip, to improve the efficiency of spatial pattern mining methods. PatZip can avoid overcompression and stop automatically before pattern is destroyed. Compared with existing compression methods, PatZip is deterministic and its result is reproducible, and original data can be easily recovered. The compression process is data-driven and parameter-free, and requires only O(nlogn) time for n data points.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Vouw: Geometric Pattern Mining Using the MDL Principle

Information Processing on Compressed Data

Spatial regionalization based on optimal information compression

Article Open access 10 October 2022

References

Bradley, P.S., Fayyad, U., Reina, C.: Scaling clustering algorithms to large databases. In: Proc. of 4th International Conf. on Knowledge Discovery and Data Mining (KDD 1998), pp. 9–15 (1998)
Google Scholar
Breunig, M.M., Kriegel, H., Kroger, P., Sander, J.: Data bubbles: quality preserving performance boosting for hierarchical clustering. In: Proc. of Int’l Conf. on Management of Data (SIGMOD 2001), pp. 79–90 (2001)
Google Scholar
Clarkson, K.L.: Fast algorithms for the all nearest neighbors problem. In: Proc. of 24th Annual Symposium on Foundations of Computer Science, pp. 226–232 (1983)
Google Scholar
DuMouchel, W., Volinsky, C., Johnson, T., Cortez, C., Pregibon, D.: Squashing Flat Files Flatter. In: Proc. of 5th Int’l Conf. on Knowledge Discovery and Data Mining (KDD 1999), pp. 6–15. AAAI press, Menlo Park (1999)
Chapter Google Scholar
Equitz, W.H.: A new vector quantization clustering algorithm. IEEE Trans. on Acoustics, Speech, and Signal Processing 37(10), 1568–1575 (1989)
Article Google Scholar
Franti, P., Kaukoranta, T., Shen, D., Chang, K.: Fast and memory efficient implementation of the exact PNN. IEEE Trans. on Image Processing 9(5), 773–777 (2000)
Article Google Scholar
Han, J., Kamber, M., Tung, A.K.H.: Spatial clustering methods in data mining: a survey. In: Miller, H., Han, J. (eds.) Geographic Data Mining and Knowledge Discovery. Taylor and Francis, Abington (2001)
Google Scholar
Karypis, G., Han, E., Kumar, V.: CHAMELEON: a hierarchical clustering algorithm using dynamic modeling. IEEE Computer 32, 68–75 (1999)
Google Scholar
Kollios, G., Gunopulos, D., Koudas, N., Berchtold, S.: Efficient biased sampling for approximate clustering and outlier detection in large data sets. IEEE Trans. on Knowledge and Data Eng. 15(5), 1–18 (2003)
Article Google Scholar
Linde, Y., Buzo, A., Gray, R.M.: An algorithm for vector quantizer design. IEEE Trans. on Commun. 28, 84–95 (1980)
Article Google Scholar
MacQueen, J.: Some methods for classification and analysis of multivariate observations. In: Proc. of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, pp. 281–297 (1967)
Google Scholar
Palmer, C.R., Faloutsos, C.: Density biased sampling: an improved method for data mining and clustering. In: Proc. of Int’l Conf. of Management of Data (SIGMOD 2000), pp. 82–92 (2000)
Google Scholar
Qian, Y., Zhang, G., Zhang, K.: FAÇADE: a fast and effective approach to discovery of dense clusters in noisy spatial data (demo abstract). In: Proc. of Int’l Conf on Management of Data (SIGMOD 2004), pp. 921–922 (2004)
Google Scholar
Qian, Y., Zhang, K.: GraphZip: A fast and automatic compression method for spatial data clustering. In: Proc. of 19th Annual ACM Symposium on Applied Computing (SAC 2004), pp. 571–575 (2004)
Google Scholar
Vaidya, P.M.: An O(nlogn) algorithm for the all-nearest-neighbors problem. Discrete & Computational Geometry 4, 101–115 (1989)
Article MATH MathSciNet Google Scholar
Zhang, T., Ramakrishnan, R., Linvy, M.: BIRCH: an efficient data clustering method for very large databases. In: Proc. of Int’l Conf. on Management of Data (SIGMOD 1996), pp. 103–114 (1996)
Google Scholar

Download references

Author information

Authors and Affiliations

Department of Computer Science, The University of Texas at Dallas, Richardson, TX, 75083-0688, USA
Yu Qian, Kang Zhang & D. T. Huynh

Authors

Yu Qian
View author publications
You can also search for this author in PubMed Google Scholar
Kang Zhang
View author publications
You can also search for this author in PubMed Google Scholar
D. T. Huynh
View author publications
You can also search for this author in PubMed Google Scholar

Editor information

Editors and Affiliations

Japan Advanced Institute of Science and Technology, Asahidai 1-1, 923-12292, Nomi, Japan
Tu Bao Ho
University of Hong Kong, Pokfulam Road, Hong Kong, China
David Cheung
Department of Computer Science and Engineering, Arizona State University, Tempe, Arizona, USA
Huan Liu

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Qian, Y., Zhang, K., Huynh, D.T. (2005). PatZip: Pattern-Preserved Spatial Data Compression. In: Ho, T.B., Cheung, D., Liu, H. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2005. Lecture Notes in Computer Science(), vol 3518. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11430919_84

Download citation

DOI: https://doi.org/10.1007/11430919_84
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-26076-9
Online ISBN: 978-3-540-31935-1
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics