Abstract
This paper presents a compression method, PatZip, to improve the efficiency of spatial pattern mining methods. PatZip can avoid overcompression and stop automatically before pattern is destroyed. Compared with existing compression methods, PatZip is deterministic and its result is reproducible, and original data can be easily recovered. The compression process is data-driven and parameter-free, and requires only O(nlogn) time for n data points.
Preview
Unable to display preview. Download preview PDF.
Similar content being viewed by others
References
Bradley, P.S., Fayyad, U., Reina, C.: Scaling clustering algorithms to large databases. In: Proc. of 4th International Conf. on Knowledge Discovery and Data Mining (KDD 1998), pp. 9–15 (1998)
Breunig, M.M., Kriegel, H., Kroger, P., Sander, J.: Data bubbles: quality preserving performance boosting for hierarchical clustering. In: Proc. of Int’l Conf. on Management of Data (SIGMOD 2001), pp. 79–90 (2001)
Clarkson, K.L.: Fast algorithms for the all nearest neighbors problem. In: Proc. of 24th Annual Symposium on Foundations of Computer Science, pp. 226–232 (1983)
DuMouchel, W., Volinsky, C., Johnson, T., Cortez, C., Pregibon, D.: Squashing Flat Files Flatter. In: Proc. of 5th Int’l Conf. on Knowledge Discovery and Data Mining (KDD 1999), pp. 6–15. AAAI press, Menlo Park (1999)
Equitz, W.H.: A new vector quantization clustering algorithm. IEEE Trans. on Acoustics, Speech, and Signal Processing 37(10), 1568–1575 (1989)
Franti, P., Kaukoranta, T., Shen, D., Chang, K.: Fast and memory efficient implementation of the exact PNN. IEEE Trans. on Image Processing 9(5), 773–777 (2000)
Han, J., Kamber, M., Tung, A.K.H.: Spatial clustering methods in data mining: a survey. In: Miller, H., Han, J. (eds.) Geographic Data Mining and Knowledge Discovery. Taylor and Francis, Abington (2001)
Karypis, G., Han, E., Kumar, V.: CHAMELEON: a hierarchical clustering algorithm using dynamic modeling. IEEE Computer 32, 68–75 (1999)
Kollios, G., Gunopulos, D., Koudas, N., Berchtold, S.: Efficient biased sampling for approximate clustering and outlier detection in large data sets. IEEE Trans. on Knowledge and Data Eng. 15(5), 1–18 (2003)
Linde, Y., Buzo, A., Gray, R.M.: An algorithm for vector quantizer design. IEEE Trans. on Commun. 28, 84–95 (1980)
MacQueen, J.: Some methods for classification and analysis of multivariate observations. In: Proc. of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, pp. 281–297 (1967)
Palmer, C.R., Faloutsos, C.: Density biased sampling: an improved method for data mining and clustering. In: Proc. of Int’l Conf. of Management of Data (SIGMOD 2000), pp. 82–92 (2000)
Qian, Y., Zhang, G., Zhang, K.: FAÇADE: a fast and effective approach to discovery of dense clusters in noisy spatial data (demo abstract). In: Proc. of Int’l Conf on Management of Data (SIGMOD 2004), pp. 921–922 (2004)
Qian, Y., Zhang, K.: GraphZip: A fast and automatic compression method for spatial data clustering. In: Proc. of 19th Annual ACM Symposium on Applied Computing (SAC 2004), pp. 571–575 (2004)
Vaidya, P.M.: An O(nlogn) algorithm for the all-nearest-neighbors problem. Discrete & Computational Geometry 4, 101–115 (1989)
Zhang, T., Ramakrishnan, R., Linvy, M.: BIRCH: an efficient data clustering method for very large databases. In: Proc. of Int’l Conf. on Management of Data (SIGMOD 1996), pp. 103–114 (1996)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Qian, Y., Zhang, K., Huynh, D.T. (2005). PatZip: Pattern-Preserved Spatial Data Compression. In: Ho, T.B., Cheung, D., Liu, H. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2005. Lecture Notes in Computer Science(), vol 3518. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11430919_84
Download citation
DOI: https://doi.org/10.1007/11430919_84
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-26076-9
Online ISBN: 978-3-540-31935-1
eBook Packages: Computer ScienceComputer Science (R0)