Abstract
Off-line dictionary compression is becoming more attractive for applications where compressed data are searched directly in compressed form. While there has been large body of related work describing specific database compression algorithms, the Hibase [10] architecture is unique in processing queries in compressed data. However, this technique does not compress the representation of strings in the domain dictionaries. Primary keys, data with high cardinality and semi-structured data contribute very little or no compression. To achieve high performance irrespective of type of data, the string representation must be in compressed form. At the same time, the direct addressability of compressed data is maintained. Serial compression techniques cannot be used. In this paper, we present a prefix dictionary-based off-line method that can be incorporated with systems like Hibase where compressed data can be accessed directly without prior decompression. The complexity is O(n) in time and space.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
Similar content being viewed by others
References
Huffman D. A. A method for the construction of minimum-redundancy code. Proc. IRE, 40:9:1098–1101, 1952.
W.P. Cockshott, D. McGregor, and Wilson J. Kotsis, N. Data compression in database systems. IDEAS’98, Cardi., July, 1998.
G. P. Copeland and S. Khoshafian. A decomposition storage model. In Proceedings of the 1985 ACM SIGMOD, Austin, Texas, May 1985.
G.V. Cormack. Data compression on a database system. Communication of the ACM, 28:12, 1985.
J. Goldstein, R. Ramakrishnan, and U. Shaft. Compressing relations and indexes. Int Proce. IEEE Conf on Data Engineering, 1998.
McGregor D. R. Hoque A. S. M. L. Storage and querying high dimensional sparsely populated data in compressed representation. In Accepted in EurAsia-ICT, Tehran, Iran, October 2002.
Bentley J. and Mcllroy D. Data compression using long common strings. http://www.bell-labs.com .
Larson N. J. and Moffat A. Off-line dictionary-based compression. Proceedings of the IEEE, 88:11, 2000.
M Neumuller. Compact data structure for querying xml. Proceedings of EDTB, 2002.
Cockshott W. P., MCGregor D., and Wilson J. High-performance operations using a compressed architecture. The Computer Journal, 41:5:283–296, 1998.
F. Rubin. Experiments in text compression. Commun. ACM, 19, November 1976.
D. Solomon. Data Compression The Complete Reference. Springer-Verlag New York, Inc., 1998.
J.A. Storer and T.G. Szymanski. Data compression via textual substitution. Journal of the ACM, 29:928–951, 1982.
Weltch T.A. A technique for high-performance data compression. Computer, 17:6:8–19, 1984.
T. Westmann, D. Kossmann, S. Helmer, and G. Moerkotte. The implementation and performance of compressed databases. SIGMOD Record, 29:3:55–67, 2000.
J. Ziv and A. Lempel. Compression of individual sequences via variable rate coding. IEEE Transaction on Information Theory, IT-23(3):337–343, 1978.
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2002 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Hoque, A.S.M.L., McGregor, D., Wilson, J. (2002). Database Compression Using an Offline Dictionary Method. In: Yakhno, T. (eds) Advances in Information Systems. ADVIS 2002. Lecture Notes in Computer Science, vol 2457. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-36077-8_2
Download citation
DOI: https://doi.org/10.1007/3-540-36077-8_2
Published:
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-00009-9
Online ISBN: 978-3-540-36077-3
eBook Packages: Springer Book Archive