Abstract
We propose a vector quantisation method which not only provides a compact description of data vectors in terms of codebook vectors, but also explains the codebook vectors themselves as binary combinations of elementary features. This corresponds to the intuitive notion that, in the real world, patterns can usefully be thought of as compositions of simpler features. The model can be understood as a generative model in which each codebook vector is generated by a hidden binary state vector. The model is non-probabilistic in the sense that it assigns each data vector to a single codebook vector. We describe exact and approximate algorithms for learning deterministic feature representations. In contrast to probabilistic models, the deterministic approach allows message propagation algorithms to be used within the learning scheme; these are compared with standard mean-field and Gibbs sampling learning. We show that Generative Vector Quantisation performs well on large-scale real-world tasks such as image compression and handwritten digit analysis with up to 400 data dimensions.
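To make the generative structure concrete, the following is a minimal NumPy sketch, not the authors' implementation: it assumes the codebook vector for a binary state s is the linear combination W s of K elementary feature columns, assigns each data vector to its nearest codebook vector by exhaustively enumerating the 2^K states, and refits W by least squares. The exhaustive E-step is only feasible for small K; the message-propagation and mean-field schemes discussed in the paper are the scalable replacements for it.

```python
import numpy as np
from itertools import product

# Illustrative sketch (assumed form, not the paper's code): each codebook
# vector is a binary combination of K elementary feature vectors, the
# columns of W, i.e. y(s) = W @ s with s in {0,1}^K. Each data vector is
# deterministically assigned to its nearest codebook vector.

rng = np.random.default_rng(0)
d, K, N = 4, 3, 100               # data dimension, number of features, samples
X = rng.normal(size=(N, d))       # toy placeholder data

W = rng.normal(size=(d, K))       # elementary feature vectors (to be learned)
states = np.array(list(product([0, 1], repeat=K)))  # all 2^K binary states

for _ in range(20):               # simple alternating (EM-like) optimisation
    # E-step: exact enumeration -- assign each x to its nearest codebook
    # vector y(s) = W @ s (feasible only for small K).
    Y = states @ W.T                              # (2^K, d) codebook vectors
    dists = ((X[:, None, :] - Y[None]) ** 2).sum(-1)
    S = states[dists.argmin(axis=1)]              # (N, K) binary assignments

    # M-step: with assignments fixed, the W minimising sum ||x - W s||^2
    # is a least-squares solution of S @ W.T = X.
    W = np.linalg.lstsq(S, X, rcond=None)[0].T

print("mean squared reconstruction error:",
      ((X - S @ W.T) ** 2).sum() / N)
```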
Cite this article
Westerdijk, M., Barber, D. & Wiegerinck, W. Deterministic Generative Models for Fast Feature Discovery. Data Mining and Knowledge Discovery 5, 337–363 (2001). https://doi.org/10.1023/A:1011405212443