Abstract
Recently, we presented text storage and retrieval in an auto-associative memory framework using the Hopfield neural network, realizing the network's ideal functionality as a content-addressable information retrieval system. In this paper, we extend this result to multi-modal patterns, namely images with text captions, and show that the Hopfield network can indeed store and retrieve such multi-modal patterns even in an auto-associative setting. Within this framework, we examine two central issues: (i) performance characterization, showing that the O(N) capacity of a Hopfield network of N neurons under the pseudo-inverse learning rule is retained in the multi-modal case, and (ii) the retrieval dynamics of the multi-modal pattern (i.e., image and caption together) under various types of queries, namely image+caption, image only and caption only, in line with a typical multi-modal retrieval system, where the entire multi-modal pattern is expected to be retrieved even from a partial query in any one of the modalities. We present results on these two issues for a large database of 7000+ captioned images and establish the practical scalability of both the storage capacity and the retrieval robustness of the Hopfield network for content-addressable retrieval of multi-modal patterns. We also point to the potential of this work to extend to a broader definition of multi-modality, as in multimedia content with modalities such as video (image sequences) synchronized with subtitle text, speech, music and non-speech audio.
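To make the storage and retrieval mechanism concrete, the following is a minimal sketch (not the paper's implementation) of a Hopfield auto-associative memory trained with the pseudo-inverse learning rule, W = X X⁺, on bipolar patterns formed by concatenating image bits and caption bits. The dimensions, loading level and the clamped-query recall procedure are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumptions, not the paper's configuration):
# each stored pattern concatenates image bits and caption bits.
N_IMG, N_TXT = 400, 200      # bits per modality
N = N_IMG + N_TXT            # total neurons
P = 60                       # number of stored multi-modal patterns

# Bipolar (+1/-1) patterns, one per column.
X = rng.choice([-1.0, 1.0], size=(N, P))

# Pseudo-inverse learning rule: W = X X^+ = X (X^T X)^{-1} X^T,
# the projection onto the subspace spanned by the stored patterns,
# which makes every stored pattern an exact fixed point.
W = X @ np.linalg.pinv(X)

def retrieve(probe, known=None, steps=50):
    """Synchronous recall; neurons in `known` (the queried modality)
    are clamped to the probe values throughout the dynamics."""
    s = probe.copy()
    for _ in range(steps):
        s_next = np.sign(W @ s)
        s_next[s_next == 0] = 1.0
        if known is not None:
            s_next[known] = probe[known]   # hold the query modality fixed
        if np.array_equal(s_next, s):
            break
        s = s_next
    return s

# Caption-only query: caption bits supplied, image bits unknown (random).
target = X[:, 0]
probe = rng.choice([-1.0, 1.0], size=N)
probe[N_IMG:] = target[N_IMG:]
recalled = retrieve(probe, known=np.arange(N_IMG, N))
print("recalled-bit accuracy:", np.mean(recalled == target))
```

Under this rule the weight matrix is the orthogonal projector onto the span of the stored patterns, which is what lifts the classical Hebbian capacity of roughly 0.14N toward O(N) linearly independent patterns, the property the paper's performance characterization examines in the multi-modal case.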