DOI: 10.1145/3330393.3330417

Cross-Modal Entity Resolution Based on Co-Attentional Generative Adversarial Network

Published: 10 May 2019

Abstract

Cross-modal entity resolution aims to find semantically similar items across objects of different modalities (e.g., image and text). The core of the problem is to construct a shared space in which examples from different modalities can be represented uniformly. In this paper, we propose a novel Co-Attentional Generative Adversarial Network (CAGAN) for cross-modal entity resolution, which learns an effective shared space through a co-attention mechanism and adversarial learning. The generative adversarial network we design has two parts: a generator, which learns the shared space by minimizing an intra-modal loss and an inter-modal loss, and a discriminator, a classifier that tries to identify the modality of each generated representation. To eliminate the information imbalance between modalities, produce more consistent representations, and accelerate the convergence of the network, a co-attention mechanism is introduced. Experimental results on two cross-modal datasets demonstrate the outstanding performance of the proposed method for cross-modal entity resolution.
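The abstract describes the architecture only at a high level; as a reading aid, below is a minimal, hypothetical PyTorch sketch of the adversarial objective it outlines: a generator that projects image and text features into a shared space under intra-modal and inter-modal losses, and a modality discriminator the generator tries to fool. All layer sizes, loss forms, and the 0.1 adversarial weight are illustrative assumptions, not the authors' implementation; the co-attention module is omitted for brevity.

```python
# Hypothetical sketch of a CAGAN-style objective (assumed dimensions,
# loss forms, and weights; co-attention omitted). Requires PyTorch.
import torch
import torch.nn as nn
import torch.nn.functional as F


class Generator(nn.Module):
    """Projects modality-specific features into a shared space."""
    def __init__(self, img_dim=4096, txt_dim=300, shared_dim=256):
        super().__init__()
        self.img_proj = nn.Sequential(
            nn.Linear(img_dim, shared_dim), nn.ReLU(),
            nn.Linear(shared_dim, shared_dim))
        self.txt_proj = nn.Sequential(
            nn.Linear(txt_dim, shared_dim), nn.ReLU(),
            nn.Linear(shared_dim, shared_dim))

    def forward(self, img_feat, txt_feat):
        return self.img_proj(img_feat), self.txt_proj(txt_feat)


class Discriminator(nn.Module):
    """Predicts which modality a shared-space vector came from."""
    def __init__(self, shared_dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(shared_dim, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, z):
        return self.net(z)  # logit: 1 = image, 0 = text


def generator_loss(img_z, txt_z, labels, disc):
    # Inter-modal loss: matched image-text pairs should coincide.
    inter = F.mse_loss(img_z, txt_z)
    # Intra-modal loss: items sharing a semantic label should be similar
    # within a modality (a simple stand-in for the paper's term).
    sim = torch.sigmoid(img_z @ img_z.t())
    same = (labels[:, None] == labels[None, :]).float()
    intra = F.mse_loss(sim, same)
    # Adversarial term: the generator tries to fool the discriminator,
    # so it maximizes the discriminator's classification loss.
    logits = disc(torch.cat([img_z, txt_z]))
    modality = torch.cat([torch.ones(len(img_z), 1),
                          torch.zeros(len(txt_z), 1)])
    adv = -F.binary_cross_entropy_with_logits(logits, modality)
    return inter + intra + 0.1 * adv  # 0.1 is an assumed weight


if __name__ == "__main__":
    G, D = Generator(), Discriminator()
    img = torch.randn(8, 4096)      # e.g. CNN image features
    txt = torch.randn(8, 300)       # e.g. text embeddings
    y = torch.randint(0, 5, (8,))   # shared semantic labels
    img_z, txt_z = G(img, txt)
    print(generator_loss(img_z, txt_z, y, D).item())
```

In the full CAGAN described above, the generator and discriminator would be trained alternately in the usual minimax fashion, with the co-attention mechanism aligning the two modalities before projection into the shared space.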



    Published In

    ICMSSP '19: Proceedings of the 2019 4th International Conference on Multimedia Systems and Signal Processing
    May 2019
    213 pages
    ISBN: 9781450371711
    DOI: 10.1145/3330393

    In-Cooperation

    • Shenzhen University
    • Sun Yat-Sen University

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. co-attention mechanism
    2. cross-modal entity resolution
    3. generative adversarial network

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Funding Sources

    • the China Postdoctoral Science Foundation
    • the National Natural Science Foundation of China

    Conference

    ICMSSP 2019

