DOI: 10.1145/3330393.3330417

Cross-Modal Entity Resolution Based on Co-Attentional Generative Adversarial Network

Published: 10 May 2019

Abstract

Cross-modal entity resolution aims to find semantically similar items across objects of different modalities (e.g., image and text). The core of the problem is to construct a shared space in which examples from different modalities can be represented uniformly. In this paper, we propose a novel Co-Attentional Generative Adversarial Network (CAGAN) for cross-modal entity resolution, which learns an effective shared space through a co-attention mechanism and adversarial learning. The generative adversarial network we design has two parts: a generator, which learns the shared space by minimizing an intra-modal loss and an inter-modal loss, and a discriminator, a classifier that tries to identify the modality of each generated representation. To eliminate the information imbalance between modalities, produce more consistent representations, and accelerate the convergence of the network, a co-attention mechanism is introduced. Experimental results on two cross-modal datasets demonstrate the outstanding performance of the proposed method for cross-modal entity resolution.
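The abstract describes the architecture only at a high level; as a reading aid, below is a minimal, hypothetical PyTorch sketch of the adversarial objective it outlines: a generator that projects image and text features into a shared space under intra-modal and inter-modal losses, and a modality discriminator the generator tries to fool. All layer sizes, loss forms, and the 0.1 adversarial weight are illustrative assumptions, not the authors' implementation; the co-attention module is omitted for brevity.

```python
# Hypothetical sketch of a CAGAN-style objective (assumed dimensions,
# loss forms, and weights; co-attention omitted). Requires PyTorch.
import torch
import torch.nn as nn
import torch.nn.functional as F


class Generator(nn.Module):
    """Projects modality-specific features into a shared space."""
    def __init__(self, img_dim=4096, txt_dim=300, shared_dim=256):
        super().__init__()
        self.img_proj = nn.Sequential(
            nn.Linear(img_dim, shared_dim), nn.ReLU(),
            nn.Linear(shared_dim, shared_dim))
        self.txt_proj = nn.Sequential(
            nn.Linear(txt_dim, shared_dim), nn.ReLU(),
            nn.Linear(shared_dim, shared_dim))

    def forward(self, img_feat, txt_feat):
        return self.img_proj(img_feat), self.txt_proj(txt_feat)


class Discriminator(nn.Module):
    """Predicts which modality a shared-space vector came from."""
    def __init__(self, shared_dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(shared_dim, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, z):
        return self.net(z)  # logit: 1 = image, 0 = text


def generator_loss(img_z, txt_z, labels, disc):
    # Inter-modal loss: matched image-text pairs should coincide.
    inter = F.mse_loss(img_z, txt_z)
    # Intra-modal loss: items sharing a semantic label should be similar
    # within a modality (a simple stand-in for the paper's term).
    sim = torch.sigmoid(img_z @ img_z.t())
    same = (labels[:, None] == labels[None, :]).float()
    intra = F.mse_loss(sim, same)
    # Adversarial term: the generator tries to fool the discriminator,
    # so it maximizes the discriminator's classification loss.
    logits = disc(torch.cat([img_z, txt_z]))
    modality = torch.cat([torch.ones(len(img_z), 1),
                          torch.zeros(len(txt_z), 1)])
    adv = -F.binary_cross_entropy_with_logits(logits, modality)
    return inter + intra + 0.1 * adv  # 0.1 is an assumed weight


if __name__ == "__main__":
    G, D = Generator(), Discriminator()
    img = torch.randn(8, 4096)      # e.g. CNN image features
    txt = torch.randn(8, 300)       # e.g. text embeddings
    y = torch.randint(0, 5, (8,))   # shared semantic labels
    img_z, txt_z = G(img, txt)
    print(generator_loss(img_z, txt_z, y, D).item())
```

In the full CAGAN described above, the generator and discriminator would be trained alternately in the usual minimax fashion, with the co-attention mechanism aligning the two modalities before projection into the shared space.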



    Published In

    ICMSSP '19: Proceedings of the 2019 4th International Conference on Multimedia Systems and Signal Processing
    May 2019
    213 pages
    ISBN: 9781450371711
    DOI: 10.1145/3330393

    In-Cooperation

    • Shenzhen University
    • Sun Yat-Sen University

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. co-attention mechanism
    2. cross-modal entity resolution
    3. generative adversarial network

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Funding Sources

    • the China Postdoctoral Science Foundation
    • the National Natural Science Foundation of China

    Conference

    ICMSSP 2019

