
Semantic Structure Enhanced Contrastive Adversarial Hash Network for Cross-media Representation Learning

Published: 10 October 2022

Abstract

Deep cross-media hashing provides an efficient cross-media representation learning solution for cross-media search. However, existing methods do not jointly exploit fine-grained semantic features and semantic structures to mine implicit cross-media semantic associations, which weakens the semantic discrimination and consistency of the learned cross-media representations. To tackle this problem, we propose a novel semantic structure enhanced contrastive adversarial hash network for cross-media representation learning (SCAHN). First, to capture more fine-grained cross-media semantic associations, a fine-grained cross-media attention feature learning network is constructed, so that the learned salient features of the different modalities are more conducive to cross-media semantic alignment and fusion. Second, to further improve the learning of implicit cross-media semantic associations, a semantic label association graph is constructed and a graph convolutional network is used to mine the implicit semantic structures, thereby guiding the learning of discriminative features for each modality. Third, a cross-media and intra-media contrastive adversarial representation learning mechanism is proposed to further enhance the semantic discriminativeness of the modality-specific representations, and a dual-way adversarial learning strategy is developed to maximize cross-media semantic associations, yielding cross-media unified representations with stronger discriminative power and better semantic consistency. Extensive experiments on several cross-media benchmark datasets demonstrate that the proposed SCAHN outperforms state-of-the-art methods.
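
To make the label-graph component described above more concrete, the following is a minimal sketch (not the authors' code; the module names, layer sizes, GloVe-initialized label embeddings, and the toy data are illustrative assumptions): a two-layer graph convolutional network over a normalized label co-occurrence matrix produces semantic-structure-aware class embeddings, which can then score modality features and inject the label structure into feature learning.

```python
# Illustrative sketch of a semantic-label-graph branch (assumptions, not the
# paper's implementation): a two-layer GCN over a label co-occurrence graph
# maps word vectors of the class labels to class embeddings that guide the
# image/text feature learners.
import torch
import torch.nn as nn
import torch.nn.functional as F


class LabelGCN(nn.Module):
    def __init__(self, in_dim=300, hid_dim=512, out_dim=1024):
        super().__init__()
        self.w1 = nn.Linear(in_dim, hid_dim, bias=False)
        self.w2 = nn.Linear(hid_dim, out_dim, bias=False)

    def forward(self, label_emb, adj):
        # label_emb: (C, in_dim) word vectors of the C class labels (e.g. GloVe)
        # adj:       (C, C) normalized label co-occurrence adjacency matrix
        h = F.leaky_relu(adj @ self.w1(label_emb))
        return adj @ self.w2(h)                      # (C, out_dim) class embeddings


def normalize_adj(co_occurrence):
    # Symmetric normalization D^{-1/2} (A + I) D^{-1/2}, as in Kipf & Welling.
    a = co_occurrence + torch.eye(co_occurrence.size(0))
    d = a.sum(1).pow(-0.5)
    return d.unsqueeze(1) * a * d.unsqueeze(0)


# The class embeddings act as label-aware projections: scoring modality
# features against them yields label predictions that carry the graph structure.
gcn = LabelGCN()
adj = normalize_adj(torch.rand(24, 24))              # toy 24-label graph
class_emb = gcn(torch.randn(24, 300), adj)           # (24, 1024)
img_feat = torch.randn(8, 1024)                      # batch of image features
label_logits = img_feat @ class_emb.t()              # (8, 24) label scores
```

In the paper's setting, such structure-aware class embeddings would guide both the image and text branches toward label-consistent, discriminative features; the exact way the GCN output is combined with the attention features is described in the full paper.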

Supplementary Material

MP4 File (MM22-fp2969.mp4)
Nowadays, there are massive amounts of cross-media data on the Internet, and the descriptions carried by different media types are complementary, so cross-media search can surface more information across modalities than single-modality search. However, existing methods are weak in semantic discrimination and consistency for cross-media representation. To tackle this problem, we propose a novel semantic structure enhanced contrastive adversarial hash network for cross-media representation learning (SCAHN), which combines fine-grained attention-based semantic features with semantic structures to mine implicit cross-media semantic associations. In particular, we propose a cross-media and intra-media contrastive adversarial representation learning mechanism to further enhance the semantic discriminativeness of the modality-specific representations, and develop a dual-way adversarial learning strategy to maximize cross-media semantic associations. Extensive experiments on several benchmark datasets demonstrate that SCAHN outperforms state-of-the-art methods.
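
As a rough illustration of the contrastive adversarial mechanism mentioned above, the sketch below pairs a symmetric cross-media InfoNCE-style loss with a modality discriminator trained in a dual-way (min-max) fashion; the temperature, the discriminator architecture, and the loss weighting are assumptions rather than the paper's reported configuration.

```python
# Hedged sketch of a cross-media contrastive loss plus a dual-way adversarial
# alignment step; all hyper-parameters and module shapes are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F


def cross_media_contrastive(img_code, txt_code, tau=0.2):
    # Paired image/text features in a batch are positives; all other pairs
    # are negatives. Symmetric image->text and text->image directions.
    img = F.normalize(img_code, dim=1)
    txt = F.normalize(txt_code, dim=1)
    logits = img @ txt.t() / tau                     # (B, B) similarity matrix
    targets = torch.arange(img.size(0))
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))


class ModalityDiscriminator(nn.Module):
    # Predicts whether a representation came from the image or the text branch.
    def __init__(self, dim=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(),
                                 nn.Linear(dim, 1))

    def forward(self, x):
        return self.net(x).squeeze(1)


B, bits = 8, 64
img_code, txt_code = torch.randn(B, bits), torch.randn(B, bits)
disc = ModalityDiscriminator(bits)
real, fake = torch.ones(B), torch.zeros(B)

# Discriminator step: learn to tell image codes (label 1) from text codes (label 0).
d_loss = (F.binary_cross_entropy_with_logits(disc(img_code.detach()), real)
          + F.binary_cross_entropy_with_logits(disc(txt_code.detach()), fake))

# Encoder step (dual-way): align the two modalities by confusing the
# discriminator in both directions, together with the contrastive term.
g_loss = (cross_media_contrastive(img_code, txt_code)
          + F.binary_cross_entropy_with_logits(disc(img_code), fake)
          + F.binary_cross_entropy_with_logits(disc(txt_code), real))
```

The two losses would be optimized alternately, with d_loss updating the discriminator and g_loss updating the image and text encoders toward a common, semantically consistent representation space.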





    Published In

    MM '22: Proceedings of the 30th ACM International Conference on Multimedia
    October 2022
    7537 pages
    ISBN: 9781450392037
    DOI: 10.1145/3503161


    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 10 October 2022


    Author Tags

    1. contrastive adversarial hash network
    2. cross-media and intra-media contrastive learning
    3. cross-media representation learning
    4. cross-media search
    5. graph convolutional network

    Qualifiers

    • Research-article

    Funding Sources

    • National Natural Science Foundation of China
    • CAAI-Huawei MindSpore Open Fund

    Conference

    MM '22

    Acceptance Rates

    Overall Acceptance Rate 2,145 of 8,556 submissions, 25%

