DOI: 10.1145/3323873.3325029

Deep Semantic Space with Intra-class Low-rank Constraint for Cross-modal Retrieval

Published: 05 June 2019

Abstract

In this paper, a novel Deep Semantic Space learning model with an Intra-class Low-rank constraint (DSSIL) is proposed for cross-modal retrieval. The model is composed of two subnetworks for modality-specific representation learning, followed by projection layers that map into a common space. In particular, DSSIL takes semantic consistency into account to fuse the cross-modal data in a high-level common space, and constrains the common representation matrix of each class to be low-rank, so that the intra-class representations become more correlated. More formally, two regularization terms are devised for these two aspects and incorporated into the objective of DSSIL. To optimize the modality-specific subnetworks and the projection layers simultaneously by direct gradient descent, the nonconvex low-rank constraint is approximated by minimizing the few smallest singular values of each intra-class matrix, supported by theoretical analysis. Extensive experiments on three public datasets demonstrate the superiority of DSSIL for cross-modal retrieval compared with state-of-the-art methods.
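The low-rank surrogate described in the abstract can be sketched as follows: stack the common-space representations of each class into a matrix and penalize the sum of its k smallest singular values, which is small exactly when the intra-class matrix is (nearly) low-rank. This is a minimal illustrative sketch, not the authors' implementation; the function name, the choice of k, and the toy data are all assumptions.

```python
import numpy as np

def intra_class_lowrank_penalty(Z, labels, k=1):
    """Illustrative surrogate for the intra-class low-rank constraint:
    the sum, over classes, of the k smallest singular values of the
    matrix formed by stacking that class's common-space representations.

    Z      : (n_samples, dim) common-space representation matrix
    labels : (n_samples,) class labels
    k      : number of smallest singular values to penalize per class
    """
    penalty = 0.0
    for c in np.unique(labels):
        Zc = Z[labels == c]                       # intra-class matrix
        s = np.linalg.svd(Zc, compute_uv=False)   # singular values, descending
        penalty += s[-k:].sum() if len(s) >= k else s.sum()
    return penalty

# Toy check: class 0 is rank-1 (second row is twice the first), so its
# smallest singular value is numerically zero; class 1 is full-rank.
Z = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0],
              [1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])
labels = np.array([0, 0, 1, 1])
print(intra_class_lowrank_penalty(Z, labels, k=1))
```

Because the penalty involves only a few singular values rather than a rank function, it admits (sub)gradients and can be minimized jointly with the network parameters, which is the property the paper's optimization scheme relies on.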


    Published In

    ICMR '19: Proceedings of the 2019 on International Conference on Multimedia Retrieval
    June 2019
    427 pages
    ISBN:9781450367653
    DOI:10.1145/3323873

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. cross-modal retrieval
    2. deep neural networks
    3. intra-class low-rank
    4. semantic space

    Qualifiers

    • Research-article

    Funding Sources

    • National Natural Science Foundation of China
    • Guangzhou Science and Technology Planning Project
    • Guangdong Provincial Natural Science Foundation

    Conference

    ICMR '19

    Acceptance Rates

    Overall Acceptance Rate 254 of 830 submissions, 31%

    Cited By

    • Deep Cross-Modal Retrieval Between Spatial Image and Acoustic Speech. IEEE Transactions on Multimedia 26, 4480--4489 (2024). DOI: 10.1109/TMM.2023.3323876
    • Semantics Disentangling for Cross-Modal Retrieval. IEEE Transactions on Image Processing 33, 2226--2237 (2024). DOI: 10.1109/TIP.2024.3374111
    • Self-Supervised Correlation Learning for Cross-Modal Retrieval. IEEE Transactions on Multimedia 25, 2851--2863 (2023). DOI: 10.1109/TMM.2022.3152086
    • Deep fused two-step cross-modal hashing with multiple semantic supervision. Multimedia Tools and Applications 81(11), 15653--15670 (2022). DOI: 10.1007/s11042-022-12187-6
    • Intra-class low-rank regularization for supervised and semi-supervised cross-modal retrieval. Applied Intelligence 52(1), 33--54 (2022). DOI: 10.1007/s10489-021-02308-3
    • A Cross-Modal Image-Text Retrieval System with Deep Learning. In Advances in Artificial Intelligence and Security, 538--548 (2021). DOI: 10.1007/978-3-030-78615-1_47
    • Prototype-Based Discriminative Feature Representation for Class-incremental Cross-modal Retrieval. International Journal of Pattern Recognition and Artificial Intelligence 35(05), 2150018 (2020). DOI: 10.1142/S021800142150018X
    • Learning discriminative hashing codes for cross-modal retrieval based on multi-view features. Pattern Analysis and Applications 23(3), 1421--1438 (2020). DOI: 10.1007/s10044-020-00870-z
