
Deep contrastive representation learning for multi-modal clustering

Published: 09 July 2024

Abstract

Benefiting from the informative representations produced by contrastive representation learning (CRL), recent multi-modal learning studies have achieved promising clustering performance. However, existing CRL-based multi-modal clustering methods fail to simultaneously exploit the similarity information embedded at both the inter- and intra-modal levels. In this study, we explore deep multi-modal contrastive representation learning and present a multi-modal learning network, named trustworthy multi-modal contrastive clustering (TMCC), which integrates contrastive learning and adaptively reliable sample selection into multi-modal clustering. Specifically, we design an adaptive filter that trains TMCC by progressing from ‘easy’ to ‘complex’ samples. Building on this, we use highly confident clustering labels to define a new contrastive loss for learning a modal-consensus representation, which accounts for not only the inter-modal similarity but also the intra-modal similarity. Experimental results show that these principles consistently improve the clustering performance of TMCC.
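The abstract does not give TMCC's exact loss, but the idea of combining an inter-modal term (aligning the two views of the same sample) with an intra-modal term (pulling together same-pseudo-label samples within each modality) can be sketched as follows. This is an illustrative NumPy sketch under assumptions, not the authors' implementation; the function names, the temperature value, and the use of plain InfoNCE for the inter-modal part are all choices made here for clarity.

```python
import numpy as np

def info_nce(anchors, positives, temperature=0.5):
    """InfoNCE: each anchor's positive is the same-index row of `positives`;
    the remaining rows serve as negatives."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature                 # cosine-similarity logits
    logits -= logits.max(axis=1, keepdims=True)    # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))

def multimodal_contrastive_loss(z1, z2, labels, temperature=0.5):
    """Inter-modal term aligns paired embeddings across modalities;
    intra-modal term treats same-pseudo-label samples as positives
    within each modality (hypothetical combination, not TMCC's exact loss)."""
    inter = info_nce(z1, z2, temperature) + info_nce(z2, z1, temperature)
    intra = 0.0
    for z in (z1, z2):
        zn = z / np.linalg.norm(z, axis=1, keepdims=True)
        sim = np.exp(zn @ zn.T / temperature)
        np.fill_diagonal(sim, 0.0)                 # exclude self-similarity
        same = (labels[:, None] == labels[None, :]) & ~np.eye(len(labels), dtype=bool)
        pos = (sim * same).sum(axis=1)             # same-label (positive) mass
        denom = sim.sum(axis=1)                    # all-pairs mass
        mask = same.any(axis=1)                    # need at least one same-label peer
        intra += -np.mean(np.log(pos[mask] / denom[mask]))
    return inter + intra

# Toy example: two modalities of 8 samples, pseudo-labels from a clustering step.
rng = np.random.default_rng(0)
z1 = rng.normal(size=(8, 4))
z2 = z1 + 0.1 * rng.normal(size=(8, 4))           # correlated second modality
labels = np.array([0, 0, 1, 1, 2, 2, 3, 3])
loss = multimodal_contrastive_loss(z1, z2, labels)
print(float(loss))
```

In a curriculum-style setup like the one the abstract describes, only samples whose pseudo-labels pass a confidence filter would contribute to the intra-modal term, with the filter loosened as training progresses.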



Published In

Neurocomputing, Volume 581, Issue C, May 2024, 326 pages

Publisher

Elsevier Science Publishers B. V., Netherlands

Author Tags

  1. Multi-view representation learning
  2. Self-supervision
  3. Clustering

Qualifiers

  • Research-article
