Article

Graph based multi-modality learning

Authors:

Changshui Zhang,

Wei-Ying Ma

MULTIMEDIA '05: Proceedings of the 13th annual ACM international conference on Multimedia

Pages 862 - 871

https://doi.org/10.1145/1101149.1101337

Published: 06 November 2005

Abstract

To better understand multimedia content, much research effort has gone into learning from multi-modal features. In this paper, the problem is studied from a graph point of view: the features from each modality are represented as an independent graph, and the learning task is formulated as inference from the constraints in every graph together with supervision information (if available). For semi-supervised learning, two fusion schemes are proposed, namely a linear form and a sequential form. Each scheme is derived from an optimization point of view and further justified from two perspectives: similarity propagation and Bayesian interpretation. In doing so, we reveal the regular optimization nature, the transductive learning nature, and the prior fusion nature of the proposed schemes, respectively. Moreover, the proposed method can easily be extended to unsupervised learning, including clustering and embedding. Systematic experimental results validate the effectiveness of the proposed method.
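To make the idea of a linear fusion of per-modality graphs concrete, the following is a minimal Python/NumPy sketch. It assumes a Gaussian-kernel graph for each modality, the symmetric normalized graph Laplacian, and a quadratic objective in the spirit of Zhou et al.'s "learning with local and global consistency" label propagation: a weighted sum of per-graph smoothness terms plus a term that keeps predictions close to the given labels. The function names, kernel width `sigma`, fusion weights, regularization parameter `mu`, and toy data are all hypothetical choices for illustration; this is not claimed to be the paper's exact formulation, and the sequential form is not shown.

```python
import numpy as np

# Hypothetical sketch of linear-form graph fusion; not the paper's exact objective.


def gaussian_affinity(X, sigma=1.0):
    """Dense Gaussian-kernel affinity matrix (zero diagonal) for one modality."""
    X = np.asarray(X, dtype=float)
    sq = np.sum(X ** 2, axis=1)
    # Squared Euclidean distances; clip tiny negatives caused by round-off.
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    W = np.exp(-d2 / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    return W


def normalized_laplacian(W):
    """Symmetric normalized Laplacian L = I - D^{-1/2} W D^{-1/2}; assumes no isolated nodes."""
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    S = d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    return np.eye(W.shape[0]) - S


def linear_fusion_propagate(laplacians, weights, Y, mu=0.1):
    """Minimize sum_k w_k * tr(F^T L_k F) + mu * ||F - Y||_F^2 over F.

    Setting the gradient to zero gives (sum_k w_k L_k + mu I) F = mu Y,
    which is solved directly as a dense linear system.
    """
    n = Y.shape[0]
    A = sum(w * L for w, L in zip(weights, laplacians)) + mu * np.eye(n)
    return mu * np.linalg.solve(A, Y)


# Hypothetical toy data: 6 objects, two modalities, one labeled object per class.
text_feats = np.array([[0.0], [0.1], [0.2], [5.0], [5.1], [5.2]])
image_feats = np.array([[0, 0], [0, 0.2], [0.1, 0.1], [4, 4], [4, 4.2], [4.1, 4.1]], dtype=float)
Y = np.zeros((6, 2))
Y[0, 0] = 1.0  # object 0 labeled as class 0
Y[3, 1] = 1.0  # object 3 labeled as class 1

laplacians = [normalized_laplacian(gaussian_affinity(X)) for X in (text_feats, image_feats)]
F = linear_fusion_propagate(laplacians, weights=[0.5, 0.5], Y=Y)
predicted = F.argmax(axis=1)
print(predicted)  # expected: [0 0 0 1 1 1]
```

Setting one fusion weight to zero reduces the sketch to single-graph propagation, which gives a simple baseline for comparing fused and single-modality predictions.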

Index Terms

Graph based multi-modality learning
1. Information systems
  1. Information retrieval
    1. Document representation

Published In

MULTIMEDIA '05: Proceedings of the 13th annual ACM international conference on Multimedia

November 2005

1110 pages

ISBN:1595930442

DOI:10.1145/1101149

General Chairs:
Hongjiang Zhang, Microsoft Research Asia, China
Tat-Seng Chua, National University of Singapore, Singapore

Program Chairs:
Ralf Steinmetz, Technische Universität Darmstadt, Germany
Mohan Kankanhalli, National University of Singapore, Singapore
Lynn Wilcox, FXPAL

Copyright © 2005 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Article

Conference

MM05

Sponsor:

MM05: 2005 13th Annual ACM International Conference on Multimedia

November 6 - 11, 2005

Hilton, Singapore

Acceptance Rates

MULTIMEDIA '05 Paper Acceptance Rate 49 of 312 submissions, 16%;

Overall Acceptance Rate 2,145 of 8,556 submissions, 25%

Bibliometrics

Total Citations: 72

Total Downloads: 1,165

Downloads (Last 12 months): 63

Downloads (Last 6 weeks): 4

Reflects downloads up to 05 Mar 2025