DOI:10.1145/1101149.1101337
Article

Graph based multi-modality learning

Published: 06 November 2005

Abstract

To better understand multimedia content, considerable research effort has gone into learning from multi-modal features. This paper studies the problem from a graph point of view: the features from each modality are represented as one independent graph, and the learning task is formulated as inference from the constraints in every graph together with supervision information (if available). For semi-supervised learning, two fusion schemes are proposed, a linear form and a sequential form. Each scheme is derived from an optimization point of view and further justified from two sides: similarity propagation and Bayesian interpretation. These three views reveal, respectively, the regularized optimization nature, the transductive learning nature, and the prior fusion nature of the proposed schemes. The method also extends readily to unsupervised learning, including clustering and embedding. Systematic experiments validate its effectiveness.
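
As a rough illustration of the linear fusion idea in the abstract, the sketch below builds one affinity graph per modality, combines the normalized graphs with fixed weights, and spreads the known labels with an iterative similarity-propagation update. It is a minimal sketch under stated assumptions, not the paper's formulation: the update rule is the familiar local-and-global-consistency style of label propagation, and the function names, the toy graphs, the fusion weights, and the parameter `alpha` are all hypothetical choices made for the example.

```python
import math


def normalize(W):
    """Symmetric normalization S = D^{-1/2} W D^{-1/2} of an affinity matrix."""
    n = len(W)
    deg = [sum(row) for row in W]
    return [
        [
            W[i][j] / math.sqrt(deg[i] * deg[j]) if deg[i] > 0 and deg[j] > 0 else 0.0
            for j in range(n)
        ]
        for i in range(n)
    ]


def fuse_linear(graphs, weights):
    """Weighted sum of the normalized graphs, one graph per modality.

    Assumption: the weights are non-negative and sum to 1.
    """
    n = len(graphs[0])
    fused = [[0.0] * n for _ in range(n)]
    for W, w in zip(graphs, weights):
        S = normalize(W)
        for i in range(n):
            for j in range(n):
                fused[i][j] += w * S[i][j]
    return fused


def propagate(S, y, alpha=0.9, iters=200):
    """Iterate f <- alpha * S f + (1 - alpha) * y for a fixed number of steps."""
    f = list(y)
    n = len(f)
    for _ in range(iters):
        f = [
            alpha * sum(S[i][j] * f[j] for j in range(n)) + (1 - alpha) * y[i]
            for i in range(n)
        ]
    return f


# Toy data (hypothetical): four items, two modalities.
# Item 0 is labeled +1, item 3 is labeled -1, items 1 and 2 are unlabeled.
W_visual = [
    [0.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
    [0.0, 0.0, 1.0, 0.0],
]
W_text = [
    [0.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 0.2, 0.0],
    [0.0, 0.2, 0.0, 1.0],
    [0.0, 0.0, 1.0, 0.0],
]
y = [1.0, 0.0, 0.0, -1.0]

S = fuse_linear([W_visual, W_text], [0.5, 0.5])
scores = propagate(S, y)
print([1 if s > 0 else -1 for s in scores])  # prints [1, 1, -1, -1]
```

Each unlabeled item takes the sign of the labeled item it is most strongly connected to across both graphs. Changing the fusion weights shifts how much each modality contributes to that decision.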

    Published In

    MULTIMEDIA '05: Proceedings of the 13th annual ACM international conference on Multimedia
    November 2005
    1110 pages
    ISBN:1595930442
    DOI:10.1145/1101149
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. Bayesian interpretation
    2. graph model
    3. multi-modality analysis
    4. regularized optimization
    5. similarity propagation

    Qualifiers

    • Article

    Conference

    MM05

    Acceptance Rates

    MULTIMEDIA '05 Paper Acceptance Rate 49 of 312 submissions, 16%;
    Overall Acceptance Rate 2,145 of 8,556 submissions, 25%

    Bibliometrics

    • Downloads (last 12 months): 64
    • Downloads (last 6 weeks): 5
    Reflects downloads up to 19 Dec 2024

    Cited By

    • (2023) Keyword-Based Diverse Image Retrieval by Semantics-aware Contrastive Learning and Transformer. Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 1262-1272. DOI:10.1145/3539618.3591705. Online publication date: 19-Jul-2023
    • (2023) Keyword-Based Diverse Image Retrieval With Variational Multiple Instance Graph. IEEE Transactions on Neural Networks and Learning Systems, 34:12, pp. 10528-10537. DOI:10.1109/TNNLS.2022.3168431. Online publication date: Dec-2023
    • (2022) Bagging-based cross-media retrieval algorithm. Soft Computing - A Fusion of Foundations, Methodologies and Applications, 27:5, pp. 2615-2623. DOI:10.1007/s00500-022-07587-7. Online publication date: 14-Nov-2022
    • (2020) Bipartite Graph based Multi-view Clustering. IEEE Transactions on Knowledge and Data Engineering, pp. 1-1. DOI:10.1109/TKDE.2020.3021649. Online publication date: 2020
    • (2020) Tensorized Multi-view Subspace Representation Learning. International Journal of Computer Vision, 128:8-9, pp. 2344-2361. DOI:10.1007/s11263-020-01307-0. Online publication date: 1-Sep-2020
    • (2019) Large-Scale Gender/Age Prediction of Tumblr Users. 2019 18th IEEE International Conference On Machine Learning And Applications (ICMLA), pp. 712-717. DOI:10.1109/ICMLA.2019.00128. Online publication date: Dec-2019
    • (2019) Label guided correlation hashing for large-scale cross-modal retrieval. Multimedia Tools and Applications, 78:21, pp. 30895-30922. DOI:10.1007/s11042-019-7192-5. Online publication date: 6-Feb-2019
    • (2019) Learning to rank with relational graph and pointwise constraint for cross-modal retrieval. Soft Computing - A Fusion of Foundations, Methodologies and Applications, 23:19, pp. 9413-9427. DOI:10.1007/s00500-018-3608-9. Online publication date: 1-Oct-2019
    • (2018) An Interactive Visual Analytics Platform for Smart Intelligent Transportation Systems Management. IEEE Transactions on Intelligent Transportation Systems, 19:2, pp. 487-496. DOI:10.1109/TITS.2017.2727143. Online publication date: 1-Feb-2018
    • (2018) An Overview of Cross-Media Retrieval. IEEE Transactions on Circuits and Systems for Video Technology, 28:9, pp. 2372-2385. DOI:10.1109/TCSVT.2017.2705068. Online publication date: 1-Sep-2018
