Abstract
Zero-shot learning (ZSL) requires associating the visual and semantic information observed from data of seen classes, so that test data of unseen classes can be recognized based on their semantic representations. Aiming at synthesizing visual data from given semantic inputs, hallucination-based ZSL approaches may suffer from mode collapse and bias problems due to their limited ability to model the desirable visual features of unseen categories. In this paper, we present a generative model, Cross-Modal Consistency GAN (CMC-GAN), which performs semantics-guided intra-category knowledge transfer across image categories, so that data hallucination for unseen classes can be achieved with proper semantics and sufficient visual diversity. In our experiments, we perform standard and generalized ZSL on four benchmark datasets, confirming the effectiveness of our approach over state-of-the-art ZSL methods.
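To make the hallucination setting concrete, the following is a minimal sketch of conditional visual-feature generation for ZSL in the spirit of feature-generating GANs. It is not the authors' CMC-GAN: the layer sizes, dimensions, and sampling step are illustrative assumptions only.

```python
# Minimal sketch of semantics-conditioned feature hallucination for ZSL.
# NOT the authors' CMC-GAN; all hyperparameters here are assumed for illustration.
import torch
import torch.nn as nn

FEAT_DIM, SEM_DIM, NOISE_DIM = 2048, 312, 128  # e.g., ResNet features, attribute vector

class Generator(nn.Module):
    """Hallucinates a visual feature from a class-semantic vector plus noise."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(SEM_DIM + NOISE_DIM, 4096), nn.LeakyReLU(0.2),
            nn.Linear(4096, FEAT_DIM), nn.ReLU(),  # CNN features are non-negative
        )
    def forward(self, semantics, noise):
        return self.net(torch.cat([semantics, noise], dim=1))

class Discriminator(nn.Module):
    """Scores (feature, semantics) pairs; real pairs should score higher."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(FEAT_DIM + SEM_DIM, 4096), nn.LeakyReLU(0.2),
            nn.Linear(4096, 1),
        )
    def forward(self, features, semantics):
        return self.net(torch.cat([features, semantics], dim=1))

# After adversarial training on seen classes, the generator is conditioned on
# *unseen*-class semantics to synthesize features for a standard classifier.
G = Generator()
sem_unseen = torch.rand(64, SEM_DIM)   # placeholder unseen-class semantic vectors
z = torch.randn(64, NOISE_DIM)
fake_feats = G(sem_unseen, z)          # 64 hallucinated visual features
print(fake_feats.shape)                # torch.Size([64, 2048])
```

Without a consistency constraint tying the synthesized features back to their conditioning semantics, a generator like this can collapse to a few modes per class, which is the failure mode the paper's cross-modal consistency objective is designed to mitigate.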