Abstract
Zero-shot learning (ZSL) requires associating the visual and semantic information observed from data of seen classes, so that test data of unseen classes can be recognized based on their semantic representations. Aiming at synthesizing visual data from given semantic inputs, hallucination-based ZSL approaches may suffer from mode collapse and bias problems due to their limited ability to model the desirable visual features of unseen categories. In this paper, we present a generative model, Cross-Modal Consistency GAN (CMC-GAN), which performs semantics-guided intra-category knowledge transfer across image categories, so that data hallucination for unseen classes can be achieved with proper semantics and sufficient visual diversity. In our experiments, we perform standard and generalized ZSL on four benchmark datasets, confirming the effectiveness of our approach over state-of-the-art ZSL methods.
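To make the hallucination setting concrete, the following is a minimal sketch of conditional visual-feature generation for ZSL in the spirit of feature-generating GANs. It is not the authors' CMC-GAN: the layer sizes, dimensions, and sampling step are illustrative assumptions only.

```python
# Minimal sketch of semantics-conditioned feature hallucination for ZSL.
# NOT the authors' CMC-GAN; all hyperparameters here are assumed for illustration.
import torch
import torch.nn as nn

FEAT_DIM, SEM_DIM, NOISE_DIM = 2048, 312, 128  # e.g., ResNet features, attribute vector

class Generator(nn.Module):
    """Hallucinates a visual feature from a class-semantic vector plus noise."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(SEM_DIM + NOISE_DIM, 4096), nn.LeakyReLU(0.2),
            nn.Linear(4096, FEAT_DIM), nn.ReLU(),  # CNN features are non-negative
        )
    def forward(self, semantics, noise):
        return self.net(torch.cat([semantics, noise], dim=1))

class Discriminator(nn.Module):
    """Scores (feature, semantics) pairs; real pairs should score higher."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(FEAT_DIM + SEM_DIM, 4096), nn.LeakyReLU(0.2),
            nn.Linear(4096, 1),
        )
    def forward(self, features, semantics):
        return self.net(torch.cat([features, semantics], dim=1))

# After adversarial training on seen classes, the generator is conditioned on
# *unseen*-class semantics to synthesize features for a standard classifier.
G = Generator()
sem_unseen = torch.rand(64, SEM_DIM)   # placeholder unseen-class semantic vectors
z = torch.randn(64, NOISE_DIM)
fake_feats = G(sem_unseen, z)          # 64 hallucinated visual features
print(fake_feats.shape)                # torch.Size([64, 2048])
```

Without a consistency constraint tying the synthesized features back to their conditioning semantics, a generator like this can collapse to a few modes per class, which is the failure mode the paper's cross-modal consistency objective is designed to mitigate.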