
Inducing semantic hierarchy structure in empirical risk minimization with optimal transport measures

Published: 14 April 2023

Abstract

The cross-entropy (CE) loss is arguably the most important empirical risk minimization objective for deep discriminative classification models, and it has achieved notable success in numerous applications. Although widely adopted, the CE loss essentially ignores the correlation between categories. For example, misclassifying a shepherd dog as a husky is more acceptable for subsequent decision processes than misclassifying it as a tiger, yet both errors incur the same CE loss. The CE loss therefore fails to incorporate the risk of misclassification between different categories, a risk that can be measured by the distance between the predicted and ground-truth categories in a semantic hierarchical tree (SHT). In this work, we explicitly take this SHT-defined, risk-aware inter-categorical correlation into account by proposing a discrete optimal transport (DOT) training framework whose ground distance matrix is configured accordingly. The ground distance matrix of the optimal transport measure can be predefined following a prior of hierarchical semantic risk; specifically, we adopt the tree-induced error (TIE) on the SHT as our ground distance matrix. Furthermore, from the optimization perspective, the framework extends to any increasing function of the TIE. In addition, the matrix can be learned adaptively via an alternating optimization scheme, in which the semantic similarity at each level of the tree is integrated with the information gain. We demonstrate the effectiveness of our framework on several large-scale image classification benchmarks with semantic tree structure, showing superior performance in a plug-and-play manner. The code is available at: https://anonymous.4open.science/r/OTM-Neurocomputing/
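To make the mechanism concrete, the sketch below shows one way the TIE ground distance and the resulting DOT loss could be implemented. This is a minimal PyTorch illustration under stated assumptions, not the authors' released code: the function names, the toy hierarchy, and the restriction to one-hot ground-truth distributions (under which the OT cost collapses to the expected ground distance, since the optimal plan must move all predicted mass to the true class) are all assumptions for exposition.

```python
import torch

def tie_ground_distance(paths):
    """Ground distance matrix D where D[i, j] is the tree-induced
    error (TIE): the number of edges between classes i and j in the
    semantic hierarchy tree. paths[i] lists the nodes from the root
    down to class i, inclusive."""
    n = len(paths)
    D = torch.zeros(n, n)
    for i in range(n):
        for j in range(n):
            # Shared nodes form the root-to-LCA segment, so the
            # edge count is |path_i| + |path_j| - 2 * |shared|.
            shared = len(set(paths[i]) & set(paths[j]))
            D[i, j] = len(paths[i]) + len(paths[j]) - 2 * shared
    return D

def dot_loss(probs, targets, D):
    """Discrete OT loss against one-hot targets: with all target
    mass on one class, the optimal plan is forced, and the OT cost
    equals the expected ground distance under the prediction."""
    return (probs * D[targets]).sum(dim=1).mean()

# Hypothetical toy hierarchy mirroring the abstract's example:
# root -> canine -> {shepherd, husky}; root -> tiger.
paths = [["root", "canine", "shepherd"],
         ["root", "canine", "husky"],
         ["root", "tiger"]]
D = tie_ground_distance(paths)  # D[shepherd, husky] = 2 < D[shepherd, tiger] = 3
probs = torch.softmax(torch.randn(4, 3), dim=1)  # predicted class distributions
targets = torch.tensor([0, 1, 2, 0])             # ground-truth labels
loss = dot_loss(probs, targets, D)  # errors near the truth are penalized less
```

Replacing D with an increasing function of the TIE, or learning it adaptively as described above, only changes how D is built; for non-degenerate target distributions one would instead solve the full OT problem, e.g. with entropic regularization (Sinkhorn iterations).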



Published In

Neurocomputing, Volume 530, Issue C
Apr 2023
206 pages

Publisher

Elsevier Science Publishers B. V.

Netherlands


Author Tags

  1. Optimal transport distance
  2. Risk minimization
  3. Semantic hierarchy
  4. Wasserstein distance

Qualifiers

  • Research-article
