
Inducing semantic hierarchy structure in empirical risk minimization with optimal transport measures

Published: 14 April 2023

Abstract

The cross-entropy (CE) loss is arguably the most important empirical risk minimization objective for deep discriminative classification models, and it has achieved notable success in numerous applications. Although widely adopted, the CE loss essentially ignores the correlation between categories. For example, misclassifying a shepherd dog as a husky is more acceptable for subsequent decision processes than misclassifying it as a tiger, yet both errors incur the same CE loss. The CE loss therefore fails to incorporate the risk of misclassification between different categories, a risk that can be measured by the distance between the predicted and ground-truth categories in a semantic hierarchical tree (SHT). In this work, we explicitly take this SHT-defined, risk-aware inter-categorical correlation into account by proposing a discrete optimal transport (DOT) training framework whose ground distance matrix is configured accordingly. The ground distance matrix of the optimal transport measure can be predefined following a prior of hierarchical semantic risk; specifically, we adopt the tree-induced error (TIE) on the SHT as our ground distance matrix. Furthermore, from the optimization perspective, the framework extends to any increasing function of the TIE. In addition, the matrix can be learned adaptively via an alternating optimization scheme, in which the semantic similarity at each level of the tree is integrated with the information gain. We demonstrate the effectiveness of our framework on several large-scale image classification benchmarks with semantic tree structure, showing superior performance in a plug-and-play manner. The code is available at: https://anonymous.4open.science/r/OTM-Neurocomputing/
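To make the mechanism concrete, the sketch below shows one way the TIE ground distance and the resulting DOT loss could be implemented. This is a minimal PyTorch illustration under stated assumptions, not the authors' released code: the function names, the toy hierarchy, and the restriction to one-hot ground-truth distributions (under which the OT cost collapses to the expected ground distance, since the optimal plan must move all predicted mass to the true class) are all assumptions for exposition.

```python
import torch

def tie_ground_distance(paths):
    """Ground distance matrix D where D[i, j] is the tree-induced
    error (TIE): the number of edges between classes i and j in the
    semantic hierarchy tree. paths[i] lists the nodes from the root
    down to class i, inclusive."""
    n = len(paths)
    D = torch.zeros(n, n)
    for i in range(n):
        for j in range(n):
            # Shared nodes form the root-to-LCA segment, so the
            # edge count is |path_i| + |path_j| - 2 * |shared|.
            shared = len(set(paths[i]) & set(paths[j]))
            D[i, j] = len(paths[i]) + len(paths[j]) - 2 * shared
    return D

def dot_loss(probs, targets, D):
    """Discrete OT loss against one-hot targets: with all target
    mass on one class, the optimal plan is forced, and the OT cost
    equals the expected ground distance under the prediction."""
    return (probs * D[targets]).sum(dim=1).mean()

# Hypothetical toy hierarchy mirroring the abstract's example:
# root -> canine -> {shepherd, husky}; root -> tiger.
paths = [["root", "canine", "shepherd"],
         ["root", "canine", "husky"],
         ["root", "tiger"]]
D = tie_ground_distance(paths)  # D[shepherd, husky] = 2 < D[shepherd, tiger] = 3
probs = torch.softmax(torch.randn(4, 3), dim=1)  # predicted class distributions
targets = torch.tensor([0, 1, 2, 0])             # ground-truth labels
loss = dot_loss(probs, targets, D)  # errors near the truth are penalized less
```

Replacing D with an increasing function of the TIE, or learning it adaptively as described above, only changes how D is built; for non-degenerate target distributions one would instead solve the full OT problem, e.g. with entropic regularization (Sinkhorn iterations).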



Published In

Neurocomputing, Volume 530, Issue C
Apr 2023
206 pages

Publisher

Elsevier Science Publishers B. V.

Netherlands


Author Tags

  1. Optimal transport distance
  2. Risk minimization
  3. Semantic hierarchy
  4. Wasserstein distance

Qualifiers

  • Research-article
