
Searching to exploit memorization effect in learning with noisy labels

Published: 13 July 2020

Abstract

Sample selection approaches are popular in robust learning from noisy labels. However, properly controlling the selection process so that deep networks can benefit from the memorization effect is a hard problem. In this paper, motivated by the success of automated machine learning (AutoML), we model this issue as a function approximation problem. Specifically, we design a domain-specific search space based on general patterns of the memorization effect and propose a novel Newton algorithm to solve the resulting bi-level optimization problem efficiently. We further provide a theoretical analysis of the algorithm, which ensures a good approximation to critical points. Experiments are performed on benchmark data sets. The results demonstrate that the proposed method is much better than state-of-the-art noisy-label-learning approaches and much more efficient than existing AutoML algorithms.
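
For readers unfamiliar with sample selection, the sketch below illustrates the kind of selection step the abstract refers to: at each update, only a fraction of the samples with the smallest loss is trained on, and that fraction follows a schedule R(t) designed to exploit the memorization effect (networks tend to fit clean patterns before noisy ones). The names selection_rate and small_loss_step are hypothetical illustrations; the paper's contribution is to search for such a schedule automatically via bi-level optimization, which is not shown here.

```python
# Minimal sketch of small-loss sample selection, assuming PyTorch.
# `selection_rate` is a hand-written stand-in for the searched schedule R(t).
import torch
import torch.nn.functional as F


def selection_rate(epoch: int, warmup_epochs: int = 10, noise_rate: float = 0.2) -> float:
    """Hypothetical schedule: keep all samples at first (clean patterns are
    memorized early), then linearly drop the kept fraction to 1 - noise_rate."""
    return 1.0 - noise_rate * min(1.0, epoch / max(1, warmup_epochs))


def small_loss_step(model, optimizer, images, labels, keep_ratio: float):
    """One update that trains only on the keep_ratio fraction of samples
    with the smallest loss (the ones most likely to be correctly labeled)."""
    losses = F.cross_entropy(model(images), labels, reduction="none")
    num_keep = max(1, int(keep_ratio * labels.numel()))
    keep_idx = torch.argsort(losses)[:num_keep]  # indices of small-loss samples
    optimizer.zero_grad()
    losses[keep_idx].mean().backward()
    optimizer.step()
```

In use, keep_ratio would be recomputed once per epoch, e.g. keep_ratio = selection_rate(epoch), and small_loss_step applied to every mini-batch; how that schedule should be shaped is exactly the control problem the paper formulates and searches over.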

Supplementary Material

Supplemental material (PDF): 3524938.3525938_supp.pdf

Information & Contributors

Information

Published In

cover image Guide Proceedings
ICML'20: Proceedings of the 37th International Conference on Machine Learning
July 2020
11702 pages

Publisher

JMLR.org
