
Searching to exploit memorization effect in learning with noisy labels

Published: 13 July 2020

Abstract

Sample selection approaches are popular in robust learning from noisy labels. However, properly controlling the selection process so that deep networks can benefit from the memorization effect is a hard problem. In this paper, motivated by the success of automated machine learning (AutoML), we model this issue as a function approximation problem. Specifically, we design a domain-specific search space based on general patterns of the memorization effect and propose a novel Newton algorithm to solve the resulting bi-level optimization problem efficiently. We further provide a theoretical analysis of the algorithm, which ensures a good approximation to critical points. Experiments are performed on benchmark data sets. The results demonstrate that the proposed method is much better than state-of-the-art noisy-label-learning approaches and much more efficient than existing AutoML algorithms.
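
For readers unfamiliar with sample selection, the sketch below illustrates the kind of selection step the abstract refers to: at each update, only a fraction of the samples with the smallest loss is trained on, and that fraction follows a schedule R(t) designed to exploit the memorization effect (networks tend to fit clean patterns before noisy ones). The names selection_rate and small_loss_step are hypothetical illustrations; the paper's contribution is to search for such a schedule automatically via bi-level optimization, which is not shown here.

```python
# Minimal sketch of small-loss sample selection, assuming PyTorch.
# `selection_rate` is a hand-written stand-in for the searched schedule R(t).
import torch
import torch.nn.functional as F


def selection_rate(epoch: int, warmup_epochs: int = 10, noise_rate: float = 0.2) -> float:
    """Hypothetical schedule: keep all samples at first (clean patterns are
    memorized early), then linearly drop the kept fraction to 1 - noise_rate."""
    return 1.0 - noise_rate * min(1.0, epoch / max(1, warmup_epochs))


def small_loss_step(model, optimizer, images, labels, keep_ratio: float):
    """One update that trains only on the keep_ratio fraction of samples
    with the smallest loss (the ones most likely to be correctly labeled)."""
    losses = F.cross_entropy(model(images), labels, reduction="none")
    num_keep = max(1, int(keep_ratio * labels.numel()))
    keep_idx = torch.argsort(losses)[:num_keep]  # indices of small-loss samples
    optimizer.zero_grad()
    losses[keep_idx].mean().backward()
    optimizer.step()
```

In use, keep_ratio would be recomputed once per epoch, e.g. keep_ratio = selection_rate(epoch), and small_loss_step applied to every mini-batch; how that schedule should be shaped is exactly the control problem the paper formulates and searches over.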

Supplementary Material

Supplemental material (PDF): 3524938.3525938_supp.pdf

Information & Contributors

Information

Published In

cover image Guide Proceedings
ICML'20: Proceedings of the 37th International Conference on Machine Learning
July 2020
11702 pages

Publisher

JMLR.org
