Open access
Author
Date
2020-06
Type
Doctoral Thesis
ETH Bibliography
yes
Abstract
Optimization is the key component of deep learning. Increasing depth, which is
vital for deep neural networks to reach good performance, has become feasible only
through recent advanced optimization techniques, including batch-normalization. Despite
the substantial empirical benefits of these techniques, their inner mechanisms
are not theoretically understood. This thesis contributes to the understanding
of optimization for neural networks. In particular, we make the following four
contributions:
i. We study batch-normalization (BN), which is a breakthrough in deep learning.
Understanding BN can provide insights into the optimization of
deep neural networks, since it was specifically developed for deep nets. Leveraging
tools from Markov chain theory and ergodic theory, we prove that
BN avoids rank collapse in deep neural networks. Through a set of extensive
experiments, we highlight the important role of the rank in the optimization of
deep nets.
ii. We show how stochastic gradient descent (SGD) can benefit from its
stochastic approximation of the gradient in the context of neural networks.
Although the noise of SGD slows convergence in convex optimization, this
noise is advantageous in training neural networks, as it facilitates escaping
saddle points.
iii. We introduce the framework of local saddle point optimization for neural networks,
and underline an important barrier for gradient-based saddle-point
methods: the existence of undesired saddle points that are stable attractors
of the gradient dynamics. We show how exploiting minimal
second-order information allows gradient-based methods to escape these undesired
saddle points.
iv. We highlight the important role of statistical accuracy (generalization error)
in optimization for machine learning applications. Leveraging the adaptive
sample-size technique, we design statistically aware optimization methods.
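The intuition behind contribution ii can be illustrated with a toy example. The sketch below is not taken from the thesis: the function f(x, y) = 0.5 x^2 - 0.5 y^2, the isotropic Gaussian noise model, the step size, and the name `descend` are all assumptions chosen for illustration. Plain gradient descent started exactly on the x-axis converges to the saddle at the origin, whereas a small perturbation of the gradient pushes the iterate off the axis and away from the saddle.

```python
import random


def grad(x, y):
    # Gradient of the toy function f(x, y) = 0.5 * x**2 - 0.5 * y**2,
    # which has a strict saddle point at the origin.
    return x, -y


def descend(steps=200, lr=0.1, noise_std=0.0, seed=0):
    # Gradient descent with optional Gaussian gradient noise.
    # Assumed noise model: isotropic, zero-mean, fixed standard deviation.
    rng = random.Random(seed)
    x, y = 1.0, 0.0  # start on the axis that leads straight into the saddle
    for _ in range(steps):
        gx, gy = grad(x, y)
        gx += rng.gauss(0.0, noise_std)
        gy += rng.gauss(0.0, noise_std)
        x -= lr * gx
        y -= lr * gy
    return x, y


x_gd, y_gd = descend(noise_std=0.0)     # noiseless: y never leaves 0
x_sgd, y_sgd = descend(noise_std=0.01)  # noisy: y is amplified at every step
print("gradient descent:", x_gd, y_gd)
print("noisy descent:   ", x_sgd, y_sgd)
```

Because the toy function is unbounded below, "escaping" here only means leaving the neighborhood of the saddle; it says nothing about the quality of the point eventually reached in a real network.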
Permanent link
https://doi.org/10.3929/ethz-b-000421981
Publication status
published
External links
Search print copy at ETH Library
Publisher
ETH Zurich
Organisational unit
09462 - Hofmann, Thomas / Hofmann, Thomas