Rychener Y, Kuhn D and Sutter T. End-to-end learning for stochastic optimization. Proceedings of the 40th International Conference on Machine Learning. (29455-29472).

Chen L, Wang H, Zhao J, Koutris P and Papailiopoulos D. The effect of network width on the performance of large-batch training. Proceedings of the 32nd International Conference on Neural Information Processing Systems. (9322-9332).

Tan K, Huang S, Prahlad V and Lee T. Geometrical error compensation of gantry stage using neural networks. Proceedings of the Second International Conference on Advances in Neural Networks - Volume Part III. (897-902).

https://doi.org/10.1007/11427469_142

Gurvits L. (2001). A note on a scale-sensitive dimension of linear bounded functionals in Banach spaces. Theoretical Computer Science. 261:1. (81-90). Online publication date: 20-Jun-2001.

https://doi.org/10.1016/S0304-3975(00)00134-1

Wah B. (1999). Generalization and Generalizability Measures. IEEE Transactions on Knowledge and Data Engineering. 11:1. (175-186). Online publication date: 1-Jan-1999.

https://doi.org/10.1109/69.755626

DasGupta B, Siegelmann H and Sontag E. On a learnability question associated to neural networks with continuous activations (extended abstract). Proceedings of the Seventh Annual Conference on Computational Learning Theory. (47-56).

https://doi.org/10.1145/180139.181009

Ji C. Generalization error and the expected network complexity. Proceedings of the 7th International Conference on Neural Information Processing Systems. (367-374).

https://dl.acm.org/doi/10.5555/2987189.2987236

Darken C, Donahue M, Gurvits L and Sontag E. Rate of approximation results motivated by robust neural network learning. Proceedings of the Sixth Annual Conference on Computational Learning Theory. (303-309).

https://doi.org/10.1145/168304.168357

Kimber D and Long P. The learning complexity of smooth functions of a single variable. Proceedings of the Fifth Annual Workshop on Computational Learning Theory. (153-159).

https://doi.org/10.1145/130385.130402