Direct Zero-Norm Minimization for Neural Network Pruning and Training

S. P. Adam, George D. Magoulas & M. N. Vrahatis

Part of the book series: Communications in Computer and Information Science (CCIS, volume 311)

Included in the following conference series:

International Conference on Engineering Applications of Neural Networks

Abstract

Designing a feed-forward neural network whose topology is optimal in terms of complexity (hidden-layer nodes and the connections between nodes) and training performance has been a concern since the earliest days of neural network research. The issue is typically handled by pruning: training starts from a fully connected network with "many" hidden-layer nodes, and "superfluous" connections and nodes are then eliminated. However, the problem remains unsolved, and it appears even more relevant today in the context of deep learning networks. In this paper we present a method of direct zero-norm minimization for pruning a Multi-Layer Perceptron while training it. The method employs a cooperative scheme with two swarms of particles, and its purpose is to minimize an aggregate function corresponding to the total risk functional. Our discussion highlights computational and methodological issues of the approach that are neither apparent nor well defined in the literature.
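The abstract does not give the form of the aggregate function, so the following is only a minimal sketch of one common reading: an empirical error term plus a penalty proportional to the zero-norm, that is, the number of active (non-zero) connections. The function names, the binary connection masks, the bias-free one-hidden-layer network, the `tanh` activation, and the penalty weight `lam` are all assumptions made for illustration and are not taken from the paper. The swarm search itself is not shown; under this reading, one swarm might propose the binary masks and the other the real-valued weights.

```python
import math

def zero_norm(weights, tol=0.0):
    """Count the weights whose magnitude exceeds tol (the zero-"norm")."""
    return sum(1 for w in weights if abs(w) > tol)

def mlp_forward(x, w_hidden, w_out, mask_hidden, mask_out):
    """One-hidden-layer MLP without biases (assumed for brevity).

    A mask entry of 0 switches the corresponding connection off.
    """
    hidden = [
        math.tanh(sum(m * w * xi for m, w, xi in zip(m_row, w_row, x)))
        for m_row, w_row in zip(mask_hidden, w_hidden)
    ]
    return sum(m * w * h for m, w, h in zip(mask_out, w_out, hidden))

def total_risk(data, w_hidden, w_out, mask_hidden, mask_out, lam=0.01):
    """Hypothetical aggregate objective: mean squared error plus
    lam times the number of active connections."""
    mse = sum(
        (mlp_forward(x, w_hidden, w_out, mask_hidden, mask_out) - y) ** 2
        for x, y in data
    ) / len(data)
    # Effective weights after masking, flattened over both layers.
    active = [m * w
              for m_row, w_row in zip(mask_hidden, w_hidden)
              for m, w in zip(m_row, w_row)]
    active += [m * w for m, w in zip(mask_out, w_out)]
    return mse + lam * zero_norm(active)

# Toy usage: two inputs, two hidden nodes, one output.
data = [([1.0, 0.0], 0.0)]
w_hidden = [[0.5, -0.3], [0.2, 0.1]]
w_out = [1.0, -1.0]
dense = total_risk(data, w_hidden, w_out, [[1, 1], [1, 1]], [1, 1])
pruned = total_risk(data, w_hidden, w_out, [[1, 0], [0, 0]], [1, 0])
```

Because the zero-norm term is a count, it is piecewise constant and gives no useful gradient, which is one reason a derivative-free search such as particle swarm optimization is a natural fit for an objective of this shape.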

Author information

Authors and Affiliations

Computational Intelligence Laboratory, Dept. of Mathematics, University of Patras, Rion, Patras, Greece
S. P. Adam & M. N. Vrahatis
Dept. of Informatics and Telecommunications, Technological Education Institute of Epirus, Arta, Greece
S. P. Adam
Dept. of Computer Science and Information Systems, Birkbeck College, University of London, United Kingdom
George D. Magoulas

Editor information

Editors and Affiliations

Coventry University, Priory Street, CV1 5FB, Coventry, UK
Chrisina Jayne
University of Lincoln, LN6 7TS, Lincoln, UK
Shigang Yue
University of Thrace, 193 Pandazidou st., 68200 N, Orestiada, Greece
Lazaros Iliadis

Cite this paper

Adam, S.P., Magoulas, G.D., Vrahatis, M.N. (2012). Direct Zero-Norm Minimization for Neural Network Pruning and Training. In: Jayne, C., Yue, S., Iliadis, L. (eds) Engineering Applications of Neural Networks. EANN 2012. Communications in Computer and Information Science, vol 311. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-32909-8_30

DOI: https://doi.org/10.1007/978-3-642-32909-8_30
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-32908-1
Online ISBN: 978-3-642-32909-8
eBook Packages: Computer Science, Computer Science (R0)
