Abstract
Incorporating new knowledge into a neural network while preserving the knowledge it already holds is known to be a nontrivial problem. The problem becomes even more complex when the new knowledge is contained not in new training examples but in the parameters (e.g., connection weights) of another neural network. In this correspondence, we propose and test two methods of combining the knowledge contained in separate networks. The first method is based on a summation of weights. The second incorporates new knowledge by modifying only those weights that are nonessential for preserving previously stored information. We show that with these methods, knowledge can be transferred non-iteratively from one network to another, without additional training sessions. The fused network operates efficiently, performing classification at a level similar to that of an ensemble of networks. The efficiency of the methods is quantified in classification tasks on several publicly available data sets, for both shallow and deep feedforward neural networks.
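To make the two fusion schemes concrete, below is a minimal NumPy sketch of the operations the abstract describes: a convex summation of per-layer weight tensors, and a masked update that writes the second network's weights only into positions judged nonessential for the first network's stored knowledge. The function names, the mixing coefficient alpha, the threshold, and the use of weight magnitude as a stand-in importance score are illustrative assumptions, not the exact procedure evaluated in the paper.

```python
import numpy as np

def fuse_by_weight_summation(weights_a, weights_b, alpha=0.5):
    """Convex combination of per-layer weight tensors from two
    identically structured networks (alpha = 0.5 gives plain averaging)."""
    return [alpha * wa + (1.0 - alpha) * wb
            for wa, wb in zip(weights_a, weights_b)]

def fuse_by_nonessential_weights(weights_a, weights_b, importance_a, threshold):
    """Keep network A's weights wherever they are essential for its stored
    knowledge (importance above `threshold`); elsewhere overwrite them with
    network B's weights, so new knowledge enters only through weights that
    are nonessential for the old task."""
    return [np.where(imp > threshold, wa, wb)
            for wa, wb, imp in zip(weights_a, weights_b, importance_a)]

# Toy usage: two "networks" represented as lists of per-layer weight matrices.
rng = np.random.default_rng(0)
net_a = [rng.normal(size=(4, 3)), rng.normal(size=(3, 2))]
net_b = [rng.normal(size=(4, 3)), rng.normal(size=(3, 2))]
# Stand-in importance scores for network A's weights; a principled choice
# would be, e.g., a Fisher-information estimate on network A's training data.
imp_a = [np.abs(w) for w in net_a]

fused_sum = fuse_by_weight_summation(net_a, net_b, alpha=0.5)
fused_mask = fuse_by_nonessential_weights(net_a, net_b, imp_a, threshold=1.0)
```

Both operations assume that corresponding weights in the two networks play corresponding roles; in practice, hidden units typically need to be aligned across networks (e.g., by permutation) before such element-wise fusion is meaningful.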
Acknowledgements
We thank Dr. D. Haefner for careful reading of the manuscript and for useful comments. The reported study was funded by the Russian Foundation for Basic Research and the government of Ulyanovsk region under research project No. 18-47-732006.
Cite this article
Leontev, M.I., Islenteva, V. & Sukhov, S.V. Non-iterative Knowledge Fusion in Deep Convolutional Neural Networks. Neural Process Lett 51, 1–22 (2020). https://doi.org/10.1007/s11063-019-10074-0