Research Article | Open Access

Zeroth-Order Optimization of Optical Neural Networks with Linear Combination Natural Gradient and Calibrated Model

Published: 07 November 2024

Abstract

Optical neural networks (ONNs) have attracted great attention due to their low energy consumption and high-speed processing. The usual neural network training scheme performs poorly on ONNs because of their special parameterization and fabrication variations. This paper extends zeroth-order (ZO) optimization, which can be used to train such ONNs, in two ways. The first is a linear combination natural gradient, which mitigates the optimization difficulty caused by the special parameterization of an ONN. The second is a guided direction vector generated by calibration, which provides better search directions than the random vectors used in standard ZO optimization. Experimental results show that the two extensions significantly outperform existing ZO optimization and related methods with little computational overhead.
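
For context, the technique the abstract builds on is two-point zeroth-order gradient estimation, which trains a device by probing its loss along sampled directions rather than backpropagating through it. Below is a minimal sketch in Python/NumPy, assuming a standard two-point estimator and an illustrative 50/50 mixing weight for the guided direction; the function name `zo_gradient_estimate` and the mixing scheme are hypothetical, and the paper's linear combination natural gradient and calibration procedure are not detailed in the abstract.

```python
import numpy as np

def zo_gradient_estimate(loss_fn, theta, mu=1e-3, num_dirs=8, guided_dir=None):
    """Two-point zeroth-order gradient estimate.

    Averages directional finite differences over random Gaussian probes.
    If a guided direction (e.g., from a calibrated model) is supplied,
    it is mixed into each probe; the 0.5/0.5 weighting here is an
    illustrative choice, not the paper's construction.
    """
    d = theta.size
    grad = np.zeros(d)
    for _ in range(num_dirs):
        u = np.random.randn(d)
        if guided_dir is not None:
            # Bias the random probe toward the guided direction.
            u = 0.5 * u + 0.5 * guided_dir / (np.linalg.norm(guided_dir) + 1e-12)
        u /= np.linalg.norm(u)
        # Directional finite difference: only black-box evaluations of
        # the (hardware) loss are needed, no analytic gradient.
        diff = loss_fn(theta + mu * u) - loss_fn(theta - mu * u)
        grad += (diff / (2.0 * mu)) * u
    return grad / num_dirs

# Toy usage on a quadratic loss standing in for the ONN forward pass.
loss = lambda th: np.sum((th - 1.0) ** 2)
theta = np.zeros(4)
for _ in range(200):
    theta -= 0.1 * zo_gradient_estimate(loss, theta)
print(theta)  # approaches the optimum [1, 1, 1, 1]
```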

Published In

DAC '24: Proceedings of the 61st ACM/IEEE Design Automation Conference
June 2024, 2159 pages
ISBN: 9798400706011
DOI: 10.1145/3649329
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author(s).

Publisher

Association for Computing Machinery, New York, NY, United States

Author Tags

  1. optical neural network (ONN)
  2. neural network training
  3. zeroth-order (ZO) optimization
  4. natural gradient
  5. calibration

Conference

DAC '24: 61st ACM/IEEE Design Automation Conference
June 23-27, 2024, San Francisco, CA, USA

Acceptance Rates

Overall Acceptance Rate 1,770 of 5,499 submissions, 32%
