DOI: 10.1145/3583780.3614873

Exploring Low-Dimensional Manifolds of Deep Neural Network Parameters for Improved Model Optimization

Published: 21 October 2023

Abstract

Manifold learning techniques have significantly enhanced the comprehension of massive data by exploring the geometric properties of the data manifold in low-dimensional subspaces. However, existing research on manifold learning focuses primarily on understanding intricate data, overlooking the explosive growth in the scale and complexity of deep neural networks (DNNs), which poses a significant challenge for model optimization. In this work, we propose to explore the intrinsic low-dimensional manifold of network parameters for efficient model optimization. Specifically, we analyze the parameter distributions of a deep model and sample them to map onto a low-dimensional parameter manifold using local tangent space alignment (LTSA). Because our focus is on studying parameter manifolds that guide model optimization, we select dynamic optimal training trajectories for sampling and approximate their tangent spaces to obtain low-dimensional representations of DNNs. By applying manifold learning techniques with a two-step alternating optimization method, we obtain a fixed subspace that reduces training time and resource costs for commonly used deep networks. The trained low-dimensional network can be mapped back to the original parameter space for further use. We demonstrate the benefits of learning a low-dimensional parameterization of DNNs on both noisy-label learning and federated learning tasks. Extensive experiments on various benchmarks confirm the effectiveness of our method in terms of both superior accuracy and reduced resource consumption.
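To make the pipeline concrete, here is a minimal sketch, not the authors' implementation, of the two ingredients the abstract describes: sampling flattened parameter vectors along a training trajectory, then embedding them in a low-dimensional subspace with LTSA. It uses PyTorch and scikit-learn, which exposes LTSA through LocallyLinearEmbedding(method="ltsa"); the helper names (sample_trajectory, embed_ltsa) and all hyperparameter values (steps, lr, dim, n_neighbors) are illustrative assumptions, not values from the paper.

    import numpy as np
    import torch
    from torch.nn.utils import parameters_to_vector
    from sklearn.manifold import LocallyLinearEmbedding

    def sample_trajectory(model, loss_fn, loader, steps=50, lr=0.01):
        # Run plain SGD and record a flattened copy of all parameters after
        # each update; the loader is assumed to yield at least `steps` batches.
        opt = torch.optim.SGD(model.parameters(), lr=lr)
        snapshots = []
        batches = iter(loader)
        for _ in range(steps):
            x, y = next(batches)
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()
            snapshots.append(
                parameters_to_vector(model.parameters()).detach().cpu().numpy())
        return np.stack(snapshots)  # shape: (steps, n_params)

    def embed_ltsa(snapshots, dim=10, n_neighbors=12):
        # scikit-learn computes LTSA via LocallyLinearEmbedding with
        # method="ltsa"; n_neighbors must exceed the target dimension.
        ltsa = LocallyLinearEmbedding(
            n_components=dim, n_neighbors=n_neighbors, method="ltsa")
        return ltsa.fit_transform(snapshots)  # (steps, dim) coordinates

    # Illustrative usage on a tiny model with random data.
    if __name__ == "__main__":
        torch.manual_seed(0)
        model = torch.nn.Sequential(torch.nn.Linear(8, 16),
                                    torch.nn.ReLU(),
                                    torch.nn.Linear(16, 2))
        data = [(torch.randn(32, 8), torch.randint(0, 2, (32,)))
                for _ in range(50)]
        traj = sample_trajectory(model, torch.nn.CrossEntropyLoss(), data)
        coords = embed_ltsa(traj, dim=5, n_neighbors=10)
        print(coords.shape)  # (50, 5)

In such a sketch, optimization could then proceed directly on the low-dimensional coordinates, with the result mapped back to the full parameter space through the locally linear tangent-space relationship that LTSA recovers, mirroring the mapping-back step the abstract mentions.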




        Published In

        CIKM '23: Proceedings of the 32nd ACM International Conference on Information and Knowledge Management
October 2023, 5508 pages
ISBN: 9798400701245
DOI: 10.1145/3583780

        Publisher

        Association for Computing Machinery

        New York, NY, United States


        Author Tags

        1. manifold learning
        2. model optimization
        3. neural networks
        4. parameter manifold

        Qualifiers

        • Research-article

        Conference

        CIKM '23

        Acceptance Rates

Overall Acceptance Rate: 1,861 of 8,427 submissions, 22%
