research-article

Blended, precise semantic program embeddings

Authors:

Zhendong Su

PLDI 2020: Proceedings of the 41st ACM SIGPLAN Conference on Programming Language Design and Implementation

Pages 121 - 134

https://doi.org/10.1145/3385412.3385999

Published: 11 June 2020

Abstract

Learning neural program embeddings is key to utilizing deep neural networks in programming languages research: precise and efficient program representations enable the application of deep models to a wide range of program analysis tasks. Existing approaches predominantly learn to embed programs from their source code and, as a result, do not capture deep, precise program semantics. On the other hand, models learned from runtime information critically depend on the quality of program executions, which leads to trained models of highly variable quality. This paper tackles these inherent weaknesses of prior approaches by introducing a new deep neural network, Liger, which learns program representations from a mixture of symbolic and concrete execution traces. We have evaluated Liger on two tasks: method name prediction and semantics classification. Results show that Liger is significantly more accurate than the state-of-the-art static model code2seq in predicting method names and, in both tasks, requires on average around 10x fewer executions covering nearly 4x fewer paths than the state-of-the-art dynamic model DYPRO. Liger offers a new, interesting design point in the space of neural program embeddings and opens up this new direction for exploration.
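The abstract says Liger learns from a mixture of symbolic and concrete execution traces. The Python sketch below is only a toy illustration of that general idea, not the paper's model: at each execution step it fuses a concrete-value view and a symbolic-expression view of the program state using softmax attention weights, averages the steps within a trace, and then averages across traces. The hash-based `token_vec` stands in for learned embeddings, mean pooling stands in for a trained sequence encoder, and every name here (`embed_step`, `embed_program`, `query`, the `abs` example) is hypothetical.

```python
import hashlib
import math

DIM = 4  # toy embedding width (hypothetical; learned models use far more dimensions)


def token_vec(token: str, dim: int = DIM) -> list[float]:
    """Deterministic pseudo-embedding in [-1, 1]^dim; stands in for a learned lookup table."""
    digest = hashlib.sha256(token.encode("utf-8")).digest()
    return [b / 255.0 * 2.0 - 1.0 for b in digest[:dim]]


def mean(vectors: list[list[float]]) -> list[float]:
    """Element-wise average of equally sized vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: list[float]) -> list[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def embed_step(concrete: dict[str, int], symbolic: dict[str, str],
               query: list[float]) -> list[float]:
    """Fuse the concrete and symbolic views of one program state with attention weights."""
    conc = mean([token_vec(f"{var}={val}") for var, val in sorted(concrete.items())])
    sym = mean([token_vec(f"{var}:{expr}") for var, expr in sorted(symbolic.items())])
    w_conc, w_sym = softmax([dot(query, conc), dot(query, sym)])
    # Convex combination: each coordinate lies between conc[i] and sym[i].
    return [w_conc * c + w_sym * s for c, s in zip(conc, sym)]


def embed_program(traces, query: list[float]) -> list[float]:
    """Mean-pool fused step vectors within each trace, then mean-pool across traces."""
    trace_vecs = [mean([embed_step(c, s, query) for c, s in trace]) for trace in traces]
    return mean(trace_vecs)


# Two paths through a hypothetical `r = abs(x)`: one concrete run per path, each
# state paired with its symbolic counterpart (x0 denotes the symbolic input).
traces = [
    [({"x": -3}, {"x": "x0"}), ({"x": -3, "r": 3}, {"x": "x0", "r": "-x0"})],
    [({"x": 5}, {"x": "x0"}), ({"x": 5, "r": 5}, {"x": "x0", "r": "x0"})],
]
query = [0.5, -0.2, 0.1, 0.3]  # hypothetical attention query; a real model would learn it
vec = embed_program(traces, query)
print(len(vec), vec)
```

Because the two weights come from a softmax, each fused step vector is a convex combination of the two views, so a learned query could shift emphasis between concrete values and symbolic expressions step by step.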

Cited By

Pei K, Li W, Jin Q, Liu S, Geng S, Cavallaro L, Yang J, Jana S (2024). Exploiting code symmetries for learning program semantics. Proceedings of the 41st International Conference on Machine Learning (eds. Salakhutdinov R, Kolter Z, Heller K, Weller A, Oliver N, Scarlett J, Berkenkamp F), 40092-40113. DOI: 10.5555/3692070.3693695. Online publication date: 21-Jul-2024
https://dl.acm.org/doi/10.5555/3692070.3693695
Chen Q, Yu C, Liu R, Zhang C, Wang Y, Wang K, Su T, Wang L (2024). Evaluating the Effectiveness of Deep Learning Models for Foundational Program Analysis Tasks. Proceedings of the ACM on Programming Languages 8:OOPSLA1, 500-528. DOI: 10.1145/3649829. Online publication date: 29-Apr-2024
https://dl.acm.org/doi/10.1145/3649829
Wang H, Tang Z, Tan S, Wang J, Liu Y, Fang H, Xia C, Wang Z (2024). Combining Structured Static Code Information and Dynamic Symbolic Traces for Software Vulnerability Prediction. Proceedings of the IEEE/ACM 46th International Conference on Software Engineering (eds. Roychoudhury A, Paiva A, Abreu R, Storey M), 1-13. DOI: 10.1145/3597503.3639212. Online publication date: 20-May-2024
https://dl.acm.org/doi/10.1145/3597503.3639212

Index Terms

Blended, precise semantic program embeddings
1. Computing methodologies
  1. Machine learning
    1. Machine learning approaches
      1. Learning latent representations
2. Software and its engineering
  1. Software notations and tools
    1. General programming languages

Information

Published In

PLDI 2020: Proceedings of the 41st ACM SIGPLAN Conference on Programming Language Design and Implementation

June 2020

1174 pages

ISBN:9781450376136

DOI:10.1145/3385412

General Chair:
Alastair F. Donaldson, Imperial College London, UK

Program Chair:
Emina Torlak, University of Washington, USA

Copyright © 2020 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

SIGPLAN: ACM Special Interest Group on Programming Languages

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 11 June 2020

Qualifiers

Research-article

Conference

PLDI '20

Sponsor:

SIGPLAN

PLDI '20: 41st ACM SIGPLAN International Conference on Programming Language Design and Implementation

June 15 - 20, 2020

London, UK

Acceptance Rates

Overall Acceptance Rate 406 of 2,067 submissions, 20%

Bibliometrics & Citations

Article Metrics

Total Citations: 40
Total Downloads: 1,007
Downloads (Last 12 months): 75
Downloads (Last 6 weeks): 5

Reflects downloads up to 07 Jan 2025

Citations

Cited By

Ding Y, Steenhoek B, Pei K, Kaiser G, Le W, Ray B (2024). TRACED: Execution-aware Pre-training for Source Code. Proceedings of the IEEE/ACM 46th International Conference on Software Engineering (eds. Roychoudhury A, Paiva A, Abreu R, Storey M), 1-12. DOI: 10.1145/3597503.3608140. Online publication date: 20-May-2024
https://dl.acm.org/doi/10.1145/3597503.3608140
Li L, Flynn T, Hoisie A (2024). Learning Generalizable Program and Architecture Representations for Performance Modeling. Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis, 1-15. DOI: 10.1109/SC41406.2024.00072. Online publication date: 17-Nov-2024
https://dl.acm.org/doi/10.1109/SC41406.2024.00072
Wang J, Xu M, Chen H (2024). Detecting Source Code Vulnerabilities using High-Precision Code Representation and Bimodal Contrastive Learning. 2024 International Conference on Networking and Network Applications (NaNA), 536-541. DOI: 10.1109/NaNA63151.2024.00094. Online publication date: 9-Aug-2024
https://doi.org/10.1109/NaNA63151.2024.00094
Mlinarić D, Dončević J, Brčić M, Botički I (2024). Revolutionizing Software Development: Autonomous Software Evolution. 2024 47th MIPRO ICT and Electronics Convention (MIPRO), 224-228. DOI: 10.1109/MIPRO60963.2024.10569871. Online publication date: 20-May-2024
https://doi.org/10.1109/MIPRO60963.2024.10569871
Huang Z, Dutta S, Misailovic S (2024). Debugging convergence problems in probabilistic programs via program representation learning with SixthSense. International Journal on Software Tools for Technology Transfer. DOI: 10.1007/s10009-024-00737-2. Online publication date: 19-Feb-2024
https://doi.org/10.1007/s10009-024-00737-2
Wang Y, Wang K, Wang L (2023). An Explanation Method for Models of Code. Proceedings of the ACM on Programming Languages 7:OOPSLA2, 801-827. DOI: 10.1145/3622826. Online publication date: 16-Oct-2023
https://dl.acm.org/doi/10.1145/3622826
Souza B, Pradel M (2023). LExecutor: Learning-Guided Execution. Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (eds. Chandra S, Blincoe K, Tonella P), 1522-1534. DOI: 10.1145/3611643.3616254. Online publication date: 30-Nov-2023
https://dl.acm.org/doi/10.1145/3611643.3616254