[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
10.1145/3385412.3385999acmconferencesArticle/Chapter ViewAbstractPublication PagespldiConference Proceedingsconference-collections
research-article

Blended, precise semantic program embeddings

Published: 11 June 2020 Publication History

Abstract

Learning neural program embeddings is key to utilizing deep neural networks in program languages research --- precise and efficient program representations enable the application of deep models to a wide range of program analysis tasks. Existing approaches predominately learn to embed programs from their source code, and, as a result, they do not capture deep, precise program semantics. On the other hand, models learned from runtime information critically depend on the quality of program executions, thus leading to trained models with highly variant quality. This paper tackles these inherent weaknesses of prior approaches by introducing a new deep neural network, Liger, which learns program representations from a mixture of symbolic and concrete execution traces. We have evaluated Liger on two tasks: method name prediction and semantics classification. Results show that Liger is significantly more accurate than the state-of-the-art static model code2seq in predicting method names, and requires on average around 10x fewer executions covering nearly 4x fewer paths than the state-of-the-art dynamic model DYPRO in both tasks. Liger offers a new, interesting design point in the space of neural program embeddings and opens up this new direction for exploration.

References

[1]
Miltiadis Allamanis, Marc Brockschmidt, and Mahmoud Khademi. 2018. Learning to represent programs with graphs. International Conference on Learning Representations (2018).
[2]
Uri Alon, Omer Levy, and Eran Yahav. 2019. code2seq: Generating sequences from structured representations of code. International Conference on Learning Representations (2019).
[3]
Uri Alon, Meital Zilberstein, Omer Levy, and Eran Yahav. 2019. Code2Vec: Learning Distributed Representations of Code. Proc. ACM Program. Lang. 3, POPL, Article 40 (Jan. 2019), 29 pages.
[4]
Jimmy Ba, Volodymyr Mnih, and Koray Kavukcuoglu. 2015. Multiple object recognition with visual attention. International Conference on Learning Representations (2015).
[5]
Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2015. Neural machine translation by jointly learning to align and translate. International Conference on Learning Representations (2015).
[6]
Dzmitry Bahdanau, Jan Chorowski, Dmitriy Serdyuk, Philemon Brakel, and Yoshua Bengio. 2016. End-to-end attention-based large vocabulary speech recognition. In International Conference on Acoustics, Speech and Signal Processing (ICASSP). 4945–4949.
[7]
Yoshua Bengio, Réjean Ducharme, Pascal Vincent, and Christian Janvin. 2003. A Neural Probabilistic Language Model. J. Mach. Learn. Res. 3 (March 2003), 1137–1155. http://dl.acm.org/citation.cfm?id=944919.
[9]
Kyunghyun Cho, Bart van Merriënboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. 2014.
[10]
Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). 1724– 1734.
[11]
Jan K Chorowski, Dzmitry Bahdanau, Dmitriy Serdyuk, Kyunghyun Cho, and Yoshua Bengio. 2015. Attention-based models for speech recognition. In Advances in neural information processing systems. 577– 585.
[12]
Daniel DeFreez, Aditya V. Thakur, and Cindy Rubio-González. 2018. Path-based Function Embedding and Its Application to Error-handling Specification Mining. In Proceedings of the 2018 26th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE 2018). 423–433.
[13]
Jacob Devlin, Rabih Zbib, Zhongqiang Huang, Thomas Lamar, Richard Schwartz, and John Makhoul. 2014. Fast and robust neural network joint models for statistical machine translation. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 1370–1380.
[14]
Xavier Glorot, Antoine Bordes, and Yoshua Bengio. 2011. Domain Adaptation for Large-scale Sentiment Classification: A Deep Learning Approach. In International Conference on Machine Learning (ICML). 513–520.
[15]
Rahul Gupta, Soham Pal, Aditya Kanade, and Shirish Shevade. 2017. DeepFix: Fixing Common C Language Errors by Deep Learning. In Thirty-First AAAI Conference on Artificial Intelligence.
[16]
Jordan Henkel, Shuvendu K. Lahiri, Ben Liblit, and Thomas Reps. 2018. Code Vectors: Understanding Programs Through Embedded Abstracted Symbolic Traces. In Proceedings of the 26th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE). 163–174.
[17]
L. C. Jain and L. R. Medsker. 1999. Recurrent Neural Networks: Design and Applications (1st ed.). CRC Press, Inc., USA.
[18]
Diederik P Kingma and Jimmy Ba. 2015. Adam: A method for stochastic optimization. International Conference on Learning Representations (2015).
[19]
Quoc Le and Tomas Mikolov. 2014. Distributed Representations of Sentences and Documents. In International Conference on Machine Learning. 1188–1196.
[20]
Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. Efficient Estimation of Word Representations in Vector Space. International Conference on Learning Representations (2013).
[21]
Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. Distributed Representations of Words and Phrases and Their Compositionality. In Neural Information Processing Systems (NIPS). 3111–3119.
[22]
Volodymyr Mnih, Nicolas Heess, Alex Graves, et al. 2014. Recurrent models of visual attention. In Advances in neural information processing systems. 2204–2212.
[23]
Lili Mou, Ge Li, Lu Zhang, Tao Wang, and Zhi Jin. 2016. Convolutional neural networks over tree structures for programming language processing. In Thirtieth AAAI Conference on Artificial Intelligence.
[24]
Carlos Pacheco and Michael D. Ernst. 2007. Randoop: Feedback-Directed Random Testing for Java. In Companion to the 22nd ACM SIGPLAN Conference on Object-Oriented Programming Systems and Applications Companion (OOPSLA ’07). Association for Computing Machinery.
[25]
Yewen Pu, Karthik Narasimhan, Armando Solar-Lezama, and Regina Barzilay. 2016. Sk_P: A Neural Program Corrector for MOOCs. In Companion Proceedings of the 2016 ACM SIGPLAN International Conference on Systems, Programming, Languages and Applications: Software for Humanity (SPLASH). 39–40.
[26]
Kai Sheng Tai, Richard Socher, and Christopher D. Manning. 2015. Improved Semantic Representations From Tree-Structured Long Short-Term Memory Networks. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). Association for Computational Linguistics.
[27]
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Advances in neural information processing systems. 5998–6008.
[28]
Ke Wang. 2019. Learning Scalable and Precise Representation of Program Semantics. arXiv preprint arXiv:1905.05251 (2019).
[29]
Ke Wang and Mihai Christodorescu. 2019. COSET: A Benchmark for Evaluating Neural Program Embeddings. arXiv preprint arXiv:1905.11445 (2019).
[30]
Ke Wang, Rishabh Singh, and Zhendong Su. 2018. Dynamic Neural Program Embedding for Program Repair. International Conference on Learning Representations (2018).
[31]
Kelvin Xu, Jimmy Ba, Ryan Kiros, Kyunghyun Cho, Aaron Courville, Ruslan Salakhudinov, Rich Zemel, and Yoshua Bengio. 2015. Show, attend and tell: Neural image caption generation with visual attention. International Conference on Machine Learning (2015).

Cited By

View all
  • (2024)Exploiting code symmetries for learning program semanticsProceedings of the 41st International Conference on Machine Learning10.5555/3692070.3693695(40092-40113)Online publication date: 21-Jul-2024
  • (2024)Evaluating the Effectiveness of Deep Learning Models for Foundational Program Analysis TasksProceedings of the ACM on Programming Languages10.1145/36498298:OOPSLA1(500-528)Online publication date: 29-Apr-2024
  • (2024)Combining Structured Static Code Information and Dynamic Symbolic Traces for Software Vulnerability PredictionProceedings of the IEEE/ACM 46th International Conference on Software Engineering10.1145/3597503.3639212(1-13)Online publication date: 20-May-2024
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
PLDI 2020: Proceedings of the 41st ACM SIGPLAN Conference on Programming Language Design and Implementation
June 2020
1174 pages
ISBN:9781450376136
DOI:10.1145/3385412
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 11 June 2020

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. Attention Network
  2. Semantic Program Embedding
  3. Static and Dynamic Program Features

Qualifiers

  • Research-article

Conference

PLDI '20
Sponsor:

Acceptance Rates

Overall Acceptance Rate 406 of 2,067 submissions, 20%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)75
  • Downloads (Last 6 weeks)5
Reflects downloads up to 07 Jan 2025

Other Metrics

Citations

Cited By

View all
  • (2024)Exploiting code symmetries for learning program semanticsProceedings of the 41st International Conference on Machine Learning10.5555/3692070.3693695(40092-40113)Online publication date: 21-Jul-2024
  • (2024)Evaluating the Effectiveness of Deep Learning Models for Foundational Program Analysis TasksProceedings of the ACM on Programming Languages10.1145/36498298:OOPSLA1(500-528)Online publication date: 29-Apr-2024
  • (2024)Combining Structured Static Code Information and Dynamic Symbolic Traces for Software Vulnerability PredictionProceedings of the IEEE/ACM 46th International Conference on Software Engineering10.1145/3597503.3639212(1-13)Online publication date: 20-May-2024
  • (2024)TRACED: Execution-aware Pre-training for Source CodeProceedings of the IEEE/ACM 46th International Conference on Software Engineering10.1145/3597503.3608140(1-12)Online publication date: 20-May-2024
  • (2024)Learning Generalizable Program and Architecture Representations for Performance ModelingProceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis10.1109/SC41406.2024.00072(1-15)Online publication date: 17-Nov-2024
  • (2024)Detecting Source Code Vulnerabilities using High-Precision Code Representation and Bimodal Contrastive Learning2024 International Conference on Networking and Network Applications (NaNA)10.1109/NaNA63151.2024.00094(536-541)Online publication date: 9-Aug-2024
  • (2024)Revolutionizing Software Development: Autonomous Software Evolution2024 47th MIPRO ICT and Electronics Convention (MIPRO)10.1109/MIPRO60963.2024.10569871(224-228)Online publication date: 20-May-2024
  • (2024)Debugging convergence problems in probabilistic programs via program representation learning with SixthSenseInternational Journal on Software Tools for Technology Transfer10.1007/s10009-024-00737-2Online publication date: 19-Feb-2024
  • (2023)An Explanation Method for Models of CodeProceedings of the ACM on Programming Languages10.1145/36228267:OOPSLA2(801-827)Online publication date: 16-Oct-2023
  • (2023)LExecutor: Learning-Guided ExecutionProceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering10.1145/3611643.3616254(1522-1534)Online publication date: 30-Nov-2023
  • Show More Cited By

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media