DOI: 10.1109/ASE.2019.00061
research-article

CodeKernel: a graph kernel based approach to the selection of API usage examples

Published: 07 February 2020 Publication History

Abstract

Developers often want to find out how to use a certain API (e.g., FileReader.read in the JDK library), and API usage examples are very helpful in this regard. Over the years, many automated methods have been proposed to generate code examples by clustering and summarizing relevant code snippets extracted from a code corpus. These approaches simplify source code into method invocation sequences or feature vectors. Such simplifications model only some aspects of the code and tend to yield inaccurate examples.
We propose CodeKernel, a graph kernel based approach to the selection of API usage examples. Instead of approximating source code as method invocation sequences or feature vectors, CodeKernel represents source code as object usage graphs. It then clusters the graphs by embedding them into a continuous space using a graph kernel. Finally, it outputs code examples by selecting a representative graph from each cluster using ranking metrics designed for this purpose. Our empirical evaluation shows that CodeKernel selects more accurate code examples than related approaches (MUSE and eXoaDocs). A user study involving 25 developers in a multinational company also confirms the usefulness of CodeKernel in selecting API usage examples.
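
The three steps described above (graph representation, kernel-based clustering, representative selection) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration, not the paper's implementation: the abstract does not say which graph kernel, clustering algorithm, or ranking metrics CodeKernel uses, so the sketch substitutes a Weisfeiler-Lehman subtree kernel, a simple similarity-threshold grouping, and medoid selection. The helper names (`wl_features`, `kernel`, `cluster`, `medoid`, `chain`) and the toy usage graphs are hypothetical.

```python
from collections import Counter
from math import sqrt


def wl_features(labels, edges, iterations=2):
    """Count Weisfeiler-Lehman subtree labels of one directed, labeled graph."""
    neighbors = {node: [] for node in labels}
    for src, dst in edges:
        neighbors[src].append(dst)
    current = dict(labels)
    counts = Counter(current.values())
    for _ in range(iterations):
        relabeled = {}
        for node, label in current.items():
            # New label = own label plus the sorted labels of successor nodes.
            successors = sorted(current[n] for n in neighbors[node])
            relabeled[node] = label + "(" + ",".join(successors) + ")"
        current = relabeled
        counts.update(current.values())
    return counts


def kernel(fa, fb):
    """Normalized kernel: cosine of two feature-count vectors (non-empty graphs)."""
    dot = sum(count * fb[key] for key, count in fa.items())
    norm_a = sum(count * count for count in fa.values())
    norm_b = sum(count * count for count in fb.values())
    return dot / sqrt(norm_a * norm_b)


def cluster(sim, threshold):
    """Stand-in clustering: merge graphs whose similarity reaches the threshold."""
    parent = list(range(len(sim)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(sim)):
        for j in range(i + 1, len(sim)):
            if sim[i][j] >= threshold:
                parent[find(j)] = find(i)
    groups = {}
    for i in range(len(sim)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


def medoid(sim, members):
    """Stand-in ranking: the member most similar to the rest of its cluster."""
    return max(members, key=lambda i: sum(sim[i][j] for j in members))


def chain(*calls):
    """Build a straight-line usage graph: call i flows into call i + 1."""
    labels = dict(enumerate(calls))
    edges = [(i, i + 1) for i in range(len(calls) - 1)]
    return labels, edges


# Toy usage graphs (hypothetical): three FileReader snippets, one BufferedReader.
graphs = [
    chain("FileReader.<init>", "FileReader.read", "FileReader.close"),
    chain("FileReader.<init>", "FileReader.read", "FileReader.close"),
    chain("FileReader.<init>", "FileReader.read", "FileReader.read",
          "FileReader.close"),
    chain("BufferedReader.<init>", "BufferedReader.readLine",
          "BufferedReader.close"),
]
features = [wl_features(labels, edges) for labels, edges in graphs]
sim = [[kernel(a, b) for b in features] for a in features]
clusters = cluster(sim, threshold=0.5)
representatives = [medoid(sim, members) for members in clusters]
print(clusters, representatives)
```

In this toy corpus the three FileReader graphs fall into one group and the BufferedReader graph forms its own, so one representative is chosen per usage style. A real pipeline would first extract object usage graphs from source code, which this sketch skips entirely.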



Published In

ASE '19: Proceedings of the 34th IEEE/ACM International Conference on Automated Software Engineering
November 2019
1333 pages
ISBN:9781728125084

In-Cooperation

  • IEEE CS

Publisher

IEEE Press

Qualifiers

  • Research-article

Conference

ASE '19

Acceptance Rates

Overall Acceptance Rate 82 of 337 submissions, 24%

Article Metrics

  • Downloads (last 12 months): 6
  • Downloads (last 6 weeks): 1

Reflects downloads up to 12 Jan 2025.

Cited By

  • (2024) Exploring behaviours of RESTful APIs in an industrial setting. Software Quality Journal, 32(3):1287-1324. DOI: 10.1007/s11219-024-09686-0. Online publication date: 1-Sep-2024.
  • (2024) Exploring API behaviours through generated examples. Software Quality Journal, 32(2):729-763. DOI: 10.1007/s11219-024-09668-2. Online publication date: 1-Jun-2024.
  • (2023) Big Code Search: A Bibliography. ACM Computing Surveys, 56(1):1-49. DOI: 10.1145/3604905. Online publication date: 26-Aug-2023.
  • (2023) Fitting missing API puzzles with machine translation techniques. Expert Systems with Applications: An International Journal, 216:C. DOI: 10.1016/j.eswa.2022.119477. Online publication date: 15-Apr-2023.
  • (2022) A study on identifying code author from real development. Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, pages 1627-1631. DOI: 10.1145/3540250.3560878. Online publication date: 7-Nov-2022.
  • (2022) Clone-based code method usage pattern mining. Proceedings of the 30th IEEE/ACM International Conference on Program Comprehension, pages 543-547. DOI: 10.1145/3524610.3527880. Online publication date: 16-May-2022.
  • (2021) Opportunities and Challenges in Code Search Tools. ACM Computing Surveys, 54(9):1-40. DOI: 10.1145/3480027. Online publication date: 8-Oct-2021.
  • (2021) Adversarial attacks to API recommender systems. Proceedings of the 36th IEEE/ACM International Conference on Automated Software Engineering, pages 253-265. DOI: 10.1109/ASE51524.2021.9678946. Online publication date: 15-Nov-2021.
