research-article

CodeKernel: a graph kernel based approach to the selection of API usage examples

Authors:

Sunghun Kim

ASE '19: Proceedings of the 34th IEEE/ACM International Conference on Automated Software Engineering

Pages 590 - 601

https://doi.org/10.1109/ASE.2019.00061

Published: 07 February 2020

Abstract

Developers often want to find out how to use a certain API (e.g., FileReader.read in JDK library). API usage examples are very helpful in this regard. Over the years, many automated methods have been proposed to generate code examples by clustering and summarizing relevant code snippets extracted from a code corpus. These approaches simplify source code as method invocation sequences or feature vectors. Such simplifications only model partial aspects of the code and tend to yield inaccurate examples.

We propose CodeKernel, a graph kernel based approach to the selection of API usage examples. Instead of approximating source code as method invocation sequences or feature vectors, CodeKernel represents source code as object usage graphs. Then, it clusters graphs by embedding them into a continuous space using a graph kernel. Finally, it outputs code examples by selecting a representative graph from each cluster using designed ranking metrics. Our empirical evaluation shows that CodeKernel selects more accurate code examples than the related work (MUSE and eXoaDocs). A user study involving 25 developers in a multinational company also confirms the usefulness of CodeKernel in selecting API usage examples.
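The three steps described above (graph representation, kernel-based clustering, representative selection) can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the abstract does not say which graph kernel, clustering algorithm, or ranking metrics CodeKernel uses, so the code swaps in a simple label-histogram kernel, a greedy threshold clustering, and medoid selection. The graph encoding, the function names, the `threshold` parameter, and the toy usage graphs are all assumptions made for the example.

```python
from collections import Counter
from math import sqrt

# Hypothetical encoding of an object usage graph:
# (node_id -> API call label, list of directed edges between node ids).

def features(graph):
    """Histogram of node labels and labeled edges of one graph."""
    nodes, edges = graph
    hist = Counter(("node", label) for label in nodes.values())
    hist.update(("edge", nodes[u], nodes[v]) for u, v in edges)
    return hist

def kernel(g1, g2):
    """Cosine-normalised label-histogram kernel (a stand-in graph kernel)."""
    f1, f2 = features(g1), features(g2)
    dot = sum(count * f2[key] for key, count in f1.items())
    norm = sqrt(sum(c * c for c in f1.values())) * sqrt(sum(c * c for c in f2.values()))
    return dot / norm if norm else 0.0

def cluster(graphs, threshold=0.5):
    """Greedy clustering: a graph joins the first cluster whose seed is similar enough."""
    clusters = []
    for i, g in enumerate(graphs):
        for members in clusters:
            if kernel(graphs[members[0]], g) >= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters

def representative(graphs, members):
    """Assumed ranking metric: the medoid, i.e. the member most similar to its cluster."""
    return max(members, key=lambda i: sum(kernel(graphs[i], graphs[j]) for j in members))

# Toy usage graphs (invented for illustration).
read_once = ({0: "FileReader.<init>", 1: "FileReader.read", 2: "FileReader.close"},
             [(0, 1), (1, 2)])
read_twice = ({0: "FileReader.<init>", 1: "FileReader.read", 2: "FileReader.read",
               3: "FileReader.close"},
              [(0, 1), (1, 2), (2, 3)])
write_once = ({0: "FileWriter.<init>", 1: "FileWriter.write", 2: "FileWriter.close"},
              [(0, 1), (1, 2)])

graphs = [read_once, read_twice, read_once, write_once]
clusters = cluster(graphs)  # [[0, 1, 2], [3]]: reader usages vs. the writer usage
examples = [representative(graphs, members) for members in clusters]
```

In this sketch each cluster contributes one example, so the reader usages and the writer usage are reported separately. A histogram kernel ignores most graph structure; a structure-aware kernel (for instance a shortest-path or Weisfeiler-Lehman kernel) would be a natural substitute behind the same `kernel` signature.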

Published In

ASE '19: Proceedings of the 34th IEEE/ACM International Conference on Automated Software Engineering

November 2019

1333 pages

ISBN: 9781728125084

General Chair: Thomas Zimmermann (Microsoft Research)

Program Chairs: Julia Lawall (Inria/LIP6, France), Darko Marinov (University of Illinois at Urbana-Champaign)

In-Cooperation: IEEE CS

Publisher: IEEE Press

Conference

ASE '19: 34th IEEE/ACM International Conference on Automated Software Engineering

November 10 - 15, 2019

San Diego, California

Acceptance Rates

Overall Acceptance Rate 82 of 337 submissions, 24%

Bibliometrics

Total Citations: 8

Total Downloads: 84

Downloads (Last 12 months): 6

Downloads (Last 6 weeks): 1

Reflects downloads up to 12 Jan 2025

Cited By

Karlsson S, Jongeling R, Čaušević A, Sundmark D (2024). Exploring behaviours of RESTful APIs in an industrial setting. Software Quality Journal 32:3 (1287-1324). Online publication date: 1-Sep-2024.
https://dl.acm.org/doi/10.1007/s11219-024-09686-0

Karlsson S, Hughes J, Jongeling R, Čaušević A, Sundmark D (2024). Exploring API behaviours through generated examples. Software Quality Journal 32:2 (729-763). Online publication date: 1-Jun-2024.
https://dl.acm.org/doi/10.1007/s11219-024-09668-2

Kim K, Ghatpande S, Kim D, Zhou X, Liu K, Bissyandé T, Klein J, Le Traon Y (2023). Big Code Search: A Bibliography. ACM Computing Surveys 56:1 (1-49). Online publication date: 26-Aug-2023.
https://dl.acm.org/doi/10.1145/3604905

Nguyen P, Di Sipio C, Di Rocco J, Di Ruscio D, Di Penta M (2023). Fitting missing API puzzles with machine translation techniques. Expert Systems with Applications: An International Journal 216:C. Online publication date: 15-Apr-2023.
https://dl.acm.org/doi/10.1016/j.eswa.2022.119477

Gong S, Zhong H, Roychoudhury A, Cadar C, Kim M (2022). A study on identifying code author from real development. Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (1627-1631). Online publication date: 7-Nov-2022.
https://dl.acm.org/doi/10.1145/3540250.3560878

Xue Z, Zhang Y, Xu R, Rastogi A, Tufano R, Bavota G, Arnaoudova V, Haiduc S (2022). Clone-based code method usage pattern mining. Proceedings of the 30th IEEE/ACM International Conference on Program Comprehension (543-547). Online publication date: 16-May-2022.
https://dl.acm.org/doi/10.1145/3524610.3527880

Liu C, Xia X, Lo D, Gao C, Yang X, Grundy J (2021). Opportunities and Challenges in Code Search Tools. ACM Computing Surveys 54:9 (1-40). Online publication date: 8-Oct-2021.
https://dl.acm.org/doi/10.1145/3480027

Nguyen P, Di Sipio C, Di Rocco J, Di Penta M, Di Ruscio D, Grundy J (2021). Adversarial attacks to API recommender systems. Proceedings of the 36th IEEE/ACM International Conference on Automated Software Engineering (253-265). Online publication date: 15-Nov-2021.
https://dl.acm.org/doi/10.1109/ASE51524.2021.9678946
