research-article

API-misuse detection driven by fine-grained API-constraint knowledge graph

Authors:

Zhenchang Xing,

Jianling Sun

ASE '20: Proceedings of the 35th IEEE/ACM International Conference on Automated Software Engineering

Pages 461 - 472

https://doi.org/10.1145/3324884.3416551

Published: 27 January 2021

Abstract

API misuses cause significant problems in software development. Existing methods detect API misuses against frequent API usage patterns mined from codebases. They naively assume that any API usage that deviates from the most frequent usage is a misuse. However, there is a large knowledge gap between API usage patterns and API usage caveats in terms of comprehensiveness, explainability and best practices. In this work, we propose a novel approach that detects API misuses directly against API caveat knowledge, rather than API usage patterns. We develop open information extraction methods to construct a novel API-constraint knowledge graph from API reference documentation. This knowledge graph explicitly models two types of API-constraint relations (call-order and condition-checking) and enriches return and throw relations with return conditions and exception triggers. It enables the detection of three types of frequent API misuses (missing calls, missing condition checking and missing exception handling), whereas existing detectors mostly focus only on missing calls. As a proof of concept, we apply our approach to the Java SDK API Specification. Our evaluation confirms the high accuracy of the extracted API-constraint relations. Our knowledge-driven API misuse detector achieves 0.60 (68/113) precision and 0.28 (68/239) recall for detecting Java API misuses in the API misuse benchmark MuBench. This performance is significantly higher than that of existing pattern-based API misuse detectors. A pilot user study with 12 developers shows that our knowledge-driven API misuse detection is very promising in helping developers avoid API misuses and debug the bugs caused by API misuses.
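To make the three misuse types in the abstract concrete, the following is a minimal Python sketch of checking an API usage against explicit constraints. It is not the authors' implementation: the `Constraint` and `Usage` classes, the `detect_misuses` function, and the three hand-written constraints are assumptions made for illustration only, whereas the paper extracts its constraints from API reference documentation. The constraints themselves reflect documented Java behaviour (`Iterator.remove` must follow `Iterator.next`, `Iterator.next` should be guarded by `Iterator.hasNext`, and `Integer.parseInt` can throw `NumberFormatException`).

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Constraint:
    """One edge of a toy API-constraint graph (hypothetical schema)."""
    kind: str      # "call-order", "condition-checking", or "throw"
    api: str       # the API whose usage is constrained
    requires: str  # preceding call, guarding check, or exception to handle


# Hand-written for illustration; the paper derives such relations from documentation.
CONSTRAINTS = [
    Constraint("call-order", "Iterator.remove", "Iterator.next"),
    Constraint("condition-checking", "Iterator.next", "Iterator.hasNext"),
    Constraint("throw", "Integer.parseInt", "NumberFormatException"),
]


@dataclass
class Usage:
    """A heavily simplified abstraction of one method body (assumption)."""
    calls: list                                 # API calls in program order
    checks: set = field(default_factory=set)    # condition checks guarding the calls
    handled: set = field(default_factory=set)   # exception types that are caught


def detect_misuses(usage, constraints=CONSTRAINTS):
    """Return (misuse type, API, missing element) for each violated constraint."""
    findings = []
    for c in constraints:
        for i, call in enumerate(usage.calls):
            if call != c.api:
                continue
            if c.kind == "call-order" and c.requires not in usage.calls[:i]:
                findings.append(("missing call", c.api, c.requires))
            elif c.kind == "condition-checking" and c.requires not in usage.checks:
                findings.append(("missing condition checking", c.api, c.requires))
            elif c.kind == "throw" and c.requires not in usage.handled:
                findings.append(("missing exception handling", c.api, c.requires))
    return findings


# A usage that removes before advancing and parses without a try/catch.
buggy = Usage(calls=["Iterator.remove", "Integer.parseInt"])
for finding in detect_misuses(buggy):
    print(finding)
```

Because each finding names the violated constraint, a report built this way can point the developer to the specific caveat rather than only to a deviation from a frequent pattern, which is the explainability argument made in the abstract.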



Index Terms

Software and its engineering
  Software notations and tools
    Software libraries and repositories

Published In

ASE '20: Proceedings of the 35th IEEE/ACM International Conference on Automated Software Engineering

December 2020

1449 pages

ISBN: 9781450367684

DOI: 10.1145/3324884

General Chair: John Grundy
Program Chairs: Claire Le Goues, David Lo

Copyright © 2020 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

In-Cooperation

IEEE CS

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Funding Sources

Australian Research Council's Discovery Early Career Researcher Award (DECRA)
National Key R&D Program of China
Alibaba-Zhejiang University Joint Institute of Frontier Technologies
NSFC
ANU-Data61 Collaborative Research Project

Conference

ASE '20

ASE '20: 35th IEEE/ACM International Conference on Automated Software Engineering

December 21 - 25, 2020

Virtual Event, Australia

Acceptance Rates

Overall Acceptance Rate 82 of 337 submissions, 24%

Bibliometrics

Total Citations: 48
Total Downloads: 471
Downloads (Last 12 months): 126
Downloads (Last 6 weeks): 12

Reflects downloads up to 07 Jan 2025

Citations

Cited By

Wu D, Zhang H, Feng Y, Dong Z. (2025). MITU: Locating relevant tutorial fragments of APIs with multi-source API knowledge. Journal of Systems and Software 222 (112296). Online publication date: Apr-2025.
https://doi.org/10.1016/j.jss.2024.112296
Wang S, Mao X, Yang S, Wu M, Zhang Z. (2025). ROS package search for robot software development: a knowledge graph-based approach. Frontiers of Computer Science: Selected Publications from Chinese Universities 19:6. Online publication date: 1-Jun-2025.
https://dl.acm.org/doi/10.1007/s11704-024-3660-9
Zhang R, Qiao Z, Yu Y. (2024). Security Analysis of Large Language Models on API Misuse Programming Repair. International Journal of Intelligent Systems 2024. Online publication date: 1-Jan-2024.
https://dl.acm.org/doi/10.1155/2024/7135765
Huang Q, Sun Y, Xing Z, Cao Y, Chen J, Xu X, Jin H, Lu J. (2024). Let’s Discover More API Relations: A Large Language Model-Based AI Chain for Unsupervised API Relation Inference. ACM Transactions on Software Engineering and Methodology 33:8 (1-34). Online publication date: 23-Jul-2024.
https://dl.acm.org/doi/10.1145/3680469
Galappaththi A, Nadi S, Treude C. (2024). An Empirical Study of API Misuses of Data-Centric Libraries. Proceedings of the 18th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (245-256). Online publication date: 24-Oct-2024.
https://dl.acm.org/doi/10.1145/3674805.3686685
Wang C, Zhang J, Wu R, Zhang C. (2024). DAInfer: Inferring API Aliasing Specifications from Library Documentation via Neurosymbolic Optimization. Proceedings of the ACM on Software Engineering 1:FSE (2469-2492). Online publication date: 12-Jul-2024.
https://dl.acm.org/doi/10.1145/3660816
Liao S, Cheng L, Luo X, Song Z, Cai H, Yao D, Hu H, Luo B, Liao X, Xu J, Kirda E, Lie D. (2024). A First Look at Security and Privacy Risks in the RapidAPI Ecosystem. Proceedings of the 2024 on ACM SIGSAC Conference on Computer and Communications Security (1626-1640). Online publication date: 2-Dec-2024.
https://dl.acm.org/doi/10.1145/3658644.3690294
Ma Y, Tian W, Gao X, Sun H, Li L, Christakis M, Pradel M. (2024). API Misuse Detection via Probabilistic Graphical Model. Proceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis (88-99). Online publication date: 11-Sep-2024.
https://dl.acm.org/doi/10.1145/3650212.3652112
Li C, Zhang J, Tang Y, Li Z, Sun T, Spinellis D, Constantinou E, Bacchelli A. (2024). Boosting API Misuse Detection via Integrating API Constraints from Multiple Sources. Proceedings of the 21st International Conference on Mining Software Repositories (14-26). Online publication date: 15-Apr-2024.
https://dl.acm.org/doi/10.1145/3643991.3644904
Khosravi Tabrizi A, Ezzati-Jivan N, Tetreault F, Balsamo S, Knottenbelt W, Abad C, Shang W. (2024). An Adaptive Logging System (ALS): Enhancing Software Logging with Reinforcement Learning Techniques. Proceedings of the 15th ACM/SPEC International Conference on Performance Engineering (37-47). Online publication date: 7-May-2024.
https://dl.acm.org/doi/10.1145/3629526.3645033
