DOI: 10.1145/3324884.3416551
Research article

API-misuse detection driven by fine-grained API-constraint knowledge graph

Published: 27 January 2021

Abstract

API misuses cause significant problems in software development. Existing methods detect API misuses against frequent API usage patterns mined from codebases. They make the naive assumption that any API usage deviating from the most frequent usage is a misuse. However, there is a big knowledge gap between API usage patterns and API usage caveats in terms of comprehensiveness, explainability and best practices. In this work, we propose a novel approach that detects API misuses directly against API caveat knowledge, rather than API usage patterns. We develop open information extraction methods to construct a novel API-constraint knowledge graph from API reference documentation. This knowledge graph explicitly models two types of API-constraint relations (call-order and condition-checking) and enriches return and throw relations with return conditions and exception triggers. It empowers the detection of three types of frequent API misuses (missing calls, missing condition checking and missing exception handling), whereas existing detectors mostly focus only on missing calls. As a proof of concept, we apply our approach to the Java SDK API Specification. Our evaluation confirms the high accuracy of the extracted API-constraint relations. Our knowledge-driven API misuse detector achieves 0.60 (68/113) precision and 0.28 (68/239) recall for detecting Java API misuses in the API misuse benchmark MuBench. This performance is significantly higher than that of existing pattern-based API misuse detectors. A pilot user study with 12 developers shows that our knowledge-driven API misuse detection is very promising in helping developers avoid API misuses and debug the bugs caused by them.
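
To make the three misuse types concrete, the following is a minimal sketch, not the authors' implementation. It assumes a tiny hand-written set of constraints in place of the knowledge graph that the paper extracts from documentation, and a simplified, hypothetical trace of calls in place of real Java program analysis. The names `Constraint`, `Call`, and `detect_misuses` are invented for illustration. The three Java constraints used as examples are `Cipher.init` before `Cipher.doFinal`, an `Iterator.hasNext` check around `Iterator.next`, and handling of the `NumberFormatException` thrown by `Integer.parseInt`.

```python
from dataclasses import dataclass

# Hypothetical constraint record; a real knowledge graph would hold many
# such relations extracted from API reference documentation.
@dataclass(frozen=True)
class Constraint:
    kind: str      # "call-order", "condition-checking", or "throw"
    api: str       # the API whose use is constrained
    requires: str  # preceding call, guarding check, or exception to handle

# Hand-written examples (assumed for illustration, not taken from the paper).
CONSTRAINTS = [
    Constraint("call-order", "Cipher.doFinal", "Cipher.init"),
    Constraint("condition-checking", "Iterator.next", "Iterator.hasNext"),
    Constraint("throw", "Integer.parseInt", "NumberFormatException"),
]

# Hypothetical, simplified view of one API call in client code.
@dataclass(frozen=True)
class Call:
    api: str
    guards: frozenset = frozenset()   # condition checks enclosing the call
    handled: frozenset = frozenset()  # exceptions caught around the call

MISUSE_LABEL = {
    "call-order": "missing call",
    "condition-checking": "missing condition checking",
    "throw": "missing exception handling",
}

def detect_misuses(trace, constraints=CONSTRAINTS):
    """Return (misuse type, api, unmet requirement) for each violation."""
    findings = []
    seen = set()  # APIs already called earlier in the trace
    for call in trace:
        for c in constraints:
            if c.api != call.api:
                continue
            if c.kind == "call-order":
                satisfied = c.requires in seen
            elif c.kind == "condition-checking":
                satisfied = c.requires in call.guards
            else:  # "throw"
                satisfied = c.requires in call.handled
            if not satisfied:
                findings.append((MISUSE_LABEL[c.kind], call.api, c.requires))
        seen.add(call.api)
    return findings

trace = [
    Call("Cipher.doFinal"),  # Cipher.init was never called
    Call("Iterator.next"),   # no hasNext guard
    Call("Integer.parseInt", handled=frozenset({"NumberFormatException"})),
]
findings = detect_misuses(trace)
for finding in findings:
    print(finding)
```

Because each finding carries the violated constraint, a report can point back to the documented caveat rather than only flagging a deviation from a frequent pattern, which is the explainability gap the abstract describes.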




    Information & Contributors

    Information

    Published In

    ASE '20: Proceedings of the 35th IEEE/ACM International Conference on Automated Software Engineering
    December 2020
    1449 pages
    ISBN:9781450367684
    DOI:10.1145/3324884
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Sponsors

    In-Cooperation

    • IEEE CS

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Qualifiers

    • Research-article

    Funding Sources

    • Australian Research Council's Discovery Early Career Researcher Award (DECRA)
    • National Key R&D Program of China
    • Alibaba-Zhejiang University Joint Institute of Frontier Technologies
    • NSFC
    • ANU-Data61 Collaborative Research Project

    Conference

    ASE '20
    Acceptance Rates

    Overall Acceptance Rate 82 of 337 submissions, 24%

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months): 126
    • Downloads (Last 6 weeks): 12
    Reflects downloads up to 07 Jan 2025

    Cited By

    • (2025) MITU: Locating relevant tutorial fragments of APIs with multi-source API knowledge. Journal of Systems and Software 222 (112296). DOI: 10.1016/j.jss.2024.112296. Online publication date: Apr-2025
    • (2025) ROS package search for robot software development: a knowledge graph-based approach. Frontiers of Computer Science: Selected Publications from Chinese Universities 19:6. DOI: 10.1007/s11704-024-3660-9. Online publication date: 1-Jun-2025
    • (2024) Security Analysis of Large Language Models on API Misuse Programming Repair. International Journal of Intelligent Systems 2024. DOI: 10.1155/2024/7135765. Online publication date: 1-Jan-2024
    • (2024) Let’s Discover More API Relations: A Large Language Model-Based AI Chain for Unsupervised API Relation Inference. ACM Transactions on Software Engineering and Methodology 33:8 (1-34). DOI: 10.1145/3680469. Online publication date: 23-Jul-2024
    • (2024) An Empirical Study of API Misuses of Data-Centric Libraries. Proceedings of the 18th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (245-256). DOI: 10.1145/3674805.3686685. Online publication date: 24-Oct-2024
    • (2024) DAInfer: Inferring API Aliasing Specifications from Library Documentation via Neurosymbolic Optimization. Proceedings of the ACM on Software Engineering 1:FSE (2469-2492). DOI: 10.1145/3660816. Online publication date: 12-Jul-2024
    • (2024) A First Look at Security and Privacy Risks in the RapidAPI Ecosystem. Proceedings of the 2024 on ACM SIGSAC Conference on Computer and Communications Security (1626-1640). DOI: 10.1145/3658644.3690294. Online publication date: 2-Dec-2024
    • (2024) API Misuse Detection via Probabilistic Graphical Model. Proceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis (88-99). DOI: 10.1145/3650212.3652112. Online publication date: 11-Sep-2024
    • (2024) Boosting API Misuse Detection via Integrating API Constraints from Multiple Sources. Proceedings of the 21st International Conference on Mining Software Repositories (14-26). DOI: 10.1145/3643991.3644904. Online publication date: 15-Apr-2024
    • (2024) An Adaptive Logging System (ALS): Enhancing Software Logging with Reinforcement Learning Techniques. Proceedings of the 15th ACM/SPEC International Conference on Performance Engineering (37-47). DOI: 10.1145/3629526.3645033. Online publication date: 7-May-2024
