DOI: 10.1145/3416505.3423564
research-article
Open access

DeepIaC: deep learning-based linguistic anti-pattern detection in IaC

Published: 13 November 2020

Abstract

Linguistic anti-patterns are recurring poor practices concerning inconsistencies among the naming, documentation, and implementation of an entity. They impede readability, understandability, and maintainability of source code. This paper attempts to detect linguistic anti-patterns in infrastructure as code (IaC) scripts used to provision and manage computing environments. In particular, we consider inconsistencies between the logic/body of IaC code units and their names. To this end, we propose a novel automated approach that employs word embeddings and deep learning techniques. We build and use the abstract syntax tree of IaC code units to create their code embeddings. Our experiments with a dataset systematically extracted from open-source repositories show that our approach yields an accuracy between 0.785 and 0.915 in detecting inconsistencies.
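The abstract compresses the approach into two steps: turn each IaC code unit's tree into a token sequence, then learn whether that body is consistent with the unit's name. The sketch below illustrates only the data-preparation side, under loudly stated assumptions: a code unit is an Ansible-style task already parsed from YAML into nested Python dicts and lists (standing in for the abstract syntax tree), and inconsistent examples are manufactured by pairing a task body with another task's name, a trick borrowed from name-based bug detectors and not necessarily the authors' procedure. All function names (`split_identifier`, `body_tokens`, `to_example`, `make_pairs`) are hypothetical, and the Word2Vec embedding and neural classifier are deliberately omitted.

```python
from __future__ import annotations

import random
import re
from typing import Any

# Hypothetical helpers for illustration only; not DeepIaC's actual implementation.


def split_identifier(text: str) -> list[str]:
    """Lower-cased word tokens from names such as 'Install nginx' or 'cacheValidTime'."""
    # Insert a space at camelCase boundaries, then split on any non-alphanumeric run.
    text = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", text)
    return [t.lower() for t in re.split(r"[^A-Za-z0-9]+", text) if t]


def body_tokens(node: Any) -> list[str]:
    """Pre-order walk over a parsed code unit (nested dicts/lists), emitting keys and leaf values."""
    tokens: list[str] = []
    if isinstance(node, dict):
        for key, value in node.items():
            tokens.extend(split_identifier(str(key)))
            tokens.extend(body_tokens(value))
    elif isinstance(node, list):
        for item in node:
            tokens.extend(body_tokens(item))
    elif node is not None:
        tokens.extend(split_identifier(str(node)))
    return tokens


def to_example(task: dict[str, Any]) -> tuple[list[str], list[str]]:
    """Split a task into (name tokens, body tokens); the top-level 'name' key stays out of the body."""
    body = {k: v for k, v in task.items() if k != "name"}
    return split_identifier(task["name"]), body_tokens(body)


def make_pairs(tasks: list[dict[str, Any]], seed: int = 0) -> list[tuple[list[str], list[str], int]]:
    """Label each real (name, body) pair 1; pair the same body with another task's name as 0."""
    rng = random.Random(seed)  # fixed seed so the negatives are reproducible
    examples = [to_example(t) for t in tasks]
    pairs: list[tuple[list[str], list[str], int]] = []
    for i, (name, body) in enumerate(examples):
        pairs.append((name, body, 1))
        # Only borrow names that actually differ, so the negative is really inconsistent.
        candidates = [j for j, (other, _) in enumerate(examples) if j != i and other != name]
        if candidates:
            pairs.append((examples[rng.choice(candidates)][0], body, 0))
    return pairs


if __name__ == "__main__":
    tasks = [
        {"name": "Install nginx", "apt": {"name": "nginx", "state": "present"}},
        {"name": "Restart nginx", "service": {"name": "nginx", "state": "restarted"}},
        {"name": "Copy config file", "copy": {"src": "nginx.conf", "dest": "/etc/nginx/nginx.conf"}},
    ]
    for name, body, label in make_pairs(tasks):
        print(label, name, body)
```

In a full pipeline, each labeled triple would then be mapped through word embeddings and passed to a classifier. Keeping the top-level `name` key out of the body tokens matters here: otherwise the name would leak into the very input it is supposed to be checked against.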

Information & Contributors

Information

Published In

ACM Conferences
MaLTeSQuE 2020: Proceedings of the 4th ACM SIGSOFT International Workshop on Machine-Learning Techniques for Software-Quality Evaluation
November 2020
36 pages
ISBN:9781450381246
DOI:10.1145/3416505
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Code Embedding
  2. Deep Learning
  3. Defects
  4. IaC
  5. Infrastructure Code
  6. Linguistic Anti-patterns
  7. Word2Vec

Qualifiers

  • Research-article

Conference

ESEC/FSE '20
Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (last 12 months): 152
  • Downloads (last 6 weeks): 15
Reflects downloads up to 01 Jan 2025.

Citations

  • (2024) AI-Assisted Programming Tasks Using Code Embeddings and Transformers. Electronics 13:4 (767). DOI: 10.3390/electronics13040767. Online publication date: 15-Feb-2024.
  • (2024) Repairing Infrastructure-as-Code using Large Language Models. 2024 IEEE Secure Development Conference (SecDev), 20-27. DOI: 10.1109/SecDev61143.2024.00008. Online publication date: 7-Oct-2024.
  • (2023) A Comparative Analysis on the Detection of Web Service Anti-Patterns Using Various Metrics. Proceedings of the 16th Innovations in Software Engineering Conference, 1-7. DOI: 10.1145/3578527.3578534. Online publication date: 23-Feb-2023.
  • (2023) SoK: Static Configuration Analysis in Infrastructure as Code Scripts. 2023 IEEE International Conference on Cyber Security and Resilience (CSR), 281-288. DOI: 10.1109/CSR57506.2023.10224925. Online publication date: 31-Jul-2023.
  • (2023) Machine learning with word embedding for detecting web-services anti-patterns. Journal of Computer Languages 75 (101207). DOI: 10.1016/j.cola.2023.101207. Online publication date: Jun-2023.
  • (2022) Making the most of small Software Engineering datasets with modern machine learning. IEEE Transactions on Software Engineering, 1-1. DOI: 10.1109/TSE.2021.3135465. Online publication date: 2022.
  • (2022) Static Analysis of Infrastructure as Code: a Survey. 2022 IEEE 19th International Conference on Software Architecture Companion (ICSA-C), 218-225. DOI: 10.1109/ICSA-C54293.2022.00049. Online publication date: Mar-2022.
  • (2022) FindICI: Using machine learning to detect linguistic inconsistencies between code and natural language descriptions in infrastructure-as-code. Empirical Software Engineering 27:7. DOI: 10.1007/s10664-022-10215-5. Online publication date: 1-Dec-2022.
  • (2022) Quality Assurance and Design-Time Optimization. Deployment and Operation of Complex Software in Heterogeneous Execution Environments, 53-66. DOI: 10.1007/978-3-031-04961-3_4. Online publication date: 15-Jul-2022.
  • (2021) Automated detection of design patterns in declarative deployment models. Proceedings of the 14th IEEE/ACM International Conference on Utility and Cloud Computing, 1-10. DOI: 10.1145/3468737.3494085. Online publication date: 6-Dec-2021.
