
research-article

Open access

DeepIaC: deep learning-based linguistic anti-pattern detection in IaC

Authors:

Nemania Borovits,

Parvathy Krishnan,

Stefano Dalla Palma,

Dario Di Nucci,

Damian A. Tamburri,

Willem-Jan van den Heuvel

MaLTeSQuE 2020: Proceedings of the 4th ACM SIGSOFT International Workshop on Machine-Learning Techniques for Software-Quality Evaluation

Pages 7 - 12

https://doi.org/10.1145/3416505.3423564

Published: 13 November 2020

Abstract

Linguistic anti-patterns are recurring poor practices concerning inconsistencies among the naming, documentation, and implementation of an entity. They impede readability, understandability, and maintainability of source code. This paper attempts to detect linguistic anti-patterns in infrastructure as code (IaC) scripts used to provision and manage computing environments. In particular, we consider inconsistencies between the logic/body of IaC code units and their names. To this end, we propose a novel automated approach that employs word embeddings and deep learning techniques. We build and use the abstract syntax tree of IaC code units to create their code embeddings. Our experiments with a dataset systematically extracted from open source repositories show that our approach yields an accuracy between 0.785 and 0.915 in detecting inconsistencies.
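The abstract describes building a tree representation of each IaC code unit and checking its body against its name. The sketch below illustrates only the data-preparation side of such an approach, under assumptions that are not taken from the paper: a code unit is an Ansible-style task already parsed from YAML into a Python dict, its top-level `name` field is the identifier to check, the body is flattened into tokens by a depth-first walk over the parsed tree, and inconsistent examples are synthesised by pairing a body with the name of a different task. The helpers `split_task`, `flatten_body`, `name_tokens` and `make_pairs` are hypothetical, and the embedding and deep learning stages are not shown.

```python
import random
import re
from typing import Any

# Hypothetical helpers: an illustrative sketch, not the DeepIaC implementation.


def split_task(task: dict[str, Any]) -> tuple[str, dict[str, Any]]:
    """Separate a task's top-level `name` (the identifier) from the rest of its body."""
    body = {key: value for key, value in task.items() if key != "name"}
    return str(task.get("name", "")), body


def flatten_body(node: Any) -> list[str]:
    """Depth-first walk over the parsed tree: emit each mapping key, then the tokens of its value."""
    tokens: list[str] = []
    if isinstance(node, dict):
        for key, value in node.items():
            tokens.append(str(key))
            tokens.extend(flatten_body(value))
    elif isinstance(node, list):
        for item in node:
            tokens.extend(flatten_body(item))
    else:
        tokens.append(str(node))  # leaf scalar (string, number, boolean)
    return tokens


def name_tokens(name: str) -> list[str]:
    """Lower-case the name and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", name.lower())


def make_pairs(tasks: list[dict[str, Any]], seed: int = 0) -> list[tuple[list[str], list[str], int]]:
    """Build (name tokens, body tokens, label) triples.

    Label 1: the task's own name (assumed consistent).
    Label 0: the name of another randomly chosen task (assumed inconsistent;
    a borrowed name can occasionally still fit the body, so these labels are noisy).
    """
    rng = random.Random(seed)  # seeded so the generated pairs are reproducible
    units = [split_task(task) for task in tasks]
    pairs: list[tuple[list[str], list[str], int]] = []
    for i, (name, body) in enumerate(units):
        body_tokens = flatten_body(body)
        pairs.append((name_tokens(name), body_tokens, 1))
        others = [j for j in range(len(units)) if j != i]
        if others:
            borrowed_name = units[rng.choice(others)][0]
            pairs.append((name_tokens(borrowed_name), body_tokens, 0))
    return pairs


if __name__ == "__main__":
    tasks = [
        {"name": "Install nginx", "apt": {"name": "nginx", "state": "present"}},
        {"name": "Start nginx service", "service": {"name": "nginx", "state": "started"}},
    ]
    for pair in make_pairs(tasks):
        print(pair)
```

Each resulting `(name_tokens, body_tokens, label)` triple could then be mapped to word embeddings and passed to a classifier. Whether DeepIaC generates its inconsistent examples by name swapping, as assumed here, is not stated in the abstract.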


Cited By

S. Kotsiantis, V. Verykios, and M. Tzagarakis. 2024. AI-Assisted Programming Tasks Using Code Embeddings and Transformers. Electronics 13, 4, 767. https://doi.org/10.3390/electronics13040767 (online publication date: 15 Feb 2024)

E. Low, C. Cheh, and B. Chen. 2024. Repairing Infrastructure-as-Code using Large Language Models. In 2024 IEEE Secure Development Conference (SecDev). 20-27. https://doi.org/10.1109/SecDev61143.2024.00008 (online publication date: 7 Oct 2024)

S. Tummalapalli, L. Kumar, L. Neti, and A. Krishna. 2023. A Comparative Analysis on the Detection of Web Service Anti-Patterns Using Various Metrics. In Proceedings of the 16th Innovations in Software Engineering Conference. 1-7. https://dl.acm.org/doi/10.1145/3578527.3578534 (online publication date: 23 Feb 2023)





Information

Published In


MaLTeSQuE 2020: Proceedings of the 4th ACM SIGSOFT International Workshop on Machine-Learning Techniques for Software-Quality Evaluation

November 2020

36 pages

ISBN:9781450381246

DOI:10.1145/3416505

Copyright © 2020 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

SIGSOFT: ACM Special Interest Group on Software Engineering

Publisher

Association for Computing Machinery

New York, NY, United States


Qualifiers

Research-article

Conference

ESEC/FSE '20

Sponsor:

SIGSOFT

ESEC/FSE '20: 28th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering

November 13, 2020

Virtual, USA

Bibliometrics

Article Metrics

Total Citations: 13

Total Downloads: 544

Downloads (last 12 months): 152

Downloads (last 6 weeks): 15

Reflects downloads up to 01 Jan 2025


Citations

P. Reddy Konala, V. Kumar, and D. Bainbridge. 2023. SoK: Static Configuration Analysis in Infrastructure as Code Scripts. In 2023 IEEE International Conference on Cyber Security and Resilience (CSR). 281-288. https://doi.org/10.1109/CSR57506.2023.10224925 (online publication date: 31 Jul 2023)

L. Kumar, S. Tummalapalli, S. Rathi, L. Murthy, A. Krishna, and S. Misra. 2023. Machine learning with word embedding for detecting web-services anti-patterns. Journal of Computer Languages 75, 101207. https://doi.org/10.1016/j.cola.2023.101207 (online publication date: Jun 2023)

J. Prenner and R. Robbes. 2022. Making the most of small Software Engineering datasets with modern machine learning. IEEE Transactions on Software Engineering. 1-1. https://doi.org/10.1109/TSE.2021.3135465 (online publication date: 2022)

M. Chiari, M. De Pascalis, and M. Pradella. 2022. Static Analysis of Infrastructure as Code: a Survey. In 2022 IEEE 19th International Conference on Software Architecture Companion (ICSA-C). 218-225. https://doi.org/10.1109/ICSA-C54293.2022.00049 (online publication date: Mar 2022)

N. Borovits, I. Kumara, D. Di Nucci, P. Krishnan, S. Dalla Palma, F. Palomba, D. Tamburri, and W. van den Heuvel. 2022. FindICI: Using machine learning to detect linguistic inconsistencies between code and natural language descriptions in infrastructure-as-code. Empirical Software Engineering 27, 7. https://dl.acm.org/doi/10.1007/s10664-022-10215-5 (online publication date: 1 Dec 2022)

I. Kumara, A. Lazzaro, N. Mujkanovic, Z. Vasileiou, and D. Tamburri. 2022. Quality Assurance and Design-Time Optimization. In Deployment and Operation of Complex Software in Heterogeneous Execution Environments. 53-66. https://doi.org/10.1007/978-3-031-04961-3_4 (online publication date: 15 Jul 2022)

L. Harzenetter, U. Breitenbücher, G. Falazi, F. Leymann, A. Wersching, I. Brandic, R. Sakellariou, and J. Spillner. 2021. Automated detection of design patterns in declarative deployment models. In Proceedings of the 14th IEEE/ACM International Conference on Utility and Cloud Computing. 1-10. https://dl.acm.org/doi/10.1145/3468737.3494085 (online publication date: 6 Dec 2021)
