research-article

Open access

Automatic Categorization of GitHub Actions with Transformers and Few-shot Learning

Authors:

Phuong T. Nguyen,

Claudio Di Sipio,

Davide Di Ruscio,

Massimiliano Di Penta

ESEM '24: Proceedings of the 18th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement

Pages 468 - 474

https://doi.org/10.1145/3674805.3690752

Published: 24 October 2024

Abstract

In the GitHub ecosystem, workflows are used as an effective means to automate development tasks and to set up a Continuous Integration and Delivery (CI/CD) pipeline. GitHub Actions (GHA) was conceived to give developers a practical tool to create and maintain workflows, so that they avoid “reinventing the wheel” and cluttering the workflow with shell commands. Properly leveraging GitHub Actions can facilitate development processes, enhance collaboration, and significantly impact project outcomes. To expose actions to search engines, GitHub allows developers to manually assign them to one or more categories, which group actions that share similar functionality. Nevertheless, although actions provide a practical way to execute workflows, many of them have unclear purposes, and some are not categorized at all. In this work, we bridge this gap by conceptualizing Gavel, a practical solution for increasing the visibility of actions in GitHub. Using the content of each action's README.MD file, Gavel applies a Transformer model to assign suitable categories to the action. We conducted an empirical investigation and compared Gavel with a state-of-the-art baseline. The results show that our approach assigns categories to GitHub actions effectively and outperforms the baseline.
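The idea of assigning a category to an action from its README and a handful of labeled examples can be illustrated with a minimal sketch. This is not Gavel's implementation: a bag-of-words count vector stands in for the Transformer encoder, and a nearest-prototype rule stands in for the few-shot learner, so that the example runs with the Python standard library alone. The category names, README snippets, and function names are all hypothetical.

```python
import math
import re
from collections import Counter


def strip_markdown(text):
    """Crude README cleanup: keep link text, drop link URLs and markup symbols."""
    text = re.sub(r"\[([^\]]*)\]\([^)]*\)", r"\1", text)
    return re.sub(r"[#*_>|]+", " ", text)


def embed(text):
    """Stand-in for a Transformer encoder (assumption): a term-count vector."""
    return Counter(re.findall(r"[a-z]+", strip_markdown(text).lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_prototypes(support):
    """Sum the vectors of the few labeled READMEs available for each category."""
    prototypes = {}
    for category, readme in support:
        prototypes.setdefault(category, Counter()).update(embed(readme))
    return prototypes


def categorize(readme, prototypes, top_k=1):
    """Return the top_k categories whose prototype is closest to the README."""
    query = embed(readme)
    ranked = sorted(prototypes, key=lambda c: cosine(query, prototypes[c]), reverse=True)
    return ranked[:top_k]


# Hypothetical few-shot support set: two labeled READMEs per category.
support = [
    ("deployment", "# Deploy to cloud\nDeploy your app to a server after each release."),
    ("deployment", "Publish and deploy a container image to production."),
    ("code-quality", "# Lint\nRun a linter and check code style on every pull request."),
    ("code-quality", "Static analysis to check code quality and report style issues."),
]
prototypes = build_prototypes(support)

uncategorized = "## Usage\nThis action will deploy the container to a production server."
print(categorize(uncategorized, prototypes))
```

Replacing `embed` with sentence embeddings from a pre-trained Transformer would leave the rest of the sketch unchanged, which is why the encoder is isolated in a single function here.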

Index Terms

Automatic Categorization of GitHub Actions with Transformers and Few-shot Learning
1. Human-centered computing
  1. Visualization
    1. Visualization techniques
2. Software and its engineering
  1. Software creation and management
    1. Collaboration in software development
      1. Open source model
      2. Programming teams
  2. Software notations and tools

Index terms have been assigned to the content through auto-classification.

Published In

ESEM '24: Proceedings of the 18th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement

October 2024

633 pages

ISBN:9798400710476

DOI:10.1145/3674805

Copyright © 2024 Owner/Author.

This work is licensed under a Creative Commons Attribution International 4.0 License.

Sponsors

SIGSOFT: ACM Special Interest Group on Software Engineering

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Funding Sources

PRIN 2022
PRIN 2020

Conference

ESEM '24

Sponsor:

SIGSOFT

ESEM '24: ACM / IEEE International Symposium on Empirical Software Engineering and Measurement

October 24 - 25, 2024

Barcelona, Spain

Acceptance Rates

Overall Acceptance Rate 130 of 594 submissions, 22%
