DOI: 10.1145/3674805.3690752
research-article
Open access

Automatic Categorization of GitHub Actions with Transformers and Few-shot Learning

Published: 24 October 2024

Abstract

In the GitHub ecosystem, workflows are an effective means to automate development tasks and to set up Continuous Integration and Delivery (CI/CD) pipelines. GitHub Actions (GHA) was conceived to give developers a practical tool for creating and maintaining workflows, avoiding “reinventing the wheel” and cluttering workflows with shell commands. Properly leveraging GitHub Actions can streamline development processes, enhance collaboration, and significantly affect project outcomes. To expose actions to search engines, GitHub lets developers manually assign each action to one or more categories, which group actions that share similar functionality. Nevertheless, although actions offer a practical way to execute workflows, many of them have unclear purposes, and some are not categorized at all. In this work, we bridge this gap by conceptualizing Gavel, a practical solution for increasing the visibility of actions on GitHub. Leveraging the content of each action's README.md file, Gavel uses Transformers to assign suitable categories to the action. We conducted an empirical investigation comparing Gavel with a state-of-the-art baseline. The results show that our approach assigns categories to GitHub actions effectively and outperforms the baseline.
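The abstract specifies Gavel's input (the text of an action's README.md) and output (one or more categories per action), but not its internals. The snippet below is a minimal, hypothetical sketch of prototype-based few-shot, multi-label categorization under stated assumptions; it is not Gavel's implementation. In particular, `embed` is a bag-of-words stand-in for a Transformer sentence encoder, `readme_to_text` is a crude regex cleanup rather than a real Markdown parser, and the category names, example READMEs, and the 0.2 similarity threshold are all invented for illustration.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    # ASSUMPTION: bag-of-words stand-in for a Transformer sentence encoder.
    # A real pipeline would call a pre-trained model here instead.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse vectors.
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def readme_to_text(markdown: str) -> str:
    # ASSUMPTION: crude Markdown cleanup. Drop fenced code blocks, keep link
    # text but not link targets, and blank out heading/emphasis markers.
    text = re.sub(r"`{3}.*?`{3}", " ", markdown, flags=re.S)
    text = re.sub(r"\[([^\]]*)\]\([^)]*\)", r"\1", text)
    return re.sub(r"[#*_`>]", " ", text)


def build_prototypes(examples: dict[str, list[str]]) -> dict[str, Counter]:
    # Few-shot step: each category's prototype is the mean embedding of the
    # handful of labeled READMEs provided for it.
    protos = {}
    for category, readmes in examples.items():
        total = Counter()
        for readme in readmes:
            total.update(embed(readme_to_text(readme)))
        protos[category] = Counter({t: c / len(readmes) for t, c in total.items()})
    return protos


def categorize(readme: str, protos: dict[str, Counter], threshold: float = 0.2) -> list[str]:
    # Multi-label step: return every category whose prototype is at least
    # `threshold` similar to the README, most similar first.
    vec = embed(readme_to_text(readme))
    scored = sorted(((cosine(vec, p), c) for c, p in protos.items()), reverse=True)
    return [c for score, c in scored if score >= threshold]


# Hypothetical few-shot training data: two labeled READMEs per category.
few_shot = {
    "Deployment": ["Deploy your app to a cloud server after each push.",
                   "Deploy static sites to GitHub Pages."],
    "Testing": ["Run unit tests and report test coverage.",
                "Run the test suite on every pull request."],
}
protos = build_prototypes(few_shot)
print(categorize("# Coverage\nRun tests and upload the coverage report.", protos))  # → ['Testing']
```

Because `categorize` returns every category above the threshold rather than a single best match, an action can receive several categories, which mirrors GitHub's one-or-more category assignment. Replacing `embed` with a pre-trained sentence encoder would keep the same structure, although the threshold would then presumably need tuning on held-out data.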

Information

Published In

ESEM '24: Proceedings of the 18th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement
October 2024
633 pages
ISBN:9798400710476
DOI:10.1145/3674805
This work is licensed under a Creative Commons Attribution 4.0 International License.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Few-shot learning
  2. GitHub Actions
  3. Pre-trained models
  4. Sentence transformers

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Funding Sources

  • PRIN 2022
  • PRIN 2020

Conference

ESEM '24

Acceptance Rates

Overall Acceptance Rate 130 of 594 submissions, 22%

Bibliometrics

Article Metrics

  • Total Citations: 0
  • Total Downloads: 150
  • Downloads (Last 12 months): 150
  • Downloads (Last 6 weeks): 48
Reflects downloads up to 02 Mar 2025