DOI: 10.1145/2901739.2901776
research-article

Findings from GitHub: methods, datasets and limitations

Published: 14 May 2016

Abstract

GitHub, one of the most popular social coding platforms, is the platform of reference for mining Open Source repositories to learn from past experience. In recent years, a number of research papers have reported findings based on data mined from GitHub. As the community continues to deepen its understanding of software engineering through analyses performed on this platform, we believe it is worthwhile to reflect on how research papers have addressed the task of mining GitHub repositories over the last years. In this regard, we present a meta-analysis of 93 research papers that addresses three main dimensions of those papers: i) the empirical methods employed, ii) the datasets used, and iii) the limitations reported. The results of our meta-analysis raise concerns regarding the dataset collection process and dataset size, the low level of replicability, poor sampling techniques, the lack of longitudinal studies, and the scarce variety of methodologies.

Published In

MSR '16: Proceedings of the 13th International Conference on Mining Software Repositories
May 2016
544 pages
ISBN:9781450341868
DOI:10.1145/2901739
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. GitHub
  2. meta-analysis
  3. systematic review

Conference

ICSE '16