DOI: 10.1145/3196398.3196436
research-article

Large-scale analysis of the co-commit patterns of the active developers in GitHub's top repositories

Published: 28 May 2018

Abstract

GitHub, the largest code hosting site (with 25 million public active repositories and contributions from 6 million active users), provides an unprecedented opportunity to observe the collaboration patterns of software developers. Understanding the patterns behind the social coding phenomenon is an active research area where the insights gained can guide the design of better collaboration tools, and can also help to identify and select developer talent. In this paper, we present a large-scale analysis of the co-commit patterns in GitHub. We analyze 10 million commits made by 200 thousand developers to 16 thousand repositories, using 17 of the most popular programming languages over a period of 3 years. Although a large volume of data is included in our study, we pay close attention to the participation criteria for repositories and developers. We select repositories by reputation (based on star ranking), and we introduce the notion of active developer in GitHub (observing that a limited subset of developers is responsible for the vast majority of the commits). Using co-authorship networks, we analyze the co-commit patterns of the active developer network for each programming language. We observe that the active developer networks are less connected and more centralized than the general GitHub developer networks, and that the patterns vary significantly among languages. We compare our results to other collaborative environments (Wikipedia and scientific research networks), and we also describe the evolution of the co-commit patterns over time.
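
The abstract's pipeline (filter to active developers, link developers who commit to the same repository, then measure how connected and how centralized the network is) can be sketched roughly as follows. This is only an illustration, not the authors' implementation: the toy commit records, the `min_commits` threshold used to define an "active" developer, and the choice of density and Freeman degree centralization as the two measures are all assumptions made here, since the abstract does not state the exact criteria or metrics.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical toy data: one (developer, repository) pair per commit.
COMMITS = [
    ("alice", "repo-a"), ("alice", "repo-a"), ("alice", "repo-b"),
    ("bob", "repo-a"), ("bob", "repo-a"),
    ("carol", "repo-b"), ("carol", "repo-b"), ("carol", "repo-c"),
    ("dave", "repo-c"),
]


def active_developers(commits, min_commits=2):
    """Return developers with at least `min_commits` commits.

    The threshold is an assumed stand-in for the paper's notion of
    "active developer", which the abstract does not define precisely.
    """
    counts = Counter(dev for dev, _ in commits)
    return {dev for dev, n in counts.items() if n >= min_commits}


def co_commit_edges(commits, developers):
    """Undirected edges between developers who committed to the same repository."""
    by_repo = defaultdict(set)
    for dev, repo in commits:
        if dev in developers:
            by_repo[repo].add(dev)
    edges = set()
    for devs in by_repo.values():
        # Sorting gives each unordered pair a single canonical tuple.
        edges.update(combinations(sorted(devs), 2))
    return edges


def density(n_nodes, edges):
    """Fraction of possible undirected edges that are present."""
    if n_nodes < 2:
        return 0.0
    return 2 * len(edges) / (n_nodes * (n_nodes - 1))


def degree_centralization(nodes, edges):
    """Freeman degree centralization: 0 for a regular graph, 1 for a star."""
    n = len(nodes)
    if n < 3:
        return 0.0
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    max_degree = max(degree[d] for d in nodes)
    return sum(max_degree - degree[d] for d in nodes) / ((n - 1) * (n - 2))


if __name__ == "__main__":
    everyone = {dev for dev, _ in COMMITS}
    for label, group in (("all", everyone), ("active", active_developers(COMMITS))):
        edges = co_commit_edges(COMMITS, group)
        print(label, density(len(group), edges), degree_centralization(group, edges))
```

Running the same two measures on the full and the filtered developer sets mirrors the comparison the abstract describes; the toy numbers themselves say nothing about the paper's findings.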

Published In

MSR '18: Proceedings of the 15th International Conference on Mining Software Repositories
May 2018
627 pages
ISBN: 9781450357166
DOI: 10.1145/3196398
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

ICSE '18
Cited By

  • (2022) A systematic process for Mining Software Repositories. Information and Software Technology 144:C. DOI: 10.1016/j.infsof.2021.106791. Online publication date: 1-Apr-2022
  • (2022) Average Nearest Neighbor Degree and Its Distribution in Social Networks. Digital Transformation and Global Society, 36-50. DOI: 10.1007/978-3-030-93715-7_3. Online publication date: 25-Jan-2022
  • (2021) Understanding the Working Habits of GH-SO Users on GitHub Commit Activity and Stack Overflow Post Activity. International Journal of Software Engineering and Knowledge Engineering 31:10, 1399-1419. DOI: 10.1142/S0218194021500467. Online publication date: 15-Nov-2021
  • (2021) Analysing Time-Stamped Co-Editing Networks in Software Development Teams using git2net. Empirical Software Engineering 26:4. DOI: 10.1007/s10664-020-09928-2. Online publication date: 1-Jul-2021
  • (2020) Which Metrics Should Researchers Use to Collect Repositories: An Empirical Study. 2020 IEEE 20th International Conference on Software Quality, Reliability and Security (QRS), 458-466. DOI: 10.1109/QRS51102.2020.00065. Online publication date: Dec-2020
  • (2019) Empirical study on the usage of graph query languages in open source Java projects. Proceedings of the 12th ACM SIGPLAN International Conference on Software Language Engineering, 152-166. DOI: 10.1145/3357766.3359541. Online publication date: 20-Oct-2019
  • (2019) Introducing Privacy in Screen Event Frequency Analysis for Android Apps. 2019 19th International Working Conference on Source Code Analysis and Manipulation (SCAM), 268-279. DOI: 10.1109/SCAM.2019.00037. Online publication date: Sep-2019
  • (2019) git2net. Proceedings of the 16th International Conference on Mining Software Repositories, 433-444. DOI: 10.1109/MSR.2019.00070. Online publication date: 26-May-2019
