Abstract
Forking is the creation of a new software repository by copying another repository. Though forking is controversial in the traditional open source software (OSS) community, it is encouraged and is a built-in feature in GitHub. Developers freely fork repositories, use the code as their own, and make changes. A deep understanding of repository forking can provide important insights for the OSS community and GitHub. In this paper, we explore why and how developers fork what from whom in GitHub. We collect a dataset containing 236,344 developers and 1,841,324 forks. We conduct surveys and analyze the programming languages and owners of forked repositories. Our main observations are: (1) Developers fork repositories to submit pull requests, fix bugs, add new features, keep copies, etc. Developers find repositories to fork from various sources: search engines, external sites (e.g., Twitter, Reddit), social relationships, etc. More than 42% of the developers we surveyed agree that an automated recommendation tool would be useful in helping them pick repositories to fork, while more than 44.4% do not value a recommendation tool. Developers care about repository owners when they fork repositories. (2) A repository written in a developer's preferred programming language is more likely to be forked. (3) Developers mostly fork repositories from creators. In comparison with unattractive repository owners, attractive repository owners include a higher percentage of organizations, have more followers, and registered on GitHub earlier. Our results show that forking is mainly used for making contributions to original repositories, and that it is beneficial for the OSS community. Moreover, our results show the value of recommendation and provide important insights for GitHub to recommend repositories.














Acknowledgments
This work is supported by the National Natural Science Foundation of China under Grant No. 61300006, the State Key Laboratory of Software Development Environment under Grant No. SKLSDE-2015ZX-24, and the Beijing Natural Science Foundation under Grant No. 4163074.
Additional information
Communicated by: Massimiliano Di Penta
Li Zhang is the first corresponding author
About this article
Cite this article
Jiang, J., Lo, D., He, J. et al. Why and how developers fork what from whom in GitHub. Empir Software Eng 22, 547–578 (2017). https://doi.org/10.1007/s10664-016-9436-6
DOI: https://doi.org/10.1007/s10664-016-9436-6