Abstract
In 2004 a collaborative research team based at Syracuse University and Elon University began collecting and sharing data in order to understand how free/libre open source software (FLOSS) is made. Embodying some of the same FLOSS ethos, this team created a public-facing repository for their own data and analyses and encouraged other researchers to use it and contribute to it. This chapter tells the story of how the FLOSSmole project began, where the data comes from and what we have learned from it, and how the project has grown and changed over the years. In addition to capturing snapshots of the current state of the FLOSS landscape, FLOSSmole also serves as a mirror of the larger FLOSS ecosystem, since changes in FLOSSmole's mission and goals over the years necessarily reflect some of the cultural and technological changes taking place in FLOSS itself. As such, FLOSSmole will continue to face many challenges, including the continual need to provide broader access and more sophisticated and relevant data and analyses, and to do all of this in a way that is sustainable and community-driven.
Copyright information
© 2017 Springer International Publishing AG
About this chapter
Cite this chapter
Crowston, K., Squire, M. (2017). Lessons Learned from a Decade of FLOSS Data Collection. In: Matei, S., Jullien, N., Goggins, S. (eds) Big Data Factories. Computational Social Sciences. Springer, Cham. https://doi.org/10.1007/978-3-319-59186-5_7
DOI: https://doi.org/10.1007/978-3-319-59186-5_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-59185-8
Online ISBN: 978-3-319-59186-5
eBook Packages: Computer Science, Computer Science (R0)