Lessons Learned from a Decade of FLOSS Data Collection

Big Data Factories

Part of the book series: Computational Social Sciences ((CSS))

Abstract

In 2004 a collaborative research team based at Syracuse University and Elon University began collecting and sharing data in order to understand how free/libre open source software (FLOSS) is made. Embodying some of the same FLOSS ethos, this team created a public-facing repository for their own data and analyses and encouraged other researchers to use it and contribute to it. This chapter tells the story of how the FLOSSmole project began, where the data comes from and what we have learned from it, and how the project has grown and changed over the years. In addition to capturing snapshots of the current state of the FLOSS landscape, FLOSSmole also serves as a mirror to the larger FLOSS ecosystem, since changes in FLOSSmole's mission and goals over the years necessarily reflect some of the cultural and technological changes taking place in FLOSS itself. As such, FLOSSmole will continue to face many challenges in the future, including the continual need to provide broader access and more sophisticated and relevant data and analyses and to do all this in a way that is sustainable and community driven.

Author information

Corresponding author

Correspondence to Megan Squire.

Copyright information

© 2017 Springer International Publishing AG

About this chapter

Cite this chapter

Crowston, K., Squire, M. (2017). Lessons Learned from a Decade of FLOSS Data Collection. In: Matei, S., Jullien, N., Goggins, S. (eds) Big Data Factories. Computational Social Sciences. Springer, Cham. https://doi.org/10.1007/978-3-319-59186-5_7

  • DOI: https://doi.org/10.1007/978-3-319-59186-5_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-59185-8

  • Online ISBN: 978-3-319-59186-5

  • eBook Packages: Computer Science, Computer Science (R0)
