research-article

Fit or Unfit: Analysis and Prediction of 'Closed Questions' on Stack Overflow

Published: 7 October 2013
DOI: 10.1145/2512938.2512954

ABSTRACT

Stack Overflow is widely regarded as the most popular community-driven question answering (CQA) website for programmers. Questions posted on Stack Overflow that are not related to programming topics are marked as 'closed' by experienced users and community moderators. A question can be 'closed' for five reasons: duplicate, off-topic, subjective, not a real question, and too localized. In this work, we present the first study of 'closed' questions on Stack Overflow. We downloaded four years of publicly available data containing 3.4 million questions. We first analyze and characterize the complete set of 0.1 million 'closed' questions. Next, we use a machine learning framework to build a predictive model that identifies a 'closed' question at the time of question creation.
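
Identifying closed questions in the public dump and tracking their share over time can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: it assumes the dump's Posts.xml layout, in which each `<row>` carries a `PostTypeId` attribute (`1` for questions), an ISO 8601 `CreationDate`, and a `ClosedDate` only when the question was closed. It also assumes that "percentage of closed questions over time" means closed questions grouped by creation month. The function name `monthly_closed_share` is hypothetical.

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter
from typing import Dict


def monthly_closed_share(source) -> Dict[str, float]:
    """Percentage of questions created in each month that carry a ClosedDate.

    Assumption: a Posts.xml-style stream of <row> elements with PostTypeId
    ("1" = question), CreationDate ("YYYY-MM-DDThh:mm:ss...") and, for
    closed questions only, a ClosedDate attribute.
    """
    created: Counter = Counter()
    closed: Counter = Counter()
    # iterparse streams the XML, so a multi-million-row dump need not fit in memory.
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "row" and elem.get("PostTypeId") == "1":
            month = elem.get("CreationDate", "")[:7]  # e.g. "2012-08"
            created[month] += 1
            if elem.get("ClosedDate"):
                closed[month] += 1
        elem.clear()  # drop attributes/children of elements already processed
    return {m: 100.0 * closed[m] / created[m] for m in sorted(created)}


if __name__ == "__main__":
    # Toy input for illustration only; not real Stack Overflow data.
    sample = b"""<posts>
  <row Id="1" PostTypeId="1" CreationDate="2012-01-05T10:00:00.000" ClosedDate="2012-01-06T09:00:00.000" />
  <row Id="2" PostTypeId="1" CreationDate="2012-01-07T11:00:00.000" />
  <row Id="3" PostTypeId="2" CreationDate="2012-01-07T12:00:00.000" />
  <row Id="4" PostTypeId="1" CreationDate="2012-02-01T08:00:00.000" />
</posts>"""
    print(monthly_closed_share(io.BytesIO(sample)))  # {'2012-01': 50.0, '2012-02': 0.0}
```

Grouping by the month in which a question was closed, rather than created, would be an equally plausible choice; the two can differ for questions that are closed long after posting.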

One of our key findings is that, despite being marked as 'closed', subjective questions contain high information value and are very popular with users. We observe an increasing trend in the percentage of closed questions over time and find that this increase is positively correlated with the number of newly registered users. We also see a decrease in community participation in closing questions, which has led to an increase in moderation time. In addition, we find that questions closed with the Duplicate and Off Topic labels are relatively more prone to reputation gaming. Our analysis suggests broader implications for content quality maintenance on CQA websites.

For the 'closed' question prediction task, we use four groups of features based on user profile, community process, textual style, and question content. We use a state-of-the-art machine learning classifier based on an ensemble framework and achieve an overall accuracy of 70.3%. Analysis of the feature space reveals that 'closed' questions are relatively less informative and descriptive than non-'closed' questions. To the best of our knowledge, this is the first experimental study to analyze and predict 'closed' questions on Stack Overflow.
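
The textual-style and question-content feature groups can be illustrated with a small extractor. The specific features below (word counts, code blocks, links, question marks) and the names `TextFeatures` and `text_features` are hypothetical choices for illustration, not the paper's feature list. The sketch assumes question bodies are stored as HTML, as in the Stack Overflow data dump. Vectors like these would then be combined with user-profile and community-process features and passed to an ensemble classifier, which is not shown here.

```python
import re
from dataclasses import astuple, dataclass
from typing import List

# Hypothetical patterns; question bodies are assumed to be HTML.
CODE_BLOCK = re.compile(r"<pre[^>]*>\s*<code", re.IGNORECASE)
LINK = re.compile(r"<a\s[^>]*href=", re.IGNORECASE)
TAG = re.compile(r"<[^>]+>")


@dataclass
class TextFeatures:
    title_words: int     # title length in whitespace-separated tokens
    body_words: int      # body length after stripping HTML tags (code included)
    code_blocks: int     # number of <pre><code> blocks
    links: int           # number of <a href=...> anchors
    question_marks: int  # '?' characters in the title plus the tag-stripped body

    def as_vector(self) -> List[float]:
        # Flatten to a numeric vector for a downstream classifier.
        return [float(v) for v in astuple(self)]


def text_features(title: str, body_html: str) -> TextFeatures:
    """Compute illustrative textual-style/content features for one question."""
    plain = TAG.sub(" ", body_html)
    return TextFeatures(
        title_words=len(title.split()),
        body_words=len(plain.split()),
        code_blocks=len(CODE_BLOCK.findall(body_html)),
        links=len(LINK.findall(body_html)),
        question_marks=title.count("?") + plain.count("?"),
    )


if __name__ == "__main__":
    features = text_features(
        "How do I sort a list?",
        '<p>I tried <a href="https://docs.python.org">the docs</a>. Why?</p>'
        "<pre><code>xs.sort()</code></pre>",
    )
    print(features.as_vector())  # [6.0, 7.0, 1.0, 1.0, 2.0]
```

Short bodies without code or links are one crude proxy for the "less informative and descriptive" pattern described above; whether such features actually separate closed from non-closed questions would have to be checked on real data.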

Published in
COSN '13: Proceedings of the first ACM conference on Online social networks
October 2013
254 pages
ISBN: 9781450320849
DOI: 10.1145/2512938

Copyright © 2013 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher
Association for Computing Machinery
New York, NY, United States

Acceptance Rates

COSN '13 paper acceptance rate: 22 of 138 submissions, 16%. Overall acceptance rate: 69 of 307 submissions, 22%.
