Fit or unfit: analysis and prediction of 'closed questions' on stack overflow

Authors:
Denzil Correa, IIIT-Delhi, Delhi, India
Ashish Sureka, IIIT-Delhi, Delhi, India

COSN '13: Proceedings of the first ACM conference on Online social networks, October 2013, Pages 201–212
https://doi.org/10.1145/2512938.2512954

Published: 07 October 2013

ABSTRACT

Stack Overflow is widely regarded as the most popular Community-driven Question Answering (CQA) website for programmers. Questions posted on Stack Overflow that are not related to programming topics are marked as 'closed' by experienced users and community moderators. A question can be 'closed' for five reasons: duplicate, off-topic, subjective, not a real question, and too localized. In this work, we present the first study of 'closed' questions on Stack Overflow. We download 4 years of publicly available data, which contains 3.4 million questions. We first analyze and characterize the complete set of 0.1 million 'closed' questions. Next, we use a machine learning framework to build a predictive model that identifies a 'closed' question at the time of question creation.

One of our key findings is that, despite being marked as 'closed', subjective questions contain high information value and are very popular with users. We observe an increasing trend in the percentage of closed questions over time and find that this increase is positively correlated with the number of newly registered users. In addition, we see a decrease in community participation in marking questions as 'closed', which has led to an increase in moderation job time. We also find that questions closed with the Duplicate and Off Topic labels are relatively more prone to reputation gaming. Our analysis suggests broader implications for content quality maintenance on CQA websites. For the 'closed' question prediction task, we use multiple genres of feature sets based on user profile, community process, textual style, and question content. We use a state-of-the-art machine learning classifier based on an ensemble framework and achieve an overall accuracy of 70.3%. Analysis of the feature space reveals that 'closed' questions are relatively less informative and descriptive than non-'closed' questions. To the best of our knowledge, this is the first experimental study to analyze and predict 'closed' questions on Stack Overflow.
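The prediction setup described above (features available at question creation time, fed to an ensemble classifier) can be illustrated with a minimal sketch. The abstract does not name the classifier beyond "an ensemble framework", so AdaBoost over decision stumps is used here purely as a stand-in. The feature definitions, the function names, and the four toy questions are all hypothetical and are not taken from the paper.

```python
import math

def extract_features(title, body, reputation):
    """Hypothetical creation-time features, loosely following the abstract's genres."""
    return [
        float(reputation),                  # user profile
        float(len(title)),                  # textual style: title length
        float(len(body.split())),           # textual style: body word count
        1.0 if "<code>" in body else 0.0,   # question content: contains a code snippet
        float(body.count("?")),             # textual style: number of question marks
    ]

def stump_predict(stump, x):
    """A stump votes `sign` when the feature exceeds the threshold, else `-sign`."""
    feature, threshold, sign = stump
    return sign if x[feature] > threshold else -sign

def fit_stump(X, y, weights):
    """Exhaustively pick the stump with the lowest weighted training error."""
    best, best_err = None, float("inf")
    for feature in range(len(X[0])):
        for threshold in sorted({x[feature] for x in X}):
            for sign in (1, -1):
                stump = (feature, threshold, sign)
                err = sum(w for x, label, w in zip(X, y, weights)
                          if stump_predict(stump, x) != label)
                if err < best_err:
                    best, best_err = stump, err
    return best, best_err

def fit_adaboost(X, y, rounds=10):
    """Train AdaBoost; labels are +1 ('closed') and -1 (not 'closed')."""
    n = len(X)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        stump, err = fit_stump(X, y, weights)
        err = min(max(err, 1e-10), 1 - 1e-10)    # keep the log finite
        alpha = 0.5 * math.log((1 - err) / err)  # vote weight of this stump
        ensemble.append((alpha, stump))
        # Up-weight misclassified examples, then renormalize.
        weights = [w * math.exp(-alpha * label * stump_predict(stump, x))
                   for x, label, w in zip(X, y, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

def predict(ensemble, x):
    """Weighted vote of all stumps: +1 means predicted 'closed'."""
    score = sum(alpha * stump_predict(stump, x) for alpha, stump in ensemble)
    return 1 if score > 0 else -1

# Synthetic toy data: (title, body, asker reputation, label).
questions = [
    ("help", "plz fix my code??", 1, 1),
    ("best language", "which language is best? why?", 5, 1),
    ("How do I reverse a list in Python?",
     "I tried <code>reversed(xs)</code> but it returns an iterator. "
     "How do I get a list back?", 250, -1),
    ("Why does this SQL join return duplicates?",
     "My query <code>SELECT * FROM a JOIN b ON a.id = b.a_id</code> returns "
     "repeated rows when b has several matches. Is that expected?", 1200, -1),
]
X = [extract_features(t, b, r) for t, b, r, _ in questions]
y = [label for *_, label in questions]
model = fit_adaboost(X, y, rounds=5)
new_question = extract_features("urgent", "not working???", 3)
prediction = predict(model, new_question)
```

On real data, the labels would come from the close status recorded in the data dump. Accuracy would be measured on held-out questions, not on the training set as in this toy example.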

Index Terms

1. Information systems
  1. Information retrieval

Published in
COSN '13: Proceedings of the first ACM conference on Online social networks
October 2013, 254 pages
ISBN: 9781450320849
DOI: 10.1145/2512938
General Chair: Muthu Muthukrishnan (Microsoft & Rutgers University, USA)
Program Chairs: Amr El Abbadi (UC Santa Barbara, USA), Balachander Krishnamurthy (AT&T Labs-Research, USA)
Copyright © 2013 ACM
Publisher: Association for Computing Machinery, New York, NY, United States
Author Tags
question quality
question-answering
stack overflow
Qualifiers
- research-article
Acceptance Rates
COSN '13 paper acceptance rate: 22 of 138 submissions, 16%
Overall acceptance rate: 69 of 307 submissions, 22%