Predicting the quality of user contributions via LSTMs

Authors:
Rakshit Agrawal, University of California, Santa Cruz, Santa Cruz, CA, USA
Luca de Alfaro, University of California, Santa Cruz, Santa Cruz, CA, USA

OpenSym '16: Proceedings of the 12th International Symposium on Open Collaboration, August 2016, Article No. 19, Pages 1–10. https://doi.org/10.1145/2957792.2957811

Published: 17 August 2016

ABSTRACT

In many collaborative systems it is useful to automatically estimate the quality of new contributions; the estimates can be used, for instance, to flag contributions for review. To predict the quality of a contribution by a user, it is useful to take into account both the characteristics of the revision itself and the past history of contributions by that user. In several approaches, the user's history is first summarized into a number of features, such as the number of contributions, user reputation, time since the previous revision, and so forth. These features are then passed, along with features of the current revision, to a machine-learning classifier, which outputs a prediction for the user contribution. The summarization step is used because the usual machine-learning models, such as neural nets and SVMs, rely on a fixed number of input features. We show in this paper that this manual selection of summarization features can be avoided by adopting machine-learning approaches that are able to cope with temporal sequences of input.
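The summarization step described above can be illustrated with a minimal sketch. The `Revision` fields, the three chosen features, and the function name `summarize_history` are hypothetical and are not taken from the paper; the point is only that histories of any length are collapsed into a vector of fixed size before reaching the classifier.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Revision:
    timestamp: float  # seconds since some epoch (hypothetical field)
    quality: float    # quality score of the revision (hypothetical field)


def summarize_history(history: List[Revision], now: float) -> List[float]:
    """Collapse a variable-length history into three hand-picked features."""
    n = len(history)
    if n == 0:
        # Arbitrary default for users with no past revisions (an assumption).
        return [0.0, 0.0, 0.0]
    mean_quality = sum(r.quality for r in history) / n
    seconds_since_last = now - max(r.timestamp for r in history)
    return [float(n), mean_quality, seconds_since_last]


short_history = [Revision(100.0, 0.5)]
long_history = [Revision(100.0, 0.5), Revision(200.0, -1.0), Revision(300.0, 1.0)]

# Both histories map to a vector of the same length.
features_short = summarize_history(short_history, now=400.0)
features_long = summarize_history(long_history, now=400.0)
```

Whatever is not captured by the chosen features, for example the order in which good and bad revisions occurred, is lost at this point, which is the limitation the abstract refers to.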

In particular, we show that Long Short-Term Memory (LSTM) neural nets are able to directly process the variable-length history of a user's activity in the system, and produce an output that is highly predictive of the quality of the next contribution by the user. Our approach does not eliminate the process of feature selection, which is present in all machine learning. Rather, it eliminates the need for deciding which features from a user's past are most useful for predicting the future: we can simply pass the user's entire past to the machine-learning apparatus and let it come up with an estimate for the quality of the next contribution.
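As a rough illustration of how an LSTM consumes a history of arbitrary length, the sketch below implements the standard LSTM cell equations in plain Python and folds a sequence of per-revision feature vectors into a fixed-size state. The class name `TinyLSTM`, the sizes, and the feature values are assumptions made for the example, and the weights are random and untrained, so the output carries no predictive meaning; this is not the authors' model.

```python
import math
import random
from typing import List, Sequence, Tuple


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


class TinyLSTM:
    """Single LSTM cell with random, untrained weights (illustration only)."""

    def __init__(self, input_size: int, hidden_size: int, seed: int = 0):
        rng = random.Random(seed)
        self.hidden_size = hidden_size
        cols = input_size + hidden_size
        # One weight matrix and bias per gate:
        # i = input gate, f = forget gate, o = output gate, g = candidate.
        self.W = {gate: [[rng.uniform(-0.1, 0.1) for _ in range(cols)]
                         for _ in range(hidden_size)] for gate in "ifog"}
        self.b = {gate: [0.0] * hidden_size for gate in "ifog"}

    def _affine(self, gate: str, z: List[float]) -> List[float]:
        return [sum(w * v for w, v in zip(row, z)) + b
                for row, b in zip(self.W[gate], self.b[gate])]

    def step(self, x: Sequence[float], h: List[float],
             c: List[float]) -> Tuple[List[float], List[float]]:
        z = list(x) + list(h)  # concatenate current input and previous state
        i = [sigmoid(v) for v in self._affine("i", z)]
        f = [sigmoid(v) for v in self._affine("f", z)]
        o = [sigmoid(v) for v in self._affine("o", z)]
        g = [math.tanh(v) for v in self._affine("g", z)]
        c_new = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, g)]
        h_new = [ov * math.tanh(cv) for ov, cv in zip(o, c_new)]
        return h_new, c_new

    def encode(self, sequence: Sequence[Sequence[float]]) -> List[float]:
        """Run the cell over the whole sequence and return the final state."""
        h = [0.0] * self.hidden_size
        c = [0.0] * self.hidden_size
        for x in sequence:
            h, c = self.step(x, h, c)
        return h


lstm = TinyLSTM(input_size=2, hidden_size=4)
# Each inner list is one past revision described by two hypothetical features.
short_history = [[0.5, 1.0]]
long_history = [[0.5, 1.0], [-1.0, 0.2], [1.0, 0.1], [0.8, 3.0]]
h_short = lstm.encode(short_history)
h_long = lstm.encode(long_history)
```

Both `h_short` and `h_long` have length 4 even though the input histories differ in length, so no hand-written summarization of the history is required before the next stage.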

We present models combining an LSTM with a neural network (NN) for predicting revision quality and show that the prediction accuracy attained is far superior to that obtained using the NN alone. More interestingly, we also show that the prediction attained is superior to that obtained using user reputation as a feature summarizing the quality of a user's past work. This can be explained by noting that the primary function of user reputation is to provide an incentive towards performing useful contributions, rather than to be a feature optimized for prediction of future contribution quality. We also show that the LSTM output changes in a natural way in response to user behavior, increasing when the user performs a sequence of good-quality contributions, and decreasing when the user performs a sequence of low-quality contributions. The LSTM output for a user could thus be usefully shown to other users, alongside the user's reputation and other information.
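One plausible way to combine the two components, sketched here under stated assumptions, is to concatenate a history encoding with the features of the current revision and pass the result through a small feed-forward network. The abstract does not specify the architecture, so the layer sizes, the function names, and the placeholder `history_encoding` (standing in for an LSTM output) are all hypothetical, and the weights are random and untrained.

```python
import math
import random
from typing import Dict, List, Sequence


def dense(x: Sequence[float], weights: List[List[float]],
          bias: List[float]) -> List[float]:
    """Fully connected layer without activation."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def init_params(input_size: int, hidden_size: int, seed: int = 0) -> Dict:
    """Random, untrained parameters for a one-hidden-layer network."""
    rng = random.Random(seed)

    def matrix(rows: int, cols: int) -> List[List[float]]:
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)]
                for _ in range(rows)]

    return {"W1": matrix(hidden_size, input_size), "b1": [0.0] * hidden_size,
            "W2": matrix(1, hidden_size), "b2": [0.0]}


def predict_quality(inputs: Sequence[float], params: Dict) -> float:
    """Return a score in (0, 1) from a tanh hidden layer and a sigmoid output."""
    hidden = [math.tanh(v) for v in dense(inputs, params["W1"], params["b1"])]
    logit = dense(hidden, params["W2"], params["b2"])[0]
    return 1.0 / (1.0 + math.exp(-logit))


# Placeholder for the LSTM output over the user's history (hypothetical values).
history_encoding = [0.1, -0.2, 0.05, 0.3]
# Hypothetical features of the revision being judged.
revision_features = [12.0, 0.0, 1.0]

# Combined model: history encoding and revision features are concatenated.
combined_input = history_encoding + revision_features
p_combined = predict_quality(combined_input, init_params(len(combined_input), 5))

# "NN alone" baseline: only the current revision's features are used.
p_baseline = predict_quality(revision_features, init_params(len(revision_features), 5))
```

In a trained system both parts would be fitted jointly or in sequence on labeled revisions; the sketch only shows how the inputs of the two variants differ.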


Index terms: Computing methodologies > Machine learning > Machine learning approaches


Published in
OpenSym '16: Proceedings of the 12th International Symposium on Open Collaboration
August 2016
168 pages
ISBN:9781450344517
DOI:10.1145/2957792
General Chair:
Anthony I. (Tony) Wasserman
Carnegie Mellon University, Silicon Valley
Copyright © 2016 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
LSTM
Machine Learning
Neural Networks
Reputation
Wikipedia
Qualifiers
- research-article
- Research
- Refereed limited
Acceptance Rates
OpenSym '16 Paper Acceptance Rate: 23 of 49 submissions, 47%. Overall Acceptance Rate: 108 of 195 submissions, 55%.