Abstract
The number of articles in Wikipedia is growing rapidly, and it is important for Wikipedia to provide users with high-quality, reliable articles. However, the quality assessment metrics provided by Wikipedia are inefficient, and other mainstream quality detection methods focus only on English Wikipedia articles and usually analyze the article text, which is a time-consuming process. In this paper, we propose a method for detecting article quality in the Chinese Wikipedia based on the C4.5 decision tree. The problem of quality detection is transformed into a classification problem of high-quality versus low-quality articles. Using fields from the tables in the Chinese Wikipedia database, we build decision trees to distinguish high-quality articles from low-quality ones.
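As a rough illustration of the classification step described above, the following minimal sketch computes the gain ratio criterion that C4.5 uses to choose a split on a numeric attribute. It is not the authors' implementation. The feature names (page_len, revision_count), the toy values, and the helper functions are assumptions made for illustration, since the abstract does not say which database fields were used. The sketch also simplifies C4.5: it takes midpoints between observed values as candidate thresholds and omits pruning and the average-gain filter.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def gain_ratio(rows, labels, feature, threshold):
    """Gain ratio of the binary split rows[feature] <= threshold."""
    left = [y for r, y in zip(rows, labels) if r[feature] <= threshold]
    right = [y for r, y in zip(rows, labels) if r[feature] > threshold]
    if not left or not right:
        return 0.0  # a split that separates nothing carries no information
    total = len(labels)
    w_left, w_right = len(left) / total, len(right) / total
    info_gain = entropy(labels) - (w_left * entropy(left)
                                   + w_right * entropy(right))
    split_info = -(w_left * math.log2(w_left) + w_right * math.log2(w_right))
    return info_gain / split_info


def best_split(rows, labels):
    """Return (gain_ratio, feature, threshold) for the best single split."""
    best = (0.0, None, None)
    for feature in rows[0]:
        values = sorted({r[feature] for r in rows})
        # Candidate thresholds: midpoints between consecutive distinct values.
        for low, high in zip(values, values[1:]):
            threshold = (low + high) / 2
            score = gain_ratio(rows, labels, feature, threshold)
            if score > best[0]:
                best = (score, feature, threshold)
    return best


# Hypothetical toy data: one dict of numeric fields per article.
rows = [
    {"page_len": 52000, "revision_count": 340},
    {"page_len": 48000, "revision_count": 25},
    {"page_len": 1500, "revision_count": 410},
    {"page_len": 900, "revision_count": 12},
]
labels = ["high", "high", "low", "low"]

print(best_split(rows, labels))
```

A full decision tree would apply best_split recursively to each resulting subset until the subsets are pure or too small to split further.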
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Xiao, K., Li, B., He, P., Yang, Xh. (2013). Detection of Article Qualities in the Chinese Wikipedia Based on C4.5 Decision Tree. In: Wang, M. (eds) Knowledge Science, Engineering and Management. KSEM 2013. Lecture Notes in Computer Science, vol 8041. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-39787-5_36
DOI: https://doi.org/10.1007/978-3-642-39787-5_36
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-39786-8
Online ISBN: 978-3-642-39787-5
eBook Packages: Computer Science (R0)