Detection of Article Qualities in the Chinese Wikipedia Based on C4.5 Decision Tree

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8041)

Abstract

The number of articles in Wikipedia is growing rapidly. It is important for Wikipedia to provide users with high-quality, reliable articles. However, the quality assessment metrics provided by Wikipedia are inefficient. Other mainstream quality detection methods focus only on English Wikipedia articles and usually analyze the text content of articles, which is also time-consuming. In this paper, we propose a method for detecting article quality in the Chinese Wikipedia based on the C4.5 decision tree. The quality detection problem is transformed into a classification problem that separates high-quality articles from low-quality ones. Using fields from the tables in the Chinese Wikipedia database, we build decision trees to distinguish high-quality articles from low-quality ones.
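As a rough illustration of the classification idea in the abstract, the sketch below computes a simplified form of the gain ratio that C4.5 uses to choose a split on a numeric attribute. It is not the authors' implementation. The feature (`page_len`, an article length in bytes), the toy values, and the labels are all made up for illustration, since the abstract does not say which database fields were used. The helper names `entropy`, `gain_ratio`, and `best_threshold` are likewise hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def gain_ratio(values, labels, threshold):
    """Gain ratio of the binary split `value <= threshold`.

    Gain ratio = information gain / split information, the
    normalised criterion associated with C4.5.
    """
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    if not left or not right:
        return 0.0  # a split that separates nothing is useless
    total = len(labels)
    w_left, w_right = len(left) / total, len(right) / total
    info_gain = entropy(labels) - (w_left * entropy(left)
                                   + w_right * entropy(right))
    split_info = -(w_left * math.log2(w_left)
                   + w_right * math.log2(w_right))
    return info_gain / split_info


def best_threshold(values, labels):
    """Score midpoints between consecutive distinct values.

    Simplification: real C4.5 handles thresholds, missing values,
    and pruning with more care than this sketch does.
    """
    distinct = sorted(set(values))
    candidates = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    scored = [(t, gain_ratio(values, labels, t)) for t in candidates]
    return max(scored, key=lambda pair: pair[1])


# Made-up toy data: one hypothetical numeric field per article.
page_len = [1200, 800, 45000, 52000, 3000, 61000]
labels = ["low", "low", "high", "high", "low", "high"]

threshold, ratio = best_threshold(page_len, labels)
print(f"split at page_len <= {threshold}, gain ratio = {ratio:.3f}")
```

A full tree would repeat this choice recursively over several attributes on each resulting subset; the single split here only shows how one database field could separate the two classes.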

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Xiao, K., Li, B., He, P., Yang, Xh. (2013). Detection of Article Qualities in the Chinese Wikipedia Based on C4.5 Decision Tree. In: Wang, M. (eds) Knowledge Science, Engineering and Management. KSEM 2013. Lecture Notes in Computer Science, vol 8041. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-39787-5_36

  • DOI: https://doi.org/10.1007/978-3-642-39787-5_36

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-39786-8

  • Online ISBN: 978-3-642-39787-5

  • eBook Packages: Computer Science, Computer Science (R0)
