An Empirical Study on Word Sense Disambiguation for Adult Content Filtering

  • Conference paper
International Joint Conference SOCO’14-CISIS’14-ICEUTE’14

Abstract

The Internet is a powerful source of information. However, as with other media, each type of information is targeted at a different audience. In particular, adult content should not be accessible to children. In this context, several content-filtering approaches have been proposed in both industry and academia. Some of these approaches use the text of a webpage to build a classic bag-of-words model, which is then used to categorise pages and filter inappropriate content. To the best of our knowledge, these methods use no semantic information and may therefore be evaded by attacks that exploit the well-known ambiguity of natural language. Given this background, we present the first semantics-aware adult-content filtering approach, which models webpages after applying a word-sense-disambiguation step to deal with this ambiguity. We show that this approach can improve the filtering results of the classic statistical models.
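To make the contrast in the abstract concrete, the following minimal Python sketch compares a plain bag-of-words representation with one built after a disambiguation step. It is an illustration only, not the authors' pipeline: the sense inventory `TOY_SENSES`, the cue-word lookup, and the function names are all hypothetical stand-ins for a real word-sense-disambiguation tool, and the neutral word "bank" is used simply to show the ambiguity problem.

```python
from collections import Counter

# Hypothetical toy sense inventory (an assumption for illustration):
# ambiguous word -> {context cue word: sense label}.
# A real system would call a proper WSD tool instead of this lookup.
TOY_SENSES = {
    "bank": {"loan": "bank#finance", "river": "bank#river"},
}


def tokenize(text):
    """Lowercase and split on whitespace (deliberately simplistic)."""
    return text.lower().split()


def bag_of_words(text):
    """Classic bag-of-words: counts of surface tokens, no semantics."""
    return Counter(tokenize(text))


def disambiguate(tokens):
    """Replace each ambiguous token with a sense label picked from context cues.

    If no cue word is present in the text, the token is left unchanged.
    """
    context = set(tokens)
    result = []
    for tok in tokens:
        senses = TOY_SENSES.get(tok)
        if senses is None:
            result.append(tok)
            continue
        label = next((s for cue, s in senses.items() if cue in context), tok)
        result.append(label)
    return result


def bag_of_senses(text):
    """Bag-of-words built over sense-tagged tokens."""
    return Counter(disambiguate(tokenize(text)))


if __name__ == "__main__":
    a = "the bank approved the loan"
    b = "the river bank flooded"
    print(bag_of_words(a)["bank"], bag_of_words(b)["bank"])
    print(sorted(bag_of_senses(a)))
    print(sorted(bag_of_senses(b)))
```

In the plain model both sentences contribute the same feature, `bank`, so a classifier cannot tell the two uses apart; after the (toy) disambiguation step they yield distinct features, which is the property a semantics-aware filter relies on.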

Author information

Correspondence to Igor Santos.

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Santos, I. et al. (2014). An Empirical Study on Word Sense Disambiguation for Adult Content Filtering. In: de la Puerta, J., et al. International Joint Conference SOCO’14-CISIS’14-ICEUTE’14. Advances in Intelligent Systems and Computing, vol 299. Springer, Cham. https://doi.org/10.1007/978-3-319-07995-0_53

  • DOI: https://doi.org/10.1007/978-3-319-07995-0_53

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-07994-3

  • Online ISBN: 978-3-319-07995-0

  • eBook Packages: Engineering, Engineering (R0)
