Web Usage Mining in Noisy and Ambiguous Environments: Exploring the Role of Concept Hierarchies, Compression, and Robust User Profiles

  • Conference paper
From Web to Social Web: Discovering and Deploying User and Content Profiles (WebMine 2006)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4737)

Abstract

Recent efforts in Web usage mining have started incorporating more semantics into the data in order to obtain a representation deeper than shallow clicks. In this paper, we review these approaches and examine how simple cues from a website hierarchy can relate clickstream events that would otherwise seem unrelated, and thus perform URL compression. We study the effect of this compression on data reduction and on the quality of the resulting knowledge discovery. Web usage data is also notorious for containing moderate to high amounts of noise, which motivates the use of robust knowledge discovery algorithms that resist noise and outliers to varying degrees. We therefore also examine the effect of robustness on the final quality of the knowledge discovery. Our experimental results show that post-processed and robust user profiles have better quality than raw profiles that are estimated through optimization alone. URL compression, as expected, tends to reduce the quality, but it can also drastically reduce the size of the data set, resulting in faster mining.
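The abstract does not spell out how the hierarchy cues are applied, so the following is only a minimal sketch of one plausible form of URL compression: each clicked URL is replaced by its ancestor at a fixed depth of the site's directory hierarchy, so that pages under the same branch collapse into one item. The function names `compress_url` and `compress_session`, the `depth` parameter, and the sample paths are all hypothetical and are not taken from the paper.

```python
from urllib.parse import urlparse


def compress_url(url: str, depth: int = 2) -> str:
    """Map a URL to its ancestor at `depth` levels of the path hierarchy.

    Assumption: the site hierarchy is approximated by the URL's directory
    structure, which may not match the hierarchy used in the paper.
    """
    segments = [s for s in urlparse(url).path.split("/") if s]
    return "/" + "/".join(segments[:depth])


def compress_session(urls, depth: int = 2):
    """Compress every click in a session and drop repeated items,
    keeping the order in which each compressed URL first appears."""
    return list(dict.fromkeys(compress_url(u, depth) for u in urls))


# Hypothetical session: four distinct pages, two under the same branch.
session = [
    "/courses/cs101/syllabus.html",
    "/courses/cs101/homework/hw1.html",
    "/courses/cs202/index.html",
    "/people/faculty/smith.html",
]

compressed = compress_session(session, depth=2)
coarse = compress_session(session, depth=1)
print(compressed)
print(coarse)
```

A smaller `depth` merges more pages into each item, which shrinks the data set further but discards more detail. This mirrors the trade-off between data reduction and profile quality that the abstract reports.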

Editor information

Bettina Berendt, Andreas Hotho, Dunja Mladenic, Giovanni Semeraro

Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Nasraoui, O., Saka, E. (2007). Web Usage Mining in Noisy and Ambiguous Environments: Exploring the Role of Concept Hierarchies, Compression, and Robust User Profiles. In: Berendt, B., Hotho, A., Mladenic, D., Semeraro, G. (eds) From Web to Social Web: Discovering and Deploying User and Content Profiles. WebMine 2006. Lecture Notes in Computer Science, vol 4737. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74951-6_5

  • DOI: https://doi.org/10.1007/978-3-540-74951-6_5

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-74950-9

  • Online ISBN: 978-3-540-74951-6

  • eBook Packages: Computer Science, Computer Science (R0)
