Abstract
In this chapter we work within the field of Web content mining. In relation to the user’s description of a Web page, we define a new term: Named object. Named objects serve as the basis for a new classification of selected methods for mining information from Web pages. This classification was derived from a survey of published methods. Our approach is based on perceiving a Web page through its intention, which is important to both the users and the authors of the page. Named objects are closely related to Web design patterns, which became the basis for our own mining method, Pattrio. The Pattrio method is introduced in this work together with a few experiments.
Copyright information
© 2009 Indian Institute of Information Technology, India
About this paper
Cite this paper
Snášel, V., Kudelka, M. (2009). Web Content Mining Focused on Named Objects. In: Tiwary, U.S., Siddiqui, T.J., Radhakrishna, M., Tiwari, M.D. (eds) Proceedings of the First International Conference on Intelligent Human Computer Interaction. Springer, New Delhi. https://doi.org/10.1007/978-81-8489-203-1_3
DOI: https://doi.org/10.1007/978-81-8489-203-1_3
Publisher Name: Springer, New Delhi
Print ISBN: 978-81-8489-404-2
Online ISBN: 978-81-8489-203-1
eBook Packages: Computer Science, Computer Science (R0)