DOI: 10.1145/1242572.1242622

Robust web page segmentation for mobile terminal using content-distances and page layout information

Published: 8 May 2007

ABSTRACT

Demand for browsing general Web pages on mobile phones is increasing. However, because most Web pages on the Internet are optimized for browsing from PCs, mobile phone users have difficulty obtaining sufficient information from the Web. A method for reconstructing PC-optimized Web pages for mobile phone users is therefore essential. One approach is to segment a Web page based on its structure and use the hierarchy of its content elements to regenerate a page suitable for mobile phone browsing. In our previous work, we examined a robust automatic Web page segmentation scheme that uses the distance between content elements based on the relative HTML tag hierarchy, i.e., the number and depth of HTML tags in a Web page. However, the content-distance derived from the order of HTML tags does not always correspond to the intuitive distance between content elements in the actual layout of the page. In this paper, we propose a hybrid segmentation method that segments Web pages using both the content-distance calculated by the previous scheme and a novel approach that uses Web page layout information. Experiments evaluating segmentation accuracy show that the proposed method segments Web pages more accurately than conventional methods. Furthermore, implementation and evaluation of our system on a mobile phone show that our method achieves better usability than commercial Web browsers.
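To make the idea of a tag-hierarchy content-distance concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes that the distance between two adjacent text elements can be approximated by the number of tags that must be closed and opened to get from one to the other, and that a segment boundary is placed wherever that distance exceeds the mean plus `k` standard deviations. The names `TextPathCollector`, `tag_distance`, `segment`, and the parameter `k` are all hypothetical, and the layout-based part of the hybrid method is not covered because it requires rendered coordinates.

```python
from html.parser import HTMLParser
from statistics import mean, pstdev

# Elements that never have a closing tag; they are not pushed on the stack.
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}


class TextPathCollector(HTMLParser):
    """Collect each non-empty text node with the ids of its open ancestors."""

    def __init__(self):
        super().__init__()
        self.stack = []      # list of (tag name, unique node id)
        self.elements = []   # list of (text, tuple of ancestor node ids)
        self.next_id = 0

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.next_id += 1
            self.stack.append((tag, self.next_id))

    def handle_endtag(self, tag):
        # Pop back to the last matching open tag; ignore stray end tags.
        names = [name for name, _ in self.stack]
        if tag in names:
            idx = len(names) - 1 - names[::-1].index(tag)
            del self.stack[idx:]

    def handle_data(self, data):
        text = data.strip()
        if text:
            path = tuple(node_id for _, node_id in self.stack)
            self.elements.append((text, path))


def tag_distance(path_a, path_b):
    """Assumed distance: tags closed plus tags opened between two elements."""
    common = 0
    for a, b in zip(path_a, path_b):
        if a != b:
            break
        common += 1
    return (len(path_a) - common) + (len(path_b) - common)


def segment(html, k=1.0):
    """Split text elements into blocks at unusually large tag distances."""
    collector = TextPathCollector()
    collector.feed(html)
    collector.close()
    elems = collector.elements
    if not elems:
        return []
    dists = [tag_distance(elems[i][1], elems[i + 1][1])
             for i in range(len(elems) - 1)]
    blocks = [[elems[0][0]]]
    if not dists:
        return blocks
    # Assumed boundary rule: mean + k * population standard deviation.
    threshold = mean(dists) + k * pstdev(dists)
    for (text, _), dist in zip(elems[1:], dists):
        if dist > threshold:
            blocks.append([text])
        else:
            blocks[-1].append(text)
    return blocks


if __name__ == "__main__":
    page = ("<div><h1>News</h1><p>Story one</p><p>Story two</p></div>"
            "<div><ul><li>Home</li><li>About</li></ul></div>")
    for block in segment(page):
        print(block)
```

In this toy page the only large distance falls between the article block and the navigation list, so the sketch yields two segments. A page whose visual layout disagrees with its tag order would be segmented poorly by this rule alone, which is the limitation the abstract attributes to the tag-based scheme.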


Published in: WWW '07: Proceedings of the 16th International Conference on World Wide Web, May 2007, 1382 pages
ISBN: 9781595936547
DOI: 10.1145/1242572
Copyright © 2007 ACM

Publisher: Association for Computing Machinery, New York, NY, United States
