Image Description Mining and Hierarchical Clustering on Data Records Using HR-Tree

  • Conference paper
Frontiers of WWW Research and Development - APWeb 2006 (APWeb 2006)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 3841)

Abstract

Since semantics can hardly be obtained from the low-level features of an image, analyzing images on the Web is much more difficult than analyzing textual information. Traditionally, the textual information around an image is used to represent its high-level features. We argue that such a “flat” representation cannot describe images well. In this paper, we propose the Hierarchical Representation (HR) and the HR-Tree for image description. Salient phrases in an HR-Tree further distinguish an image from other images that share the same ancestor concepts. First, we design a method to extract salient phrases for the images in data records. Then, HR-Trees are built from these phrases. Finally, we propose a new hierarchical clustering algorithm based on the HR-Tree to make browsing convenient for users. We demonstrate several HR-Trees and clustering results in the experimental section; these results illustrate the advantages of our methods.
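The abstract does not spell out how HR-Trees are built or clustered, so the following is only a hypothetical sketch of the general idea, not the authors' algorithm. It assumes that each image is described by an ordered path of phrases running from general ancestor concepts down to its salient phrase. Merging these paths yields a shared tree, and cutting that tree at a chosen depth yields nested clusters that a user could browse. The names `HRNode`, `build_hr_tree`, `collect_images` and `clusters_at_depth`, as well as the example phrases, are all made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class HRNode:
    """One concept in a hypothetical HR-Tree: a phrase, its child concepts,
    and the ids of images whose description path ends at this node."""
    phrase: str
    children: dict = field(default_factory=dict)  # phrase -> HRNode
    images: list = field(default_factory=list)


def build_hr_tree(descriptions):
    """Merge per-image phrase paths (general -> specific) into one tree.

    Assumption: the last phrase of each path plays the role of the
    image's salient phrase; earlier phrases are its ancestor concepts.
    """
    root = HRNode("ROOT")
    for image_id, path in descriptions.items():
        node = root
        for phrase in path:
            # Reuse an existing child concept or create a new one.
            node = node.children.setdefault(phrase, HRNode(phrase))
        node.images.append(image_id)
    return root


def collect_images(node):
    """Return every image id stored in the subtree rooted at node."""
    found = list(node.images)
    for child in node.children.values():
        found.extend(collect_images(child))
    return found


def clusters_at_depth(root, depth):
    """Cut the tree at the given depth; each node there becomes a cluster.

    Clusters are keyed by their full concept path so that equal phrases
    under different parents do not collide. Images whose paths are
    shorter than `depth` are not assigned to any cluster in this sketch.
    """
    frontier = [((), root)]
    for _ in range(depth):
        frontier = [
            (path + (phrase,), child)
            for path, node in frontier
            for phrase, child in node.children.items()
        ]
    return {" > ".join(path): sorted(collect_images(node)) for path, node in frontier}


# Made-up image descriptions, used only to exercise the sketch.
descriptions = {
    "img1": ["flower", "rose", "red rose"],
    "img2": ["flower", "rose", "white rose"],
    "img3": ["flower", "tulip", "yellow tulip"],
    "img4": ["car", "sedan", "blue sedan"],
}
tree = build_hr_tree(descriptions)
print(clusters_at_depth(tree, 1))
# {'flower': ['img1', 'img2', 'img3'], 'car': ['img4']}
print(clusters_at_depth(tree, 2))
# {'flower > rose': ['img1', 'img2'], 'flower > tulip': ['img3'], 'car > sedan': ['img4']}
```

Cutting at a shallow depth gives broad groups, and cutting deeper separates images that share ancestor concepts but differ in their more specific phrases. This mirrors, under the stated assumptions, the browsing behaviour the abstract describes.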

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Zhang, CL., Huang, S., Xue, GR., Yu, Y. (2006). Image Description Mining and Hierarchical Clustering on Data Records Using HR-Tree. In: Zhou, X., Li, J., Shen, H.T., Kitsuregawa, M., Zhang, Y. (eds) Frontiers of WWW Research and Development - APWeb 2006. APWeb 2006. Lecture Notes in Computer Science, vol 3841. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11610113_34

  • DOI: https://doi.org/10.1007/11610113_34

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-31142-3

  • Online ISBN: 978-3-540-32437-9

  • eBook Packages: Computer Science, Computer Science (R0)