Image Description Mining and Hierarchical Clustering on Data Records Using HR-Tree

Zhang, Cong-Le; Huang, Sheng; Xue, Gui-Rong; Yu, Yong

doi:10.1007/11610113_34

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 3841)

Included in the following conference series:

Asia-Pacific Web Conference

Abstract

Because semantics can hardly be derived from an image's low-level features, images on the Web are much more difficult to analyze than textual information. Traditionally, the text surrounding an image is used to represent its high-level features. We argue that such a “flat” representation cannot describe images well. In this paper, we propose Hierarchical Representation (HR) and the HR-Tree for image description. Salient phrases in the HR-Tree further distinguish an image from others that share the same ancestor concepts. First, we design a method to extract salient phrases for the images in data records. Then, HR-Trees are built from these phrases. Finally, we propose a new hierarchical clustering algorithm based on the HR-Tree to make browsing more convenient for users. We present several HR-Trees and clustering results in the experimental section; these results illustrate the advantages of our methods.
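
The idea of describing images hierarchically and grouping them by shared ancestor concepts can be illustrated with a small Python sketch. This is not the authors' extraction or clustering algorithm, which the abstract does not specify. It assumes that each image already comes with a path of phrases ordered from general to specific, and the names `Node`, `build_tree`, `images_under`, and `clusters`, as well as the sample phrases, are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Sequence, Tuple


@dataclass
class Node:
    """One concept in the toy hierarchy, labelled by a phrase (hypothetical)."""
    phrase: str
    children: Dict[str, "Node"] = field(default_factory=dict)
    images: List[str] = field(default_factory=list)  # images whose path ends here


def build_tree(descriptions: Dict[str, Sequence[str]]) -> Node:
    """Insert each image's phrase path (general -> specific) into a shared tree."""
    root = Node("ROOT")
    for image_id, path in descriptions.items():
        node = root
        for phrase in path:
            # Reuse the child for this phrase if it exists, else create it.
            node = node.children.setdefault(phrase, Node(phrase))
        node.images.append(image_id)
    return root


def images_under(node: Node) -> List[str]:
    """Collect every image stored at this node or anywhere below it."""
    found = list(node.images)
    for child in node.children.values():
        found.extend(images_under(child))
    return sorted(found)


def clusters(node: Node, prefix: Tuple[str, ...] = ()) -> Dict[Tuple[str, ...], List[str]]:
    """Map each concept path to the images grouped beneath it."""
    result: Dict[Tuple[str, ...], List[str]] = {}
    for child in node.children.values():
        path = prefix + (child.phrase,)
        result[path] = images_under(child)
        result.update(clusters(child, path))
    return result


# Made-up sample data: phrase paths assumed to be already extracted.
descriptions = {
    "img1": ["camera", "digital camera", "8 megapixel"],
    "img2": ["camera", "digital camera", "5 megapixel"],
    "img3": ["camera", "film camera"],
}
tree = build_tree(descriptions)
for path, members in clusters(tree).items():
    print(" > ".join(path), members)
```

In this toy data, the two digital cameras share the ancestors "camera" and "digital camera" and are told apart only by their most specific phrase, which is the role the abstract assigns to salient phrases. A flat bag of surrounding words would not expose that nesting.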

Author information

Authors and Affiliations

Apex Data and Knowledge Management Lab, Department of Computer Science and Engineering, Shanghai JiaoTong University, Shanghai, 200030, P.R. China
Cong-Le Zhang, Sheng Huang, Gui-Rong Xue & Yong Yu

Editor information

Editors and Affiliations

School of ITEE, The University of Queensland, Australia
Xiaofang Zhou
School of Computer Science and Technology, Heilongjiang University, China
Jianzhong Li
School of Information Technology and Electrical Engineering, The University of Queensland, Brisbane, QLD, Australia
Heng Tao Shen
Institute of Industrial Science, The University of Tokyo, 4-6-1 Komaba, Meguro-ku, 153-8505, Tokyo, Japan
Masaru Kitsuregawa
Victoria University, Australia
Yanchun Zhang

About this paper

Cite this paper

Zhang, CL., Huang, S., Xue, GR., Yu, Y. (2006). Image Description Mining and Hierarchical Clustering on Data Records Using HR-Tree. In: Zhou, X., Li, J., Shen, H.T., Kitsuregawa, M., Zhang, Y. (eds) Frontiers of WWW Research and Development - APWeb 2006. APWeb 2006. Lecture Notes in Computer Science, vol 3841. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11610113_34

DOI: https://doi.org/10.1007/11610113_34
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-31142-3
Online ISBN: 978-3-540-32437-9
eBook Packages: Computer Science, Computer Science (R0)
