Abstract
Because semantics can hardly be derived from the low-level features of an image, images on the Web are much harder to analyze than textual information. Traditionally, the text surrounding an image is used to represent its high-level features. We argue that such a “flat” representation cannot describe images well. In this paper, we propose Hierarchical Representation (HR) and the HR-Tree for image description. Salient phrases in an HR-Tree further distinguish an image from others that share the same ancestor concepts. First, we design a method to extract salient phrases for the images in data records. Then HR-Trees are built from these phrases. Finally, we propose a new hierarchical clustering algorithm based on the HR-Tree to make browsing more convenient for users. We present several HR-Trees and clustering results in the experimental section. These results illustrate the advantages of our methods.
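The abstract does not give the construction details, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes each image already comes with a list of phrases ordered from the most general concept to the most specific salient phrase, builds a simple tree from those lists, and forms clusters by cutting the tree at a chosen depth. The names `Node`, `build_tree`, `collect_images`, `clusters_at_depth`, and the sample data are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One concept or salient phrase in a hierarchical description tree."""
    phrase: str
    images: list = field(default_factory=list)    # image ids attached here
    children: dict = field(default_factory=dict)  # phrase -> Node


def build_tree(descriptions):
    """Build a tree from {image id: [general concept, ..., salient phrase]}.

    Assumption: the phrase lists are already extracted and ordered; the
    paper's extraction step is not reproduced here.
    """
    root = Node("ROOT")
    for image_id, path in descriptions.items():
        node = root
        for phrase in path:
            node = node.children.setdefault(phrase, Node(phrase))
        node.images.append(image_id)
    return root


def collect_images(node):
    """Return every image id stored at this node or below it."""
    found = list(node.images)
    for child in node.children.values():
        found.extend(collect_images(child))
    return found


def clusters_at_depth(node, depth, path=()):
    """Cut the tree at `depth` and return {concept path: sorted image ids}."""
    if depth == 0 or not node.children:
        return {path: sorted(collect_images(node))}
    clusters = {}
    if node.images:
        # Images described only down to this level form their own cluster.
        clusters[path] = sorted(node.images)
    for child in node.children.values():
        clusters.update(
            clusters_at_depth(child, depth - 1, path + (child.phrase,))
        )
    return clusters


# Hypothetical sample data, for illustration only.
descriptions = {
    "img1": ["camera", "digital camera", "10x optical zoom"],
    "img2": ["camera", "digital camera", "waterproof body"],
    "img3": ["camera", "film camera"],
    "img4": ["laptop", "ultraportable"],
}
tree = build_tree(descriptions)
coarse = clusters_at_depth(tree, 1)
fine = clusters_at_depth(tree, 2)
```

In this toy data, `img1` and `img2` share the ancestor concepts "camera" and "digital camera" and are told apart only by their leaf phrases, which mirrors the role the abstract assigns to salient phrases. Cutting at depth 1 gives coarse clusters for browsing, and cutting at depth 2 refines them.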
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Zhang, CL., Huang, S., Xue, GR., Yu, Y. (2006). Image Description Mining and Hierarchical Clustering on Data Records Using HR-Tree. In: Zhou, X., Li, J., Shen, H.T., Kitsuregawa, M., Zhang, Y. (eds) Frontiers of WWW Research and Development - APWeb 2006. APWeb 2006. Lecture Notes in Computer Science, vol 3841. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11610113_34
DOI: https://doi.org/10.1007/11610113_34
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-31142-3
Online ISBN: 978-3-540-32437-9
eBook Packages: Computer Science, Computer Science (R0)