Qualitative evaluation of automatic assignment of keywords to images

https://doi.org/10.1016/j.ipm.2004.11.001

Abstract

In image retrieval, most systems lack user-centred evaluation since they are assessed against some chosen ground truth dataset. Precision and recall measured against this ground truth are treated as an acceptable surrogate for the judgment of real users. Much current research focuses on automatically assigning keywords to images to enhance retrieval effectiveness. However, evaluation methods are usually based on system-level assessment, e.g. classification accuracy against some chosen ground truth dataset. In this paper, we present a qualitative evaluation methodology for automatic image indexing systems. The automatic indexing task is formulated as one of image annotation, or automatic metadata generation for images. The evaluation is composed of two individual methods. First, the automatic annotation results are assessed by human subjects. Second, the subjects are asked to annotate a chosen set of test images, and their annotations are used as ground truth; the system is then run on this test set and its annotation results are judged against that ground truth. Most systems on which user-centred evaluation is conducted report only one of these methods; we believe that both need to be considered for a full evaluation. We also provide an example evaluation of our system based on this methodology. According to this study, the proposed evaluation methodology is able to provide a deeper understanding of a system’s performance.

Introduction

Evaluation is a critical issue for Information Retrieval (IR). Assessment of the performance or the value of an IR system for its intended task is one of the distinguishing features of the subject. The type of evaluation to be considered depends on the objectives of the retrieval system. In general, retrieval performance evaluation is based on a test reference collection, e.g. TREC, and on an evaluation measure, e.g. precision and recall (Baeza-Yates & Ribeiro-Neto, 1999).
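For reference, writing A for the set of items retrieved (or, in the annotation setting, the keywords assigned to an image) and R for the set of relevant, ground truth items, these standard measures are defined as

    \mathrm{precision} = \frac{|A \cap R|}{|A|}, \qquad \mathrm{recall} = \frac{|A \cap R|}{|R|}

so that precision penalises spurious assignments and recall penalises missed ones.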

Saracevic (1995) reviews the history and nature of evaluation in IR and describes six different levels of IR evaluation, from system to user levels. However, most IR evaluations are based only on the system levels and lack user-centred evaluation. To achieve a more comprehensive picture of IR performance and users’ needs, both system- and user-centred evaluations are needed. That is, we need to evaluate at different levels as appropriate and/or against different types of relevance (Dunlop, 2000). Examples of recent studies focusing on user judgments are Belkin et al. (2001), Hersh et al. (2001), and Spink (2002).

Due to advances in computing and multimedia technologies, the size of image collections is increasing rapidly. Content-Based Image Retrieval (CBIR), whose main goal is to design mechanisms for searching large image collections, has been an active research area for the last decade. As in traditional IR, studies of user issues in image retrieval are lacking (Fidel, 1997; Rasmussen, 1997).

Current CBIR systems index and retrieve images based on low-level features such as colour, texture, and shape, but it is difficult to find desired images using these features because they have no direct correspondence to the high-level concepts in humans’ minds. This is the so-called semantic gap problem. Bridging the semantic gap in image retrieval has attracted much work, generally focused on making systems more intelligent so that they can automatically understand image content in terms of high-level concepts (Eakins, 2002). Image annotation systems, i.e. systems that automatically assign one or more keywords to an image, have been developed for this purpose (Barnard et al., 2003; Kuroda & Hagiwara, 2002; Li & Wang, 2003; Park et al., 2004; Tsai et al., 2003; Vailaya et al., 2001).

To evaluate their annotation results, most of these systems rely only on some chosen dataset with ground truth, such as Corel. However, there is currently no standard image dataset for evaluation comparable to the web track of TREC for IR (Craswell, Hawking, Wilkinson, & Wu, 2003). Moreover, just as IR systems need to consider human subjects in evaluation, quantitative evaluation of current annotation systems is insufficient to validate their performance. Therefore, user-centred evaluation of image annotation systems is also necessary.
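To make the distinction concrete, the sketch below illustrates the purely quantitative, ground-truth style of evaluation referred to here: system-assigned keywords are compared with ground truth annotations image by image, and precision and recall are averaged over the test set. The data layout and function names are illustrative assumptions on our part, not the authors’ implementation.

    # Minimal sketch of ground-truth-based evaluation of image annotation:
    # compare system-assigned keywords with ground-truth keywords per image
    # and report the averaged precision and recall over the test set.

    def evaluate_annotations(system, ground_truth):
        """Both arguments map an image id to a set of keywords."""
        precisions, recalls = [], []
        for image_id, truth in ground_truth.items():
            assigned = system.get(image_id, set())
            hits = assigned & truth  # keywords that match the ground truth
            precisions.append(len(hits) / len(assigned) if assigned else 0.0)
            recalls.append(len(hits) / len(truth) if truth else 0.0)
        n = len(ground_truth)
        return sum(precisions) / n, sum(recalls) / n

    # Hypothetical two-image test set
    truth = {"img1": {"beach", "sky", "sea"}, "img2": {"building", "tree"}}
    system = {"img1": {"sky", "sea", "sand"}, "img2": {"tree", "car"}}
    print(evaluate_annotations(system, truth))  # (average precision, average recall)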

This paper is organised as follows. Section 2 reviews related work on qualitative evaluation of image retrieval algorithms and systems. Section 3 presents our qualitative evaluation methodology for image annotation systems. Section 4 shows an example of assessing our image annotation system based on the proposed methodology. Section 5 provides some discussion of the user-centred evaluation. Finally, some conclusions are drawn in Section 6.

Section snippets

Related work

For human assessment of image retrieval systems, the general approach is to ask human subjects to evaluate the systems’ outputs directly. For example, a questionnaire can be devised to ask the judges to rank their level of preference for each retrieved image, or the ease with which they were able to find desired images. For image annotation, the keywords associated with each image can be marked as relevant or irrelevant by the judges. Then, conclusions can be drawn from the analysis of

The evaluation methodology

The conclusion of Section 2 motivates a user-centred evaluation methodology for existing image annotation systems in terms of effectiveness, i.e. the quality and accuracy of image annotation. Fig. 1 shows the evaluation procedure. It is composed of the Type I and Type II evaluation methods described above. Both types of evaluation comprise three steps (research question formulation, data collection, and data analysis) and can provide different kinds of understanding of an image

The judges

We asked five judges (PhD research students), none of whom were experts in image indexing and retrieval, to decide whether the keywords assigned by our system and by the random-guessing approach were relevant to each image. There were three male and two female judges, all of whom were native English speakers.
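For illustration only, the sketch below shows one way such relevance judgements could be aggregated once collected: each judge marks every system-assigned keyword as relevant or irrelevant to its image, and a keyword counts as relevant when a majority of the five judges agree. The data layout and the majority rule are our assumptions, not necessarily the analysis used in this study.

    # Illustrative aggregation of per-keyword relevance judgements from five judges.
    # 1 = the judge considered the keyword relevant to the image, 0 = irrelevant.
    judgements = {
        ("img1", "sky"):  [1, 1, 1, 0, 1],
        ("img1", "sand"): [0, 1, 0, 0, 1],
        ("img2", "tree"): [1, 1, 1, 1, 1],
    }

    def proportion_relevant(judgements, n_judges=5):
        # A keyword counts as relevant when more than half of the judges say so.
        relevant = sum(1 for votes in judgements.values() if sum(votes) > n_judges / 2)
        return relevant / len(judgements)

    print(proportion_relevant(judgements))  # 2 of the 3 keywords pass -> about 0.67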

The test set

We considered two datasets: one was the Corel image collection and the other was supplied by Washington University.

Discussion

These results, especially those of Section 4.2.5, show that even a state-of-the-art automatic image indexing system such as CLAIRE cannot match the performance of human annotators in terms of annotation accuracy, especially as the classification scale (i.e. the number of words in the indexing vocabulary) increases. However, a closer inspection of these results indicates that the performance actually achieved may be useful in the context of building practical image retrieval systems which can take initial

Conclusion

Evaluation is a critical issue for information retrieval, and to fully understand the performance of IR systems it is necessary to consider both system- and user-centred evaluations. Image retrieval has become an active research area, and much of the current research effort is focused on automatically annotating or indexing images to facilitate search in image databases. Most existing automatic image annotation systems are evaluated against their annotation or

Acknowledgement

The authors would like to thank Chris Stokoe, James Malone, Sheila Garfield, Mark Elshaw, and Jean Davison for participating in the system evaluation.

References (58)

  • B.M. Mehtre et al. (1998). Content-based image retrieval using a composite color-shape approach. Information Processing and Management.
  • T.P. Minka et al. (1997). Interactive learning with a “society of models”. Pattern Recognition.
  • S.B. Park et al. (2004). Content-based image classification using a neural network. Pattern Recognition Letters.
  • D. Sánchez et al. (2003). Modelling subjectivity in visual perception of orientation for image retrieval. Information Processing and Management.
  • A. Spink (2002). A user-centered approach to evaluating human interaction with Web search engines: an exploratory study. Information Processing and Management.
  • D.M. Squire et al. (1998). Assessing agreement between human and machine clustering of image databases. Pattern Recognition.
  • J.K. Wu et al. (1998). Fuzzy content-based retrieval in image databases. Information Processing and Management.
  • R. Applegate (1993). Models of user satisfaction: understanding false positives. Reference Quarterly.
  • L. Armitage et al. (1997). Analysis of user need in image archives. Journal of Information Science.
  • R. Baeza-Yates et al. (1999). Modern information retrieval.
  • K. Barnard et al. (2003). Matching words and pictures. Journal of Machine Learning Research.
  • Barnard, K., & Shirahatti, N. V. (2003). A method for comparing content based image retrieval methods. In Proceedings...
  • Black Jr., J. A., Fahmy, G., & Panchanathan, S. (2002). A method for evaluating the performance of content-based image...
  • Conniss, L. R., Ashford, A. J., & Graham, M. E. (2000). Information seeking behaviour in image retrieval: VISOR I final...
  • I.J. Cox et al. (2000). The Bayesian image retrieval system, PicHunter: theory, implementation and psychophysical experiments. IEEE Transactions on Image Processing.
  • Craswell, N., Hawking, D., Wilkinson, R., & Wu, M. (2003). Overview of the TREC 2003 web track. In Proceedings of the...
  • M. Dunlop (2000). Reflections on Mira: interactive evaluation in information retrieval. Journal of the American Society for Information Science.
  • Efthimiadis, E. N., & Fidel, R. (2000). The effect of query type on subject searching behavior of image databases: an...
  • R. Fidel (1997). The image retrieval task: implications for the design and evaluation of image databases. New Review of Hypermedia and Multimedia.