Elsevier

Pattern Recognition

Volume 44, Issue 12, December 2011, Pages 2971-2978
Active learning paradigms for CBIR systems based on optimum-path forest classification

https://doi.org/10.1016/j.patcog.2011.04.026

Abstract

This paper discusses methods for content-based image retrieval (CBIR) systems based on relevance feedback according to two active learning paradigms, named greedy and planned. In greedy methods, the system aims to return the most relevant images for a query at each iteration. In planned methods, the most informative images are returned during a few iterations, and the most relevant ones are presented only afterward. In the past, we proposed a greedy approach based on optimum-path forest classification (OPF) and demonstrated its gain in effectiveness over a planned method based on support vector machines and over another greedy approach based on multi-point queries. In this work, we introduce a planned approach based on the OPF classifier and demonstrate its gain in effectiveness over all of the above methods on more image databases. In our tests, the most informative images are better obtained from images that are classified as relevant, which differs from the original definition. The results also indicate that both OPF-based methods require less user involvement (efficiency) to satisfy the user's expectations (effectiveness) and provide interactive response times.

Highlights

► Two active learning paradigms based on optimum-path forest classifiers.
► They solve image retrieval in a few iterations of relevance feedback.
► Considerable gains in efficiency and effectiveness are demonstrated.

Introduction

Content-based image retrieval (CBIR) systems aim to return the most relevant images in a database, according to the user's opinion, for a given query. Due to the dynamic nature of the problem, in which the meaning of relevance may differ among users for the same query, these systems usually rely on an active learning process in which, at each iteration, the system returns a small set of images (training set) and the user indicates their relevance (see Fig. 1) [1]. The database images can be represented by feature vectors (points in a feature space), which may encode color, texture, and/or shape measures, and organized in indexing structures [2], [3] for more efficient access. There is active research on each stage shown in Fig. 1: for example, obtaining more effective image descriptors [4], [5] (i.e., feature extraction and distance functions for image comparison), combining distance functions from multiple descriptors [6], [7], and providing scalability in large image databases [8], [9]. The methods presented here can take advantage of all these results, but the focus of our work is on the learning and retrieval processes (gray box in Fig. 1).
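As a minimal illustration of the descriptor view described above, the Python sketch below ranks database images by their distance to a query. It assumes that each image carries one feature vector per descriptor and combines the per-descriptor distances with a fixed weighted sum; the names `euclidean`, `l1`, `composite`, and `rank`, and the weights, are hypothetical, and the weighted sum is only a simple stand-in for the learned combinations of [6], [7]. The linear scan over the whole database is exactly what indexing structures such as [2], [3] are designed to avoid.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]
Distance = Callable[[Vector, Vector], float]


def euclidean(a: Vector, b: Vector) -> float:
    """L2 distance between two feature vectors of equal length."""
    return math.dist(a, b)


def l1(a: Vector, b: Vector) -> float:
    """L1 (city-block) distance, often used for histogram descriptors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def composite(distances: List[Distance], weights: List[float]):
    """Hypothetical composite distance: weighted sum of per-descriptor distances.

    Each image is a tuple of feature vectors, one per descriptor
    (e.g., (color_vector, texture_vector)).
    """
    def d(a: Tuple[Vector, ...], b: Tuple[Vector, ...]) -> float:
        return sum(w * dist(x, y) for w, dist, x, y in zip(weights, distances, a, b))
    return d


def rank(query, database: Dict[str, tuple], d, k: int) -> List[str]:
    """Return the ids of the k database images closest to the query (linear scan, ties by id)."""
    return sorted(database, key=lambda t: (d(query, database[t]), t))[:k]
```

For example, with a color vector compared by L2, a texture vector compared by L1, and weights 1.0 and 0.5, `rank` returns the ids of the k closest images under the combined distance.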

From a practical point of view, a CBIR system should minimize the response time and the number of marked images (efficiency) while maximizing the user's satisfaction (effectiveness). These are the main challenges, especially for large image collections. We distinguish two active learning paradigms for relevance feedback on returned images, named greedy and planned. In greedy methods,

  1. a small number of images, usually ranked by relevance (similarity to the query), is presented to the user;

  2. the user indicates which images are actually relevant (or irrelevant), and the remaining images are taken as irrelevant (or relevant);

  3. the system learns the user's opinion from this feedback in order to return a higher number of relevant images in the next iteration, starting again at step 1.
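The three greedy steps above can be sketched as a simulated session. In this hypothetical sketch, `ranker` stands for whatever learner the system uses, the user is simulated by a fixed set of relevant ids (`oracle`), as is common in offline evaluation, and images the user has already marked are not shown again; that last choice is an assumption, not something the text specifies.

```python
from typing import Callable, List, Set

# A ranker maps the current feedback (relevant ids, irrelevant ids) to all
# database ids ordered from most to least likely relevant.
Ranker = Callable[[Set[int], Set[int]], List[int]]


def greedy_feedback(ranker: Ranker, oracle: Set[int],
                    n_show: int, n_iter: int) -> List[List[int]]:
    """Simulate a greedy relevance-feedback session; return the images shown per iteration."""
    relevant: Set[int] = set()
    irrelevant: Set[int] = set()
    history: List[List[int]] = []
    for _ in range(n_iter):
        marked = relevant | irrelevant
        # Step 1: present the top-ranked images not yet marked (assumption: no re-showing).
        shown = [t for t in ranker(relevant, irrelevant) if t not in marked][:n_show]
        history.append(shown)
        # Step 2: the simulated user marks relevant images; the rest of the shown set is irrelevant.
        for t in shown:
            (relevant if t in oracle else irrelevant).add(t)
        # Step 3: the ranker sees the enlarged feedback sets in the next iteration.
    return history
```

A toy ranker that orders images by distance to the nearest relevant image (or to the query image when nothing has been marked yet) is enough to exercise the loop.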

In planned methods, the user specifies the iteration at which the system should return images ranked by relevance. In the preceding iterations, the system presents the most informative images in order to better learn the distribution of the relevant and irrelevant classes in the feature space (i.e., to train a pattern classifier).
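The planned schedule described above can be sketched as a simulated session. The sketch assumes two hypothetical callbacks: `select_informative`, which orders images by how useful they are for training the classifier, and `rank_relevant`, which orders them by estimated relevance; `n_learn` is the number of learning iterations the user fixes in advance, and the user is again simulated by a set of relevant ids.

```python
from typing import Callable, List, Set

# Both callbacks map (relevant ids, irrelevant ids) to an ordering of all ids.
Order = Callable[[Set[int], Set[int]], List[int]]


def planned_feedback(select_informative: Order, rank_relevant: Order,
                     oracle: Set[int], n_show: int, n_learn: int) -> List[int]:
    """Simulate a planned session: n_learn learning iterations, then one relevance-ranked answer."""
    relevant: Set[int] = set()
    irrelevant: Set[int] = set()
    for _ in range(n_learn):
        marked = relevant | irrelevant
        # Learning iterations: show the most informative images not yet marked.
        shown = [t for t in select_informative(relevant, irrelevant) if t not in marked][:n_show]
        for t in shown:
            (relevant if t in oracle else irrelevant).add(t)
    # Final iteration: return the images the trained model ranks as most relevant.
    return rank_relevant(relevant, irrelevant)[:n_show]
```

The only structural difference from a greedy session is that the user sees relevance-ranked results once, after the learning budget is spent.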

In the past, we proposed a greedy approach [10] based on the optimum-path forest (OPF) classifier [11]. In this method, database images classified as relevant are ranked by their normalized distances to special positive and negative examples (called prototypes), which are computed in the previous iteration from the user-marked images. It showed a notable gain in effectiveness over a simple greedy technique [12] and a planned method based on support vector machines (SVM) [13]. The choice of the OPF model was also justified by its considerably lower computational time compared with SVM and other classification models, such as neural networks [11].
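To make the ranking step concrete, the sketch below orders images already classified as relevant by a normalized distance to the nearest positive and the nearest negative prototype. The exact normalization used in [10] is not given in this excerpt: the score d+/(d+ + d-) used here, which lies in [0, 1] and is lower for images closer to the positive prototypes, is an assumption for illustration, and `normalized_score` and `rank_by_prototypes` are hypothetical names.

```python
import math
from typing import Dict, List, Sequence

Vec = Sequence[float]


def normalized_score(v: Vec, pos: List[Vec], neg: List[Vec]) -> float:
    """Hypothetical score d+ / (d+ + d-): 0 at a positive prototype, 1 at a negative one."""
    dp = min(math.dist(v, p) for p in pos)   # distance to the closest positive prototype
    dn = min(math.dist(v, n) for n in neg)   # distance to the closest negative prototype
    return dp / (dp + dn) if dp + dn > 0 else 0.0


def rank_by_prototypes(candidates: Dict[str, Vec], pos: List[Vec], neg: List[Vec]) -> List[str]:
    """Order images already classified as relevant, most relevant first (ties by id)."""
    return sorted(candidates, key=lambda t: (normalized_score(candidates[t], pos, neg), t))
```

Normalizing by d+ + d- makes the score comparable across regions of the feature space where prototypes are far apart or close together.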

In this work, we present a planned method based on the OPF classifier and demonstrate its gain in effectiveness over the previous approaches [10], [13], [12] on more image databases. In our tests, the most informative images are better obtained from images that are classified as relevant but were close to being classified as irrelevant. These images are ranked by their optimum-path costs in the forest with respect to the positive and negative prototypes. This strategy differs from the original definition [13], which selects the most informative images among both relevant and irrelevant images. Our strategy reduces the number of false positives, which tends to be significantly higher than the number of false negatives, thereby improving effectiveness. It also considerably reduces the number of images to be ranked, improving efficiency.
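The sketch below illustrates this selection under stated assumptions. Training follows the OPF model of [11] as we understand it: prototypes are the marked images that share an edge of the minimum spanning tree with an image of the other class, and optimum-path costs under the f_max path-cost function (the maximum arc weight along the path) are propagated from them. For each candidate image, we compute the best path cost offered by the relevant trees and by the irrelevant trees; the image is classified as relevant when the former is lower. Ranking relevant-classified images by the margin between the two costs, smallest first, is our hypothetical reading of "close to being classified as irrelevant", not necessarily the paper's exact criterion. Labels are assumed to be 1 (relevant) and 0 (irrelevant), and the quadratic loops favor clarity over speed.

```python
import math
from typing import Dict, List, Sequence, Set, Tuple

Vec = Sequence[float]
INF = math.inf


def mst_prototypes(X: List[Vec], y: List[int]) -> Set[int]:
    """Prim's MST on the complete graph of marked images; prototypes are the
    endpoints of MST edges joining images with different labels."""
    n = len(X)
    in_tree, best, parent = [False] * n, [INF] * n, [-1] * n
    best[0] = 0.0
    protos: Set[int] = set()
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0 and y[parent[u]] != y[u]:
            protos.update((parent[u], u))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(X[u], X[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return protos


def opf_train(X: List[Vec], y: List[int]) -> Tuple[List[float], List[int]]:
    """Propagate f_max path costs and labels from the prototypes (Dijkstra-like order)."""
    if len(set(y)) < 2:
        raise ValueError("need at least one relevant and one irrelevant marked image")
    n = len(X)
    protos = mst_prototypes(X, y)
    cost = [0.0 if i in protos else INF for i in range(n)]
    label, done = list(y), [False] * n
    for _ in range(n):
        s = min((i for i in range(n) if not done[i]), key=lambda i: cost[i])
        done[s] = True
        for t in range(n):
            if not done[t]:
                c = max(cost[s], math.dist(X[s], X[t]))  # f_max: max arc weight on the path
                if c < cost[t]:
                    cost[t], label[t] = c, label[s]
    return cost, label


def class_costs(v: Vec, X: List[Vec], cost: List[float], label: List[int]) -> Dict[int, float]:
    """Best cost of extending a path from each class's trees to v: min_s max(C(s), d(s, v))."""
    best: Dict[int, float] = {}
    for s, xs in enumerate(X):
        c = max(cost[s], math.dist(xs, v))
        if c < best.get(label[s], INF):
            best[label[s]] = c
    return best


def most_informative(candidates: Dict[str, Vec], X: List[Vec], y: List[int], k: int) -> List[str]:
    """Hypothetical selection: images classified as relevant whose relevant-tree cost
    only barely beats their irrelevant-tree cost."""
    cost, label = opf_train(X, y)
    scored = []
    for name, v in candidates.items():
        c = class_costs(v, X, cost, label)
        c_pos, c_neg = c.get(1, INF), c.get(0, INF)
        if c_pos < c_neg:                          # classified as relevant
            scored.append((c_neg - c_pos, name))   # small margin: nearly irrelevant
    return [name for _, name in sorted(scored)[:k]]
```

Because images classified as irrelevant are discarded before ranking, only a subset of the database needs to be sorted, which mirrors the efficiency argument in the text.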

A drawback of the planned paradigm is that the user does not know in advance how many iterations will be necessary. However, this could be learned for a given application. Conversely, in the greedy paradigm, there is no guarantee that the system will learn within fewer iterations than a given planned method specifies. In fact, we present an example in which the planned paradigm outperforms the greedy paradigm in effectiveness for the same pattern classification model (OPF). In both paradigms, the minimum number of response images per iteration may also vary across queries and users.

Most schemes based on relevance feedback use the greedy paradigm. Fig. 2 presents three examples of simple greedy techniques [14], [15]. In Fig. 2a, the positive examples (relevant images) from the first iteration are used to move the next query point to their geometric center in the feature space. This idea stems from Rocchio's formula [16] for document retrieval systems, and it has been successfully exploited in CBIR systems [17], [18], [1], [19]. Two other methods use the relevant images as the next query points, and, depending on how the distance to this multi-point query set is defined, different iso-surfaces are formed in the feature space (Figs. 2b and c).
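The query-point movement and multi-point query ideas can be sketched as follows. `rocchio` implements the classic Rocchio update q' = αq + β·mean(relevant) − γ·mean(irrelevant); with α = γ = 0 and β = 1 it reduces to moving the query to the geometric center of the positive examples, as in Fig. 2a. The default weights are common textbook values, not values from this paper. For multi-point queries, the minimum and the average distance to the query points are two common aggregation choices, assumed here for illustration; the text does not say which aggregates produce the iso-surfaces of Figs. 2b and c.

```python
import math
from typing import List, Sequence, Tuple

Vec = Sequence[float]


def mean(points: List[Vec]) -> Tuple[float, ...]:
    """Coordinate-wise mean (geometric center) of a non-empty list of vectors."""
    return tuple(sum(c) / len(points) for c in zip(*points))


def rocchio(q: Vec, relevant: List[Vec], irrelevant: List[Vec],
            alpha: float = 1.0, beta: float = 0.75, gamma: float = 0.15) -> Tuple[float, ...]:
    """Rocchio update q' = alpha*q + beta*mean(relevant) - gamma*mean(irrelevant).

    An empty example list contributes a zero vector.
    """
    dim = len(q)
    rel = mean(relevant) if relevant else (0.0,) * dim
    irr = mean(irrelevant) if irrelevant else (0.0,) * dim
    return tuple(alpha * a + beta * r - gamma * i for a, r, i in zip(q, rel, irr))


def min_distance(x: Vec, queries: List[Vec]) -> float:
    """Distance to the closest query point: iso-surfaces are unions of spheres."""
    return min(math.dist(x, p) for p in queries)


def mean_distance(x: Vec, queries: List[Vec]) -> float:
    """Average distance to all query points: a sum of convex functions, so convex iso-surfaces."""
    return sum(math.dist(x, p) for p in queries) / len(queries)
```

Query-point movement keeps a single query and therefore a single ball-shaped neighborhood, whereas multi-point queries can cover relevant images spread over several regions of the feature space.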

The planned method proposed by Tong and Chang [13] outperforms simple greedy techniques. Some studies [20], [21], [22] have reported improvements on Tong and Chang's approach, but it remains the most appropriate baseline. Hoi et al. [23], [24] have also identified a problem with small training sets in Tong and Chang's method [13] and have proposed using both labeled and unlabeled images in the training set to improve performance. The idea merits further investigation and could easily be incorporated into our approaches; however, it was not necessary in the present study.
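For contrast with the OPF-based selection, Tong and Chang's SVM active learning [13], in its simplest form, picks during learning iterations the unlabeled images closest to the SVM hyperplane, whichever side they fall on, and in the final iteration returns the images farthest on the relevant side. The sketch below assumes an already trained linear decision function f(x) = w·x + b, with hypothetical `w` and `b`, in place of a kernel SVM; ranking by |f(x)| preserves the ordering by distance to the hyperplane because ||w|| is the same for every image.

```python
from typing import Dict, List, Sequence

Vec = Sequence[float]


def decision(w: Vec, b: float, x: Vec) -> float:
    """Linear SVM decision value f(x) = w.x + b (positive side = relevant)."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def simple_margin(candidates: Dict[str, Vec], w: Vec, b: float, k: int) -> List[str]:
    """Learning iterations: the k images closest to the hyperplane, on either side (ties by id)."""
    return sorted(candidates, key=lambda t: (abs(decision(w, b, candidates[t])), t))[:k]


def most_relevant(candidates: Dict[str, Vec], w: Vec, b: float, k: int) -> List[str]:
    """Final iteration: the k images farthest on the relevant side."""
    return sorted(candidates, key=lambda t: (-decision(w, b, candidates[t]), t))[:k]
```

Unlike the OPF-based selection, `simple_margin` may return images on the irrelevant side of the boundary, which is the "original definition" the text contrasts with.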

This paper is organized as follows. Section 2 reviews the OPF model and presents the active learning algorithms using the OPF-based greedy and planned paradigms. The experiments and results using five heterogeneous image collections are described in Section 3. Section 4 states the conclusions and discusses our future work.

Active learning using optimum-path forest classification

Let Z be an image database, such that each image t ∈ Z is represented by a feature vector v(t), computed by a feature extraction function v. The similarity between images s, t ∈ Z is measured by a distance function d(s, t). A pair (v, d) is called a descriptor. In the case of multiple descriptors encoding shape, color and texture properties, it is possible to combine their distance functions into a composite distance function, as proposed in [7]. Therefore, with no loss of generality, we will

Experiments and results

In any real-world application of CBIR, it is very important for a method to minimize the average response time per query and the average number of marked images per query while maximizing the average degree of user satisfaction. These challenges also include the choice of suitable and effective image descriptors for such an application. Given that we are not addressing these issues here, the experiments in this section aim to give us at least a strong indication of which among the

Conclusion and future work

We have discussed greedy and planned active learning paradigms for CBIR systems, both based on the optimum-path forest classifier (GOPF and POPF for the greedy and planned methods, respectively). GOPF was proposed previously [10], but it was evaluated in this work on more databases. POPF is a new approach presented here for the first time.

Our experiments involved a reasonable amount of user interaction from a practical point of view, using small numbers of response images and iterations;

Acknowledgments

The first author thanks CNPq for financial support (140968/2007-5). The second author thanks CNPq (481556/2009-5, 302617/2007-8) and FAPESP (2007/52015-0, 2008/57428-4).

References (32)

  • [1] R.S. Torres et al., Content-based image retrieval: theory and applications, Revista de Informática Teórica e Aplicada (2006).
  • [2] P. Ciaccia et al., M-tree: an efficient access method for similarity search in metric spaces.
  • [3] M.R. Vieira et al., DBM-tree: a dynamic metric access method sensitive to local density data, Journal of Information and Data Management (2010).
  • [4] R.O. Stehling et al., A compact and efficient image retrieval approach based on border/interior pixel classification.
  • [5] T. Tuytelaars et al., Local invariant feature detectors: a survey, Found. Trends. Comput. Graph. Vis. (2008).
  • [6] R. Ohbuchi, Y. Hata, Combining multiresolution shape descriptors for 3D model retrieval, in: Proc. WSCG 2006, Plzen,...
  • [7] R. Torres et al., A genetic programming framework for content-based image retrieval, Pattern Recognition (2009).
  • [8] H. Lejsek et al., An efficient disk-based index for approximative search in very large high-dimensional collections, IEEE Transactions on Pattern Analysis and Machine Intelligence (2008).
  • [9] E. Valle et al., High-dimensional descriptor indexing for large multimedia databases.
  • [10] A.T. Silva et al., A new CBIR approach based on relevance feedback and optimum path forest classification, Journal of WSCG (2010).
  • [11] J.P. Papa et al., Supervised pattern classification based on optimum-path forest, International Journal of Imaging Systems and Technology (2009).
  • [12] K. Porkaew et al., Query refinement for multimedia similarity retrieval in MARS.
  • [13] S. Tong et al., Support vector machine active learning for image retrieval.
  • [14] D.-H. Kim et al., Qcluster: relevance feedback using adaptive clustering for content-based image retrieval.
  • [15] D. Liu et al., Fast query point movement techniques for large CBIR systems, IEEE Transactions on Knowledge and Data Engineering (2009).
  • [16] J.J. Rocchio, Relevance Feedback in Information Retrieval (1971).
André Tavares da Silva is a PhD student at University of Campinas (Unicamp). He has experience in computer science, with emphasis on Computer Graphics and Vision, acting on the following topics: Geometric Modeling, Visualization and Content-Based Image Retrieval.

Alexandre Xavier Falcão is an associate professor at University of Campinas. He has experience in computer science, with emphasis on Image Processing, acting on the following topics: Image Processing, Visualization and Analysis, Content-Based Image Retrieval, Pattern Recognition and Machine Learning, Digital Video and Biomedical Imaging Applications.

Léo Pini Magalhães is a full professor at University of Campinas. He has experience in computer science, with emphasis on Computer Graphics, mainly in the following areas: Animation, Image Synthesis and Coordination in computer environments.
