Active learning paradigms for CBIR systems based on optimum-path forest classification
Highlights
► Two active learning paradigms based on optimum-path forest classifiers.
► They solve image retrieval in a few iterations of relevance feedback.
► Considerable gains in efficiency and effectiveness are demonstrated.
Introduction
Content-based image retrieval (CBIR) systems aim to return the most relevant images in a database, according to the user's opinion for a given query. Because the meaning of relevance may change among users for the same query, these systems usually rely on an active learning process in which the system returns a small set of images (training set) and the user indicates their relevance at each iteration (see Fig. 1) [1]. The database images can be represented by feature vectors (points in a feature space), which may encode color, texture, and/or shape measures, and indexing structures [2], [3] can be used to access images more efficiently. Each stage shown in Fig. 1 is the subject of active research. Examples are works to obtain more effective image descriptors [4], [5] (i.e., feature extraction and distance functions for image comparison), to combine distance functions from multiple descriptors [6], [7], and to provide scalability in large image databases [8], [9]. The methods presented here can take advantage of all these results, but the focus of our work is on the learning and retrieval processes (gray box in Fig. 1).
From a practical point of view, a CBIR system should minimize the response time and the number of marked images (efficiency), while maximizing the user's satisfaction (effectiveness). These are the main challenges, especially for large image collections. We have observed two active learning paradigms based on relevance feedback on returned images, named greedy and planned. In greedy methods,
1. a small number of images, usually ranked by relevance (similarity with the query), is presented to the user,
2. the user indicates which images are actually relevant (or irrelevant), and the remaining images are taken as irrelevant (or relevant),
3. the system learns the user's opinion from this feedback, in order to return a higher number of relevant images in the next iteration, back at step 1.
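The three steps above can be read as a simple loop. The following is a minimal sketch under assumptions that are not taken from the paper: images are small feature tuples, the user is simulated by a hypothetical `oracle` function, and a plain nearest-labeled-example score stands in for the actual classifier.

```python
import math

def greedy_feedback(features, query, oracle, per_iter=3, iterations=3):
    """Toy greedy relevance-feedback loop (illustrative, not the paper's method).

    features: dict image_id -> feature vector (tuple of floats)
    query: feature vector of the query image
    oracle: hypothetical stand-in for the user; oracle(image_id) -> bool
    """
    relevant, irrelevant = set(), set()

    def score(img):
        # Before any feedback, rank by distance to the query.
        if not relevant:
            return math.dist(features[img], query)
        d_pos = min(math.dist(features[img], features[r]) for r in relevant)
        if not irrelevant:
            return d_pos
        d_neg = min(math.dist(features[img], features[n]) for n in irrelevant)
        # Smaller is better: near a relevant example, far from irrelevant ones.
        return d_pos / (d_pos + d_neg + 1e-12)

    for _ in range(iterations):
        unlabeled = [i for i in features if i not in relevant | irrelevant]
        # Step 1: show the images currently believed most relevant.
        shown = sorted(unlabeled, key=score)[:per_iter]
        # Step 2: the user marks the relevant ones; the rest count as irrelevant.
        for img in shown:
            (relevant if oracle(img) else irrelevant).add(img)
        # Step 3: the updated sets change `score` for the next iteration.
    return relevant, irrelevant
```

The greedy character is in step 1: the system always shows what it currently believes to be the best answers, so each round both serves the user and collects labels.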
In the past, we proposed a greedy approach [10] based on the optimum-path forest (OPF) classifier [11]. In this method, database images classified as relevant are ranked based on their normalized distances to special positive and negative examples (called prototypes), which are computed in the previous iteration from the user-marked images. Its gain in effectiveness was notable over a simple greedy technique [12] and a planned method based on support vector machines (SVM) [13]. The choice of the OPF model was also justified by its considerable gain in computational time with respect to SVM and other classification models, such as neural networks [11].
In this work, we present a planned method based on the OPF classifier and demonstrate its gain in effectiveness over the previous approaches [10], [12], [13] using more image databases. In our tests, the most informative images are best obtained from images that are classified as relevant but were close to being classified as irrelevant. These images are ranked based on their optimum-path costs in the forest with respect to positive and negative prototypes. This strategy differs from the original definition [13], which uses both relevant and irrelevant images. Our strategy reduces the number of false positives, which tends to be significantly higher than the number of false negatives, improving effectiveness. It also considerably reduces the number of images to be ranked, improving efficiency.
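The selection idea in this paragraph can be illustrated with a short sketch. It assumes that, for each database image, the costs with respect to the best positive and the best negative prototype are already available. The function name `select_informative` and the use of the cost gap as the uncertainty measure are illustrative assumptions; this excerpt does not state the exact ranking criterion.

```python
def select_informative(costs, k):
    """Pick k images classified as relevant that lie nearest the decision boundary.

    costs: dict image_id -> (cost_pos, cost_neg), assumed precomputed costs of
           reaching the image from the best positive / negative prototype.
    """
    # An image is labeled relevant when a positive prototype offers the lower cost.
    relevant = {i: (cp, cn) for i, (cp, cn) in costs.items() if cp < cn}
    # Assumed uncertainty measure: a small gap means the image was almost
    # labeled irrelevant, so the user's label is most informative there.
    by_margin = sorted(relevant, key=lambda i: relevant[i][1] - relevant[i][0])
    return by_margin[:k]
```

Only images already classified as relevant are sorted, which is what keeps the ranking step small compared with ranking the whole database.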
A drawback of the planned paradigm is that the user does not know in advance how many iterations will be necessary. However, this could be learned for a given application. Moreover, it is not clear that a system in the greedy paradigm will learn in fewer iterations than the number specified for a given planned method. In fact, we present an example in which the planned paradigm is more effective than the greedy paradigm for the same pattern classification model (OPF). In both paradigms, the minimum number of response images per iteration may also change for distinct queries and users.
Most schemes based on relevance feedback use the greedy paradigm. Fig. 2 presents three examples of simple greedy techniques [14], [15]. In Fig. 2a, the positive examples (relevant images) from a first iteration are used to move the next query point to their geometric center in the feature space. This idea stems from Rocchio's formula [16], used in document retrieval systems, and it has been successfully exploited in CBIR systems [1], [17], [18], [19]. Two other methods use the relevant images as next query points and, depending on the distance to this multi-point query set, different iso-surfaces are formed in the feature space (Figs. 2b and c).
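As a rough illustration of these simple greedy techniques, the sketch below moves the query point to the geometric center of the relevant images and computes a distance to a multi-point query. The function names are hypothetical, and taking the minimum over query points is only one assumed way to aggregate distances; other aggregations would produce different iso-surfaces.

```python
import math

def move_query_point(relevant_vectors):
    """Return the geometric center (centroid) of the relevant feature vectors."""
    n = len(relevant_vectors)
    dims = len(relevant_vectors[0])
    return tuple(sum(v[d] for v in relevant_vectors) / n for d in range(dims))

def multipoint_distance(x, query_points):
    """Distance from x to a multi-point query (assumed aggregation: minimum)."""
    return min(math.dist(x, q) for q in query_points)

# Example: three relevant images in a 2-D feature space.
new_query = move_query_point([(0.0, 0.0), (2.0, 0.0), (1.0, 3.0)])
```

With the minimum as aggregation, an image is considered close to the query if it is close to any one of the relevant images, which yields disjoint contours around each query point.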
The planned method proposed by Tong and Chang [13] outperforms simple greedy techniques. Some studies [20], [21], [22] have reported improvements over Tong and Chang's approach, but it is still the best option to serve as a baseline. Hoi et al. [23], [24] have also observed a problem with small training sets in Tong and Chang's method [13] and have proposed the use of labeled and unlabeled images in the training set to improve performance. The idea seems worth further investigation and can be easily incorporated into our approaches. However, this was not necessary in the present study.
This paper is organized as follows. Section 2 reviews the OPF model and presents the active learning algorithms using the OPF-based greedy and planned paradigms. The experiments and results using five heterogeneous image collections are described in Section 3. Section 4 states the conclusions and discusses our future work.
Active learning using optimum-path forest classification
Consider an image database in which each image is represented by a feature vector, computed by a feature extraction function v. The similarity between images is measured by a distance function d. A pair (v,d) is called a descriptor. In the case of multiple descriptors encoding shape, color and texture properties, it is possible to combine their distance functions into a composite distance function, as proposed in [7]. Therefore, with no loss of generality, we will
Experiments and results
In any real-world application of CBIR, it is very important for a method to minimize the average response time per query and the average number of marked images per query, while maximizing the average degree of user satisfaction. These challenges also include the choice of suitable and effective image descriptors for such an application. Given that we are not addressing these issues here, the experiments in this section aim to give us at least a strong indication of which among the
Conclusion and future work
We have discussed greedy and planned active learning paradigms for CBIR systems, both based on the optimum-path forest classifier (GOPF and POPF for the greedy and planned methods, respectively). GOPF was proposed previously [10], but it was evaluated in this work with more databases. POPF is a new approach that has not been presented before.
Our experiments involved a reasonable amount of user interaction from the practical point of view, by setting low numbers of response images and iterations;
Acknowledgments
The first author thanks CNPq for financial support (140968/2007-5). The second author thanks CNPq (481556/2009-5, 302617/2007-8) and FAPESP (2007/52015-0, 2008/57428-4).
André Tavares da Silva is a PhD student at University of Campinas (Unicamp). He has experience in computer science, with emphasis on Computer Graphics and Vision, acting on the following topics: Geometric Modeling, Visualization and Content-Based Image Retrieval.
References (32)
- et al. Content-based image retrieval: theory and applications. Revista de Informática Teórica e Aplicada (2006)
- et al. M-tree: an efficient access method for similarity search in metric spaces
- et al. DBM-tree: a dynamic metric access method sensitive to local density data. Journal of Information and Data Management (2010)
- et al. A compact and efficient image retrieval approach based on border/interior pixel classification
- et al. Local invariant feature detectors: a survey. Found. Trends. Comput. Graph. Vis. (2008)
- R. Ohbuchi, Y. Hata, Combining multiresolution shape descriptors for 3D model retrieval, in: Proc. WSCG 2006, Plzen,...
- et al. A genetic programming framework for content-based image retrieval. Pattern Recognition (2009)
- et al. An efficient disk-based index for approximative search in very large high-dimensional collections. IEEE Transactions on Pattern Analysis and Machine Intelligence (2008)
- et al. High-dimensional descriptor indexing for large multimedia databases
- et al. A new CBIR approach based on relevance feedback and optimum path forest classification. Journal of WSCG (2010)
- Supervised pattern classification based on optimum-path forest. International Journal of Imaging Systems and Technology
- Query refinement for multimedia similarity retrieval in MARS
- Support vector machine active learning for image retrieval
- Qcluster: relevance feedback using adaptive clustering for content-based image retrieval
- Fast query point movement techniques for large CBIR systems. IEEE Transactions on Knowledge and Data Engineering
- Relevance Feedback in Information Retrieval
Alexandre Xavier Falcão is an associate professor at University of Campinas. He has experience in computer science, with emphasis on Image Processing, acting on the following topics: Image Processing, Visualization and Analysis, Content-Based Image Retrieval, Pattern Recognition and Machine Learning, Digital Video and Biomedical Imaging Applications.
Léo Pini Magalhães is a full professor at University of Campinas. He has experience in computer science, with emphasis on Computer Graphics, mainly in the following areas: Animation, Image Synthesis and Coordination in computer environments.