Elsevier

Neurocomputing

Volume 150, Part A, 20 February 2015, Pages 106-115

Adaptive feature descriptor selection based on a multi-table reinforcement learning strategy

https://doi.org/10.1016/j.neucom.2014.03.080

Abstract

This paper presents and evaluates a framework to improve the performance of visual object classification methods that take image feature descriptors as inputs. The goal of the proposed framework is to learn the best descriptor for each image in a given database. This goal is reached by means of a reinforcement learning process that requires minimal information. The visual classification system used to demonstrate the proposed framework is based on a bag of features scheme, and the reinforcement learning technique is implemented through the Q-learning approach. The behavior of the reinforcement learning process with different state definitions is evaluated. Additionally, a method that combines all these states is formulated in order to select the optimal state. Finally, the actions are chosen from the set of image descriptors most widely used in the literature: PHOW, SIFT, C-SIFT, SURF and Spin. Experimental results on two public databases (ETH and COIL) are provided, showing both the validity of the proposed approach and comparisons with the state of the art. In all cases, the best results are obtained with the proposed approach.

Introduction

In the computer vision domain, visual object classification (VOC) has attracted the attention of researchers over the last two decades (e.g., [1], [2], [3], [4]). Generally, VOC is based on representing the given scene in a feature space: features are extracted and then described by means of feature descriptors. These feature descriptions are then used as discriminative elements to characterize the given objects. They are computed from interest points together with their neighborhoods, where interest points are pixels with special characteristics (e.g., [5], [6]). Hence, given an image, the feature descriptors characterize the objects at a higher level of abstraction, where classical learning techniques can be used to recognize the target object. More elaborate techniques, such as Bag of Features (BoF), have become popular for visual object recognition (e.g., [7], [3], [8], [9]). BoF consists of the four steps detailed below:

  1. Extract the features from the images of the training set using a given detector and a given descriptor.

  2. Build a dictionary of visual words from the features extracted in step 1.

  3. Using the results of steps 1 and 2, construct a histogram for each image in the training set; each histogram bin counts the number of times a visual word appears in the image.

  4. Train a classification algorithm on the histograms obtained in step 3.
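Steps 2 and 3 can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes step 1 has already produced numeric descriptor vectors, builds the vocabulary with plain k-means (a common choice for BoF, assumed here), and assigns each descriptor to its nearest visual word by brute force. The function names and toy data are hypothetical. Step 4 would then train a classifier on the resulting histograms.

```python
import random
from typing import List, Sequence

Vector = Sequence[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_word(desc: Vector, vocab: List[List[float]]) -> int:
    """Index of the visual word closest to a descriptor (brute-force search)."""
    return min(range(len(vocab)), key=lambda k: sq_dist(desc, vocab[k]))


def build_vocabulary(descs: List[Vector], k: int, iters: int = 20,
                     seed: int = 0) -> List[List[float]]:
    """Step 2 (assumed k-means): cluster training descriptors into k visual words."""
    rng = random.Random(seed)
    vocab = [list(d) for d in rng.sample(descs, k)]  # initial centres drawn from the data
    for _ in range(iters):
        clusters: List[List[Vector]] = [[] for _ in range(k)]
        for d in descs:
            clusters[nearest_word(d, vocab)].append(d)
        for j, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centre
                dim = len(members[0])
                vocab[j] = [sum(m[i] for m in members) / len(members) for i in range(dim)]
    return vocab


def bof_histogram(image_descs: List[Vector], vocab: List[List[float]]) -> List[int]:
    """Step 3: count how many of an image's descriptors fall into each visual word."""
    hist = [0] * len(vocab)
    for d in image_descs:
        hist[nearest_word(d, vocab)] += 1
    return hist


# Hypothetical toy data: 2-D "descriptors" forming two obvious groups.
train = [[0.0, 0.1], [0.1, 0.0], [5.0, 5.1], [5.1, 4.9]]
vocab = build_vocabulary(train, k=2)
print(bof_histogram([[0.05, 0.05], [4.9, 5.0], [5.2, 5.0]], vocab))
```

Real descriptors such as SIFT are 128-dimensional and vocabularies contain hundreds or thousands of words, which is why faster nearest-word search structures are used in practice.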

The BoF architecture is flexible: different algorithms can be combined to implement the four steps presented above, and the final performance of the BoF depends on selecting the right algorithm for each step.

The current work focuses on the first step of the BoF; in particular, the goal is to learn the best algorithm to describe the interest points. In our experience, the performance of the BoF is strongly influenced by the image feature descriptor, so we hypothesize that identifying the best descriptor for each image will improve the classification rate. A naive approach to this problem would be to concatenate all the possible descriptors. However, this solution is not always feasible since, on the one hand, it can require a large amount of resources (e.g., memory, CPU time) and, on the other hand, it introduces noise into the solution [10]. This problem has recently received attention: approaches that select the best descriptor for each image are presented in [10], [11].

In [10], a method for selecting the best descriptor for every image in the database is proposed. To select the best descriptor, several attributes of the image (e.g., colorfulness, roughness, shininess) are taken into account. Although interesting results are presented, the main drawback of this method is its use of a supervised learning scheme in which the authors select the descriptors with a subjective criterion. In contrast, [11] presents a method that learns the best descriptor for each image using a Reinforcement Learning (RL) scheme. RL is a simple learning method based on a trial-and-error strategy. This work presents two improvements over [11]:

  1. We propose to use several state definitions.

  2. A multi-table scheme is introduced in order to exploit the best state definition for each image.

In summary, this work proposes a novel method to learn the best descriptor from a given set. To improve performance, multiple state definitions are used. The scheme works with a BoF approach; specifically, the implementation uses a kd-tree in the second step and a support vector machine (SVM) in the fourth step. The remainder of the paper is organized as follows. Section 2 presents the state of the art. Section 3 summarizes the RL technique. Section 4 then presents the proposed method in detail. Experimental results and comparisons are provided in Section 5. Section 6 gives the conclusions and future work.
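A kd-tree in the vocabulary step is typically used to speed up the lookup of the nearest visual word for each descriptor. The sketch below assumes that role (how the kd-tree is used is not detailed here) and shows a plain kd-tree over the vocabulary with exact nearest-neighbour search. `build_kdtree`, `nearest` and the toy vocabulary are hypothetical, not a library API.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass
class KDNode:
    point: Sequence[float]     # a visual word (cluster centre)
    index: int                 # position of that word in the vocabulary
    axis: int                  # coordinate used to split at this node
    left: Optional["KDNode"]   # words with point[axis] <= split value
    right: Optional["KDNode"]  # words with point[axis] >= split value


def build_kdtree(points: List[Sequence[float]], depth: int = 0,
                 idx: Optional[List[int]] = None) -> Optional[KDNode]:
    """Recursively split the vocabulary at the median of one coordinate per level."""
    if idx is None:
        idx = list(range(len(points)))
    if not idx:
        return None
    axis = depth % len(points[0])
    idx = sorted(idx, key=lambda i: points[i][axis])
    mid = len(idx) // 2
    return KDNode(points[idx[mid]], idx[mid], axis,
                  build_kdtree(points, depth + 1, idx[:mid]),
                  build_kdtree(points, depth + 1, idx[mid + 1:]))


def nearest(node: Optional[KDNode], q: Sequence[float],
            best: Tuple[float, int] = (float("inf"), -1)) -> Tuple[float, int]:
    """Exact nearest-neighbour search: returns (squared distance, word index)."""
    if node is None:
        return best
    d = sum((a - b) ** 2 for a, b in zip(q, node.point))
    if d < best[0]:
        best = (d, node.index)
    diff = q[node.axis] - node.point[node.axis]
    near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
    best = nearest(near, q, best)
    if diff ** 2 < best[0]:  # the splitting plane is closer than the best word so far
        best = nearest(far, q, best)
    return best


words = [[0, 0], [1, 1], [5, 5], [2, 8], [9, 1]]  # toy 2-D vocabulary
tree = build_kdtree(words)
print(nearest(tree, [4.6, 4.9])[1])  # → 2
```

The far subtree is only visited when the splitting plane is closer than the best word found so far, which is what saves work compared with a brute-force scan over all words.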

Section snippets

State of the art

Reinforcement learning is a learning technique widely used in the robotics community; recently, some RL-based works have been proposed in the computer vision field. For instance, in image segmentation, RL is used to select the appropriate threshold (e.g., [12], [13]). In [14], the authors propose an RL-based approach to tackle the face recognition problem, presenting a method that learns the set of dominant features for each image. An approach that joins an active learning

Reinforcement learning

Reinforcement learning, as mentioned before, is a trial-and-error learning process [21] in which the agent has no prior knowledge of which action is correct. RL can be used to solve a Markov decision process (MDP), in which the agent learns which action to take in a given environment in order to maximize the expected reward. These concepts are formalized by the MDP tuple ⟨S, A, δ, τ⟩, where:

  • S is a set of environment states. In this work the
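Since the RL technique is implemented through Q-learning, the standard tabular update Q(s, a) ← Q(s, a) + α[r + γ max_a' Q(s', a') − Q(s, a)] can be sketched with the candidate descriptors as actions. The state label, the reward (+1 for a correct classification, −1 otherwise), the learning rate α, the discount γ and the ε-greedy exploration are illustrative assumptions, not the settings used in this work.

```python
import random
from collections import defaultdict
from typing import Dict, Hashable, List, Optional

# Action set: the candidate descriptors listed in the text.
DESCRIPTORS = ["PHOW", "SIFT", "C-SIFT", "SURF", "Spin"]


class QTable:
    """Tabular Q-learning over (state, descriptor) pairs."""

    def __init__(self, actions: List[str], alpha: float = 0.1, gamma: float = 0.0,
                 epsilon: float = 0.1, seed: int = 0):
        self.actions = actions
        self.alpha = alpha      # learning rate
        self.gamma = gamma      # discount; 0 treats each image as a one-step episode (assumption)
        self.epsilon = epsilon  # exploration probability of the epsilon-greedy policy
        self.rng = random.Random(seed)
        # Unseen states start with Q = 0 for every descriptor.
        self.q: Dict[Hashable, Dict[str, float]] = defaultdict(
            lambda: {a: 0.0 for a in actions})

    def choose(self, state: Hashable) -> str:
        """Epsilon-greedy: explore a random descriptor, otherwise take the best one."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        row = self.q[state]
        return max(self.actions, key=lambda a: row[a])

    def update(self, state: Hashable, action: str, reward: float,
               next_state: Optional[Hashable] = None) -> None:
        """Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
        future = 0.0 if next_state is None else max(self.q[next_state].values())
        target = reward + self.gamma * future
        self.q[state][action] += self.alpha * (target - self.q[state][action])


# Hypothetical training signal: +1 if the classifier was right with that descriptor, -1 otherwise.
table = QTable(DESCRIPTORS, epsilon=0.0)
for _ in range(50):
    table.update("textured", "SIFT", reward=1.0)
    table.update("textured", "SURF", reward=-1.0)
print(table.choose("textured"))  # → SIFT
```

With γ = 0 the update reduces to a running average of the rewards observed for each (state, descriptor) pair, so after n rewards of +1 the value is 1 − 0.9ⁿ for α = 0.1.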

Proposed method

This paper proposes a method to learn the best descriptor for each image. Fig. 2 illustrates the proposed scheme: Fig. 2 (left) presents a classical BoF (e.g., [7], [3], [8], [9]), while Fig. 2 (right) shows the proposed RL-based scheme. Specifically, we propose a new multi-table RL-based strategy to select the best descriptor for each image from a set containing the descriptors most widely used in the literature (i.e., Spin, SIFT, SURF, C-SIFT and PHOW). This section is
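One way a multi-table strategy could select a descriptor is sketched below: each state definition has its own Q-table, each table maps the image to its own state, and the descriptor proposed by the table with the largest Q-value is kept. This "most confident table" rule, the image attributes (`saturation`, `edge_density`), the state functions and the Q-values are hypothetical, chosen only to illustrate combining several Q-tables; they are not necessarily the combination rule of the proposed method.

```python
from typing import Callable, Dict, Hashable, List, Tuple

QRow = Dict[str, float]               # Q-value of each descriptor in one state
Table = Dict[Hashable, QRow]          # one Q-table per state definition
StateFn = Callable[[dict], Hashable]  # maps an image's attributes to a state


def select_descriptor(image: dict, tables: List[Tuple[StateFn, Table]]) -> Tuple[str, int]:
    """Return (descriptor, index of winning table) using a 'most confident table' rule.

    Each table proposes its greedy descriptor for the image's state; the proposal
    with the largest Q-value wins. Tables that have never seen the state abstain.
    """
    best_value, best_action, best_table = float("-inf"), "", -1
    for t, (state_fn, table) in enumerate(tables):
        row = table.get(state_fn(image))
        if not row:
            continue
        action, value = max(row.items(), key=lambda kv: kv[1])
        if value > best_value:
            best_value, best_action, best_table = value, action, t
    return best_action, best_table


# Two hypothetical state definitions based on simple image attributes.
def by_color(image: dict) -> str:
    return "colorful" if image["saturation"] > 0.5 else "gray"


def by_edges(image: dict) -> str:
    return "busy" if image["edge_density"] > 0.3 else "smooth"


tables = [
    (by_color, {"colorful": {"C-SIFT": 0.8, "SIFT": 0.4}, "gray": {"SIFT": 0.6}}),
    (by_edges, {"busy": {"SURF": 0.5, "PHOW": 0.3}, "smooth": {"Spin": 0.7}}),
]
print(select_descriptor({"saturation": 0.9, "edge_density": 0.4}, tables))  # → ('C-SIFT', 0)
```

Returning the index of the winning table also records which state definition was used for each image, which is the information needed to compare state definitions.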

Experimental results

The proposed method has been evaluated using two different databases (ETH and COIL). The evaluation framework compares the results obtained with:

  • A single descriptor for the whole database.

  • All the descriptors concatenated into a single descriptor.

  • The RL-based approach presented in [11].

  • The RL-based approach with different state definitions.

  • All the states concatenated.

  • The combined information provided by the Q-tables (Fig. 7).

These experiments have been performed with the first database (ETH database) and then

Conclusions and future work

This paper presents a novel framework for visual object classification. In particular, it focuses on the selection of the best image feature descriptor. It is based on the combined use of a bag of features scheme together with a reinforcement learning technique, implemented through the Q-learning approach. Note that the BoF in this approach can be replaced by any visual classification method based on image descriptors.

The proposed method combines different state definitions in a multi-table

Acknowledgements

This work was partially supported by the Spanish Government under Project TIN2011-25606. Monica Piñol was supported by Universitat Autònoma de Barcelona grant PIF 471-01-8/09.

Monica Piñol Naranjo received a degree in computer science from the Universitat Autònoma de Barcelona, Barcelona, Spain, in 2009. In the same year, she joined the Computer Vision Center, Barcelona; in 2010 she received a Master's degree in Computer Vision and Artificial Intelligence from the same university. She is currently pursuing her Ph.D. degree, working on reinforcement learning approaches applied to the computer vision domain.

References (33)

  • M. Shokri et al., A reinforcement agent for threshold fusion, Appl. Soft Comput. (2008)
  • J. Peng, J. Peng, B. Bhanu, Local reinforcement learning for object recognition, in: Proceedings of Fourteenth...
  • D. Lowe, Distinctive image features from scale-invariant keypoints, Int. J. Comput. Vis. (2004)
  • L. Fei-Fei, P. Perona, A Bayesian hierarchical model for learning natural scene categories, in: Proceedings of IEEE...
  • L. Bo, X. Ren, D. Fox, Depth kernel descriptors for object recognition, in: IEEE/RSJ International Conference on...
  • C. Harris, M. Stephens, A combined corner and edge detector, in: Alvey vision conference, vol. 15, Manchester, UK,...
  • T. Tuytelaars et al., Local invariant feature detectors, Found. Trends Comput. Graph. Vis. (2008)
  • G. Csurka, C. R. Dance, L. Fan, J. Willamowski, C. Bray, Visual categorization with bags of keypoints, in: Workshop on...
  • H. Bay, T. Tuytelaars, L. Van Gool, SURF: speeded up robust features, in: Proceedings of the European Conference on...
  • A. Bosch, A. Zisserman, X. Muñoz, Image classification using random forests and ferns, in: Proceedings of IEEE...
  • I. Everts, J.C. van Gemert, T. Gevers, Per-patch descriptor selection using surface and scene properties, in:...
  • M. Piñol, A.D. Sappa, A. López, R. Toledo, Feature selection based on reinforcement learning for object recognition,...
  • F. Sahba, H.R. Tizhoosh, M. Salama, Application of opposition-based reinforcement learning in image segmentation, in:...
  • M.T. Harandi, M.N. Ahmadabadi, B.N. Araabi, Face recognition using reinforcement learning, in: Proceedings of IEEE...
  • S. Ebert, M. Fritz, B. Schiele, Ralf: a reinforced active learning formulation for object class recognition, in:...
  • K. Häming, G. Peters, Learning scan paths for object recognition with relational reinforcement learning, in:...


Angel Domingo Sappa received a degree in electromechanical engineering from the National University of La Pampa, General Pico, Argentina, in 1995 and the Ph.D. degree in industrial engineering from the Polytechnic University of Catalonia, Barcelona, Spain, in 1999. In 2003, after holding research positions in France, the U.K., and Greece, he joined the Computer Vision Center, Barcelona, where he is currently a Senior Researcher. His current research focuses on stereo image processing and analysis, 3-D modeling, and dense optical flow estimation. His research interests span a broad spectrum within 2-D and 3-D image processing. Dr. Sappa is a member of the Advanced Driver Assistance Systems Group, Computer Vision Center.

Ricardo Toledo received a degree in Electronic Engineering from the Universidad Nacional de Rosario (Argentina) in 1986, an M.Sc. degree in image processing and artificial intelligence from the Universitat Autònoma de Barcelona (UAB) in 1992, and a Ph.D. in 2001.

Since 1989 he has been lecturing at the Computer Science Department of the UAB and participating in R+D projects. He is currently a full-time associate professor. In 1996 he participated in the founding of the Computer Vision Center (CVC) at the UAB. Ricardo has participated in national and international R+D projects, leading some of them, and has coauthored more than 40 articles in the fields of computer vision, robotics and medical imaging.
