A novel approach to combine features for salient object detection using constrained particle swarm optimization
Introduction
Salient object detection [1], [2] is one of the key problems in computer vision and has received continuous attention since its inception. Visual saliency refers to the ability to locate the relevant information (object) in an image quickly and efficiently. The output of the salient object detection process is a saliency map [1] in which each pixel is assigned a measure of relevance [2]. This is achieved by assigning a high score to interesting information and a low score to irrelevant information.
Salient object detection provides fast solutions for many complex real-time applications, such as surveillance systems [3] that track vehicles, pedestrians, or other objects. It is also used in remote sensing [4] and image retrieval [5], [6]. Additionally, it is used for automatic target detection, such as finding traffic signs [1], [7] along the road or military vehicles in a savanna [7], and in robotics to find salient objects in the environment that can serve as navigation landmarks. It can also be applied to image and video compression [7], by giving higher quality to salient objects at the expense of background clutter, and to automatic cropping/centering [8] of images for display on small portable screens [9]. Further applications include detecting tumors in mammograms [10], advertising design [7], image collection browsing [11], image enhancement [12] and many more.
Several approaches have been suggested to model visual saliency based on neurobiological concepts and on computational and mathematical methods. They can be broadly classified into two major categories [13]: bottom-up and top-down. In bottom-up models, multiple low-level visual features (such as intensity, color, orientation, and texture) are extracted from the image. These features are then normalized and combined into a saliency map. Salient locations are identified using winner-take-all [1] and inhibition-of-return [1] operations. In contrast, top-down models are task-dependent and use a priori knowledge of the visual system. They are generally integrated with bottom-up models to generate saliency maps for localizing objects of interest.
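The bottom-up pipeline described above (normalize the feature maps, combine them, then select locations by winner-take-all and inhibition of return) can be sketched as follows. This is a minimal illustration, not the model of Itti et al. [1]: min-max normalization, plain averaging, the square inhibition window and the names `normalize`, `combine` and `attend` are all assumptions made for the example.

```python
def normalize(feature_map):
    """Rescale a 2-D map (list of rows) to [0, 1]. Min-max scaling is an assumption."""
    flat = [v for row in feature_map for v in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        return [[0.0 for _ in row] for row in feature_map]
    return [[(v - lo) / (hi - lo) for v in row] for row in feature_map]


def combine(feature_maps):
    """Average the normalized maps pixel by pixel (equal weights, assumed)."""
    normed = [normalize(m) for m in feature_maps]
    rows, cols = len(normed[0]), len(normed[0][0])
    return [[sum(m[r][c] for m in normed) / len(normed) for c in range(cols)]
            for r in range(rows)]


def attend(saliency, n_fixations, radius=1):
    """Winner-take-all picks the current maximum; inhibition of return
    then suppresses a square window around it so the next pick moves on."""
    s = [row[:] for row in saliency]  # work on a copy
    fixations = []
    for _ in range(n_fixations):
        r, c = max(((r, c) for r in range(len(s)) for c in range(len(s[0]))),
                   key=lambda rc: s[rc[0]][rc[1]])
        fixations.append((r, c))
        for rr in range(max(0, r - radius), min(len(s), r + radius + 1)):
            for cc in range(max(0, c - radius), min(len(s[0]), c + radius + 1)):
                s[rr][cc] = float("-inf")
    return fixations


# Toy 4x4 feature maps with two candidate objects (hypothetical values).
intensity = [[0, 0, 0, 0], [0, 9, 0, 0], [0, 0, 0, 0], [0, 0, 0, 5]]
color = [[0, 0, 0, 0], [0, 4, 0, 0], [0, 0, 0, 0], [0, 0, 0, 4]]
fixations = attend(combine([intensity, color]), n_fixations=2)
print(fixations)  # [(1, 1), (3, 3)]
```

The first fixation lands on the location that is strong in both maps; once its neighbourhood is inhibited, the second fixation moves to the next most salient location.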
Recently, Liu et al. [14] proposed a salient object detection model based on a combination of bottom-up and top-down approaches [13]. It combined multi-scale contrast, center-surround histogram and color spatial distribution features in a conditional random field under the maximum likelihood estimation (MLE) criterion. MLE is a well-known parameter estimation technique with many advantages [15]. It provides a consistent and asymptotically efficient approach to parameter estimation, and its bias vanishes as the sample size grows. For large samples its estimates are approximately normally distributed, with approximate sample variances that can be used to generate confidence bounds and hypothesis tests for the parameters, and it typically has a lower variance than other methods. However, these advantages are offset by certain disadvantages: MLE can be heavily biased for small samples and is highly sensitive to the choice of starting values. MLE is a derivative-based approach that requires the likelihood function to have an analytical form. In addition, numerical estimation is usually non-trivial and is not guaranteed to converge. Moreover, Liu et al. [14] used a common linear weight vector, obtained by MLE, to combine the feature maps for all test images. This weight vector may not give good saliency results for images that are significantly different from the training set. The weight vectors for such images must be learned so that they give better saliency results for their corresponding images.
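The small-sample bias of MLE noted above can be illustrated with a standard textbook case that is not taken from Liu et al. [14]: estimating the variance of a Gaussian from n independent samples. The symbols n, x_i, \bar{x} and \sigma^2 are introduced here only for this illustration.

```latex
\hat{\sigma}^2_{\mathrm{MLE}} = \frac{1}{n}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2,
\qquad
\mathbb{E}\!\left[\hat{\sigma}^2_{\mathrm{MLE}}\right] = \frac{n-1}{n}\,\sigma^2 .
```

With n = 5 the estimator underestimates the true variance by 20% on average, whereas with n = 1000 the shortfall is only 0.1%, so the bias matters mainly when few samples are available.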
In this paper, we use a modified form of particle swarm optimization (PSO) [16], a commonly used optimization method, to obtain a weight vector that optimally combines the features extracted from the image. PSO uses a fitness function to search for the optimal solution, and we propose a new fitness function for this purpose to obtain better saliency results. To check the efficacy of the proposed model, its performance is evaluated in terms of precision, recall, F-measure, area under curve and computation time. Experiments are carried out on a publicly available image dataset, and performance is compared with the model of Liu et al. [14] and 10 other popular state-of-the-art models.
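The idea of searching for a per-image weight vector with PSO can be sketched as below. This is a generic PSO under stated assumptions, not the modified PSO or the fitness function proposed in this paper: the constraint (non-negative weights that sum to 1), the projection used to enforce it, the stand-in `fitness` (negative mean squared error against a reference mask), the hyperparameters and all function names are hypothetical choices made for illustration.

```python
import random


def combine(feature_maps, weights):
    """Pixel-wise weighted sum of equally sized, flattened feature maps."""
    return [sum(w * m[i] for w, m in zip(weights, feature_maps))
            for i in range(len(feature_maps[0]))]


def project(weights):
    """Assumed constraint handling: clip to >= 0, then rescale to sum to 1."""
    clipped = [max(0.0, w) for w in weights]
    total = sum(clipped)
    if total == 0.0:
        return [1.0 / len(weights)] * len(weights)
    return [w / total for w in clipped]


def fitness(weights, feature_maps, reference):
    """Stand-in fitness (NOT the paper's): negative mean squared error."""
    saliency = combine(feature_maps, weights)
    return -sum((s - t) ** 2 for s, t in zip(saliency, reference)) / len(reference)


def constrained_pso(feature_maps, reference, n_particles=20, n_iters=60,
                    inertia=0.7, c1=1.5, c2=1.5, seed=0):
    rng = random.Random(seed)
    dim = len(feature_maps)
    pos = [project([rng.random() for _ in range(dim)]) for _ in range(n_particles)]
    pos[0] = [1.0 / dim] * dim  # seed one particle with the equal-weight baseline
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]  # personal bests
    best_fit = [fitness(p, feature_maps, reference) for p in pos]
    g = max(range(n_particles), key=lambda i: best_fit[i])
    g_pos, g_fit = best_pos[g][:], best_fit[g]  # global best
    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                vel[i][d] = (inertia * vel[i][d]
                             + c1 * rng.random() * (best_pos[i][d] - pos[i][d])
                             + c2 * rng.random() * (g_pos[d] - pos[i][d]))
            # Move, then project back onto the feasible set.
            pos[i] = project([p + v for p, v in zip(pos[i], vel[i])])
            f = fitness(pos[i], feature_maps, reference)
            if f > best_fit[i]:
                best_pos[i], best_fit[i] = pos[i][:], f
                if f > g_fit:
                    g_pos, g_fit = pos[i][:], f
    return g_pos, g_fit


# Three flattened toy feature maps; only the third matches the reference mask.
feature_maps = [[0.2, 0.9, 0.8, 0.1], [0.5, 0.5, 0.5, 0.5], [1.0, 0.0, 0.0, 1.0]]
reference = [1.0, 0.0, 0.0, 1.0]
weights, score = constrained_pso(feature_maps, reference)
print(weights, score)
```

Because one particle starts at equal weights, the returned solution is never worse than the equal-weight baseline under the chosen fitness. Projection is only one way to handle the constraint; penalty terms in the fitness are a common alternative.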
The paper is organized as follows. Section 2 reviews state-of-the-art methods for salient object detection. The proposed model is discussed in Section 3. The experimental setup and results are presented in Section 4. Conclusions and future work are given in Section 5.
Bottom-up methods
Itti et al. [1] proposed a biologically plausible model that computes a saliency map by combining intensity, color and orientation features at multiple scales. Walther and Koch [17] extended the Itti et al. model to detect proto-object regions. Harel et al. [18] used graph-theoretic ideas to determine activation maps from the raw features. Their model gives high saliency values to nodes at the center of the image. Han et al. [19] integrated Itti's model with Markov random
Proposed salient object detection framework
Liu et al. [14] extracted multi-scale contrast, center-surround histogram and color spatial distribution feature maps from the image. These features can be combined in many ways to obtain a saliency map. One possible way is to give equal weight to all three features. However, an image may be salient in terms of only a single feature, a combination of two features with different weights, or a combination of all three features with different weights. So an appropriate
Experimental setup and results
To check the efficacy of the proposed approach for detecting salient objects, its performance is evaluated both qualitatively and quantitatively and is compared with existing approaches.
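Precision, recall and F-measure, the quantitative measures named in the introduction, can be computed pixel-wise as in the sketch below. The binarization threshold, the flattened-list representation and the default `beta_sq=1.0` are assumptions made for this illustration; the exact evaluation protocol and F-measure weighting used in the experiments are not given in this excerpt.

```python
def precision_recall_f(saliency, ground_truth, threshold=0.5, beta_sq=1.0):
    """Pixel-wise precision, recall and F-measure for a flattened saliency map.

    saliency: scores in [0, 1]; ground_truth: 0/1 labels of the same length.
    threshold and beta_sq are illustrative defaults, not values from the paper.
    """
    predicted = [s >= threshold for s in saliency]
    tp = sum(1 for p, g in zip(predicted, ground_truth) if p and g)
    fp = sum(1 for p, g in zip(predicted, ground_truth) if p and not g)
    fn = sum(1 for p, g in zip(predicted, ground_truth) if not p and g)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = beta_sq * precision + recall
    f_measure = (1 + beta_sq) * precision * recall / denom if denom else 0.0
    return precision, recall, f_measure


# Toy example: 2 true positives, 1 false positive, 1 false negative.
scores = [0.9, 0.8, 0.2, 0.6, 0.1]
labels = [1, 1, 0, 0, 1]
p, r, f = precision_recall_f(scores, labels)
print(p, r, f)
```

In this toy case precision, recall and F-measure all equal 2/3. Choosing `beta_sq` below 1 weights precision more heavily than recall.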
In the Salient Object Detection using Constrained Particle Swarm Optimization (SOD-C-PSO) procedure, the parameters are set according to Table 1.
All the experiments are carried out in a Windows 7 environment on an Intel(R) Xeon(R) processor running at 2.27 GHz with 4 GB of RAM.
Conclusion and future work
The Liu et al. model used a conditional random field (CRF) under the maximum likelihood criterion to combine three features based on multi-scale contrast, center-surround histogram and color spatial distribution. The CRF learning generated a common linear weight vector that was applied to all the test images. This weight vector was not suitable for images that differ significantly from the training set and hence gave inappropriate detection results. We proposed a new fitness function for detecting salient
Conflict of interest statement
None declared.
Acknowledgments
The authors are indebted to the reviewers for their constructive suggestions, which significantly helped to improve the quality of this paper. In addition, the first author expresses his gratitude to the University Grants Commission (UGC), India, for its financial support of this research work.
References (30)
- et al., Modeling attention to salient proto-objects, Neural Netw. (2006)
- et al., A model of saliency based visual attention for rapid scene analysis, IEEE Trans. Pattern Anal. Mach. Intell. (1998)
- A. Borji, D.N. Sihite, L. Itti, Salient object detection: a benchmark, in: Proceedings of the European Conference on...
- et al., A novel approach for the detection of vehicles on freeways by real time vision, Intell. Veh. (1996)
- et al., Saliency and gist features for target detection in satellite images, IEEE Trans. Image Process. (2011)
- 2D Target Detection and Recognition, Models, Algorithms and Networks (2002)
- et al., Digital Image Processing (2002)
- L. Itti, Models of Bottom Up and Top Down Visual Attention (Dissertation), California Institute of Technology,...
- A. Santella, M. Agrawala, D. Decarlo, D. Salesin, M. Cohen, Gaze based interaction for semi automatic photo cropping,...
- L. Chen, X. Xie, X. Fan, W. Ma, H. Shang, H. Zhou, A Visual Attention Model for Adapting Images on Small Displays,...
- Detection of stellate distortions in mammograms, IEEE Trans. Med. Imaging
- Low quality image enhancement using visual attention, Opt. Eng.
- An adaptive computational model for salient object detection, IEEE Trans. Multimed.
- Learning to detect a salient object, IEEE Trans. Pattern Anal. Mach. Intell.
Navjot Singh obtained M.Tech (Computer Science and Technology) from Jawaharlal Nehru University, New Delhi. Presently, he is pursuing Ph.D. (Computer Vision and Pattern Recognition) from Jawaharlal Nehru University, New Delhi. His current research areas are Computer Vision, image processing, object detection, pattern recognition, feature extraction, and classification.
Rinki Arya obtained M.Tech (Computer Science and Technology) from Jawaharlal Nehru University, New Delhi. Presently, she is pursuing Ph.D. (Computer Vision and Pattern Recognition) from Jawaharlal Nehru University, New Delhi. Her current research areas are Computer Vision, object detection, pattern recognition, and feature extraction.
R.K. Agrawal obtained M.Tech (Computer Application) from Indian Institute of Technology Delhi, New Delhi and Ph.D. (Computational Physics) from University of Delhi, Delhi. Presently, he is working as a Professor at the School of Computer and Systems Sciences, Jawaharlal Nehru University, New Delhi. His current research areas are classification, feature extraction and selection for pattern recognition problems in domains of image processing, security, and bioinformatics.