Efficient region-aware large graph construction towards scalable multi-label propagation
Introduction
Recent advances in technologies for image capture, processing, distribution, storage, and sharing have led to a proliferation of image data. Huge collections of images are now available and shared on photo-sharing websites such as Flickr [2] and Picasa [3]. It is reported that more than 5000 images are uploaded to Flickr every minute, and up to 12,000 images are browsed per second at peak times. Since most of these uploaded images lack high-quality annotations, one crucial task is to annotate them automatically to facilitate subsequent image search.
Automatic semantic-level annotation of images in large-scale datasets is challenging, mainly because of four difficulties. (1) How to explore the relationships between regions and labels. Generally, labels are relevant to local image regions rather than to the whole image. However, most existing algorithms associate labels with whole images, assuming that image similarity and label similarity are consistent. Our first task is to associate labels with specific image regions, which is believed to be more accurate. (2) How to exploit the co-occurrence among labels. Identifying labels that frequently co-occur, e.g. “cloud” and “sky”, can improve image annotation performance. This has been discussed in recent works [4], [5], [6], but not for large-scale datasets, since extra computation is required. Our second task is to reveal label co-occurrence in large-scale datasets at a tolerable computational cost. (3) How to explore the relationships among images. To explore such relationships, a semantics-motivated kernel function is typically used to construct the similarity matrix, which encodes the underlying structural dependence between images. However, it is impractical to calculate the whole kernel matrix over a large-scale dataset. Our third task is to explore the image relationships in large-scale datasets efficiently. (4) How to propagate the labels from labeled images to unlabeled ones. In a large-scale dataset, it is intractable to obtain labels for all images, but it is still feasible to label a small subset of images, often regarded as “seed images”. The labels of these seed images are then propagated to the remaining unlabeled images through a semi-supervised learning algorithm. Our fourth task is to adopt an efficient semi-supervised learning algorithm from among the existing ones for label propagation. To address these four difficulties, we propose a framework of region-aware and scalable multi-label propagation in this paper.
Most existing annotation algorithms assume that image similarity and label similarity are consistent. This assumption ignores the fact that each label usually characterizes only a local semantic region within an image, whereas image similarity is generally calculated on the whole image, as illustrated in Fig. 1. One reasonable solution is to represent the image with semantic region-based features. The works in [7], [8], [9] regard each image as a bag of multiple manually segmented regions and predict the label of each region with a multi-class bag classifier. In practice, manual segmentation is very time-consuming, while automatic image segmentation algorithms are still far from satisfactory. To address this, Gu et al. [10] proposed to segment each image into a robust bag of overlaid regions, which are processed at different scales to explore rich cues. This Bag-of-Regions (BOR) representation captures the regional semantics and can be built on most existing segmentation algorithms. Therefore, in our framework, we follow Gu's algorithm to represent every image as a BOR and extract features from the regions to explore the relationships between semantic regions and labels.
Label co-occurrence is potentially useful for improving the accuracy of label propagation. Chen et al. constructed a graph at the label level to reveal the label correlations [5]. Liu et al. introduced a label similarity matrix into a semi-supervised learning algorithm [6]. Unfortunately, most existing methods are hard to extend to the large-scale case, especially as the number of labels increases. The Bag-of-Regions representation, which explores label locality as mentioned above, also makes it possible to reveal label co-occurrence. Since the segmentation is performed from coarse to fine, the regions associated with a single label are generated at the fine segmentation level to explore label locality, while the regions associated with multiple labels are generated at the coarse segmentation level. If certain labels are frequently segmented into the same region, their co-occurrence will be revealed and propagated to the unlabeled data.
Image similarity plays an important role in image annotation problems. However, it is hard to calculate all pairwise similarities in large-scale applications because of storage limitations and the high computational cost. The sparse k-nearest neighbor (k-NN) graph is a widely used way to explore image similarity. To retrieve the k nearest data points of a query, a possible solution is to construct a data structure that supports spatial partitioning and to search for the nearest neighbors under some (Euclidean) distance function. The kd-tree is a well-known space-partitioning data structure for organizing data points in k-dimensional space [12]. However, it suffers from a query time that grows exponentially with the dimension k. Recent advances in approximate nearest neighbor methods shed some light on this problem. The appeal of these methods is that an approximate nearest neighbor is almost as good as the exact one in many cases, while the computational efficiency is improved. In particular, if the distance measure accurately captures the underlying data dependence between images, then small differences in distance have only a limited effect. The locality-sensitive hashing (LSH) scheme [13], one of the most popular methods for approximate search, partitions the image set by hashing images into different buckets to shortlist the k-NN candidates. The hash key is generated by several hash functions that give nearby data points a higher collision probability. Retrieval is thus reduced to searching the elements of the bucket that contains the query data point. Experiments show that this data structure achieves a large speedup over several tree-based data structures [14], [15]. Jain et al. [16] provided a method to incrementally update locality-sensitive hash keys while the metric learner is updated, which makes it possible to perform accurate sub-linear time nearest neighbor search over the data in an online manner.
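The bucket-based shortlisting described above can be sketched as follows. This is a minimal illustration and not the implementation used in this work: the hash family (sign random projections, which are sensitive to angular rather than Euclidean distance), the function names `build_lsh_tables` and `approx_knn`, and the parameters `n_tables` and `n_bits` are all assumptions made for the example.

```python
import numpy as np
from collections import defaultdict

def build_lsh_tables(X, n_tables=8, n_bits=12, seed=0):
    """Hash every row of X into one bucket per table (random-hyperplane LSH)."""
    rng = np.random.default_rng(seed)
    tables = []
    for _ in range(n_tables):
        # Each column of `planes` is one hash function: the sign of a random projection.
        planes = rng.standard_normal((X.shape[1], n_bits))
        bits = (X @ planes) > 0
        # The hash key of a point is the concatenation of its n_bits sign bits.
        keys = [row.tobytes() for row in bits]
        buckets = defaultdict(list)
        for i, key in enumerate(keys):
            buckets[key].append(i)
        tables.append((keys, buckets))
    return tables

def approx_knn(X, tables, query_idx, k):
    """Shortlist candidates from the query's buckets, then rank them exactly."""
    candidates = set()
    for keys, buckets in tables:
        # Only points that collide with the query in some table are considered.
        candidates.update(buckets[keys[query_idx]])
    candidates.discard(query_idx)
    if not candidates:
        return []
    cand = np.fromiter(candidates, dtype=int)
    # Exact Euclidean distances are computed on the shortlist only.
    dists = np.linalg.norm(X[cand] - X[query_idx], axis=1)
    return cand[np.argsort(dists)[:k]].tolist()
```

Calling `approx_knn` once per data point yields an approximate sparse k-NN graph. More tables raise the chance that true neighbors collide with the query at least once, while more bits per key make the buckets smaller and the shortlist shorter.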
In the same spirit, we utilize LSH to quickly search for every region's nearest neighbors and thereby construct a region-level k-NN graph. However, it remains unclear how to convert the region-level k-NN graph into an image-level k-NN graph. In this paper, we provide a Region-to-Image (R2I) process to generate the image-level k-NN graph from the semantic regions. More specifically, we first segment every image into regions at different segmentation scales to obtain the BOR and extract the region representation; next, we utilize LSH to construct the k-NN graph at the region level; then, we generate the k-NN graph at the image level in the R2I process; finally, the image similarity provided by the image-level k-NN graph is leveraged to propagate the labels from labeled images to unlabeled ones.
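To make the region-to-image conversion concrete, the sketch below shows one plausible way to aggregate a region-level k-NN graph into an image-level one. The aggregation rule (summing region similarities between pairs of images and keeping the top k) is an assumption chosen for illustration, since the actual R2I rule is not specified in this section; the function name `region_to_image_knn` and its argument layout are likewise hypothetical.

```python
from collections import defaultdict

def region_to_image_knn(region_knn, region_owner, k):
    """
    region_knn:   dict region_id -> list of (neighbor_region_id, similarity)
    region_owner: dict region_id -> id of the image the region was cut from
    Returns dict image_id -> list of (neighbor_image_id, score), best first.
    """
    scores = defaultdict(lambda: defaultdict(float))
    for region, neighbors in region_knn.items():
        image = region_owner[region]
        for neighbor_region, similarity in neighbors:
            other = region_owner[neighbor_region]
            if other != image:
                # Assumed rule: every region-level edge votes for its image pair.
                scores[image][other] += similarity
    # Keep the k highest-scoring images, breaking ties by image id.
    return {
        image: sorted(votes.items(), key=lambda t: (-t[1], t[0]))[:k]
        for image, votes in scores.items()
    }
```

Under this assumed rule, two images become neighbors when many of their regions are mutually similar, so that similarity at the image level is driven by shared local content rather than by global appearance.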
In the label propagation phase, an appropriate learning algorithm is also vital for effectiveness and efficiency, especially for large-scale problems. From a machine learning point of view, our goal of label propagation can be regarded as a special case of semi-supervised learning. Based on the philosophy that adding large quantities of unlabeled data can provide auxiliary information and produce a better classifier, semi-supervised learning algorithms have demonstrated superior performance on many large-scale problems. Several efficient algorithms have been proposed in recent years. For example, the works in [17], [18] developed a large-scale semi-supervised linear SVM. Delalleau et al. [19] proposed an algorithm to improve the induction speed when modeling data with graphs. Karlen et al. [20] solved a large-scale graph transduction problem with up to 650,000 samples. Subramanya et al. [1] performed efficient transduction by optimizing an entropy-regularized objective function. This algorithm is cache-cognizant and obtains a linear speedup in parallel computations, which ensures its scalability to large datasets. The algorithm proposed in this paper is primarily adapted from Subramanya's, which is able to process the largest number of samples reported to date (120 million) and experimentally outperforms square-loss algorithms such as the harmonic function algorithm. We adapt it to the special requirements of the image annotation task.
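As a point of reference for graph-based transduction, the sketch below implements a plain iterative label propagation with clamped seed labels, whose fixed point is the harmonic function solution mentioned above as a baseline. It is not the entropy-regularized algorithm of [1], which is substituted here by a simpler technique for illustration only. The function name `propagate_labels` and its arguments are hypothetical.

```python
import numpy as np

def propagate_labels(W, Y_seed, is_seed, n_iter=50):
    """
    W:       (n, n) non-negative affinity matrix of the image-level k-NN graph.
    Y_seed:  (n, c) label indicator matrix; rows of unlabeled images are ignored.
    is_seed: (n,) boolean mask marking the labeled seed images.
    Returns an (n, c) matrix of label scores.
    """
    W = np.asarray(W, dtype=float)
    Y_seed = np.asarray(Y_seed, dtype=float)
    is_seed = np.asarray(is_seed, dtype=bool)
    row_sum = W.sum(axis=1, keepdims=True)
    row_sum[row_sum == 0] = 1.0          # isolated nodes keep zero scores
    P = W / row_sum                      # row-stochastic transition matrix
    F = np.where(is_seed[:, None], Y_seed, 0.0)
    for _ in range(n_iter):
        F = P @ F                        # each node averages its neighbors' scores
        F[is_seed] = Y_seed[is_seed]     # clamp the seed images to their known labels
    return F
```

Because each column of the score matrix is propagated independently, an image can receive high scores for several labels at once, which is what a multi-label setting requires. Each iteration costs one sparse matrix product when `W` is stored as a sparse k-NN graph, although the dense arrays used here are only suitable for small examples.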
In this paper, we propose a framework of region-aware and scalable multi-label propagation, which not only explores the relationships between labels and regions but is also computationally efficient, especially in the large-scale setting. The main contributions of the proposed work are summarized as follows: (1) the relationship between regions and labels is explored through the inter-relationships among the small regions at the fine segmentation level; (2) the label co-occurrence is revealed by the large regions at the coarse segmentation level; (3) the computation of image similarity is accelerated by locality-sensitive hashing; and (4) to the best of our knowledge, this is the first application of the entropic graph regularized semi-supervised learning algorithm [1] to a large-scale dataset of up to 260k images. Extensive experimental results on the NUS-WIDE and Corel datasets demonstrate the effectiveness and efficiency of our proposed framework.
The rest of this paper is organized as follows. Related work is reviewed in Section 2. In Section 3, we present the details of our region-aware and scalable multi-label propagation framework. The experimental results are given in Section 4 with comprehensive discussions. Finally, Section 5 concludes the paper.
Section snippets
Related works
Recently, more and more researchers have been focusing on the image annotation problem. Qi et al. [4] proposed a novel correlative multi-label (CML) framework to simultaneously classify concepts and model the correlations between them in a single step. Chen et al. presented a graph-based semi-supervised multi-label learning algorithm that solves a Sylvester equation [5], which is denoted as SMSE in this paper. This algorithm first constructs two graphs at the sample level and the category level, respectively, and
Overview of the framework
Fig. 2 shows the overall flow chart of the proposed framework for region-aware and scalable multi-label propagation. There are four steps in our framework. The first one is region-specific representation. In this step, we segment every image into a Bag-of-Regions at different segmentation scales. Then, we extract the histogram of oriented gradients (HOG) features [27] and color moments [28] of every region to obtain the region-specific feature representation. In the second step, we first exploit
Experiments
In this section, we systematically evaluate the effectiveness and efficiency of our proposed region-aware and scalable multi-label propagation framework with two experiments. The first experiment evaluates the effectiveness. To validate our BOR-based method, we select as our baseline the “non-segmentation approach”, which does not segment the image into regions. The other experiment evaluates the efficiency, as our framework exploits LSH to decrease
Conclusion
In this paper, we have proposed a novel scalable label-locality driven image annotation framework, which possesses four characteristics, namely, (1) the inter-label co-occurrence is revealed with the relationships among super-size regions from the segmentation algorithm; (2) the relationships among the atomic regions are investigated to explore the relationships between regions and labels and these relationships are further propagated into image level similarity measurements; (3) the
Bing-Kun Bao received the Ph.D. degree in Control Theory and Control Application, Department of Automation, University of Science and Technology of China (USTC), China, in 2009; and the B.E. degree from the School of Computing, Hefei University of Technology, China, in 2004. She is currently a Research Engineer in Electrical and Computer Engineering at the National University of Singapore (NUS). Her research interests are in the areas of multimedia and computer vision.
References (31)
- A. Subramanya, J. Bilmes, Entropic graph regularization in non-parametric semi-supervised classification, in: Neural...
- ...
- ...
- et al., Correlative multi-label video annotation
- G. Chen, Y. Song, F. Wang, C. Zhang, Semi-supervised multi-label learning by solving a Sylvester equation, in: SIAM...
- et al., Semi-supervised multi-label learning by constrained non-negative matrix factorization
- et al., Logistic regression and boosting for labeled bags of instances
- et al., Miles: multiple-instance learning via embedded instance selection, IEEE Transactions on Pattern Analysis and Machine Intelligence (2006)
- et al., Multi-instance multi-label learning with application to scene classification, Advances in Neural Information Processing Systems (2007)
- et al., Recognition using regions
- Exploiting spatial context constraints for automatic image region annotation
- Similarity estimation techniques from rounding algorithms
- Approximate nearest neighbors: towards removing the curse of dimensionality
- Similarity search in high dimensions via hashing
Bingbing Ni received his B.E. degree in Electrical Engineering from Shanghai Jiao Tong University (SJTU), China in 2005. He is currently a Ph.D. candidate in Electrical and Computer Engineering at the National University of Singapore (NUS). His research interests are in the areas of computer vision and machine learning.
Yadong Mu received the Ph.D. degree from the Institute of Computer Science and Technology, Peking University, Beijing, China, in 2009, and B.S. degrees from the Department of Computer Science and the Department of Philosophy, Peking University, in 2004 and 2005, respectively. Currently, he is a Postdoctoral Research Fellow in the Department of Electrical and Computer Engineering, National University of Singapore. His research interests include computer vision, machine learning, and data mining.
Shuicheng Yan (M’06, SM’09) received the Ph.D. degree from the School of Mathematical Sciences, Peking University in 2004. He spent three years as a Postdoctoral Fellow at the Chinese University of Hong Kong and then at the University of Illinois at Urbana-Champaign, and he is currently an Assistant Professor in the Department of Electrical and Computer Engineering at the National University of Singapore. In recent years, his research interests have focused on computer vision (biometrics, surveillance, and internet vision), multimedia (video event analysis, image annotation, and media search), machine learning (feature extraction, sparsity/non-negativity analysis, and large-scale machine learning), and medical image analysis.
Dr. Yan has authored or co-authored over 140 technical papers on a wide range of research topics. He has served on the editorial board of the International Journal of Computer Mathematics, has served as guest editor of a special issue of Pattern Recognition Letters, and has been serving as guest editor of a special issue of Computer Vision and Image Understanding. He has served as Co-Chair of the IEEE International Workshop on Video-oriented Object and Event Classification (VOEC’09), held in conjunction with ICCV’09. He is the Special Session Chair of the Pacific-Rim Symposium on Image and Video Technology, 2010. He is an Associate Editor of IEEE Transactions on Circuits and Systems for Video Technology.