Robust latent semantic exploration for image retrieval in social media

doi:10.1016/j.neucom.2015.02.082

Neurocomputing

Volume 169, 2 December 2015, Pages 180-184

https://doi.org/10.1016/j.neucom.2015.02.082

Abstract

With the rapid development of social media, users generate more and more multimedia data with associated tags. Tag information provides an extra cue for linking multimedia data, in addition to the multimedia content itself. However, manually added tags are often noisy and inaccurate. Moreover, semantically similar tags are abundant but are not well accounted for. This paper proposes a new algorithm that robustly combines multimedia content and associated tags by mining a latent semantic space that takes semantically similar tags into account. The $l_{2, 1}$ norm is employed in latent semantic indexing to obtain a more robust latent space, and a word-to-vector based clustering method is proposed to handle the large number of tags with similar meanings. Experiments on an extensive dataset demonstrate the effectiveness of the proposed method. Compared with existing latent semantic based methods, the proposed algorithm provides a model that is more robust to noise.

Introduction

Recent years have witnessed a boom in social media, driven by the rapid development of smartphones and convenient internet access. People can easily acquire and share information and communicate with others on these platforms. One distinctive feature of social media is that it makes it much easier for users to create and share their own content. Among the many social media platforms, image and video sharing web sites have become very popular. On these platforms, users not only produce and store their own content but can also view and comment on others' content. The social network characteristics of these platforms let users connect and communicate with one another in multiple forms. Users can view one another's content and comment on it by using tags. These tags can be viewed as annotations of the multimedia content.

However, since users are under no obligation to add tags accurately and thoroughly, directly using tags as annotations causes problems because the tags are noisy and incomplete. Users do not always tag from the same perspective, and because there are many ways to describe a piece of multimedia content, users cannot be expected to add every possible tag. For example, for landscape images taken at the same place, different users pay attention to different aspects and hence tend to add different tags to describe the place. Consequently, although two such images are visually similar, the difference in their tags can prevent them from being treated as “similar” during search. Conversely, visually different images may be declared “similar” based on their associated tags.

As a reference point, for the dataset proposed in [1], it has been shown that the average precision of user tags is about 0.5 and the average recall (completeness rate) of user tags is about 0.5 as well. In other words, about half of the tags created by users are noise and about half of the true labels are missing.
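To make these two figures concrete, the short sketch below computes tag precision and recall for a single image. The tag sets and the helper name `tag_precision_recall` are invented for illustration and are not taken from the dataset in [1].

```python
def tag_precision_recall(user_tags, true_labels):
    """Precision and recall of user tags against ground-truth labels."""
    user_tags, true_labels = set(user_tags), set(true_labels)
    correct = user_tags & true_labels
    # Precision: fraction of user tags that are correct.
    precision = len(correct) / len(user_tags) if user_tags else 0.0
    # Recall: fraction of true labels that the user actually supplied.
    recall = len(correct) / len(true_labels) if true_labels else 0.0
    return precision, recall


# Hypothetical image: four user tags, four true labels, two in common.
user_tags = ["beach", "sunset", "holiday", "nikon"]
true_labels = ["beach", "sunset", "sea", "sky"]
precision, recall = tag_precision_recall(user_tags, true_labels)
print(precision, recall)  # 0.5 0.5
```

Here two of the four user tags are noise (precision 0.5) and two of the four true labels are missing (recall 0.5), which is the situation the averages above describe.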

To address noisy and missing tags, many methods have been proposed recently. Tag refinement is one potential solution for improving the quality of user-generated tags associated with multimedia data [2], [3], [4], [5]. These works explore and further refine the relevance between tags and images. In [2], tag refinement is formulated as a tag ranking problem, and a probabilistic framework is proposed to rank the associated tags. In [3], content consistency, tag consistency and low-rank properties are considered simultaneously to generate a refined tag set. Tang et al. [4] used a robust graph and employed a semi-supervised learning technique to learn a tag ranking model for tag refinement.

In this paper, we propose an automatic image annotation algorithm that introduces a new latent semantic space to discover the semantic structure hidden in images and their tags. During the latent space construction, we use the $l_{2, 1}$ norm as the regularizer, which has been shown to be more robust to noise. To handle tags that have similar meanings but are treated as different words, a word-to-vector based clustering method is proposed to connect similar tags. In addition, visual features learnt by a deep convolutional neural network are used to compute the visual similarity between images. With these improvements, the proposed image retrieval method is shown to be superior on an extensive dataset.
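The idea of linking tags that have similar meanings can be sketched as follows. This is a minimal illustration, not the paper's clustering procedure: the 3-dimensional vectors are hand-made stand-ins for learned word vectors, and the greedy threshold grouping (the hypothetical `group_tags` helper with an assumed `threshold=0.9`) is a simplified substitute for whatever clustering method the full text specifies.

```python
import numpy as np


def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


def group_tags(vectors, threshold=0.9):
    """Greedily put each tag into the first group whose seed vector is similar enough."""
    groups = []  # list of (seed_vector, [member tags])
    for tag, vec in vectors.items():
        for seed, members in groups:
            if cosine(seed, vec) >= threshold:
                members.append(tag)
                break
        else:
            # No existing group is close enough, so this tag seeds a new one.
            groups.append((vec, [tag]))
    return [members for _, members in groups]


# Toy vectors (assumed values, not real embeddings).
toy_vectors = {
    "sea": np.array([0.9, 0.1, 0.0]),
    "ocean": np.array([0.85, 0.15, 0.05]),
    "car": np.array([0.0, 0.2, 0.95]),
    "auto": np.array([0.05, 0.1, 0.9]),
}
print(group_tags(toy_vectors))
```

With these toy vectors, "sea" and "ocean" fall into one group and "car" and "auto" into another, so tags that would otherwise be treated as unrelated words become connected.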

The rest of this paper is organized as follows. Section 2 reviews the foundation of the proposed algorithm, i.e., the latent semantic indexing method and the low rank based latent semantic discovery means. Section 3 is devoted to presenting the proposed robust latent semantic mining method. The experimental results are provided in Section 4. We conclude this paper in Section 5.

Semantic modeling of multimedia

In this section, we review the existing latent semantic based indexing methods which are the basis of our algorithm.

Robust latent semantic mining

In [7], the latent semantic model given by Eq. (10) was shown to be effective. In this paper, our objective is to make it more robust to noise. The first term of Eq. (10) is a Frobenius norm, which is well known to be sensitive to noise because the squared errors of noisy entries may dominate the term. Since the $l_{2, 1}$ norm has shown better robustness to noise [8], [9], [10], we change the first term from the Frobenius norm to the $l_{2, 1}$ norm: $\min_{H} F(H) = \| A - H \|_{2, 1}^{2} + \lambda \, \mathrm{tr}(H^{T} L H) + \gamma \| H \|_{*},$ where $l_{2, 1}$
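As a rough numerical illustration of the objective above, the following sketch evaluates $F(H)$ with NumPy. It assumes the common convention that the $l_{2, 1}$ norm is the sum of the Euclidean norms of the rows of a matrix (the definition in this excerpt is cut off), and it assumes that $L$ is a graph Laplacian over images. The matrices and the values of `lam` and `gamma` are arbitrary toy choices. The code only evaluates the objective; it is not the paper's optimization procedure.

```python
import numpy as np


def l21_norm(M):
    """Sum of the Euclidean norms of the rows of M (assumed convention)."""
    return float(np.sum(np.linalg.norm(M, axis=1)))


def nuclear_norm(M):
    """Sum of the singular values of M."""
    return float(np.sum(np.linalg.svd(M, compute_uv=False)))


def objective(A, H, L, lam, gamma):
    """F(H) = ||A - H||_{2,1}^2 + lam * tr(H^T L H) + gamma * ||H||_*."""
    fit = l21_norm(A - H) ** 2                    # robust data-fitting term
    smooth = lam * float(np.trace(H.T @ L @ H))   # graph smoothness term
    low_rank = gamma * nuclear_norm(H)            # low-rank term
    return fit + smooth + low_rank


# Toy image-by-tag matrix: 3 images, 2 tags (assumed values).
A = np.array([[1.0, 0.0],
              [1.0, 0.0],
              [0.0, 1.0]])
# Laplacian of a toy graph that links images 0 and 1 only.
L = np.array([[1.0, -1.0, 0.0],
              [-1.0, 1.0, 0.0],
              [0.0, 0.0, 0.0]])

print(objective(A, A, L, lam=0.1, gamma=0.1))
```

Setting `H = A` makes the first two terms vanish in this toy case, because the two linked images have identical tag rows, so the value reflects only the nuclear-norm term.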

Experimental results

In this section, we evaluate the proposed robust latent semantic method and its application to image retrieval. The experiments are conducted on a public dataset containing a large number of images with associated noisy tags. To show the effectiveness of our approach, we use the context-and-content-based multimedia retrieval (C2MR) method proposed in [7] as the baseline, and we call our method Robust C2MR (RC2MR).

Conclusions

This paper has presented a latent semantic based image retrieval method. Because state-of-the-art latent semantic mining methods must work with sparse and noisy tags, we tried to improve the existing methods in two respects. To increase the robustness of the latent semantic modeling, we adopted the $l_{2, 1}$ norm rather than the popular Frobenius norm and presented the optimization method. Then, to deal with the sparse tags associated with each image, we proposed to use a new representation of word tags in a

Acknowledgment

This work is supported by the National Natural Science Foundation of China under Grant Number 61402388.

Liujuan Cao is currently an Assistant Professor at the Department of Computer Science, School of Information Science and Engineering, Xiamen University. Before that, she obtained her Ph.D. degree from Harbin Engineering University. Her research is in the field of multimedia analysis, geo-science and remote sensing, and computer vision. She has published extensively at CVPR, Neurocomputing, Signal Processing, ICIP, VCIP, etc.

References (16)

[1] T.-S. Chua, J. Tang, R. Hong, H. Li, Z. Luo, Y.-T. Zheng, NUS-WIDE: a real-world web image database from National...
[2] D. Liu, X.-S. Hua, L. Yang, M. Wang, H.-J. Zhang, Tag ranking, in: Proceedings of the 18th International Conference on...
[3] G. Zhu, S. Yan, Y. Ma, Image tag refinement towards low-rank, content-tag prior and error sparsity, in: Proceedings of...
[4] J. Tang, S. Yan, R. Hong, G.-J. Qi, T.-S. Chua, Inferring semantic concepts from community-contributed images and noisy...
[5] H. Xu, J. Wang, X.-S. Hua, S. Li, Tag refinement by regularized LDA, in: Proceedings of the 17th ACM International...
[6] E.J. Candes, Y. Plan, Matrix completion with noise, Proc. IEEE 98 (6) (2010)...
[7] G.-J. Qi et al., Exploring context and content links in social media: a latent space method, IEEE Trans. Pattern Anal. Mach. Intell. (2012)
[8] C. Ding, D. Zhou, X. He, H. Zha, R1-PCA: rotational invariant L1-norm principal component analysis for robust...

There are more references available in the full text version of this article.

Fanglin Wang is currently a Research Fellow in School of Computing, National University of Singapore. He received the B.S. and M.S. degrees both from Harbin Institute of Technology, Harbin, China, in 2003 and 2005, respectively, and the Ph.D. degree from Shanghai Jiao Tong University, China, in 2009. He had worked as a Senior Researcher, Software Researcher and Senior Researcher at Sharp Laboratories China, Autodesk China Research and Development, Carestream Inc., respectively, from 2009 to 2012. His research interests include object detection, visual tracking and medical tomographic reconstruction.
