Nonnegative cross-media recoding of visual-auditory content for social media analysis

Zhang, Hong; Xu, Xin

doi:10.1007/s11042-014-1970-x

Published: 28 March 2014

Volume 74, pages 577–593 (2015)

Multimedia Tools and Applications

Abstract

Cross-media semantics understanding, which deals with multimedia data of different modalities, is an increasingly active topic in social media analysis. One of its most challenging issues is how to represent multimedia data of different modalities. Most traditional multimedia semantics analysis methods rely on single-modality data sources, such as Flickr images or YouTube videos, which leaves the problem of efficient cross-media data representation largely open. In this paper, we propose a novel nonnegative cross-media recoding approach that learns co-occurrences of cross-media feature spaces by explicitly learning a common subset of basis vectors. Moreover, we impose a nonnegativity constraint on the decomposed matrices so that the basis vectors represent important and locally meaningful features of the cross-media data. We use two typical kinds of multimedia data, image and audio, as experimental data. Our approach can be applied to a wide range of multimedia applications. Experiments are conducted on an image-audio dataset for cross-media retrieval and data clustering. The experimental results are encouraging and demonstrate the effectiveness of our approach.
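The abstract does not spell out the optimization, so the following is only a rough illustration of the general idea of nonnegative codes shared across modalities, not the authors' algorithm. It assumes one common formulation: co-occurring image and audio feature vectors (column j of `X_img` and of `X_aud`) are factorized jointly with a shared nonnegative coefficient matrix `H`, so that stacking the two bases yields cross-media basis vectors, and all factors are fitted with Lee and Seung's multiplicative updates for the squared Frobenius loss. The function names `joint_nmf`, `encode` and `retrieve_audio`, the rank `k`, the iteration counts and the synthetic data are all hypothetical.

```python
import numpy as np


def joint_nmf(X_img, X_aud, k, n_iter=200, eps=1e-9, seed=0):
    """Fit X_img ~ W_img @ H and X_aud ~ W_aud @ H with W_img, W_aud, H >= 0.

    Assumption: column j of X_img (d_img x n) and of X_aud (d_aud x n) is a
    co-occurring image/audio pair, so both modalities share the code H[:, j].
    """
    if X_img.shape[1] != X_aud.shape[1]:
        raise ValueError("X_img and X_aud need the same number of paired columns")
    if (X_img < 0).any() or (X_aud < 0).any():
        raise ValueError("NMF requires nonnegative inputs")
    rng = np.random.default_rng(seed)
    W_img = rng.random((X_img.shape[0], k))
    W_aud = rng.random((X_aud.shape[0], k))
    H = rng.random((k, X_img.shape[1]))
    for _ in range(n_iter):
        # Lee-Seung multiplicative updates for the squared Frobenius loss;
        # ratios of nonnegative terms keep every factor nonnegative.
        W_img *= (X_img @ H.T) / (W_img @ H @ H.T + eps)
        W_aud *= (X_aud @ H.T) / (W_aud @ H @ H.T + eps)
        # The shared codes are driven by both modalities' reconstruction errors.
        H *= (W_img.T @ X_img + W_aud.T @ X_aud) / (
            W_img.T @ W_img @ H + W_aud.T @ W_aud @ H + eps
        )
    return W_img, W_aud, H


def encode(X, W, n_iter=200, eps=1e-9):
    """Nonnegative codes for the columns of X with the basis W held fixed."""
    H = np.ones((W.shape[1], X.shape[1]))
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
    return H


def retrieve_audio(x_img, W_img, H_aud_db, top=5, eps=1e-9):
    """Rank audio items by cosine similarity to an image query in code space."""
    h = encode(x_img.reshape(-1, 1), W_img)[:, 0]
    norms = np.linalg.norm(H_aud_db, axis=0) * np.linalg.norm(h)
    sims = (H_aud_db.T @ h) / (norms + eps)
    return np.argsort(-sims)[:top]


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    H_true = rng.random((3, 40))           # hypothetical latent co-occurrence codes
    X_img = rng.random((20, 3)) @ H_true   # synthetic 20-dim "image" features
    X_aud = rng.random((12, 3)) @ H_true   # synthetic 12-dim "audio" features
    W_img, W_aud, H = joint_nmf(X_img, X_aud, k=3, n_iter=500)
    H_aud_db = encode(X_aud, W_aud)        # index the audio side with its own basis
    print("top audio matches for image 0:", retrieve_audio(X_img[:, 0], W_img, H_aud_db))
    print("cluster labels:", H.argmax(axis=0)[:10])  # argmax-of-code heuristic
```

Because the shared `H` is the only coupling between the two factorizations, a query from one modality can be encoded with that modality's basis alone and compared against codes from the other modality, which mirrors the cross-media retrieval setting described above. Taking `H.argmax(axis=0)` as a cluster label is a common NMF heuristic, used here only to hint at the clustering application.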

Similar content being viewed by others

Multiple kernel visual-auditory representation learning for retrieval

Article 23 February 2016

Latent semantic factorization for multimedia representation learning

Article 30 August 2017

A cross-media distance metric learning framework based on multi-view correlation mining and matching

Article 21 April 2015

Acknowledgments

This research is supported by the National Natural Science Foundation of China (Nos. 61003127 and 61373109) and the State Key Laboratory of Software Engineering (SKLSE2012-09-31).

Author information

Authors and Affiliations

College of Computer Science & Technology, Wuhan University of Science & Technology, Wuhan, 430081, China
Hong Zhang & Xin Xu
Hubei Province Key Laboratory of Intelligent Information Processing and Real-time Industrial System, Wuhan, China
Hong Zhang
State Key Laboratory of Software Engineering, Wuhan University, Wuhan, 430072, China
Hong Zhang

Corresponding author

Correspondence to Hong Zhang.

About this article

Cite this article

Zhang, H., Xu, X. Nonnegative cross-media recoding of visual-auditory content for social media analysis. Multimed Tools Appl 74, 577–593 (2015). https://doi.org/10.1007/s11042-014-1970-x

Published: 28 March 2014
Issue Date: January 2015
DOI: https://doi.org/10.1007/s11042-014-1970-x
