Abstract
Cross-media semantic understanding, which deals with multimedia data of different modalities, is an increasingly active topic in social media analysis. One of its most challenging problems is how to represent multimedia data of different modalities. Most traditional work on multimedia semantic analysis relies on single-modality data sources, such as Flickr images or YouTube videos, leaving efficient cross-media data representation an open problem. In this paper, we propose a novel nonnegative cross-media recoding approach, which learns co-occurrences across cross-media feature spaces by explicitly learning a common subset of basis vectors. Moreover, we impose a nonnegativity constraint on the decomposed matrices so that the basis vectors represent important and locally meaningful features of the cross-media data. We use two typical kinds of multimedia data, image and audio, as experimental data, although the approach can be applied to a wide range of multimedia applications. Experiments are conducted on an image-audio dataset for cross-media retrieval and data clustering. The experimental results are encouraging and show that our approach is effective.
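To make the idea of a nonnegative factorization shared across modalities concrete, the following is a minimal sketch and not the algorithm of the paper, whose exact objective is not given in the abstract. It assumes paired image and audio feature matrices `X` and `Y` (row i of each describes the same item) and factors both with one shared nonnegative code matrix `W`, using Lee-Seung style multiplicative updates. The function name `joint_nmf`, the fully shared `W` (rather than a partially shared subset of basis vectors), the squared-error objective, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np


def joint_nmf(X: np.ndarray, Y: np.ndarray, k: int,
              n_iter: int = 200, eps: float = 1e-9, seed: int = 0):
    """Factor X ~ W @ Hx and Y ~ W @ Hy with one shared nonnegative W.

    X: (n, d_img) nonnegative image features.
    Y: (n, d_aud) nonnegative audio features.
    Row i of X and row i of Y are assumed to describe the same item.
    Returns W, Hx, Hy and the squared reconstruction error per iteration.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Strictly positive random initialisation (hypothetical choice).
    W = rng.random((n, k)) + eps
    Hx = rng.random((k, X.shape[1])) + eps
    Hy = rng.random((k, Y.shape[1])) + eps
    losses = []
    for _ in range(n_iter):
        # Multiplicative updates only rescale entries by nonnegative ratios,
        # so all three factors stay nonnegative throughout.
        Hx *= (W.T @ X) / (W.T @ W @ Hx + eps)
        Hy *= (W.T @ Y) / (W.T @ W @ Hy + eps)
        W *= (X @ Hx.T + Y @ Hy.T) / (W @ (Hx @ Hx.T + Hy @ Hy.T) + eps)
        loss = (np.linalg.norm(X - W @ Hx) ** 2
                + np.linalg.norm(Y - W @ Hy) ** 2)
        losses.append(loss)
    return W, Hx, Hy, losses


# Synthetic paired data generated from a common nonnegative code matrix.
rng = np.random.default_rng(1)
W_true = rng.random((30, 4))
X = W_true @ rng.random((4, 12))   # stand-in for image features
Y = W_true @ rng.random((4, 8))    # stand-in for audio features

W, Hx, Hy, losses = joint_nmf(X, Y, k=4)
```

Under these assumptions, each row of `W` is a modality-independent code for one image-audio pair, so codes could in principle be compared (for example by cosine similarity) for retrieval or passed to a clustering method. A partially shared variant would split the basis into common and modality-specific parts.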
Acknowledgments
This research is supported by the National Natural Science Foundation of China (No. 61003127, No. 61373109) and the State Key Laboratory of Software Engineering (SKLSE2012-09-31).
Cite this article
Zhang, H., Xu, X. Nonnegative cross-media recoding of visual-auditory content for social media analysis. Multimed Tools Appl 74, 577–593 (2015). https://doi.org/10.1007/s11042-014-1970-x
DOI: https://doi.org/10.1007/s11042-014-1970-x