Nonnegative cross-media recoding of visual-auditory content for social media analysis

Multimedia Tools and Applications

Abstract

Cross-media semantics understanding, which deals with multimedia data of different modalities, is an emerging topic in social media analysis. One of its most challenging issues is how to represent data from different modalities. Most traditional multimedia semantics analysis work relies on single-modality data sources, such as Flickr images or YouTube videos, leaving the problem of efficient cross-media data representation largely open. In this paper, we propose a novel nonnegative cross-media recoding approach, which captures co-occurrences across cross-media feature spaces by explicitly learning a common subset of basis vectors. Moreover, we impose nonnegativity constraints on the decomposed matrices so that the basis vectors represent important, locally meaningful features of the cross-media data. We use two typical kinds of multimedia data, namely images and audio, as experimental data, although the approach can be applied to a wide range of multimedia applications. Experiments on an image-audio dataset for cross-media retrieval and data clustering are encouraging and demonstrate the effectiveness of our approach.
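The abstract describes the recoding only at a high level. As an illustration, the following NumPy sketch implements a generic nonnegative shared-subspace factorization in the spirit of nonnegative shared subspace learning (Gupta et al.); it is not the authors' exact formulation. It assumes that both modalities are given as nonnegative feature matrices with the same number of rows (columns are samples), a squared Frobenius loss, and Lee-Seung style multiplicative updates. Cross-media ranking by cosine similarity of shared-subspace codes is likewise an assumption. The names `shared_nmf`, `cross_media_rank`, `k_shared`, `k1` and `k2` are hypothetical.

```python
import numpy as np

EPS = 1e-10  # keeps the multiplicative updates away from division by zero


def shared_nmf(X1, X2, k_shared, k1, k2, n_iter=200, seed=0):
    """Hypothetical joint NMF with a shared basis W and private bases U1, U2.

    Approximately minimises
        ||X1 - [W U1] H1||_F^2 + ||X2 - [W U2] H2||_F^2
    with all factors nonnegative, using Lee-Seung style multiplicative updates.
    Assumes X1 (m x n1) and X2 (m x n2) are nonnegative, columns are samples,
    and both modalities share the same row (feature) dimension m.
    """
    rng = np.random.default_rng(seed)
    m = X1.shape[0]
    if X2.shape[0] != m:
        raise ValueError("both modalities must have the same feature dimension")
    W = rng.random((m, k_shared))                  # shared basis vectors
    U1 = rng.random((m, k1))                       # modality-1-only basis vectors
    U2 = rng.random((m, k2))                       # modality-2-only basis vectors
    H1 = rng.random((k_shared + k1, X1.shape[1]))  # codes for modality 1
    H2 = rng.random((k_shared + k2, X2.shape[1]))  # codes for modality 2

    for _ in range(n_iter):
        B1, B2 = np.hstack([W, U1]), np.hstack([W, U2])
        # Coefficient updates: the standard NMF rule for a fixed basis.
        H1 *= (B1.T @ X1) / (B1.T @ B1 @ H1 + EPS)
        H2 *= (B2.T @ X2) / (B2.T @ B2 @ H2 + EPS)

        R1, R2 = B1 @ H1, B2 @ H2                  # current reconstructions
        H1w, H1u = H1[:k_shared], H1[k_shared:]    # rows tied to W / to U1
        H2w, H2u = H2[:k_shared], H2[k_shared:]    # rows tied to W / to U2
        # The shared basis collects gradient terms from both modalities...
        W *= (X1 @ H1w.T + X2 @ H2w.T) / (R1 @ H1w.T + R2 @ H2w.T + EPS)
        # ...while each private basis only sees its own modality.
        U1 *= (X1 @ H1u.T) / (R1 @ H1u.T + EPS)
        U2 *= (X2 @ H2u.T) / (R2 @ H2u.T + EPS)
    return W, U1, U2, H1, H2


def cross_media_rank(H1, H2, k_shared, query_idx):
    """Rank modality-2 samples against one modality-1 query sample by cosine
    similarity of their coefficients on the shared basis W only."""
    q = H1[:k_shared, query_idx]
    G = H2[:k_shared]                              # one shared code per column
    sims = (q @ G) / (np.linalg.norm(q) * np.linalg.norm(G, axis=0) + EPS)
    return np.argsort(-sims)                       # best match first
```

Splitting the basis into shared and private parts lets modality-specific structure stay out of the common subspace, so retrieval (or clustering) can operate on the shared codes alone. The number of shared versus private basis vectors is a modelling choice this sketch leaves to the caller.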

Figs. 1-7 (images not reproduced)

Acknowledgments

This research is supported by the National Natural Science Foundation of China (No. 61003127, No. 61373109) and the State Key Laboratory of Software Engineering (SKLSE2012-09-31).

Author information

Correspondence to Hong Zhang.

About this article

Cite this article

Zhang, H., Xu, X. Nonnegative cross-media recoding of visual-auditory content for social media analysis. Multimed Tools Appl 74, 577–593 (2015). https://doi.org/10.1007/s11042-014-1970-x

  • DOI: https://doi.org/10.1007/s11042-014-1970-x
