Abstract
In this paper, we introduce an effective information fusion method based on multimodal hashing with discriminant canonical correlation maximization. As an efficient way of computing similarity between different inputs, multimodal hashing has attracted increasing attention for fast similarity search. The proposed approach not only preserves the semantic similarity across different modalities through multimodal hashing, but also extracts discriminant representations that simultaneously minimize the between-class correlation and maximize the within-class correlation for information fusion. Benefiting from the combination of cross-modal semantic similarity and the discriminant representation strategy, the proposed algorithm achieves improved performance. A prototype of the proposed method is implemented to demonstrate its performance on audio emotion recognition and cross-modal (text-image) fusion. Experimental results show that the proposed approach outperforms related methods in terms of accuracy.
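To make the discriminant objective concrete, the following is a minimal illustrative sketch (not the authors' implementation) of one common formulation: learn linear projections for two modalities that maximize within-class cross-modal correlation while minimizing between-class correlation, then sign-threshold the projected features into binary hash codes. The function names (`dcc_projections`, `binarize`) and the specific solver (SVD of a signed cross-correlation matrix) are assumptions for illustration only.

```python
import numpy as np

def dcc_projections(X, Y, labels, dim=2):
    """Illustrative discriminant canonical correlation projections.

    X: (n, dx) samples of modality 1; Y: (n, dy) samples of modality 2;
    labels: length-n class labels aligned across both modalities.
    """
    # Center each modality
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    # Signed pair matrix: +1 for same-class pairs, -1 for different-class pairs
    L = np.asarray(labels)
    A = np.where(L[:, None] == L[None, :], 1.0, -1.0)
    # C = X^T A Y aggregates within-class cross-correlation
    # minus between-class cross-correlation
    C = X.T @ A @ Y
    # Leading singular vector pairs maximize the discriminant objective
    U, _, Vt = np.linalg.svd(C)
    return U[:, :dim], Vt[:dim].T

def binarize(Z):
    # Sign thresholding yields binary hash codes for fast Hamming-space search
    return (Z >= 0).astype(np.uint8)
```

A typical use would project each modality with its learned matrix, binarize both, and compare codes by Hamming distance, so that same-class samples from different modalities tend to receive similar codes.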
Copyright information
© 2019 Springer Nature Switzerland AG
Cite this paper
Gao, L., Guan, L. (2019). Information Fusion via Multimodal Hashing with Discriminant Canonical Correlation Maximization. In: Karray, F., Campilho, A., Yu, A. (eds) Image Analysis and Recognition. ICIAR 2019. Lecture Notes in Computer Science, vol 11663. Springer, Cham. https://doi.org/10.1007/978-3-030-27272-2_7
DOI: https://doi.org/10.1007/978-3-030-27272-2_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-27271-5
Online ISBN: 978-3-030-27272-2