Indoor scene recognition via multi-task metric multi-kernel learning from RGB-D images

Zheng, Yu; Gao, Xinbo

doi:10.1007/s11042-016-3423-1

Published: 12 March 2016

Volume 76, pages 4427–4443 (2017)

Multimedia Tools and Applications

Abstract

Traditional scene analysis focuses mainly on outdoor scene recognition rather than indoor scene understanding. The widespread use of depth cameras, however, offers a new opportunity to address indoor scene recognition. In this paper, we propose a multi-task metric multi-kernel learning algorithm that exploits the inter-source similarities and complementarities between color images and depth images for indoor scene recognition. Specifically, our method uses multi-task metric learning to learn Mahalanobis metrics for RGB-D images. Multi-task metric learning extracts the properties common to color images and depth images in order to learn better metrics. The learned metrics are then used to transform the features into a corrected feature space that yields a better representation. By exploiting multi-kernel learning, our method leverages multiple feature representations to train a more discriminative classifier. We conduct experiments on the NYU Depth Dataset and the B3DO Dataset to evaluate the effectiveness of our approach. The experimental results demonstrate that the proposed method improves indoor scene recognition.
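The two building blocks named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and it makes several assumptions: each modality's Mahalanobis metric is given as a fixed linear map L (so that M = LᵀL) instead of being learned jointly, the per-modality kernels are RBF kernels, and the kernel weights are fixed by hand instead of being learned by a multi-kernel solver. All names and toy values are hypothetical.

```python
import math

def matvec(L, x):
    """Apply the linear map L (a list of rows) to the vector x."""
    return [sum(l_ij * x_j for l_ij, x_j in zip(row, x)) for row in L]

def mahalanobis_sq(L, x, y):
    """Squared Mahalanobis distance under M = L^T L, i.e. ||L(x - y)||^2."""
    diff = [a - b for a, b in zip(x, y)]
    return sum(v * v for v in matvec(L, diff))

def rbf_kernel(L, x, y, gamma=1.0):
    """RBF kernel evaluated in the metric-transformed feature space."""
    return math.exp(-gamma * mahalanobis_sq(L, x, y))

def combined_kernel(pairs, weights):
    """Convex combination of per-modality kernel values.

    pairs: one (L, x, y) triple per modality (here: color and depth).
    weights: non-negative kernel weights summing to 1 (assumed fixed here).
    """
    assert all(w >= 0 for w in weights) and abs(sum(weights) - 1.0) < 1e-9
    return sum(w * rbf_kernel(L, x, y) for w, (L, x, y) in zip(weights, pairs))

# Hypothetical toy values, not taken from the paper.
L_rgb = [[1.0, 0.0], [0.0, 2.0]]      # assumed metric transform for color features
L_depth = [[0.5, 0.0], [0.0, 0.5]]    # assumed metric transform for depth features
rgb_a, rgb_b = [0.0, 0.0], [1.0, 1.0]
depth_a, depth_b = [0.0, 0.0], [2.0, 0.0]

k = combined_kernel(
    [(L_rgb, rgb_a, rgb_b), (L_depth, depth_a, depth_b)],
    [0.5, 0.5],
)
```

Transforming features by L and then using the Euclidean distance is equivalent to using the Mahalanobis distance with M = LᵀL, which is why a learned metric can be read as a mapping into a new feature space. The combined kernel value could then be passed to any kernel classifier, such as an SVM with a precomputed kernel.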

Acknowledgments

This work was supported in part by the National Natural Science Foundation of China under Grant 61432014, in part by the Program for Changjiang Scholars and Innovative Research Team in University of China under Grant IRT13088 and in part by the Shaanxi Innovative Research Team for Key Science and Technology under Grant 2012KCT-02.

Author information

Authors and Affiliations

School of Electronic Engineering, Xidian University, Xi’an, 710071, China
Yu Zheng & Xinbo Gao

Corresponding author

Correspondence to Xinbo Gao.

About this article

Cite this article

Zheng, Y., Gao, X. Indoor scene recognition via multi-task metric multi-kernel learning from RGB-D images. Multimed Tools Appl 76, 4427–4443 (2017). https://doi.org/10.1007/s11042-016-3423-1

Received: 31 October 2015
Revised: 03 February 2016
Accepted: 01 March 2016
Published: 12 March 2016
Issue Date: February 2017
DOI: https://doi.org/10.1007/s11042-016-3423-1
