Deep point-to-subspace metric learning for sketch-based 3D shape retrieval
Introduction
With the rapid development of 3D sensing techniques, 3D shape data has received increasing research interest in the field of computer vision. As the volume of 3D shape data grows significantly, shape retrieval has become a crucial problem for 3D shape data management [1], [2], [3], [4], [5], [6]. In the early years, a keyword was first assigned to each 3D shape and then used as the query for retrieval [7], [8]. However, keyword labeling is a time-consuming process and is impractical for real-world applications, especially when dealing with large-scale datasets. Subsequently, considerable research was devoted to content-based 3D shape retrieval techniques that use a 3D shape itself as the query. However, acquiring a query shape is difficult due to the nature of the 3D modality. Recently, the prevalence of touchscreen devices (e.g., smartphones and tablet computers) has made the hand-drawn sketch a more convenient way of expressing the user's intention. Compared with using a keyword or a 3D shape as the query, sketch-based 3D shape retrieval is more straightforward and thus easier to implement in practical applications [9], [10], [11], [12].
Hand-drawn sketches usually contain limited information and only reflect certain views of 3D shapes. As a result, obtaining discriminative 3D shape features that reduce the cross-modality discrepancy with sketches becomes a key issue. To extract 3D shape features, different 3D shape representations have been proposed. Recently, point-cloud based [13], [14], [15], [16] and multi-view based [17], [18], [19], [20] representations have gradually become the dominant choices. In particular, the multi-view based representations have achieved state-of-the-art performance so far [17], [18], [19], [20]. For this type of representation, the 3D shape is first rendered into a family of 2D views, as shown in Fig. 1. On top of that, one can then leverage well-established 2D image deep models (e.g., AlexNet [21], VGG [22] and ResNet [23]), pre-trained on large-scale datasets (e.g., ImageNet [24]), for feature extraction.
Despite the promising prospects of sketch-based 3D shape retrieval, three major challenges still hinder its development. First, free-hand sketch drawing is a subjective activity, resulting in large variation among different individuals. Second, sketches and 3D shapes exhibit a large cross-modality discrepancy, which makes it difficult to obtain modality-independent features. Third, a sketch usually reflects only a certain view of a 3D shape, and the visual appearance of different views may vary significantly. To handle these problems, existing methods can be coarsely categorized into traditional descriptor based [2], [25] and deep-learned descriptor based [26], [27] approaches. Methods of the first kind commonly apply hand-crafted or shallow-learned features to describe both sketches and 3D shapes for similarity measurement. Nevertheless, it is difficult to design discriminative feature descriptors applicable to both sketches and 3D shapes because of the large cross-modality discrepancy [11]. In contrast, methods of the second kind, which are based on deep-learned features, are considered more robust and more discriminative. They can better accommodate the cross-modality discrepancy and attain improved retrieval accuracy.
As mentioned above, a query sketch is only representative of some views of a 3D shape, and the unrepresentative views contribute little to retrieval or are even harmful. However, many existing methods [20], [28], [29], [30] treat all views equally without considering the viewpoint information. To resolve this problem, we propose a Deep Point-to-Subspace Metric Learning (DPSML) framework. First, a Representative-View Selection (RVS) module is applied to obtain the most representative views of a 3D shape, and a subspace spanned by the feature vectors of the selected views is generated to describe the shape. The similarity between a sketch and a 3D shape is then defined as the distance between the sketch feature vector and its closest point in the spanned subspace, obtained by solving an optimization problem on the fly. Note that the closest point is query-adaptive and can reflect the viewpoint information captured by the query sketch. Moreover, to learn the deep model efficiently, we formulate the representation learning problem as a classification problem, avoiding the pairwise sample learning process used by many existing methods [29], [31]. In summary, the proposed DPSML is an end-to-end framework, and its effectiveness and robustness are extensively demonstrated by a set of experiments on three widely used benchmark datasets, i.e., SHREC 2013, 2014 and 2016.
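The point-to-subspace distance underlying this idea can be illustrated with a small least-squares sketch. This is a simplified illustration, not the paper's exact formulation: the unconstrained linear combination and the toy feature dimensions below are assumptions made for clarity.

```python
import numpy as np

def point_to_subspace_distance(sketch_feat, view_feats):
    """Distance from a sketch feature vector to the subspace
    spanned by the columns of view_feats (one column per view)."""
    # Solve min_w || sketch_feat - view_feats @ w ||_2 by least squares.
    w, *_ = np.linalg.lstsq(view_feats, sketch_feat, rcond=None)
    closest = view_feats @ w  # the query-adaptive closest point
    return np.linalg.norm(sketch_feat - closest)

# Toy example: a subspace spanned by the first two axes of R^3.
V = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.0, 0.0]])
s = np.array([1.0, 2.0, 3.0])
d = point_to_subspace_distance(s, V)  # only the third component lies off-subspace
```

The recovered weights `w` indicate which view features the sketch resembles most, which is how the closest point can encode viewpoint information.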
The rest of this paper is organized as follows. Section 2 reviews the related work most relevant to the proposed method and gives a method overview. Section 3 presents a detailed explanation of the proposed framework. Section 4 provides the details of the benchmark datasets, evaluation metrics and implementation. The experimental results, comparisons with the state of the art and a discussion are provided in Section 5. Finally, Section 6 concludes this work.
Related works
The works in [12], [32] provide a comprehensive survey and comparison of sketch-based 3D shape retrieval methods. In the following, we restrict the review to the representative methods closely related to this work. More specifically, we cover the traditional sketch-based 3D shape retrieval methods (e.g., hand-crafted or shallow-learned features) and the deep-learned descriptors for 3D shape retrieval in Sections 2.1.1 and 2.1.2, respectively.
Methodology
As shown in Fig. 2, the proposed framework mainly contains three modules. First, the feature extraction module is described in Section 3.1. Then, the details of the proposed RVS module are given in Section 3.2. Last, the DPSML framework is detailed in Section 3.3.
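To give a flavor of what selecting a diverse subset of view features can look like, here is a greedy farthest-point selection sketch. The concrete selection criterion, the function name, and the starting view are assumptions for illustration only; they are not necessarily the paper's actual RVS algorithm.

```python
import numpy as np

def select_representative_views(view_feats, k):
    """Greedily pick k mutually dissimilar view feature vectors.
    view_feats: (n_views, d) array; returns indices of selected views."""
    selected = [0]  # start from an arbitrary view (assumption)
    while len(selected) < k:
        # For every view, distance to its closest already-selected view.
        dists = np.min(
            np.linalg.norm(view_feats[:, None, :] -
                           view_feats[None, selected, :], axis=-1),
            axis=1)
        # The view farthest from all selected ones is the least redundant.
        selected.append(int(np.argmax(dists)))
    return selected
```

Near-duplicate views end up close to an already-selected view and are never picked, which mirrors the goal of reducing redundancy among the rendered views.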
Experimental setups
To demonstrate the effectiveness of the proposed method, we evaluate it on three public benchmark datasets, i.e., SHREC 2013 [12], [33], SHREC 2014 [32], [46] and SHREC 2016 [47]. We first introduce the experimental setups, including the details of the benchmark datasets and the evaluation metrics used. Next, we present the implementation details of our framework. Then, we compute all the metrics to investigate the performance and compare our results against the state of the art.
Evaluation on the SHREC 2013 dataset
Our proposed method is based on efficient point-to-subspace learning. To further improve retrieval accuracy, a modified center learning method is used as part of the loss function. To demonstrate the effectiveness of the RVS module, we compare the performance of the proposed method with different fusion operations, i.e., average pooling and FC-layer based features. We also report the results with and without the "center learning" method described in Section 3.3.2.
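The "center learning" term resembles the classic center loss, which pulls each feature toward its class center. The minimal version below is the standard formulation (the paper uses a modified variant, so treat this as an assumed baseline rather than the paper's exact loss):

```python
import numpy as np

def center_loss(features, labels, centers):
    """Classic center loss: mean squared distance of each feature
    to the center of its class.
    features: (n, d), labels: (n,) int, centers: (num_classes, d)."""
    diffs = features - centers[labels]
    return 0.5 * np.mean(np.sum(diffs ** 2, axis=1))
```

In training, such a term is typically added to the classification loss with a small weight, and the centers are updated alongside the network parameters.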
Conclusions
In this paper, we propose a novel DPSML framework for sketch-based 3D shape retrieval. First, the raw features of both sketches and 3D shapes (represented by 12 rendered views) are extracted via pre-trained deep models (AlexNet, VGG and ResNet). Second, an RVS module is introduced to reduce the redundancy of the rendered views and yields a set of the most representative views. Then, the sketch is projected into a feature point and the 3D shape is projected into a subspace spanned by the feature vectors of the selected views, and their similarity is measured as the distance from the sketch feature point to that subspace.
Acknowledgment
This work was supported in part by the National Natural Science Foundation of China (NSFC) with Grant Nos. 61403265, 61602499. This work was partially supported by the Key Research and Development Program of Sichuan Province (No. 2019YFG0409). Lingqiao Liu was in part supported by ARC DECRA Fellowship DE170101259. This work was also partially supported by the Fundamental Research Funds for the Central Universities (No. 18lgzd06).
References (53)
- et al., A new 3D model retrieval approach based on the elevation descriptor, Pattern Recognit. (2007)
- et al., Euclidean-distance-based canonical forms for non-rigid 3D shape retrieval, Pattern Recognit. (2015)
- et al., Monocular 3D facial shape reconstruction from a single 2D image with coupled-dictionary learning and sparse coding, Pattern Recognit. (2018)
- et al., Recognition of feature curves on 3D shapes using an algebraic approach to Hough transforms, Pattern Recognit. (2018)
- et al., Nasal similarity measure of 3D faces based on curve shape space, Pattern Recognit. (2019)
- et al., Three-dimensional shape searching: state-of-the-art review and future trends, Comput.-Aided Des. (2005)
- et al., A comparison of methods for sketch-based 3D shape retrieval, Comput. Vision Image Understanding (2014)
- et al., Orthogonal moment-based descriptors for pose shape query on 3D point cloud patches, Pattern Recognit. (2016)
- et al., A comparison of 3D shape retrieval methods based on a large-scale benchmark supporting multimodal queries, Comput. Vision Image Understanding (2015)
- et al., Learning shape retrieval from different modalities, Neurocomputing (2017)
- The Princeton shape benchmark, Proceedings of the Shape Modeling Applications
- A survey of content based 3D shape retrieval methods, Proceedings of Shape Modeling Applications
- Sketch-based shape retrieval, ACM Trans. Graphics (TOG)
- Learning semantic signatures for 3D object retrieval, IEEE Trans. Multimedia
- Ranking on cross-domain manifold for sketch-based 3D model retrieval, Proceedings of the Cyberworlds (CW)
- 3D ShapeNets: a deep representation for volumetric shapes, Proceedings of the Computer Vision and Pattern Recognition (CVPR)
- PointNet: a 3D convolutional neural network for real-time object class recognition, Proceedings of the International Joint Conference on Neural Networks (IJCNN)
- Learning barycentric representations of 3D shapes for sketch-based 3D shape retrieval, Proceedings of the Computer Vision and Pattern Recognition (CVPR)
- RotationNet: joint object categorization and pose estimation using multiviews from unsupervised viewpoints, Proceedings of the Computer Vision and Pattern Recognition (CVPR)
- GVCNN: group-view convolutional neural networks for 3D shape recognition, Proceedings of the Computer Vision and Pattern Recognition (CVPR)
- PVNet: a joint convolutional network of point cloud and multi-view for 3D shape recognition, Proceedings of the ACM International Conference on Multimedia
- ImageNet classification with deep convolutional neural networks, Proceedings of the Neural Information Processing Systems (NeurIPS)
- Deep residual learning for image recognition, Proceedings of the Computer Vision and Pattern Recognition (CVPR)
- ImageNet: a large-scale hierarchical image database, Proceedings of the Computer Vision and Pattern Recognition (CVPR)
Cited by (36)
- Structure correspondence searching of CAD model using local feature-based description and indexing, 2024, Pattern Recognition
- Sketch-based 3D shape retrieval via teacher–student learning, 2024, Computer Vision and Image Understanding
- Expansion window local alignment weighted network for fine-grained sketch-based image retrieval, 2023, Pattern Recognition
- Improving point cloud classification and segmentation via parametric Veronese mapping, 2023, Pattern Recognition
- HDA²L: Hierarchical Domain-Augmented Adaptive Learning for sketch-based 3D shape retrieval, 2023, Knowledge-Based Systems
- JFLN: Joint Feature Learning Network for 2D sketch based 3D shape retrieval, 2022, Journal of Visual Communication and Image Representation

Citation excerpt: "Additionally, a cross-modal similarity model is employed for feature matching between different modalities, which effectively improves the cross-modal retrieval accuracy. Lei et al. [37] proposed a DPSML framework to project the sketch features into points, while the shape descriptors are mapped into a subspace. The similarity of cross-modal samples is defined as the distance between points and subspace."
Yinjie Lei received his M.S. degree from Sichuan University (SCU), China, in the area of Image Processing in 2009, and the Ph.D. degree in Computer Vision from the University of Western Australia (UWA), Australia, in 2013. He is currently an associate professor with the College of Electronics and Information Engineering at SCU, where he has served as vice dean since 2017. His research interests mainly include deep learning, 3D biometrics, object recognition and semantic segmentation.
Ziqin Zhou received her bachelor’s degree from Sichuan University (SCU), China, in 2017. She is currently pursuing the M.S. degree with the Electronics and Information Engineering at SCU. Her current research interests include 3D shape analysis and neural architecture search.
Pingping Zhang received his B.E. degree in mathematics and applied mathematics, Henan Normal University (HNU), Xinxiang, China, in 2012. He is currently a Ph.D. candidate in the School of Information and Communication Engineering, Dalian University of Technology (DUT), Dalian, China. His research interests include deep learning, saliency detection, object tracking and semantic segmentation.
Yulan Guo received the B.Eng. and Ph.D. degrees from National University of Defense Technology (NUDT) in 2008 and 2015, respectively. He was a visiting Ph.D. student with the University of Western Australia from 2011 to 2014. He is currently an Assistant Professor with the College of Electronic Science, NUDT. He has authored over 60 articles in journals and conferences, such as the IEEE TPAMI and IJCV. His current research interests focus on 3D vision, particularly on 3D feature learning, 3D modeling, 3D object recognition, and 3D biometrics. Dr. Guo received the NUDT Outstanding Doctoral Dissertation Award in 2015 and the CAAI Outstanding Doctoral Dissertation Award in 2016. He served as an associate editor for IET Computer Vision, a guest editor for IEEE TPAMI, a PC member for several international conferences (e.g., ACM MM, IJCAI, AAAI), and a reviewer for over 30 international journals and conferences.
Zijun Ma received her bachelor’s degree from Sichuan University (SCU), China in 2018. She is currently pursuing the M.S. degree with the Electronics and Information Engineering at SCU. Her current research interests include 3D shape analysis and deep model compression.
- 1
The second author contributed equally to this work with the first author.