
Pattern Recognition

Volume 96, December 2019, 106981

Deep point-to-subspace metric learning for sketch-based 3D shape retrieval

https://doi.org/10.1016/j.patcog.2019.106981

Highlights

  • A representative-view selection (RVS) module is designed to identify the most representative views of a 3D shape and reduce redundancy.

  • A deep point-to-subspace metric learning (DPSML) module is proposed to calculate the query-adaptive similarity for sketch-based 3D shape retrieval.

  • The representation learning problem is formulated as a classification problem with a specially designed classifier and training loss.

  • State-of-the-art performance is achieved on the SHREC 2013, 2014 and 2016 benchmarks.

Abstract

One key issue in managing a large-scale 3D shape dataset is to identify an effective way to retrieve a shape of interest. The sketch-based query, which enjoys flexibility in representing the user’s intention, has received growing interest in recent years due to the popularization of touchscreen technology. Essentially, the sketch depicts an abstraction of a shape in a certain view, while the shape contains the full 3D information. Matching between them is a cross-modality retrieval problem, and the state-of-the-art solution is to project the sketch and the 3D shape into a common space in which the cross-modality similarity can be calculated as a feature similarity/distance. However, for a given query, only some viewpoints of the 3D shape are representative. Thus, blindly projecting a 3D shape into a feature vector without considering the query will inevitably introduce query-unrepresentative information. To handle this issue, in this work we propose a Deep Point-to-Subspace Metric Learning (DPSML) framework that projects a sketch into a feature vector and a 3D shape into a subspace spanned by a few selected basis feature vectors. The similarity between them is defined as the distance between the query feature vector and its closest point in the subspace, obtained by solving an optimization problem on the fly. Note that the closest point is query-adaptive and can reflect the viewpoint information that is representative of the given query. To efficiently learn such a deep model, we formulate it as a classification problem with a special classifier design. To reduce the redundancy of 3D shapes, we also introduce a Representative-View Selection (RVS) module that selects the most representative views of a 3D shape. By conducting extensive experiments on various datasets, we show that the proposed method outperforms its competitive baseline methods and attains state-of-the-art performance.

Introduction

With the rapid development of 3D sensing techniques, 3D shape data has received increasing research interest in the field of computer vision. As the volume of 3D shape data grows significantly, shape retrieval has become a crucial problem for 3D shape data management [1], [2], [3], [4], [5], [6]. In early work, each 3D shape was first labeled with a keyword, which was then used as the query for retrieval [7], [8]. However, keyword labeling is a time-consuming process and is also impractical for real-world applications, especially when dealing with large-scale datasets. Subsequently, considerable research was devoted to content-based 3D shape retrieval techniques that use a 3D shape as the query. However, acquiring the query shape itself is difficult due to the nature of the 3D modality. Recently, the prevalence of touchscreen devices (e.g., smartphones and tablet computers) has made the hand-drawn sketch a convenient way of representing the user’s intention. Compared with using a keyword or a 3D shape as the query, sketch-based 3D shape retrieval is more straightforward and thus easier to implement in practical applications [9], [10], [11], [12].

Hand-drawn sketches usually contain limited information and only reflect certain views of 3D shapes. As a result, obtaining discriminative 3D shape features that reduce the cross-modality discrepancy with sketches becomes a key issue. To extract 3D shape features, different 3D shape representations have been proposed. Recently, point-cloud based [13], [14], [15], [16] and multi-view based [17], [18], [19], [20] representations have gradually become the dominant choices. In particular, the multi-view based representations have achieved state-of-the-art performance so far [17], [18], [19], [20]. For this type of representation, the 3D shape is first rendered into a family of 2D views, as shown in Fig. 1. On top of that, one can then leverage well-established 2D image deep models (e.g., AlexNet [21], VGG [22] and ResNet [23]), pre-trained on large-scale datasets (e.g., ImageNet [24]), for feature extraction.
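As a concrete illustration of this multi-view pipeline, the sketch below extracts one feature vector per rendered view with a pre-trained 2D backbone. The backbone choice (ResNet-18), the number of views and the tensor shapes are assumptions made for the example, not the exact configuration used in the paper.

```python
import torch
import torchvision.models as models

# Assume the rendered views of one 3D shape arrive as a batch of
# ImageNet-normalized images of shape (num_views, 3, 224, 224).
backbone = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
backbone.fc = torch.nn.Identity()  # drop the classifier; keep the 512-d pooled feature
backbone.eval()

views = torch.randn(12, 3, 224, 224)  # stand-in for 12 rendered views
with torch.no_grad():
    view_feats = backbone(views)       # (12, 512): one feature vector per view
```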

Despite the promising prospects of sketch-based 3D shape retrieval, there still exist three major challenges that hinder its development. First, free-hand sketch drawing is a subjective activity, resulting in large variation among different individuals. Second, the sketch and the 3D shape have a large cross-modality discrepancy, which makes it difficult to obtain modality-independent features. Third, a sketch usually reflects a certain view of a 3D shape, and the visual appearance of different views may vary significantly. The existing methods that address these problems can be coarsely categorized into traditional-descriptor based [2], [25] and deep-learned-descriptor based [26], [27] approaches. The first kind commonly applies hand-crafted or shallow-learned features to describe both sketches and 3D shapes for similarity measurement. Nevertheless, it is difficult to design discriminative feature descriptors that apply to both sketches and 3D shapes due to the large cross-modality discrepancy [11]. In contrast, the second kind, which is based on deep-learned features, is considered more robust and more discriminative; it can better accommodate the cross-modality discrepancy and attain improved retrieval accuracy.

As mentioned above, the query sketch is only representative of some views of a 3D shape, and the unrepresentative views contribute little or are even harmful to retrieval. However, many existing methods [20], [28], [29], [30] treat all views equally without considering the viewpoint information. To resolve this problem, we propose a Deep Point-to-Subspace Metric Learning (DPSML) framework. First, a Representative-View Selection (RVS) module is applied to obtain the most representative views of a 3D shape, and a subspace spanned by the feature vectors of the selected views is generated to describe the 3D shape. Then, the similarity between a sketch and a 3D shape is defined as the distance between the sketch feature vector and its closest point in the spanned subspace, obtained by solving an optimization problem on the fly. Note that the closest point is query-adaptive and can reflect the viewpoint information captured by the query sketch. Moreover, to learn a deep model efficiently, we formulate the representation learning problem as a classification problem without the pairwise sample learning process used by many existing methods [29], [31]. In summary, the proposed DPSML is an end-to-end framework, and its effectiveness and robustness are extensively demonstrated by experiments on three widely used benchmark datasets, i.e., SHREC 2013, 2014 and 2016.
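To make the point-to-subspace distance concrete, here is a minimal sketch, assuming the sketch feature is a vector q and the features of the selected views are stacked as the columns of a matrix B; the closest point in the spanned subspace then follows from an ordinary least-squares problem. The learned projections and any constraints on the coefficients are omitted.

```python
import numpy as np

def point_to_subspace_distance(q, B):
    """Distance from a query feature q (shape (d,)) to the subspace
    spanned by the columns of B (shape (d, k)), the selected view features.

    Solves min_w ||q - B w||_2 in closed form; the minimizer gives the
    query-adaptive closest point B w* in the subspace.
    """
    w, *_ = np.linalg.lstsq(B, q, rcond=None)
    closest = B @ w
    return np.linalg.norm(q - closest), closest

# Toy usage with hypothetical 256-d features and 4 selected views.
rng = np.random.default_rng(0)
q = rng.standard_normal(256)        # sketch feature
B = rng.standard_normal((256, 4))   # one column per selected view
dist, _ = point_to_subspace_distance(q, B)
```

Because the coefficients w are re-solved for every query, the recovered closest point adapts to whichever viewpoint the sketch actually depicts.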

The rest of this paper is organized as follows. Section 2 reviews the related work most relevant to the proposed method and gives a method overview. Section 3 presents a detailed explanation of the proposed framework. Section 4 provides the details of the benchmark datasets, the evaluation metrics and the implementation. The experimental results and comparisons with state-of-the-art methods, along with a discussion, are provided in Section 5. Finally, Section 6 concludes this work.


Related works

The work in [12], [32] provided a comprehensive survey and comparison of sketch-based 3D shape retrieval methods. In the following, we restrict the review to representative methods closely related to this work. More specifically, we cover traditional sketch-based 3D shape retrieval methods (e.g., hand-crafted or shallow-learned features) and deep-learned descriptors for 3D shape retrieval in Sections 2.1.1 and 2.1.2, respectively.

Methodology

As shown in Fig. 2, the proposed framework mainly contains three modules. First, the feature extraction module is described in Section 3.1. Then, the details of the proposed RVS module are given in Section 3.2. Last, the DPSML framework is described in detail in Section 3.3.

Experimental setups

To demonstrate the effectiveness of the proposed method, we evaluate it on three public benchmark datasets, i.e., SHREC 2013 [12], [33], SHREC 2014 [32], [46] and SHREC 2016 [47]. We first introduce the experimental setups, including the details of the benchmark datasets and the evaluation metrics used. Next, we present the implementation details of our framework. Then, we calculate all the metrics to investigate the performance and compare our results against state-of-the-art methods.
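For reference, the following is a generic implementation of mean average precision (mAP), one of the standard retrieval metrics reported on these benchmarks; the official SHREC evaluation scripts may differ in details.

```python
import numpy as np

def mean_average_precision(ranked_labels, query_labels):
    """Standard retrieval mAP.

    ranked_labels[i]: class labels of the gallery, sorted by decreasing
    similarity to query i; query_labels[i]: the class of query i.
    """
    aps = []
    for ranks, q in zip(ranked_labels, query_labels):
        rel = (np.asarray(ranks) == q).astype(float)
        if rel.sum() == 0:          # no relevant gallery item for this query
            aps.append(0.0)
            continue
        prec_at_k = np.cumsum(rel) / (np.arange(len(rel)) + 1)
        aps.append(float((prec_at_k * rel).sum() / rel.sum()))
    return float(np.mean(aps))
```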

Evaluation on the SHREC 2013 dataset

Our proposed method is based on efficient point-to-subspace learning. To further improve the retrieval accuracy, a modified center learning method is used as part of the loss function. To demonstrate the effectiveness of the RVS module, we compare the performance of the proposed method under different fusion operations, i.e., average pooling and FC-layer based fusion (sketched below). We also report the results with and without the “center learning” method described in Section 3.3.2.
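The two fusion baselines can be sketched as follows; the feature dimension, the number of views and the FC layer size are assumptions made for illustration, not the paper’s exact design.

```python
import torch
import torch.nn as nn

view_feats = torch.randn(12, 512)   # hypothetical per-view features of one shape

# Baseline 1: average pooling collapses the views into a single vector.
avg_feat = view_feats.mean(dim=0)                # (512,)

# Baseline 2: an FC layer learns a fixed fusion of the concatenated views.
fc_fuse = nn.Linear(12 * 512, 512)
fc_feat = fc_fuse(view_feats.flatten())          # (512,)
```

Unlike either fixed fusion, the subspace representation defers the combination of views to query time.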

Conclusions

In this paper, we propose a novel DPSML framework for sketch-based 3D shape retrieval. First, the raw features for both sketches and 3D shapes (represented by 12 rendered views) are extracted via pre-trained deep models (AlexNet, VGG and ResNet). Second, an RVS module is introduced to reduce the redundancy of the rendered views, resulting in a set of the most representative views. Then, the sketch is projected into a feature point and the 3D shape is projected into a subspace spanned by …

Acknowledgment

This work was supported in part by the National Natural Science Foundation of China (NSFC) under Grant Nos. 61403265 and 61602499. It was partially supported by the Key Research and Development Program of Sichuan Province (No. 2019YFG0409). Lingqiao Liu was supported in part by ARC DECRA Fellowship DE170101259. This work was also partially supported by the Fundamental Research Funds for the Central Universities (No. 18lgzd06).


References (53)

  • P. Shilane et al.

    The Princeton Shape Benchmark

    Proceedings of the Shape Modeling Applications

    (2004)
  • J.W. Tangelder et al.

    A survey of content based 3D shape retrieval methods

    Proceedings of Shape Modeling Applications

    (2004)
  • M. Eitz et al.

    Sketch-based shape retrieval

    ACM Trans. Graphics (TOG)

    (2012)
  • B. Gong et al.

    Learning semantic signatures for 3D object retrieval

    IEEE Trans. Multimedia

    (2013)
  • T. Furuya et al.

    Ranking on cross-domain manifold for sketch-based 3D model retrieval

    Proceedings of the Cyberworlds (CW)

    (2013)
  • Z. Wu et al.

    3D ShapeNets: a deep representation for volumetric shapes

    Proceedings of the Computer Vision and Pattern Recognition (CVPR)

    (2015)
  • A. Garcia Garcia et al.

    PointNet: a 3D convolutional neural network for real-time object class recognition

    Proceedings of the International Joint Conference on Neural Networks (IJCNN)

    (2016)
  • X. Liu, Z. Han, Y. Liu, M. Zwicker, Point2Sequence: learning the shape representation of 3D point clouds with an...
  • J. Xie et al.

    Learning barycentric representations of 3D shapes for sketch-based 3D shape retrieval

    Proceedings of the Computer Vision and Pattern Recognition (CVPR)

    (2017)
  • A. Kanezaki et al.

    RotationNet: Joint object categorization and pose estimation using multiviews from unsupervised viewpoints

    Proceedings of the Computer Vision and Pattern Recognition (CVPR)

    (2018)
  • Y. Feng et al.

    GVCNN: group-view convolutional neural networks for 3D shape recognition

    Proceedings of the Computer Vision and Pattern Recognition (CVPR)

    (2018)
  • H. You et al.

    PVNet: a joint convolutional network of point cloud and multi-view for 3D shape recognition

    Proceedings of the ACM International Conference on Multimedia

    (2018)
  • A. Krizhevsky et al.

    ImageNet classification with deep convolutional neural networks

    Proceedings of the Neural Information Processing Systems (NeurIPS)

    (2012)
  • K. Simonyan, A. Zisserman, Very deep convolutional networks for large-scale image recognition, arXiv:1409.1556v1...
  • K. He et al.

    Deep residual learning for image recognition

    Proceedings of the Computer Vision and Pattern Recognition (CVPR)

    (2016)
  • J. Deng et al.

    ImageNet: a large-scale hierarchical image database

    Proceedings of the Computer Vision and Pattern Recognition (CVPR)

    (2009)

    Yinjie Lei received his M.S. degree from Sichuan University (SCU), China, with the area of Image Processing in 2009, and the Ph.D. degree in Computer Vision from University of Western Australia (UWA), Australia in 2013. He is currently an associate professor with the college of Electronics and Information Engineering at SCU. He serves as the vice dean of the College of Electronics and Information Engineering at SCU since 2017. His research interests mainly include deep learning, 3D biometrics, object recognition and semantic segmentation.

    Ziqin Zhou received her bachelor’s degree from Sichuan University (SCU), China, in 2017. She is currently pursuing the M.S. degree with the Electronics and Information Engineering at SCU. Her current research interests include 3D shape analysis and neural architecture search.

    Pingping Zhang received his B.E. degree in mathematics and applied mathematics, Henan Normal University (HNU), Xinxiang, China, in 2012. He is currently a Ph.D. candidate in the School of Information and Communication Engineering, Dalian University of Technology (DUT), Dalian, China. His research interests include deep learning, saliency detection, object tracking and semantic segmentation.

    Yulan Guo received the B.Eng. and Ph.D. degrees from National University of Defense Technology (NUDT) in 2008 and 2015, respectively. He was a visiting Ph.D. student with the University of Western Australia from 2011 to 2014. He is currently an Assistant Professor with the College of Electronic Science, NUDT. He has authored over 60 articles in journals and conferences, such as the IEEE TPAMI and IJCV. His current research interests focus on 3D vision, particularly on 3D feature learning, 3D modeling, 3D object recognition, and 3D biometrics. Dr. Guo received the NUDT Outstanding Doctoral Dissertation Award in 2015 and the CAAI Outstanding Doctoral Dissertation Award in 2016. He served as an associate editor for IET Computer Vision, a guest editor for IEEE TPAMI, a PC member for several international conferences (e.g., ACM MM, IJCAI, AAAI), and a reviewer for over 30 international journals and conferences.

    Zijun Ma received her bachelor’s degree from Sichuan University (SCU), China in 2018. She is currently pursuing the M.S. degree with the Electronics and Information Engineering at SCU. Her current research interests include 3D shape analysis and deep model compression.

    1. The second author contributed equally to this work as the first author.
