
Computers & Graphics

Volume 106, August 2022, Pages 237-247

Technical Section
A study of deep single sketch-based modeling: View/style invariance, sparsity and latent space disentanglement

https://doi.org/10.1016/j.cag.2022.06.005

Highlights

  • Sketch-based modeling robust to view and style change.

  • A new formulation of the regression loss adapted to work with SDFs.

  • Latent space disentanglement with regression loss increases reconstruction accuracy.

  • Challenges inherent to sketch input in the context of deep-reconstruction methods.

  • Auxiliary network learns to predict the foreground mask, supporting sparse user labels.

Abstract

Deep image-based modeling has received a lot of attention in recent years. Sketch-based modeling in particular has gained popularity given the ubiquitous nature of touchscreen devices. In this paper, we (i) study and compare diverse single-image reconstruction methods on sketch input, comparing the different 3D shape representations: multi-view, voxel- and point-cloud-based, mesh-based and implicit ones; and (ii) analyze the main challenges and requirements of sketch-based modeling systems. We introduce a regression loss and provide two variants of its formulation for the two most promising 3D shape representations: point clouds and signed distance functions. We show that this loss can increase general reconstruction accuracy as well as the view- and style-robustness of the reconstruction methods. Moreover, we demonstrate that this loss can benefit the disentanglement of the latent space into view-invariant and view-specific information, resulting in further improved performance. To address the figure-ground ambiguity typical of sparse freehand sketches, we propose a two-branch architecture that exploits sparse user labeling. We hope that our work will inform future research on sketch-based modeling.

Introduction

The challenge of obtaining a 3D model from a single sketch has intrigued researchers for decades. Typically, proposed methods make assumptions about the type of input [1], [2] or restrict users to a specific user interface [3], [4], [5]. Since this is an under-constrained problem for which it is hard to devise a reliable set of heuristics, it naturally calls for deep learning-based methods.

In light of the recent surge of image-based reconstruction, deep sketch-based modeling is gaining popularity [6], [7], [8], [9]. In this work, we study and compare state-of-the-art deep single RGB image 3D shape reconstruction methods on sketch inputs. This allows us to identify the main challenges of working with sketch input: style variation between users, imprecise perspective, and sparsity. We propose targeted solutions to increase the robustness of existing methods on sketch input.

The first challenge comes from style differences: each person has a unique sketching style. To address this problem, we train on three synthetic datasets: naive, stylized, and one where the style is unified by an additional image-processing network, as proposed in the conference version of this paper [10]. The naive dataset uses a rendering style with a uniform line width, commonly used in sketch-based reconstruction papers, where the lines are obtained either from 2D images or via non-photorealistic rendering from 3D models; in this work, we rely on the latter. The stylized dataset is designed to capture the diversity of freehand sketching styles. The strategy of a style-unifying image translation network in the context of sketch-based modeling was first proposed in [6] and proved efficient on doodle sketches. We aim at more detailed sketches and show that, if the sketching style stays within an expected variance in line widths and over-sketching, training on the proposed stylized dataset results in more accurate reconstructions than a style-unifying network.
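To make concrete the kind of variation a stylized dataset needs to cover, the following toy sketch perturbs stroke-level line width and adds over-sketching. This is our own illustration, not the paper's data-generation pipeline; the function name, parameter ranges, and stroke representation are all assumptions.

```python
import numpy as np

def stylize_strokes(strokes, rng=None,
                    width_range=(0.5, 3.0),   # assumed range of line widths (px)
                    oversketch_prob=0.3,      # assumed chance of re-drawing a stroke
                    jitter_sigma=1.5):        # assumed pixel jitter for over-sketching
    """Toy stroke-level augmentation: vary line width and add over-sketching.

    `strokes` is a list of (N_i, 2) arrays of 2D polyline points.
    Returns a list of (points, width) pairs ready to be rasterized.
    """
    if rng is None:
        rng = np.random.default_rng()
    styled = []
    for pts in strokes:
        styled.append((pts, rng.uniform(*width_range)))       # per-stroke line width
        if rng.random() < oversketch_prob:                     # over-sketching: redraw
            offset = rng.normal(0.0, jitter_sigma, size=pts.shape)
            styled.append((pts + offset, rng.uniform(*width_range)))
    return styled
```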

Second, we address the robustness of reconstruction methods with respect to viewpoint and style. It is common for deep single-image methods to train and test on a predefined set of viewpoints [11]. Nevertheless, Gryaditskaya et al. [12] observed that even professional designers, when asked to sketch from a given viewpoint, produce sketches with large angular deviations from the target viewpoint. Therefore, we create a dataset by generating 48 viewpoints for each shape: 8 viewpoints are fixed, and for each fixed viewpoint 5 additional viewpoints are randomly sampled from a normal distribution whose mean matches the parameters of that fixed viewpoint. To explicitly account for the variation of styles and viewpoints, we aim at learning a style- and viewpoint-invariant shape representation by proposing a regression loss that encourages the correlation between distances of 3D shapes and distances in a latent feature space. We extend [10] by proposing a new formulation of the regression loss that enables representing 3D shapes as signed distance functions within the regression loss, and we show that this representation achieves the most accurate reconstruction results. We carefully study the choice of the parameter used in this loss, as well as sampling and training strategies. Moreover, we show the advantage of this loss on the task of disentangling the feature space into view-invariant and view-specific components, allowing us to further improve the accuracy of the reconstruction results.
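For the point-cloud representation, one way to read such a regression loss is as a penalty on the mismatch between pairwise latent distances and pairwise shape distances within a batch. The sketch below is our hedged illustration of this idea, not the paper's exact formulation: the symmetric Chamfer distance, the squared penalty, and the scale parameter `alpha` (which we take to be the parameter studied in the paper) are assumptions.

```python
import torch

def chamfer_distance(p, q):
    """Symmetric Chamfer distance between point clouds p (N, 3) and q (M, 3)."""
    d = torch.cdist(p, q)                                   # (N, M) pairwise distances
    return d.min(dim=1).values.mean() + d.min(dim=0).values.mean()

def regression_loss(latents, point_clouds, alpha=1.0):
    """Encourage latent distances to follow distances between the 3D shapes.

    latents:      (B, D) sketch embeddings
    point_clouds: list of B ground-truth point clouds, each (N_i, 3)
    alpha:        assumed scale relating shape distances to latent distances
    """
    loss, pairs = 0.0, 0
    B = latents.shape[0]
    for i in range(B):
        for j in range(i + 1, B):
            d_latent = torch.norm(latents[i] - latents[j])
            d_shape = chamfer_distance(point_clouds[i], point_clouds[j])
            loss = loss + (d_latent - alpha * d_shape) ** 2
            pairs += 1
    return loss / max(pairs, 1)
```

Note that for two sketches of the same shape drawn in different styles or from different viewpoints the shape distance is zero, so a loss of this kind directly pulls their embeddings together, which is exactly the invariance property targeted above.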

The final challenge in deep sketch-based reconstruction comes from the difficulty of distinguishing the foreground from the background due to the sparsity of sketch lines. To alleviate this problem, we suggest a simple framework in which the user can provide a few sparse labels and the network propagates these labels to robustly predict a binary foreground mask, which is then passed to a 3D shape reconstruction network alongside the input sketch. We apply the strategy proposed in [10] and show its efficiency in the context of SDFs.

In summary, we propose the following contributions:

  • We discuss the key challenges inherent to sketch input in the context of deep-reconstruction methods.

  • We compare alternative strategies to handle variations in freehand sketching styles.

  • We adopt the regression loss to learn a style- and viewpoint-invariant sketch embedding, and provide two formulations of the regression loss targeted at two 3D shape representations: point clouds and signed distance functions.

  • We study the effectiveness of the regression loss in preserving neighborhood relations between reconstruction results, as well as its style- and view-invariance properties.

  • Finally, we propose to use an auxiliary network that learns to predict a foreground mask from the input sketch and supports sparse user labels when necessary. We demonstrate how such a mask can be incorporated as an input to a reconstruction network to better account for the sparsity of input sketches.

Section snippets

Related work

For a general overview of existing sketch-based reconstruction methods, please refer to the recent survey by Bonnici et al. [13]. In this paper, we focus on sketch-based modeling that relies on recent advances in deep learning. Moreover, we focus on scenarios where the end user provides a detailed sketch, possibly with perspective distortions and over-sketching.

Research on sketch-based modeling using deep learning approaches is limited by the lack of large freehand sketch datasets paired

Overview

First, we study and compare diverse single-image reconstruction methods (Section 4) on sketch input, comparing the different 3D shape representations: multi-view, voxel- and point-cloud-based, mesh-based, and implicit ones. We then analyze and address the main challenges and requirements of single-view sketch-based modeling systems in Section 5 (Sketch sparsity) and Section 6 (Viewpoint/style invariant embedded space).

Analysis of various 3D shape representations

In the conference version of this paper [10], we compared a set of 3D reconstruction methods quantitatively and qualitatively. In this work, we extend this set and additionally consider the popular representation of 3D shapes with signed distance functions [40], as well as a recently proposed sketch-dedicated method by Zhong et al. [7] that predicts a multi-view 3D shape representation. For completeness, we summarize all reconstruction methods used in [10] in Appendix B.
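As a rough, hedged illustration of what comparing two shapes encoded as SDFs can involve (the paper evaluates several sampling strategies and distance functions for this purpose), one option is to evaluate both signed distance functions on a shared set of sample points and average the per-point differences. The uniform cube sampling and L1 difference below are our assumptions, not necessarily the variant used in the paper.

```python
import torch

def sdf_shape_distance(sdf_a, sdf_b, num_samples=4096, bound=1.0):
    """Approximate distance between two shapes given as SDF callables.

    sdf_a, sdf_b: functions mapping (N, 3) points to (N,) signed distances
    Points are sampled uniformly in the cube [-bound, bound]^3 (assumed strategy).
    """
    pts = (torch.rand(num_samples, 3) * 2.0 - 1.0) * bound   # uniform sample points
    return torch.mean(torch.abs(sdf_a(pts) - sdf_b(pts)))    # mean absolute difference
```

A distance of this kind could then stand in for the Chamfer distance in the SDF variant of the regression loss sketched earlier.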

Sketch sparsity

Due to the sparsity of information in sketch images, single-image reconstruction methods often cannot reliably distinguish the foreground from the background. To alleviate this problem, we use an image translation network [47] and the idea of interactive sparse user labeling [48], [49] to first predict a binary foreground mask, which we then leverage as an additional input to 3D shape reconstruction methods, as proposed in the conference paper [10].
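A minimal sketch of how such a predicted mask can be combined with the sketch before it reaches the reconstruction network, assuming the mask branch takes the sketch plus a sparse user-label map and the encoder receives sketch and mask as stacked channels; the module names, channel layout, and label encoding are our assumptions rather than the paper's exact architecture.

```python
import torch
import torch.nn as nn

class MaskedSketchEncoder(nn.Module):
    """Two-branch front end: predict a foreground mask, then encode sketch + mask."""

    def __init__(self, mask_net: nn.Module, shape_encoder: nn.Module):
        super().__init__()
        self.mask_net = mask_net            # image-translation-style mask predictor
        self.shape_encoder = shape_encoder  # backbone of the reconstruction network

    def forward(self, sketch, user_labels):
        # sketch:      (B, 1, H, W) grayscale sketch image
        # user_labels: (B, 1, H, W) sparse labels (+1 foreground, -1 background, 0 none)
        mask = torch.sigmoid(self.mask_net(torch.cat([sketch, user_labels], dim=1)))
        x = torch.cat([sketch, mask], dim=1)   # stack sketch and predicted mask
        return self.shape_encoder(x), mask
```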

Viewpoint/style invariant embedded space

An ideal sketch-to-3D-shape prediction system should produce consistent 3D geometries regardless of the sketch style or viewpoint. The majority of single-view reconstruction methods obtain one global feature vector for each input, which is then passed to a decoder. Therefore, the requirement of viewpoint/style invariance means that the embeddings of sketches of the same 3D geometry in different styles and from different viewpoints should be identical. To encourage this, in the

Conclusion

This work extends our conference paper [10] by considering a recently proposed sketch-specific method [7] and the increasingly popular representation of 3D shapes as Signed Distance Fields (SDFs). In this extension, we propose a new formulation of the regression loss adapted to work with SDFs. We consider and evaluate several sampling strategies and distance functions to compare two 3D shapes encoded with SDFs. We also perform a careful analysis of the choice of the parameter for the regression

CRediT authorship contribution statement

Yue Zhong: Conceptualization, Methodology, Formal analysis, Investigation, Writing – original draft, Writing – review & editing. Yulia Gryaditskaya: Supervision, Conceptualization, Methodology, Writing – original draft, Writing – review & editing. Honggang Zhang: Supervision. Yi-Zhe Song: Supervision, Writing – review & editing.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgment

This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

References (51)

  • Li C. et al. Robust flow-guided neural prediction for sketch-based freeform surface modeling. ACM Trans Graph (2018)
  • Xu B. et al. True2Form: 3D curve networks from 2D sketches via selective regularization. ACM Trans Graph (2014)
  • Gryaditskaya Y. et al. Lifting freehand concept sketches into 3D. ACM Trans Graph (2020)
  • Bae SH, Balakrishnan R, Singh K. ILoveSketch: As-natural-as-possible sketching system for creating 3D curve models. In:...
  • Igarashi T, Matsuoka S, Tanaka H. Teddy: A sketching interface for 3D freeform design. In: ACM Trans Graph. 2006, p....
  • Schmidt R, Khan A, Singh K, Kurtenbach G. Analytic drawing of 3D scaffolds. In: ACM Trans Graph. 2009, p....
  • Wang J. et al. 3D shape reconstruction from free-hand sketches (2020)
  • Zhong Y. et al. Towards practical sketch-based 3D shape generation: The role of professional sketches. IEEE Trans Circuits Syst Video Technol (2020)
  • Zhang SH, Guo YC, Gu QW. Sketch2Model: View-aware 3D modeling from single free-hand sketches. In: Proceedings of the...
  • Guillard B, Remelli E, Yvernay P, Fua P. Sketch2Mesh: Reconstructing and editing 3D shapes from sketches. In:...
  • Zhong Y. et al. Deep sketch-based modeling: Tips and tricks
  • Lun Z. et al. 3D shape reconstruction from sketches via multi-view convolutional networks
  • Gryaditskaya Y. et al. OpenSketch: A richly-annotated dataset of product design sketches. ACM Trans Graph (2019)
  • Bonnici A. et al. Sketch-based interaction and modeling: Where do we stand? AI EDAM (2019)
  • Loper M.M. et al. OpenDR: An approximate differentiable renderer
  • Kato H, Ushiku Y, Harada T. Neural 3D mesh renderer. In: Proceedings of the IEEE conference on computer vision and...
  • Liu S, Li T, Chen W, Li H. Soft Rasterizer: A differentiable renderer for image-based 3D reasoning. In: Proceedings of...
  • Kato H. et al. Differentiable rendering: A survey (2020)
  • Remelli E, Lukoianov A, Richter SR, Guillard B, Bagautdinov T, Baque P, et al. MeshSDF: Differentiable iso-surface...
  • Tatarchenko M. et al. Multi-view 3D models from single images with a convolutional network
  • Yao Y. et al. Front2Back: Single view 3D shape reconstruction via front to back prediction (2019)
  • Nealen A, Sorkine O, Alexa M, Cohen-Or D. A sketch-based interface for detail-preserving mesh editing. In: ACM Trans...
  • Wu J. et al. Learning a probabilistic latent space of object shapes via 3D generative-adversarial modeling
  • Choy C.B. et al. 3D-R2N2: A unified approach for single and multi-view 3D object reconstruction
  • Girdhar R. et al. Learning a predictable and generative vector representation for objects