A sketch semantic segmentation method using novel local feature aggregation and segment-level self-attention

Wang, Lei; Zhang, Shihui; Wang, Wei; Zhao, Weibo

doi:10.1007/s00521-023-08504-1

Original Article
Published: 08 April 2023

Volume 35, pages 15295–15313, (2023)
Neural Computing and Applications

Lei Wang¹, Shihui Zhang¹,², Wei Wang¹ & Weibo Zhao¹

Abstract

Sketch semantic segmentation is challenging because sketches have simpler appearances and more levels of abstraction than natural images. To address these challenges, we propose a sketch semantic segmentation method. Concretely, we treat a sketch as a 2D point set and exploit the structures of strokes and the spatial position relationships among 2D points to develop a novel local feature aggregation module, which encodes informative local features that are highly useful for semantic analysis. We also define a “stroke distance” to balance the two-dimensional spatial distribution of a sketch against the internal structures of its strokes. In addition, we design a segment-level self-attention module that establishes and enhances the relationships between segments by encoding the contents and positions of segment features. Building on these two modules, we construct an encoder–decoder-like structure with two sub-branches, which retains the features of significant points and integrates the features of several intermediate stages through a global multi-scale mechanism. Finally, the outputs of the two sub-branches are fused to obtain the final sketch semantic segmentation result. Extensive experiments on SPG and SketchSeg-150K show that our method achieves state-of-the-art results.
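
The abstract does not give the formulas behind the "stroke distance" or the segment-level self-attention, so the following is only a minimal sketch of the two general ideas under stated assumptions, not the authors' implementation. It assumes each sketch point is an `(x, y, stroke_id)` tuple, blends Euclidean distance with a fixed cross-stroke penalty (the names `alpha` and `penalty` are hypothetical), and uses plain scaled dot-product attention with no learned projections, where each segment token is its content feature plus a position encoding of the same length.

```python
import math


def stroke_aware_distance(p, q, alpha=0.5, penalty=1.0):
    """Hypothetical blend of spatial distance and stroke structure.

    p and q are (x, y, stroke_id) tuples. alpha and penalty are
    illustrative parameters, not values taken from the paper.
    """
    spatial = math.hypot(p[0] - q[0], p[1] - q[1])
    # Points that lie on different strokes are pushed apart by a fixed penalty.
    structural = 0.0 if p[2] == q[2] else penalty
    return alpha * spatial + (1.0 - alpha) * structural


def k_nearest(points, index, k, alpha=0.5, penalty=1.0):
    """Indices of the k points closest to points[index] under the blended distance."""
    anchor = points[index]
    others = [i for i in range(len(points)) if i != index]
    others.sort(key=lambda i: stroke_aware_distance(anchor, points[i], alpha, penalty))
    return others[:k]


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def segment_self_attention(features, positions):
    """Scaled dot-product self-attention over segments, without learned weights.

    Each token is content feature + position encoding (assumed same length).
    A trained model would apply query/key/value projections first.
    """
    tokens = [[f + p for f, p in zip(feat, pos)]
              for feat, pos in zip(features, positions)]
    dim = len(tokens[0])
    attended = []
    for query in tokens:
        scores = [sum(a * b for a, b in zip(query, key)) / math.sqrt(dim)
                  for key in tokens]
        weights = softmax(scores)
        # Each output is a convex combination of all segment tokens.
        attended.append([sum(w * tok[j] for w, tok in zip(weights, tokens))
                         for j in range(dim)])
    return attended


# Two strokes: points 0 and 1 belong to stroke 0, points 2 and 3 to stroke 1.
points = [(0.0, 0.0, 0), (1.0, 0.0, 0), (0.5, 0.0, 1), (5.0, 5.0, 1)]
print(k_nearest(points, 0, 2, alpha=1.0))               # purely spatial neighbours
print(k_nearest(points, 0, 2, alpha=0.5, penalty=4.0))  # stroke-aware neighbours
```

In this toy example, setting `alpha=1.0` reduces the measure to Euclidean distance, so the spatially closest point on another stroke ranks first; lowering `alpha` and raising `penalty` makes the neighbourhood favour points on the same stroke. How the paper actually weighs these two factors is not stated in the abstract.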

Data availability

The data that support the findings of this study are available from the corresponding author upon reasonable request.

Acknowledgements

We are very grateful to the editor and reviewers for the time and effort they spent reviewing this manuscript. We also appreciate the support of the Central Government Guided Local Funds for Science and Technology Development (No. 216Z0301G), the National Natural Science Foundation of China (No. 61379065) and the Natural Science Foundation of Hebei Province in China (No. F2019203285).

Author information

Authors and Affiliations

School of Information Science and Engineering, Yanshan University, West section, Hebei Street, Qinhuangdao, 066004, Hebei Province, China
Lei Wang, Shihui Zhang, Wei Wang & Weibo Zhao
Key Laboratory for Computer Virtual Technology and System Integration of Hebei Province, Yanshan University, West section, Hebei Street, Qinhuangdao, 066004, Hebei Province, China
Shihui Zhang

Corresponding author

Correspondence to Shihui Zhang.

Ethics declarations

Conflict of interest

No potential conflict of interest was reported by the authors.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Wang, L., Zhang, S., Wang, W. et al. A sketch semantic segmentation method using novel local feature aggregation and segment-level self-attention. Neural Comput & Applic 35, 15295–15313 (2023). https://doi.org/10.1007/s00521-023-08504-1

Received: 28 June 2022
Accepted: 21 March 2023
Published: 08 April 2023
Issue Date: July 2023
DOI: https://doi.org/10.1007/s00521-023-08504-1
