BTSwin-Unet: 3D U-shaped Symmetrical Swin Transformer-based Network for Brain Tumor Segmentation with Self-supervised Pre-training

Liang, Junjie; Yang, Cihui; Zhong, Jingting; Ye, Xiaoli

doi:10.1007/s11063-022-10919-1

BTSwin-Unet: 3D U-shaped Symmetrical Swin Transformer-based Network for Brain Tumor Segmentation with Self-supervised Pre-training

Published: 17 June 2022

Volume 55, pages 3695–3713, (2023)
Cite this article

Neural Processing Letters Aims and scope Submit manuscript

Junjie Liang¹,
Cihui Yang ORCID: orcid.org/0000-0003-4544-1486¹,
Jingting Zhong¹ &
…
Xiaoli Ye¹

2774 Accesses
9 Citations
1 Altmetric
Explore all metrics

Abstract

Medical image automatic segmentation plays an important role in Computer-Aided Diagnosis system. Although convolution-based network has achieved great performance in medical image segmentation, it has limitations in modeling long-range contextual interactions and spatial dependencies. Due to the powerful ability of long-range information interaction of Vision Transformer, Vision Transformer have achieved advanced performance in several downstream tasks via self-supervised learning. In this paper, motivative by Swin Transformer, we proposed BTSwin-Unet, which is a 3D U-shaped symmetrical Swin Transformer-based network for brain tumor segmentation. Moreover, we construct a self-supervised learning framework to pre-train the model encoder through the reconstruction task. Extensive experiments on tumor segmentation tasks validated the performance of our proposed model, and our results consistently demonstrate favorable benchmarks.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Swin-Unet: Unet-Like Pure Transformer for Medical Image Segmentation

UNet++: A Nested U-Net Architecture for Medical Image Segmentation

3D U-Net: Learning Dense Volumetric Segmentation from Sparse Annotation

References

Erdaş ÇB, Güney S (2021) Human activity recognition by using different deep learning approaches for wearable sensors. Neural Process Lett 53:1795–1809
Article Google Scholar
Pitchai R, Supraja P, Victoria AH, Madhavi M (2021) Brain tumor segmentation using deep learning and fuzzy k-means clustering for magnetic resonance images. Neural Process Lett 53:2519–2532
Article Google Scholar
Ronneberger O, Fischer P, Brox T (2015) U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical image computing and computer-assisted intervention, Springer p 234–241
Milletari F, Navab N, Ahmadi S-A (2016) V-net: Fully convolutional neural networks for volumetric medical image segmentation. In: 2016 fourth international conference on 3D vision (3DV), IEEE p 565–571
Çiçek Ö, Abdulkadir A, Lienkamp SS, Brox T, Ronneberger O (2016) 3d u-net: learning dense volumetric segmentation from sparse annotation. In: International conference on medical image computing and computer-assisted intervention, Springer, p 424–432
Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser Ł, Polosukhin I (2017) Attention is all you need. In: Advances in neural information processing systems, p 5998–6008
Dosovitskiy A, Beyer L, Kolesnikov A, Weissenborn D, Zhai X, Unterthiner T, Dehghani M, Minderer M, Heigold G, Gelly S et al (2020) An image is worth 16x16 words: Transformers for image recognition at scale, arXiv preprint arXiv:2010.11929
Liu Z, Lin Y, Cao Y, Hu H, Wei Y, Zhang Z, Lin S, Guo B (2021) Swin transformer: Hierarchical vision transformer using shifted windows, arXiv preprint arXiv:2103.14030
Hatamizadeh A, Tang Y, Nath V, Yang D, Myronenko A, Landman B, Roth H, Xu D, (2021) Unetr: Transformers for 3d medical image segmentation, arXiv preprint arXiv:2103.10504
Wang W, Chen C, Ding M, Yu H, Zha S, Li J (2021) Transbts: Multimodal brain tumor segmentation using transformer. In: International Conference on Medical Image Computing and Computer-Assisted Intervention, Springer, p 109–119
Cao H, Wang Y, Chen J, Jiang D, Zhang X, Tian Q, Wang M (2021) Swin-unet: Unet-like pure transformer for medical image segmentation, arXiv preprint arXiv:2105.05537
Bao H, Dong L, Wei F (2021) Beit: Bert pre-training of image transformers, arXiv preprint arXiv:2106.08254
Li C, Yang J, Zhang P, Gao M, Xiao B, Dai X, Yuan L, Gao J (2021) Efficient self-supervised vision transformers for representation learning, arXiv preprint arXiv:2106.09785
Xiao T, Dollar P, Singh M, Mintun E, Darrell T, Girshick R (2021) Early convolutions help transformers see better. Adv Neural Inf Process Syst 34:30392–30400
Google Scholar
Dong X, Bao J, Chen D, Zhang W, Yu N, Yuan L, Chen D, Guo B (2021) Cswin transformer: A general vision transformer backbone with cross-shaped windows, arXiv preprint arXiv:2107.00652
Long J, Shelhamer E, Darrell T (2015) Fully convolutional networks for semantic segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, p 3431–3440
Oktay O, Schlemper J, Folgoc LL, Lee M, Heinrich M, Misawa K, Mori K, McDonagh S, Hammerla NY, Kainz B, et al (2018) Attention u-net: Learning where to look for the pancreas, arXiv preprint arXiv:1804.03999
Zhou Z, Siddiquee MMR, Tajbakhsh N, Liang J (2018) Unet++: A nested u-net architecture for medical image segmentation. In: Deep learning in medical image analysis and multimodal learning for clinical decision support, Springer p 3–11
Wang X, Girshick R, Gupta A, He K (2018) Non-local neural networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, p 7794–7803
Chen L-C, Papandreou G, Kokkinos I, Murphy K, Yuille AL (2017) Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs. IEEE Trans Pattern Anal Mach Intell 40:834–848
Article Google Scholar
Zhu X, Su W, Lu L, Li B, Wang X, Dai J (2020) Deformable detr: Deformable transformers for end-to-end object detection, arXiv preprint arXiv:2010.04159
Chen J, Lu Y, Yu Q, Luo X, Adeli E, Wang Y, Lu L, Yuille AL, Zhou Y (2021) Transunet: Transformers make strong encoders for medical image segmentation, arXiv preprint arXiv:2102.04306
Devlin J, Chang M-W, Lee K, Toutanova K (2018) Bert: Pre-training of deep bidirectional transformers for language understanding, arXiv preprint arXiv:1810.04805
Radford A, Narasimhan K, Salimans T, Sutskever I (2018) Improving language understanding by generative pre-training
Chen M, Radford A, Child R, Wu J, Jun H, Luan D, Sutskever I (2020) Generative pretraining from pixels. In: International Conference on Machine Learning, PMLR p 1691–1703
Pathak D, Krahenbuhl P, Donahue J, Darrell T, Efros AA (2016) Context encoders: Feature learning by inpainting. In: Proceedings of the IEEE conference on computer vision and pattern recognition p 2536–2544
Vincent P, Larochelle H, Bengio Y, Manzagol P-A (2008) Extracting and composing robust features with denoising autoencoders. In: Proceedings of the 25th international conference on Machine learning, p 1096–1103
Becker S, Hinton GE (1992) Self-organizing neural network that discovers surfaces in random-dot stereograms. Nature 355:161–163
Article Google Scholar
Hadsell R, Chopra S, LeCun Y (2006) Dimensionality reduction by learning an invariant mapping. In: 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’06), vol. 2, IEEE, p 1735–1742
Zhang Y, Li X, Liu C, Shuai B, Zhu Y, Brattoli B, Chen H, Marsic I, Tighe J (2021) Vidtr: Video transformer without convolutions. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, p 13577–13587
Wu H, Xiao B, Codella N, Liu M, Dai X, Yuan L, Zhang L (2021) Cvt: Introducing convolutions to vision transformers, arXiv preprint arXiv:2103.15808
Wang W, Xie E, Li X, Fan D-P, Song K, Liang D, Lu T, Luo P, Shao L (2021) Pvtv2: Improved baselines with pyramid vision transformer, arXiv preprint arXiv:2106.13797
He K, Chen X, Xie S, Li Y, Dollár P, Girshick R (2021) Masked autoencoders are scalable vision learners, arXiv preprint arXiv:2111.06377
Xie Z, Zhang Z, Cao Y, Lin Y, Bao J, Yao Z, Dai Q, Hu H (2021) Simmim: A simple framework for masked image modeling, arXiv preprint arXiv:2111.09886
Huttenlocher DP, Klanderman GA, Rucklidge WJ (1993) Comparing images using the hausdorff distance. IEEE Trans Pattern Anal Mach Intell 15:850–863
Article Google Scholar
Bakas S, Akbari H, Sotiras A, Bilello M, Rozycki M, Kirby JS, Freymann JB, Farahani K, Davatzikos C (2017) Advancing the cancer genome atlas glioma mri collections with expert segmentation labels and radiomic features. Scientific data 4:1–13
Article Google Scholar
Bakas S, Reyes M, Jakab A, Bauer S, Rempfler M, Crimi A, Shinohara RT, Berger C, Ha SM, Rozycki M et al (2018) Identifying the best machine learning algorithms for brain tumor segmentation, progression assessment, and overall survival prediction in the brats challenge, arXiv preprint arXiv:1811.02629

Download references

Funding

This work was supported by the Science and Technology Department of Jiangxi Province under Grant No. 20202BABL202028.

Author information

Authors and Affiliations

School of Information Engineering, Nanchang Hangkong University, No.696 Fenghe South Avenue, Nanchang, 330063, China
Junjie Liang, Cihui Yang, Jingting Zhong & Xiaoli Ye

Authors

Junjie Liang
View author publications
You can also search for this author in PubMed Google Scholar
Cihui Yang
View author publications
You can also search for this author in PubMed Google Scholar
Jingting Zhong
View author publications
You can also search for this author in PubMed Google Scholar
Xiaoli Ye
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Cihui Yang.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Liang, J., Yang, C., Zhong, J. et al. BTSwin-Unet: 3D U-shaped Symmetrical Swin Transformer-based Network for Brain Tumor Segmentation with Self-supervised Pre-training. Neural Process Lett 55, 3695–3713 (2023). https://doi.org/10.1007/s11063-022-10919-1

Download citation

Accepted: 06 June 2022
Published: 17 June 2022
Issue Date: August 2023
DOI: https://doi.org/10.1007/s11063-022-10919-1

Keywords

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

BTSwin-Unet: 3D U-shaped Symmetrical Swin Transformer-based Network for Brain Tumor Segmentation with Self-supervised Pre-training

Abstract

Access this article

Similar content being viewed by others

Swin-Unet: Unet-Like Pure Transformer for Medical Image Segmentation

UNet++: A Nested U-Net Architecture for Medical Image Segmentation

3D U-Net: Learning Dense Volumetric Segmentation from Sparse Annotation

References

Funding

Author information

Authors and Affiliations

Corresponding author

Ethics declarations

Conflict of interest

Additional information

Publisher's Note

Rights and permissions

About this article

Cite this article

Keywords

Navigation

BTSwin-Unet: 3D U-shaped Symmetrical Swin Transformer-based Network for Brain Tumor Segmentation with Self-supervised Pre-training

Abstract

Access this article

Similar content being viewed by others

Swin-Unet: Unet-Like Pure Transformer for Medical Image Segmentation

UNet++: A Nested U-Net Architecture for Medical Image Segmentation

3D U-Net: Learning Dense Volumetric Segmentation from Sparse Annotation

References

Funding

Author information

Authors and Affiliations

Corresponding author

Ethics declarations

Conflict of interest

Additional information

Publisher's Note

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation