Medical Image Analysis

Volume 73, October 2021, 102134

Contrastive rendering with semi-supervised learning for ovary and follicle segmentation from 3D ultrasound

https://doi.org/10.1016/j.media.2021.102134

Highlights

  • We propose a contrastive rendering (C-Rend) framework to segment the ovary and follicles with detail-refined boundaries.

  • We introduce point-wise contrastive learning into C-Rend to maximize the divergence among different classes.

  • We incorporate the proposed C-Rend into a semi-supervised learning (SSL) framework, leveraging unlabeled data for better performance.

  • Our proposed method has the potential to assist clinicians in making fast infertility diagnoses and may facilitate many advanced applications.

Abstract

Segmentation of the ovary and follicles from 3D ultrasound (US) is a crucial component of the measurement tools used for female infertility diagnosis. Since manual segmentation is time-consuming and operator-dependent, an accurate and fast segmentation method is highly desirable. However, it is challenging for current deep-learning based methods to segment the ovary and follicles precisely due to ambiguous boundaries and insufficient annotations. In this paper, we propose a contrastive rendering (C-Rend) framework to segment the ovary and follicles with detail-refined boundaries. Furthermore, we incorporate the proposed C-Rend into a semi-supervised learning (SSL) framework, leveraging unlabeled data for better performance. Highlights of this paper include: (1) A rendering task is performed to estimate boundaries accurately via enriched feature representation learning. (2) Point-wise contrastive learning is proposed to enhance the similarity of intra-class points and contrastively decrease the similarity of inter-class points. (3) C-Rend plays a complementary role in the SSL framework for uncertainty-aware learning, providing reliable supervision information and achieving superior segmentation performance. Through extensive validation on a large in-house dataset with partial annotations, our method outperforms state-of-the-art methods in various evaluation metrics for both the ovary and follicles.

Introduction

Ultrasound (US) is widely used in female infertility diagnosis due to its advantages of real-time imaging, low cost and non-invasiveness. Female infertility is mainly caused by abnormal development of the ovary and follicles (Coelho Neto et al., 2018, Kiruthika and Ramya, 2014). Periodic and comprehensive ultrasound screenings are usually performed to monitor the growth of the ovary and follicles. During each scan, biometrics are measured for quantitative evaluation and diagnosis.

Specifically, the standard diagnosis pipeline is mainly based on 2D US. Sonographers first manually localize the standard planes (SPs) of the targets from scanned videos. Next, biometrics, including ovary and follicle size as well as follicle count, are measured and analyzed on these planes (Narra et al., 2018, Kelsey and Wallace, 2012). Although the pipeline is tractable, it has the following drawbacks. First, SP localization strongly depends on the sonographer's experience and skill. High inter- and intra-operator variability exists, leading to low diagnostic reproducibility, especially for novices. Second, both SP localization and manual measurement are time-consuming when tens of follicles exist (as shown in Fig. 1(b)). Sonographers have to localize multiple SPs for a complete measurement and examination, which makes follicle counting difficult and significantly decreases diagnostic efficiency. Third, the final diagnostic conclusions could be biased by planar metrics, which only partially represent the anatomical geometry.

In contrast, 3D US has the inherent advantages of lower experience dependency and higher efficiency, which also makes off-line analysis more reliable (Coelho Neto et al., 2018). With a broad volumetric view, the ovary and all follicles can be imaged in a single 3D scan (as shown in Fig. 1(a)). Notably, 3D US paves new paths for many crucial studies that cannot be approached by 2D US, such as measuring the volume of the ovary and follicles. However, facing poor image quality and expensive manual annotation, related clinical studies often resort to semi-automatic methods (Gooding et al., 2004, Gooding et al., 2008). Since these methods still involve cumbersome and subjective interactions, they are ill-suited to measuring objects with irregular shapes and can produce conflicting results (Narra et al., 2018). In this regard, automated analysis tools are in high demand. To achieve this, accurate segmentation must be obtained first.

However, segmenting the ovary and follicles from US volumes is a challenging task. The first difficulty comes from boundary ambiguity. As shown in Fig. 1(c) and (d), speckle noise blurs the boundaries between follicles. The boundary thickness of follicles is inconsistent due to irregular follicle shapes and complex contact between follicles. It is difficult to recognize the boundary between the ovary and background tissues, even for experienced experts. Second, it is very difficult to annotate sufficient 3D data (each volume containing hundreds of slices) for training high-performance supervised models. It would take an experienced expert at least 10 hours to annotate a volume like the one in Fig. 1(b), which contains tens of follicles. Last but not least, the explosive increase in volumetric data also requires a relatively large network, which challenges efficient segmentation in clinical applications.

In our review of related works, we first introduce 2D US based methods and then summarize 3D US studies. Finally, previous works that focus on addressing the boundary ambiguity issue are discussed.

Most early works perform segmentation based on conventional methods (e.g., thresholding, watershed, region growing, active contours). A previous work (Deng et al., 2011) used a modified watershed algorithm to automatically segment follicles. A similar idea can also be found in Krivanek and Sonka (1998). Potocnik and Zazula (2000) first employed a region growing method to extract follicle candidates and then recognized them based on empirically determined parameters. Potočnik and Zazula (2002) further proposed to estimate the parameters adaptively and employed dynamic images to take advantage of spatio-temporal information. Cigale et al. (2006) and Lenic et al. (2007) employed an SVM to train cellular neural networks, which achieved a shorter learning process and more robust segmentation performance. Li et al. (2019) presented a composite network using a recurrent neural network (RNN) to learn multi-scale and long-range spatial contexts in 2D US. The above methods demonstrate considerable potential for the ovary and follicle segmentation task. However, measurement based on automatic segmentation in 2D US is error-prone due to irregular shapes and occlusion of targets (Chen et al., 2009).

In contrast to 2D US, segmentation studies in 3D US are limited due to the challenges of low imaging quality, large volume size and annotation difficulty. A continuous wavelet transform algorithm (Cigale and Zazula, 2007) was applied to automatically segment the ovary in 3D US. Chen et al. (2009) used a probabilistic boosting tree (PBT) to detect follicular locations based on global-local context and then used these locations as prior knowledge to segment follicles with a Markov random field (MRF). Narayan et al. (2018) employed noise-robust phase asymmetric feature maps to detect and segment follicles in 3D US based on the max-flow algorithm. Narra et al. (2018) presented a variational segmentation framework to perform 2D radial slice segmentation, integrating a deep energy map learned from a U-Net as a soft shape prior. The segmented slice results were then utilized to generate a 3D mesh of the ovary or follicles for surface measurement. Although the aforementioned methods are effective, most of them are still not discriminative enough to deal well with gray-scale inhomogeneity and boundary ambiguity. In addition, few supervised 3D deep models have been studied due to the difficulty of 3D annotation. Cigale and Zazula (2007) only explored a 2D U-Net to segment 3D volumes slice by slice, which loses spatial information along the third dimension.

It is worth noting that boundary ambiguity has always been a great challenge for segmentation tasks (not just for the ovary and follicles) and has been studied in previous works. Tu and Bai (2009) presented a novel auto-context framework to iteratively reuse context information for segmentation refinement during training. Yang et al. (2018) employed a multi-directional recurrent neural network (RNN) to extract local semantic features to combat boundary ambiguity. Chen et al. (2016) and Zhu et al. (2019) employed edge-weighted mechanisms to pay more attention to object edges and tackle the ambiguity problem. Wang et al. (2019b) applied Atrous Spatial Pyramid Pooling (ASPP) (Chen et al., 2017) to hierarchically fuse context information and improve performance in capturing small objects. Although these methods effectively extract global/local semantic information for boundary refinement, they cannot perform point-level learning for fine-grained boundary identification.

Recently, the PointRend method was proposed to treat image segmentation as a rendering task (Kirillov et al., 2020). It proved to be a promising alternative for improving semantic segmentation. Inspired by it, we propose a Contrastive Rendering (C-Rend) framework, initially presented in our preliminary MICCAI work (Li et al., 2020), to address the boundary ambiguity issue. It aims to improve boundary estimation accuracy in US images while reducing computation cost.
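To make the rendering idea concrete, a minimal PyTorch-style sketch of ambiguous-point selection and per-point re-prediction is given below; the uncertainty criterion, tensor shapes and module names are illustrative assumptions rather than the exact implementation.

```python
# Illustrative sketch only: select low-confidence points from a coarse 3D
# prediction and re-predict them from concatenated coarse + fine features.
import torch
import torch.nn as nn
import torch.nn.functional as F

def select_ambiguous_points(coarse_logits, num_points):
    """Return normalized coordinates (B, N, 3) of the least confident voxels."""
    B, C, D, H, W = coarse_logits.shape
    top2 = coarse_logits.softmax(dim=1).topk(2, dim=1).values
    uncertainty = -(top2[:, 0] - top2[:, 1])          # small margin -> ambiguous
    idx = uncertainty.view(B, -1).topk(num_points, dim=1).indices
    z, y, x = idx // (H * W), (idx % (H * W)) // W, idx % W
    # grid_sample expects (x, y, z) order, normalized to [-1, 1]
    coords = torch.stack([x / (W - 1), y / (H - 1), z / (D - 1)], dim=-1)
    return coords * 2 - 1

def point_sample(feat, coords):
    """Sample per-point features: feat (B, C, D, H, W), coords (B, N, 3) -> (B, C, N)."""
    grid = coords.view(coords.size(0), 1, 1, -1, 3)
    out = F.grid_sample(feat, grid, align_corners=True)   # (B, C, 1, 1, N)
    return out.reshape(feat.size(0), feat.size(1), -1)

class RenderHead(nn.Module):
    """Small MLP that re-predicts the label of each selected point."""
    def __init__(self, coarse_ch, fine_ch, num_classes):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Conv1d(coarse_ch + fine_ch, 256, 1), nn.ReLU(inplace=True),
            nn.Conv1d(256, num_classes, 1))

    def forward(self, coarse_logits, fine_feat, coords):
        p_coarse = point_sample(coarse_logits, coords)     # coarse predictions at points
        p_fine = point_sample(fine_feat, coords)           # fine-grained features at points
        return self.mlp(torch.cat([p_coarse, p_fine], dim=1))  # (B, num_classes, N)
```

In such a scheme, the per-point predictions can be supervised with ground-truth labels at the same coordinates during training and can replace the coarse predictions at the ambiguous locations at inference.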

In this study, considering the difficulty of collecting annotated volumetric data, we further combine the C-Rend framework with a semi-supervised learning (SSL) strategy based on the Mean-Teacher (MT) model (Tarvainen and Valpola, 2017a). The conventional MT model employs an uncertainty estimation strategy to generate more reliable targets from unlabeled data during training. Although such a strategy helps the model pay more attention to high-confidence regions (Yu et al., 2019), ambiguous boundaries are likely to be ignored. Thus, we integrate our C-Rend module into the SSL framework as the student model, enhancing boundary refinement. The student model learns from the teacher by encouraging consistency between the predictions that the student and teacher models make on unlabeled data. We performed extensive experiments on a challenging in-house dataset. Results show that the proposed method achieved good agreement with expert annotations and outperformed strong competing methods. It has great potential to advance the quantitative analysis of the ovary and follicles in US volumes.
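A minimal sketch of such a Mean-Teacher scheme with an uncertainty-masked consistency loss is shown below, assuming a PyTorch setting; the entropy-based uncertainty, input noise, threshold and EMA rate are illustrative assumptions rather than the exact configuration used in this work.

```python
# Illustrative sketch only: EMA teacher update and an uncertainty-masked
# consistency loss on unlabeled volumes.
import torch
import torch.nn.functional as F

def ema_update(teacher, student, alpha=0.99):
    """Teacher weights follow an exponential moving average of the student's."""
    with torch.no_grad():
        for t, s in zip(teacher.parameters(), student.parameters()):
            t.mul_(alpha).add_(s, alpha=1 - alpha)

def consistency_loss(student_logits, teacher_logits, uncertainty, thresh=0.5):
    """Penalize student/teacher disagreement only where the teacher is confident."""
    mask = (uncertainty < thresh).float()
    diff = (student_logits.softmax(1) - teacher_logits.softmax(1)) ** 2
    return (diff * mask).sum() / (mask.sum() + 1e-6)

def unlabeled_step(student, teacher, volume, optimizer):
    """One training step on an unlabeled batch (labeled losses are omitted here)."""
    teacher_logits = teacher(volume + 0.01 * torch.randn_like(volume)).detach()
    probs = teacher_logits.softmax(1)
    # uncertainty could also come from Monte Carlo dropout; entropy is used here
    uncertainty = -(probs * probs.clamp_min(1e-6).log()).sum(1, keepdim=True)
    loss = consistency_loss(student(volume), teacher_logits, uncertainty)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    ema_update(teacher, student)
    return loss.item()
```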

In summary, our contributions are three-fold:

  • We propose a C-Rend framework that formulates boundary estimation as a rendering task. C-Rend adaptively recognizes ambiguous points and re-predicts their labels via representation learning enriched with both coarse and fine-grained features.

  • We introduce point-wise contrastive learning into C-Rend to maximize the divergence among different classes. It encourages similarity among intra-class ambiguous points and contrastively decreases similarity among inter-class ones (see the sketch after this list).

  • We aggregate C-Rend into the SSL framework, leveraging reliable information from unlabeled data. Such aggregation reinforces model optimization for both high- and low-confidence regions. Experiments demonstrate that our C-Rend with SSL can yield satisfactory segmentation results.
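As referenced in the second contribution above, the following is a minimal sketch of a point-wise contrastive term computed on features of the selected ambiguous points; the InfoNCE-style pairing and the temperature are assumptions for illustration, not the paper's exact formulation.

```python
# Illustrative sketch only: supervised, point-wise contrastive loss that pulls
# together same-class points and pushes apart different-class points.
import torch
import torch.nn.functional as F

def point_contrastive_loss(point_feat, point_labels, temperature=0.1):
    """point_feat: (N, C) features of selected points; point_labels: (N,) classes."""
    feat = F.normalize(point_feat, dim=1)
    sim = feat @ feat.t() / temperature                       # pairwise cosine similarity
    same = point_labels.unsqueeze(0) == point_labels.unsqueeze(1)
    eye = torch.eye(len(feat), dtype=torch.bool, device=feat.device)
    pos_mask = (same & ~eye).float()                          # intra-class pairs
    # log-probability of each pair against all other points (self excluded)
    log_prob = sim - torch.logsumexp(sim.masked_fill(eye, float('-inf')),
                                     dim=1, keepdim=True)
    n_pos = pos_mask.sum(1).clamp_min(1)
    return -(log_prob * pos_mask).sum(1).div(n_pos).mean()
```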

The rest of this paper is organized as follows. Section 2 presents the details of the C-Rend framework and the SSL strategy. Section 3 presents the experimental results of the proposed method on 3D US segmentation. Finally, Section 4 discusses the proposed method, and Section 5 concludes this study.

Section snippets

Methodology

Fig. 2 illustrates the overview of the proposed segmentation framework for 3D ovarian US. It mainly consists of five key components: (1) a segmentation architecture containing an asymmetric encoder-decoder structure as the backbone, (2) a point selection module to select ambiguous points that need to be re-predicted, (3) a rendering head to re-predict the labels of the selected points based on hybrid point-level features, (4) a contrastive learning head to further enhance the confidence on
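For illustration, the sketch below shows one way the losses from these components could be combined on a labeled batch, reusing the point_contrastive_loss helper sketched earlier; the loss weights and tensor shapes are assumptions, not the reported configuration.

```python
# Illustrative sketch only: combining the coarse segmentation, rendering and
# contrastive objectives on a labeled batch (the SSL consistency term is added
# separately on unlabeled batches).
import torch.nn.functional as F

def labeled_loss(coarse_logits, point_logits, point_feat, point_labels,
                 voxel_labels, w_point=1.0, w_contrast=0.1):
    """coarse_logits: (B, C, D, H, W); point_logits: (B, C, N);
    point_feat: (B*N, F); point_labels: (B, N); voxel_labels: (B, D, H, W)."""
    seg_loss = F.cross_entropy(coarse_logits, voxel_labels)       # backbone output
    point_loss = F.cross_entropy(point_logits, point_labels)      # rendering head
    contrast_loss = point_contrastive_loss(point_feat,            # contrastive head
                                           point_labels.flatten())
    return seg_loss + w_point * point_loss + w_contrast * contrast_loss
```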

Datasets and pre-processing

Experiments were conducted on a dataset consisting of 307 transvaginal ultrasound (TVUS) volumes collected from 217 patients at the Third Affiliated Hospital of Guangzhou Medical University, with approval from the local research ethics committee. Concretely, 156 TVUS volumes have manual annotations, and the rest were used as unlabeled data for our semi-supervised system. The labeled dataset was divided into training and evaluation groups at a ratio of 8:2. All the ovaries and

Discussion

Segmentation of the ovary and follicles is crucial to quantitative analysis for female infertility diagnosis in 3D US. This task, however, is challenging due to poor 3D imaging quality, ambiguous boundaries, large volume size and insufficient annotations. To address these problems and improve segmentation performance, we developed a novel contrastive rendering framework that enhances boundary prediction accuracy at lower computation cost. The proposed C-Rend framework trained a light-weight

Conclusion

In this study, a general and lightweight framework for 3D ovarian US segmentation is presented, which holds potential for different deep architectures and applications. The main highlight of this work is the exploitation of point-level coarse predictions and fine-grained features, coupled with contrastive learning, to calibrate ambiguous boundary predictions. Moreover, we combined the C-Rend method with a semi-supervised training strategy based on the Mean-Teacher approach to acquire better segmentation

CRediT authorship contribution statement

Xin Yang: Conceptualization, Methodology, Writing - review & editing, Project administration. Haoming Li: Writing - original draft, Methodology, Formal analysis, Validation. Yi Wang: Writing - review & editing. Xiaowen Liang: Investigation, Validation. Chaoyu Chen: Investigation, Validation. Xu Zhou: Writing - review & editing. Fengyi Zeng: Investigation, Validation. Jinghui Fang: Validation. Alejandro Frangi: Writing - review & editing. Zhiyi Chen: Data curation, Resources. Dong Ni:

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work was supported by the National Key R&D Program of China (No. 2019YFC0118300) and the Shenzhen Peacock Plan (Nos. KQTD2016053112051497, KQJSCX20180328095606003).

References (35)

  • B. Cigale et al. Segmentation of 3D ovarian ultrasound volumes using continuous wavelet transform. 11th Mediterranean Conference on Medical and Biomedical Engineering and Computing (2007).

  • M.A. Coelho Neto et al. Counting ovarian antral follicles by ultrasound: a practical guide. Ultrasound Obstet. Gynecol. (2018).

  • M. Gooding et al. The effect of follicle volume measurement on clinical decisions. Proc. Med. Image Understand. Anal. (MIUA) (2004).

  • T.W. Kelsey et al. Ovarian volume correlates strongly with the number of nongrowing follicles in the human ovary. Obstet. Gynecol. Int. (2012).

  • A. Kendall et al. What uncertainties do we need in Bayesian deep learning for computer vision? Advances in Neural Information Processing Systems (2017).

  • A. Kirillov et al. PointRend: image segmentation as rendering. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2020).

  • V. Kiruthika et al. Automatic segmentation of ovarian follicle using k-means clustering. 2014 Fifth International Conference on Signal and Image Processing (2014).
1 The two authors contributed equally to this work.
