
Medical Image Analysis

Volume 81, October 2022, 102539

CS-CO: A Hybrid Self-Supervised Visual Representation Learning Method for H&E-stained Histopathological Images

https://doi.org/10.1016/j.media.2022.102539

Highlights

  • Novel hybrid self-supervised visual representation learning method tailored for H&E-stained histopathological images.

  • Generative and discriminative self-supervised learning can complement and enhance each other.

  • Good rationality, achieved by leveraging domain-specific knowledge of histopathology.

  • Good versatility for different kinds of computational histopathology tasks.

Abstract

Visual representation extraction is a fundamental problem in the field of computational histopathology. Considering the powerful representation capacity of deep learning and the scarcity of annotations, self-supervised learning has emerged as a promising approach to extract effective visual representations from unlabeled histopathological images. Although a few self-supervised learning methods have been specifically proposed for histopathological images, most of them suffer from certain defects that may hurt their versatility or representation capacity. In this work, we propose CS-CO, a hybrid self-supervised visual representation learning method tailored for H&E-stained histopathological images, which integrates the advantages of both generative and discriminative approaches. The proposed method consists of two self-supervised learning stages: cross-stain prediction (CS) and contrastive learning (CO). In addition, a novel data augmentation approach named stain vector perturbation is specifically proposed to facilitate contrastive learning. Our CS-CO makes good use of domain-specific knowledge and requires no side information, giving it good rationality and versatility. We evaluate and analyze the proposed CS-CO on three H&E-stained histopathological image datasets with downstream tasks of patch-level tissue classification and slide-level cancer prognosis and subtyping. Experimental results demonstrate the effectiveness and robustness of the proposed CS-CO on common computational histopathology tasks. Furthermore, we also conduct ablation studies and show that cross-stain prediction and contrastive learning in our CS-CO can complement and enhance each other. Our code is made available at https://github.com/easonyang1996/CS-CO.

Introduction

Histopathology plays an important role in clinical medicine. It can reveal the morphology of pathologic cells and tissues at a microscopic level and provide vital information for disease diagnosis and prognosis (Srinidhi et al., 2021). In the past decades, thanks to the popularity of whole slide digital scanners, a growing number of histopathological slides have been digitized as histopathological images. This process of digitization not only facilitates the viewing, storing, and sharing of histopathological slides but also paves the way for computer-aided analysis (Al-Janabi et al., 2012). In recent years, many computer-aided histopathological image analysis methods have been proposed, aiming to relieve the workload of pathologists and improve the objectivity of disease diagnosis (Gurcan et al., 2009). This valuable body of research has given birth to a promising research topic, computational histopathology, which has had a major impact on the study of pathology (Abels et al., 2019). Furthermore, artificial intelligence-based computational histopathology has recently shown great promise to increase both the accuracy and availability of high-quality health care to patients (Srinidhi, Ciga, Martel, 2021, Cui, Zhang, 2021).

In computational histopathology, extracting effective visual representations is one of the most important problems (Gurcan et al., 2009). It is the cornerstone of many computational histopathology tasks, such as image retrieval (Shi, Sapkota, Xing, Liu, Cui, Yang, 2018, Yang, Zhai, Li, Lv, Wang, Zhu, Jiang, 2020), disease diagnosis (Shao, Bian, Chen, Wang, Zhang, Ji, et al., 2021, Lu, Williamson, Chen, Chen, Barbieri, Mahmood, 2021) and prognosis (Saillard, Schmauch, Laifa, Moarii, Toldo, Zaslavskiy, Pronier, Laurent, Amaddeo, Regnault, et al., 2020, Yao, Zhu, Jonnagaddala, Hawkins, Huang, 2020), and molecular signature prediction (Ding, Liu, Lee, Zhou, Lu, Zhang, 2020, Fu, Jung, Torne, Gonzalez, Vöhringer, Shmatko, Yates, Jimenez-Linan, Moore, Gerstung, 2020, Kather, Heij, Grabsch, Loeffler, Echle, Muti, Krause, Niehues, Sommer, Bankhead, et al., 2020). Besides, using visual representations instead of raw RGB images can significantly reduce data dimensionality and computational cost. In earlier research, features were manually designed based on pathological knowledge and extracted via traditional feature extraction approaches. However, such handcrafted features are highly subjective, so their representation capacity is limited (Madabhushi and Lee, 2016). Recently, deep learning-based methods have shown powerful representation capability and have gradually become the mainstream for visual representation extraction (LeCun et al., 2015). Deep learning-based methods usually rely on large amounts of labeled data to learn good visual representations, while preparing large-scale labeled datasets is expensive and time-consuming, especially for histopathological image data. Therefore, to avoid this tedious data collection and annotation procedure, some researchers make a compromise and utilize pre-trained deep models, e.g. an ImageNet (Deng et al., 2009) pre-trained convolutional neural network (CNN), to extract visual representations from histopathological images (Shao, Bian, Chen, Wang, Zhang, Ji, et al., 2021, Lu, Williamson, Chen, Chen, Barbieri, Mahmood, 2021, Saillard, Schmauch, Laifa, Moarii, Toldo, Zaslavskiy, Pronier, Laurent, Amaddeo, Regnault, et al., 2020, Yao, Zhu, Jonnagaddala, Hawkins, Huang, 2020, Ding, Liu, Lee, Zhou, Lu, Zhang, 2020). However, this compromise ignores both the difference in data distribution and the task bias, which can result in inappropriate or suboptimal visual representations.

Considering the aforementioned dilemma, self-supervised learning is a feasible solution and has received increasing attention from researchers in recent years. The greatest advantage of self-supervised learning is that it can fit a deep model using only unlabeled data. Given a well-designed pretext task, supervisory signals can be automatically generated from the unlabeled data, and the deep model can then be trained to capture features by solving the pretext task in a supervised manner (Jing and Tian, 2020). In the past few years, self-supervised visual representation learning has made great progress. For natural images, several self-supervised learning methods (He, Fan, Wu, Xie, Girshick, 2020, Chen, Kornblith, Norouzi, Hinton, 2020) have achieved surprising results and shrunk the performance gap with supervised methods on downstream tasks (Jing and Tian, 2020). For histopathological images, a few self-supervised learning methods have also been proposed, but most of them have certain defects: some methods need side information other than images, such as magnification, for supervision (Sahasrabudhe, Christodoulidis, Salgado, Michiels, Loi, André, Paragios, Vakalopoulou, 2020, Xie, Chen, Li, Shen, Ma, Zheng, 2020), while others rely on a spatial proximity assumption that does not necessarily hold (Gildenblat, Klaiman, 2019, Abbet, Zlobec, Bozorgtabar, Thiran, 2020). To our knowledge, there is still a lack of universal and effective self-supervised learning methods for extracting visual representations from histopathological images.
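The contrastive methods cited above (e.g. SimCLR) typically train by pulling two augmented views of the same image together while pushing apart all other images in the batch, using the NT-Xent loss. As a minimal NumPy sketch of that objective (an illustration of the standard loss, not the authors' implementation):

```python
import numpy as np

def nt_xent_loss(z1, z2, temperature=0.5):
    """NT-Xent loss over two augmented views z1, z2 of shape (N, d):
    each embedding is pulled toward its counterpart in the other view
    and pushed away from the remaining 2N - 2 embeddings in the batch."""
    n = z1.shape[0]
    z = np.concatenate([z1, z2], axis=0)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)    # cosine similarity
    sim = z @ z.T / temperature
    np.fill_diagonal(sim, -np.inf)                      # exclude self-pairs
    # row-wise log-softmax over the similarity matrix
    sim = sim - sim.max(axis=1, keepdims=True)
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    # the positive for row i is row i + n (and i - n for the second half)
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(n)])
    return -log_prob[np.arange(2 * n), pos].mean()
```

The loss is low when the two views of each image embed close together relative to the rest of the batch, which is exactly the property a good augmentation scheme (such as the stain vector perturbation proposed below) must preserve.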

To this end, we propose CS-CO, a novel hybrid self-supervised visual representation learning method tailored for H&E-stained histopathological images. Our CS-CO employs two kinds of pretext tasks for self-supervised learning. One is the generative Cross-Stain prediction, and the other is the discriminative COntrastive learning. Both of them make good use of domain-specific knowledge and require no side information. Therefore, the proposed method has good rationality and versatility. The major contributions of our work are summarized as follows.

  • We design a novel generative pretext task, i.e., cross-stain prediction, for self-supervised learning on H&E-stained histopathological images.

  • We propose a new data augmentation approach, i.e., stain vector perturbation, to facilitate histopathological image contrastive learning.

  • We integrate the advantages of generative and discriminative approaches and build a hybrid self-supervised visual representation learning framework for H&E-stained histopathological images.

  • We demonstrate the superiority of the proposed CS-CO on several computational pathology tasks, such as patch-level tissue classification and slide-level cancer prognosis and subtyping.

This paper is an extension of the preliminary work (Yang et al., 2021) presented at MICCAI 2021. Besides a more in-depth background introduction, a more detailed method description, and a more comprehensive discussion of experimental results, we improve and extend the previous paper in three main aspects. (1) We conduct new ablation studies to analyze the impact of the weighting between contrastive learning and cross-stain prediction on model performance. Based on the experimental results, we refine the training strategy of CS-CO to improve robustness. (2) We rerun the experiments presented in (Yang et al., 2021) with a more rigorous cross-validation strategy and statistical significance testing. We also add two strong pathology-specific contrastive learning baselines for comprehensive comparison. (3) We evaluate the performance of CS-CO on two slide-level downstream tasks: one for hepatocellular carcinoma (HCC) prognosis, and the other for glioma subtyping. The experimental results indicate the effectiveness and versatility of CS-CO on common computational histopathology tasks.

Section snippets

Related work

The recently emerged self-supervised learning has become an important branch of deep learning. In the context of self-supervised learning, deep learning models can be well-trained using only unlabeled data, and visual representations can be easily extracted with the learned models. In this section, we will first introduce the taxonomy of existing self-supervised learning methods. Then, studies on contrastive learning, as well as methods for self-supervised learning of medical images, will be

Overview of CS-CO

As illustrated in Fig. 1, our proposed CS-CO consists of two self-supervised learning stages, namely cross-stain prediction and contrastive learning, both of which are specially designed for histopathological images. Before the first self-supervised learning stage, stain separation is first applied to the original H&E-stained images to generate single-dye staining results. With these stain-separated images, a two-branch autoencoder is trained at the first self-supervised learning stage by solving
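The stain-separation step can be illustrated with standard color deconvolution in the style of Ruifrok and Johnston: RGB values are mapped into optical-density space via the Beer-Lambert law and projected onto hematoxylin and eosin stain vectors. The stain matrix values and the Gaussian perturbation scale below are illustrative assumptions, not the paper's exact settings:

```python
import numpy as np

# Reference H&E stain vectors in optical-density space (rows: hematoxylin,
# eosin), following Ruifrok-Johnston color deconvolution. These values and
# the perturbation scale sigma are illustrative, not CS-CO's exact settings.
HE_STAINS = np.array([[0.65, 0.70, 0.29],
                      [0.07, 0.99, 0.11]])

def separate_stains(rgb, stains=HE_STAINS, eps=1e-6):
    """Decompose an RGB patch (H, W, 3, values in (0, 1]) into per-stain
    concentration maps via Beer-Lambert optical-density deconvolution."""
    od = -np.log(np.clip(rgb, eps, 1.0))                 # optical density
    basis = stains / np.linalg.norm(stains, axis=1, keepdims=True)
    # least-squares solve od = conc @ basis for the concentrations
    conc, *_ = np.linalg.lstsq(basis.T, od.reshape(-1, 3).T, rcond=None)
    return conc.T.reshape(rgb.shape[:2] + (2,))          # H and E channels

def perturb_stain_vectors(stains=HE_STAINS, sigma=0.05, rng=None):
    """Stain vector perturbation: jitter the stain basis with Gaussian noise
    so the same patch yields a slightly different separation, which can serve
    as a data augmentation for contrastive learning."""
    rng = np.random.default_rng() if rng is None else rng
    return stains + rng.normal(0.0, sigma, size=stains.shape)
```

Two views of the same patch for contrastive learning can then be produced by deconvolving with two independently perturbed stain bases, while the clean separation provides the hematoxylin-only and eosin-only inputs for cross-stain prediction.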

Experiments and results

We conduct four sets of experiments on three different H&E-stained histopathological image datasets to dissect and evaluate the proposed CS-CO. First, we show the feasibility of the proposed generative pretext task, cross-stain prediction, on all datasets. Then, on the patch-level tissue classification dataset, we compare CS-CO with several baselines under the linear evaluation protocol and conduct ablation studies to explore the role of key components of CS-CO. Finally, we
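Under the linear evaluation protocol mentioned here, the self-supervised encoder is frozen and only a linear classifier is fit on the extracted representations, so classification accuracy directly reflects representation quality. A minimal sketch, with a one-hot least-squares head standing in for the logistic-regression probe usually used:

```python
import numpy as np

def linear_probe(train_feats, train_labels, test_feats, n_classes):
    """Linear evaluation: fit only a linear classifier on frozen encoder
    features. A one-hot least-squares head stands in for the usual
    logistic-regression probe."""
    X = np.hstack([train_feats, np.ones((len(train_feats), 1))])  # bias column
    Y = np.eye(n_classes)[train_labels]                           # one-hot targets
    W, *_ = np.linalg.lstsq(X, Y, rcond=None)                     # fit linear head
    Xt = np.hstack([test_feats, np.ones((len(test_feats), 1))])
    return (Xt @ W).argmax(axis=1)                                # predicted classes
```

Because the encoder never sees the downstream labels, any accuracy gain over a baseline encoder under this protocol is attributable to the self-supervised pre-training itself.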

Discussion and conclusion

Extracting effective visual representations from histopathological images is the cornerstone of many computational histopathology tasks. In recent years, deep learning models have shown powerful capabilities in extracting representations from images. However, it is not easy to collect large-scale labeled data for model training, especially for medical images like histopathological images. Nowadays, thanks to the popularity of digital pathology, a growing number of unlabeled histopathological

CRediT authorship contribution statement

Pengshuai Yang: Conceptualization, Data curation, Formal analysis, Methodology, Software, Visualization, Writing – original draft. Xiaoxu Yin: Data curation, Formal analysis, Software, Visualization, Writing – original draft. Haiming Lu: Supervision, Writing – review & editing. Zhongliang Hu: Resources, Data curation, Writing – review & editing. Xuegong Zhang: Funding acquisition, Supervision, Writing – review & editing. Rui Jiang: Conceptualization, Funding acquisition, Supervision, Writing –

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work is supported by the National Key Research and Development Program of China grant no. 2021YFF1200902, the National Natural Science Foundation of China grants nos. 61873141, 61721003, 61573207, U1736210, 42050101, a grant from the Guoqiang Institute, Tsinghua University, and the Tsinghua-Fuzhou Institute for Data Technology. We also thank Zhengyu Zhao for proofreading this article.

References (72)

  • P. Yang et al.

    A deep metric learning approach for histopathological image retrieval

    Methods

    (2020)
  • J. Yao et al.

    Whole slide images based cancer survival prediction using attention guided deep multiple instance learning networks

    Medical Image Analysis

    (2020)
  • C. Abbet et al.

    Divide-and-rule: self-supervised learning for survival analysis in colorectal cancer

    International Conference on Medical Image Computing and Computer-Assisted Intervention

    (2020)
  • E. Abels et al.

    Computational pathology definitions, best practices, and recommendations for regulatory guidance: a white paper from the digital pathology association

    The Journal of Pathology

    (2019)
  • S. Al-Janabi et al.

    Digital pathology: current status and future perspectives

    Histopathology

    (2012)
  • A. Ally et al.

    Comprehensive and integrative genomic characterization of hepatocellular carcinoma

    Cell

    (2017)
  • A. Basavanhally et al.

    Multi-field-of-view framework for distinguishing tumor grade in ER+ breast cancer from entire histopathology slides

    IEEE Transactions on Biomedical Engineering

    (2013)
  • J. Boyd et al.

    Self-supervised representation learning using visual field expansion on digital pathology

    Proceedings of the IEEE/CVF International Conference on Computer Vision

    (2021)
  • Cerami, E., Gao, J., Dogrusoz, U., Gross, B. E., Sumer, S. O., Aksoy, B. A., Jacobsen, A., Byrne, C. J., Heuer, M. L.,...
  • J.K. Chan

    The wonderful colors of the hematoxylin–eosin stain in diagnostic surgical pathology

    International Journal of Surgical Pathology

    (2014)
  • J.-R. Chang et al.

    Stain mix-up: Unsupervised domain generalization for histopathology images

    International Conference on Medical Image Computing and Computer-Assisted Intervention

    (2021)
  • T. Chen et al.

    A simple framework for contrastive learning of visual representations

    International Conference on Machine Learning

    (2020)
  • X. Chen et al.

    Exploring simple siamese representation learning

    Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition

    (2021)
  • D. Chicco

    Siamese neural networks: An overview

    Artificial Neural Networks

    (2021)
  • J. Deng et al.

    ImageNet: A large-scale hierarchical image database

    2009 IEEE Conference on Computer Vision and Pattern Recognition

    (2009)
  • K. Ding et al.

    Feature-enhanced graph networks for genetic mutational prediction using histopathological images in colon cancer

    International Conference on Medical Image Computing and Computer-Assisted Intervention

    (2020)
  • C. Doersch et al.

    Unsupervised visual representation learning by context prediction

    Proceedings of the IEEE International Conference on Computer Vision

    (2015)
  • Y. Fu et al.

    Pan-cancer computational histopathology reveals mutations, tumor composition and prognosis

    Nature Cancer

    (2020)
  • J. Gao et al.

    Integrative analysis of complex cancer genomics and clinical profiles using the cBioPortal

    Science Signaling

    (2013)
  • S. Gidaris et al.

    Unsupervised representation learning by predicting image rotations

    arXiv preprint arXiv:1803.07728

    (2018)
  • J. Gildenblat et al.

    Self-supervised similarity learning for digital pathology

    arXiv preprint arXiv:1905.08139

    (2019)
  • J.-B. Grill et al.

    Bootstrap your own latent: a new approach to self-supervised learning

    Advances in Neural Information Processing Systems

    (2020)
  • M.N. Gurcan et al.

    Histopathological image analysis: A review

    IEEE Reviews in Biomedical Engineering

    (2009)
  • R. Hadsell et al.

    Dimensionality reduction by learning an invariant mapping

    2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’06)

    (2006)
  • K. He et al.

    Masked autoencoders are scalable vision learners

    arXiv preprint arXiv:2111.06377

    (2021)
  • K. He et al.

    Momentum contrast for unsupervised visual representation learning

    Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition

    (2020)