
Medical Image Analysis

Volume 81, October 2022, 102539

CS-CO: A Hybrid Self-Supervised Visual Representation Learning Method for H&E-stained Histopathological Images

https://doi.org/10.1016/j.media.2022.102539

Highlights

  • Novel hybrid self-supervised visual representation learning method tailored for H&E-stained histopathological images.

  • Generative and discriminative self-supervised learning can complement and enhance each other.

  • Good rationality, achieved by leveraging domain-specific knowledge of histopathology.

  • Good versatility for different kinds of computational histopathology tasks.

Abstract

Visual representation extraction is a fundamental problem in the field of computational histopathology. Considering the powerful representation capacity of deep learning and the scarcity of annotations, self-supervised learning has emerged as a promising approach to extract effective visual representations from unlabeled histopathological images. Although a few self-supervised learning methods have been specifically proposed for histopathological images, most of them suffer from certain defects that may hurt their versatility or representation capacity. In this work, we propose CS-CO, a hybrid self-supervised visual representation learning method tailored for H&E-stained histopathological images, which integrates the advantages of both generative and discriminative approaches. The proposed method consists of two self-supervised learning stages: cross-stain prediction (CS) and contrastive learning (CO). In addition, a novel data augmentation approach named stain vector perturbation is specifically proposed to facilitate contrastive learning. Our CS-CO makes good use of domain-specific knowledge and requires no side information, giving it good rationality and versatility. We evaluate and analyze the proposed CS-CO on three H&E-stained histopathological image datasets with downstream tasks of patch-level tissue classification and slide-level cancer prognosis and subtyping. Experimental results demonstrate the effectiveness and robustness of the proposed CS-CO on common computational histopathology tasks. Furthermore, we also conduct ablation studies and show that cross-stain prediction and contrastive learning in our CS-CO can complement and enhance each other. Our code is made available at https://github.com/easonyang1996/CS-CO.

Introduction

Histopathology plays an important role in clinical medicine. It can reveal the morphology of pathologic cells and tissues at a microscopic level and provide vital information for disease diagnosis and prognosis (Srinidhi et al., 2021). In the past decades, thanks to the popularity of whole slide digital scanners, a growing number of histopathological slides have been digitized as histopathological images. This process of digitization not only facilitates the viewing, storing, and sharing of histopathological slides but also paves the way for computer-aided analysis (Al-Janabi et al., 2012). In recent years, many computer-aided histopathological image analysis methods have been proposed, aiming to relieve the workload of pathologists and improve the objectivity of disease diagnosis (Gurcan et al., 2009). This valuable body of research has given birth to a promising research topic, computational histopathology, which has had a major impact on the study of pathology (Abels et al., 2019). Furthermore, artificial intelligence-based computational histopathology has recently shown great promise to increase both the accuracy and availability of high-quality health care to patients (Srinidhi, Ciga, Martel, 2021, Cui, Zhang, 2021).

In computational histopathology, extracting effective visual representations is one of the most important problems (Gurcan et al., 2009). It is the cornerstone of many computational histopathology tasks, such as image retrieval (Shi, Sapkota, Xing, Liu, Cui, Yang, 2018, Yang, Zhai, Li, Lv, Wang, Zhu, Jiang, 2020), disease diagnosis (Shao, Bian, Chen, Wang, Zhang, Ji, et al., 2021, Lu, Williamson, Chen, Chen, Barbieri, Mahmood, 2021) and prognosis (Saillard, Schmauch, Laifa, Moarii, Toldo, Zaslavskiy, Pronier, Laurent, Amaddeo, Regnault, et al., 2020, Yao, Zhu, Jonnagaddala, Hawkins, Huang, 2020), and molecular signature prediction (Ding, Liu, Lee, Zhou, Lu, Zhang, 2020, Fu, Jung, Torne, Gonzalez, Vöhringer, Shmatko, Yates, Jimenez-Linan, Moore, Gerstung, 2020, Kather, Heij, Grabsch, Loeffler, Echle, Muti, Krause, Niehues, Sommer, Bankhead, et al., 2020). Besides, using visual representations instead of raw RGB images can significantly reduce data dimensionality and computational cost. In earlier research, features were manually designed based on pathological knowledge and extracted via traditional feature extraction approaches. However, such handcrafted features are highly subjective, so their representation capacity is limited (Madabhushi and Lee, 2016). Recently, deep learning-based methods have shown powerful representation capability and have gradually become the mainstream for visual representation extraction (LeCun et al., 2015). Deep learning-based methods usually rely on large amounts of labeled data to learn good visual representations, while preparing large-scale labeled datasets is expensive and time-consuming, especially for histopathological image data. Therefore, to avoid this tedious data collection and annotation procedure, some researchers make a compromise and utilize pre-trained deep models, e.g. an ImageNet (Deng et al., 2009) pre-trained convolutional neural network (CNN), to extract visual representations from histopathological images (Shao, Bian, Chen, Wang, Zhang, Ji, et al., 2021, Lu, Williamson, Chen, Chen, Barbieri, Mahmood, 2021, Saillard, Schmauch, Laifa, Moarii, Toldo, Zaslavskiy, Pronier, Laurent, Amaddeo, Regnault, et al., 2020, Yao, Zhu, Jonnagaddala, Hawkins, Huang, 2020, Ding, Liu, Lee, Zhou, Lu, Zhang, 2020). However, this compromise ignores both the difference in data distribution and the task bias, which can result in inappropriate or suboptimal visual representations.

Considering the aforementioned dilemma, self-supervised learning is a feasible solution and has received increasing attention from researchers in recent years. The greatest advantage of self-supervised learning is that it can fit a deep model using only unlabeled data. Given a well-designed pretext task, supervisory signals can be automatically generated from the unlabeled data, and the deep model can then be trained to capture features by solving the pretext task in a supervised manner (Jing and Tian, 2020). In the past few years, self-supervised visual representation learning has made great progress. For natural images, several self-supervised learning methods (He, Fan, Wu, Xie, Girshick, 2020, Chen, Kornblith, Norouzi, Hinton, 2020) have achieved surprising results and shrunk the performance gap with supervised methods on downstream tasks (Jing and Tian, 2020). For histopathological images, a few self-supervised learning methods have also been proposed, but most of them have certain defects: some methods need side information other than images, such as magnification, for supervision (Sahasrabudhe, Christodoulidis, Salgado, Michiels, Loi, André, Paragios, Vakalopoulou, 2020, Xie, Chen, Li, Shen, Ma, Zheng, 2020), while others rely on a spatial proximity assumption that does not necessarily hold (Gildenblat, Klaiman, 2019, Abbet, Zlobec, Bozorgtabar, Thiran, 2020). To our knowledge, there is still a lack of universal and effective self-supervised learning methods for extracting visual representations from histopathological images.
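The contrastive methods cited above (e.g. SimCLR) typically train by pulling two augmented views of the same image together while pushing apart all other images in the batch, using the NT-Xent loss. As a minimal NumPy sketch of that objective (an illustration of the standard loss, not the authors' implementation):

```python
import numpy as np

def nt_xent_loss(z1, z2, temperature=0.5):
    """NT-Xent loss over two augmented views z1, z2 of shape (N, d):
    each embedding is pulled toward its counterpart in the other view
    and pushed away from the remaining 2N - 2 embeddings in the batch."""
    n = z1.shape[0]
    z = np.concatenate([z1, z2], axis=0)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)    # cosine similarity
    sim = z @ z.T / temperature
    np.fill_diagonal(sim, -np.inf)                      # exclude self-pairs
    # row-wise log-softmax over the similarity matrix
    sim = sim - sim.max(axis=1, keepdims=True)
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    # the positive for row i is row i + n (and i - n for the second half)
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(n)])
    return -log_prob[np.arange(2 * n), pos].mean()
```

The loss is low when the two views of each image embed close together relative to the rest of the batch, which is exactly the property a good augmentation scheme (such as the stain vector perturbation proposed below) must preserve.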

To this end, we propose CS-CO, a novel hybrid self-supervised visual representation learning method tailored for H&E-stained histopathological images. Our CS-CO employs two kinds of pretext tasks for self-supervised learning. One is the generative Cross-Stain prediction, and the other is the discriminative COntrastive learning. Both of them make good use of domain-specific knowledge and require no side information. Therefore, the proposed method has good rationality and versatility. The major contributions of our work are summarized as follows.

  • We design a novel generative pretext task, i.e., cross-stain prediction, for self-supervised learning on H&E-stained histopathological images.

  • We propose a new data augmentation approach, i.e., stain vector perturbation, to facilitate histopathological image contrastive learning.

  • We integrate the advantages of generative and discriminative approaches and build a hybrid self-supervised visual representation learning framework for H&E-stained histopathological images.

  • We demonstrate the superiority of the proposed CS-CO on several computational pathology tasks, such as patch-level tissue classification and slide-level cancer prognosis and subtyping.

This paper is an extension of the preliminary work (Yang et al., 2021) presented at MICCAI 2021. Besides a more in-depth background introduction, a more detailed method description, and a more comprehensive discussion of experimental results, we improve and extend the previous paper in three main aspects. (1) We conduct new ablation studies to analyze the impact of the weighting between contrastive learning and cross-stain prediction on model performance. Based on the experimental results, we refine the training strategy of CS-CO to improve robustness. (2) We rerun the experiments presented in (Yang et al., 2021) with a more rigorous cross-validation strategy and statistical significance testing. We also add two strong pathology-specific contrastive learning baselines for comprehensive comparison. (3) We evaluate the performance of CS-CO on two slide-level downstream tasks: one for hepatocellular carcinoma (HCC) prognosis, and the other for glioma subtyping. The experimental results indicate the effectiveness and versatility of CS-CO on common computational histopathology tasks.

Section snippets

Related work

The recently emerged self-supervised learning has become an important branch of deep learning. In the context of self-supervised learning, deep learning models can be well-trained using only unlabeled data, and visual representations can be easily extracted with the learned models. In this section, we will first introduce the taxonomy of existing self-supervised learning methods. Then, studies on contrastive learning, as well as methods for self-supervised learning of medical images, will be

Overview of CS-CO

As illustrated in Fig. 1, our proposed CS-CO consists of two self-supervised learning stages, namely cross-stain prediction and contrastive learning, both of which are specially designed for histopathological images. Before the first self-supervised learning stage, stain separation is first applied to the original H&E-stained images to generate single-dye staining results. With these stain-separated images, a two-branch autoencoder is trained at the first self-supervised learning stage by solving
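The stain-separation step can be illustrated with standard color deconvolution in the style of Ruifrok and Johnston: RGB values are mapped into optical-density space via the Beer-Lambert law and projected onto hematoxylin and eosin stain vectors. The stain matrix values and the Gaussian perturbation scale below are illustrative assumptions, not the paper's exact settings:

```python
import numpy as np

# Reference H&E stain vectors in optical-density space (rows: hematoxylin,
# eosin), following Ruifrok-Johnston color deconvolution. These values and
# the perturbation scale sigma are illustrative, not CS-CO's exact settings.
HE_STAINS = np.array([[0.65, 0.70, 0.29],
                      [0.07, 0.99, 0.11]])

def separate_stains(rgb, stains=HE_STAINS, eps=1e-6):
    """Decompose an RGB patch (H, W, 3, values in (0, 1]) into per-stain
    concentration maps via Beer-Lambert optical-density deconvolution."""
    od = -np.log(np.clip(rgb, eps, 1.0))                 # optical density
    basis = stains / np.linalg.norm(stains, axis=1, keepdims=True)
    # least-squares solve od = conc @ basis for the concentrations
    conc, *_ = np.linalg.lstsq(basis.T, od.reshape(-1, 3).T, rcond=None)
    return conc.T.reshape(rgb.shape[:2] + (2,))          # H and E channels

def perturb_stain_vectors(stains=HE_STAINS, sigma=0.05, rng=None):
    """Stain vector perturbation: jitter the stain basis with Gaussian noise
    so the same patch yields a slightly different separation, which can serve
    as a data augmentation for contrastive learning."""
    rng = np.random.default_rng() if rng is None else rng
    return stains + rng.normal(0.0, sigma, size=stains.shape)
```

Two views of the same patch for contrastive learning can then be produced by deconvolving with two independently perturbed stain bases, while the clean separation provides the hematoxylin-only and eosin-only inputs for cross-stain prediction.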

Experiments and results

We conduct four sets of experiments on three different H&E-stained histopathological image datasets to dissect and evaluate the proposed CS-CO. First, we show the feasibility of the proposed generative pretext task, cross-stain prediction, on all datasets. Then, on the patch-level tissue classification dataset, we compare CS-CO with several baselines under the linear evaluation protocol and conduct ablation studies to explore the role of key components of CS-CO. Finally, we
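Under the linear evaluation protocol mentioned here, the self-supervised encoder is frozen and only a linear classifier is fit on the extracted representations, so classification accuracy directly reflects representation quality. A minimal sketch, with a one-hot least-squares head standing in for the logistic-regression probe usually used:

```python
import numpy as np

def linear_probe(train_feats, train_labels, test_feats, n_classes):
    """Linear evaluation: fit only a linear classifier on frozen encoder
    features. A one-hot least-squares head stands in for the usual
    logistic-regression probe."""
    X = np.hstack([train_feats, np.ones((len(train_feats), 1))])  # bias column
    Y = np.eye(n_classes)[train_labels]                           # one-hot targets
    W, *_ = np.linalg.lstsq(X, Y, rcond=None)                     # fit linear head
    Xt = np.hstack([test_feats, np.ones((len(test_feats), 1))])
    return (Xt @ W).argmax(axis=1)                                # predicted classes
```

Because the encoder never sees the downstream labels, any accuracy gain over a baseline encoder under this protocol is attributable to the self-supervised pre-training itself.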

Discussion and conclusion

Extracting effective visual representations from histopathological images is the cornerstone of many computational histopathology tasks. In recent years, deep learning models have shown powerful capabilities in extracting representations from images. However, it is not easy to collect large-scale labeled data for model training, especially for medical images like histopathological images. Nowadays, thanks to the popularity of digital pathology, a growing number of unlabeled histopathological

CRediT authorship contribution statement

Pengshuai Yang: Conceptualization, Data curation, Formal analysis, Methodology, Software, Visualization, Writing – original draft. Xiaoxu Yin: Data curation, Formal analysis, Software, Visualization, Writing – original draft. Haiming Lu: Supervision, Writing – review & editing. Zhongliang Hu: Resources, Data curation, Writing – review & editing. Xuegong Zhang: Funding acquisition, Supervision, Writing – review & editing. Rui Jiang: Conceptualization, Funding acquisition, Supervision, Writing –

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work is supported by the National Key Research and Development Program of China grant no. 2021YFF1200902, the National Natural Science Foundation of China grants nos. 61873141, 61721003, 61573207, U1736210, 42050101, a grant from the Guoqiang Institute, Tsinghua University, and the Tsinghua-Fuzhou Institute for Data Technology. We also thank Zhengyu Zhao for proofreading this article.

References (72)

  • P. Yang et al.

    A deep metric learning approach for histopathological image retrieval

    Methods

    (2020)
  • J. Yao et al.

    Whole slide images based cancer survival prediction using attention guided deep multiple instance learning networks

    Medical Image Analysis

    (2020)
  • C. Abbet et al.

    Divide-and-rule: self-supervised learning for survival analysis in colorectal cancer

    International Conference on Medical Image Computing and Computer-Assisted Intervention

    (2020)
  • E. Abels et al.

    Computational pathology definitions, best practices, and recommendations for regulatory guidance: a white paper from the digital pathology association

    The Journal of Pathology

    (2019)
  • S. Al-Janabi et al.

    Digital pathology: current status and future perspectives

    Histopathology

    (2012)
  • A. Ally et al.

    Comprehensive and integrative genomic characterization of hepatocellular carcinoma

    Cell

    (2017)
  • A. Basavanhally et al.

    Multi-field-of-view framework for distinguishing tumor grade in ER+ breast cancer from entire histopathology slides

    IEEE Transactions on Biomedical Engineering

    (2013)
  • J. Boyd et al.

    Self-supervised representation learning using visual field expansion on digital pathology

    Proceedings of the IEEE/CVF International Conference on Computer Vision

    (2021)
  • Cerami, E., Gao, J., Dogrusoz, U., Gross, B. E., Sumer, S. O., Aksoy, B. A., Jacobsen, A., Byrne, C. J., Heuer, M. L.,...
  • J.K. Chan

    The wonderful colors of the hematoxylin–eosin stain in diagnostic surgical pathology

    International Journal of Surgical Pathology

    (2014)
  • J.-R. Chang et al.

    Stain mix-up: Unsupervised domain generalization for histopathology images

    International Conference on Medical Image Computing and Computer-Assisted Intervention

    (2021)
  • T. Chen et al.

    A simple framework for contrastive learning of visual representations

    International Conference on Machine Learning

    (2020)
  • X. Chen et al.

    Exploring simple siamese representation learning

    Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition

    (2021)
  • D. Chicco

    Siamese neural networks: An overview

    Artificial Neural Networks

    (2021)
  • J. Deng et al.

    ImageNet: A large-scale hierarchical image database

    2009 IEEE Conference on Computer Vision and Pattern Recognition

    (2009)
  • K. Ding et al.

    Feature-enhanced graph networks for genetic mutational prediction using histopathological images in colon cancer

    International Conference on Medical Image Computing and Computer-Assisted Intervention

    (2020)
  • C. Doersch et al.

    Unsupervised visual representation learning by context prediction

    Proceedings of the IEEE International Conference on Computer Vision

    (2015)
  • Y. Fu et al.

    Pan-cancer computational histopathology reveals mutations, tumor composition and prognosis

    Nature Cancer

    (2020)
  • J. Gao et al.

    Integrative analysis of complex cancer genomics and clinical profiles using the cBioPortal

    Science Signaling

    (2013)
  • S. Gidaris et al.

    Unsupervised representation learning by predicting image rotations

    arXiv preprint arXiv:1803.07728

    (2018)
  • J. Gildenblat et al.

    Self-supervised similarity learning for digital pathology

    arXiv preprint arXiv:1905.08139

    (2019)
  • J.-B. Grill et al.

    Bootstrap your own latent: a new approach to self-supervised learning

    Advances in Neural Information Processing Systems

    (2020)
  • M.N. Gurcan et al.

    Histopathological image analysis: A review

    IEEE Reviews in Biomedical Engineering

    (2009)
  • R. Hadsell et al.

    Dimensionality reduction by learning an invariant mapping

    2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’06)

    (2006)
  • K. He et al.

    Masked autoencoders are scalable vision learners

    arXiv preprint arXiv:2111.06377

    (2021)
  • K. He et al.

    Momentum contrast for unsupervised visual representation learning

    Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition

    (2020)