ABSTRACT
Visible-Infrared Person Re-Identification (VI-ReID) has become an essential task for night-time surveillance systems. To reduce the cross-modality discrepancy, previous works either align features via metric learning or synthesize cross-modality images with Generative Adversarial Networks (GANs). However, feature-level alignment ignores the heterogeneity of the data itself, while generative frameworks suffer from low generation quality, limiting their applicability. In this paper, we propose a dense contrastive learning framework (DCLNet) that performs pixel-to-pixel dense alignment on intermediate representations rather than on the final deep features. At its core is a new loss function that pulls views of positive pixels carrying the same semantic information closer in the shallow representation space, while pushing views of negative pixels apart. This naturally provides additional dense supervision and captures fine-grained pixel correspondences, reducing the modality gap from a new perspective. To implement it, a Part Aware Parsing (PAP) module and a Semantic Rectification Module (SRM) are introduced to learn and refine a semantic-guided mask, allowing us to find positive pairs efficiently with only instance-level supervision. Extensive experiments on the public SYSU-MM01 and RegDB datasets demonstrate the superiority of our pipeline over state-of-the-art methods. Code is available at https://github.com/sunhz0117/DCLNet.
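To make the dense contrastive idea concrete, the sketch below shows a minimal NumPy version of a pixel-to-pixel InfoNCE-style loss between two modality feature maps, where pixels sharing a semantic-part label (e.g. produced by a parsing mask) are treated as positives and all other cross-modality pixels as negatives. This is an illustration under stated assumptions, not the paper's implementation; the function name, the label-based positive definition, and the temperature value are ours.

```python
import numpy as np

def dense_contrastive_loss(feat_a, feat_b, mask_a, mask_b, tau=0.07):
    """Illustrative dense InfoNCE-style loss between two modality feature maps.

    feat_a, feat_b: (N, C) L2-normalized pixel features from the visible and
        infrared branches (spatial map flattened, H*W -> N).
    mask_a, mask_b: (N,) integer semantic-part labels per pixel; pixels that
        share a label across modalities are treated as positive pairs.
    tau: softmax temperature (an assumed value, common in contrastive losses).
    """
    # Cosine similarity between every cross-modality pixel pair.
    sim = feat_a @ feat_b.T / tau                       # (N, N)
    pos = mask_a[:, None] == mask_b[None, :]            # positive-pair mask
    # Log-softmax over all cross-modality pixels for each anchor pixel.
    log_p = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    # Average negative log-likelihood over the positives of each anchor,
    # skipping anchors that have no positive counterpart.
    valid = pos.any(axis=1)
    loss = -(log_p * pos).sum(axis=1)[valid] / pos.sum(axis=1)[valid]
    return loss.mean()
```

In practice such a loss is applied to intermediate (shallow) feature maps and added to the usual instance-level ReID objectives, so the dense term supplies extra pixel-wise supervision without any pixel-level annotation.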
Index Terms
- Not All Pixels Are Matched: Dense Contrastive Learning for Cross-Modality Person Re-Identification