DOI: 10.1145/3503161.3547970
Research Article

Not All Pixels Are Matched: Dense Contrastive Learning for Cross-Modality Person Re-Identification

Published: 10 October 2022

ABSTRACT

Visible-Infrared Person Re-Identification (VI-ReID) has become an emerging task for night-time surveillance systems. To reduce the cross-modality discrepancy, previous works either align features via metric learning or synthesize cross-modality images with Generative Adversarial Networks. However, feature-level alignment ignores the heterogeneity of the data itself, while generative frameworks suffer from low generation quality, limiting their applications. In this paper, we propose a dense contrastive learning framework (DCLNet), which performs pixel-to-pixel dense alignment on intermediate representations rather than the final deep feature. Its core is a new loss function that pulls views of positive pixels carrying the same semantic information closer in the shallow representation space, while pushing views of negative pixels apart. This naturally provides additional dense supervision and captures fine-grained pixel correspondence, reducing the modality gap from a new perspective. To implement it, a Part Aware Parsing (PAP) module and a Semantic Rectification Module (SRM) are introduced to learn and refine a semantic-guided mask, allowing us to efficiently find positive pairs while requiring only instance-level supervision. Extensive experiments on the public SYSU-MM01 and RegDB datasets demonstrate the superiority of our pipeline over state-of-the-art methods. Code is available at https://github.com/sunhz0117/DCLNet.
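The dense pixel-to-pixel alignment described above can be illustrated with a minimal NumPy sketch of a dense InfoNCE-style loss. This is not the authors' implementation: the function name, tensor shapes, and the use of integer part labels as a stand-in for the semantic-guided mask are all assumptions made for illustration. Pixels from the visible and infrared modalities that share a part label are treated as positive pairs; all other cross-modality pixels act as negatives.

```python
import numpy as np

def dense_contrastive_loss(vis, ir, parts_vis, parts_ir, tau=0.1):
    """Sketch of a dense cross-modality contrastive (InfoNCE) loss.

    vis:       (N, D) visible-modality pixel embeddings (hypothetical shapes).
    ir:        (M, D) infrared-modality pixel embeddings.
    parts_*:   integer semantic part label per pixel; same label across
               modalities => positive pair (stand-in for the learned mask).
    tau:       temperature of the softmax over similarities.
    """
    # L2-normalize so the dot product is a cosine similarity.
    vis = vis / np.linalg.norm(vis, axis=1, keepdims=True)
    ir = ir / np.linalg.norm(ir, axis=1, keepdims=True)

    sim = vis @ ir.T / tau                          # (N, M) similarity logits
    pos = parts_vis[:, None] == parts_ir[None, :]   # (N, M) positive-pair mask

    # Log-softmax over all infrared pixels for each visible anchor pixel.
    logp = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))

    # Average negative log-likelihood over each anchor's positive pairs.
    n_pos = np.maximum(pos.sum(axis=1), 1)
    per_anchor = -(logp * pos).sum(axis=1) / n_pos
    return per_anchor.mean()
```

With matched part labels and aligned embeddings the loss is near zero; shuffling the labels so positives point at dissimilar pixels drives it up, which is the behavior the dense supervision relies on.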



Published in:
MM '22: Proceedings of the 30th ACM International Conference on Multimedia
October 2022, 7537 pages
ISBN: 9781450392037
DOI: 10.1145/3503161
Copyright © 2022 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher: Association for Computing Machinery, New York, NY, United States

Overall acceptance rate: 995 of 4,171 submissions (24%)
