research-article

MSO: Multi-Feature Space Joint Optimization Network for RGB-Infrared Person Re-Identification

Authors:
Yajun Gao, Beijing Jiaotong University, Beijing, China
Tengfei Liang, Beijing Jiaotong University, Beijing, China
Yi Jin, Beijing Jiaotong University, Beijing, China
Xiaoyan Gu, Institute of Information Engineering, Chinese Academy of Sciences, Beijing, China
Wu Liu, JD AI Research, Beijing, China
Yidong Li, Beijing Jiaotong University, Beijing, China
Congyan Lang, Beijing Jiaotong University, Beijing, China

MM '21: Proceedings of the 29th ACM International Conference on Multimedia, October 2021, Pages 5257–5265. https://doi.org/10.1145/3474085.3475643

Published: 17 October 2021

ABSTRACT

The RGB-infrared cross-modality person re-identification (ReID) task aims to match images of the same identity across the visible and infrared modalities. Existing methods mainly use a two-stream architecture to eliminate the discrepancy between the two modalities in the final common feature space, which ignores the single-modality feature space of each modality in the shallow layers. To address this, we present a novel multi-feature space joint optimization (MSO) network, which learns modality-sharable features in both the single-modality space and the common space. First, based on the observation that edge information is modality-invariant, we propose an edge features enhancement module to enhance the modality-sharable features in each single-modality space. Specifically, we design a perceptual edge features (PEF) loss following an analysis of edge fusion strategies. To the best of our knowledge, this is the first work to propose explicit optimization in the single-modality feature space for the cross-modality ReID task. Moreover, to increase the difference between the cross-modality distance and the class distance, we introduce a novel cross-modality contrastive-center (CMCC) loss into the modality-joint constraints in the common feature space. The PEF loss and the CMCC loss jointly optimize the model in an end-to-end manner, which markedly improves the network's performance. Extensive experiments demonstrate that the proposed model significantly outperforms state-of-the-art methods on both the SYSU-MM01 and RegDB datasets.
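
The observation that edge information is modality-invariant can be illustrated with a small sketch. The code below is not the paper's edge features enhancement module or its PEF loss; it only assumes a Sobel operator as the edge detector (the abstract does not name one) and uses a hypothetical "infrared" stand-in obtained by inverting and rescaling a toy image. Under those assumptions, the raw pixels of the two images differ while their edge maps agree up to a constant scale.

```python
import numpy as np

# Standard 3x3 Sobel kernels for horizontal and vertical gradients.
SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=np.float64)
SOBEL_Y = SOBEL_X.T


def filter3x3(image, kernel):
    """Valid-mode 3x3 cross-correlation of a 2-D array with a kernel."""
    h, w = image.shape
    out = np.zeros((h - 2, w - 2), dtype=np.float64)
    for dy in range(3):
        for dx in range(3):
            out += kernel[dy, dx] * image[dy:dy + h - 2, dx:dx + w - 2]
    return out


def edge_map(image):
    """Sobel gradient magnitude of a 2-D grayscale image."""
    gx = filter3x3(image, SOBEL_X)
    gy = filter3x3(image, SOBEL_Y)
    return np.sqrt(gx ** 2 + gy ** 2)


# Toy "visible" image: a bright rectangle on a dark background.
visible = np.zeros((8, 8), dtype=np.float64)
visible[2:6, 3:7] = 1.0

# Hypothetical "infrared" stand-in: inverted and rescaled intensities.
# This is an illustrative assumption, not a model of real infrared imaging.
infrared = 0.5 * (1.0 - visible)

edges_visible = edge_map(visible)
edges_infrared = edge_map(infrared)

# Raw pixels differ between the two images, but edges sit in the same places.
pixel_gap = np.abs(visible - infrared).mean()
same_support = np.array_equal(edges_visible > 0, edges_infrared > 0)
print("mean pixel gap:", pixel_gap)
print("edges at the same locations:", same_support)
```

Because the Sobel kernels sum to zero, a constant intensity offset vanishes and a global scale only rescales the gradient magnitude, which is why the edge locations coincide in this toy case. Real visible and infrared images differ in more complex ways, so this is only a simplified picture of the motivation.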


Index Terms

Computing methodologies > Artificial intelligence > Computer vision > Computer vision tasks > Visual content-based indexing and retrieval

Published in
MM '21: Proceedings of the 29th ACM International Conference on Multimedia
October 2021
5796 pages
ISBN: 9781450386517
DOI: 10.1145/3474085
General Chairs:
Heng Tao Shen, University of Electronic Science and Technology of China, China
Yueting Zhuang, Zhejiang University, China
John R. Smith, IBM, USA
Program Chairs:
Yang Yang, University of Electronic Science and Technology of China, China
Pablo Cesar, CWI & TU Delft, The Netherlands
Florian Metze, FACEBOOK, Inc., USA
Balakrishnan Prabhakaran, University of Texas at Dallas, USA
Copyright © 2021 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
cross-modality
joint optimization
multi-feature space
person re-identification
Acceptance Rates
Overall Acceptance Rate: 995 of 4,171 submissions, 24%