research-article

Unsupervised Person Re-identification: Clustering and Fine-tuning

Authors:
Hehe Fan

Institute of Information and Control, Hangzhou Dianzi University and Center for Artificial Intelligence, University of Technology Sydney, Ultimo, Sydney, NSW, Australia

Liang Zheng

Center for Artificial Intelligence, University of Technology Sydney, Ultimo, Sydney, NSW, Australia

Chenggang Yan

Institute of Information and Control, Hangzhou Dianzi University, Hangzhou, Zhejiang, China

Yi Yang

Center for Artificial Intelligence, University of Technology Sydney, Ultimo, Sydney, NSW, Australia

ORCID: 0000-0002-0512-880X

ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 14, Issue 4, Article No. 83, pp. 1–18. https://doi.org/10.1145/3243316

Published: 10 October 2018

Abstract

The superiority of deeply learned pedestrian representations has been reported in the recent literature on person re-identification (re-ID). In this article, we consider the more pragmatic issue of learning a deep feature with no or only a few labels. We propose a progressive unsupervised learning (PUL) method to transfer pretrained deep representations to unseen domains. Our method is easy to implement and can be viewed as an effective baseline for unsupervised re-ID feature learning. Specifically, PUL iterates between (1) pedestrian clustering and (2) fine-tuning of the convolutional neural network (CNN) to improve the initialization model, which is trained on an irrelevant labeled dataset. Since the clustering results can be very noisy, we add a selection operation between clustering and fine-tuning. At the beginning, when the model is weak, the CNN is fine-tuned on a small number of reliable examples that lie near cluster centroids in the feature space. As the model becomes stronger in subsequent iterations, more images are adaptively selected as CNN training samples. Progressively, pedestrian clustering and the CNN model improve together until the algorithm converges. This process is naturally formulated as self-paced learning. We then point out promising directions that may lead to further improvement. Extensive experiments on three large-scale re-ID datasets demonstrate that PUL outputs discriminative features that improve re-ID accuracy. Our code has been released at https://github.com/hehefan/Unsupervised-Person-Re-identification-Clustering-and-Fine-tuning.
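The cluster, select, fine-tune cycle described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' released implementation: the plain k-means routine, the cosine-similarity threshold used as the reliability test, and the names `kmeans`, `select_reliable`, `progressive_loop`, `extract`, and `fine_tune` are all hypothetical choices made for this example. The CNN is abstracted behind two caller-supplied callbacks.

```python
import numpy as np


def l2_normalize(x):
    """Scale each row to unit length so dot products are cosine similarities."""
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), 1e-12)


def assign(features, centroids):
    """Return the index of the nearest centroid (squared Euclidean) per sample."""
    dists = ((features[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)


def kmeans(features, k, n_iter=20, seed=0):
    """Plain Lloyd's k-means (an assumption; any clustering method could be used)."""
    rng = np.random.default_rng(seed)
    centroids = features[rng.choice(len(features), size=k, replace=False)].copy()
    for _ in range(n_iter):
        labels = assign(features, centroids)
        for j in range(k):
            members = features[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    return centroids, assign(features, centroids)


def select_reliable(features, centroids, labels, threshold):
    """Keep samples whose cosine similarity to their own centroid exceeds threshold.

    The threshold rule is an illustrative stand-in for "near cluster centroids".
    """
    sim = (l2_normalize(features) * l2_normalize(centroids)[labels]).sum(axis=1)
    return np.flatnonzero(sim > threshold)


def progressive_loop(extract, fine_tune, images, k, threshold, n_rounds=3):
    """Alternate clustering, selection, and fine-tuning for a fixed number of rounds.

    extract(images, state) -> feature matrix; fine_tune(images, labels, state) -> state.
    Both callbacks are hypothetical placeholders for the CNN forward pass and training.
    """
    state = None
    for _ in range(n_rounds):
        feats = extract(images, state)
        centroids, labels = kmeans(feats, k)
        keep = select_reliable(feats, centroids, labels, threshold)
        # Cluster indices act as pseudo-labels for the selected samples only.
        state = fine_tune(images[keep], labels[keep], state)
    return state
```

With a fixed threshold, the selected subset can grow across rounds simply because better features pull same-identity images closer to their centroid, which is one way to read the "adaptively selected" behavior in the abstract. A fixed round count replaces a convergence test here to keep the sketch short.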

Index Terms

1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
      1. Computer vision representations
        Image representations
      2. Computer vision tasks
        Visual content-based indexing and retrieval

Published in
ACM Transactions on Multimedia Computing, Communications, and Applications Volume 14, Issue 4
Special Section on Deep Learning for Intelligent Multimedia Analytics
November 2018
221 pages
ISSN: 1551-6857
EISSN: 1551-6865
DOI: 10.1145/3282485
Editor:
Alberto Del Bimbo
University of Firenze, Italy
Copyright © 2018 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 10 October 2018
- Accepted: 1 July 2018
- Revised: 1 June 2018
- Received: 1 February 2018

Author Tags
Large-scale person re-identification
clustering
convolutional neural network
unsupervised learning
Qualifiers
- research-article
- Research
- Refereed

Article Metrics
- Total Citations: 474
- Total Downloads: 1,956
- Downloads (Last 12 months): 202
- Downloads (Last 6 weeks): 18