Abstract
The superiority of deeply learned pedestrian representations has been reported in the recent person re-identification (re-ID) literature. In this article, we consider the more pragmatic issue of learning a deep feature with no or only a few labels. We propose a progressive unsupervised learning (PUL) method to transfer pretrained deep representations to unseen domains. Our method is easy to implement and can be viewed as an effective baseline for unsupervised re-ID feature learning. Specifically, PUL iterates between (1) pedestrian clustering and (2) fine-tuning of the convolutional neural network (CNN) to improve the initial model, which is trained on an unrelated labeled dataset. Since the clustering results can be very noisy, we add a selection operation between the clustering and the fine-tuning. At the beginning, when the model is weak, the CNN is fine-tuned on a small number of reliable examples that lie near cluster centroids in the feature space. As the model becomes stronger in subsequent iterations, more images are adaptively selected as CNN training samples. Progressively, pedestrian clustering and the CNN model are improved simultaneously until the algorithm converges. This process is naturally formulated as self-paced learning. We then point out promising directions that may lead to further improvement. Extensive experiments on three large-scale re-ID datasets demonstrate that PUL outputs discriminative features that improve re-ID accuracy. Our code has been released at https://github.com/hehefan/Unsupervised-Person-Re-identification-Clustering-and-Fine-tuning.
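The selection operation between clustering and fine-tuning can be illustrated with a minimal sketch. It is not the released implementation: the abstract does not state the similarity measure or the selection rule, so cosine similarity and a fixed threshold are assumptions here, as are the function name `select_reliable`, the threshold value 0.95, and the hand-written 2-D toy features. The centroids are assumed to come from a clustering step that has already run, and the CNN fine-tuning step is omitted.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def cosine(a, b):
    """Cosine similarity between two vectors."""
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))


def select_reliable(features, centroids, threshold):
    """Return (sample_index, cluster_id) pairs for samples near a centroid.

    Each sample is assigned to its most similar centroid and kept only if
    that similarity reaches `threshold` (an assumed selection rule).
    """
    selected = []
    for i, feature in enumerate(features):
        sims = [cosine(feature, c) for c in centroids]
        k = max(range(len(centroids)), key=sims.__getitem__)
        if sims[k] >= threshold:
            selected.append((i, k))
    return selected


# Assumed output of a clustering step: two cluster centroids.
centroids = [[1.0, 0.0], [0.0, 1.0]]

# Toy features mimicking a weak model: samples are scattered between clusters.
weak_features = [[1.0, 0.0], [0.7, 0.4], [0.6, 0.6], [0.4, 0.8], [0.0, 1.0]]

# Toy features mimicking a stronger model: samples sit close to a centroid.
strong_features = [[1.0, 0.0], [0.95, 0.1], [0.9, 0.2], [0.1, 0.9], [0.0, 1.0]]

weak_selected = select_reliable(weak_features, centroids, threshold=0.95)
strong_selected = select_reliable(strong_features, centroids, threshold=0.95)

print(len(weak_selected), "samples selected with scattered features")
print(len(strong_selected), "samples selected with compact features")
```

With the threshold held fixed, the scattered features yield fewer selected samples than the compact ones, which mirrors the adaptive growth of the training set described above. In a full loop, the selected pairs would serve as pseudo-labeled training data for the next round of fine-tuning.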