research-article

Unsupervised Person Re-identification: Clustering and Fine-tuning

Published: 10 October 2018

Abstract

The superiority of deeply learned pedestrian representations has been reported in very recent literature on person re-identification (re-ID). In this article, we consider the more pragmatic issue of learning a deep feature with no or only a few labels. We propose a progressive unsupervised learning (PUL) method to transfer pretrained deep representations to unseen domains. Our method is easy to implement and can be viewed as an effective baseline for unsupervised re-ID feature learning. Specifically, PUL iterates between (1) pedestrian clustering and (2) fine-tuning of the convolutional neural network (CNN) to improve the initialization model trained on an irrelevant labeled dataset. Since the clustering results can be very noisy, we add a selection operation between the clustering and fine-tuning steps. At the beginning, when the model is weak, the CNN is fine-tuned on a small number of reliable examples that lie near cluster centroids in the feature space. As the model becomes stronger in subsequent iterations, more images are adaptively selected as CNN training samples. Progressively, pedestrian clustering and the CNN model are improved simultaneously until the algorithm converges. This process is naturally formulated as self-paced learning. We then point out promising directions that may lead to further improvement. Extensive experiments on three large-scale re-ID datasets demonstrate that PUL outputs discriminative features that improve the re-ID accuracy. Our code has been released at https://github.com/hehefan/Unsupervised-Person-Re-identification-Clustering-and-Fine-tuning.
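The cluster, select, fine-tune cycle described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the released implementation: the plain k-means routine, the cosine-similarity selection rule, and the names `kmeans`, `select_reliable`, `pul_loop`, `extract`, and `fine_tune` are all assumptions made for the example, and the CNN is stood in for by two caller-supplied callbacks.

```python
import numpy as np


def l2_normalize(x):
    """Scale each row to unit length so dot products are cosine similarities."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def assign(features, centroids):
    """Index of the nearest centroid (squared Euclidean) for every sample."""
    dists = ((features[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)


def kmeans(features, k, iters=20, seed=0):
    """Plain Lloyd's k-means (an assumed stand-in); returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    centroids = features[rng.choice(len(features), size=k, replace=False)].copy()
    for _ in range(iters):
        labels = assign(features, centroids)
        for j in range(k):
            members = features[labels == j]
            if len(members):  # keep the old centroid if a cluster went empty
                centroids[j] = members.mean(axis=0)
    return centroids, assign(features, centroids)


def select_reliable(features, centroids, labels, threshold):
    """Indices of samples whose cosine similarity to their own centroid
    exceeds `threshold` (assumed selection rule; features must be unit length)."""
    sims = (features * l2_normalize(centroids)[labels]).sum(axis=1)
    return np.flatnonzero(sims > threshold)


def pul_loop(extract, fine_tune, images, k, threshold, rounds):
    """Alternate clustering, selection, and fine-tuning for `rounds` iterations.

    `extract(images)` returns a feature matrix and `fine_tune(images, labels)`
    updates the model in place; both are hypothetical callbacks.
    Returns the number of selected samples per round.
    """
    history = []
    for _ in range(rounds):
        feats = l2_normalize(extract(images))
        centroids, labels = kmeans(feats, k)
        keep = select_reliable(feats, centroids, labels, threshold)
        fine_tune(images[keep], labels[keep])  # cluster ids act as pseudo-labels
        history.append(len(keep))
    return history
```

In this sketch a stronger feature extractor tightens the clusters, so more samples pass the fixed threshold in later rounds, which is one way to read the "adaptively selected" behaviour the abstract describes.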


      • Published in

        ACM Transactions on Multimedia Computing, Communications, and Applications  Volume 14, Issue 4
        Special Section on Deep Learning for Intelligent Multimedia Analytics
        November 2018
        221 pages
        ISSN:1551-6857
        EISSN:1551-6865
        DOI:10.1145/3282485

        Copyright © 2018 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 10 October 2018
        • Accepted: 1 July 2018
        • Revised: 1 June 2018
        • Received: 1 February 2018

        Qualifiers

        • research-article
        • Research
        • Refereed
