DOI: 10.1145/3394171.3414014
research-article

KTN: Knowledge Transfer Network for Multi-person DensePose Estimation

Published: 12 October 2020

ABSTRACT

In this paper, we address multi-person densepose estimation, which aims to learn dense correspondences between the 2D pixels of the human body and a 3D surface. The task remains challenging in real-world scenes because of scale variation, occlusion, and insufficient annotations. In particular, we address two main problems: 1) how to design a simple yet effective pipeline for densepose estimation; and 2) how to equip this pipeline to handle limited annotations and class-imbalanced labels. To tackle these problems, we develop a novel densepose estimation framework based on a two-stage pipeline, called Knowledge Transfer Network (KTN). Unlike existing works that directly propagate the pyramidal base features of regions, we enhance their representation power with a multi-instance decoder (MID), which distinguishes the target instance from interfering instances and the background. We then introduce a knowledge transfer machine (KTM), which improves densepose estimation by exploiting external commonsense knowledge. Notably, with the help of the KTM, current densepose estimation systems (whether based on R-CNN or fully convolutional frameworks) can be improved in densepose estimation accuracy. Extensive experiments on densepose estimation benchmarks demonstrate the superiority and generalizability of our approach. Our code and models will be made publicly available.
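To make the two-stage design concrete, the following is a minimal, hypothetical PyTorch sketch of the idea described above: pooled region features are first refined by a multi-instance decoder (MID), then classified by a knowledge transfer machine (KTM) that propagates an external part-relation graph through the part classifier. All module names, tensor shapes, and the placeholder relation matrix are illustrative assumptions, not the authors' released code.

# Hypothetical sketch of a KTN-style head (not the authors' implementation).
import torch
import torch.nn as nn
import torch.nn.functional as F


class MultiInstanceDecoder(nn.Module):
    """Illustrative stand-in for MID: refines pooled region features so the
    target instance is separated from overlapping instances and background."""

    def __init__(self, channels: int = 256):
        super().__init__()
        self.refine = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.ReLU(inplace=True),
        )
        # Foreground attention map that suppresses interfering instances.
        self.attend = nn.Conv2d(channels, 1, 1)

    def forward(self, roi_feats: torch.Tensor) -> torch.Tensor:
        x = self.refine(roi_feats)
        attn = torch.sigmoid(self.attend(x))
        return roi_feats + attn * x


class KnowledgeTransferMachine(nn.Module):
    """Illustrative stand-in for KTM: propagates a fixed part-relation graph
    through the weights of the part-classification layer."""

    def __init__(self, num_parts: int = 25, channels: int = 256):
        super().__init__()
        # Assumed external knowledge: a row-normalized part-relation matrix.
        # Random here; in practice it would encode commonsense part relations.
        relation = torch.rand(num_parts, num_parts)
        self.register_buffer("relation", relation / relation.sum(dim=1, keepdim=True))
        self.classifier = nn.Conv2d(channels, num_parts, 1)
        self.transform = nn.Linear(channels, channels)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # Base per-part classifier weights: (num_parts, channels).
        w = self.classifier.weight.view(self.classifier.out_channels, -1)
        # Transfer knowledge between part classifiers via the relation graph.
        w = w + self.transform(self.relation @ w)
        return F.conv2d(feats, w.view_as(self.classifier.weight), self.classifier.bias)


class KTNHead(nn.Module):
    """Two-stage head: MID feature enhancement followed by KTM part logits."""

    def __init__(self, channels: int = 256, num_parts: int = 25):
        super().__init__()
        self.mid = MultiInstanceDecoder(channels)
        self.ktm = KnowledgeTransferMachine(num_parts, channels)

    def forward(self, roi_feats: torch.Tensor) -> torch.Tensor:
        return self.ktm(self.mid(roi_feats))


if __name__ == "__main__":
    head = KTNHead()
    rois = torch.randn(4, 256, 28, 28)  # 4 pooled person regions
    print(head(rois).shape)  # torch.Size([4, 25, 28, 28])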


Supplemental Material: 3394171.3414014.mp4 (mp4, 31.5 MB)


Published in:
MM '20: Proceedings of the 28th ACM International Conference on Multimedia
October 2020, 4889 pages
ISBN: 9781450379885
DOI: 10.1145/3394171
Copyright © 2020 ACM


Publisher: Association for Computing Machinery, New York, NY, United States


Acceptance Rates: Overall acceptance rate of 995 of 4,171 submissions, 24%

