Semi-supervised Crowd Counting via Self-training on Surrogate Tasks

  • Conference paper
Computer Vision – ECCV 2020 (ECCV 2020)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 12360)

Abstract

Most existing crowd counting systems rely on object location annotations, which can be expensive to obtain. To reduce the annotation cost, one attractive solution is to leverage a large number of unlabeled images to build a crowd counting model in a semi-supervised fashion. This paper tackles the semi-supervised crowd counting problem from the perspective of feature learning. Our key idea is to use the unlabeled images to train a generic feature extractor rather than the entire network of a crowd counter. The rationale for this design is that learning the feature extractor can be more reliable and more robust to the inevitably noisy supervision generated from the unlabeled data. Moreover, on top of a good feature extractor, it is possible to build a density map regressor with far fewer density map annotations. Specifically, we propose a novel semi-supervised crowd counting method built upon two components: (1) a set of inter-related binary segmentation tasks is derived from the original density map regression task to serve as surrogate prediction targets; (2) the surrogate target predictors are learned from both labeled and unlabeled data using a proposed self-training scheme that fully exploits the underlying constraints of these binary segmentation tasks. Through experiments, we show that the proposed method is superior to the existing semi-supervised crowd counting method and other representative baselines.
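As a rough illustration of component (1), the sketch below assumes that each surrogate task asks whether a pixel's density exceeds one of a set of increasing thresholds. The abstract does not state how the tasks are derived, so the thresholds, the function names `surrogate_masks` and `is_consistent`, and the toy density values are all hypothetical. Under this assumption the tasks are inter-related because a pixel that exceeds a higher threshold must also exceed every lower one, which is the kind of constraint a self-training scheme could exploit.

```python
from typing import List, Sequence

Grid = List[List[float]]
Mask = List[List[int]]


def surrogate_masks(density: Grid, thresholds: Sequence[float]) -> List[Mask]:
    """Build one binary mask per threshold: 1 where the density exceeds it.

    Assumption: thresholds are given in increasing order.
    """
    return [
        [[1 if value > t else 0 for value in row] for row in density]
        for t in thresholds
    ]


def is_consistent(masks: List[Mask]) -> bool:
    """Check the inter-task constraint between consecutive masks.

    A pixel marked 1 for a higher threshold must also be marked 1
    for the next lower threshold.
    """
    for lower, higher in zip(masks, masks[1:]):
        for row_lo, row_hi in zip(lower, higher):
            if any(hi > lo for lo, hi in zip(row_lo, row_hi)):
                return False
    return True


# Toy 2x2 density map and hypothetical thresholds.
density = [[0.0, 0.02], [0.3, 1.2]]
thresholds = [0.01, 0.1, 1.0]
masks = surrogate_masks(density, thresholds)
```

Masks derived from a ground-truth density map always satisfy `is_consistent`. Independent predictions on unlabeled images need not, so a check of this kind could in principle be used to filter or correct pseudo-labels.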

Y. Liu and L. Liu contributed equally.

Notes

  1. As the unlabeled set contains more images than the labeled set, we oversample the labeled images to ensure that similar numbers of labeled and unlabeled images occur in each batch.
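The oversampling described in this note could be sketched as follows. This is a minimal illustration, not the authors' data loader: the function name `balanced_batches`, the exact half-and-half split, and the cycling strategy are assumptions (the note only says the amounts should be similar).

```python
import itertools
import random
from typing import Iterator, List, Sequence, Tuple


def balanced_batches(
    labeled: Sequence[str],
    unlabeled: Sequence[str],
    batch_size: int,
    seed: int = 0,
) -> Iterator[Tuple[List[str], List[str]]]:
    """Yield (labeled, unlabeled) halves of each batch.

    The smaller labeled set is cycled (oversampled) so that every batch
    holds as many labeled images as unlabeled ones.
    """
    half = batch_size // 2
    if half < 1:
        raise ValueError("batch_size must be at least 2")
    if not labeled:
        raise ValueError("labeled set must not be empty")

    rng = random.Random(seed)
    unlabeled_pool = list(unlabeled)
    labeled_pool = list(labeled)
    rng.shuffle(unlabeled_pool)
    rng.shuffle(labeled_pool)

    # Repeating the labeled pool indefinitely is what oversamples it.
    labeled_cycle = itertools.cycle(labeled_pool)

    # One pass over the unlabeled set; an incomplete final batch is dropped.
    for start in range(0, len(unlabeled_pool) - half + 1, half):
        unlabeled_half = unlabeled_pool[start:start + half]
        labeled_half = [next(labeled_cycle) for _ in range(half)]
        yield labeled_half, unlabeled_half


# Toy file names: 2 labeled images against 8 unlabeled ones.
batches = list(
    balanced_batches(["L0", "L1"], [f"U{i}" for i in range(8)], batch_size=4)
)
```

With these toy inputs each labeled image is reused several times within one pass over the unlabeled set, while every unlabeled image is seen once.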

Acknowledgement

This work was supported by the Key Research and Development Program of Sichuan Province (2019YFG0409). Lingqiao Liu was in part supported by ARC DECRA Fellowship DE170101259.

Corresponding author

Correspondence to Yinjie Lei.

1 Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 148 KB)

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Liu, Y., Liu, L., Wang, P., Zhang, P., Lei, Y. (2020). Semi-supervised Crowd Counting via Self-training on Surrogate Tasks. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, JM. (eds) Computer Vision – ECCV 2020. ECCV 2020. Lecture Notes in Computer Science, vol 12360. Springer, Cham. https://doi.org/10.1007/978-3-030-58555-6_15

  • DOI: https://doi.org/10.1007/978-3-030-58555-6_15

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-58554-9

  • Online ISBN: 978-3-030-58555-6

  • eBook Packages: Computer Science, Computer Science (R0)
