Semi-supervised Crowd Counting via Self-training on Surrogate Tasks

Liu, Yan; Liu, Lingqiao; Wang, Peng; Zhang, Pingping; Lei, Yinjie

doi:10.1007/978-3-030-58555-6_15

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 12360)

Included in the following conference series: European Conference on Computer Vision

Abstract

Most existing crowd counting systems rely on object location annotations, which can be expensive to obtain. To reduce the annotation cost, one attractive solution is to leverage a large number of unlabeled images to build a crowd counting model in a semi-supervised fashion. This paper tackles the semi-supervised crowd counting problem from the perspective of feature learning. Our key idea is to leverage the unlabeled images to train a generic feature extractor rather than the entire network of a crowd counter. The rationale for this design is that learning the feature extractor can be more reliable and robust to the inevitably noisy supervision generated from the unlabeled data. Moreover, on top of a good feature extractor, it is possible to build a density map regressor with far fewer density map annotations. Specifically, we propose a novel semi-supervised crowd counting method built upon two innovative components: (1) a set of inter-related binary segmentation tasks is derived from the original density map regression task as the surrogate prediction target; (2) the surrogate target predictors are learned from both labeled and unlabeled data using a proposed self-training scheme that fully exploits the underlying constraints of these binary segmentation tasks. Experiments show that the proposed method is superior to the existing semi-supervised crowd counting method and other representative baselines.
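
As a rough illustration of the two components named in the abstract, the following sketch derives a set of binary segmentation targets from a density map and then builds pseudo-labels that respect a constraint between those targets. The abstract does not say how the binary tasks are defined. Thresholding the density map at increasing values, the threshold values themselves, the confidence cut-offs `hi` and `lo`, and both function names are assumptions made here for illustration, not the authors' implementation.

```python
import numpy as np


def density_to_binary_targets(density, thresholds):
    """Return a (K, H, W) boolean array.

    Slice k marks the pixels whose density exceeds thresholds[k].
    Thresholds are assumed to be sorted in ascending order.
    """
    thresholds = np.asarray(thresholds, dtype=density.dtype)
    return density[None, :, :] > thresholds[:, None, None]


def propagate_pseudo_labels(probs, hi=0.9, lo=0.1):
    """Build pseudo-labels from (K, H, W) predicted probabilities.

    Returns an int8 array with 1 (positive), 0 (negative), -1 (ignored).
    Assumed constraint: a pixel that exceeds a higher threshold also
    exceeds every lower one, and a pixel that fails a lower threshold
    also fails every higher one.
    """
    pos = probs >= hi
    neg = probs <= lo
    # Confident positives flow down to all lower-threshold tasks.
    pos = np.flip(np.logical_or.accumulate(np.flip(pos, axis=0), axis=0), axis=0)
    # Confident negatives flow up to all higher-threshold tasks.
    neg = np.logical_or.accumulate(neg, axis=0)
    labels = np.full(probs.shape, -1, dtype=np.int8)
    labels[pos & ~neg] = 1
    labels[neg & ~pos] = 0  # contradictory pixels stay ignored (-1)
    return labels
```

In this sketch, pixels whose propagated labels contradict each other are left unlabeled rather than forced to either class, which is one simple way to limit the effect of noisy predictions on unlabeled images.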

Y. Liu and L. Liu contributed equally.

Notes

1. As the unlabeled set contains more images than the labeled set, we oversample labeled images to ensure that a similar number of labeled and unlabeled images occur in a single batch.
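
One way to picture the oversampling described in this note is the sketch below. The note only states that labeled images are oversampled so that similar numbers of labeled and unlabeled images appear in a batch. The even split per batch, the reshuffling of the labeled pool on each pass, and the function name `balanced_batches` are assumptions for illustration.

```python
import itertools
import random


def balanced_batches(labeled, unlabeled, batch_size, seed=0):
    """Yield (labeled_part, unlabeled_part) pairs of roughly equal size.

    Each unlabeled item is used at most once per epoch; the smaller
    labeled set is cycled (oversampled) to fill its share of every batch.
    """
    if not labeled:
        raise ValueError("labeled set must not be empty")
    if batch_size < 2:
        raise ValueError("batch_size must be at least 2")
    rng = random.Random(seed)
    half = batch_size // 2          # labeled share of the batch (assumed)
    step = batch_size - half        # unlabeled share of the batch
    unlabeled = list(unlabeled)
    rng.shuffle(unlabeled)

    def labeled_stream():
        # Endless stream over the labeled set, reshuffled on every pass.
        while True:
            pool = list(labeled)
            rng.shuffle(pool)
            yield from pool

    stream = labeled_stream()
    for start in range(0, len(unlabeled) - step + 1, step):
        labeled_part = list(itertools.islice(stream, half))
        yield labeled_part, unlabeled[start:start + step]
```

With 2 labeled and 8 unlabeled items and `batch_size=4`, this yields four batches, each holding two labeled and two unlabeled items, so every labeled item is reused several times per epoch.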

Acknowledgement

This work was supported by the Key Research and Development Program of Sichuan Province (2019YFG0409). Lingqiao Liu was in part supported by ARC DECRA Fellowship DE170101259.

Author information

Authors and Affiliations

College of Electronics and Information Engineering, Sichuan University, Chengdu, China
Yan Liu & Yinjie Lei
School of Computer Science, The University of Adelaide, Adelaide, Australia
Lingqiao Liu
School of Computing and Information Technology, University of Wollongong, Wollongong, Australia
Peng Wang
School of Artificial Intelligence, Dalian University of Technology, Dalian, China
Pingping Zhang

Corresponding author

Correspondence to Yinjie Lei.

Editor information

Editors and Affiliations

University of Oxford, Oxford, UK
Andrea Vedaldi
Graz University of Technology, Graz, Austria
Horst Bischof
University of Freiburg, Freiburg im Breisgau, Germany
Thomas Brox
University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
Jan-Michael Frahm

Cite this paper

Liu, Y., Liu, L., Wang, P., Zhang, P., Lei, Y. (2020). Semi-supervised Crowd Counting via Self-training on Surrogate Tasks. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds) Computer Vision – ECCV 2020. Lecture Notes in Computer Science, vol 12360. Springer, Cham. https://doi.org/10.1007/978-3-030-58555-6_15

DOI: https://doi.org/10.1007/978-3-030-58555-6_15
Published: 16 November 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-58554-9
Online ISBN: 978-3-030-58555-6
eBook Packages: Computer Science, Computer Science (R0)