Abstract
Most existing crowd counting systems rely on object location annotations, which can be expensive to obtain. To reduce the annotation cost, one attractive solution is to leverage a large number of unlabeled images to build a crowd counting model in a semi-supervised fashion. This paper tackles the semi-supervised crowd counting problem from the perspective of feature learning. Our key idea is to leverage the unlabeled images to train a generic feature extractor rather than the entire network of a crowd counter. The rationale of this design is that learning the feature extractor can be more reliable and robust to the inevitably noisy supervision generated from the unlabeled data. Moreover, on top of a good feature extractor, it is possible to build a density map regressor with far fewer density map annotations. Specifically, we propose a novel semi-supervised crowd counting method built upon two innovative components: (1) a set of inter-related binary segmentation tasks is derived from the original density map regression task as the surrogate prediction target; (2) the surrogate target predictors are learned from both labeled and unlabeled data using a proposed self-training scheme that fully exploits the underlying constraints of these binary segmentation tasks. Through experiments, we show that the proposed method is superior to the existing semi-supervised crowd counting method and other representative baselines.
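To make the idea of inter-related binary segmentation tasks concrete, the following minimal sketch shows one plausible construction. It assumes the binary targets are obtained by thresholding the density map at increasing values, which the abstract does not state; the function names, threshold values, and toy density map are all hypothetical. Under this assumption, the constraint linking the tasks is nesting: a pixel that is positive at a higher threshold must also be positive at every lower one.

```python
from typing import List, Sequence

Mask = List[List[int]]


def surrogate_masks(density: Sequence[Sequence[float]],
                    thresholds: Sequence[float]) -> List[Mask]:
    """Return one binary mask per threshold (ascending): 1 where density > threshold.

    Thresholding is an assumed way of deriving the binary tasks,
    used here for illustration only.
    """
    return [[[1 if value > t else 0 for value in row] for row in density]
            for t in sorted(thresholds)]


def is_nested(masks: Sequence[Mask]) -> bool:
    """Check the inter-task constraint: each higher-threshold mask
    must be a subset of the mask for the next lower threshold."""
    for lower, higher in zip(masks, masks[1:]):
        for row_lo, row_hi in zip(lower, higher):
            if any(hi > lo for lo, hi in zip(row_lo, row_hi)):
                return False
    return True


# Toy 2x3 density map and hypothetical thresholds.
density = [[0.0, 0.2, 0.6],
           [0.1, 0.9, 0.4]]
masks = surrogate_masks(density, thresholds=[0.05, 0.3, 0.5])
consistent = is_nested(masks)
```

A check like `is_nested` could, in principle, be used to flag pseudo-labels on unlabeled images whose per-task predictions contradict each other; whether the paper's self-training scheme uses the constraints in this way is not specified in the abstract.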
Y. Liu and L. Liu contributed equally.
Notes
1. As the unlabeled set contains more images than the labeled set, we oversample labeled images so that a similar number of labeled and unlabeled images occur in each batch.
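The oversampling described in this note can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: it assumes an exact half-and-half split per batch and sampling labeled images with replacement, and the helper name `balanced_batches` and the toy identifiers are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def balanced_batches(labeled: Sequence[str],
                     unlabeled: Sequence[str],
                     batch_size: int,
                     seed: int = 0) -> List[Tuple[List[str], List[str]]]:
    """Build batches holding equal numbers of labeled and unlabeled items.

    Each unlabeled item is used once per epoch; labeled items are drawn
    with replacement, so the smaller labeled set is oversampled.
    The 50/50 split is an assumption made for this sketch.
    """
    rng = random.Random(seed)
    half = batch_size // 2
    labeled_pool = list(labeled)
    shuffled = list(unlabeled)
    rng.shuffle(shuffled)

    batches = []
    # Step through the unlabeled set in chunks of `half`, dropping any remainder.
    for start in range(0, len(shuffled) - half + 1, half):
        unlabeled_part = shuffled[start:start + half]
        labeled_part = [rng.choice(labeled_pool) for _ in range(half)]
        batches.append((labeled_part, unlabeled_part))
    return batches


# Toy example: 2 labeled and 8 unlabeled image identifiers.
labeled_ids = ["L0", "L1"]
unlabeled_ids = [f"U{i}" for i in range(8)]
batches = balanced_batches(labeled_ids, unlabeled_ids, batch_size=4)
```

With these toy inputs, every batch contains two labeled and two unlabeled identifiers, and the labeled identifiers repeat across batches while each unlabeled identifier appears once.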
Acknowledgement
This work was supported by the Key Research and Development Program of Sichuan Province (2019YFG0409). Lingqiao Liu was in part supported by ARC DECRA Fellowship DE170101259.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Liu, Y., Liu, L., Wang, P., Zhang, P., Lei, Y. (2020). Semi-supervised Crowd Counting via Self-training on Surrogate Tasks. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, JM. (eds) Computer Vision – ECCV 2020. ECCV 2020. Lecture Notes in Computer Science, vol 12360. Springer, Cham. https://doi.org/10.1007/978-3-030-58555-6_15
DOI: https://doi.org/10.1007/978-3-030-58555-6_15
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-58554-9
Online ISBN: 978-3-030-58555-6
eBook Packages: Computer Science, Computer Science (R0)