Scale and density invariant head detection deep model for crowd counting in pedestrian crowds

  • Original article
  • The Visual Computer

Abstract

Crowd counting in high-density crowds is of significant importance for crowd safety and crowd management. Existing state-of-the-art methods employ regression models to count the number of people in an image. However, regression models produce only a count and cannot localize the individuals in the scene. Detection-based crowd counting in high-density crowds, on the other hand, is challenging because of significant variations in scale, pose and appearance. Variations in pose and appearance can be handled by large-capacity convolutional neural networks. The problem of scale, however, lies at the heart of every detector and must be addressed for effective crowd counting. In this paper, we propose an end-to-end scale-invariant head detection framework that can handle a broad range of scales. We demonstrate that scale variations can be handled by modeling a set of specialized scale-specific convolutional neural networks with different receptive fields. These scale-specific detectors are combined into a single backbone network, whose parameters are optimized in an end-to-end fashion. We evaluate our framework on two challenging benchmark datasets, UCF-QNRF and UCSD. The experimental results demonstrate that the proposed framework outperforms existing methods by a large margin.
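To make the notion of "different receptive fields" concrete, the following minimal sketch computes the receptive field of a stack of convolution or pooling layers using the standard recurrence (each layer adds `(kernel - 1)` times the accumulated stride). The function name `receptive_field` and the three branch configurations are hypothetical and chosen only for illustration; they are not the layer settings used in the paper, and dilation and padding are ignored.

```python
from typing import Dict, List, Tuple

Layer = Tuple[int, int]  # (kernel_size, stride)


def receptive_field(layers: List[Layer]) -> int:
    """Receptive field, in input pixels, of one output unit of a layer stack.

    Uses the standard recurrence: rf += (kernel - 1) * jump, jump *= stride.
    Dilation and padding are not modeled.
    """
    rf = 1    # one output unit initially covers a single input pixel
    jump = 1  # distance, in input pixels, between adjacent units of the current layer
    for kernel, stride in layers:
        rf += (kernel - 1) * jump
        jump *= stride
    return rf


# Hypothetical scale-specific branches (illustrative only, not the paper's design).
branches: Dict[str, List[Layer]] = {
    "small_heads": [(3, 1), (3, 1), (3, 1)],
    "medium_heads": [(5, 1), (5, 1), (5, 1)],
    "large_heads": [(7, 1), (2, 2), (7, 1), (7, 1)],
}

if __name__ == "__main__":
    for name, layers in branches.items():
        print(f"{name}: receptive field = {receptive_field(layers)} px")
```

Under these assumed settings, larger kernels and an intermediate stride-2 layer let a branch cover a larger image region per output unit, which is the sense in which a branch can specialize in larger heads.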

Acknowledgements

We gratefully acknowledge the support of NVIDIA Corporation with the donation of the Titan Xp GPU for this research.

Author information

Corresponding author

Correspondence to Sultan Daud Khan.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Khan, S.D., Basalamah, S. Scale and density invariant head detection deep model for crowd counting in pedestrian crowds. Vis Comput 37, 2127–2137 (2021). https://doi.org/10.1007/s00371-020-01974-7

  • DOI: https://doi.org/10.1007/s00371-020-01974-7
