Crowd density estimation based on multi scale features fusion network with reverse attention mechanism

Applied Intelligence

Abstract

Deep learning has made substantial progress in crowd counting, but in practical applications existing methods still produce large counting errors because of interference factors such as perspective distortion and complex backgrounds. To address these problems, this paper designs a multi-scale feature fusion network (IA-MFFCN) based on the reverse attention mechanism, which maps an image to a crowd density map for counting. The network consists of three parts: a feature extraction module, an inverse attention module, and a back-end module. First, to overcome perspective distortion, a deeper single-column CNN is designed as the feature extraction module to extract multi-scale feature information and fuse it. Second, to avoid interference from complex backgrounds, an inverse attention module is designed that uses a multi-scale inverse attention mechanism to reduce the influence of noise on counting accuracy. Finally, to generate a high-quality crowd density map, dilated convolution is introduced in the back-end module. In addition, to make the network more sensitive to the crowd count, a comprehensive loss function that combines the Euclidean loss with a predicted-population (count) loss is designed to improve training accuracy and produce more accurate density values. Experiments show that, compared with the baseline algorithms, the proposed algorithm significantly reduces the mean absolute error (MAE) and mean square error (MSE) on the ShanghaiTech, UCF_CC_50, and WorldExpo'10 datasets.
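The abstract names a loss that combines a Euclidean (pixel-wise) term with a predicted-population (count) term, and reports MAE and MSE, but it does not give the formulas. The following is a minimal NumPy sketch under stated assumptions: the two terms are added with a weight `lam`, the count term is the absolute difference of the density-map sums, and MSE is the plain mean of squared count errors. The function names, the weight `lam` and its default value are hypothetical and are not taken from the paper.

```python
import numpy as np

def euclidean_loss(pred_maps, gt_maps):
    """Half the squared L2 distance between density maps, averaged over the batch.

    Both arrays have shape (batch, height, width).
    """
    diff = pred_maps - gt_maps
    return 0.5 * float(np.mean(np.sum(diff ** 2, axis=(1, 2))))

def count_loss(pred_maps, gt_maps):
    """Mean absolute difference between predicted and true head counts.

    The count of an image is the sum of its density map.
    ASSUMPTION: the paper's count term may be defined differently.
    """
    pred_counts = pred_maps.sum(axis=(1, 2))
    gt_counts = gt_maps.sum(axis=(1, 2))
    return float(np.mean(np.abs(pred_counts - gt_counts)))

def combined_loss(pred_maps, gt_maps, lam=0.1):
    """Euclidean term plus a weighted count term.

    ASSUMPTION: `lam` and its default are illustrative only.
    """
    return euclidean_loss(pred_maps, gt_maps) + lam * count_loss(pred_maps, gt_maps)

def mae(pred_counts, gt_counts):
    """Mean absolute error over per-image counts."""
    pred = np.asarray(pred_counts, dtype=float)
    gt = np.asarray(gt_counts, dtype=float)
    return float(np.mean(np.abs(pred - gt)))

def mse(pred_counts, gt_counts):
    """Mean of squared count errors (no square root is taken here)."""
    pred = np.asarray(pred_counts, dtype=float)
    gt = np.asarray(gt_counts, dtype=float)
    return float(np.mean((pred - gt) ** 2))
```

For example, `combined_loss(pred, gt)` can be evaluated on a batch of predicted and ground-truth density maps during training, while `mae` and `mse` are applied to the per-image counts at test time. Some crowd-counting work reports the square root of the mean squared count error under the name MSE; the abstract does not say which convention is used here.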

Acknowledgements

The authors are grateful for the collaborative funding support from the Humanity and Social Science Foundation of Ministry of Education, China (21YJAZH077).

Author information

Correspondence to Rui-Sheng Jia or Hong-Mei Sun.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Li, YC., Jia, RS., Hu, YX. et al. Crowd density estimation based on multi scale features fusion network with reverse attention mechanism. Appl Intell 52, 13097–13113 (2022). https://doi.org/10.1007/s10489-022-03187-y

  • DOI: https://doi.org/10.1007/s10489-022-03187-y
