Abstract
Deep learning has made substantial progress in crowd counting, but in practical applications, due to interference factors such as perspective distortion and complex background, the existing methods still have large errors in counting. In response to the above problems, this paper designs a multi-scale feature fusion network (IA-MFFCN) based on the reverse attention mechanism, which maps the image to the crowd density map for counting. The network consists of three parts: feature extraction module, inverse attention module, and back-end module. First, to overcome the problem of perspective distortion, deeper single-column CNNs was designed as a feature extraction module to extract multi-scale feature information and merge them; second, to avoid interference of complex backgrounds, the inverse attention module was designed, through the multi-scale inverse attention mechanism, reducing the influence of noise on counting accuracy. Finally, to generate a high-quality crowd density map, dilation convolution was introduced. Simultaneously, to enhance the sensitivity of the network to crowd counting, a comprehensive loss function based on Euclidean loss and predicted population loss is designed to improve training accuracy, to produce a more accurate density value. Experiments show that compared with the comparison algorithm, the algorithm in this paper has a significant reduction in the mean absolute error (MAE) and mean square error (MSE) on the ShanghaiTech dataset, UCF_CC_50 dataset and WorldExpo`10 dataset.
Similar content being viewed by others
References
Xu M, Li C, Lv P, Lin N, Hou R, Zhou B (2017) An efficient method of crowd aggregation computation in public areas. IEEE Trans Circuits Syst Video Technol 28(10):2814–2825
Idrees H, Soomro K, Shah M (2015) Detecting humans in dense crowds using locally-consistent scale prior and global occlusion reasoning. IEEE Trans Pattern Anal Mach Intell 37(10):1986–1998
Idrees H, Saleemi I, Seibert C, Shah M (2013). Multi-source multi-scale counting in extremely dense crowd images. In Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR) 2547–2554
Hu C, Wang Y, Gu J (2020) Cross-domain intelligent fault classification of bearings based on tensor-aligned invariant subspace learning and two-dimensional convolutional neural networks. Knowledge-Based Systems 209:106214
Hu C, He S, Wang Y (2021) A classification method to detect faults in a rotating machinery based on kernelled support tensor machine and multilinear principal component analysis. Appl Intell 51(4):2609–2621
Babu Sam D, Surya S, Venkatesh Babu R (2017). Switching convolutional neural network for crowd density estimation. In Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR) 5744–5752
Onoro-Rubio D, López-Sastre RJ (2016). Towards perspective-free object counting with deep learning. In European conference on computer vision. Springer, Cham, 615–629
Wang L, Yin B, Guo A, Ma H, Cao J (2018) Skip-connection convolutional neural network for still image crowd density estimation. Appl Intell 48(10):3360–3371
Jiang M, Lin J, Wang ZJ (2021) A smartly simple way for joint crowd counting and localization. Neurocomputing 459:35–43
Xia Y, He Y, Peng S, Hao X, Yang Q, Yin B (2021) EDENet: Elaborate density estimation network for crowd counting. Neurocomputing 459:108–121
Wang W, Liu Q, Wang W (2021). Pyramid-dilated deep convolutional neural network for crowd counting. Appl Intell 1–13
Amirgholipour S, Jia W, Liu L, Fan X, Wang D, He X (2021) PDANet: Pyramid density-aware attention based network for accurate crowd counting. Neurocomputing 451:215–230
Zhang Y, Zhou D, Chen S, Gao S, Ma Y (2016). Single-image crowd density estimation via multi-column convolutional neural network. In Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR) 589–597
Li Y, Zhang X, Chen D (2018). Csrnet: Dilated convolutional neural networks for understanding the highly congested scenes. In Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR) 1091–1100
Nie P, Fan C, Zou L, Chen L, Li X (2020) crowd density estimation Guided by Attention Network. Information 11(12):567
Zhang Y, Li G, Lei J, He J (2019) FDCNet: Frontend-backend fusion dilated network through channel-attention mechanism. Appl Sci 9(17):3466
Li T, Chang H, Wang M, Ni B, Hong R, Yan S (2014) Crowded scene analysis: A survey. IEEE Trans Circuits Syst Video Technol 25(3):367–386
Wang L, Yin B, Tang X, Li Y (2019) Removing background interference for crowd density estimation via de-background detail convolutional network. Neurocomputing 332:360–371
Lin SF, Chen JY, Chao HX (2001) Estimation of number of people in crowded scenes using perspective transformation. IEEE Transactions on Systems, Man, and Cybernetics-Part A: Systems and Humans 31(6):645–654
Dalal N, Triggs B (2005). Histograms of oriented gradients for human detection. In 2005 IEEE computer society conference on computer vision and pattern recognition (CVPR'05), Vol. 1, pp. 886–893
Li M, Zhang Z, Huang K, Tan T (2008). Estimating the number of people in crowded scenes by mid based foreground segmentation and head-shoulder detection. In 2008 19th international conference on pattern recognition (ICPR), 1–4. IEEE
Felzenszwalb PF, Girshick RB, McAllester D, Ramanan D (2009) Object detection with discriminatively trained part-based models. IEEE Trans Pattern Anal Mach Intell 32(9):1627–1645
Dollar P, Wojek C, Schiele B, Perona P (2011) Pedestrian detection: An evaluation of the state of the art. IEEE Trans Pattern Anal Mach Intell 34(4):743–761
An S, Liu W, Venkatesh S (2007). Face recognition using kernel ridge regression. In 2007 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 1–7
Chan AB, Vasconcelos N (2009). Bayesian poisson regression for crowd density estimation. In 2009 IEEE 12th international conference on computer vision (ICCV) 545–551
Pham VQ, Kozakaya T, Yamaguchi O, Okada R (2015). Count forest: Co-voting uncertain number of targets using random forest for crowd density estimation. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), 3253–3261
Lempitsky V, Zisserman A (2010) Learning to count objects in images. Adv Neural Inf Process Syst 23:1324–1332
Chan AB, Vasconcelos N (2011) Counting people with low-level features and Bayesian regression. IEEE Trans Image Process 21(4):2160–2177
Zhang C, Li H, Wang X, Yang X (2015). Cross-scene crowd density estimation via deep convolutional neural networks. In Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), 833–841
Sindagi VA, Patel VM (2017) Cnn-based cascaded multi-task learning of high-level prior and density estimation for crowd density estimation. In 2017 14th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS) 1–6. IEEE
Liu W, Salzmann M, Fua P (2019) Context-aware crowd density estimation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 5099–5108
Wang Q, Gao J, Lin W, Yuan Y (2019) Learning from synthetic data for crowd density estimation in the wild. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 8198–8207
Zhu F, Yan H, Chen X, Li T, Zhang Z (2021) A multi-scale and multi-level feature aggregation network for crowd counting. Neurocomputing 423:46–56
Liu YB, Jia RS, Liu QM, Zhang XL, Sun HM (2021) Crowd counting method based on the self-attention residual network. Appl Intell 51(1):427–440
Gu L, Pang C, Zheng Y, Lyu C, Lyu L (2021) Context-aware pyramid attention network for crowd counting. Appl Intell.1–17
Xu K, Ba J, Kiros R, Cho K, Courville A, Salakhudinov R, ..., Bengio Y (2015) Show, attend and tell: Neural image caption generation with visual attention. In International conference on machine learning (2048–2057). PMLR
Sindagi VA, Patel VM (2019). Inverse attention guided deep crowd density estimation network. In 2019 16th IEEE international conference on advanced video and signal based surveillance (AVSS) 1–8. IEEE
Zhang Y, Zhao H, Duan Z, Huang L, Deng J, Zhang Q (2021) Congested crowd density estimation via Adaptive Multi-Scale Context Learning. Sensors 21(11):3777
Liu L, Jiang J, Jia W, Amirgholipour S, Wang Y, Zeibots M, He X (2020) Denet: A universal network for counting crowd with varying densities and scales. IEEE Trans Multimedia 23:1060–1068
Shen Z, Xu Y, Ni B, Wang M, Hu J, Yang X (2018) crowd density estimation via adversarial cross-scale consistency pursuit. In Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), 5245–5254
Sindagi VA, Patel VM (2017) Generating high-quality crowd density maps using contextual pyramid cnns. In Proceedings of the IEEE international conference on computer vision (ICCV) 1861–1870
Sindagi VA, Patel VM (2017) Cnn-based cascaded multi-task learning of high-level prior and density estimation for crowd density estimation. In 2017 14th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS) 1–6
Sam DB, Sajjan NN, Babu RV, Srinivasan M (2018) Divide and grow: Capturing huge diversity in crowd images with incrementally growing cnn. In Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR) 3618–3626
Cao X, Wang Z, Zhao Y, Su F (2018). Scale aggregation network for accurate and efficient crowd density estimation. In Proceedings of the European Conference on Computer Vision (ECCV) 734–750
Shi Z, Zhang L, Liu Y, Cao X, Ye Y, Cheng MM, Zheng G (2018) crowd density estimation with deep negative correlation learning. In Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR) 5382–5390
Chen J, Wang Z (2021) Crowd counting with segmentation attention convolutional neural network. IET Image Proc 15(6):1221–1231
Acknowledgements
The authors are grateful for the collaborative funding support from the Humanity and Social Science Foundation of Ministry of Education, China (21YJAZH077).
Author information
Authors and Affiliations
Corresponding authors
Additional information
Publisher's note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
About this article
Cite this article
Li, YC., Jia, RS., Hu, YX. et al. Crowd density estimation based on multi scale features fusion network with reverse attention mechanism. Appl Intell 52, 13097–13113 (2022). https://doi.org/10.1007/s10489-022-03187-y
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10489-022-03187-y