Feature pyramid of bi-directional stepped concatenation for small object detection

Zheng, Qiyuan; Chen, Ying

doi:10.1007/s11042-021-10718-1

Published: 05 March 2021

Volume 80, pages 20283–20305, (2021)
Multimedia Tools and Applications

Abstract

In recent years, great breakthroughs have been made in object detection. However, the performance of most algorithms declines significantly when detecting small objects in an image. Multi-scale feature maps are therefore often used to build network variants that generate multi-scale representations. Existing feature pyramid-based methods tend to keep the number of channels consistent and fuse different scales by element-wise addition or channel concatenation, which tends to lose low-level detail information during feature fusion. To solve this problem, a bi-directional stepped concatenation feature pyramid construction method based on SSD (BSCF-SSD) is proposed. The stepped concatenation strategy helps to avoid losing information from the current layer during pyramid construction, and the bi-directional strategy ensures that the fused features contain both detailed and semantic information. Furthermore, an attentional interaction module is designed to better aggregate the dual-stream features and improve network performance. The proposed method improves the detection accuracy of small objects with only a small loss in speed. Experimental results show that the method achieves 80.3% and 82.4% mAP on Pascal VOC2007 using VGG16 and ResNet50, respectively. On the specialized aerial object dataset UCAS-AOD, BSCF-SSD with VGG16 still achieves a moderate improvement.
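As a rough illustration of the two generic fusion operations the abstract mentions (element-wise addition and channel concatenation), the following minimal sketch fuses a fine, low-level feature map with a coarser, high-level one. It is not the BSCF-SSD module: the abstract does not specify the stepped concatenation scheme or the attentional interaction module, so the function names (`upsample2x`, `fuse_add`, `fuse_concat`), the nested-list layout `[channel][row][col]`, and the nearest-neighbour upsampling are all assumptions made for illustration.

```python
def upsample2x(fmap):
    """Nearest-neighbour 2x upsampling; fmap is laid out as [channel][row][col]."""
    out = []
    for channel in fmap:
        rows = []
        for row in channel:
            wide = [v for v in row for _ in range(2)]  # duplicate each column
            rows.append(wide)
            rows.append(list(wide))                    # duplicate each row
        out.append(rows)
    return out


def shape(fmap):
    """Return (channels, height, width) of a nested-list feature map."""
    return (len(fmap), len(fmap[0]), len(fmap[0][0]))


def fuse_add(low, high):
    """Element-wise addition: channel counts must match, and stay unchanged."""
    up = upsample2x(high)
    if shape(up) != shape(low):
        raise ValueError("element-wise addition needs identical shapes")
    return [[[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(c1, c2)]
            for c1, c2 in zip(low, up)]


def fuse_concat(low, high):
    """Channel concatenation: channel counts add up after fusion."""
    up = upsample2x(high)
    if shape(up)[1:] != shape(low)[1:]:
        raise ValueError("concatenation needs matching spatial size")
    return low + up  # list concatenation along the channel axis


# Hypothetical toy inputs: 2 channels at 4x4 (fine) and 2 channels at 2x2 (coarse).
low = [[[1.0] * 4 for _ in range(4)] for _ in range(2)]
high = [[[10.0] * 2 for _ in range(2)] for _ in range(2)]

added = fuse_add(low, high)
stacked = fuse_concat(low, high)
print(shape(added), shape(stacked))
```

In this toy setting, addition keeps the channel count fixed and mixes the two sources into single values, whereas concatenation keeps the low-level channels as separate entries and grows the channel count.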

Acknowledgements

This work was supported by the National Natural Science Foundation of China (grant no. 61573168).

Author information

Authors and Affiliations

Key Laboratory of Advanced Process Control for Light Industry (Ministry of Education), Jiangnan University, Wuxi, 214122, China
Qiyuan Zheng & Ying Chen

Corresponding author

Correspondence to Ying Chen.

Ethics declarations

Conflict of Interests

The authors declare that they have no conflict of interest.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Zheng, Q., Chen, Y. Feature pyramid of bi-directional stepped concatenation for small object detection. Multimed Tools Appl 80, 20283–20305 (2021). https://doi.org/10.1007/s11042-021-10718-1

Received: 17 September 2020
Revised: 23 November 2020
Accepted: 10 February 2021
Published: 05 March 2021
Issue Date: May 2021
DOI: https://doi.org/10.1007/s11042-021-10718-1
