Feature pyramid of bi-directional stepped concatenation for small object detection

Multimedia Tools and Applications

Abstract

In recent years, great breakthroughs have been made in object detection. However, the performance of most algorithms declines significantly when detecting small objects in an image. Multi-scale feature maps are therefore often used to build network variants that generate multi-scale representations. Existing feature pyramid-based methods tend to keep the number of channels consistent and fuse different scales by element-wise addition or channel concatenation, which is prone to losing low-level detail during feature fusion. To solve this problem, a bi-directional stepped concatenation feature pyramid construction method based on SSD (BSCF-SSD) is proposed. The stepped concatenation strategy helps to avoid losing the information of the current layer during pyramid construction, and the bi-directional strategy ensures that the fused features contain both detailed and semantic information. Furthermore, an attentional interaction module is designed to better aggregate the dual-stream features and improve network performance. The proposed method improves the detection accuracy of small objects with little loss in speed. Experimental results show that the method achieves 80.3% and 82.4% mAP on Pascal VOC2007 using VGG16 and ResNet50, respectively. On the specialized aviation object dataset UCAS-AOD, BSCF-SSD with VGG16 still achieves a moderate improvement.
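
The two generic fusion operations named in the abstract, element-wise addition and channel concatenation, can be contrasted with a minimal pure-Python sketch. This is not the BSCF-SSD implementation and does not reproduce the stepped or bi-directional scheme. The list-of-channels representation, the helper `make_map`, and the constant demo values are assumptions made only for illustration.

```python
# Illustrative sketch only: NOT the BSCF-SSD implementation.
# A feature map is modelled as a list of channels; each channel is a 2D list.

def upsample_nearest(fmap, factor=2):
    """Nearest-neighbour upsampling of every channel by an integer factor."""
    out = []
    for channel in fmap:
        rows = []
        for row in channel:
            wide = [v for v in row for _ in range(factor)]
            rows.extend([list(wide) for _ in range(factor)])
        out.append(rows)
    return out

def fuse_add(a, b):
    """Element-wise addition: channel count is unchanged, values are mixed."""
    assert len(a) == len(b), "addition needs equal channel counts"
    return [
        [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(ca, cb)]
        for ca, cb in zip(a, b)
    ]

def fuse_concat(a, b):
    """Channel concatenation: input channels are kept verbatim and stacked."""
    return a + b

def make_map(channels, size, value):
    """Hypothetical helper: a constant-valued feature map for the demo."""
    return [[[value] * size for _ in range(size)] for _ in range(channels)]

low = make_map(channels=2, size=4, value=1.0)   # high resolution, fine detail
high = make_map(channels=2, size=2, value=5.0)  # low resolution, semantics

high_up = upsample_nearest(high)      # bring both maps to 4x4
added = fuse_add(low, high_up)
stacked = fuse_concat(low, high_up)

print(len(added), added[0][0][0])      # 2 6.0
print(len(stacked), stacked[0][0][0])  # 4 1.0
```

After addition the low-level value 1.0 can no longer be read back from the fused map, whereas concatenation keeps it intact at the cost of a larger channel count. That trade-off is what motivates fusion schemes designed to preserve current-layer information.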

Acknowledgements

This work is supported by the National Natural Science Foundation of China (grant no. 61573168).

Author information

Corresponding author

Correspondence to Ying Chen.

Ethics declarations

Conflict of Interests

The authors declare that they have no conflict of interest.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Zheng, Q., Chen, Y. Feature pyramid of bi-directional stepped concatenation for small object detection. Multimed Tools Appl 80, 20283–20305 (2021). https://doi.org/10.1007/s11042-021-10718-1

  • DOI: https://doi.org/10.1007/s11042-021-10718-1
