Abstract
In recent years, great breakthroughs have been made in object detection. However, the performance of most algorithms declines significantly when detecting small objects in an image. Multi-scale feature maps are therefore often used to build network variants that generate multi-scale representations. Existing feature pyramid-based methods tend to keep the number of channels consistent and fuse different scales by element-wise addition or channel concatenation, which tends to lose low-level detail during feature fusion. To solve this problem, a bi-directional stepped concatenation feature pyramid construction method based on SSD (BSCF-SSD) is proposed. The stepped concatenation strategy helps to avoid losing the information of the current layer during pyramid construction, and the bi-directional strategy ensures that the fused features contain both detailed and semantic information. Furthermore, an attentional interaction module is designed to better aggregate the dual-stream features and improve network performance. The proposed method improves the detection accuracy of small objects with little loss of speed. Experimental results show that the method achieves 80.3% and 82.4% mAP on Pascal VOC2007 using VGG16 and ResNet50, respectively. On the specialized aerial object dataset UCAS-AOD, BSCF-SSD with VGG16 still achieves a moderate improvement.
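The two conventional fusion operations named in the abstract, element-wise addition and channel concatenation, can be contrasted in a minimal NumPy sketch. This is not the BSCF-SSD pyramid itself: the function names, the nearest-neighbour upsampling, and the feature-map shapes are all illustrative assumptions.

```python
import numpy as np

def upsample2x(x):
    """Nearest-neighbour 2x upsampling of a (C, H, W) feature map."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def fuse_add(low, high):
    """Element-wise addition: channel count is unchanged, values are mixed."""
    return low + upsample2x(high)

def fuse_concat(low, high):
    """Channel concatenation: both inputs are kept side by side."""
    return np.concatenate([low, upsample2x(high)], axis=0)

# Hypothetical shapes: a shallow 8x8 map and a deeper 4x4 map, 16 channels each.
low = np.random.rand(16, 8, 8)
high = np.random.rand(16, 4, 4)

added = fuse_add(low, high)
stacked = fuse_concat(low, high)
print(added.shape)    # (16, 8, 8)
print(stacked.shape)  # (32, 8, 8)
```

In this sketch the shallow map can still be read back unchanged from the concatenated result (`stacked[:16]`), whereas after addition it is mixed with the upsampled deep map and cannot be separated again.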
Acknowledgements
This work is supported by the National Natural Science Foundation of China (grant no. 61573168).
Ethics declarations
Conflict of Interests
The authors declare that they have no conflict of interest.
About this article
Cite this article
Zheng, Q., Chen, Y. Feature pyramid of bi-directional stepped concatenation for small object detection. Multimed Tools Appl 80, 20283–20305 (2021). https://doi.org/10.1007/s11042-021-10718-1
DOI: https://doi.org/10.1007/s11042-021-10718-1