Pattern Recognition Letters

Volume 125, 1 July 2019, Pages 597-603

Extended-depth-of-field object detection with wavefront coding imaging system

https://doi.org/10.1016/j.patrec.2019.06.011

Highlights

  • This paper analyzes the influence of defocus on the results of a state-of-the-art object detection pipeline.

  • Simulations demonstrate the accuracy gains achieved by the wavefront coding technique.

  • A novel wavefront coding method is proposed, and an imaging system based on it is designed to extend the depth of field.

  • Comparison experiments show the improvement in detection results obtained by applying the wavefront coding technique.

Abstract

As an important problem in computer vision, object detection has broad application prospects. Recent research using convolutional neural networks (CNNs) has achieved state-of-the-art results in challenge competitions. Most of this work focuses on improving precision under ideal imaging conditions. In practice, however, it is hard to ensure that the optical imaging system operates in a focused state. In this study, we examine the impact of defocus on detection accuracy. The results show that even state-of-the-art networks are sensitive to varying degrees of defocus. We therefore adopt the wavefront coding (WFC) technique to improve performance over a large range of depth of field (DOF). Simulation results indicate an improvement in the average precision of detection when WFC is applied. In addition, we propose a novel WFC method that overcomes the defects of the traditional approach, and design an optical imaging system under the guidance of the proposed theory. Experiments show that the detection accuracy rate can be enhanced considerably with WFC.

Introduction

In recent years, deep learning has made substantial advances in many fields of computer vision, such as image classification [1], image inpainting [2], semantic segmentation [3], and motion deblurring [4]. Among application-specific tasks, object detection [5] has extensive prospects for engineering use. It is an integrated framework combining classification, localization and detection. Traditional methods exploit handcrafted features, such as SIFT [6], HOG [7], heat kernels [8], shape models [9] and structural features [10], but their results often depend heavily on the specific dataset. The use of CNNs has significantly improved both accuracy and robustness [11], [12], [13], [14], [15], [16], [17], [18], [19], [20], [21], [22].

Perhaps the first comparatively successful work with CNN features was the OverFeat [11] network, which used a sliding window for detection. After that, Girshick et al. proposed R-CNN [12], which combines region proposals with CNNs. SPP-net [13] removed the restriction of a fixed input image size through spatial pyramid pooling. Fast R-CNN [14] introduced ROI pooling, which avoids repeated computation. Faster R-CNN [15] relieved the dependence on selective search [16] by generating proposals with a region proposal network (RPN) before region detection. These methods improved accuracy through multi-stage detection schemes. The other branch of detection research paid more attention to real-time performance with one-step, end-to-end pipelines. YOLO [17] was the first architecture to predict bounding boxes and class scores in a single evaluation. SSD [18] successfully adopted multilayer CNN features for more accurate localization. YOLO v2 [19] introduced auxiliary anchors into the grid cells to enhance accuracy while maintaining running speed. In addition, more and more schemes [20], [21], [22], [23] have been proposed to increase the mean average precision (mAP) through well-designed tricks and deeper networks.
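As background for the mAP comparisons discussed throughout, a detection is typically scored as a true positive when its bounding box overlaps a ground-truth box by at least 0.5 intersection-over-union (IoU). A minimal sketch of the IoU computation (our illustration, not code from the paper):

```python
def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2).
    In PASCAL-VOC-style mAP, a detection usually counts as a true positive
    when its IoU with a ground-truth box is at least 0.5."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter)

overlap = iou((0, 0, 10, 10), (5, 0, 15, 10))  # 50 / 150 ≈ 0.333
```

Average precision is then the area under the precision-recall curve built by sweeping the detector's score threshold, and mAP averages this over classes.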

To date, most research has focused on the challenging problems caused by occlusion, deformation and small object size. However, the images collected in real scenarios depend heavily on the performance of the imaging system. Among the factors influencing image quality, a fundamental one is the defocus value. Under defocus, images are blurred because the light rays converge on a plane that deviates from the imaging detector, and this blurring can greatly degrade detection performance. Fig. 1 shows the detection results of YOLO v2 with the score threshold set to 0.2. As shown in Fig. 1(a), the best-focus plane usually yields the best performance. At a plane deviating from the best focus position by one wavelength, shown in Fig. 1(b), the dog cannot be detected, and the bounding boxes of the car and the bicycle are no longer located as precisely as before.
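The blurring described above can be sketched numerically by convolving a sharp image with a circular "pillbox" PSF whose radius stands in for the amount of defocus. A minimal NumPy illustration (the kernel shape and sizes are our assumptions, not the paper's optical model):

```python
import numpy as np

def disk_psf(radius_px, size=31):
    """Circular 'pillbox' kernel approximating geometric defocus blur."""
    y, x = np.mgrid[-(size // 2):size // 2 + 1, -(size // 2):size // 2 + 1]
    psf = (x ** 2 + y ** 2 <= radius_px ** 2).astype(float)
    return psf / psf.sum()

def blur(image, psf):
    """FFT convolution with circular boundary, mimicking a defocused capture."""
    pad = np.zeros_like(image)
    k = psf.shape[0]
    pad[:k, :k] = psf
    pad = np.roll(pad, (-(k // 2), -(k // 2)), axis=(0, 1))
    return np.real(np.fft.ifft2(np.fft.fft2(image) * np.fft.fft2(pad)))

# A sharp step edge loses contrast as the assumed defocus radius grows;
# features a detector relies on (edges, textures) are smoothed away.
img = np.zeros((64, 64))
img[:, 32:] = 1.0
blurred = blur(img, disk_psf(radius_px=4))
```

Larger defocus corresponds to a larger disk radius, which is why detection quality in Fig. 1 degrades as the image plane moves away from best focus.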

To overcome the defects of defocus, researchers have tried to increase the DOF by employing an apodizer, but this reduces the optical intensity too much and severely degrades image resolution. In 1995, WFC was proposed by Dowski et al. [24]: a phase mask (PM) is inserted at the pupil plane, so that parallel light rays passing through the optical system no longer converge to a point but spread into a speckle. The optical transfer function (OTF) and point spread function (PSF) remain nearly invariant over a large range of defocus, and the intermediate images captured by the detector require subsequent processing to be sharpened. This technique has proved successful in extending the DOF and suppressing various aberrations.

Over the past 20 years, most of the literature on this topic has designed many kinds of PMs [25], [26], [27], [28] to improve the extension of the DOF. Attempts have also been made to enhance image quality with two PMs [29], [30] or by rotating one PM [31]. According to their profile, PMs can be divided into two categories: rotationally symmetrical and asymmetrical. Theoretical calculations show that the OTF and PSF of rotationally symmetrical PMs are more sensitive to defocus than those of asymmetrical PMs. Although the rotationally symmetrical PM does not perform as well as the asymmetrical one in extending the DOF, its machining and assembly precision can be well guaranteed with existing manufacturing processes.

Instead of placing a single phase mask at the pupil plane, recent studies [32], [33] demonstrate that first- or higher-order spherical aberrations can be utilized to extend the DOF. This technique, called spherical coding, is easy to design owing to its rotationally symmetric property. Inspired by this, we propose a new wavefront coding imaging system that contains no specially designed PM: several elements in the system are combined to achieve equally blurry imaging over the assigned DOF region. We call this method lens-combined modulated wavefront coding (LM-WFC).

This paper first introduces the basic principle of defocus and analyzes the theory of wavefront coding. A simulation then shows how the defocus value influences detection precision, and the mAPs at different defocus positions under traditional optical imaging and WFC are compared; the results indicate that WFC successfully improves the accuracy rate. The LM-WFC method is then described in detail. Following the proposed method, an LM-WFC system is designed and manufactured as a lens. Comparison experiments demonstrate the improvement in detection results obtained by applying WFC.


Theory of WFC

WFC is a kind of hybrid optical-digital system (Fig. 2). Its two-step imaging pipeline enables the extension of the DOF. With a PM inserted, the light rays no longer converge to a point but spread as a uniform thin beam near the imaging plane, so the detector obtains defocus-insensitive intermediate images over a large defocus range. Different kinds of phase mask produce different modulation effects on the intermediate images. In this study, we take the classic cubic phase mask (CPM) as an example.
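In normalized pupil coordinates, a CPM contributes a phase α(x³ + y³) to the generalized pupil, while defocus contributes a term ψ(x² + y²). The following NumPy sketch (illustrative α and ψ values, not the paper's design parameters) shows the PSF varying far less with defocus once the cubic term is present:

```python
import numpy as np

def psf(alpha, psi, n=128):
    """PSF for a unit-circle pupil with cubic phase alpha*(x^3 + y^3)
    and defocus phase psi*(x^2 + y^2), both in radians at the pupil edge."""
    x = np.linspace(-1.0, 1.0, n)
    X, Y = np.meshgrid(x, x)
    aperture = (X ** 2 + Y ** 2 <= 1.0).astype(float)
    pupil = aperture * np.exp(1j * (alpha * (X ** 3 + Y ** 3)
                                    + psi * (X ** 2 + Y ** 2)))
    # Zero-padded FFT of the pupil gives the coherent amplitude spread;
    # its squared magnitude is the incoherent PSF.
    h = np.abs(np.fft.fftshift(np.fft.fft2(pupil, s=(4 * n, 4 * n)))) ** 2
    return h / h.sum()

def similarity(a, b):
    """Normalized cross-correlation of two PSFs (1.0 means identical)."""
    return float(np.sum(a * b) / np.sqrt(np.sum(a * a) * np.sum(b * b)))

# With a strong cubic term the PSF stays nearly invariant to defocus,
# while the clear pupil's PSF changes drastically.
sim_cpm = similarity(psf(30.0, 0.0), psf(30.0, 5.0))
sim_clear = similarity(psf(0.0, 0.0), psf(0.0, 5.0))
```

The near-invariant PSF is what allows a single deconvolution kernel to restore intermediate images across the whole extended DOF.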

Detection results of traditional imaging and WFC in the case of defocus

Among state-of-the-art object detection methods, we select Faster R-CNN [15] as the representative to show the influence of defocus. Faster R-CNN is a single, unified network linking an RPN with a detection network. By introducing anchors, it establishes the RPN to accelerate the region proposal step. Anchors of various scales and aspect ratios enable it to handle changes in target size, and ROI pooling helps select scale-invariant features during the detection step.
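The anchor mechanism can be sketched as generating one box per scale/ratio pair at each feature-map cell. The base size, scales and ratios below are the defaults commonly used with Faster R-CNN, assumed here rather than taken from this paper:

```python
import numpy as np

def make_anchors(base=16, scales=(8, 16, 32), ratios=(0.5, 1.0, 2.0)):
    """Anchor boxes (x1, y1, x2, y2) centred on one feature-map cell:
    one box per scale/ratio pair, with all ratios sharing the same area
    at a given scale."""
    boxes = []
    for s in scales:
        for r in ratios:            # r = height / width
            area = float(base * s) ** 2
            w = np.sqrt(area / r)
            h = w * r
            boxes.append([-w / 2, -h / 2, w / 2, h / 2])
    return np.array(boxes)

anchors = make_anchors()            # 3 scales x 3 ratios -> 9 anchors per cell
```

During training, the RPN classifies each anchor as object/background and regresses offsets from the anchor to the nearest ground-truth box.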

Design of LM-WFC system

In this section, we propose the LM-WFC technique, which improves the DOF and other system parameters. Because WFC gives up the conjugate (in-focus) image in favor of a deliberately blurred intermediate image, the imaging system's tolerance for aberrations is looser. In theory, it is therefore possible to increase the aperture and the DOF at the same time by rationally exploiting the aberrations introduced by the enlarged aperture.
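The aperture/DOF conflict that motivates this design can be quantified with the classical Rayleigh quarter-wave depth-of-focus estimate, a standard rule of thumb (the wavelength below is assumed for illustration, not taken from the paper):

```python
# Rayleigh quarter-wave estimate: depth of focus per side ≈ 2·lambda·N^2,
# where N is the f-number. Halving N (doubling the aperture diameter)
# shrinks the classical depth of focus fourfold — the trade-off that
# LM-WFC is designed to relax.
wavelength = 550e-9  # metres; green light, assumed for illustration

def depth_of_focus(f_number):
    return 2.0 * wavelength * f_number ** 2

ratio = depth_of_focus(8.0) / depth_of_focus(4.0)  # -> 4.0
```

A conventional lens must therefore trade light-gathering aperture against DOF, whereas WFC-style systems decouple the two by accepting a coded blur and restoring it digitally.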

Unlike traditional WFC, there is no single specially designed PM in the LM-WFC system.

Experiment results

To demonstrate how the extended-DOF method helps improve detection, we manufactured the designed systems, as shown in Fig. 8. The two systems are tied together to share the same field of view, as shown in Fig. 8(c). We adopt fast image deconvolution with hyper-Laplacian priors [38] to restore the intermediate images of the LM-WFC system. Faster R-CNN [15] is then used to detect targets in the final images, and the results are presented in Fig. 7.
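The restoration step can be illustrated with a plain Wiener filter; this is a deliberately simpler stand-in for the hyper-Laplacian deconvolution [38] actually used, and the helper names and toy kernel below are our own:

```python
import numpy as np

def kernel_fft(psf, shape):
    """FFT of a small kernel zero-padded to `shape` and centred at the origin."""
    pad = np.zeros(shape)
    k = psf.shape[0]
    pad[:k, :k] = psf
    pad = np.roll(pad, (-(k // 2), -(k // 2)), axis=(0, 1))
    return np.fft.fft2(pad)

def wiener_deconv(blurred, psf, nsr=1e-4):
    """Wiener restoration F = conj(H)·G / (|H|^2 + NSR), circular boundary.
    NSR is the assumed noise-to-signal ratio regularizing the inversion."""
    H = kernel_fft(psf, blurred.shape)
    G = np.fft.fft2(blurred)
    return np.real(np.fft.ifft2(np.conj(H) * G / (np.abs(H) ** 2 + nsr)))

# Toy check: blur a step edge with a 5x5 box kernel, then restore it.
img = np.zeros((64, 64))
img[:, 32:] = 1.0
kern = np.ones((5, 5)) / 25.0
blurred = np.real(np.fft.ifft2(kernel_fft(kern, img.shape) * np.fft.fft2(img)))
restored = wiener_deconv(blurred, kern)
```

In the actual pipeline the known, defocus-insensitive PSF of the coded system plays the role of the kernel, and the restored image is what Faster R-CNN receives.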

Conclusion

To reduce the impact of defocus on detection results in traditional imaging, we apply the WFC technique. The simulation results show that, over a large range of defocus, images obtained with WFC outperform those from traditional imaging on detection. We also propose a new LM-WFC method that enlarges the DOF and the aperture simultaneously. Starting from the theory of traditional WFC, the design principle of the LM-WFC system is described. Based on that, an LM-WFC system is designed, and comparison experiments show that it considerably improves detection accuracy under defocus.

Declaration of competing interest

We declare that we have no financial or personal relationships with other people or organizations that could inappropriately influence our work, and no professional or other personal interest of any nature in any product, service and/or company that could be construed as influencing the position presented in the manuscript entitled "Extended-depth-of-field object detection with wavefront coding imaging system".

Acknowledgments

This work was supported by the National Natural Science Foundation of China (NSFC nos. 11774031 and 61705010) and Beijing Science and Technology Project (No. Z181100005918002).

References (38)

  • B. Xiao et al., Graph characteristics from the heat kernel trace, Pattern Recognit. (2009)
  • K. Simonyan, A. Zisserman, Very deep convolutional networks for large-scale image recognition, arXiv:1409.1556
  • D. Pathak et al., Context encoders: feature learning by inpainting
  • J. Long et al., Fully convolutional networks for semantic segmentation
  • J. Sun et al., Learning a convolutional neural network for non-uniform motion blur removal
  • M. Everingham, The PASCAL visual object classes (VOC) challenge, Int. J. Comput. Vis. (2010)
  • D.G. Lowe, Distinctive image features from scale-invariant keypoints, Int. J. Comput. Vis. (2004)
  • N. Dalal et al., Histograms of oriented gradients for human detection
  • H. Zhang et al., Object detection via structural feature selection and shape model, IEEE Trans. Image Process. (2013)
  • X. Bai et al., VHR object detection based on structural feature extraction and query expansion, IEEE Trans. Geosci. Remote Sens. (2014)
  • P. Sermanet et al., OverFeat: integrated recognition, localization and detection using convolutional networks
  • R. Girshick et al., Rich feature hierarchies for accurate object detection and semantic segmentation
  • K. He, Spatial pyramid pooling in deep convolutional networks for visual recognition
  • R. Girshick, Fast R-CNN
  • S. Ren et al., Faster R-CNN: towards real-time object detection with region proposal networks, Adv. Neural Inf. Process. Syst. (2015)
  • J.R.R. Uijlings et al., Selective search for object recognition, Int. J. Comput. Vis. (2013)
  • J. Redmon, You only look once: unified, real-time object detection
  • W. Liu et al., SSD: single shot multibox detector
  • J. Redmon, A. Farhadi, YOLO9000: better, faster, stronger, arXiv preprint