IA-Mask R-CNN: Improved Anchor Design Mask R-CNN for Surface Defect Detection of Automotive Engine Parts

Zhu, Haijiang; Wang, Yinchu; Fan, Jiawei

doi:10.3390/app12136633


by Haijiang Zhu *, Yinchu Wang and Jiawei Fan

College of Information Science and Technology, Beijing University of Chemical Technology, Beijing 100029, China

* Author to whom correspondence should be addressed.

Appl. Sci. 2022, 12(13), 6633; https://doi.org/10.3390/app12136633

Submission received: 8 June 2022 / Revised: 27 June 2022 / Accepted: 28 June 2022 / Published: 30 June 2022

(This article belongs to the Section Applied Industrial Technologies)


Abstract

The detection of surface defects on automotive engine parts is an important part of automobile manufacturing quality assurance. Traditional detection methods rely on manual inspection and can be inaccurate and inefficient, while existing deep learning-based methods, such as Mask R-CNN, have insufficient precision for minor defects because their anchor scale design does not account for small defects. To overcome these shortcomings, this paper proposes IA-Mask R-CNN, a detection method with an improved anchor scale design. First, an image dataset containing 560 pictures of surface defects of automotive engine parts is established using a 1080P HDMI high-definition digital microscope capable of recording three million real pixels, and the images are labeled manually. Then, the anchor scales suitable for the surface defect detection of automotive engine parts are determined by labeled data analysis and used to improve the anchor design in Mask R-CNN. Finally, the proposed method is compared experimentally with Faster R-CNN and Mask R-CNN, and qualitative and quantitative analyses are conducted. The experimental results show that, without increasing the number of parameters or the training time of Mask R-CNN, the proposed method detects both minor and larger defects better than the other detection methods.

Keywords: deep learning; Faster R-CNN; Mask R-CNN; anchor scales; minor defect detection

1. Introduction

The detection of surface defects of automotive engine parts is an important step in manufacturing quality assurance. If an automobile has defects that may cause personal injury or property damage, the manufacturer will recall the product. Statistics from China's State Administration for Market Regulation (SAMR) indicate that 3.71 million automobiles were recalled because of engine defects in 2021 alone, accounting for 42.5% of all recalls that year [1]. Surface defects are often detected manually, with the aid of special equipment, by workers trained to recognize them. However, this type of inspection can be inefficient and inconsistent, which directly affects the manufacturing quality.

Although the classical machine vision methods can solve these problems to a certain extent [2,3], with the emergence of Industry 4.0, conventional machine vision methods have become unable to meet flexibility requirements. In the classical machine vision methods, features are processed manually to adapt to a specific field, but their performances depend on the operator’s experience. For this reason, deep learning-based methods with an automatic feature selection have attracted great attention recently.

A convolutional neural network (CNN) is one of the basic deep learning models, and it was proposed in 1995 by LeCun et al. [4]. A CNN processes the input data with a number of convolution layers and pooling layers. CNN training is typically performed using the backpropagation (BP) algorithm, and the classification task is completed by a fully connected (FC) layer. In 2006, Hinton et al. [5] formally proposed the concept of deep learning and paved the way for research on deep learning in the detection field. In 2012, AlexNet, proposed by Krizhevsky et al. [6], excelled in the ImageNet LSVRC-2012 competition with a top-five error rate of 15.3%. However, the large convolution kernels of this network cause high computational complexity and restrict further increases in the number of network layers. To address this problem, Simonyan et al. [7] further investigated network depth and proposed the deeper VGGNet model. This model uses multiple small convolution kernels instead of one large convolution kernel and maintains the feature extraction performance of the network by deepening the network layers. This model performed well in the 2014 ImageNet competition, but a further increase in the network depth was limited by the vanishing gradient phenomenon. To further increase the network depth, He et al. [8] proposed the ResNet model, in which the output of the previous network layer is passed to the next layer by identity mapping, forming a shortcut connection; this model won the classification task of the ILSVRC-2015 competition and achieved good results in detection and segmentation tasks.

To apply these models to flaw detection and defect classification, many in-depth studies of different network structures have been conducted, and promising results have been achieved. Girshick et al. [9] published their seminal work on R-CNN, a deep learning-based target detector. They used a selective search algorithm to generate candidate regions and a CNN to classify them, achieving a mean average precision (mAP) of 53.3% on the VOC2012 dataset. Girshick also proposed Fast R-CNN, which contains a region of interest (RoI) pooling layer [10]. This method further improved the detection precision, but its detection framework could not be trained end to end, which limited further improvement in the detection rate. To that end, Redmon et al. [11,12,13] proposed the YOLO (You Only Look Once) series of networks, which use a grid to divide an image into regions in which targets are detected independently. Although this allows backpropagation of the loss through the whole network and increases the training and detection speeds, it adapts poorly to dense and small objects. To solve this problem, Ren et al. [14] added a region proposal network (RPN) to Fast R-CNN and achieved end-to-end training. This simultaneously increased the detection precision and training speed, but at the time, there was no framework that performed target detection and instance segmentation at the same time. In view of this, He et al. [15] extended the Faster R-CNN model by adding a parallel mask prediction branch to the bounding-box recognition branch and obtained good performance.

In factory production, defect detection based on machine vision uses a computer to process the acquired images, which requires dedicated image processing, analysis, and classification software. The images are usually acquired by one or more cameras at the inspection site, and the camera positions are usually fixed. In general, industrial automation systems are designed to inspect only known objects at fixed locations. With proper lighting and scene arrangement, the image features needed for processing and classification, which are also known in advance, can be captured conveniently. When processing is highly time-constrained or computation-intensive and exceeds the processing power of the main processor, more powerful hardware devices (e.g., DSPs or FPGAs) are used to accelerate processing [16]. Based on this defect detection framework, Czimmermann et al. [17] reviewed vision-based defect detection methods, including traditional methods and the latest deep learning-based methods. To further promote intelligent development in factories, Huang et al. [18] proposed a smart factory architecture that introduces deep learning technology and the Internet of Things into automated defect detection applications. Lian et al. [19] proposed a defect detection method that combines a generative adversarial network and a CNN to ensure the detection accuracy of minor surface defects by generating enlarged defect image samples. Tabernik et al. [20] proposed a segmentation-based deep learning method to detect surface cracks. Their experiments showed that this method can achieve a high detection accuracy using approximately 25–30 defective samples.

Few studies have addressed the surface defect detection of automobile engine parts. Since automotive engine parts are small (approximately 2.8 cm in length and 1 cm in diameter), their surface defects are typically of millimeter length, which makes them difficult to recognize visually under normal lighting without specialized auxiliary equipment. In addition, there are no open-source or reference datasets for the analysis of these types of defects. Furthermore, experiments on the acquired data of automotive engine parts have shown that the two commonly used detection frameworks, Faster R-CNN and Mask R-CNN, perform poorly in surface defect detection, especially for small defects, because they lack a targeted analysis of defect features.

Considering all that is mentioned, this paper constructs a surface defect dataset of automobile engine parts using surface defect images with the resolution of three million real pixels acquired on engine parts using a 1080P HDMI high-definition digital microscope and labels the acquired data to optimize the network. Then, a suitable anchor size for the detection segmentation scale of the Mask R-CNN is determined using the labeled data to improve the small defect detection capability. Namely, the appropriate anchor scales for surface defect detection of automotive engine parts are selected by labeled data analysis. Finally, the selected anchor scales are added to the Mask R-CNN structure, proposing an improved anchor Mask R-CNN (IA-Mask R-CNN). Comparative experiments have proven that the proposed model is effective in detecting surface defects of automotive engine parts.

The rest of the paper is organized as follows. The basic structure of the Mask R-CNN and the anchor design of the IA-Mask R-CNN are introduced in detail in Section 2. The acquisition and production of data, the expansion of the dataset, and the comparison experiment results are presented in Section 3. The discussion is given in Section 4. Finally, we conclude our paper in Section 5.

2. Proposed Methods

In this section, the basic structure of the Mask R-CNN is first described in detail. Then, the limitations of the Mask R-CNN model in defect detection of automotive engine parts are analyzed. After that, the optimal anchor scales are obtained by labeled data analysis and the network performances under different anchor scales combinations are compared to verify the effectiveness of the optimal anchor scale. Finally, an improved anchor network (IA-Mask R-CNN) is proposed to improve the defect detection accuracy of automotive engine parts.

2.1. Mask R-CNN

Building on Faster R-CNN, Mask R-CNN adds a parallel mask prediction branch alongside the bounding-box recognition branch. This allows the network to detect the targets in an image and also to generate a high-quality segmentation mask for each instance. In addition, it provides a conceptually simple but versatile instance segmentation framework, as shown in Figure 1. The framework includes a convolutional backbone network for feature extraction over the entire image and a network head that performs bounding-box classification and regression and mask prediction for each RoI.

The mask branch added in Mask R-CNN is a small fully convolutional network (FCN) applied to each region proposal. This branch predicts segmentation masks in a pixel-to-pixel manner. Because the RoI pooling in Faster R-CNN cannot achieve pixel-to-pixel alignment between its input and output, Mask R-CNN introduces a simple quantization-free layer, the RoIAlign layer, to preserve spatial position accuracy. On this basis, Mask R-CNN decouples mask prediction from classification: it independently predicts a binary mask for each category and relies on the RoI classification branch of the network to predict the category.

The Mask R-CNN performs two processes: (1) extracts region proposals using the RPN, and (2) outputs a binary mask for each RoI using the mask branch. These processes are conducted in parallel with the prediction of the bounding-boxes offset. During training, the multi-task loss of each sampled RoI is calculated by:

L = L_{cls} + L_{box} + L_{mask},

(1)

where L_mask denotes the mask loss, L_cls is the classification loss, which is the log loss between the target and non-target classes, and L_box is the regression loss, which is calculated by:

L_{box}(t_i, t_i^*) = R(t_i - t_i^*),

(2)

where R is the robust loss function smooth_L₁, which is expressed as:

\mathrm{smooth}_{L_1}(x) = \begin{cases} 0.5 x^2, & \text{if } |x| < 1 \\ |x| - 0.5, & \text{otherwise} \end{cases}

(3)
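As a quick numerical check of Equations (2) and (3), the following minimal Python sketch evaluates the smooth L1 penalty. The function names are illustrative, and summing the penalty over the box coordinates is an assumption taken from the usual Fast R-CNN convention rather than something stated explicitly here.

```python
def smooth_l1(x: float) -> float:
    """Smooth L1 penalty of Equation (3): quadratic for |x| < 1, linear otherwise."""
    ax = abs(x)
    return 0.5 * x * x if ax < 1.0 else ax - 0.5


def box_regression_loss(t, t_star):
    """Equation (2) applied per coordinate and summed (summation is an assumption)."""
    return sum(smooth_l1(a - b) for a, b in zip(t, t_star))
```

Small residuals are penalized quadratically, while large residuals grow only linearly, which keeps outlier boxes from dominating the gradient.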

For the regression loss, the following four parameterized coordinates are used:

\begin{array}{l} t_x = (x - x_a)/w_a, \quad t_y = (y - y_a)/h_a, \quad t_w = \log(w/w_a), \quad t_h = \log(h/h_a) \\ t_x^* = (x^* - x_a)/w_a, \quad t_y^* = (y^* - y_a)/h_a, \quad t_w^* = \log(w^*/w_a), \quad t_h^* = \log(h^*/h_a) \end{array}

(4)

where x, y, w, and h denote the center coordinates, width, and height of a box, respectively; x, x_a, and x* refer to the predicted, anchor, and ground-truth boxes, respectively (similarly for y, w, and h). The regression can therefore be regarded as bounding-box regression from an anchor box to a nearby ground-truth box.
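The parameterization in Equation (4) can be illustrated with a short Python sketch. The helper names `encode_box` and `decode_box` and the example box values are hypothetical; boxes are assumed to be given as (center x, center y, width, height).

```python
import math


def encode_box(box, anchor):
    """Map a box to (t_x, t_y, t_w, t_h) relative to an anchor, as in Equation (4)."""
    x, y, w, h = box
    xa, ya, wa, ha = anchor
    return ((x - xa) / wa, (y - ya) / ha, math.log(w / wa), math.log(h / ha))


def decode_box(t, anchor):
    """Invert encode_box: recover (x, y, w, h) from the offsets and the anchor."""
    tx, ty, tw, th = t
    xa, ya, wa, ha = anchor
    return (xa + tx * wa, ya + ty * ha, wa * math.exp(tw), ha * math.exp(th))


# Hypothetical anchor and ground-truth box, in pixels.
anchor = (100.0, 100.0, 32.0, 32.0)
gt_box = (108.0, 96.0, 64.0, 16.0)
targets = encode_box(gt_box, anchor)
recovered = decode_box(targets, anchor)
```

Because the offsets are normalized by the anchor size, an anchor much larger than the defect yields large log-scale targets, which is one way to see why the anchor scales matter for small defects.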

The mask branch has a Km²-dimensional output for each RoI, which encodes K binary masks with a resolution of m × m, one for each of the K categories. Using a per-pixel sigmoid, L_mask is defined as the mean binary cross-entropy loss. For an RoI associated with the ground-truth category k, L_mask is defined only on the kth mask, while the other mask outputs do not contribute to the loss. This definition of L_mask allows the network to generate masks for every category without competition among categories. The segmentation is thus decomposed into two branches, the classification branch and the mask branch: the class label predicted by the classification branch is used to select the output mask.
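The per-category mask loss described above can be sketched in plain Python as follows. This is an illustrative sketch, not the training code used in the experiments: the function name is hypothetical, the masks are nested lists of raw scores (logits), and the numerically stable form of the sigmoid cross-entropy is a standard rewrite chosen here for safety.

```python
import math


def mask_loss(logits, target, k):
    """Mean per-pixel sigmoid cross-entropy on the k-th mask only.

    logits: K masks, each an m x m nested list of raw scores.
    target: m x m nested list of 0/1 ground-truth values.
    k:      index of the ground-truth category of this RoI.
    """
    total, count = 0.0, 0
    for logit_row, target_row in zip(logits[k], target):
        for z, y in zip(logit_row, target_row):
            # Stable form of -[y*log(sigmoid(z)) + (1-y)*log(1-sigmoid(z))].
            total += max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
            count += 1
    return total / count
```

Only `logits[k]` is read, so the masks predicted for the other categories cannot change the loss, which is the "no competition" property described in the text.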

The RoIAlign layer removes the quantization of the RoIPooling layer and accurately aligns the extracted features with the input, i.e., any quantization of the RoI boundaries or bins is avoided; for instance, x/16 is used instead of the rounded value [x/16] when VGG16, with a feature stride of 16, is used as the backbone network. The exact values of the input features at four regularly sampled points in each RoI bin are calculated by bilinear interpolation, and the results are aggregated by max pooling. The RoIPooling and RoIAlign layers are presented in Figure 2.
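The sampling scheme described above can be sketched for a single-channel feature map in plain Python. This is a simplified illustration under stated assumptions: the function names are hypothetical, the RoI is given in image pixels as (y1, x1, y2, x2), each bin uses a 2 × 2 grid of sampling points, and out-of-range samples are clamped to the feature map border.

```python
import math


def bilinear(feature, y, x):
    """Bilinearly interpolate a 2D feature map at fractional position (y, x)."""
    h, w = len(feature), len(feature[0])
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    top = feature[y0][x0] * (1 - dx) + feature[y0][x1] * dx
    bottom = feature[y1][x0] * (1 - dx) + feature[y1][x1] * dx
    return top * (1 - dy) + bottom * dy


def roi_align(feature, roi, stride=16, out_size=2, samples=2):
    """Pool one RoI into an out_size x out_size grid without any rounding."""
    # Divide by the stride but keep the fractional part (x/16, not [x/16]).
    y_start, x_start, y_end, x_end = (c / stride for c in roi)
    bin_h = (y_end - y_start) / out_size
    bin_w = (x_end - x_start) / out_size
    output = []
    for i in range(out_size):
        row = []
        for j in range(out_size):
            values = []
            for sy in range(samples):
                for sx in range(samples):
                    y = y_start + (i + (sy + 0.5) / samples) * bin_h
                    x = x_start + (j + (sx + 0.5) / samples) * bin_w
                    values.append(bilinear(feature, y, x))
            row.append(max(values))  # aggregate the sampled values by max
        output.append(row)
    return output
```

For a feature map whose value equals its column index, `roi_align(feature, (8, 8, 56, 56))` samples at fractional columns such as 0.875 and 1.625 rather than snapping them to integers, which is the alignment property that RoIPooling loses.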

2.2. Surface Defect Detection of Automobile Engine Parts Based on Mask R-CNN

The surface defect detection framework of automotive engine parts based on the Mask R-CNN is designed using the testing procedure shown in Figure 3. First, local images of automotive engine parts are collected by a 1080P high-definition digital microscope. The image size is 1280 × 720 pixels. Then, the convolutional characteristics of certain features of representative images are extracted by the backbone network. The features are then input into the RPN and branches with masks. This is followed by the acquisition of high-confidence detection frames in the RPN layer and using the Faster R-CNN to obtain the region information of multiple masks on the mask branch. Finally, the input image, the corresponding detection frame, and the mask information are displayed at the same time.

For a backbone network, this study selects the deeper ResNet101 network to ensure the feature extraction capability. This network can achieve residual mapping while providing a shortcut connection to achieve identity mapping, which solves the problem of accuracy decrease with the network depth. The structure of the ResNet101 is displayed in Figure 4.

Mask R-CNN uses five anchor scales to accommodate targets at more scales and thus increase the detection accuracy. The anchor scales of (32, 64, 128, 256, 512) provided by the feature pyramid network (FPN) are used [21]. Tests on the constructed dataset showed that this anchor scale design still performed poorly on small targets. As shown in Figure 5, the surface defects in the yellow circles were not detected. To solve this problem, the anchor design is improved by choosing anchor scales suited to minor defects, with the most suitable anchor sizes obtained by labeled data analysis. With appropriate anchor scales, the performance of Mask R-CNN in detecting small defects is improved.
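To make the role of the anchor scales concrete, the following Python sketch lists the anchor shapes produced by a set of scales and ratios. It assumes a common convention (not confirmed by the text) in which the scale is the side length of the square anchor, the ratio is width divided by height, and the anchor area is held constant across ratios; the ratios (0.5, 1, 2) are the standard Mask R-CNN values.

```python
import math


def anchor_shapes(scales, ratios):
    """Return (width, height) for every scale/ratio pair.

    Assumed convention: ratio = width / height and area = scale ** 2.
    """
    shapes = []
    for s in scales:
        for r in ratios:
            shapes.append((s * math.sqrt(r), s / math.sqrt(r)))
    return shapes


default_shapes = anchor_shapes((32, 64, 128, 256, 512), (0.5, 1, 2))
smallest_side = min(min(w, h) for w, h in default_shapes)
```

Under this convention, the smallest side of any default anchor is about 22.6 pixels, so defects only a few pixels wide have no closely matching anchor.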

2.3. IA-Mask R-CNN

The distribution of labeled data plays an important role in the design of the network. Optimizing the network design according to the statistics of the labeled data can accelerate convergence and improve the target detection ability of the network. Therefore, we computed statistics of the width and height of the labeled data in the collected dataset, as shown in Figure 6a,b. The statistical results are further analyzed against several common anchor scales, as shown in Table 1.

The anchor scale combination adopted by Mask R-CNN is (32, 64, 128, 256, 512). From Figure 6 and Table 1, it can be seen that this combination covers 94.80% of the samples in height but only 56.41% in width. This mismatch between the anchor design and the size of the labeled data directly leads to the poor performance of Mask R-CNN in engine surface defect detection. To solve this problem, this paper improves the anchor design by setting the anchor scales to (8, 16, 32, 64, 128, 256). This combination covers 100% of the samples in height and 99.87% in width. In addition, the statistical analysis of the anchor ratios is shown in Figure 6c. The anchor ratios adopted by Mask R-CNN are (0.5, 1, 2), which cover 95.91% of the samples. Therefore, these ratios are also adopted in our method. To verify the effectiveness of designing anchor scales based on the labeled data distribution, we consider different anchor scales and compare their convergence performance, as shown in Figure 7.
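A coverage statistic of this kind could be computed along the following lines. This Python sketch rests on two loud assumptions: the definition of "covered" (a labeled size lying between the smallest and the largest anchor scale) is a guess, since the exact rule behind Table 1 is not given here, and the example widths are made-up values, not data from the collected dataset.

```python
def coverage(sizes, scales):
    """Fraction of labeled sizes (pixels) within [min(scales), max(scales)].

    The range-based definition of coverage is an assumption for illustration.
    """
    lo, hi = min(scales), max(scales)
    covered = sum(1 for s in sizes if lo <= s <= hi)
    return covered / len(sizes)


# Hypothetical label widths in pixels, NOT taken from the paper's dataset.
example_widths = [6, 12, 20, 40, 100, 300]
baseline_cov = coverage(example_widths, (32, 64, 128, 256, 512))
improved_cov = coverage(example_widths, (8, 16, 32, 64, 128, 256))
```

On these made-up widths, lowering the smallest scale from 32 to 8 raises the coverage from one half to two thirds, which mirrors the direction of the effect reported for the real label statistics.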

To compare the convergence performance of the five networks with different anchor scales, each network was trained for 20 epochs. The results are presented in Table 2, where A represents the baseline and B, C, D, and E represent one half, one quarter, one-eighth, and one-sixteenth of the baseline, respectively. Comparing a total of six statistics, namely the total loss and the loss of each part, shows that with the anchor scales of (8, 16, 32, 64, 128), the network converges to a lower value in five aspects, whereas the baseline converges poorly in all aspects. This result verifies the influence of the anchor scales on the network convergence and shows that labeled data analysis is useful for network design. Therefore, the anchor scales of (8, 16, 32, 64, 128) are added to the Mask R-CNN framework. The Mask R-CNN with the improved anchor design is named IA-Mask R-CNN.
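The five anchor scale settings A to E follow directly from the baseline by repeated halving, as the short Python sketch below shows. The helper name and the dictionary layout are illustrative only.

```python
BASELINE = (32, 64, 128, 256, 512)


def scaled_anchor_sets(baseline, labels="ABCDE"):
    """Build settings A..E: the baseline divided by 1, 2, 4, 8, and 16."""
    return {
        label: tuple(s // 2 ** i for s in baseline)
        for i, label in enumerate(labels)
    }


ANCHOR_SETS = scaled_anchor_sets(BASELINE)
```

Setting C, one quarter of the baseline, gives the scales (8, 16, 32, 64, 128) that are adopted for IA-Mask R-CNN.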

3. Experimental Results and Analysis

The equipment used in constructing the dataset was a 1080P HDMI high-definition digital microscope. All comparative experiments were conducted in the PyCharm 2018 development environment using the Python 3.6 programming language. With the exception of the scripts for batch processing of the JSON files (.bash files) and labelme generation, which were written in Notepad++, all other scripts for data preprocessing (.py files) were written using Python 3.6 in PyCharm 2018. The dataset in the PASCAL VOC format was preprocessed by the labelImg labeling tool before being used in the Faster R-CNN, whereas the dataset in the COCO format was preprocessed by the labelImg labeling tool before being used in the Mask R-CNN. The deep learning environment included CUDA 9.0.176, cuDNN 7.0.5, Tensorflow_GPU-1.7.0, Tensorboard-1.7.0, and Keras-2.2.4. The loss curves and the network structure diagram were monitored in TensorBoard. Due to the limitations of the laboratory equipment, the GPU used in the experiments was an NVIDIA GeForce GTX 1050Ti with 4 GB of video memory.

3.1. Automotive Part Defect Data

The automotive engine parts produced by the Tianjin engine part factory had a length of only 2.8 cm and a diameter of 1 cm, as shown in Figure 8. The surface defects of the parts included bruise damages and machine marks. In practice, to remove unqualified parts from production, workers on the production line must visually identify millimeter-length surface defects of produced parts using a magnifying glass. However, due to a large number of produced parts on a daily basis, the efficiency of this method of inspection is very low.

To improve the efficiency of defect detection and enable the factory to realize intelligent detection, first, a camera sensor capable of capturing high-definition images of the produced parts was considered. A 1080P HDMI digital microscope, shown in Figure 9, was selected for acquiring high-definition images of the parts. The image sensor was capable of producing three million pixels with a static camera resolution of 1280 × 720 and a magnification of 10–220 times. The manual focusing range was from zero to 150 mm.

To improve the recognition of defect areas, an independently developed light source was used together with the camera sensor to acquire the image dataset. The light source is based on light-emitting diodes (LEDs) powered by a lithium battery. It measures 305 mm × 49 mm × 28 mm and weighs only 200 g, so it can be conveniently fixed to the side of the digital microscope. The entire equipment set is shown in Figure 10. Local image acquisition of bruise damage and machine marks was performed on 560 engine parts, and 560 valid images were obtained. The two categories of defects, namely bruise damage and machine marks, were manually labeled to validate the effectiveness of the proposed method experimentally. The labeling of the automotive engine part data using the labelImg tool is presented in Figure 11.

3.2. Quantitative Analysis

To verify the effectiveness of the improved anchor design, after the dataset was constructed, 500 randomly selected labeled images were used for training, and the remaining 60 labeled images were used for testing. The surface defect detection accuracies of the Faster R-CNN, Mask R-CNN, and proposed IA-Mask R-CNN models on automotive engine parts were tested. Due to the limited GPU memory, the batch size in the experiment was set to one. The VGG16 and ResNet101 backbones were first pre-trained on the COCO dataset, which contains 80 object categories [22]. The networks were then fine-tuned from the pre-trained weights; the Faster R-CNN was trained for 80,000 iterations, and the Mask R-CNN and IA-Mask R-CNN were trained for 10,000 iterations each. The quantitative results are shown in Table 3.
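A random 500/60 split of this kind can be reproduced with a few lines of Python. The file names and the fixed seed below are hypothetical; the text does not state how the random selection was seeded or how the images were named.

```python
import random


def split_dataset(items, n_train, seed=0):
    """Shuffle a copy of items with a fixed seed and split it into train/test."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]


# Hypothetical image names standing in for the 560 labeled images.
image_ids = [f"part_{i:03d}.jpg" for i in range(560)]
train_set, test_set = split_dataset(image_ids, n_train=500)
```

Fixing the seed keeps the same 60 test images across the three models, so that their accuracies are compared on identical data.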

By quantitatively comparing the results of the three detection models, it was observed that, after adding the mask branch, the Mask R-CNN and IA-Mask R-CNN performed better in detecting surface defects of automotive engine parts than the Faster R-CNN and required a shorter training time. In addition, the number of iterations had decreased, and a higher precision was achieved with the support of the ResNet101. After improving the anchor scales, the IA-Mask R-CNN did not require additional training parameters compared to the Mask R-CNN. Under the same pre-training model, training time, and iteration number, the improved network achieved the best accuracy among all the detection models, thus validating the effectiveness of the improved anchor scales.

3.3. Qualitative Analysis

For the purpose of a more intuitive presentation of the comparison results of the three models, part of the test results on the test set are presented and compared qualitatively, as shown in Figure 12. Since no mask branch was added to the Faster R-CNN, the detected defect area could give only an approximate range, and the detection results were not detailed. Although the Mask R-CNN could achieve detailed defect area detection, it did not consider small targets and also ignored the effect of anchor scales, resulting in missed detections. In contrast, the proposed IA-Mask R-CNN, with the improved anchor design, could detect all defects, even small ones, such as those in the red circle in the bottom row of Figure 12. In addition, even for relatively large defect areas, the IA-Mask R-CNN had a better detection performance than the Mask R-CNN, as shown in the blue circle in the bottom row of Figure 12, which further validated the effectiveness of the proposed detection model.

4. Discussion

The detection of surface defects on automotive engine parts is very important for automobile production. Traditional manual detection methods suffer from low accuracy and low efficiency. Deep learning methods have been applied to some extent in the field of defect detection, but the existing methods give inadequate consideration to minor defects. To solve this problem, some researchers have proposed more complex networks to improve the adaptability to minor defects; although the accuracy is higher, the efficiency decreases. This paper does not modify the network but focuses on the sample distribution, especially the analysis of size and ratio. On this basis, this paper improves the anchor design in Mask R-CNN, and this improvement incurs no additional cost in training or testing. We performed a statistical analysis of the dataset and designed the corresponding anchors. The effectiveness of the improved anchor design is verified by experiments. Moreover, the proposed method can be easily combined with other anchor-based networks.

5. Conclusions

The anchor design of Mask R-CNN ignores the detection of small targets, which results in low accuracy in detecting surface defects of automotive engine parts. In this work, IA-Mask R-CNN with an improved anchor design is proposed, and the optimal anchor scales are determined by labeled data analysis. The proposed design is compared with the other designs, and the results show that the proposed IA-Mask R-CNN performs better than the other two designs in detecting minor surface defects of automotive engine parts, as well as defects in larger regions. However, due to the limited video memory of the GPU and the large depth of ResNet101, some hyper-parameters, such as the batch size, cannot be tuned to their optimum. This remains an area for further improvement of the detection performance of the proposed network. Directions for future research could include the automatic selection of optimal anchor scales based on the statistical properties of a dataset and the improvement of the detection and classification performance of Mask R-CNN on different datasets. Adding an attention mechanism to the model and applying feature fusion to improve its feature representation ability are also worth considering.

Author Contributions

Conceptualization, H.Z.; methodology, H.Z. and Y.W.; formal analysis, Y.W. and J.F.; data curation, J.F.; writing—original draft preparation, H.Z. and Y.W.; writing—review and editing, H.Z. and Y.W.; funding acquisition, H.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research has been supported in part by the National Natural Science Foundation of China (61672084), and the Fundamental Research Funds for the Central Universities (XK1802-4).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The dataset used in this article was collected and produced by the authors and has been made publicly available at https://www.kaggle.com/yinchuwang/surfacedefectdetectionofautomotiveengineparts (accessed on 25 June 2022).

Conflicts of Interest

The authors declare no conflict of interest.

References

1. SAMR. Circular of the State Administration of Market Supervision on the Recall of Automobiles and Consumer Goods Nationwide in 2021. Available online: https://www.samr.gov.cn/zw/zh/202203/t20220311_340340.html (accessed on 25 June 2022).
2. Paniagua, B.; Vega-Rodríguez, M.A.; Gomez-Pulido, J.A.; Sanchez-Perez, J.M. Improving the industrial classification of cork stoppers by using image processing and Neuro-Fuzzy computing. J. Intell. Manuf. 2010, 21, 745–760.
3. Bulnes, F.G.; Usamentiaga, R.; Garcia, D.F.; Molleda, J. An efficient method for defect detection during the manufacturing of web materials. J. Intell. Manuf. 2016, 27, 431–445.
4. LeCun, Y.; Bengio, Y. Convolutional networks for images, speech, and time series. In The Handbook of Brain Theory and Neural Networks; MIT Press: Cambridge, MA, USA, 1995; Volume 3361.
5. Hinton, G.E.; Salakhutdinov, R.R. Reducing the dimensionality of data with neural networks. Science 2006, 313, 504–507.
6. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. Imagenet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems; Curran Associates: New York, NY, USA, 2012.
7. Simonyan, K.; Zisserman, A. Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv 2014, arXiv:1409.1556.
8. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
9. Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 580–587.
10. Girshick, R. Fast R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 1440–1448.
11. Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You only look once: Unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788.
12. Redmon, J.; Farhadi, A. YOLO9000: Better, faster, stronger. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 7263–7271.
13. Redmon, J.; Farhadi, A. Yolov3: An incremental improvement. arXiv 2018, arXiv:1804.02767.
14. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137–1149.
15. He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask R-CNN. In IEEE Transactions on Pattern Analysis & Machine Intelligence; IEEE: New York, NY, USA, 2017.
16. Malamas, E.N.; Petrakis, E.G.; Zervakis, M.; Petit, L.; Legat, J.D. A survey on industrial vision systems, applications and tools. Image Vis. Comput. 2003, 21, 171–188.
17. Czimmermann, T.; Ciuti, G.; Milazzo, M.; Chiurazzi, M.; Roccella, S.; Oddo, C.M.; Dario, P. Visual-based defect detection and classification approaches for industrial applications—A survey. Sensors 2020, 20, 1459.
18. Huang, D.C.; Lin, C.F.; Chen, C.Y.; Sze, J.R. The Internet technology for defect detection system with deep learning method in smart factory. In Proceedings of the 2018 4th International Conference on Information Management (ICIM), Oxford, UK, 25–27 May 2018; pp. 98–102.
19. Lian, J.; Jia, W.; Zareapoor, M.; Zheng, Y.; Luo, R.; Jain, D.K.; Kumar, N. Deep-learning-based small surface defect detection via an exaggerated local variation-based generative adversarial network. IEEE Trans. Ind. Inform. 2019, 16, 1343–1351.
20. Tabernik, D.; Šela, S.; Skvarč, J.; Skočaj, D. Segmentation-based deep-learning approach for surface-defect detection. J. Intell. Manuf. 2020, 31, 759–776.
21. Lin, T.Y.; Dollár, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature pyramid networks for object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2117–2125.
22. Lin, T.Y.; Maire, M.; Belongie, S.; Hays, J.; Perona, P.; Ramanan, D.; Dollar, P.; Zitnick, C.L. Microsoft COCO: Common objects in context. ECCV 2014, 2, 5.

Figure 1. Mask R-CNN: a mask branch is added to Faster R-CNN.

Figure 2. Schematic diagram of the RoIPooling and RoIAlign. (a) RoIPooling; (b) RoIAlign.

Figure 3. Defect detection flowchart of automotive engine parts.

Figure 4. The ResNet101 structure.

Figure 5. Automobile engine part defect detection results of the Mask R-CNN. In each of the panels (a–c), the yellow circle marks a defect that was missed.

Figure 6. Size statistics of the samples in the collected dataset. The X-axis is the sample size and the Y-axis is the number of samples: (a) anchor height statistics; (b) anchor width statistics; and (c) anchor ratio statistics.

Figure 7. Network convergence curves at different anchor scales: (a) anchor scales = (32, 64, 128, 256, 512) (baseline); (b) anchor scales = (16, 32, 64, 128, 256) (one half of the baseline); (c) anchor scales = (8, 16, 32, 64, 128) (a quarter of the baseline); (d) anchor scales = (4, 8, 16, 32, 64) (one-eighth of the baseline); and (e) anchor scales = (2, 4, 8, 16, 32) (one-sixteenth of the baseline).
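
The five settings listed in Figure 7 follow from repeatedly halving the baseline scales. A minimal sketch of that construction is given here, using only the values in the caption; the function name `halved_anchor_scales` is hypothetical and is not taken from the authors' code.

```python
def halved_anchor_scales(baseline, steps):
    """Return the baseline scales followed by `steps` successive halvings.

    Hypothetical helper for illustration; integer division is safe here
    because every scale in the caption is a power of two.
    """
    settings = [tuple(baseline)]
    for _ in range(steps):
        previous = settings[-1]
        settings.append(tuple(scale // 2 for scale in previous))
    return settings


# Baseline anchor scales from Figure 7(a).
BASELINE = (32, 64, 128, 256, 512)
ANCHOR_SETTINGS = halved_anchor_scales(BASELINE, steps=4)

for label, scales in zip("abcde", ANCHOR_SETTINGS):
    print(f"({label}) anchor scales = {scales}")
```

Each tuple could then be passed to whatever anchor-scale option the chosen Mask R-CNN implementation exposes; the name of that option depends on the implementation and is not specified here.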

Figure 8. Automobile engine parts.

Figure 9. A 1080P HDMI digital microscope.

Figure 10. Image acquisition equipment for automotive engine parts.

Figure 11. LabelImg of automotive engine parts data.

Figure 12. The comparison results of different methods: (a) Faster R-CNN; (b) Mask R-CNN; and (c) IA-Mask R-CNN.

Table 1. Statistics of labeled data in different anchor scale ranges (%).

	0–2	2–4	4–8	8–16	16–32	32–64	64–128	128–256	256–512
Anchor width	0	0.13	21.03	22.43	8.48	5.43	42.50	0	0
Anchor height	0	0	0	5.20	33.41	20.16	35.85	5.38	0
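
Statistics of the kind reported in Table 1 can be obtained by binning the widths and heights of the labeled boxes into the listed ranges. The sketch below is one possible way to do this and is not the authors' procedure: the bin edges come from the table header, while the half-open treatment of the ranges and the example widths are assumptions made for illustration.

```python
from bisect import bisect_right

# Bin edges taken from the column headers of Table 1.
EDGES = [0, 2, 4, 8, 16, 32, 64, 128, 256, 512]


def range_percentages(sizes):
    """Percentage of sizes falling in each range [EDGES[i], EDGES[i + 1]).

    Treating the ranges as half-open is an assumption; the table does not
    state how a value lying exactly on a boundary is counted. Values
    outside 0-512 are ignored.
    """
    counts = [0] * (len(EDGES) - 1)
    for size in sizes:
        index = bisect_right(EDGES, size) - 1
        if 0 <= index < len(counts):
            counts[index] += 1
    total = sum(counts)
    return [100.0 * count / total if total else 0.0 for count in counts]


# Hypothetical box widths, for illustration only (not the paper's data).
example_widths = [5, 6, 10, 12, 70, 90, 100, 120]
percentages = range_percentages(example_widths)

for low, high, pct in zip(EDGES, EDGES[1:], percentages):
    print(f"{low}-{high}: {pct:.2f}%")
```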

Table 2. Loss comparison of the network models under different anchor scales.

	Total Loss	Bbox Loss	Cls Loss	Mask Loss	RPN Bbox Loss	RPN Cls Loss
A	0.96	0.14	0.10	0.27	0.44	0.01
B	0.75	0.14	0.09	0.26	0.25	0.01
C	0.69	0.14	0.07	0.26	0.21	0.01
D	0.88	0.21	0.02	0.29	0.35	0.01
E	1.32	0.20	0.02	0.29	0.80	0.01
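
In every row of Table 2 the total loss agrees with the sum of the five component losses to within rounding. The short check below uses only values copied from the table; the helper names are hypothetical.

```python
# Values copied from Table 2. Each entry holds the five component losses
# (bbox, cls, mask, RPN bbox, RPN cls) followed by the reported total loss.
TABLE_2 = {
    "A": ((0.14, 0.10, 0.27, 0.44, 0.01), 0.96),
    "B": ((0.14, 0.09, 0.26, 0.25, 0.01), 0.75),
    "C": ((0.14, 0.07, 0.26, 0.21, 0.01), 0.69),
    "D": ((0.21, 0.02, 0.29, 0.35, 0.01), 0.88),
    "E": ((0.20, 0.02, 0.29, 0.80, 0.01), 1.32),
}


def totals_consistent(table, tolerance=0.005):
    """True if every reported total matches the sum of its components."""
    return all(
        abs(sum(parts) - total) <= tolerance
        for parts, total in table.values()
    )


def lowest_total(table):
    """Label of the row with the smallest reported total loss."""
    return min(table, key=lambda label: table[label][1])


print("totals consistent:", totals_consistent(TABLE_2))
print("row with lowest total loss:", lowest_total(TABLE_2))
```

The rows differ mainly in the RPN bounding-box loss, which ranges from 0.21 to 0.80, while the RPN classification loss is 0.01 throughout.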

Table 3. Surface defect detection results of automotive engine parts of the three detection models.

	Faster R-CNN	Mask R-CNN	IA-Mask R-CNN
Backbone	VGG-16	ResNet101	ResNet101
Pretrained	Yes/COCO	Yes/COCO	Yes/COCO
Iterations	80,000	10,000	10,000
Training Time (h)	25	21	21
Bump injury AP (%)	69.7	77.9	81.5
Car mark AP (%)	82.4	91.2	93.9
mAP (%)	76.1	84.6	87.7
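
The mAP values in Table 3 match the unweighted mean of the two per-class AP values to within rounding. The sketch below reproduces that arithmetic from the table entries; the function name `map_from_classes` is hypothetical.

```python
from statistics import mean

# Per-class AP (%) in the order (bump injury, car mark), followed by the
# reported mAP (%). All values are copied from Table 3.
TABLE_3 = {
    "Faster R-CNN": ((69.7, 82.4), 76.1),
    "Mask R-CNN": ((77.9, 91.2), 84.6),
    "IA-Mask R-CNN": ((81.5, 93.9), 87.7),
}


def map_from_classes(per_class_ap):
    """Unweighted mean of the per-class average precisions."""
    return mean(per_class_ap)


for model, (aps, reported) in TABLE_3.items():
    computed = map_from_classes(aps)
    print(f"{model}: computed {computed:.2f}, reported {reported}")

# Difference between the reported mAP values, in percentage points.
gain = TABLE_3["IA-Mask R-CNN"][1] - TABLE_3["Mask R-CNN"][1]
print(f"mAP gain of IA-Mask R-CNN over Mask R-CNN: {gain:.1f} points")
```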

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Zhu, H.; Wang, Y.; Fan, J. IA-Mask R-CNN: Improved Anchor Design Mask R-CNN for Surface Defect Detection of Automotive Engine Parts. Appl. Sci. 2022, 12, 6633. https://doi.org/10.3390/app12136633
