Abstract

Threshold segmentation is a fundamental image segmentation technique, but existing thresholding algorithms perform poorly on noisy grayscale images. This paper proposes a novel algorithm, three-dimensional minimum error thresholding (3D-MET), to address this problem. The approach selects the optimal threshold with a discriminant based on relative entropy theory and the 3D histogram, which combines the gray-level distribution of the pixels with information about their neighboring pixels. Moreover, a fast recursive method is proposed to reduce the time complexity of 3D-MET from $O(L^6)$ to $O(L^3)$, where $L$ is the number of gray levels. Experimental results demonstrate that the proposed approach provides better segmentation performance than the compared methods on gray images.

1. Introduction

Image segmentation is an essential step in analyzing image structure, since subsequent processing steps depend heavily on its results, and a wide variety of segmentation techniques have been reported in the last two decades. Its applications span many fields, such as medical imaging [1, 2], document analysis [3, 4], object recognition [5], and SAR segmentation and quality inspection of materials [6].

Image thresholding based on gray-level histogram information is an important technique for image segmentation. Most techniques fall roughly into two groups: global thresholding and local thresholding. The former processes the whole image with a single threshold, while the latter divides the image into several subimages and handles each subimage with its own threshold. Global thresholding is a nonparametric, unsupervised, and self-adapting method. Although it appears simplistic, it is fundamental and widely applicable [7–9]. The classic global thresholding methods are maximum between-cluster variance (Otsu) [10], minimum error thresholding (MET) [11], and the maximum entropy method [12]. In [11], thresholds are selected by a minimum error criterion, which is designed to minimize the classification error probability under the assumption that the histogram is governed by a mixture of Gaussian densities. In essence, it treats image binarization as a Gaussian distribution fitting problem. In [13], a survey of 40 thresholding techniques by Mehmet Sezgin and Bülent Sankur shows that MET is the best performing thresholding algorithm.
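To make the minimum error criterion of [11] concrete, the following is a minimal sketch of the classic 1D case, assuming the widely cited form $J(t) = 1 + 2[P_0 \ln \sigma_0 + P_1 \ln \sigma_1] - 2[P_0 \ln P_0 + P_1 \ln P_1]$. The function name `met_threshold` and the choice to skip thresholds that leave a class empty or with zero variance are illustrative assumptions, not details taken from this paper.

```python
import math

def met_threshold(hist):
    """Return the gray level t minimizing the 1D minimum error criterion.

    hist[i] is the integer pixel count at gray level i. Levels <= t form
    one class and levels > t the other; each class is modeled as a Gaussian.
    """
    total = sum(hist)
    best_t, best_j = None, float("inf")
    for t in range(len(hist) - 1):
        classes = (range(0, t + 1), range(t + 1, len(hist)))
        j = 1.0
        for levels in classes:
            n = sum(hist[i] for i in levels)
            s1 = sum(i * hist[i] for i in levels)
            s2 = sum(i * i * hist[i] for i in levels)
            # n^2 times the class variance, computed exactly on integers
            scaled_var = n * s2 - s1 * s1
            if n == 0 or scaled_var <= 0:
                j = None  # empty class or zero variance: criterion undefined
                break
            prob = n / total
            var = scaled_var / (n * n)
            # 2 P ln(sigma) = P ln(sigma^2)
            j += prob * math.log(var) - 2.0 * prob * math.log(prob)
        if j is not None and j < best_j:
            best_t, best_j = t, j
    return best_t

# Two well separated modes centered at levels 3 and 12 (illustrative data).
example_hist = [0, 2, 6, 10, 6, 2, 0, 0, 0, 0, 2, 6, 10, 6, 2, 0]
example_t = met_threshold(example_hist)
```

For this symmetric example every threshold in the empty gap between the two modes gives the same criterion value, so the selected threshold falls inside that gap.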

Notably, all 1D thresholding algorithms use only the 1D histogram of an image, which represents only the gray-level distribution of the image. Their performance degrades severely when the gray-level distributions of the objects and the background are hard to distinguish; in particular, segmentation capability degrades significantly when the image is corrupted by strong noise. Therefore, many 2D thresholding approaches, which employ both the gray level of each pixel and the local average gray level of its neighborhood, have been proposed [14–16]. These methods give satisfactory results for images with Gaussian noise. However, they are almost useless if the image is degraded by other types of noise or by mixed noise [17].

In fact, using more of the information contained in the image can yield a better segmentation. Studies have shown that the mean filter tends to suppress Gaussian noise, whereas the median filter tends to suppress salt-and-pepper noise. Therefore, in this paper we propose a new algorithm called three-dimensional minimum error thresholding (3D-MET), which employs the gray level of each pixel together with the local average gray level and the median gray level of its neighborhood, to cope with threshold segmentation of images with mixed noise. Here “3D” refers to three parametric dimensions (pixel gray level, neighborhood mean gray level, and neighborhood median gray level) rather than three spatial dimensions, and a threshold is applied along each parametric dimension separately. The thresholds along the three parametric dimensions are chosen by finding the threshold triplet that globally optimizes a given criterion. The basic idea of the proposed algorithm is to take fuller account of the spatial correlation between image pixels during segmentation so as to reduce the impact of noise.

This paper is organized as follows. In Section 2, the 3D histogram is defined. Section 3 gives a detailed description of the proposed 3D-MET. Section 4 describes the fast recursive method for 3D-MET. The experimental results are discussed in Section 5, and Section 6 gives the concluding remarks.

2. 3D Histogram

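Let the pixels of a given image be represented in $L$ gray levels $\{0, 1, \ldots, L-1\}$. The number of pixels at level $i$ is denoted by $n_i$ and the total number of pixels by $N = \sum_{i=0}^{L-1} n_i$. Let $f(x, y)$ denote the gray level at coordinate $(x, y)$ in the image. The average gray level and the median gray level in the $k \times k$ neighborhood of $(x, y)$ are denoted by $g(x, y)$ and $h(x, y)$, respectively:

$$g(x, y) = \frac{1}{k^2} \sum_{m=-(k-1)/2}^{(k-1)/2} \; \sum_{n=-(k-1)/2}^{(k-1)/2} f(x+m, y+n),$$

$$h(x, y) = \operatorname{med}\left\{ f(x+m, y+n) : m, n = -\tfrac{k-1}{2}, \ldots, \tfrac{k-1}{2} \right\},$$

where $k$ is the neighborhood size, usually an odd number. Since $0 \le f(x, y) \le L-1$, it follows that $0 \le g(x, y) \le L-1$ and $0 \le h(x, y) \le L-1$.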
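The neighborhood mean and median can be sketched as follows. This is an illustration only: the function name `neighborhood_mean_median`, the replication of edge pixels at the image border, and the rounding of the mean to the nearest integer gray level are assumptions, since the text does not specify how borders or fractional means are handled.

```python
from statistics import median

def neighborhood_mean_median(img, k=3):
    """Return (mean_img, median_img) for a k x k window around each pixel.

    img is a list of rows of integer gray levels. Border pixels are
    replicated (an assumption made for this sketch).
    """
    r = k // 2
    rows, cols = len(img), len(img[0])
    mean_img = [[0] * cols for _ in range(rows)]
    median_img = [[0] * cols for _ in range(rows)]
    for x in range(rows):
        for y in range(cols):
            window = [
                img[min(max(x + dx, 0), rows - 1)][min(max(y + dy, 0), cols - 1)]
                for dx in range(-r, r + 1)
                for dy in range(-r, r + 1)
            ]
            # round the mean to the nearest integer gray level
            mean_img[x][y] = int(sum(window) / len(window) + 0.5)
            # k is odd, so the window has an odd number of values
            median_img[x][y] = int(median(window))
    return mean_img, median_img

# A flat patch with one "salt" pixel in the middle.
patch = [[10, 10, 10], [10, 255, 10], [10, 10, 10]]
patch_mean, patch_median = neighborhood_mean_median(patch, k=3)
```

In this patch the single bright pixel shifts the neighborhood mean noticeably but leaves the median at the background level, which is the behavior the median dimension is meant to exploit for salt-and-pepper noise.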

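Suppose the frequency of the triple $(i, j, k)$ composed of $f(x, y)$, $g(x, y)$, and $h(x, y)$ is $c_{ijk}$; then the joint probability $p_{ijk}$ is defined as

$$p_{ijk} = \frac{c_{ijk}}{N}, \qquad \sum_{i=0}^{L-1} \sum_{j=0}^{L-1} \sum_{k=0}^{L-1} p_{ijk} = 1.$$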

The distribution of $p_{ijk}$ can be summarized in the form of a 3D histogram. $p_{ijk}$ is the value of one point in the 3D histogram and represents the probability of the triple $(i, j, k)$. The domain of the 3D histogram is shown in Figure 1(a). Assume that the optimal threshold $(s, t, q)$ divides the 3D histogram into eight subblocks, numbered 0 through 7, as shown in Figures 1(b) and 1(c). Within the object and background regions, the gray level of a pixel generally differs little from the mean and median gray levels of its neighborhood. The probability mass of the object and the background therefore lies almost entirely near the main diagonal of the 3D histogram, whereas the off-diagonal subblocks contain the distributions of edges and noise. Accordingly, subblocks 0 and 1 denote the background and the object, respectively, while the others, numbered 2 through 7, denote edges and noise. It is reasonable to suppose that the object and background regions hold the overwhelming majority of the 3D histogram, so that the probability of the off-diagonal subblocks is nearly zero; that is,

$$\sum_{m=2}^{7} P_m \approx 0,$$

where $P_m$ denotes the total probability of subblock $m$.
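A minimal sketch of building the 3D histogram and measuring how much probability falls into the two diagonal subblocks is given here. The sparse dictionary representation, the function names, and the convention that subblock 0 is the low corner (all three components at or below the threshold) and subblock 1 the high corner are assumptions made for illustration.

```python
from collections import Counter

def histogram_3d(img, mean_img, median_img):
    """Map each (pixel, mean, median) triple to its relative frequency."""
    counts = Counter()
    for row_f, row_g, row_h in zip(img, mean_img, median_img):
        counts.update(zip(row_f, row_g, row_h))
    n = sum(counts.values())
    return {triple: c / n for triple, c in counts.items()}

def diagonal_mass(hist, s, t, q):
    """Return the probabilities of the low and high diagonal subblocks."""
    low = sum(p for (i, j, k), p in hist.items()
              if i <= s and j <= t and k <= q)
    high = sum(p for (i, j, k), p in hist.items()
               if i > s and j > t and k > q)
    return low, high

# Tiny hand-made example: one pixel (value 255) is an outlier whose
# neighborhood mean and median stay low, so its triple is off-diagonal.
img_f = [[1, 255], [9, 9]]
img_g = [[1, 2], [8, 9]]
img_h = [[1, 1], [9, 9]]
hist3 = histogram_3d(img_f, img_g, img_h)
low_mass, high_mass = diagonal_mass(hist3, 4, 4, 4)
```

In this toy example a quarter of the probability is off-diagonal only because the image has four pixels; in a real image the share contributed by isolated noisy pixels and edges is expected to be small, which is what the near-zero assumption relies on.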

3. 3D Minimum Error Thresholding (3D-MET)

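The 3D histogram defined above can be viewed as an estimate of the probability density function $p(i, j, k)$ of the mixture population:

$$p(i, j, k) = P_0 \, p(i, j, k \mid 0) + P_1 \, p(i, j, k \mid 1),$$

where $P_0$ and $P_1$ are the prior probabilities of the two classes. The class densities $p(i, j, k \mid m)$, $m = 0, 1$, are 3D normally distributed with means $\mu_{mi}$, $\mu_{mj}$, and $\mu_{mk}$, which satisfy

$$p(\mathbf{x} \mid m) = \frac{1}{(2\pi)^{3/2} \, |\Sigma_m|^{1/2}} \exp\left( -\frac{1}{2} (\mathbf{x} - \boldsymbol{\mu}_m)^{T} \Sigma_m^{-1} (\mathbf{x} - \boldsymbol{\mu}_m) \right), \qquad \mathbf{x} = (i, j, k)^{T}, \quad \boldsymbol{\mu}_m = (\mu_{mi}, \mu_{mj}, \mu_{mk})^{T},$$

where $\Sigma_m$ is the covariance matrix.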

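Now suppose that we dichotomize the pixels into two classes $C_0$ and $C_1$ (background and objects or vice versa) by an optimal threshold $(s, t, q)$, where $s$, $t$, and $q$ are the segmentation thresholds corresponding to the original image, the mean filtered image, and the median filtered image, respectively. Writing $C_0 = \{(i, j, k) : i \le s, \, j \le t, \, k \le q\}$ and $C_1 = \{(i, j, k) : i > s, \, j > t, \, k > q\}$, the probabilities of $C_0$ and $C_1$ are given by

$$P_m(s, t, q) = \sum_{(i, j, k) \in C_m} p_{ijk}, \qquad m = 0, 1,$$

and the corresponding class mean levels and variances of $C_0$ and $C_1$ are

$$\boldsymbol{\mu}_m = (\mu_{mi}, \mu_{mj}, \mu_{mk})^{T} = \frac{1}{P_m} \sum_{(i, j, k) \in C_m} (i, j, k)^{T} \, p_{ijk},$$

$$\sigma_{mi}^2 = \frac{1}{P_m} \sum_{(i, j, k) \in C_m} (i - \mu_{mi})^2 \, p_{ijk},$$

with $\sigma_{mj}^2$ and $\sigma_{mk}^2$ defined analogously. The correlation coefficients of $C_0$ and $C_1$, respectively, are

$$\rho_{m,ij} = \frac{1}{P_m \, \sigma_{mi} \, \sigma_{mj}} \sum_{(i, j, k) \in C_m} (i - \mu_{mi})(j - \mu_{mj}) \, p_{ijk},$$

with $\rho_{m,ik}$ and $\rho_{m,jk}$ defined analogously. The total mean level vector of the 3D histogram is

$$\boldsymbol{\mu}_T = (\mu_{Ti}, \mu_{Tj}, \mu_{Tk})^{T} = \sum_{i=0}^{L-1} \sum_{j=0}^{L-1} \sum_{k=0}^{L-1} (i, j, k)^{T} \, p_{ijk}.$$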
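The class probabilities, mean levels, and variances for a given threshold triple can be computed directly from a 3D histogram, as in the following sketch. The sparse dictionary input, the function name `class_statistics`, and the assignment of the low corner to the first class and the high corner to the second are illustrative assumptions.

```python
def class_statistics(hist, s, t, q):
    """Return (prob, mean, variance) for the two diagonal classes.

    hist maps (i, j, k) triples to probabilities. mean and variance are
    3-tuples, one entry per dimension, or None if the class is empty.
    """
    def stats(in_class):
        cells = [(v, p) for v, p in hist.items() if in_class(v)]
        prob = sum(p for _, p in cells)
        if prob == 0.0:
            return 0.0, None, None
        mean = tuple(sum(v[d] * p for v, p in cells) / prob
                     for d in range(3))
        var = tuple(sum((v[d] - mean[d]) ** 2 * p for v, p in cells) / prob
                    for d in range(3))
        return prob, mean, var

    class0 = stats(lambda v: v[0] <= s and v[1] <= t and v[2] <= q)
    class1 = stats(lambda v: v[0] > s and v[1] > t and v[2] > q)
    return class0, class1

# Hand-made histogram with all mass on the main diagonal.
toy_hist = {(1, 1, 1): 0.25, (3, 3, 3): 0.25, (9, 9, 9): 0.5}
toy_class0, toy_class1 = class_statistics(toy_hist, 4, 4, 4)
```

Evaluating these sums from scratch for every candidate triple is what makes the exhaustive search expensive, which motivates the recursive scheme of Section 4.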

In the following, we introduce the relative entropy theory proposed in [18], which is used to measure the disparity between two distributions. The smaller the relative entropy, the smaller the disparity. We can therefore use relative entropy to calculate the disparity $D(s, t, q)$ between the 3D histogram $p_{ijk}$ and the mixture probability $p(i, j, k)$; that is,

$$D(s, t, q) = \sum_{i=0}^{L-1} \sum_{j=0}^{L-1} \sum_{k=0}^{L-1} p_{ijk} \ln \frac{p_{ijk}}{p(i, j, k)}.$$
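As a small illustration of relative entropy as a disparity measure, the sketch below evaluates it for two discrete distributions given as flat lists. The function name is hypothetical, and the model distribution is assumed to be strictly positive wherever the observed distribution is positive.

```python
import math

def relative_entropy(observed, model):
    """Sum of p * ln(p / q) over cells where the observed p is nonzero."""
    return sum(p * math.log(p / q)
               for p, q in zip(observed, model) if p > 0.0)

same = relative_entropy([0.5, 0.5], [0.5, 0.5])
different = relative_entropy([0.5, 0.5], [0.25, 0.75])
```

Identical distributions give zero, and the value grows as the model departs from the observed histogram, which is why a smaller value indicates a better fitting mixture.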

can be simplified and rewritten as (see proof of the statement in the appendix), as shown in

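Equation (19) is the criterion function of the proposed 3D-MET. The threshold that yields the lowest value of the criterion gives the best-fitting model and therefore the optimum minimum error threshold $(s^*, t^*, q^*)$; that is,

$$(s^*, t^*, q^*) = \arg\min_{0 \le s, t, q \le L-1} J(s, t, q),$$

where $J(s, t, q)$ denotes the criterion function in (19).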

4. Fast Recursive Implementation of the 3D-MET

The proposed 3D-MET searches for the optimum threshold exhaustively in 3D space, and each evaluation has to start from (0, 0, 0). This exhaustive search for the threshold vector that satisfies (20) is clearly time consuming. Computing the value of the criterion for one threshold triple $(s, t, q)$ takes $O(L^3)$ time, and finding the lowest value requires evaluating the criterion for $L^3$ triples, so the total computation time of the threshold search is $O(L^6)$. In the following we propose a fast recursive method for 3D-MET to reduce its algorithmic complexity. The method can be briefly stated as follows.

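Step 1. Denote the zeroth-order cumulative moment of the 3D histogram by $P(s, t, q)$, the first-order cumulative moments by $\mu_i(s, t, q)$, $\mu_j(s, t, q)$, and $\mu_k(s, t, q)$, and the second-order cumulative moments by $\nu_i(s, t, q)$, $\nu_j(s, t, q)$, and $\nu_k(s, t, q)$; that is,

$$P(s, t, q) = \sum_{i=0}^{s} \sum_{j=0}^{t} \sum_{k=0}^{q} p_{ijk}, \qquad \mu_i(s, t, q) = \sum_{i=0}^{s} \sum_{j=0}^{t} \sum_{k=0}^{q} i \, p_{ijk}, \qquad \nu_i(s, t, q) = \sum_{i=0}^{s} \sum_{j=0}^{t} \sum_{k=0}^{q} i^2 \, p_{ijk},$$

with $\mu_j$, $\mu_k$, $\nu_j$, and $\nu_k$ obtained by replacing the weights $i$ and $i^2$ with $j$, $k$, $j^2$, and $k^2$, respectively.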

Step 2. Represent each component of mean levels and variances in (7) through (10) as a function of , ,, , , , or :

Step 3. Create four lookup tables to eliminate redundant calculation:

Step 4. Using the following recursive formula, calculate , , , and : Using the same argument, we have

Step 5. For each recursive result in Step 4, evaluate (19) once, until the threshold $(s, t, q)$ that minimizes the criterion is found.
While the class probabilities and variances for $(s, t, q)$ are calculated in Step 4, each accumulated frequency is obtained by summing only a few terms. Therefore, for each $(s, t, q)$ the computational complexity decreases from $O(L^3)$ to $O(1)$, and the computational complexity of 3D-MET decreases from $O(L^6)$ to $O(L^3)$.
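The idea behind the recursion can be illustrated with a standard 3D inclusion-exclusion recurrence for cumulative sums, sketched below. This is a generic construction and is not claimed to match the exact recursive formulas or lookup tables of Steps 3 and 4; the function names, the dense nested-list layout, and the zero-padded border are assumptions.

```python
def cumulative_table(p, weight=lambda i, j, k: 1.0):
    """Build c with c[s+1][t+1][q+1] = sum of weight * p over i<=s, j<=t, k<=q.

    p is a dense n x n x n nested list. The table is padded with a layer of
    zeros at index 0 so the recurrence needs no special cases at the border.
    """
    n = len(p)
    c = [[[0.0] * (n + 1) for _ in range(n + 1)] for _ in range(n + 1)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                # new cell + three faces - three edges + one corner
                c[i + 1][j + 1][k + 1] = (
                    weight(i, j, k) * p[i][j][k]
                    + c[i][j + 1][k + 1] + c[i + 1][j][k + 1] + c[i + 1][j + 1][k]
                    - c[i][j][k + 1] - c[i][j + 1][k] - c[i + 1][j][k]
                    + c[i][j][k]
                )
    return c

def box_sum(c, s, t, q):
    """Look up the cumulative value for threshold (s, t, q) in O(1)."""
    return c[s + 1][t + 1][q + 1]

# Small integer-valued test volume so that float arithmetic is exact.
size = 4
volume = [[[float(i + 2 * j + 3 * k + 1) for k in range(size)]
           for j in range(size)] for i in range(size)]
zeroth = cumulative_table(volume)
first_i = cumulative_table(volume, weight=lambda i, j, k: float(i))
```

Each table is filled in one pass over the $L^3$ cells with a constant number of operations per cell, after which any cumulative moment is a single lookup. As a rough count under these assumptions, $L = 256$ gives $256^3 = 16{,}777{,}216$ candidate triples, whereas re-summing up to $L^3$ cells for each of them would approach $256^6 \approx 2.8 \times 10^{14}$ cell visits.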

5. Experimental Results and Discussion

The experiments are implemented in Visual C++ 6.0 on a personal computer with an Intel Core 2 Duo 1.83 GHz CPU and 1 GB of RAM running Windows XP. The test images are two synthetic aperture radar (SAR) images and two license plate images. Performance is compared in terms of segmentation quality and algorithm efficiency. In this section, the experimental results of our algorithm are compared with those of 2D Otsu, 2D maximum entropy, and 2D-MET. For a fair comparison, all of these algorithms are implemented recursively and the neighborhood window size is set to 3.

5.1. Segmentation Results of Different Noise Images

Figure 2(a) is a SAR image with 2% Gaussian noise, named “SAR1.” The 1D histogram of SAR1, shown in Figure 2(b), is unimodal: it has no internal minimum, as would be expected of a homogeneous image. Figures 2(c) and 2(d) show the segmentation results of 2D Otsu and 2D maximum entropy. Neither method produces a meaningful result; the river is not separated from the rest of the test image. Figure 2(e) is the result of 2D-MET, which separates the river from the test image successfully, but its visual quality is not as good as that of Figure 2(f). That is, 3D-MET obtains the best segmentation performance.

Figure 3(a) shows the other SAR image, named “SAR2,” with 2% Gaussian noise and 2% salt-and-pepper noise. Because of the interference of mixed noise, the 1D histogram of SAR2 shown in Figure 3(b) is approximately a normal distribution, with an extremely small peak near the coordinate origin. The gray-level distributions of the river and the land in SAR2 differ considerably. For this kind of image, the goal of segmentation is to separate the rivers from the rest of the image. Figures 3(c) and 3(d) show that neither 2D Otsu nor 2D maximum entropy separates the river from the background. Figure 3(e) is the result of 2D-MET with threshold (102, 93), which separates the rivers from the background successfully but cannot eliminate the salt-and-pepper noise. Owing to its use of both the mean levels and the median levels of the neighborhood pixels, 3D-MET gets the best segmentation result. Figure 3(f) shows that 3D-MET not only separates the rivers from the background successfully but also eliminates the mixed noise effectively.

5.2. Segmentation Results of Images Disturbed by Nonuniform Illumination

Figure 4(a) is a poorly illuminated license plate image, named “License 1.” Figures 4(c) and 4(d) show that 2D Otsu and 2D maximum entropy are affected by the nonuniform illumination too strongly to separate the license number from the background. Figures 4(e) and 4(f) are the results of 2D-MET and 3D-MET with thresholds (31, 47) and (53, 77, 61), respectively. Both separate the license number from the background clearly, but the 3D-MET result is better.

To compare the thresholding methods under mixed interference, Figure 5(a) shows the other license plate image, named “License 2,” which is affected by nonuniform illumination and 2% salt-and-pepper noise. Figure 5(b) is the 1D histogram of License 2. Figure 5(c) shows the result of 2D Otsu with threshold (71, 75), which is the worst of the four. The 2D maximum entropy method separates the numbers from the license plate but cannot effectively suppress the noise in the image, as shown in Figure 5(d). Figures 5(e) and 5(f) show the results of 2D-MET and 3D-MET. Both completely separate the license numbers; moreover, the border of the license plate and the number “5” in the bottom right corner of the image are clearly separated. Comparing Figures 5(e) and 5(f) shows that 2D-MET also cannot suppress the salt-and-pepper noise, whereas the proposed 3D-MET suppresses the noise very well and achieves the best segmentation performance.

5.3. Evaluation of Segmentation Quality

According to the availability of a reference segmented image, evaluation criteria can be classified into two categories: supervised and unsupervised evaluation [19]. The 3D-MET proposed in this paper is a nonparametric and unsupervised method, and no reference segmented image is available. Therefore, three unsupervised evaluation criteria are used in this paper: interregion contrast (IRC), intraregion uniformity (IRU), and inter-intra-criterion (IIC). Each criterion is normalized to the range from 0 to 1, and a higher value indicates a better segmentation.

IRC and IRU introduced by Levine and Nazif are defined by the following expressions, respectively [20]:

IIC, proposed by Rosenberger, is the mean of IRC and IRU [21]. Combining the two criteria removes any indecision when comparing two segmentations. The IIC is defined by the following expression:

$$\mathrm{IIC} = \frac{\mathrm{IRC} + \mathrm{IRU}}{2}.$$

Table 1 summarizes the values of the evaluation criteria obtained with the different methods. The value of IIC is highest with our method, and visual inspection confirms that our method discriminates the regions better. In addition, for License 1 the values of IRC and IRU of 2D Otsu are higher than those of 2D maximum entropy, whereas for License 2 the order is reversed.

5.4. Comparison of Algorithm Efficiency

In the following we compare the thresholds and computation times of the algorithms on the above four images. The resolutions of SAR1, SAR2, License 1, and License 2 are 486 × 411, 771 × 740, 293 × 250, and 272 × 133, respectively, and all test images have 256 gray levels. The detected thresholds and computation times of the algorithms are reported in Table 2. The segmentation performance of 3D-MET is much better than that of the other methods, but its recursive search in a 3D space costs more time than 2D thresholding. The proposed method is fast enough for most practical engineering applications. For large SAR or remote sensing images, however, processing takes longer and the method is not suitable for real-time systems.

Table 3 shows the space cost of 3D-MET for different numbers of gray levels. The space required depends on the number of gray levels. The recursive implementation of 3D-MET needs to store four additional lookup tables, but this is only a small overhead relative to the memory of modern computers.

6. Conclusion

Because thresholding images with mixed noise is difficult, we propose a new thresholding algorithm for image segmentation based on the 3D histogram and relative entropy theory. Experimental results show that the proposed approach is effective, and the quantitative analysis shows that it improves segmentation capability for mixed noise images. The proposed method can also be applied and extended to other classification applications. Two further points are worth noting. First, no single algorithm can succeed for all image types; combining more than one thresholding algorithm is worth trying to improve robustness. Second, 3D-MET takes more time to locate the optimal threshold than 2D thresholding. This problem could be addressed by faster processors or by further improvement of the algorithm.

Appendix

The derivation of from is According to (7), (9), and (11)–(13), it follows that Substituting (A.2) into (A.1), we obtain Using the same argument, according to (8), (10), and (14)–(16), it follows that Adding (A.4) to (A.3), we have Because , is constant. So, according to (4), we have

And from the definition of 3D histogram, it is easy to find out that all correlation coefficients in matrix are functions of the neighborhood size . Generally, . So has no relation to the threshold selection and can be viewed as a constant.

Therefore, to minimize is equivalent to minimizing the following function:

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

The authors would like to thank the reviewers whose valuable suggestions have improved the quality of the paper. This work was supported by the National Natural Science Foundation of China under Project 61262037 and Project 61134002.