Abstract

Transmission estimation is a critical step in single-image dehazing. The estimate at each pixel describes the portion of the scene radiance that survives the haze degradation and finally reaches the image sensor. Transmission estimation is an under-constrained problem, and, thus, various assumptions, priors, and models are employed to make it solvable. However, most previous methods do not consider their different assumptions simultaneously and, therefore, do not correctly reflect all of them in the final result. This paper focuses on this problem and proposes a method using an energy function that clearly defines the optimal transmission map and simultaneously combines assumptions from three aspects: fidelity, smoothness, and occlusion handling. Fidelity is measured by a novel principle derived from the dark channel prior, smoothness is described by the piecewise smoothness assumption, and occlusion handling is achieved with a newly proposed feature. The transmissions are estimated by searching for the optimal solution of the function, which retains all the employed assumptions simultaneously. The proposed method is evaluated on the synthetic images of two datasets and on various natural images. The results show remarkable fidelity and smoothness in the estimated transmission maps and a good haze removal performance.

1. Introduction

Images captured in hazy weather have limited visibility and low contrast [1] owing to the optical attenuation caused by the scattering of hydrometeors. There is a high demand for restoring haze-free images from a single hazy input. In photography, this involves correcting the color shift and enhancing the overall quality of the photos. In the computer vision domain, it provides an estimate of the scene radiance, which is an important input for most analysis or recognition algorithms.

As illustrated in Figure 1, a hazy image $I$ is widely modeled as a combination of the scene radiance $J$ and the global airlight $A$, as expressed in
$$I(x) = J(x)\,t(x) + A\,\bigl(1 - t(x)\bigr), \tag{1}$$
where $t$ is the transmission map, which controls the scene attenuation and the amount of haze at every pixel. When the haze is evenly dispersed, the transmission decays exponentially with the scene depth as
$$t(x) = e^{-\beta d(x)}, \tag{2}$$
where $d$ is the depth map and $\beta$ is the scattering coefficient, commonly assumed to be a constant [13].
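As a minimal illustration of (1) and (2), the following Python sketch synthesizes a hazy image from a haze-free image and a depth map; the white airlight and the value of the scattering coefficient `beta` are illustrative assumptions, not values taken from this paper.

```python
import numpy as np

def synthesize_haze(J, depth, A=np.array([1.0, 1.0, 1.0]), beta=1.0):
    """Apply the haze model I = J*t + A*(1 - t) with t = exp(-beta * depth).

    J     : (H, W, 3) haze-free image in [0, 1]
    depth : (H, W) scene depth map
    A     : (3,) global airlight (a white airlight is assumed here)
    beta  : scattering coefficient, assumed constant over the image
    """
    t = np.exp(-beta * depth)                      # transmission map, eq. (2)
    t3 = t[..., None]                              # broadcast over color channels
    I = J * t3 + A[None, None, :] * (1.0 - t3)     # hazy image, eq. (1)
    return I, t
```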

Given a hazy image, the airlight can be estimated by selecting the brightest pixels [2] or by concatenating the maximum values of each channel [4]. More robustly, to avoid the disturbance caused by bright objects, He et al. [3] suggested selecting the brightest pixel among the pixels with the highest dark channel values. These methods rely on the presence of sky or dense haze regions. Alternatively, the method of Berman et al. [5] exhibits a comparatively better performance. With a known airlight, recovering the scene radiance, which is the objective of haze removal, is still an under-constrained problem. Additional assumptions, priors, or models (these terms are used interchangeably below to express additional constraints) are employed, explicitly or implicitly, by previous methods to make the problem solvable.
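The dark-channel-based airlight selection of He et al. [3] can be sketched as follows; this is a minimal re-implementation of the idea, not the authors' code, and the candidate fraction and patch size are typical choices rather than values taken from this paper.

```python
import numpy as np
from scipy.ndimage import minimum_filter

def estimate_airlight(I, patch=15, frac=0.001):
    """Pick the brightest input pixel among the pixels with the highest
    dark channel values.

    I : (H, W, 3) hazy image in [0, 1]
    """
    dark = minimum_filter(I.min(axis=2), size=patch)   # dark channel of the hazy image
    n = max(1, int(frac * dark.size))                  # number of candidate pixels
    candidates = np.argsort(dark.ravel())[-n:]         # highest dark channel values
    flat = I.reshape(-1, 3)
    brightest = candidates[np.argmax(flat[candidates].sum(axis=1))]
    return flat[brightest]                             # (3,) airlight estimate
```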

An important category of dehazing methods is enhancement based, which focuses on enhancing the visibility of a hazy image. The employed assumptions are commonly imposed on the scene radiance, and, for visibility, they include high contrast [2, 6, 7], saliency and exposedness [6, 7], luminance and saturation [7], or multiple quality assessments [8]. The haze removal results are robust and of high visibility even for underwater, nighttime, or poorly exposed images. However, these methods commonly do not follow the model of (1) and do not offer airlight or transmission estimates. An unnatural appearance might result in some cases owing to the lack of a physical interpretation. In comparison, physics-based methods are more valid and attract more attention.

The assumptions of physics-based methods are selected by studying the laws of nature or by intuition. A simple intuition is that a pixel covered by dense haze appears white. Based on this, Zhu et al. [9] and Li et al. [10] assumed a simple relationship between the transmission and the hazy color at each pixel (Zhu et al. [9] learned it by machine learning). However, this assumption alone only provides a rough initial transmission map. Both methods relied on a guided filter [11] during regularization, which further undermined the assumption (discussed in Section 2.1). Fattal [12] assumed that the transmission and surface shading are locally uncorrelated. This assumption is physically reasonable and made impressive contributions. However, He et al. [3] showed that the method cannot handle heavily hazy images well. Nishino et al. [13] modeled the scene radiance and the gradient of the scene depth using heavy-tailed distributions and assumed them to be independent. Fattal [14] showed that this method tends to underestimate the transmission and produce oversaturated results. Independence is reasonable but not sufficient; assumptions regarding the quality of the scene radiance or transmission are still required.

The dark channel prior proposed by He et al. [3] is a clearly defined prior on the scene radiance that has attracted significant attention and shown tremendous potential [15–24], despite being criticized as invalid for uniformly bright objects [19, 25]. However, previously reported methods that adopt the prior undermine its merits by coupling it with the local constant assumption, Laplacian matting [26], or a guided filter [11] (discussed in Section 2.1). Since the proposal of the dark channel prior, an increasing number of publications have paid attention to integrating their assumptions into a prior or model, such as the color-lines model [14], the haze-lines model [25], the patch recurrence model [27], and a model for remote sensing images [28]. The color-lines model proposed by Fattal [14] describes haze-free images as containing numerous small patches whose internal colors are distributed along a line passing through the origin of the RGB space. After mixing with the haze, these lines are shifted toward the point of the airlight color, and the intercept with the airlight indicates the transmission of the pixels on each line. Although this method exhibits a good performance in some cases, there are also numerous small patches in the images whose internal colors are distributed along a line that crosses the airlight vector but not the origin; the method does not consider this condition. Berman et al. [25] proposed a non-local method with the haze-lines model. A haze-line is a collection of colors in a hazy image that lie on a ray starting from the airlight color in the RGB space. There are two assumptions for each haze-line: (a) pixels on the same haze-line are assumed to have the same scene radiance, and (b) the pixel farthest from the airlight point is assumed to be haze-free. The first assumption has ambiguities, as discussed by Berman et al. [25], and the second one is far from reality. Therefore, the initial transmission estimates have low fidelity in some regions and cause over-smoothness after regularization. The patch recurrence model describes the fact that small patches tend to repeat abundantly inside a natural haze-free image, regardless of scale. Bahat et al. [27] employed this model and solved for the transmission map that introduces the maximal internal patch recurrence. However, assuming that a haze-free image has maximal internal patch recurrence is questionable; indeed, the haze removal results of this method contain numerous structures, sometimes more than appear natural.

There are also methods that build priors by learning [29, 30], but it is difficult to summarize them explicitly. It is recommended to refer to the reviews and comparisons provided by Ancuti et al. [31], Li et al. [32], or Singh et al. [33].

The priors and models introduced above must work together with assumptions of transmission consistency (introduced in Section 2.1), which play roles similar to the fidelity and smoothness measurements in energy minimization. However, few methods attempt to apply them simultaneously; instead, they estimate an initial map without considering the smoothness and then regularize it with an ambiguous trade-off against the fidelity. For example, Berman et al. [25] and Fattal [14] made a trade-off between the initial estimate and the smoothness. The fidelity was simply defined as the weighted distance between the initial estimate and the solution, which did not truly reflect the degree of compliance with their models. In comparison, Bahat et al. [27] measured fidelity by counting the number of internal patch recurrences, which remained consistent with its major assumption.

In this study, we focus on this problem and propose an energy function that combines different assumptions to define a good transmission map. All of the employed assumptions are considered simultaneously while searching for the optimal solution, which is the estimate of the transmission map. Another drawback of the previous methods is that they barely consider the occlusion problem. This usually occurs when a distant scene is partially occluded and divided into parts by a closer one, such as a wall behind a furcated trunk or the sky behind a net. These divided scenes are commonly assigned transmissions similar to their surroundings and appear veiled in the haze removal result. In this study, we also handle this problem by searching for and linking the pixels of a divided scene and setting constraints that extend the energy function. Along with the fidelity and smoothness assumptions, the effect of occlusion handling is also truly reflected in the optimal solution.

The contributions of this study are listed below: (a) To solve the problem that traditional methods usually generate an initial transmission map based on part of their assumptions and then refine it based on the other part, undermining the former, we propose a method that simultaneously considers the different assumptions in an optimization framework. (b) To solve the problem that the transmissions of an occluded distant scene are usually overestimated, we propose a novel feature to establish relationships between the fragmented parts and the major part and develop them into a term in our optimization framework.

This paper is organized as follows. Section 2 introduces several assumptions regarding fidelity, smoothness, and occlusion handling; related works based on these assumptions and their issues are also discussed. Section 3 introduces the basic algorithm and the proposed energy function along with the solving process. Section 4 presents the experiments, which are performed on the synthetic images of two datasets and on various natural images. The conclusion and future work are described in Section 5.

2. Background

2.1. Assumption of Smoothness

The assumptions for smoothness widely used in transmission maps include: local constant assumption, local linear assumption, and piecewise smoothness assumption.

According to the local constant assumption, the transmissions in a small patch are the same, i.e.,
$$t(y) = t(x), \quad \forall y \in \Omega(x), \tag{3}$$
where $\Omega(x)$ is a small patch centered at $x$. Such an assumption is far from reality but easy to use. Therefore, it is used to assist the initial transmission estimation [2, 3, 14]. For example, Tan et al. [2] aimed to solve for a transmission map that maximally improves the contrast of the haze removal result. However, the transmissions cannot be solved pixel by pixel because contrast is not defined on a single pixel. Therefore, the local constant assumption was employed; the pixels in $\Omega(x)$ were enhanced based on the same transmission $t(x)$, with the optimal $t(x)$ being the one that introduced the maximal contrast.

The local constant assumption causes the block effect, as shown in Figure 2(b). This effect is usually removed by regularization under the piecewise smoothness assumption or the local linear assumption. If a block (related to the shape of $\Omega(x)$ in (3)) is small, it can be removed without significantly changing the initial estimate, as in the method of Fattal [14]. Otherwise, Laplacian matting [26] or a guided filter [11], which is better at block effect removal, is employed [3].

Laplacian matting and guided filters are widely employed to regularize the initial transmission map [3, 15, 17]. These algorithms adopt the local linear assumption, i.e.,
$$t(y) = \sum_{c} a^{c} I^{c}(y) + b, \quad \forall y \in \Omega(x), \tag{4}$$
where $a^{c}$ and $b$ are linear coefficients assumed to be constant in $\Omega(x)$ and $c$ is the color channel. That is, in a small patch $\Omega(x)$, the transmission is assumed to be linear in the hazy colors.

The local linear assumption also deviates considerably from reality because, by definition, the transmission is only related to the amount of haze (or to the depth when the haze is evenly dispersed). Moreover, unlike the methods employing the local constant assumption, most methods employing Laplacian matting or a guided filter retain this assumption in their final result, inappropriately causing the phenomenon of redundant details. As shown in Figure 2(e), the clock, building textures, and vehicles, which are unrelated to the transmissions, can be clearly identified in the map.

The piecewise smoothness assumption describes the intuition that similar pixels adjacent to each other have similar transmissions. Commonly, the similarity is measured in the hazy image, and the assumption is employed in the smoothness term of the energy functions [14, 25] as
$$E(t) = \sum_{x} D_x\bigl(t(x)\bigr) + \lambda \sum_{(x,y)} w_{xy}\,\bigl(t(x) - t(y)\bigr)^{2}, \tag{5}$$
where $D_x$ is a data term, $\lambda$ balances the fidelity and smoothness, and $w_{xy}$ measures the similarity of adjacent pixels $x$ and $y$. The piecewise smoothness assumption is more in accordance with reality than the previous two assumptions. However, it causes over-smoothness if the data term has an inappropriately low weight. Examples are shown in Figures 2(c) and 2(f), taken from Berman et al. [25]. Here, the method is based on the haze-lines model, which is not robustly applicable everywhere. The regions that do not fit the model well have low weights in the regularization. The smoothness term plays a strong role in these regions, and, thus, there is over-smoothness; e.g., edges are noticeable between the buildings and the sky.

In the proposed method, the piecewise smoothness assumption is chosen because it is the closest to reality. The over-smoothness is suppressed because the data term of the proposed energy function is strong everywhere. An example of our estimate is shown in Figure 2(d).

2.2. Assumption of Fidelity

Although various assumptions for fidelity have been proposed (introduced in Section 1), few have an explicit definition and have been widely tested. In this respect, the dark channel prior proposed by He et al. [3] is superior among the known assumptions. It has a brief and clear definition based on the scene radiance, as expressed in
$$J^{\mathrm{dark}}(x) = \min_{y \in \Omega(x)} \Bigl( \min_{c \in \{r,g,b\}} J^{c}(y) \Bigr) \approx 0. \tag{6}$$
Although the prior is defined to be valid only in non-sky regions, it is actually also compatible with the sky, because the sky can be explained as a purely dark region with zero transmission (this might cause an unnatural appearance in the sky, as discussed by Wang et al. and Cui et al., which is ignored in this study).
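For reference, the dark channel itself can be computed with a channel-wise minimum followed by a local minimum filter; the patch size below is an assumed, typical value.

```python
import numpy as np
from scipy.ndimage import minimum_filter

def dark_channel(J, patch=15):
    """Dark channel of an image: per-pixel minimum over the color channels,
    then a minimum filter over a local patch. According to the prior, this
    is close to zero for outdoor haze-free images (sky regions excluded)."""
    return minimum_filter(J.min(axis=2), size=patch)
```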

The prior has exhibited tremendous effectiveness in single-image dehazing and attracted numerous research studies [15, 17, 19]. However, most of these works have not fully exploited its advantages. For example, He et al. [3] estimated an initial transmission map based on the dark channel prior and the local constant assumption, which caused the block effect. Laplacian matting was then employed to remove this effect, introducing the local linear assumption and, with it, redundant details. Such details reduce the contrast of the haze removal result, as discussed by Fattal [14] (note that a higher contrast does not necessarily imply a better result). Wang et al. [17] assumed that the dark channel prior and the local constant assumption are satisfied in each superpixel. Although their method avoided the block effect, the assumption is questionable because the colors in each superpixel tend to be similar, and the dark channel prior easily fails for similar colors. Furthermore, the method employed a guided filter to regularize its initial estimate, again generating redundant details. Similarly, Wang et al. [19] employed Laplacian matting patch by patch, and Luan et al. [15] used a guided filter for regularization; neither avoided the redundant details.

Although the method of He et al. [3] has been surpassed by new methods in many comparative studies [32], the prior itself is outstanding. In this paper, we show that the prior can be used to construct a data term for an energy function that clearly reflects the degree of compliance of the solution. With this, the prior again exhibits its superiority.

2.3. Assumption for Occlusion Handling

Figure 3 illustrates an example of the occlusion problem, where the plants near the road are partially occluded and divided by the plants close to the camera (framed in yellow). Without occlusion handling, these isolated parts might be assigned transmissions similar to those of their surroundings, which overestimates their true values, as shown in Figure 3(d).

Among the methods introduced in Section 1, that of Fattal [14] is the only one that provides a specific solution for this problem (however, the code is not available to us). The method adopts a similarity assumption: pixels with similar colors have similar transmissions. Obviously, this assumption is prone to false positives. The method of He et al. [3] can partially handle this problem owing to the local linear assumption concealed in the Laplacian matting, but it is not effective, particularly when the isolated part is extremely small, as shown in Figure 3(b). The method of Berman et al. [25] does not introduce any patch-based assumptions; although its initial estimate is fully free from this problem, it is undermined by the regularization and its shortcoming of over-smoothness, as shown in Figure 3(c).

In this study, the pixels of a common but divided scene are searched for based on a novel feature, with the assumption that pixels with similar features have similar transmissions. This is similar to the method of Fattal [14] but pays more attention to suppressing false positive cases. The result of the proposed method with occlusion handling is shown in Figure 3(e).

3. Proposed Method

3.1. The Algorithm

The haze model of (1) is followed in this paper. In our experience, few hazy images lack a small part of sky or a dense haze region, so the airlight estimation approach of He et al. [3] remains effective most of the time. Therefore, we simply employ it to estimate the airlight. The transmissions are estimated by searching for an approximately optimal configuration of the proposed energy function. With a known airlight and transmissions, the scene radiance is recovered through
$$J(x) = \frac{I(x) - A}{t(x)} + A. \tag{7}$$
Considering the residual error, aerial perspective, and noise in the sky region [3], the transmissions are slightly increased before dehazing, and the haze removal results are brightened.
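A minimal sketch of the recovery step in (7) is given below; the small transmission increase and the brightening factor are placeholder values, since their exact amounts are not restated here.

```python
import numpy as np

def recover_radiance(I, t, A, t_boost=0.05, gain=1.1):
    """Invert the haze model: J = (I - A) / t + A.

    t_boost and gain stand in for the slight transmission increase and the
    brightening mentioned above (assumed values)."""
    t_adj = np.clip(t + t_boost, 1e-3, 1.0)            # slightly raise the transmissions
    J = (I - A[None, None, :]) / t_adj[..., None] + A[None, None, :]
    return np.clip(J * gain, 0.0, 1.0)                 # brighten and clamp to [0, 1]
```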

3.2. The Energy Function

The proposed energy function is
$$E(t) = \sum_{x} D_x\bigl(t(x)\bigr) + \sum_{(x,y)\in \mathcal{N}_I} S\bigl(t(x), t(y)\bigr) + \sum_{(x,y)\in \mathcal{N}_F} C\bigl(t(x), t(y)\bigr), \tag{8}$$
where $D_x$ is the data term, measuring the fidelity of the transmission at each pixel, $S$ is the smoothness term enforcing the piecewise smoothness, and $C$ is the constraint term for the occlusion handling. $(x,y) \in \mathcal{N}_I$ or $(x,y) \in \mathcal{N}_F$ implies that the two pixels are neighbors in the image space or in the feature space, respectively.

3.3. The Data Term

The data term is derived from the internal constraints of the haze model and the dark channel prior. Because the scene radiance is non-negative, we have
$$I^{c}(x) \geq A^{c}\bigl(1 - t(x)\bigr) \tag{9}$$
for every color channel $c$ and, thus,
$$t(x) \geq 1 - \min_{c} \frac{I^{c}(x)}{A^{c}} \triangleq t_{LB}(x), \tag{10}$$
where $t_{LB}(x)$ is the lower bound of the transmission of pixel $x$.

Briefly, pixels satisfying $\min_{c} J^{c}(x) = 0$ are called dark pixels. The transmission of each dark pixel has only one possibility: it is equal to its lower bound. This is because
$$\min_{c} \frac{I^{c}(x)}{A^{c}} = \min_{c} \left( \frac{J^{c}(x)}{A^{c}}\,t(x) + 1 - t(x) \right) = 1 - t(x) \quad \text{when } \min_{c} J^{c}(x) = 0, \tag{11}$$
and, thus, $t(x) = t_{LB}(x)$. A pixel in a hazy image can be confirmed to be a dark pixel only in rare cases, and there is still no additional evidence of the possible transmissions of most pixels. In this situation, the local constant assumption could be employed, but it may cause the block effect, as discussed above. In this paper, we propose another assumption that is much looser than the local constant assumption: for each pixel $x$, at least one dark pixel in $\Omega(x)$ has the same transmission as $x$ (such a dark pixel must exist in $\Omega(x)$, as implied by the dark channel prior).

Because every pixel $y$ in $\Omega(x)$ is a potential dark pixel with transmission $t_{LB}(y)$, which might equal $t(x)$, we collect all such potential values in a label set as
$$L(x) = \bigl\{\, t_{LB}(y) \mid y \in \Omega(x) \,\bigr\}, \tag{12}$$
where $\Omega(x)$ is a circular mask whose radius is set relative to the image width and height. For any two values in $L(x)$, no prior or assumption can be used to determine which one is better. Thus, the data term is designed only to exclude the values not in $L(x)$, as
$$D_x\bigl(t(x)\bigr) = \begin{cases} 0, & t(x) \in L(x), \\ +\infty, & \text{otherwise.} \end{cases} \tag{13}$$
Note that the data term plays a strong role at every pixel of the hazy image, and, thus, the defect of over-smoothness is suppressed.
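The lower bound and the label set can be sketched as follows (a simplified illustration: the circular mask is replaced by a square window, and the radius is an assumed parameter).

```python
import numpy as np

def lower_bound(I, A):
    """Per-pixel lower bound t_LB(x) = 1 - min_c I^c(x) / A^c."""
    return 1.0 - (I / A[None, None, :]).min(axis=2)

def label_set(t_lb, row, col, radius=15):
    """Candidate transmissions L(x): the lower bounds of all pixels inside a
    window around (row, col); a square stand-in for the circular mask."""
    h, w = t_lb.shape
    r0, r1 = max(0, row - radius), min(h, row + radius + 1)
    c0, c1 = max(0, col - radius), min(w, col + radius + 1)
    return np.unique(t_lb[r0:r1, c0:c1])
```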

3.4. Smoothness Term

Based on the piecewise smoothness assumption, the proposed smoothness term is
$$S\bigl(t(x), t(y)\bigr) = w_{xy}\,\bigl(t(x) - t(y)\bigr)^{2}, \quad (x, y) \in \mathcal{N}_I, \tag{14}$$
where $\mathcal{N}_I$ is the set of adjacent pixel pairs, defined on four-connectivity, and $w_{xy}$ measures the similarity between the two pixels. The term is a traditional one, except that the similarity between the pixels is measured on the lower bound $t_{LB}$ rather than on the hazy image $I$, as in the studies by Fattal [14] and Berman et al. [25]. This choice starts from the intuition that two pixels may still have similar transmissions even if they are distinct in color. Instead, $t_{LB}$ is more in accordance with the intuition on the haze amount, and it is sometimes used as an initial estimate of the transmissions [9, 27]. An example is shown in Figure 4. When the similarity is measured on $t_{LB}$, the result is slightly more consistent, which is most obvious on the structures circled in yellow. Most of the time, measuring the similarity on $t_{LB}$ or on $I$ makes little difference, but $t_{LB}$ is faster because it contains only one channel.
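A sketch of pairwise weights measured on $t_{LB}$ for four-connected neighbors is given below; the Gaussian form and the value of `sigma` are assumptions made for illustration.

```python
import numpy as np

def smoothness_weights(t_lb, sigma=0.1):
    """Weights for four-connected neighbor pairs: large when adjacent pixels
    have similar t_LB values, so their transmissions are pulled together."""
    w_right = np.exp(-((t_lb[:, 1:] - t_lb[:, :-1]) ** 2) / sigma ** 2)  # horizontal pairs
    w_down = np.exp(-((t_lb[1:, :] - t_lb[:-1, :]) ** 2) / sigma ** 2)   # vertical pairs
    return w_right, w_down
```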

3.5. Constraint Term

The constraint term is proposed to handle the occlusion problem. The key point of occlusion handling is to recognize the pixels from a common but divided scene. This is a challenging task and nearly impossible for a low-level method. However, we do not have to recognize all such pixels; we only need to establish several links between them, such as the link illustrated in Figure 5(a). The sky region in the figure is divided by the wires of the suspension. With only the piecewise smoothness assumption, the divided sky regions have no relationship with the major one; they are treated as some kind of white object in the front and are assigned transmissions similar to those of their surroundings, as shown in Figure 5(c). By setting such links, these regions are collected and assigned reasonable values, as shown in Figure 5(d). Because only several links are required, the search for the pixels of a common but divided scene is allowed to miss many true links (i.e., to be conservative), which makes the problem easier to solve.

A novel feature is proposed for each pixel. Its first two elements are two colors collected in a small mask with side length $s$ centered at the pixel, and its last element characterizes the local content of the mask, as explained below. The shorter the distance between the features of two pixels, the higher the possibility that the two pixels belong to a common scene. Figure 5(b) shows two sets of pixels with very similar features, marked in blue and yellow. By linking the yellow pixels, several isolated sky regions are successfully related to the major one (two yellow pixels lie in the major sky region). By linking the blue pixels, several isolated regions are collected together. These regions are not linked to the major sky region directly; they are linked to the yellow pixels in the image space, as circled in green in the figure. Note that there are hundreds of such sets of pixels scattered over the suspension. Numerous links are set between them, and, finally, the dispersed sky regions are treated as an integral one.

The first two elements of the feature are based on the intuitions that (a) there must be some notable difference between the divided scene and its covering; (b) the same colors frequently recur in the divided scene; and (c) the same colors frequently recur in the covering. Figure 6 illustrates these intuitions. It shows several pixel pairs with colors turning from white to brown, which can be found in abundance between the divided background and its covering, the tree. The relationships we intend to build can be easily set by searching for such pixel pairs and linking them. However, such a search alone is prone to false positives, and, as mentioned above, the method must remain conservative and avoid them.

The last element of the feature is proposed to tighten the search criteria; it is based on the intuition that small patches centered at the edges between the divided scene and its covering contain similar colors. The similarity in color is then replaced by the similarity in the label sets, based on the following considerations: (a) pixels that are similar in color also tend to be similar in their label sets; (b) a label set is easier to use than a set of colors; and (c) if a set of pixels with different label sets is found to have very strong relationships (suppose their transmissions must be the same), there might be few choices left for their solutions (the intersection of their label sets). Therefore, the similarity in the label sets reflects the risk of reducing the solution space more intuitively.

Considering the features of all the pixels in a hazy image yields a point cloud in the feature space. Pixels close to each other in the feature space should be linked and assigned a large weight, and vice versa. Building such links is a graph construction problem, which can be solved well by the $k$-nearest neighbors method or the $b$-matching method [34]. However, the resulting graph might be too large to solve. In practice, we simplify the graph construction by partitioning the feature space into numerous super-blocks and linking only the pixels whose features fall in the same super-block. The partition is based on the first six dimensions of the feature space (the RGB values of the two colors), and the super-blocks are not of the same size, as shown in Figure 7. The unequal partitioning is used because all the intuitions described above are stated for a haze-free image, and the conditions become stricter as the haze becomes denser and, approximately, the colors become whiter.

Given a pixel pair $(x, y)$ with features in the same super-block, it is included in the set $\mathcal{N}_F$ if the Hamming distance between the texture contents of their masks is also below a threshold. Each pair of pixels in $\mathcal{N}_F$ is assumed to have exactly the same transmission; thus, the constraint term $C\bigl(t(x), t(y)\bigr)$ assigns zero cost when $t(x) = t(y)$ and a large penalty otherwise.
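Since the exact feature definition is not reproduced here, the sketch below only illustrates the final filtering step, using a hypothetical 8-bit binary texture descriptor `lbp8` and threshold `tau` as stand-ins for the texture content and the Hamming-distance test.

```python
import numpy as np

def lbp8(gray, row, col):
    """Hypothetical 8-bit local binary pattern standing in for the texture
    element of the feature (compares a pixel with its 8 neighbors)."""
    c = gray[row, col]
    nbrs = [gray[row - 1, col - 1], gray[row - 1, col], gray[row - 1, col + 1],
            gray[row, col + 1], gray[row + 1, col + 1], gray[row + 1, col],
            gray[row + 1, col - 1], gray[row, col - 1]]
    return np.array([v >= c for v in nbrs], dtype=np.uint8)

def accept_link(gray, p, q, tau=2):
    """Keep a candidate pair only if the Hamming distance between their
    texture descriptors is small (conservative: prefer missing links over
    creating false ones)."""
    return int(np.sum(lbp8(gray, *p) != lbp8(gray, *q))) <= tau
```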

Although the occlusion handling approach is designed to avoid false positives, they still occur in some cases; e.g., several white containers on a cargo ship, as shown in Figure 5(d), are recognized as a part of the sky and assigned transmissions lower than in reality. As mentioned above, perfect occlusion handling is nearly impossible for a low-level method, and there must be a trade-off. However, as illustrated in Section 4, the benefit of this approach in the haze removal results is much more obvious than its defect; thus, it is employed in the energy function by default.

3.6. The Solving Process

Based on the definition of each term, the minimization of the energy function in (8) is a discrete optimization problem that can be solved by graph cut based algorithms. More specifically, the problem is a multi-label problem; for an eight-bit image, there are 256 labels for the transmission of each pixel, and the number of nodes is exactly the number of pixels. Because the distance measurement between $t(x)$ and $t(y)$ in the smoothness term is not a metric, $\alpha$-swap [35–37] is employed to search for the optimal solution, as shown in Figure 8(b).

However, in practice, employing $\alpha$-swap on such a large graph with numerous labels is intolerably time-consuming. Therefore, two modifications are made: (a) while solving the transmissions, the input images are compressed into five bits, so there are only 32 labels; (b) the distance measurement between $t(x)$ and $t(y)$ is replaced by the $\ell_1$-norm (i.e., $\lvert t(x) - t(y) \rvert$), so that $\alpha$-expansion [35–37] can be employed instead, which is more robust and efficient. Figure 8(c) shows the solution after these modifications. As can be seen, it remains close to the optimal solution except at several abrupt edges, which is most obvious in the sky region. The method of Farbman et al. [38] is employed to remove this defect by preserving the edges of the transmissions only at positions that have notable color edges. Figure 8(d) shows the final solution.
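For readers without a graph cut library at hand, the toy sketch below replaces $\alpha$-expansion with simple iterated conditional modes (ICM) over the 32 quantized labels; it only shows how the unary (data) and pairwise (smoothness) costs interact and is not the solver used in the paper.

```python
import numpy as np

def icm_refine(labels, unary, w_right, w_down, n_labels=32, lam=1.0, iters=5):
    """Toy ICM stand-in for alpha-expansion on a four-connected grid.

    labels : (H, W) initial integer label map in [0, n_labels)
    unary  : (H, W, n_labels) data costs (very large outside the label set L(x))
    w_right, w_down : pairwise weights for horizontal / vertical neighbor pairs
    """
    labels = labels.copy()
    H, W = labels.shape
    cand = np.arange(n_labels)
    for _ in range(iters):
        for i in range(H):
            for j in range(W):
                cost = unary[i, j].astype(float).copy()
                # L1 pairwise costs to the four (currently fixed) neighbors
                if j > 0:
                    cost += lam * w_right[i, j - 1] * np.abs(cand - labels[i, j - 1])
                if j < W - 1:
                    cost += lam * w_right[i, j] * np.abs(cand - labels[i, j + 1])
                if i > 0:
                    cost += lam * w_down[i - 1, j] * np.abs(cand - labels[i - 1, j])
                if i < H - 1:
                    cost += lam * w_down[i, j] * np.abs(cand - labels[i + 1, j])
                labels[i, j] = int(np.argmin(cost))
    return labels
```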

Note that, different from most methods introduced in Section 1, which solve an initial estimate based on one group of assumptions and then regularize it based on another group, we have a clear definition of the best case, and the above approximations are no more than fast approaches; the concept of considering the assumptions simultaneously is still preserved. The difference can be seen by examining how clearly the employed assumptions are reflected in the solution and how close the three solutions in Figure 8 are to one another, compared with the initial and final results obtained from the other methods displayed in Figure 2.

To better illustrate the proposed method, a flowchart is presented in Figure 9. As discussed above, the energy function consists of the data term, smoothness term, and constraint term. The assumptions are considered simultaneously in the optimization process.

4. Experiment

The methods of Choi et al. [8], Meng et al. [4], He et al. [3], and Berman et al. [25] are compared with the proposed method. We validate these methods by (a) quantitatively comparing the accuracy of the haze removal results on synthetic images; (b) comparing the quality of the transmission maps and haze removal results qualitatively on natural images. The codes for these methods, including the proposal, are available (Choi et al. [8], http://live.ece.utexas.edu/research/fog/index.html. Meng et al. [4], http://www.escience.cn/people/menggaofeng/research.html. He et al. [3], https://github.com/sjtrny/Dark-Channel-Haze-Removal. Berman et al. [25], https://github.com/danaberman/non-local-dehazing.).

Most methods have their own strategy for estimating the airlight, but the estimates generally differ only slightly. We exclude this factor in our experiments by providing the estimate directly (for synthetic images) or estimating it uniformly with the approach of He et al. [3] (for natural images). Haze removal results tend to be dark [3], and different methods employ different enhancement approaches to improve the visual appeal. We also exclude this factor by employing a gamma correction (with exponent 0.9) uniformly, except for the enhancement-based methods, which should consider this problem internally.
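The uniform enhancement applied here is a plain gamma correction, e.g.:

```python
import numpy as np

def gamma_correct(J, gamma=0.9):
    """Uniformly brighten a dehazed result (exponent 0.9 as stated above)."""
    return np.clip(J, 0.0, 1.0) ** gamma
```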

4.1. Quantitative Results

For real hazy images, the ground truth of the scene radiance and transmission map is not available. Thus, the quantitative comparison is made on synthetic images. Two datasets of synthetic images are employed: D-HAZY [31] and the dataset of Fattal [14]. D-HAZY is built from the Middlebury and NYU Depth datasets. Only a part of the Middlebury dataset is used because of the quality of the depth maps.

Fattal [14] produced hazy images by combining a haze-free image with an arbitrary airlight, which might be counterintuitive, e.g., when a white-balanced image is covered with a mazarine haze. In fact, the airlight is related to the haze-free image as $J(x) = R(x)\,A$ [39], where $R(x)$ is the reflectance. This implies that the airlight is part of the ground truth and that choosing it casually might be unfair. Therefore, we reproduce the dataset by employing a white airlight, which might not be exactly the truth but is much more natural. Furthermore, the sky regions in the haze-free images are also painted white according to the airlight. Another problem of the dataset is that the ground truths of the transmission maps are of low quality; thus, we improve them manually. A comparison of the original and improved samples is shown in Figure 10 (note the sky region in the hazy and haze-free images and the hollows and border on the right in the transmission map).

Because the ground truths of the haze-free images are known, SSIM [40] (the structural similarity index) is employed to evaluate the results. Table 1 lists the results on D-HAZY and illustrates the outstanding performance of the proposed method. Figure 11 shows the sample Mask from Table 1. The result of Choi et al. [8] is gloomy. The result of Meng et al. [4] has an obvious halo effect in the vicinity of the straw hat and the left-hand basket. The result of He et al. [3] is veiled, which is obvious on the wall, chairs, and left-hand basket. The result of Berman et al. [25] is over-brightened and oversaturated; furthermore, the table appears inconsistent in brightness on the right. Comparatively, our result has a tone and brightness close to the ground truth, and the above problems do not appear.
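Scores of this kind can be reproduced with the standard SSIM implementation in scikit-image (a usage sketch; depending on the library version, `multichannel=True` may be needed instead of `channel_axis`).

```python
from skimage.metrics import structural_similarity

def ssim_score(result, ground_truth):
    """SSIM between a dehazed result and its haze-free ground truth,
    both (H, W, 3) float arrays in [0, 1]."""
    return structural_similarity(result, ground_truth,
                                 channel_axis=-1, data_range=1.0)
```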

Table 2 lists the results on the samples modified from the dataset of Fattal [14] (the mean and median are not displayed owing to the small sample size). Following the configuration of Fattal [14], for each haze-free sample, three extra tests are performed by adding Gaussian noise of different standard deviations to the hazy image. As shown, our method also delivers a competitive performance, and its rank is not much affected by the noise in the samples Mansion and Reindeer.

4.2. Qualitative Results

Figures 12–15 show the comparisons performed on natural images (Choi et al. [8] do not estimate transmissions). These methods are evaluated by examining their performance in terms of global, local, and detailed appearance and occlusion handling. The performance of each method is recorded in Table 3, where a check implies that the method performs robustly well compared with the other methods, a circle indicates that failures sometimes occur, and a cross implies that the method is comparatively weak.

Global appearance refers to how much the haze removal result is in accordance with intuition regarding brightness, saturation, or tone in general. For a physics-based method with a well-estimated airlight, counterintuitive brightness or saturation appears when the transmission map is globally overestimated or underestimated. It can be derived from (7) that
$$J^{c}(x) = A^{c} - \frac{A^{c} - I^{c}(x)}{t(x)},$$
which implies that the intensity of each channel of the haze removal result (which affects the brightness and saturation) has a positive correlation with the transmission. In the displayed samples, the transmission maps of Berman et al. [25] are underestimated; thus, the results are gloomy or oversaturated, as is obvious in the samples Tian'anmen, Forest, and River. In comparison, the transmission maps of He et al. [3] robustly stay in an appropriate range, and the results have reasonable brightness and saturation. The proposed method remains close to that of He et al. [3] because both are based on the dark channel prior. The transmission maps of Meng et al. [4] are slightly higher than those of He et al. [3], but this does not affect the haze removal results noticeably. The results of Choi et al. [8] are the gloomiest and appear to have an unnatural tone.

Local appearance is evaluated by examining the colors of specific scenes or objects. In the result of Meng et al. [4] shown in Figure 12, there is a serious halo effect in the vicinity of the building and the trees on its right; the transmissions appear to vary gradually across the boundaries. In the result of Meng et al. [4] shown in Figure 13, the transmission edges do not coincide with the color edges, and the method produces several blocks of haze in the result. It seems that the transmissions estimated by Meng et al. [4] are not well constrained in the image space. The local linear assumption employed by He et al. [3], as well as the piecewise smoothness assumption employed by Berman et al. [25] and the proposed method, counters this phenomenon. Therefore, the problem is avoided in the results of these methods.

In the result of Berman et al. [25] in Figure 13, the path suddenly becomes gloomy and the leaves turn burnt dark in the middle of the view. This failure is also reflected by the transmissions, which do not represent these structures consistently. Along the path, the transmissions fall quickly to a low value. In comparison, the transmissions of the other methods fall more gently, resulting in a less gloomy result. Along the branches and leaves in the front, the transmissions of Berman et al. [25] also fall, whereas the transmissions of the other methods steadily maintain high values. A similar phenomenon also occurs in the result of Berman et al. [25] in Figure 14, where the lawn suddenly turns gloomy on the left.

In the result of the proposed method, as depicted in Figure 14, the goose on the left is colored in purple. The transmissions of the goose are underestimated owing to a false positive case of the occlusion handling, which links the pixels of the goose with the background.

In the result of He et al. [3] in Figure 13, the sky region is blurry in the transmission map and mixed with the region of the distant path. Thus, the distant branches in this mixed region appear veiled compared with the results of the other methods.

The local appearances of the results of Choi et al. [8] are consistent with their global appearances, and do not show abrupt changes.

Detailed appearance is examined in small regions, textures, and edges. Although the transmission maps of He et al. [3] and the proposed method have similar global appearances, the haze removal results of He et al. [3] typically have less contrast than ours. This is because redundant details are present in their transmission maps, as demonstrated by Fattal [14]. This problem is obvious in all the transmission maps of He et al. [3], which reflect all the color variations in the hazy image indiscriminately. The root of the problem is the local linear assumption concealed in the Laplacian matting (or guided filter), which is also widely employed by recent dehazing methods [15, 17, 19]. In the results of Choi et al. [8], the details are fully lost in the foreground of Figure 12 and in the tree of Figure 14.

The occlusion problem also undermines the details of the haze removal result. It easily occurs in regions such as the gaps between the flags and the pillars in Figure 12, the gaps among the branches on the right in Figure 13, and the background behind the gaps of the trunk in Figure 14. Without occlusion handling, these regions appear veiled and their transmissions tend to be overestimated. The method of Meng et al. [4] barely handles this problem. The method of Berman et al. [25] is not patch-based, and, thus, its transmission maps are only slightly affected. However, this advantage is unstable owing to the defect of over-smoothness; for example, the result in Figure 12 fails at occlusion handling. The method of He et al. [3] can also slightly handle this problem because of the local linear assumption; although its transmission maps reflect the variations, they do not do so completely in these regions. In Figure 13, the gaps among the branches are still veiled. In comparison, the proposed method performs well in these regions and is most outstanding for the gaps in Figure 13.

A summary of the comparisons is presented in Table 3. The proposed method performs well in most respects. The employed assumptions are truly reflected in the results. The dark channel prior fully shows its power and leads to a good performance in terms of the global appearance. The piecewise smoothness assumption and occlusion handling lead to a good performance in the local and detailed appearances. A shortcoming of the proposed method is that the occlusion handling might introduce inappropriate links, causing underestimation of the transmissions, as shown in Figure 14. Under general conditions, however, it simply introduces some details into the transmissions, as shown in Figure 15, where the effects on the haze removal results are unnoticeable, whereas it provides a significant improvement once the occlusion problem occurs. Therefore, keeping the occlusion handling enabled is typically a valid choice if manual selection is undesired.

5. Conclusion

In this paper, we emphasize the problem that, in single-image dehazing, most previous methods do not consider their priors or assumptions simultaneously. Commonly, these methods provide an initial estimate of the transmission map based on one group of assumptions and then regularize it based on another group. The fidelity measurement in the regularization of these methods is simply the distance between the initial estimate and the solution, which does not correctly reflect the degree of compliance with the original assumptions, thus undermining them.

Several well-known priors and assumptions are introduced, and the related methods employing them are discussed. An energy function that combines selected priors and assumptions is proposed; it provides a clear definition of the desired transmission map. Furthermore, a delicate approach to using the dark channel prior, together with a novel feature for occlusion handling, is proposed. The transmission map is estimated by searching for the optimal configuration of the energy function, and all the employed priors and assumptions are considered simultaneously in the solving process. Comparison with state-of-the-art methods shows that the proposed method delivers an outstanding performance.

Because the major prior employed in the proposed method is the dark channel prior, our method inherits the limitation of that prior: when the scene objects are inherently similar to the atmospheric light and no shadow is cast on them, the dark channel prior is invalid [3]. Our method underestimates the transmissions of these objects. Another problem is that the proposed feature for occlusion handling yields false positives in some cases, finally resulting in an underestimation of the transmissions. However, the energy minimization framework can be easily extended: any newly developed prior, assumption, or feature can be tested to construct new terms and produce better results.

Data Availability

The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (Project no. 11372074 and 61473090).