
1 Introduction to Edge Detection and Thresholding

Edge detection is an important field in image processing because it frequently captures the most important structures in an image. Edge detection therefore represents a fundamental step in computer vision approaches, and it can also be used to qualify a region segmentation technique. Additionally, edge detection assessment remains very useful in image segmentation, registration, reconstruction or interpretation. Nevertheless, it is hard to design an edge detector able to extract the exact edge, with good localization and orientation, from an image. In the literature, different techniques have emerged and, due to its importance, edge detection continues to be an active research area [1]. The best-known and most useful edge detection methods are based on gradient computation with first-order fixed operators [2, 3]. Oriented operators compute the maximum energy in one orientation [4,5,6] or in two directions [7]. Typically, these methods are composed of three steps:

  1. Computation of the gradient magnitude and its orientation \(\eta \), see Fig. 1 (a minimal sketch of this step is given after the list).

  2. Non-maximum suppression to obtain thin edges: the selected pixels are those having a gradient magnitude at a local maximum along the gradient direction \(\eta \), which is perpendicular to the edge orientation.

  3. Thresholding of the thin contours to obtain an edge map.
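
As an illustration, here is a minimal sketch of step 1, assuming Sobel-like first-order derivative filters (the helper name and the filter choice are assumptions, not the exact operators compared in this paper):

```python
import numpy as np
from scipy import ndimage

def gradient_magnitude_orientation(image):
    """Return |grad I| and its orientation eta (radians) for a scalar image."""
    Ix = ndimage.sobel(image, axis=1, mode="reflect")  # horizontal derivative
    Iy = ndimage.sobel(image, axis=0, mode="reflect")  # vertical derivative
    magnitude = np.hypot(Ix, Iy)
    eta = np.arctan2(Iy, Ix)  # gradient direction, perpendicular to the edge
    return magnitude, eta
```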

Thus, Fig. 1 presents the different possibilities for the gradient and its associated orientations involved in the edge detection algorithms compared in this paper.

The final step remains a difficult stage in image processing; however, it represents a crucial operation for comparing several segmentation algorithms. In edge detection, the hysteresis process uses the connectivity information of the pixels belonging to thin contours and thus remains a more elaborate method than binary thresholding. In short, this technique builds a contour image from two threshold levels (low: \(\tau _L\) and high: \(\tau _H\)): a pixel is considered an edge point if it belongs to a contour chain in which all the pixel values are higher than \(\tau _L\) and at least one point is higher than \(\tau _H\), as represented with a signal in Fig. 1. Thus, the lower the thresholds are, the more undesirable pixels are preserved. A minimal sketch of this process is given below.
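
The following sketch implements this chain-based hysteresis on a thin-edge magnitude image, assuming 8-connectivity for the contour chains and \(\tau _H \geqslant \tau _L\) (the function name and the SciPy-based implementation are illustrative assumptions):

```python
import numpy as np
from scipy import ndimage

def hysteresis(thin_edges, tau_L, tau_H):
    """Keep a contour chain if all of its pixels exceed tau_L and at
    least one pixel exceeds tau_H (8-connected chains)."""
    weak = thin_edges > tau_L
    # Label each 8-connected chain of candidate pixels.
    labels, _ = ndimage.label(weak, structure=np.ones((3, 3)))
    # Chains containing at least one strong pixel are kept.
    strong_labels = np.unique(labels[thin_edges > tau_H])
    return np.isin(labels, strong_labels[strong_labels > 0])
```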

Fig. 1. Gradient magnitude and orientation computation for a scalar image I and example of hysteresis threshold applied along a contour chain. \(I_\theta \) represents the image derivative using a first-order filter at the \(\theta \) orientation (in radians).

Usually, in order to compare several edge detection methods, the user tries several thresholds and selects the ones producing the edge maps that appear visually best. However, this assessment suffers from a major drawback: segmentations are compared using thresholds (deliberately) chosen by the user, so the evaluation is highly subjective and not reproducible. Hence, the purpose here is to use dissimilarity measures without any user intervention, for an objective assessment. Finally, to be valuable, an edge detection assessment should produce a result that correlates with the perceived quality of the edge image, which relies on human judgment [8,9,10]. In other words, a reliable edge map should characterize all the relevant structures of an image as closely as possible, without any disappearance of desired contours. At the same time, the edge detector should create only a minimum of spurious pixels, which disturb the visibility of the main/desired objects to detect.

In this paper, a novel technique is presented to compare edge detection techniques by using hysteresis thresholds in a supervised way, consistently with the visual perception of a human. Indeed, by comparing a ground truth contour map with candidate edge maps, several assessments can be compared by varying the parameters of the hysteresis thresholds. This study shows the importance of penalizing false negative points more strongly than false positive points, leading to a new edge detection evaluation algorithm. Experiments using synthetic and real images demonstrate that the proposed method obtains contour maps closer to the ground truth without requiring tuned parameters, and outperforms other assessment methods in an objective way.

2 Supervised Measures for Image Contour Evaluations

A supervised evaluation criterion computes a dissimilarity measure between a segmentation result and a ground truth obtained from synthetic data or from expert judgment (i.e. a manual segmentation) [11,12,13,14]. In this paper, the closer the evaluation score is to 0, the better the segmentation is qualified. This work focuses on comparisons of supervised edge detection evaluations and proposes a new measure aiming at an objective assessment.

2.1 Error Measures Involving Only Statistics

To assess an edge detector, the confusion matrix remains a cornerstone of boundary detection evaluation methods. Let \(G_t\) be the reference contour map corresponding to the ground truth and \(D_c\) the detected contour map of an original image I. Comparing \(G_t\) and \(D_c\) pixel by pixel, the first criterion to be assessed is the common presence of edge/non-edge points. A basic evaluation is composed of statistics; to that end, \(G_t \) and \(D_c \) are combined. Denoting \(|\cdot |\) the cardinality of a set, all points are divided into four sets (see Fig. 3 and the sketch after the following list):

  • True Positive points (TPs), common points of \(G_t \) and \(D_c\): \(TP = {\left| D_{c}\cap G_t\right| }\),

  • False Positive points (FPs), spurious detected edges of \(D_c \): \(FP = {\left| D_{c}\cap \lnot G_t\right| }\),

  • False Negative points (FNs), missing boundary points of \(D_c \): \(FN = {\left| \lnot D_{c} \cap G_t\right| }\),

  • True Negative points (TNs), common non-edge points: \(TN = {\left| \lnot D_{c}\cap \lnot G_t\right| }\).
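
A minimal NumPy sketch of these four counts for two binary maps of the same size follows (the helper name is an illustrative assumption):

```python
import numpy as np

def confusion_counts(D_c, G_t):
    """Pixel-wise confusion counts between a detected map and a ground truth."""
    D, G = D_c.astype(bool), G_t.astype(bool)
    TP = np.sum(D & G)    # common edge points
    FP = np.sum(D & ~G)   # spurious detected edges
    FN = np.sum(~D & G)   # missing boundary points
    TN = np.sum(~D & ~G)  # common non-edge points
    return TP, FP, FN, TN
```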

Several edge detection evaluations involving the confusion matrix are presented in Table 1. Computing only FPs and FNs [7], or their sum, enables a segmentation assessment to be performed. The complemented Performance measure \(P_m^*\) considers the three entities TP, FP and FN directly and simultaneously to assess a binary image, and decreases as the quality of the detection improves.

Another way to display evaluations is to create Receiver Operating Characteristic (ROC) curves [19] or Precision-Recall (PR) curves [18], involving True Positive Rates (TPR) and False Positive Rates (FPR): \(TPR = \frac{TP}{TP + FN}\) and \(FPR = \frac{FP}{FP + TN}\). Derived from TPR and FPR, the three measures \(\varPhi \), \( \chi ^2\) and \( F_\alpha \) (detailed in Table 1) are frequently used. The complement of these measures is used so that a value close to 0 corresponds to a good segmentation.

These measures compare two edge images pixel by pixel and thus tend to severely penalize an (even slightly) misplaced contour, as illustrated in Fig. 2. Consequently, some evaluations derived from the confusion matrix incorporate spatial tolerance. However, tolerating a distance from the true contour and integrating several TPs for one detected contour can penalize efficient edge detection methods or, on the contrary, favor poor ones (especially for corners or small objects). Thus, following this discussion, an assessment should penalize a misplaced edge point proportionally to the distance from its true location (some examples in [14], and as shown in Fig. 2).

Table 1. List of error measures involving only statistics.
Table 2. List of error measures involving distances, generally: \(k=1\) or \(k=2\).
Fig. 2. Different \(D_c\): the numbers of FPs and FNs are the same for \(D_1\) and for \(D_2\).

2.2 Assessment Involving Distances of Misplaced Pixels

A reference-based edge map quality measure requires that a displaced edge be penalized as a function not only of FPs and/or FNs but also of the distance from the position where it should be located. Table 2 reviews the most relevant measures involving distances. Thus, for a pixel p belonging to the detected contour \(D_c\), \(d_{G_t} (p)\) represents the minimal Euclidean distance between p and \(G_t\). If p belongs to the ground truth \(G_t\), \(d_{D_c} (p)\) is the minimal distance between p and \(D_c\); a sketch of both distance maps is given below. On the one hand, some distance measures are devoted to the evaluation of over-segmentation (i.e. presence of FPs), like \(\Upsilon \), \(D^k\), \(\varTheta \) and \(\varGamma \). On the other hand, the \(\varOmega \) measure assesses an edge detection by computing only the under-segmentation (FNs). Other edge detection evaluation measures consider the distances of both FPs and FNs [9]. Note that an over-segmentation measure would rate as perfect an image containing no edge points, while an under-segmentation measure would rate as perfect an image containing mostly undesirable edge points (FPs), see Fig. 3. Another limitation of pure over- and under-segmentation evaluations is that several binary images can produce the same result (Fig. 2). Therefore, as demonstrated in [9], a complete and optimal edge detection evaluation measure should combine assessments of both over- and under-segmentation.
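
Both distance maps can be obtained with a Euclidean distance transform; here is a minimal NumPy/SciPy sketch (the helper name is an assumption):

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def distance_maps(G_t, D_c):
    """d_Gt[p]: minimal Euclidean distance from pixel p to G_t;
    d_Dc[p]: minimal Euclidean distance from pixel p to D_c."""
    d_Gt = distance_transform_edt(~G_t.astype(bool))
    d_Dc = distance_transform_edt(~D_c.astype(bool))
    return d_Gt, d_Dc

# Example: an over-segmentation term of Table 2 sums d_Gt over the detected
# pixels, e.g. np.sum(d_Gt[D_c.astype(bool)] ** k) with k = 1 or k = 2.
```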

Among the distance measures between two contours, one of the most popular descriptors is the Figure of Merit (FoM). Nonetheless, FoM does not record the distances of the FNs, which are penalized as strongly as in the statistical measures (see above). For example, in Fig. 3, \(FoM (G_t, C) > FoM (G_t, M)\), whereas M contains both FPs and FNs and C only FNs. Further, for the extreme cases:

  • if \(FP = 0\): \( FoM \left( G_t, D_{c} \right) = 1 - TP / | G_t | = 1 - (| G_t | - FN) / | G_t |\),

  • if \(FN = 0\): \( FoM \left( G_t, D_{c} \right) = 1 - {1 \over {\max \left( \left| G_t \right| , \left| D_{c} \right| \right) }}\cdot {{\sum _{p \in { D_{c}\cap \lnot G_t}} {1 \over 1 + \kappa \cdot d^2_{G_t}(p)}}}\).

When \(FN > 0\) and FP is constant, FoM behaves like the matrix-based error assessments (Fig. 2). Moreover, for \(FP > 0\), FoM penalizes over-detection very weakly compared to under-detection. On the contrary, the F measure computes the distances of FNs but not of FPs, so F behaves inversely to FoM. Also, the \(d_4\) measure depends mainly on TP, FP, FN and FoM, and penalizes FNs like the FoM measure. SFoM and MFoM take into account the distances of both FNs and FPs, so they can compute a global evaluation of a contour image; however, MFoM does not consider FPs and FNs at the same time, contrary to SFoM. Another way to compute a global measure is presented in [28] with the edge map quality measure \(D_p\): its right-hand term computes the distances of the FNs to the closest correctly detected edge pixels, i.e. \(G_t \cap D_c\). Finally, \(D_p\) is more sensitive to FNs than to FPs because of the coefficient \(1 \over {|I| - |G_t|}\). A sketch of the complemented FoM follows.
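
For concreteness, here is a minimal sketch of the complemented FoM, consistent with the extreme cases above; the scaling constant \(\kappa = 1/9\) is the usual choice and an assumption here, and \(G_t\) is assumed non-empty:

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def fom(G_t, D_c, kappa=1.0 / 9.0):
    """Complemented Figure of Merit: 0 corresponds to a perfect match."""
    G, D = G_t.astype(bool), D_c.astype(bool)
    d_Gt = distance_transform_edt(~G)  # distance to the nearest Gt pixel
    score = np.sum(1.0 / (1.0 + kappa * d_Gt[D] ** 2))
    return 1.0 - score / max(G.sum(), D.sum())
```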

Fig. 3. Results of evaluation measures and images for the experiments.

A second measure widely used in matching techniques is the Hausdorff distance H, which measures the mismatch of two sets of points [24]. This max-min distance can be strongly skewed by a single pixel positioned sufficiently far from the pattern (Fig. 3); a sketch of H is given below. To improve the measure, one idea is to compute H over a proportion of the maximum distances; let \(H_{15\%}\) denote this measure for 15% of the values [24]. Nevertheless, as pointed out in [11], an average distance from the edge pixels in the candidate image to those in the ground truth is more appropriate, as in \(S^k\) or \(\varPsi \). Finally, the Delta Metric (\(\varDelta ^k\)) [27] intends to estimate the dissimilarity between each element of two binary images, but is highly sensitive to the distances of misplaced points [8, 14].
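
A minimal sketch of H built on the same distance transforms (both maps assumed non-empty; the comment on \(H_{15\%}\) reflects an assumed reading of the rank-based variant):

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def hausdorff(G_t, D_c):
    """Classical Hausdorff distance between two binary edge maps."""
    G, D = G_t.astype(bool), D_c.astype(bool)
    d_Gt = distance_transform_edt(~G)
    d_Dc = distance_transform_edt(~D)
    return max(d_Gt[D].max(), d_Dc[G].max())

# Assumed reading of H_15%: discard the top 15% of the ranked distances
# before taking the maximum, e.g. np.percentile(d_Gt[D], 85).
```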

Fig. 4. Penalization of \(\lambda \) according to the number of FNs (left) and computation of a measure's minimum score (right).

A new objective edge detection assessment measure: In [14], an edge detection assessment measure denoted \(\varPsi \) (Table 2) is developed; it improves the over-segmentation measure \(\varGamma \) by combining both \(d_{G_t}\) and \(d_{D_c}\), see Fig. 3. \(\varPsi \) gives the same weight to \(d_{G_t}\) and \(d_{D_c}\) in its assessment of errors. Thus, using \(\varPsi \), a missing edge is not penalized enough, whereas the distances of FPs may weigh too heavily. As another example, in Fig. 3, \(\varPsi (G_t, C) < \varPsi (G_t, T)\), whereas C should be penalized more because of its FNs, which prevent identifying the object (see also Fig. 5). The solution proposed here is to penalize the distances of the FNs more strongly, depending on the number of TPs:

$$\begin{aligned} \lambda (G_t, D_c) = { FP + FN \over |G_t|^2} \cdot \!\! {\sqrt{ \sum \limits _{p \in D_{c} }d^2_{ G_t}(p) + \min \left( |G_t|^2, \,\, {|G_t|^2 \over TP^2} \right) \cdot \sum \limits _{p \in G_t }d^2_{D_c}(p) }} \end{aligned}$$
(1)

The term influencing the penalization of FN distances can be rewritten as \( {|G_t|^2 \over TP^2} =\left( {FN + TP \over TP} \right) ^2 =\left( 1 + {FN \over TP} \right) ^2 \geqslant 1 \), ensuring a stronger penalty for \(d^2_{D_c}\) than for \(d^2_{ G_t}\). When \(TP = 0\), the min function avoids a multiplication by infinity; moreover, the number of FNs is then large, which corresponds to a strong penalty through the weight term \(|G_t|^2\) (see Fig. 4 left). When \(|G_t| = TP \), \(\lambda \) is equivalent to \(\varPsi \) and \(\varGamma \) (see Fig. 3, image T). Also, compared to \(\varPsi \), \(\lambda \) penalizes a \(D_c\) having FNs more than a \(D_c\) with only FPs, as illustrated in Fig. 3 (images C and T). Finally, the weight \(\frac{|G_t|^2}{ TP^2}\) tunes the \(\lambda \) measure by considering an edge map of better quality when the FN points are localized close to the detected contours \(D_c\). A minimal implementation sketch of Eq. (1) follows.
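
A minimal NumPy/SciPy sketch of Eq. (1), assuming binary maps of the same size and a non-empty \(G_t\) (the helper name is an assumption):

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def lambda_measure(G_t, D_c):
    """Sketch of Eq. (1): 0 corresponds to a perfect match with G_t."""
    G, D = G_t.astype(bool), D_c.astype(bool)
    TP = np.sum(D & G)
    FP = np.sum(D & ~G)
    FN = np.sum(~D & G)
    card_G = G.sum()
    d_Gt = distance_transform_edt(~G)  # distance to the nearest Gt pixel
    d_Dc = distance_transform_edt(~D)  # distance to the nearest Dc pixel
    # min(|Gt|^2, |Gt|^2 / TP^2), with the TP = 0 case handled explicitly.
    weight = card_G**2 if TP == 0 else min(card_G**2, card_G**2 / TP**2)
    inner = np.sum(d_Gt[D] ** 2) + weight * np.sum(d_Dc[G] ** 2)
    return (FP + FN) / card_G**2 * np.sqrt(inner)
```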

The next subsection details the way to evaluate an edge detector objectively. The results presented in this communication show the importance of penalizing false negative points more strongly than false positive points, because the desired objects are not always completely visible when an ill-suited evaluation measure is used; \(\lambda \) provides a reliable edge detection assessment.

2.3 Minimum of the Measure and Ground Truth Edge Image

Dissimilarity measures are used for an objective assessment using binary images. Instead of manually choosing a threshold to obtain a binary image (see Fig. 3 in [9]), the purpose is to compute the minimal value of a dissimilarity measure by varying the thresholds of the thin edges (a double loop: over \(\tau _L\) and over \(\tau _H\); see the table in Fig. 1 and the sketch below). Thus, compared to a ground truth contour map, the ideal edge map for a measure corresponds to the thresholded (binary) image at which the evaluation obtains the minimum score for the considered measure. Theoretically, this score corresponds to the thresholds at which the edge detection produces the best edge map, compared to the ground truth contour map [8, 12, 30]. Figure 4 (right) illustrates the choice of a contour map as a function of \(\tau _L\) and \(\tau _H\). Since small thresholds lead to heavy over-segmentation and strong thresholds may create numerous false negative pixels, the minimum score of an edge detection evaluation should be a compromise between under- and over-segmentation (detailed and illustrated in [8]).
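
A minimal sketch of this exhaustive double loop, reusing the hysteresis and \(\lambda \) sketches above (the function name and the threshold grid are assumptions):

```python
import numpy as np

def best_edge_map(thin_edges, G_t, measure, taus):
    """Return the minimum score and the binary map minimizing the measure
    over all hysteresis threshold pairs with tau_H >= tau_L."""
    best_score, best_map = np.inf, None
    for i, tau_L in enumerate(taus):
        for tau_H in taus[i:]:
            D_c = hysteresis(thin_edges, tau_L, tau_H)  # sketch from Sect. 1
            score = measure(G_t, D_c)
            if score < best_score:
                best_score, best_map = score, D_c
    return best_score, best_map

# e.g.: best_edge_map(thin, G_t, lambda_measure, np.linspace(0, 1, 51))
```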

As demonstrated in [8], the choice of the ground truth map significantly influences the dissimilarity evaluations. Indeed, if it is not reliable [31], an inaccurate ground truth contour map, in terms of localization, penalizes precise edge detectors and/or favors coarse algorithms, as with the edge maps presented in [9, 10]. For these reasons, the ground truth edge map for the real image in our experiments is built in a semi-automatic way, detailed in [8].

3 Experimental Results

These experiments highlight the importance for an assessment of penalizing false negative points more strongly than false positive points. In order to study the performance of the contour detection evaluation measures, the hysteresis thresholds vary and the minimum score of the studied measure corresponds to the best edge map. The thin edges of both synthetic and real noisy images are computed by five or six edge detectors: Sobel [2], Canny [3], Steerable Filters of order 1 (\(SF_1\)) [4] or 5 (\(SF_5\)) [5], Anisotropic Gaussian Kernels (AGK) [6] and Half Gaussian Kernels (H-K) [7]. Figure 5 presents the results for 14 measures with their associated scores (bars) according to the hysteresis parameters. On the one hand, the obtained edge map must be taken into account; on the other hand, the measure score. Generally, the optimal edge maps for the FoM, SFoM, \(f_2d_6\), \(\varPsi \) and \(\lambda \) measures allow the majority of the desired edges to be distinguished for each contour detection operator (except Sobel), whereas for the other assessments, contours are too disturbed by undesirable points or are distinguished only with great difficulty (especially for \(\varPsi \), which does not penalize FNs enough). Note that the SFoM measure does not classify the Sobel algorithm as the least efficient. Concerning the experiment with a real image in Fig. 6, eight measures are compared. For FoM, H, \(\varDelta ^k\) and \(S^{k}\), the ideal edge maps for the Sobel edge detector are highly corrupted by undesirable contours, and the main objects are not recognizable. The other segmentations are also disturbed by undesirable pixels for FoM, H and \(\varDelta ^k\). Moreover, the highest score for \(\varDelta ^k\) (AGK) does not correspond to the most disturbed map. Ultimately, using \(\lambda \), the essential structures are visible in the optimal contour map for each edge detector (objects are easily recognizable). Moreover, contrary to the H, FoM, \(d_4\), \(\varDelta ^k\) and \(S^k\) measures, the scores of \(\lambda \) are coherent with the obtained segmentations (Sobel and H-K results).

Fig. 5. Comparison of best maps and minimum scores for different evaluation measures. The bar legend is presented in Fig. 6. \(G_t\) and the original image are available in Fig. 3.

Fig. 6. Comparison of best maps and minimum scores for different evaluation measures. \(G_t\) and the original real image are presented in Fig. 3.

4 Conclusion and Future Works

This study presents a new supervised edge detection assessment method, \(\lambda \), which enables a contour map to be assessed in an objective way. Based on the theory of dissimilarity evaluation measures, this objective evaluation makes it possible to assess first-order edge detectors: the segmentation obtaining the minimum score of a measure is considered the best one. Theory and experiments show that the minimum score of the new dissimilarity measure \(\lambda \) corresponds to the best edge quality map evaluations, being closer to the ground truth than the other methods. On the one hand, this new measure takes into account the distances of false positive points; on the other hand, it considers the distances of false negative points, tuned by a weight. This weight depends on the number of false negative points: the larger it is, the more the segmentation is penalized. Thus, for a reliable edge detector, \(\lambda \) objectively yields an edge map containing the main structures, similar to the ground truth. Finally, the computation of the minimum score of a measure does not require tuning parameters, which represents a major advantage. Building on this, we plan in a future study to compare in depth the robustness of several edge detection algorithms and to use the new measure in object recognition.