Computationally efficient, one-pass algorithm for morphological filters

doi:10.1016/j.jvcir.2011.03.005

Journal of Visual Communication and Image Representation

Volume 22, Issue 5, July 2011, Pages 411-420

https://doi.org/10.1016/j.jvcir.2011.03.005 Get rights and content

Abstract

Many useful morphological filters are built as long concatenations of erosions and dilations: openings, closings, size distributions, sequential filters, etc. This paper proposes a new algorithm implementing morphological dilation and erosion of functions. It supports rectangular structuring element, runs in linear time w.r.t. the image size and constant time w.r.t. the structuring element size, and has minimal memory usage.

It has zero algorithm latency and processes data in stream. These properties are inherited by operators composed by concatenation, and allow their efficient implementation. We show how to compute in one pass an Alternate Sequential Filter (ASFⁿ) regardless the number of stages n.

This algorithm opens the way to such time-critical applications where the complexity and memory requirements of serial morphological operators represented a bottleneck limiting their usability.

Highlights

► Efficient, zero-latency, low-memory algorithm for serial morphological filters. ► Allows parallel execution of the operators despite their serial data dependence. ► Allows to obtain the results of an arbitrarilly long serial filter in a constant time. ► On dedicated HW allows to achieve unrivalled performances for large images.

Introduction

Since its introduction in late sixties, the mathematical morphology provides a complete set of image processing tools from filtering [1], [2], multi-scale image analysis [3] to pattern recognition [4], [5], [6]. They have been used in unrivalled number of applications [7], [8]. The most significant examples include biomedical and medical imaging, video surveillance, industrial control, video compression [9], stereology or remote sensing [10].

Nonetheless, not all useful operators can be easily implemented in real time with reasonable memory requirements. In demanding image-interpretation applications requiring a high correct-decision liability, one often uses robust but costly multi-criteria and/or multi-scale analysis.

These applications often consist of a serial concatenation of alternating atomic operators dilation and erosion with progressively increasing computing window called structuring element (SE).

Such operators cannot be parallelized due to the sequential data dependency of the individual atomic operators. The only possibility is to minimize the latency of each atomic operator and consider computing in stream. The latency minimization reduces the time to wait for individual pixel results. The stream computing allows transferring them immediately to the next atomic operator, as soon as they are available and before the entire image is processed. Thus, these atomic operators can work simultaneously, on data delayed in time. In such implementation, one has to sum the individual working memories of every atomic operator. Then also the memory may become penalizing for large, high-resolution images.

The work presented in this paper, aims to propose a new dilation/erosion algorithm with a constant processing time, low latency and low memory requirements for implementation of the individual atomic operators. Consequently, it allows to implement advantageously the following:

1.
Alternate Sequential Filters (ASFs) – that are a concatenation of openings and closings with a progressively increasing structuring element, useful for multi-scale analysis [1].
2.
Size distributions (granulometries) – that are a concatenation of openings allowing to assess the size distribution of a population of objects [3], [11], [12].
3.
Statistical learning – a selected set of morphological operators ξ_i can be separately applied to an image f. Then for every pixel f(x, y), the vector of values (ξ_i(f)(x, y)) can serve as vector of descriptors for pixel-wise learning and classification [6]. Obtaining them may be computationally intensive.

The remainder of the Introduction lists the most known fast algorithms of morphological dilation and erosion and discusses their properties, followed by the explanation of Novelties in this paper.

The Preliminaries, Section 2, introduce the basic principles of dilations, erosions and their combinations.

The Section 3 outlines the principle of the new algorithm: (i) the 2-D decomposition preserving sequential access to data and zero latency, (ii) elimination of useless values, (iii) the conversion of an anti-causal structuring element into a causal one, necessary to preserve the sequential access to data, and (iv) the encoding used to reduce the memory requirements and acceleration of computations.

Sections 4 Algorithm complexity and latency, 5 Memory requirements discuss the properties and Section 6 presents the case of the four stage ASF as an example.

The paper concludes by Benchmarks, general Conclusions and Future extensions, Sections 7 Benchmarks, 9 Future extensions. The commented pseudo code is given in the Appendix.

The mathematical morphology relies on two fundamental, complementary operations: erosion and dilation. They are local, defined within a computing window, specified by the so-called structuring element (SE), characterized by its size, its shape and origin. It is well known that, with the increasing size of the SE, the direct implementation leads to an extremely high computing cost. Although the fastest existing algorithms [13], [14], [15], [16] concentrate mainly on the reduction of the number of comparisons, few of them deal with the latency and memory requirements [17]. Moreover, the minimization of comparison number is not always proportional to the overall performance improvement [15].

In the following paragraphs, we concentrate on the presentation of the existing state of the art. We discuss it from the classical point of view of complexity, based on the number of comparisons. We bring it face to face with the latency and the memory requirements. For each algorithm, we analyze the possibility of its stream implementation since it is a key feature allowing efficient chaining of the atomic operators.

Let us start by the definition of the used basic terms. Consider a system Y = f(X) with X and Y the input and output data streams. By latency understand the distance between the same positions in the two streams. It is a dimensionless value, expressed in number of data samples. It is the sum of several factors:

(1)
operator latency – is induced by non causal operators due to the fact that the value to output depends on future signal samples. Consider a basic max filter y_j = max(x_j−w/2, … , x_j+w/2). One cannot output y_j before having read all x_i until x_j+w/2,
(2)
algorithm latency – some algorithms continue reading the input even after all needed input data are available. Several morphological dilation/erosion algorithms run in two (forward and backward) data scans, e.g. [13], [18]. Typically, in [18], before processing one image line, one needs to read the entire line. For 2-D dilation by a rectangle, implemented separately in the horizontal and vertical direction, one would need to wait the bottom of the image before writing the result. In [13] these forward and backward scans can be done on w pixels long intervals.

For example, the naive implementation of the morphological dilation (Eq. 3) has a considerable computation complexity $O (w$ -1) per pixel, with w the SE width (or area in 2-D), but no algorithm latency.

The operator latency – inherent to the operator – is incompressible. Consequently, the optimization effort should focus on the algorithm latency and the computational complexity.

The first concern related to the latency is therefore the time response of the system. Another concern related to the latency is the memory requirements. This can intuitively be explained by the fact that the latent (meaning “hidden”) data need to be temporarily stored somewhere to not to get lost. Obviously, a large latency requires large storage. An interesting conclusion is that using larger SE will have larger memory requirements.

The scientific community has adopted several approaches to speed up the erosion/dilation computation. The first one, we call direct computation, consists of a straightforward optimization of the computation given the SE shape.

The second approach relies on the SE decomposition into a sequence of reduced SE. Consequently, the optimization effort concentrates on the computation of this smaller SE. The special attention is paid to the SE decomposition into a series of 1-D SE, very popular in numerous applications [19], [20]. It allows better data access, reuse of intermediate results and is easy to parallelize.

In the following, refer to Table 1, Table 2 summarizing the properties of some algorithms cited below. By data memory understand the temporary storage for input or output data if the algorithm uses random data access. For instance the direct implementation needs random access to input data, whereas the output is written sequentially. It includes also the image transposition used by some algorithms. The working memory is any supplementary memory space required by the algorithm. It includes the data structures like FIFOs, LUTs, histograms, etc. Temporary constants, scalar variables, counters, etc., are omitted.

Direct 2-D computation. Optimized algorithms reduce the computing redundancy by using some well-suited data structures to keep the intermediate results. The most natural way is the approach used by Huang et al. [21] for median filtering, by Chaudhuri et al. [22] for rank-order filtering, and later by Van Droogenbroeck and Talbot [23]. They use a histogram to store the values within the span of the SE at some position in the image. During the translation of the SE over later image positions the histogram is updated by inclusion/deletion of the values of the entering/leaving points. The family of available shapes for the SE is arbitrary. On the other hand, using histogram makes that the input data have to be integers.

SE decomposition. It has soon become evident that the SE decomposition offers another possibility to obtain a fast implementation of more complex SEs both on specialized hardware as well as on sequential computers, and the literature soon became abundant see e.g. [24], [25], [26], [27], [28], [29], [30], [31]. The speedup is obtained by dividing the effort in two independent key aspects, an efficient decomposition and the algorithm used for computing the atomic operations.

Various types of decompositions have been proposed. Perhaps the most known decomposition of linear sets is the linear decomposition which comes from the associativity of the dilation, see Matheron [28]. Pecht [29] has proposed a more efficient logarithmic decomposition based on the extreme set of some SE. For example, for a polygon, the extreme set contains the vertices.

Van den Boomgaard and Wester [30] show that the Pecht decomposition can be improved for convex shapes. They propose a decomposition of an arbitrary shape into the union of convex shapes taken from a fixed collection of basis, efficiently decomposable shapes. Coltuc and Pitas [31] propose a factorization based algorithm running efficiently for 2ⁿ signals. Soille et al. [27] propose an extension of the 1-D van Herk algorithm to 2-D. The SE is a line oriented in an arbitrary angle. The decomposition is obtained by saving the 2-D image as an 1-D array, and recomputing the pixel indices correspondingly to the given orientation of the line. Another efficient algorithm has been recently proposed by Urbach and Wilkinson [16]. It decomposes a flat, arbitrary-shape SE by using a set of 1-D chords. The min/max statistics of the chords are stored in LUT.

1-D algorithms. The 1-D algorithms compute the partial 1-D dilations after the SE decomposition into lines.

One of the earliest, and most often used 1-D algorithms, is the van Herk algorithm [13] proposed in 1992. The same algorithm completed by theoretical background was also published by Gil and Werman [32], and later improved by Gevorkian et al. [33] and Gil and Kimmel [14]. The computational complexity is independent of the SE size. It requires two passes on the input data: causal and anti-causal. Consequently, computing in stream is impossible. Another, similar algorithm was proposed in [34] using ring-type buffers. Recently, Clienti et al. [35] propose an interesting modification of the Van Herk algorithm reducing the memory to 2W, implemented on an FPGA.

A different approach has been used by Lemonnier [18]. It identifies and propagates local extrema as long as it is required by the SE size. Again, two passes are needed: causal and anti-causal. Hence, the algorithm latency is N. The stream execution is impossible.

Van Droogenbroeck and Buckley [15] publish an anchor based algorithm for erosions and openings. The anchors are these portions of signal that remain unchanged by the operator. The algorithm gives good performance in terms of the computing time. The erosion can not run in place and stream processing is probably impossible. The principal disadvantage is in using histograms (suited only for integer values, and making the algorithm irregular).

Lemire proposes a fast, stream-processing algorithm for a left-sided SE [17], and later for symetric SE [36]. Both versions simultaneously compute 1-D dilation and erosion, run on floating point data and have low memory requirements and zero latency. However, even though the algorithms are supposed to run in stream, the intermediate storage of coordinates of local extrema actually represents a random access to the input data.

In order to asses the latency of 2-D algorithms we need to consider separately these different algorithm categories:

•
For decompositions of rectangles using R = H ⊕ V (R = rectangle, H/V = horizontal/vertical segment, respectively) the latency in 2-D is a multiplicative factor of the latency in 1-D and the image width. For two pass algorithms, e.g. Lemonnier [18], the 2-D latency equals one image frame. For locally two-pass algorithms, [13], [32], [33], [27] a specific decomposition would have to be found to optimize the latency in 2-D. The same holds also for other 1-D algorithms that in 1-D allow streaming processing [17], [36], [15].
•
Regarding the direct computation in 2-D, though UW [16] could theoretically read/write the input/output images sequentially, the possibility of streaming processing is not mentioned; they also use random accesses to intermediate data. Van Droogenbroeck–Talbot [23] and the naive implementation write output sequentially, but use random accesses to input data.
•
The computational complexity, used in the Table 1, may introduce to the result a supplementary delay – the time to compute the result. Whereas the two latencies are relative to the stream rate, the additional delay depends on the implementation and the computation platform. It can be (1) negligeable as in most R = H ⊕ V decompositions, where the latency prevails, or (2) dominant like in the direct implementation with large SE or in 3-D.

Although one can find several 1-D algorithms running with zero algorithm latency, none of the above cited algorithms combines all the features necessary for efficient implementation of composed operators in the form $ξ = δ_{B_{n}} ε_{B_{n - 1}} \dots δ_{B_{2}} ε_{B_{1}}$ for 2-D images.

Suppose the atomic operators δ, ε implemented using an algorithm with sequential access to data. This allows to run in parallel the entire ξ despite its internal sequential data dependence. If the atomic algorithm, in addition, has zero algorithm latency, then the entire chain ξ inherits the same properties: sequential data access and zero algorithm latency. This is an interesting property, since computing ξ suddenly becomes very efficient: in stream, with only the (further irreducible) operator latency of ξ. See the application example Fig. 4.

In this scope, the novelty of this paper is multiple. It is the only algorithm that combines all necessary features for efficient, parallel implementation of serial morphological operators. It uses a strictly sequential access to data, and can also run in place. The output is produced with zero algorithm latency. The algorithm runs in linear time w.r.t. the image size and constant time w.r.t. the SE size.

Its additional features include: very low memory requirements. A natural support of floating point data (not all previous algorithms can support floating point data). The origin can be arbitrarily placed within the structuring element, which is useful for even sized SE or specific SE decompositions.

Section snippets

Morphological dilation and erosion

Let δ, $ε : L \to L$ be a dilation and an erosion, performed on functions $f \in L$ , defined as f : D → V. Below assume D = supp(f) = Zⁿ, n = 1, 2, … and V = Z or R. δ_B, ε_B are parameterized by a structuring element B, assumed rectangular and flat i.e. B ⊂ D and translation-invariant.

Functional (operating on functions) erosion and dilation by a flat SE defined by extension to functions of the Minkowski set addition/subtraction definitions are given by $[δ_{B} (f)] (x) = [\underset{b \in B}{⋁} f_{b}] (x)$ $[ε_{B} (f)] (x) = [\underset{b \in \hat{B}}{⋀} f_{b}] (x)$ where $^{\land}$ denotes the transposition

Principle of the algorithm

According to the algorithm classification presented in Section 1, the proposed algorithm belongs to the SE decomposition approach using an improved 1-D dilation algorithm.

Algorithm complexity and latency

In this section we shall analyze the latency and the complexity of the algorithm.

Memory requirements

In 1-D, the FIFO size is upper-bounded by the width of the SE, which is the memory-worst case encountered whereever there are no useless data to eliminate. This occurs at all monotonically decreasing intervals of the signal that are longer than the SE width. This is equivalent to results obtained in [15], where for computing ε_Bf(x), only the values f(x_i), with x_i = B(x), are needed. Similar results hold for the dilation.

In 2-D, the memory requirements are given by the decomposition of the

Application

In the following we give as example the implementation of an ASF^λ given by Eq. (5). Rewrite the filter as a concatenation of erosions and dilations ${ASF}^{λ} = δ_{B_{λ}} ε_{B_{λ}} ε_{B_{λ}} δ_{B_{λ}} \dots δ_{B_{1}} ε_{B_{1}} ε_{B_{1}} δ_{B_{1}}$ and reduce it into its canonical form ${ASF}^{λ} = δ_{B_{λ}} ε_{B_{λ} \oplus B_{λ}} δ_{B_{λ}} \dots δ_{B_{1}} ε_{B_{1} \oplus B_{1}} δ_{B_{1}}$ This ASF^λ can be implemented in a stream in one raster scan of the input image. The writing position of the preceding operator in the cascade becomes the reading position of the following operator. The operator latency of the entire ASF will be given by

Benchmarks

This section illustrates the execution time of this algorithm, w.r.t. various criteria, measured on an Intel Core 2 2 GHz CPU, with 2 GB 800 MHz Dual Port RAM, running Linux. The time reported below is the processor time spent in the dilation/erosion algorithm as reported by a profiler (obtained as a mean after several runs to reduce inaccuracy).

The first experiment, see Fig. 5, illustrates the running time w.r.t. the content of the image. We have used a constant and a white-noise image to

Conclusions

This paper proposes a new algorithm for functional dilation or erosion by a flat, rectangular structuring element for 2-D data (easily extensible to n-D images, and SE in form of n-D boxes).

The algorithm has zero algorithm latency and strictly sequential access to data. The combination of these two properties allows their inheritance to operators composed by concatenation. The entire concatenation chain, despite its internal sequential data dependence, can run simultaneously. The algorithm runs

Other SE shapes

The algorithm, described above with rectangles, can also be extended to other shapes decomposable into linear segments (e.g. polygons as in [27]).

Spatially variant SE

This paper is the third step of a wider work towards an efficient implementation of morphological operations with spatially variant structuring elements that are useful for adaptive filters. The first step has been the stream implementation of dilation/erosion of sets [39]. It has the same algorithmic properties: zero latency and optimal memory,

Acknowledgement

The authors thank the anonymous reviewers for their comments and remarks that allowed to improve the paper.

References (41)

J. Serra
Morphological filtering: an overview
Signal Process.
(1994)
S. Mukhopadhyay et al.
An edge preserving noise smoothing technique using multiscale morphology
Signal Process.
(2002)
M. van Herk
A fast algorithm for local minimum and maximum filters on rectangular and octagonal kernels
Patt. Recog. Lett.
(1992)
B. Obara
Identification of transcrystalline microcracks observed in microscope images of a dolomite structure using image analysis methods based on linear structuring element processing
Comput. Geosci.
(2007)
B. Chaudhuri
An efficient algorithm for running window pel gray level ranking 2-D images
Patt. Recogn. Lett.
(1990)
M. Van Droogenbroeck et al.
Fast computation of morphological operations with arbitrary structuring elements
Patt. Recog. Lett.
(1996)
X. Zhuang et al.
Morphological structuring element decomposition
Comput. Vis. Graphics Image Process.
(1986)
J. Pecht
Speeding up successive Minkowski operations
Patt. Recog. Lett.
(1985)
D. Lemire
Faster retrieval with a two-pass dynamic-time-warping lowerbound
Patt. Recog.
(2009)
S. Sternberg
Grayscale morphology
Comput. Vis. Graph. Image Process
(1986)

J. Serra

(1988)

E. Dougherty

Mathematical Morphology in Image Processing

(1992)

P. Maragos

Pattern spectrum and multiscale shape representation

IEEE Trans. Pattern Anal. Mach. Intell

(1989)

A. Cord, D. Jeulin, F. Bach. Segmentation of random textures by morphological and linear operators, in: Proceedings of...

R. Haralick et al.

Image analysis using mathematical morphology

IEEE Trans. Pattern Anal. Mach. Intell

(1987)

M. Wilkinson, J. Roerdink, editors. Mathematical Morphology and Its Application to Signal and Image Processing, in:...

P. Salembier et al.

Morphological operators for image and video compression

IEEE Trans. Image Process.

(1996)

P. Soille et al.

Advances in mathematical morphology applied to geoscience and remote sensing

IEEE Trans. Geosci. Remote Sensing

(2002)

R. Sabourin et al.

Off-line signature verification by local granulometric size distributions

IEEE Trans. Pattern Anal. Mach. Intell

(1997)

L. Vincent

Granulometries and opening trees

Fundamenta Informaticae

(2000)

Cited by (0)

View full text

Computationally efficient, one-pass algorithm for morphological filters

Abstract

Highlights

Introduction

Section snippets

Morphological dilation and erosion

Principle of the algorithm

Algorithm complexity and latency

Memory requirements

Application

Benchmarks

Conclusions

Other SE shapes

Spatially variant SE

Acknowledgement

Signal Process.

Signal Process.

Patt. Recog. Lett.

Comput. Geosci.

Patt. Recogn. Lett.

Patt. Recog. Lett.

Comput. Vis. Graphics Image Process.

Patt. Recog. Lett.

Patt. Recog.

Comput. Vis. Graph. Image Process

Mathematical Morphology in Image Processing

Pattern spectrum and multiscale shape representation

IEEE Trans. Pattern Anal. Mach. Intell

Image analysis using mathematical morphology

IEEE Trans. Pattern Anal. Mach. Intell

Morphological operators for image and video compression

IEEE Trans. Image Process.

Advances in mathematical morphology applied to geoscience and remote sensing

IEEE Trans. Geosci. Remote Sensing

Off-line signature verification by local granulometric size distributions

IEEE Trans. Pattern Anal. Mach. Intell

Granulometries and opening trees

Fundamenta Informaticae