1 Introduction

The segmentation of retinal vessels and their morphology such as length, width, tortuosity, and branching patterns can be used for the diagnosis, screening, treatment, and evaluation of various cardiovascular and ophthalmologic diseases [1]. It has been shown that morphological features of retinal vessels in childhood and adulthood are related to cardiovascular risk factors, such as blood pressure and body mass index [2], and both coronary heart disease and stroke in later life [3]. Automatic detection of retinal vessels and the analysis of their morphology are feasible in screening programs for diabetic retinopathy [4], arteriolar narrowing detection [5], detection of foveal avascular regions [6], retinopathy of prematurity evaluation [7], and investigation of general cardiovascular diseases and hypertension [8].

Temporal or multimodal image registration [9], optic disc identification, and fovea localization [10] are possible using automatic algorithms for retinal vessel detection and branch point detection. The retinal vascular tree can even be used for biometric identification, as it has been found to be unique for each individual [11, 12].

The automated analysis of the above-mentioned morphological features is accepted by the medical community [13], as manual measurement is not only time-consuming, but also dependent on the observer and their experience.

Many algorithms and techniques have been proposed for the segmentation of retinal blood vessels using two-dimensional (2D) colored retinal images acquired from a conventional fundus camera [13]. With the introduction of confocal scanning laser ophthalmoscopy (cSLO) [14], it has become possible to image the fundus without the need for mydriatics. Moreover, confocal scanning has several advantages over a conventional fundus camera in terms of convenience for screening.

cSLO uses a focused light beam to scan over the area of the fundus to be imaged [15–17], and only a small spot from the fundus is illuminated at any time. The light that returns from this spot determines the brightness of a corresponding point in the generated image or screen. An array of points or pixels can be built up by scanning successive points on the fundus with the light beam. This is achieved using a spinning and oscillating mirror that helps the light beam to scan faster across the fundus.

cSLO images are monochromatic and thus differences in spectral information are lost. However, a monochromatic cSLO image differs from a conventional monochromatic image in that the contrast in a confocal image arises mainly from differences in the absorption of incident light [18]. This means that variation in the wavelengths found in a monochromatic fundus camera image differs from that in a confocal image. In a confocal image, the blood vessels show up well when using red illumination, whereas this does not happen in conventional monochromatic images. The resolution of cSLO may be less than that of a conventional fundus camera for structures that have high contrast; however, cSLO may be able to resolve structures that cannot be seen by conventional imaging because of poor contrast. This is due to the nature of the technology, as the visibility of the structure depends on its contrast with its surroundings.

As mentioned previously, cSLO has several advantages. These include:

  (a) Dilation is no longer required to ensure high-quality results. The burden of fundus investigations on the patient is low, resulting in a high acceptance rate for longitudinal investigations.

  (b) Examinations can be performed in under 2 min.

  (c) cSLO is often integrated with other technologies such as spectral-domain optical coherence tomography (SD-OCT), thereby facilitating multimodality imaging, which links different views on a particular disease and opens the research spectrum [19].

Retinal vessels are arterioles and venules starting from the optic disc and are spread out over the fundus. There is a strong light reflex along the centerline of the retinal vessels that is more apparent on arterioles than venules and in younger compared to older participants.

As mentioned before, methods for retinal vessel segmentation using 2D colored retinal images acquired from a conventional fundus camera can be divided into two groups, namely rules-based methods and supervised methods. Rules-based methods use vessel tracking, mathematical morphology, matched filtering, model-based locally adaptive thresholding, or deformable models. Supervised methods are based on pixel classification, which consists of classifying each pixel into one of two classes, vessel and non-vessel.

There are several approaches for rules-based methods. Vessel tracking uses the centerlines of vessels to obtain the vasculature of the structure [20–25]. An initial set of starting points, for example, at the optic nerve, is established automatically or manually and the vessels are traced from there. Another approach is based on the knowledge that vessels are piecewise linear and connected [7, 26–29]; therefore, morphological operators can be used to filter the vasculature from the background. Matched filtering techniques [30–36] extract the vessel silhouette from the background using a 2D linear structural element with a Gaussian cross-profile section that is rotated into three dimensions, with the kernel rotated into many different orientations (usually 8 or 12) to fit into the vessels of different configurations. A general framework based on a verification-based multi-threshold probing scheme was presented by Jiang et al. [37]. A deformable or snake model has also been used [38, 39] that evolves by iterative adaption to fit the shape of the desired structure.

Supervised methods are based on training with manually labeled images [40–46]. The images are pre-processed using approaches similar to those of rules-based methods. Thereafter, a classifier trained on the manually labeled images assigns each pixel to the vessel or non-vessel class.

This paper proposes a rules-based method partly adopted from Marin et al. [46], Pal et al. [47], and Chanwimaluang et al. [33]. The method has several steps: (1) image pre-processing for gray-level homogenization and blood vessel enhancement, (2) binary thresholding operation, and (3) removal of falsely detected isolated vessel pixels.

To date, we have not been able to find a method for vessel segmentation in cSLO images. Analogous techniques from fundus photography [33, 46, 47] were adopted for this new image acquisition technique, which is becoming increasingly popular. The purpose of our study is to determine whether existing techniques can be applied to localizing retinal blood vessels in cSLO images.

2 Materials and Methods

To evaluate the vessel segmentation methodology, 31 peripapillary scan acquisition cSLO images of one randomly selected eye of 31 healthy participants (17 female and 14 male; mean age, 64.0 ± 8.2 years) were used. Furthermore, as this approach adopts previous algorithms for conventional fundus photographs, it was evaluated using the dataset digital retinal images for vessel extraction (DRIVE) [43]. DRIVE is a publicly available database consisting of a total of 40 color fundus photographs. The photographs were obtained from a diabetic retinopathy screening program in the Netherlands that comprised 453 subjects between 31 and 86 years of age. This enables comparison with the literature. The Ethics Committee of the Medical Association of Hamburg ruled that approval was not required for this study, as all data were acquired anonymously. The study followed the recommendations of the Declaration of Helsinki (Seventh revision, 64th Meeting, Fortaleza, Brazil) and Good Clinical Practice. Written informed consent was obtained from each patient before any examination procedures were performed. If the patients were not able to give informed consent, they were excluded from the study.

Only patients meeting the inclusion and exclusion criteria were included. The ophthalmic inclusion criteria were (i) best-corrected visual acuity of 0.3 LogMAR or better, (ii) spherical refraction within ±5.0 dioptres (D), (iii) cylindrical correction within ±2.0 D, and (iv) normal results for visual field testing (Humphrey Visual Field Analyzer 30-2 [76 points over the central 30° of the visual field]; Humphrey, San Leandro, CA, USA). The exclusion criteria were (i) intensive alcohol abuse, (ii) body mass index >30 kg/m², (iii) intraocular pressure ≥21 mmHg, (iv) anterior ischaemic optic neuropathy, (v) high myopia, and (vi) congenital abnormalities of the optic nerve.

Patients underwent various ophthalmic examinations, including (i) assessment of best-corrected visual acuity by auto-refractometry (OCULUS/NIDEK auto-refractometer, OCULUS Optikgeräte GmbH, Wetzlar, Germany) followed by subjective refractometry, (ii) slit lamp-assisted biomicroscopy of the anterior segment, (iii) ophthalmoscopy after medical dilation of the pupil, (iv) visual field testing (Humphrey Visual Field Analyzer 30-2 [76 points over the central 30° of the visual field]), (v) Goldmann applanation tonometry, and (vi) cSLO image acquisition (SPECTRALIS; Heidelberg Engineering, Heidelberg, Germany).

cSLO images were acquired using the SPECTRALIS device (SPECTRALIS software version 6.0a; Heidelberg Engineering), which is a combination of normal SD-OCT and cSLO. In our study, at least three high-resolution peripapillary images were taken. Scans with low fixation or failing retinal nerve fiber layer (RNFL) segmentation (thus possible low quality of cSLO as well) were excluded. To minimize possible variability, all images were acquired by one trained investigator. The criteria for determining the scan quality were a clear fundus image before and after image acquisition and absence of scan or algorithm failures.

All cSLO images were manually segmented by two experienced observers independently. These were set as the gold standard (approximate ground truth) and were compared with the automatically segmented vessels of the algorithm. Statistical analysis was carried out using a commercially available software package (Prism 6 for Mac OSX; GraphPad Software, Version 6.0d). The means and standard deviations were determined, and p values were corrected according to Bonferroni to correct for performing multiple statistical analyses. All p values were two-tailed and a p value of <0.05 was considered to indicate statistical significance. Correlation was performed using Pearson correlation calculations, as the values sampled from the populations followed an approximate Gaussian distribution. The correlation coefficient is indicated by r. One eye of each participant was used for statistical analysis.

3 Calculation

The proposed methods are (1) image pre-processing for gray-level homogenization and blood vessel enhancement, (2) binary thresholding operation, and (3) removal of falsely detected isolated vessel pixels. The input images are monochromatic in the case of cSLO. For color photographs from the DRIVE dataset, only the green channel of the retinal image was selected because it best highlights vessels.

3.1 Image Pre-processing for Gray-Level Homogenization and Blood Vessel Enhancement

cSLO images have high levels of noise; therefore, pre-processing is needed before pixel features can be extracted in the classification step. The pre-processing includes: (a) removal of the vessel central light reflex, (b) homogenization of the background, and (c) enhancement of the vessels. These steps are shown in Fig. 1a–f.

Fig. 1

Demonstration of image pre-processing for gray-level homogenization and blood vessel enhancement. a Original image, b Original image magnified before vessel central light reflex removal using morphological opening operation, c Original image magnified after vessel central light reflex removal using morphological opening operation, d Homogenization of background using Gaussian filter, e Further homogenization of background by reducing intensity variations and enhancing contrast and f Enhancement of vessels using morphological Top-Hat transformation

3.1.1 Vessel Central Light Reflex Removal

Retinal blood vessels in cSLO images typically appear darker than the surrounding tissue due to their lower reflectance. Furthermore, inner vessel pixels appear darker than the outer ones; however, many vessels include a central light reflex. For accurate segmentation of the vessel, this bright strip needs to be removed. Therefore, a morphological opening operation is applied using a three-pixel-diameter disc. An example is shown in Figs. 1b, c. This step was not needed for the DRIVE dataset with color fundus photographs.
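The effect of the opening on the central light reflex can be illustrated on a single cross-section through a vessel. The following is a minimal one-dimensional sketch, not the implementation used in this study: the function name `open_profile`, the flat three-pixel window standing in for the three-pixel-diameter disc, and the gray values are all illustrative assumptions.

```python
def open_profile(profile, width=3):
    """Grayscale opening (erosion, then dilation) of a 1D intensity profile.

    A flat window of `width` pixels stands in for the disc-shaped
    structuring element; windows are truncated at the profile borders.
    """
    half = width // 2
    n = len(profile)

    def window(values, i):
        return values[max(0, i - half): min(n, i + half + 1)]

    # Erosion: each sample becomes the minimum of its neighborhood.
    eroded = [min(window(profile, i)) for i in range(n)]
    # Dilation of the eroded profile: maximum of each neighborhood.
    return [max(window(eroded, i)) for i in range(n)]


# Hypothetical cross-section: bright background (200), dark vessel (80),
# and a one-pixel-wide central light reflex (140).
vessel_profile = [200, 200, 80, 80, 140, 80, 80, 200, 200]
print(open_profile(vessel_profile))
# -> [200, 200, 80, 80, 80, 80, 80, 200, 200]
```

In this example the bright strip inside the vessel is flattened to the vessel gray level, while the vessel borders stay in place, because the opening only suppresses bright details narrower than the structuring element.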

3.1.2 Homogenization of Background

Apart from the vessels, the fundus contains some areas of non-uniform intensity due to unequal distribution of the RNFL, translucency of the choroid tissue, and variable illumination. For the feature vector operation later on, this variation in intensity needs to be removed as much as possible to improve the performance of vessel segmentation. Firstly, a 3 × 3 mean filter is applied to smooth the image, followed by a Gaussian kernel of size 3 × 3, with a standard deviation of 1.8. With this filtering, the smallest vessels might not be detected, but the overall performance is increased due to noise reduction. Secondly, a background image is generated using a 20 × 20 filter. This background image is subtracted from the former image, resulting in homogenization of the background.

The resultant image is not well distributed and does not cover the full range of 0–255 (8-bit images). The values are therefore transformed linearly into integers covering the whole range of possible gray levels, leading to a shade-corrected image I_sc with reduced background intensity variations and enhanced contrast. To remove the effect of differing illumination, we created a homogenized image I_h using the following equation:

$$I_{h} = I_{sc} + 128 - \max (I_{sc} )$$

where max(I_sc) is the maximum value of the image I_sc. If a pixel value of I_h is less than 0 or greater than 255, it is replaced with 0 or 255, respectively.
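The background homogenization can be sketched in a few lines of plain Python. This is an illustration under stated assumptions rather than the code used in the study: the 20 × 20 background filter is taken to be a mean filter (the filter type is not specified above), the 3 × 3 Gaussian smoothing is omitted for brevity, border pixels are averaged over in-bounds neighbors only, and the names `box_mean` and `homogenize` are hypothetical. The shift by 128 follows the equation exactly as written, with max(I_sc) taken as the maximum gray value.

```python
def box_mean(img, k):
    """Mean filter with a k x k window; borders use in-bounds pixels only."""
    h, w = len(img), len(img[0])
    lo, hi = -(k // 2), k - k // 2  # window offsets, e.g. k=3 gives -1..1
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            vals = [img[yy][xx]
                    for yy in range(max(0, y + lo), min(h, y + hi))
                    for xx in range(max(0, x + lo), min(w, x + hi))]
            out[y][x] = sum(vals) / len(vals)
    return out


def homogenize(img, smooth_k=3, background_k=20):
    """Background subtraction, linear stretch to 0..255, shift, and clipping."""
    smoothed = box_mean(img, smooth_k)
    background = box_mean(smoothed, background_k)  # assumed mean filter
    diff = [[s - b for s, b in zip(rs, rb)]
            for rs, rb in zip(smoothed, background)]

    # Linear stretch to integers covering 0..255 (shade-corrected image I_sc).
    lo = min(min(row) for row in diff)
    hi = max(max(row) for row in diff)
    span = (hi - lo) or 1.0  # avoid division by zero for constant images
    i_sc = [[round(255 * (v - lo) / span) for v in row] for row in diff]

    # I_h = I_sc + 128 - max(I_sc), clipped to the 8-bit range.
    peak = max(max(row) for row in i_sc)
    return [[min(255, max(0, v + 128 - peak)) for v in row] for row in i_sc]
```

With this literal reading of the equation, the brightest pixel of I_sc is always mapped to gray level 128, and any value that would fall below 0 is clipped to 0.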

3.1.3 Enhancement of Vessels

The final step for pre-processing the cSLO image is the vessel enhancement. This is achieved by applying the morphological top-hat transformation. An image with bright retinal features such as the optic disc and the possible presence of exudates or reflection artefacts is generated by applying a morphological opening (disc that is eight pixels in radius). This image is subtracted from the previously generated homogenized image, resulting in the highlighting of the darker structures in the image (i.e., blood vessels, fovea, possible presence of micro-aneurysms or haemorrhages).
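A top-hat transformation of this kind can be sketched as follows. The sketch assumes that the homogenized image is complemented first, so that the dark vessels become bright before the opening is subtracted; this step is not stated explicitly above and is an assumption of the example. The function names are hypothetical, and the structuring element is truncated at the image borders.

```python
def disc_offsets(radius):
    """Pixel offsets (dy, dx) inside a disc of the given radius."""
    return [(dy, dx)
            for dy in range(-radius, radius + 1)
            for dx in range(-radius, radius + 1)
            if dy * dy + dx * dx <= radius * radius]


def _morph(img, offsets, pick):
    """Apply min (erosion) or max (dilation) over the disc neighborhood."""
    h, w = len(img), len(img[0])
    return [[pick(img[y + dy][x + dx]
                  for dy, dx in offsets
                  if 0 <= y + dy < h and 0 <= x + dx < w)
             for x in range(w)]
            for y in range(h)]


def opening(img, radius):
    """Grayscale opening: erosion followed by dilation."""
    offsets = disc_offsets(radius)
    return _morph(_morph(img, offsets, min), offsets, max)


def enhance_vessels(homogenized, radius=8):
    """Top-hat (image minus its opening) of the complemented image."""
    complement = [[255 - v for v in row] for row in homogenized]  # assumption
    opened = opening(complement, radius)
    return [[c - o for c, o in zip(rc, ro)]
            for rc, ro in zip(complement, opened)]
```

Structures narrower than the disc, such as vessels, survive the subtraction, whereas larger regions of slowly varying intensity are largely cancelled out.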

3.2 Binary Thresholding

An entropy-based thresholding scheme can be used to distinguish between vessel segments and the background, as it takes into account the spatial distribution of gray levels, and the image pixel intensities are not independent of each other. This step was adopted from Chanwimaluang et al. [33]. We calculated a co-occurrence matrix, T, based on pixel values of image I. As we assume that there are 256 gray levels, the size of T is 256 × 256. T contains the structural information of the image, which is obtained by analyzing consecutive pixels and their co-occurrence. Any location, for example, T_{i,j}, will contain the number of times the pixel values i and j occur consecutively. Such consecutive occurrences can appear horizontally or vertically. If we suppose that P and Q are the height and width of the image, then we can formally describe T as follows:

$$T_{i,j} = \mathop \sum \limits_{m = 0}^{P} \mathop \sum \limits_{n = 0}^{Q} \delta$$

where δ = 1 if either I(m, n) = i and I(m, n + 1) = j, or I(m, n) = i and I(m + 1, n) = j; otherwise, δ = 0. Then, the probability of the co-occurrence of pixel values i and j is calculated as:

$$P_{ij} = \frac{{T_{i,j} }}{{\mathop \sum \nolimits_{i} \mathop \sum \nolimits_{j} T_{i,j} }}$$

The threshold value can be any value between 0 and 255, inclusive. If we suppose that s is a candidate threshold such that 0 ≤ s ≤ 255, then the following quantities are calculated for all s:

$$P_{As} = \mathop \sum \limits_{i = 0}^{s} \mathop \sum \limits_{j = 0}^{s} P_{ij}$$
$$P_{Cs} = \mathop \sum \limits_{i = s + 1}^{255} \mathop \sum \limits_{j = s + 1}^{255} P_{ij}$$

Then, the second-order entropies of the foreground, H_As, and the background, H_Cs, for all s are calculated as:

$$H_{As} = - 0.5P_{As} \log_{2} P_{As}$$
$$H_{Cs} = - 0.5P_{Cs} \log_{2} P_{Cs}$$

The total second-order entropy, H_s, is calculated for all s as:

$$H_{s} = H_{As} + H_{Cs}$$

The final threshold will be the value of s for which H_s is maximum:

$${\text{Threshold}} = \arg \mathop {\max }\limits_{s} H_{s}$$

The result of the binary thresholding operation is demonstrated in Fig. 2a.
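The threshold selection described by the equations above can be written compactly as shown below. This is a minimal sketch that follows the equations as stated in this section and is not the original implementation; the function names and the `levels` parameter are assumptions, the co-occurrence counts are stored sparsely, and neighbors outside the image are simply skipped.

```python
from collections import Counter
from math import log2


def cooccurrence(img):
    """Sparse co-occurrence counts T[(i, j)] for right and lower neighbors."""
    counts = Counter()
    h, w = len(img), len(img[0])
    for m in range(h):
        for n in range(w):
            pairs = set()  # delta is 1 if either neighbor matches, so count once
            if n + 1 < w:
                pairs.add((img[m][n], img[m][n + 1]))
            if m + 1 < h:
                pairs.add((img[m][n], img[m + 1][n]))
            for pair in pairs:
                counts[pair] += 1
    return counts


def entropy_threshold(img, levels=256):
    """Return the gray level s that maximizes H_s = H_As + H_Cs."""
    counts = cooccurrence(img)
    total = sum(counts.values())  # assumes the image has more than one pixel
    best_s, best_h = 0, float("-inf")
    for s in range(levels):
        p_a = sum(c for (i, j), c in counts.items() if i <= s and j <= s) / total
        p_c = sum(c for (i, j), c in counts.items() if i > s and j > s) / total
        # Terms with zero probability contribute nothing to the entropy.
        h_s = sum(-0.5 * p * log2(p) for p in (p_a, p_c) if p > 0)
        if h_s > best_h:  # ties keep the smallest s
            best_s, best_h = s, h_s
    return best_s
```

For an image consisting of two flat regions with gray levels 10 and 200, for instance, any s between the two levels gives the same maximal entropy, and the sketch returns the smallest such value.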

Fig. 2

Demonstration of binary thresholding and removing falsely detected vessel pixels. a Binary thresholding operation, b post-processing for removing falsely detected isolated vessel pixels

3.3 Post-processing for Removing Falsely Detected Isolated Vessel Pixels

We encountered a considerable number of spurious segments that were not connected with other segments and could thus be considered noise. To remove them, we applied a method for removing unconnected components. These spurious segments are small, which means that the total number of pixels inside each of them is relatively low.

Connected component regions are built in the image, and all pixels in a component region are given the same label. To remove artefacts, the pixel area of each connected region is measured, and each connected region with an area below a threshold p is reclassified as non-vessel. An image after the removal of all non-vessel classified pixels is shown in Fig. 2b.
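The removal of small connected regions can be sketched with a breadth-first labeling of the binary vessel mask. The example is illustrative only: 8-connectivity is an assumption (the connectivity is not specified above), and `min_area` plays the role of the area threshold p, whose value is not given here.

```python
from collections import deque


def remove_small_components(mask, min_area):
    """Keep only connected regions of vessel pixels with at least min_area pixels."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if not mask[y][x] or seen[y][x]:
                continue
            # Collect one connected region by breadth-first search.
            seen[y][x] = True
            queue = deque([(y, x)])
            region = []
            while queue:
                cy, cx = queue.popleft()
                region.append((cy, cx))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and mask[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            # Regions below the area threshold are reclassified as non-vessel.
            if len(region) >= min_area:
                for ry, rx in region:
                    out[ry][rx] = 1
    return out
```

A long vessel segment passes this filter unchanged, whereas an isolated pixel or a small cluster of falsely detected pixels is set to zero.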

4 Results

In order to quantify the algorithmic performance of the proposed method on a fundus image, the resulting segmentation was compared to its corresponding ground truth image. This image was obtained using a manually created vessel mask, in which all vessel pixels are set to one and all non-vessel pixels are set to zero. Thus, automated vessel segmentation performance can be assessed. Algorithmic performance was based on a vessel pixel and non-vessel pixel comparison with the ground truth image and evaluated in terms of sensitivity, specificity, true positive rate, false positive rate, and accuracy.
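The pixel-wise performance measures can be computed from the counts of true and false positives and negatives. The following sketch uses the standard definitions; the function name `performance` is hypothetical, and it assumes that both vessel and non-vessel pixels occur in the ground truth so that no denominator is zero.

```python
def performance(segmented, truth):
    """Pixel-wise performance of a binary segmentation against a ground truth mask."""
    tp = fp = tn = fn = 0
    for seg_row, truth_row in zip(segmented, truth):
        for s, t in zip(seg_row, truth_row):
            if s and t:
                tp += 1  # vessel pixel correctly detected
            elif s and not t:
                fp += 1  # background pixel labeled as vessel
            elif t:
                fn += 1  # vessel pixel missed
            else:
                tn += 1  # background pixel correctly rejected
    sensitivity = tp / (tp + fn)  # identical to the true positive rate
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "true_positive_rate": sensitivity,
        "false_positive_rate": 1.0 - specificity,
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
    }
```

By these definitions the true positive rate equals the sensitivity, and the false positive rate equals one minus the specificity.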

For the algorithm performance on the DRIVE dataset versus the ground truth, the results were a sensitivity of 0.6745, a specificity of 0.9714, a true positive rate of 0.6745, a false positive rate of 0.0286, and an accuracy of 0.9334 (Table 1).

Table 1 Average performance measures for color fundus photography (using DRIVE dataset) and cSLO (separated into inter-observer performance and algorithm versus observer performance)

For the first observer versus second observer evaluation, the results were a sensitivity of 0.6579, a specificity of 0.9699, a true positive rate of 0.6579, a false positive rate of 0.0301, and an accuracy of 0.9401. For the algorithm performance versus the first observer evaluation, the results were a sensitivity of 0.7542, a specificity of 0.8607, a true positive rate of 0.7542, a false positive rate of 0.1393, and an accuracy of 0.8508. The best and worst algorithm segmentations are given in Table 2. Individual performance values for all 31 patients are shown in Fig. 3.

Table 2 Best and worst cases of segmentation results
Fig. 3

Individual performance values for all 31 patients compared to manual segmentation of first observer

5 Discussion

Methods for retinal vessel detection in fundus recordings, regardless of the image acquisition technique and the resulting image, can be classified into rules-based and supervised methods. This study proposed a rules-based method for retinal vessel detection in monochromatic cSLO fundus images. To the best of our knowledge, this is the first approach for automated vessel detection in cSLO images. The cSLO images are pre-processed for gray-level homogenization and blood vessel enhancement, followed by a binary thresholding operation and removal of falsely detected isolated vessel pixels. In the best case, the proposed approach for 31 previously described cSLO images had a sensitivity of 0.7542, a specificity of 0.8607, a true positive rate of 0.7542, a false positive rate of 0.1393, and an accuracy of 0.8508. For the DRIVE color fundus images, an accuracy of 0.9334 was obtained.

A direct comparison of our approach with other retinal vessel segmentation algorithms is possible using the DRIVE dataset. Published fundus photography vessel segmentation was often performed on the publicly available datasets DRIVE and STARE. The DRIVE database consists of a total of 40 color fundus photographs. The STARE database [31] contains 20 fundus photographs for blood vessel segmentation, 10 of which contain pathology. Performance values are presented in Table 3 (partly adapted from Fraz et al. [13]). Comparing the performance values for DRIVE with the chosen literature shows no or only very small differences in the selected performance values (sensitivity, specificity, and accuracy). The performance values for cSLO images obtained using our approach are lower than those for fundus photographs.

Table 3 Selection of performance measures for different vessel segmentation methodologies using local dataset, DRIVE database, or STARE database

This is the first algorithm for vessel detection in cSLO images, and thus, there is room for further improvement. A possible reason for the performance differences is that the images were different in quality. cSLO may be able to resolve certain structures better than does fundus photography, as the visibility of structures depends on their contrast with surroundings. However, the total resolution of cSLO is less than that of fundus photography. Furthermore, fundus photography enables separate color channel analysis. In most cases, the green channel is extracted as it provides the best vessel/background contrast in color images (the red channel is the brightest color channel and has the lowest contrast, and the blue channel offers poor dynamic range). Therefore, blood-containing elements in the retina are best represented (i.e., have the highest contrast) in the green channel [48]. A combination of two colors can also be used, for example, by using the red and green channels of a given retinal image to correct non-uniform illumination in color fundus images [49]. This can be even further optimized when using different color spaces such as the red–green–blue color space, the hue-saturation-intensity (HSI) color space, or the luminance–in-phase-quadrature color space (by the National Television System Committee [NTSC]) [29]. However, these options do not exist for cSLO images as they are monochromatic, thus lacking potentially relevant segmentation information.

These results contribute to current knowledge that shows that it might be possible to automatically localize retinal blood vessels in monochromatic cSLO images, although with potentially inferior performance compared to that of fundus photography, using a rules-based method.

The potential limitations of this study are as follows. First, only one method was presented, so it might be unclear whether the inferior performance results compared to those of fundus photography are due to our approach or the nature of the technology. Second, no comparison between rules-based and supervised methods was presented, and therefore, further studies that consider other approaches are required. Third, we were unable to use a publicly available database for cSLO, as none exists. As a consequence, comparisons of future algorithms with our approach are less diagnostically significant. However, we are happy to share our dataset (including manually segmented images) with other study groups.

The strengths of this study are as follows. First, this is the first approach for monochromatic cSLO images showing that vessel localization is possible using this image acquisition technique. Second, the study used high-quality images (taken by an experienced, trained investigator) and manually segmented images from two observers. Third, the proposed approach was tested on the publicly available dataset DRIVE for better comparison with the literature.

6 Conclusion

This work demonstrated that it is possible to localize vessels in monochromatic cSLO images of the retina using a rules-based approach adopted from color fundus photography approaches, with an accuracy of 0.8508 compared with an inter-observer accuracy of 0.9401. The performance results are inferior to those of fundus photography, which could be due to the nature of the technology. Further studies are needed to evaluate alternative approaches for vessel detection.