Cross-validation of a semantic segmentation network for natural history collection specimens

de la Hidalga, Abraham Nieva; Rosin, Paul L.; Sun, Xianfang; Livermore, Laurence; Durrant, James; Turner, James; Dillen, Mathias; Musson, Alicia; Phillips, Sarah; Groom, Quentin; Hardisty, Alex

doi:10.1007/s00138-022-01276-z

Cross-validation of a semantic segmentation network for natural history collection specimens

Original Paper
Open access
Published: 21 March 2022

Volume 33, article number 39, (2022)
Cite this article

Download PDF

You have full access to this open access article

Machine Vision and Applications Aims and scope Submit manuscript

Cross-validation of a semantic segmentation network for natural history collection specimens

Download PDF

4846 Accesses
4 Citations
93 Altmetric
11 Mentions
Explore all metrics

Abstract

Semantic segmentation has been proposed as a tool to accelerate the processing of natural history collection images. However, developing a flexible and resilient segmentation network requires an approach for adaptation which allows processing different datasets with minimal training and validation. This paper presents a cross-validation approach designed to determine whether a semantic segmentation network possesses the flexibility required for application across different collections and institutions. Consequently, the specific objectives of cross-validating the semantic segmentation network are to (a) evaluate the effectiveness of the network for segmenting image sets derived from collections different from the one in which the network was initially trained on; and (b) test the adaptability of the segmentation network for use in other types of collections. The resilience to data variations from different institutions and the portability of the network across different types of collections are required to confirm its general applicability. The proposed validation method is tested on the Natural History Museum semantic segmentation network, designed to process entomological microscope slides. The proposed semantic segmentation network is evaluated through a series of cross-validation experiments designed to test using data from two types of collections: microscope slides (from three institutions) and herbarium sheets (from seven institutions). The main contribution of this work is the method, software and ground truth sets created for this cross-validation as they can be reused in testing similar segmentation proposals in the context of digitization of natural history collections. The cross-validation of segmentation methods should be a required step in the integration of such methods into image processing workflows for natural history collections.

Evaluating Segmentation Approaches on Digitized Herbarium Specimens

Semantic Segmentation of Herbarium Specimens Using Deep Learning Techniques

Deep learning based approach for digitized herbarium specimen segmentation

Article 31 March 2022

Abdelaziz Triki, Bassem Bouaziz, … Jitendra Gaikwad

1 Introduction

The need to increase global accessibility to natural history collection specimens and to reduce handling and deterioration of valuable, and often fragile, physical samples has spurred the evolution of advanced digitization practices. Early online databases recording specimens’ catalog data have morphed into modern online portals which allow browsing digital specimens including taxonomic data, location, specimen-specific traits and images, along with other types of media (videos, audio recordings, links to related specimens and scientific publications). Collections consisting of millions of diverse specimens facilitated the emergence of high-throughput digitization workflows which also prompted research into novel acquisition methods, image standardization, curation, preservation and publishing. In some areas, this has promoted the creation of successful processing workflows capable of processing high volumes of specimens. However, the advance has not been extended to all areas, resulting in various activities of the digitization workflows which still rely on manual processes, and therefore throttle the speed with which the images can be processed and published. Image quality control and information extraction from specimen labels are among the digitization workflow activities which can benefit from greater automation. In this context, semantic segmentation methods can support the automation and improvement of image quality management and information extraction from images of physical specimens [25]. The processing speed of human operators and the high cost of hiring and training personnel for these activities directly affect the throughput of the whole workflow, which in turn prevents the speedy publishing of digitization results.

The use of artificial intelligence has been proposed to speed up some processing steps of the workflows after image acquisition [13, 29, 32, 35, 36]. The adoption of these methods requires being able to determine whether they are fit for purpose which means that the method is flexible and resilient requiring minimal training and validation for processing different datasets from different collections and institution.

This paper reports on a study that aimed to (a) validate the effectiveness of a semantic segmentation method for use in image sets from collections different from the one in which the model was initially trained on; and (b) validate the adaptability of the segmentation method for use in other types of collections. The purpose of these experiments is to determine whether the segmentation model is robust enough for creating a segmentation service which can then be incorporated into automated workflows supporting to speed up image processing/curation. The resilience to variations in data from different institutions and the portability of the semantic segmentation models across different types of collections are required to assure the reliability of the model in operating conditions.

The target segmentation method is the Natural History Museum, London (NHM), semantic segmentation network created for the segmentation of entomological microscope slide images [10]. The NHM semantic segmentation network (NHM-SSN) has been openly published and used for processing NHM entomology slides. However, its application to collections from other institutions or other types of collections had not been addressed. This paper presents a group of cross-validation experiments designed to test the applicability of NHM-SSN using data from different institutions and two types of collections: microscope slides (from three institutions) and herbarium sheets (from seven institutions). The paper is structured as follows: Section two describes the digitization of microscope slides and herbarium sheets from the perspective of the main features of the images produced in each case and the requirements for further processing that may be improved with segmentation. Section three describes the NHM semantic segmentation network, its development and architecture. Section four describes the design of the cross-validation experiments, including the details of the collections used. Section five presents the results and analysis from the cross-validation experiments. Section six analyzes the results, providing insights into the strengths and weaknesses of the segmentation method. Finally, section seven provides a conclusion and suggestions for further work.

2 Digitization of microscope slides and herbarium sheets

Microscope slides and herbarium sheets collections contain specimens that are close to 2D representations, i.e., although the specimens exist physically three dimensional (length, width, vertical depth), they are pressed flat and the depth dimension is such that for imaging purposes the majority of slides and herbarium sheets can be treated as two-dimensional. The equipment used for imaging in each case can vary between institutions and collections; however, there is overall consistency of characteristics of the images to be produced for each type of these collections.

2.1 Microscope slides

Microscope slide digitization produces images of individual slides that can contain up to four kinds of elements: (1) specimen itself (coverslip area), (2) labels(s), (3) barcode and (4) nomenclatural type labels(s). All these elements are contained within the slide itself. In some slides, labels may be placed on both sides of the slide and require more than one pass through the image acquisition step, producing two images per slide. Figure 1 shows two examples of microscope slides, from the Natural History Museum (NHM) and Naturalis Biodiversity Center (Naturalis). After acquisition, slide images may need further processing, which includes naming and linking the images to the corresponding specimen records, marking type specimens and extracting data from labels.

The goal of the semantic segmentation approach for these types of images is to correctly identify all the image elements and differentiate between the instances present on each image. That is, for semantic segmentation, pixels in an image are assigned to a class corresponding to one of the four image elements types listed above. For instance, segmentation pixels can be assigned the type label so that multiple instances from the same type can be identified (i.e., multiple labels). As the examples from Fig. 1 show, the colors, textures and shapes of elements can vary between collections, for instance, the barcode labels used are clearly different. Notice also that the specimen image in Fig. 1a has more labels than the specimen image Fig. 1b. Other potential issues are the overlapping of labels, for instance in Fig. 1a the type label is placed over one of the larger labels on the slide; this also can happen with barcodes. A further potential issue is text written directly on the slides. In such a case there are no clear borders for the label and the text area merges with the background.

The physical features of the microscope slides influence the resulting images, and these include the type of specimens being preserved, the mounting techniques and curation processes used, and the slides themselves [2]. In this study we consider slides having a standard size of 25 mm × 75 mm (approximately 3″ × 1″) [4]. The resolution of the specimen images used can vary from 900 pixels per inch (ppi)^{Footnote 1} to 28,500 ppi.^{Footnote 2}

2.2 Herbarium sheets

The majority of botanical institutions follow the digitization guidelines of the Global Plants Initiative (GPI) [14, 15] which specifies the elements to include and the resolution for herbarium sheet images. Consequently, most of the images produced when digitizing herbarium sheets are homogeneous. According to these GPI guidelines each herbarium sheet must include: (1) specimen, (2) color chart, (3) scale bar, (4) labels, (5) barcode and (6) institution name. The GPI guidelines also consider the needs for multiple images per specimen because of drawings and letters attached to herbarium specimens, labels covered by specimen parts or information attached to the back of the sheet. Some specimens also have envelopes or capsules containing loose material associated with the specimen (such as seeds, flowers or sprouts). In some workflows these are left unopened. In others these are opened, emptied onto trays and imaged as well. For instance, at RGBK, the digitization process specifies opening the capsules. In this case they take one image of the sheet with the capsule closed. Then one with the capsule open and normally cut and paste the contents in the open capsule on top of the closed capsule image. However, if there is writing on the closed capsule or multiple capsules to open, then multiple images might be used, one with the capsules open and one with capsules shut. Consequently, digitized herbarium sheet specimens can include of more than one image. The GPI guidelines recommend scanning at 600 ppi for archival quality images. Herbarium sheets generally have a standard size of 29 × 43 cm (or 11.4 × 16.9 in), although there is some variation in this. Specimens also vary in size. The goal of applying semantic segmentation for processing images of herbarium sheets is to correctly identify all the image elements and differentiate between the instances of each type present in each image (as shown in Fig. 2).

Despite the existence of the standard guidelines from GPI, there can be several variations in the types of elements used, such as type of color chart, scales and barcodes. Some specimens from MBG (Fig. 2b), for example, have a transparent scale that is sometimes placed in between the specimen parts. The sizes and shapes of color charts are also variable. The Finnish Museum of Natural History, for example, uses two small color charts on the side of the sheet, while others use long color charts spanning the length of the sheet. Some color charts also include a scale bar. Additionally, herbarium sheets can contain more than one specimen, and as a result they may contain more than one barcode. Some barcodes are simple and only include the specimen identifier, while others are printed in labels that also include the name of the institution.

2.3 Image processing and segmentation

The further processing activities of microscope slides and herbarium sheets images that can benefit from segmentation include identification of image elements, identification of regions of interest, identification of nomenclatural type specimens (identification of type labels), verification of image names (reading barcodes) and image quality verification (color, sharpness, cropping).

The presence of the different elements is a basic requirement for both herbarium sheets and microscope slides. Verifying that large batches of images contain the minimal required elements can be delegated to the semantic segmentation process. This identification in turn can facilitate verification of file names and linking to physical specimen records in collection management systems through the barcodes. Similarly, breaking up the large image into smaller regions of interest can also benefit optical character recognition (OCR) processes, improving both speed and accuracy [27]. For a longer discussion of these aspects see the report on a pilot study performed to evaluate the suitability of segmentation comparing processing segmented images vs. processing full images [25].

In addition to the issues highlighted for microscope slides and herbarium sheets, it is important to acknowledge that digitization projects are time and resource limited. Consequently, projects tend to work on subsets of the collections and are separated from one another in time by months or even years. As a result, the quality of the images, the image elements used and the layout of the specimens may change between digitization projects targeting different portions of large collections. This can be the consequence of many factors, including changes in working procedures, standards and best practices applied and equipment and techniques used.

3 Semantic segmentation of natural history specimen images

Image processing in combination with artificial intelligence methods has been proposed to address different issues of natural history digitization projects, such as: identification of nomenclatural type specimens [33], morphological analysis [11, 37], specimen identification [6, 13], identification of the elements present in a specimen image [35, 36], automated information extraction [16], phenological research [38] and phenotype studies [19, 30]. Some methods are specifically designed to take advantage of the large quantities of image data made available recently, while others focus on improving the quantity and quality of the data included in those datasets. The NHM semantic segmentation network (NHM-SSN) falls within the latter category as it is part of the continuing effort to automate and improve the museum’s digitization workflows [10]. The NHM-SSN was developed in the context of the efforts for digitizing microscope slide collections [1, 2, 18] as a resource which could speed up and automate some portions of the image processing and curation steps. One of the envisaged advantages of combining artificial intelligence and image processing methods is the creation of services that can be seamlessly integrated within the image processing workflows of large digitization projects. For this to be possible, however, it is necessary to ensure that the methods are flexible and adaptable for use in the imaging workflows of different projects, targeting different collections and implemented by different institutions. This portability goal is one of the unexplored areas of the application of semantic segmentation.

3.1 The NHM semantic segmentation network

The NHM-SSN is a semi-supervised semantic and instance segmentation network developed originally for the segmentation of images of microscope slides. First, a semantic segmentation step breaks an image into smaller segments grouping pixels into different classes predefined to represent the element types of interest. In a second step, the separate instances of each element class are identified. This process is illustrated by the example in Fig. 3. Figure 3a shows the original microscope slide image. The elements present are classified as i) specimen (in the center), ii) labels (either side of the specimen), iii) type label (small label circled in red in the left side), iv) barcode label (small white label to upper left of specimen), while the rest of the image is classified as 'background.' The labeling of these different elements as colored classes in the first step is shown in Fig. 3b with the definition of classes linked to predefined colors (yellow, red, light blue, dark blue, black). The result of the second step identifying instances of each class is shown in Fig. 3c, with each instance being assigned a different color.

The training of the semantic segmentation network to support this process requires creating a large set of ground truth images. However, sets of ground truth images are expensive and hard to obtain, because ground truths for selected specimen images need to be manually generated and this requires training personnel and assigning resources for them to work in creating the ground truths. In this scenario, it is desirable to use methods that can perform well with small ground truth datasets for training. Semi-supervised learning covers several techniques that employ large datasets of images without ground truths (unlabeled data) to enhance the capability of models otherwise learned on small ground truth sets.

3.2 Network structure

The NHM-SSN developed by the NHM [10] is available online.^{Footnote 3} The github repository wiki explains the architecture of the NHM-SSN (shown in Fig. 4). The SSN consists of two branches: primary segmentation branch (right, in blue) and reconstruction branch for regularizing the network and enabling semi-supervised learning (left, in gray). Both branches contain an identical sub-network with shared weights (in red). The difference between them being that noise is injected before each layer in the reconstruction side. The up-sampling part of this branch then tries to reconstruct these inputs to be the same as they were before the noise was added (derived from the architecture of ladder networks [28]). The shared sub-network is the embedding network (red), and the other subnetwork (in green) is for reconstruction denoising.

3.3 Related work

Recently others have taken a similar deep learning approach to the identification of the elements present in specimen images [35, 36], using methods such as You Only Look Once (YOLO), Region-Based Convolutional Neural Networks (R-CNN), and Single-Shot Detector (SSD) [35], comparing the accuracy per element detected and the processing time per image, subsequently refining the results of the most successful segmentation method (YOLO [36]). While these methods show successful segmentation results, they have not carried out thorough cross-validation experiments as described in this paper.

Apart from the specific efforts for digitization of natural history specimens, the work of identifying elements on 2D images is closely related to document layout segmentation efforts which support the analysis of digitized printed documents [5, 17, 20, 21, 26, 31]. In these areas page segmentation aims at identifying distinct text regions, images, tables and other non-text objects. In this research area, the identification and differentiation of text orientation and grouping has been used as a step for supporting OCR and data extraction [5], document classification [21], differentiation of text and images [17], text block order [31] and text unwarping [20].

The main differences between the segmentation of specimens and documents are: (1) the amount of text, (2) the type of non-text entities, (3) the purpose of extraction. The amount of text on specimens varies greatly from some lines in microscope slides to large paragraphs or even booklets attached to herbarium sheets, however, but they are nowhere near the amount of text on books or printed articles. Non-text entities on documents are typically images and tables, the types of objects on a digital specimens (scanned herbarium sheets and microscope slides) can be simpler but harder to identify because of differences in position, sizes, and color, in some cases the placement of elements is maintained for a digitization campaign but can change for the next. For instance, the placement of elements added to herbarium sheets can vary, and sometimes elements are placed on top of the sheet, on the borders of the sheet or a mix of places. Purpose of extraction is also different. In this area, information extraction, in this case OCR follows segmentation, is closer to the targets of layout segmentation. Quality control is another case; this can be part of the digitization workflow to ensure that required elements are present in images, to ensure the lighting and visibility of elements is correct. Quality control using segmentation can also be used when receiving batches of digitized specimens from contractors, to ensure that the images are consistent with the specification for the type of specimens being digitized. Rapid cataloguing and classification are a third purpose which mainly focuses on the identification of barcodes, which can vary in placement, type, coloring and size.

4 Experiment design

The objective of this paper is to validate if the NHM-SSN is suitable for deployment as a generic component that could be integrated into automated image processing workflows. This demands a low training cost and portability across datasets. This means that some retraining is required when switching the type of images being segmented; however, the model should be robust and ready with relatively small training sets, requiring hundreds rather than thousands of examples. The portability requirement is in turn covered by the fact that the images to be segmented have sufficient visual similarity required for the NHM-SSN to generalize in new data. In practice this should mean that the model provides consistent classification results that are not affected by the origin of the training sets, making it project and institution independent.

4.1 Training and cross-validation process

A cross-validation process was designed to test whether these two requirements are fulfilled by NHM-SSN. The cross-validation process encompasses training, testing and validating the models with different datasets. The requires training/testing of the NHM-SSN is performed with images from a single institution (Fig. 5). The training/testing dataset consists of a training dataset (80% of the images with ground truths), an unlabeled dataset (100% of specimen images w/o ground truths) and a testing dataset (10% of the images with ground truths). The training process will produce a set of learned models which then are applied to the testing dataset to determine the accuracy of the model. This accuracy is then used to determine which learned model to use in the cross-validation step. This process is repeated independently for the images from each institution producing a learned model associated to each institution.

After training and selecting learning models, the evaluation datasets (remaining 10% of the images with ground truths) are used in the cross-validation of the models, to determine the actual accuracy of the learned models when applied to sets from other institutions (Fig. 6).

The experiments were performed in two batches, one for microscope slides and another for herbarium sheets. Each cross-validation batch consists of three image datasets from three different institutions. Each dataset is split accordingly and used in training following the process described above. The results of the cross-validation are then tabulated and plotted for comparison. The results from these experiments should indicate whether NHM-SSN fulfills the portability requirements.

Further analysis of the results provides insights into the strengths and weaknesses of the segmentation method and the effectiveness of the cross-validation process. The validation and evaluation of the semantic segmentation process classifies each pixel of the image. The results of the pixel classification are compared to the ground truth images to generate the accuracy scores reported above. These data can be further analyzed by looking at the actual results and trying to measure further performance indicators such as specificity, recall and precision. The process for analyzing the performance of the models involves measuring how well the predicted instance-class pairs match the instance-class pairs of the ground truth. This type of mapping will allow the identification of true positives, true negatives, false positives and false negatives. These values can then be used to create a confusion matrix and use the matrix = values to calculate additional evaluation measures (Table 1). Accuracy and error are complementary measures (ACC = 1 – ERR and ERR = 1-ACC).

Table 1 Confusion matrix and basic evaluation measures [34]

Cross-validation of a semantic segmentation network for natural history collection specimens

Abstract

Similar content being viewed by others

Evaluating Segmentation Approaches on Digitized Herbarium Specimens

Semantic Segmentation of Herbarium Specimens Using Deep Learning Techniques

Deep learning based approach for digitized herbarium specimen segmentation

1 Introduction

2 Digitization of microscope slides and herbarium sheets

2.1 Microscope slides

2.2 Herbarium sheets

2.3 Image processing and segmentation

3 Semantic segmentation of natural history specimen images

3.1 The NHM semantic segmentation network

3.2 Network structure

3.3 Related work

4 Experiment design

4.1 Training and cross-validation process

4.2 Data collection

4.3 Data preparation

5 Training and cross-validation experiments

5.1 Microscope slides training

5.2 Microscope slides cross-validation

5.3 Herbarium Sheets Training

5.4 Herbarium sheets cross-validation

6 Analysis of results

6.1 Microscope slides segmentation issues

6.2 Herbarium sheet segmentation issues

6.3 Discussion

6.4 Possible issues in model selection

6.5 Alternative segmentation methods

7 Further work and conclusions

7.1 Further work

7.2 Conclusions

Data availability

Code availability

Notes

References

Acknowledgements

Funding

Author information

Authors and Affiliations

Corresponding author

Ethics declarations

Conflict of interest

Additional information

Publisher's Note

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation