Elsevier

Pattern Recognition

Volume 44, Issue 9, September 2011, Pages 1892-1902

Learning effective color features for content based image retrieval in dermatology

https://doi.org/10.1016/j.patcog.2010.10.024

Abstract

We investigate the extraction of effective color features for a content-based image retrieval (CBIR) application in dermatology. Effectiveness is measured by the rate of correct retrieval of images from four color classes of skin lesions. We employ and compare two different methods to learn favorable feature representations for this special application: limited rank matrix learning vector quantization (LiRaM LVQ) and a Large Margin Nearest Neighbor (LMNN) approach. Both methods use labeled training data and provide a discriminant linear transformation of the original features, potentially to a lower dimensional space. The extracted color features are used to retrieve images from a database by a k-nearest neighbor search. We perform a comparison of retrieval rates achieved with extracted and original features for eight different standard color spaces. We achieved significant improvements in every examined color space. The increase of the mean correct retrieval rate lies between 10% and 27% in the range of k=1–25 retrieved images, and the correct retrieval rate lies between 84% and 64%. We present explicit combinations of RGB and CIE-Lab color features corresponding to healthy and lesion skin. LiRaM LVQ and the computationally more expensive LMNN give comparable results for large values of the method parameter κ of LMNN (κ≥25) while LiRaM LVQ outperforms LMNN for smaller values of κ. We conclude that feature extraction by LiRaM LVQ leads to considerable improvement in color-based retrieval of dermatologic images.

Introduction

In the last decades the availability of digital images produced by scientific, educational, medical, industrial and other applications has increased dramatically. Thus, the management of the expanding visual information has become a challenging task. Since the 1990s Content Based Image Retrieval (CBIR) has been a rapidly advancing research area, which uses visual content to search for images in large databases according to the user's interest [36], [23], [21], [12], [22], [26], [41], [17]. A typical CBIR system extracts visual information from an image and converts it internally to a multidimensional feature vector representation. For retrieval, the dissimilarities (distances) between the feature vector of a query image and the feature vectors of the images in the database are computed. Then, the database images most similar to the query are presented to the user. CBIR may be especially interesting in the field of computer aided diagnostics when the diagnosis is partly based on images. An intelligent pre-selection of images with a trained system might help a medical doctor to search efficiently for patients who had problems similar to the actual case.
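The retrieval step described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names `retrieve` and `correct_retrieval_rate`, the Euclidean distance, and the leave-one-out evaluation (each image serves once as the query against all others) are assumptions made for the example.

```python
import numpy as np

def retrieve(query, database, k):
    """Return the indices of the k database vectors closest to the query."""
    # Euclidean distance between the query and every database feature vector.
    dists = np.linalg.norm(database - query, axis=1)
    # A stable sort keeps the original order among equally distant images.
    return np.argsort(dists, kind="stable")[:k]

def correct_retrieval_rate(features, labels, k):
    """Mean fraction of the k retrieved images that share the query's class.

    Assumption: leave-one-out evaluation over a labeled collection.
    """
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    rates = []
    for i in range(len(features)):
        # Exclude the query image itself from the searched database.
        others = np.delete(np.arange(len(features)), i)
        hits = others[retrieve(features[i], features[others], k)]
        rates.append(np.mean(labels[hits] == labels[i]))
    return float(np.mean(rates))
```

With two well separated classes of two images each, every query retrieves its class partner first, so the rate is 1.0 for k=1 and drops to 0.5 for k=2, where the second retrieved image necessarily comes from the other class.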

The visual content of an image can be described by color, texture, shape or spatial relationship. A good visual content descriptor should be insensitive to the specific imaging process, e.g. invariant under changes of illumination. The prevalent visual content for image retrieval is color. Frequently used color descriptors are color moments, histograms, coherence vectors and correlograms [33], [24]. Before a color descriptor can be selected, the underlying color space has to be specified.

There are many different color spaces available, which may be beneficial in different application domains. The color representations most commonly used in electronic systems are RGB and CIE-XYZ. CIE-XYZ and the related CIE-Lab and CIE-Luv are designed to match human perception. In [40] the authors argue that normalized TSL (Tint, Saturation, Lightness) is superior to other color spaces for skin modeling with a unimodal Gaussian joint probability density function. The color space YCrCb is designed for efficient image compression, but the simplicity of the transformation and the explicit separation of luminance and chrominance components appear attractive for skin color modeling [25], [46], [9]. Surveys on color spaces and their use can be found in [40], [43]. We are not aware of a general rule for the choice of the color space, and the representation might follow the user's preference. We therefore decided to investigate eight different color spaces, which are commonly used and may be useful for the task at hand.
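To illustrate what the separation of luminance and chrominance means in practice, the following sketch converts RGB values to a luma component and two chroma components. It assumes the ITU-R BT.601 luma weights and input values scaled to [0, 1]; the text does not state which variant of the conversion was used, and the function name `rgb_to_ycbcr` is hypothetical.

```python
import numpy as np

# ITU-R BT.601 luma weights (an assumption; the variant used is not stated in the text).
KR, KG, KB = 0.299, 0.587, 0.114

def rgb_to_ycbcr(rgb):
    """Convert RGB in [0, 1] to (Y, Cb, Cr) with Y in [0, 1] and Cb, Cr in [-0.5, 0.5]."""
    rgb = np.asarray(rgb, dtype=float)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    # Luminance: weighted sum of the three channels.
    y = KR * r + KG * g + KB * b
    # Chrominance: scaled blue and red differences from the luminance.
    cb = 0.5 * (b - y) / (1.0 - KB)
    cr = 0.5 * (r - y) / (1.0 - KR)
    return np.stack([y, cb, cr], axis=-1)
```

A gray pixel maps to zero chrominance, so brightness changes that affect all channels equally show up only in the first component.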

Color is an important attribute for primary skin efflorescences [3]. Color features have proven beneficial in many applications and medical sciences, especially for the recognition of skin lesions [14], [40], [43], [38], [34], [19], [37], [25], [46], [18] or the classification of skin cancer [28], [44], [1], [16], [10], [42]. A dermatologist might be interested in pictures of similar skin lesions in comparison to an actual case to verify the diagnosis or confer with similar symptoms. This can be interpreted as a problem of CBIR. The authors of [4] study the use of color features and the effectiveness of different color spaces in this context. They conclude that the representation of an image by the difference in the average color of healthy and lesion skin gives better results than the explicit use of the pair of colors. Fig. 1 shows two example retrievals for a CBIR system in the field of skin lesion comparison in dermatology. In [4], the best results were achieved with the CIE-Lab color representation.

Since the difference of two color values is a special case of a linear transformation, the question arises whether better results can be achieved by more general linear transformations. It is also possible that a cyclic distance measure, used for color spaces that contain a “hue” descriptor, might lead to superior results. We will address this interesting question in further studies. One well-known technique for obtaining a linear projection of feature vectors to a subspace which minimizes the overlap between different classes is Linear Discriminant Analysis (LDA) [13]. In this paper we employ and compare two recent techniques that find discriminant feature transformations by means of a supervised training procedure. The Large Margin Nearest Neighbor (LMNN) [45] approach has the advantage that it is based on a convex cost function, so it returns the global optimum for the current configuration of training data and parameters, based on the kNN approach. The Limited Rank Matrix Learning Vector Quantization (LiRaM LVQ) [29], [30], [8], [32], in contrast, follows a stochastic gradient descent procedure and may get stuck in local minima. On the other hand, it has the advantage of low computational costs. It is a prototype-based method, in which the decision boundary is defined by the Voronoi cells of prototypes following the large margin principle [11]. Both algorithms are available in general form and have turned out to be effective classifiers in many applications. In our real world example application of CBIR in dermatology, the LiRaM LVQ approach turned out to be quite robust with respect to initialization and parameter settings. With comparably low computational costs it leads to similar or better results than the LMNN approach with optimal parameter settings on most of the color spaces considered.
We improve the correct retrieval rate in CBIR of dermatological images significantly by applying adaptive linear transformations.
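The remark that a color difference is a special case of a linear transformation can be made concrete with a small sketch. It assumes a six-dimensional feature vector that holds the mean color of healthy skin followed by the mean color of lesion skin; this ordering, the example values, and the helper names `transform` and `adaptive_distance` are illustrative assumptions. A learned matrix of the same shape would simply replace the fixed matrix `A`.

```python
import numpy as np

# Assumed feature layout: [healthy_1, healthy_2, healthy_3, lesion_1, lesion_2, lesion_3].
# A maps the pair of colors to their difference (healthy minus lesion).
A = np.hstack([np.eye(3), -np.eye(3)])  # shape (3, 6)

def transform(omega, x):
    """Apply the linear map omega (shape m x 6) to a feature vector x."""
    return np.asarray(omega, dtype=float) @ np.asarray(x, dtype=float)

def adaptive_distance(omega, x, y):
    """Squared Euclidean distance between x and y after transforming both with omega."""
    diff = transform(omega, x) - transform(omega, y)
    return float(diff @ diff)

# Hypothetical mean colors, for illustration only.
x = np.array([0.8, 0.6, 0.5, 0.6, 0.3, 0.2])
print(transform(A, x))  # difference of the two mean colors
```

Any other 3×6 matrix defines a different combination of the healthy and lesion colors, which is the degree of freedom that supervised methods such as LMNN and LiRaM LVQ adapt to the labeled data.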

The main aim of this work is to demonstrate, by means of a real world example, that an adaptive, i.e. data driven, transformation of original color features can improve the retrieval performance of a CBIR system significantly. We concentrate on the performance enhancement achieved by using the most basic set of important features for the problem at hand, one that is easy and fast to acquire: color information only.

In Section 2 we describe the real world data set and the feature extraction process, and we present and discuss the methods we use to determine optimal transformations of color features and their use in the CBIR system. In Section 3 we present results, and we conclude in Section 4.

Methodology

An illustration of the Methodology is shown in Fig. 2.

Retrieval rate

In this section we summarize the retrieval results for the different color representations using transformed features from LMNN, global and localized LiRaM LVQ. We compare them with those obtained in the original feature spaces and with the difference features from [4] obtained with the transformation $A$:

$$A = \begin{pmatrix} 1 & 0 & 0 & -1 & 0 & 0 \\ 0 & 1 & 0 & 0 & -1 & 0 \\ 0 & 0 & 1 & 0 & 0 & -1 \end{pmatrix}.$$

The overall mean rates $r$ obtained with LiRaM LVQ and $\Omega \in \mathbb{R}^{3\times 6}$ are displayed in Fig. 5 for each color space as a function of the number $k$, i.e. the number of pictures the

Summary and conclusion

In this paper we show the usefulness of adaptive distances and corresponding feature space transformations in an example real world application. We observe that CBIR on color is a powerful tool for analysis of dermatological image databases. Previously unnoticed color similarities may give new insight into the correlations between and within various skin diseases. We introduce discriminative color descriptors which are obtained by LiRaM LVQ and LMNN during supervised training, and we compare and

Acknowledgements

The authors thank Piet Toonder of the UMCG for making available the image data set used in this study. This work was supported by the “Nederlandse Organisatie voor Wetenschappelijk Onderzoek (NWO)” under project code 612.066.620.

Kerstin Bunte graduated at the Faculty of Technology at the University of Bielefeld, Germany, and joined the Institute of Mathematics and Computing Science of the University of Groningen, The Netherlands, in September 2007. Her recent work has focused on Machine Learning techniques, especially Learning Vector Quantisation and their usability in the field of image processing, dimension reduction and visualization. Further information can be obtained from http://www.cs.rug.nl/~kbunte/

References (46)

  • A. Blum, H. Luedtke, U. Ellwanger, R. Schwabe, G. Rassner, C. Garbe, Digital image analysis for diagnosis of cutaneous...
  • T. Bojer, B. Hammer, D. Schunk, K.T. von Toschanowitz, Relevance determination in learning vector quantization, in: M....
  • J.L. Bolognia et al., Dermatology, 2007.
  • H.H.W.J. Bosman et al., Comparison of color representations for content based image retrieval in dermatology, Skin Research and Technology, 2010.
  • M. Brand, Charting a manifold, Technical Report 15, Mitsubishi Electric Research Laboratories (MERL), 2003, URL:...
  • K. Bunte et al., Nonlinear discriminative data visualization
  • K. Bunte, P. Schneider, B. Hammer, F.-M. Schleif, T. Villmann, M. Biehl, Discriminative visualization by limited rank...
  • D. Chai, A. Bouzerdoum, A Bayesian approach to skin color classification in YCbCr color space, in: Proceedings IEEE...
  • Y. Cheng et al., Skin lesion classification using relative color features, Skin Research and Technology, 2008.
  • K. Crammer et al., Margin analysis of the LVQ algorithm
  • R. Datta, J. Li, J.Z. Wang, Content-based image retrieval: approaches and trends of the new age, in: Proceedings of...
  • R.O. Duda et al., Pattern Classification, November 2004.
  • C.D. Felice et al., Predictive value of skin color for illness severity in the high-risk newborn, Pediatric Research, 2002.
    Michael Biehl received a Ph.D. in Theoretical Physics from the University of Giessen, Germany, in 1992 and the venia legendi in Theoretical Physics from the University of Würzburg, Germany, in 1996. He is currently Associate Professor with Tenure in Computing Science at the University of Groningen, The Netherlands. His main research interest is in the theory, modelling and application of Machine Learning techniques. He is furthermore active in the modelling and simulation of complex physical systems. He has co-authored more than 100 publications in international journals and conferences; preprint versions and further information can be obtained from http://www.cs.rug.nl/~biehl/

    Marcel F. Jonkman received the M.D. and Ph.D. degrees in Medicine from the University of Groningen, Groningen, the Netherlands. He is professor of Dermatology and chair of the department of Dermatology of the University Medical Center Groningen, the Netherlands. Previously he held a fellow position at Jefferson Medical University, Philadelphia (as Royal Dutch Academy of Sciences scholar).

    He is the author of one book, and has authored over 130 scientific papers. His current research includes bullous diseases, genetic skin diseases and creating education programs for dermatologists in training. Dr. Jonkman is a member of the editorial boards of the European Journal of Dermatology, and the Journal of Dermatological Science.

    Nicolai Petkov received the Dr.sc.techn. degree in Computer Engineering (Informationstechnik) from Dresden University of Technology, Dresden, Germany. He is professor of computer science and head of the Intelligent Systems group of the Institute of Mathematics and Computing Science of the University of Groningen, the Netherlands. He is the author of two monographs and coauthor of another book on parallel computing, holds four patents and has authored over 100 scientific papers. His current research is in image processing, computer vision and pattern recognition, and includes computer simulations of the visual system of the brain, computer applications in health care and life sciences and creating computer programs for artistic expression. Dr. Petkov is a member of the editorial boards of several journals.
