Elsevier

Marine Geology

Volume 264, Issues 3–4, 15 August 2009, Pages 230-241

Fuzzy clustering for seafloor classification

https://doi.org/10.1016/j.margeo.2009.06.006

Abstract

Developing quantitative seafloor sediment classification techniques requires acknowledging that the boundaries between soft sediments are, by nature, transition zones and are therefore indeterminate and gradual. A fuzzy clustering method, fuzzy c-means (FCM), was used to identify these transition zones within a subset of the data used to generate the Australian Seascapes classification model. The overlapping classes and gradual boundaries resulting from the fuzzy c-means algorithm provided estimates of sediment boundaries that model reality more closely than sharp boundaries do. FCM output is given in the form of membership layers for each class, hard classes for each grid cell based on the maximum membership value, and a confusion index layer quantifying uncertainty in class attribution. The confusion index layer provided a spatial representation of transition zones and overlap between seafloor classes and highlighted areas of greatest uncertainty. We extended the standard FCM algorithm by applying the new FMLE fuzzy clustering algorithm, which takes into account spatial relationships in the data. In addition, we implemented and applied new cluster validity techniques (PCAES, PBMF, and XB) to determine the optimal number of clusters in the data, which is a novel pattern recognition application for seabed mapping. The 5-class FCM classification provided the most reliable result. The clustering and validation algorithms were first tested and validated on a simulated dataset and then applied to marine sediment data to identify Seascapes. The new results were compared with the previously published Seascapes classes that Geoscience Australia identified with hard ISODATA clustering. With the increasing use of physical surrogates to explain marine biodiversity, this research plays a crucial role in the development of techniques to identify habitat zones on the seabed.
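The two derived outputs named above, hard classes and a confusion index, can be illustrated with a minimal Python sketch. The hard class follows the maximum-membership rule stated in the abstract. The confusion index formula is an assumption: the excerpt does not give it, so the sketch uses a form common in the fuzzy classification literature, one minus the difference between the two largest membership values. The function names and the membership values are hypothetical.

```python
def harden(memberships):
    """Return the index of the class with the highest membership value."""
    return max(range(len(memberships)), key=lambda i: memberships[i])


def confusion_index(memberships):
    """Assumed form: 1 - (largest membership - second largest membership).

    Values near 0 indicate a clear class; values near 1 indicate that
    two or more classes are almost equally plausible.
    """
    top, second = sorted(memberships, reverse=True)[:2]
    return 1.0 - (top - second)


# Hypothetical membership values for three grid cells and three classes.
cells = {
    "core of class 0": [0.90, 0.05, 0.05],
    "transition zone": [0.48, 0.45, 0.07],
    "no clear class": [0.34, 0.33, 0.33],
}

for name, u in cells.items():
    print(f"{name}: hard class {harden(u)}, confusion index {confusion_index(u):.2f}")
```

All three hypothetical cells receive hard class 0, yet their confusion indices differ widely. This is the information a hard map discards and a confusion index layer retains.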

Introduction

Seafloor classification and habitat mapping have become the topic of many recent interdisciplinary studies (e.g. Greene et al., 1995, Sotheran et al., 1997, Pinn et al., 1998, Todd et al., 1999, Kostylev et al., 2001, Goff et al., 2004). Much of this research has focused on new morphological detail and sediment textural attributes to identify seafloor benthos and bio-physical interactions (e.g. Kostylev et al., 2003). Such an approach requires the application of reliable and accurate seafloor descriptors in combination with a robust means to statistically assess descriptor associations. Historically, seafloor mapping was primarily concerned with the identification, spatial extent and geometric relationship of geological units (Orpin and Kostylev, 2006). These geological units were based on, for example, the relationship between sediment grain size and biota, which has been well advanced in the literature (Hall, 1994, Snelgrove and Butman, 1994, Snelgrove, 1998, Snelgrove, 1999). However, mean grain size alone does not appear to be a determinant of species distribution or community composition (Post et al., 2006). These studies have emphasised the need for a robust technique to classify soft sediment substrates, if relationships between the substrate and benthic habitats are to be better understood and quantitatively assessed.

Benthic sediment mapping has primarily involved two separate analyses: (1) interpolation of sediment samples into a continuous representation of the seafloor, and (2) classification of multivariate field samples into seafloor categories. Variability within seafloor classes is often studied on large spatial scales governed by strong environmental gradients, e.g. tidal level (Underwood et al., 1996), wave exposure (Menge, 1978) and depth. Cost and other constraints may limit the number of samples that can be taken or analysed and consequently limit the degree to which the results can be accurately interpolated and/or extrapolated to other areas. Accuracy assessment is therefore an integral component of a useful and robust mapping program. It provides a feedback loop to the mapping methodology or sediment sampling regime and it allows researchers to evaluate the quality or usefulness of benthic habitat maps with respect to their unique applications. Unfortunately, marine geological and geophysical data are difficult and costly to collect over large areas, precluding thorough quantitative assessments of thematic map accuracy. One of the main disadvantages of a thematic accuracy assessment is the non-spatial nature of the accuracy statistics, which do not show where in the map the uncertainties lie.

Geoscience Australia's (GA) Australian National Marine Bioregionalisation project on seafloor classification is aimed at boundary definition and delineation of benthic sediment classes (Heap et al., 2005). While surrogacy studies provide important clues to how the biota are related to physical properties and which physical properties are most relevant, those studies are at a spatial scale that is too coarse or too fine to help managers make informed decisions about the conservation and sustainable use of Australia's entire marine region. Individually, physical datasets are not always informative for predicting seabed habitat, but when combined with other physical datasets to produce Seascapes they can effectively represent the spatial distribution of marine biodiversity. A Seascape corresponds to an area of similar physical properties and, by association, habitats and communities (Whiteway et al., 2007).

Geoscience Australia's Seascapes classification process is based on a statistical clustering procedure to objectively identify seafloor classes. Clustering is an unsupervised classification method where objects or samples are grouped into classes (clusters) based upon their similarity in attribute values in a multidimensional feature space (visualised as a scatter plot), i.e. samples that have similar attribute values are close together in feature space and therefore form a (distinct) cluster (Pakhira et al., 2005). It is an approach to unsupervised learning and also one of the major techniques of pattern recognition (Yang and Ko, 1996, Wu and Yang, 2002, Wu and Yang, 2005, Tsekouras et al., 2005).

Traditionally, landscape or seafloor classification is based on methods that rely on delineating classes according to a scientist's personal insight or expert opinion (Bie and Beckett, 1973, McBratney et al., 1992). The advantage of using a statistical clustering technique for classification instead of visual interpretation is that the method is objective, repeatable, and applicable to large data sets. On the other hand, boundaries generated between clusters could be based on minor attribute differences, or, in the worst case, reflect noise in the data (e.g. caused by sampling bias, analysis techniques, or measurement error). The most common clustering algorithms group samples into distinct classes with hard or discrete boundaries. Frequently, the attributes of benthic sediment classes vary gradually over space and representing these gradual variations by discrete boundaries may result in a loss of useful information and a possible increase in error due to the arbitrary placing of inappropriate boundaries (Burrough et al., 2000).

Hard clustering methods, such as k-means and ISODATA, assign only one cluster label to each sample in the data set, i.e. the most similar cluster with the closest centre (Wu and Yang, 2005). Hard clustering with the ISODATA algorithm was applied by Geoscience Australia in the Australian National Marine Bioregionalisation project (Heap et al., 2005). Hard clustering only allows the clusters to be disjoint and non-overlapping; therefore, any sample may belong to one and only one class. Since Zadeh (1965) proposed fuzzy set theory as a formal mathematical means of dealing with vagueness and class overlap, fuzzy clustering (fuzzy c-means) has been widely studied and applied in a variety of areas such as ecology and soil science (Bezdek et al., 1984, Yang, 1993, Baraldi and Blonda, 1999, Burrough et al., 2000). Fuzziness is a type of uncertainty or vagueness characterizing classes that for various reasons cannot have or do not have sharply defined boundaries (Burrough and Frank, 1996). Fuzziness is expressed by membership values ranging from 0.0 to 1.0, conveying the degree to which a sample belongs to a class. In this model, a sample can belong to more than one class, which is a very powerful concept for depicting class overlap caused by vagueness in class definition. In addition, it highlights the existence of transition zones between classes as a very common characteristic of natural classes, especially between soft sediment classes on the seabed. Fuzzy clustering provides a means of spatially depicting classification uncertainty related to transition zones and class overlap, which is not possible with the non-spatial statistics derived from an accuracy assessment.
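The contrast between a single hard label and graded memberships can be sketched in a few lines of Python. The sketch below uses the standard fuzzy c-means membership update for fixed cluster centres with Euclidean distance and fuzziness exponent m = 2. These settings, the function names, and the toy one-dimensional centres are assumptions for illustration and are not taken from this study.

```python
import math


def distance(a, b):
    """Euclidean distance between two points given as sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def hard_label(sample, centres):
    """Hard clustering: index of the single closest centre."""
    return min(range(len(centres)), key=lambda i: distance(sample, centres[i]))


def fuzzy_memberships(sample, centres, m=2.0):
    """Standard FCM membership update for one sample and fixed centres.

    u_i = 1 / sum_j (d_i / d_j) ** (2 / (m - 1)), so memberships sum to 1.
    """
    d = [distance(sample, c) for c in centres]
    # A sample lying exactly on a centre belongs fully to that centre.
    if 0.0 in d:
        hits = [1.0 if di == 0.0 else 0.0 for di in d]
        return [h / sum(hits) for h in hits]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((di / dj) ** power for dj in d) for di in d]


# Hypothetical centres of two sediment classes along one attribute axis.
centres = [(0.0,), (10.0,)]

for sample in [(1.0,), (5.0,)]:
    u = fuzzy_memberships(sample, centres)
    print(sample, "hard:", hard_label(sample, centres), "fuzzy:", [round(v, 3) for v in u])
```

The sample at 5.0 lies midway between the two centres. The hard rule must still assign it to one class, while the fuzzy memberships of 0.5 and 0.5 record that it sits in a transition zone.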

The objective of this study is to apply fuzzy clustering algorithms to identify seafloor classes from seafloor sediment samples. In addition, we implement a range of recently published cluster validation techniques to determine the optimal number of class clusters from the unsupervised fuzzy clustering algorithms. We test and validate the fuzzy clustering and cluster validity algorithms on a simulated data set. Finally, we present the results of a fuzzy Seascape classification based on the data sets in the Australian National Marine Bioregionalisation project.
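As an illustration of how a cluster validity measure scores a partition, the following sketch computes the Xie-Beni (XB) index in its commonly cited form: the membership-weighted within-cluster squared distance divided by the number of samples times the minimum squared distance between centres. Lower values indicate compact, well-separated clusters. The function names, the toy data, and the choice m = 2 are assumptions; the sketch does not reproduce how the index was implemented in this study.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def xie_beni(samples, centres, memberships, m=2.0):
    """Xie-Beni index: compactness divided by separation (lower is better).

    memberships[i][k] is the membership of sample k in cluster i.
    Requires at least two cluster centres.
    """
    compactness = sum(
        memberships[i][k] ** m * sq_dist(samples[k], centres[i])
        for i in range(len(centres))
        for k in range(len(samples))
    )
    separation = min(
        sq_dist(centres[i], centres[j])
        for i in range(len(centres))
        for j in range(i + 1, len(centres))
    )
    return compactness / (len(samples) * separation)


# Hypothetical one-dimensional data forming two tight, distant groups.
samples = [(0.0,), (1.0,), (10.0,), (11.0,)]
centres = [(0.5,), (10.5,)]
memberships = [
    [1.0, 1.0, 0.0, 0.0],  # cluster 0
    [0.0, 0.0, 1.0, 1.0],  # cluster 1
]

print(xie_beni(samples, centres, memberships))
```

In practice such an index would be evaluated for each candidate number of clusters after running the clustering, and the partition with the best score retained.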

Study area and data sets

Seascapes represent a combination of different physical data that have an identifiable and consistent relationship with marine biota (Heap et al., 2005). Seascapes describe a layer of ecologically meaningful physical properties to spatially represent seabed habitats. Each area of a Seascape corresponds to an area of similar physical properties and, by association, habitats and communities. Geoscience Australia has used physical properties that have consistent relationships with the biota and

Results

This section presents the results of a fuzzy c-means classification applied to Geoscience Australia's Seascapes data. This research is one of the first studies to apply a fuzzy clustering approach, and more specifically the FMLE spatial clustering algorithm, to a spatial marine dataset. In addition, the cluster validity measures described in Section 2.6 were published in pattern recognition studies and applied only to simulated datasets. This is the first study to apply these new and advanced

Discussion

This study has demonstrated that fuzzy c-means classification applied to Seascapes data provides valuable insight into uncertainty related to class attribution and transition zones between seabed classes. Traditional mapping techniques that derive hard classes are not sufficient to map classes with transition zones and marine sediments provide an excellent example as a case study. Advanced methods of classification or interpolation cannot enforce discrete partitioning of attribute or

Conclusions

This study has applied two fuzzy classification algorithms, the FCM and the FMLE, to identify Seascape classes from marine sediment samples around Tasmania, Australia. Class overlap and vagueness in class definition is a common problem in the environmental sciences and we have highlighted the power of using the FCM and FMLE classification algorithms in a spatial marine context. The fuzzy c-means approach has significant advantages over crisp/hard classifiers in that it allows the quantification

Acknowledgements

The sediment data for this analysis were provided by Geoscience Australia from the MaRS Database [http://www.ga.gov.au/oracle/mars/]. The authors gratefully acknowledge the support of Dr. Peter Harris and Dr. Andrew Heap from Geoscience Australia. This research was supported by the Tasmanian Aquaculture and Fisheries Institute at the University of Tasmania. The authors would like to thank the two anonymous reviewers for their constructive feedback.

References (53)

  • Sotheran, I.S. et al. (1997). Mapping of marine benthic habitats using image processing techniques within a raster-based geographic information system. Estuar. Coast. Shelf Sci.
  • Todd, B.J. et al. (1999). Quaternary geology and surficial sediment processes; Browns Bank, Scotian Shelf; based on multibeam bathymetry. Mar. Geol.
  • Tsekouras, G. et al. (2005). A hierarchical fuzzy-clustering approach to fuzzy modeling. Fuzzy Sets Syst.
  • Vriend, S.P. et al. (1988). The application of fuzzy c-means cluster analysis and non-linear mapping to geochemical datasets: examples from Portugal. Appl. Geochem.
  • Wang, W. et al. (2007). On fuzzy cluster validity indices. Fuzzy Sets Syst.
  • Wu, K.L. et al. (2002). Alternative c-means clustering algorithms. Pattern Recogn.
  • Wu, K.L. et al. (2005). A cluster validity index for fuzzy clustering. Pattern Recogn. Lett.
  • Yang, M.S. et al. (1996). On a class of fuzzy c-numbers clustering procedures for fuzzy data. Fuzzy Sets Syst.
  • Zadeh, L.A. (1965). Fuzzy sets. Inf. Control
  • Baraldi, A. et al. (1999). A survey of fuzzy clustering algorithms for pattern recognition – part 1. IEEE Trans.
  • Bezdek, J.C. (1974). Cluster validity with fuzzy sets. J. Cybern.
  • Bezdek, J.C. (1992). Computing with uncertainty. IEEE Commun. Mag.
  • Burrough, P.A. et al. (1996). Geographic Objects with Indeterminate Boundaries, Number 2 in GISDATA
  • Burrough, P.A. et al. Principles of geographical information systems
  • Burrough, P.A. et al. (1976). Improving a reconnaissance soil classification by multivariate methods. J. Soil Sci.
  • Canty, M.J., 2007. Image Analysis, Classification and Change Detection in Remote Sensing with Algorithms for ENVI/IDL....