Hybrid image classification and parameter selection using a shared memory parallel algorithm

https://doi.org/10.1016/j.cageo.2006.10.014

Abstract

This work presents a shared memory parallel version of the hybrid classification algorithm IGSCR (iterative guided spectral class rejection) to facilitate the transition from serial to parallel processing. This transition is motivated by a demonstrated need for more computing power driven by the increasing size of remote sensing data sets due to higher resolution sensors, larger study regions, and the like. Parallel IGSCR was developed to produce fast and portable code using Fortran 95, OpenMP, and the Hierarchical Data Format version 5 (HDF5) and accompanying data access library. The intention of this work is to provide an efficient implementation of the established IGSCR classification algorithm. The applicability of the faster parallel IGSCR algorithm is demonstrated by classifying Landsat data covering most of Virginia, USA into forest and non-forest classes with approximately 90% accuracy. Parallel results are given using the SGI Altix 3300 shared memory computer and the SGI Altix 3700 with as many as 64 processors reaching speedups of almost 77. Parallel IGSCR allows an analyst to perform and assess multiple classifications to refine parameters. As an example, parallel IGSCR was used for a factorial analysis consisting of 42 classifications of a 1.2 GB image to select the number of initial classes (70) and class purity (70%) used for the remaining two images.

Introduction

As remote sensing data sets continue to increase in size, there is a demonstrated need for faster computing resources to decrease processing time. Furthermore, with certain classification algorithms, more accurate results may be obtained by using slightly different input parameters. A significantly faster (parallel) implementation of these classification algorithms would allow an analyst to make several runs with different parameters in the time required for one serial run, potentially producing more accurate classification results. Although more and more parallel computers are available to the research community, porting existing serial applications to a parallel environment is usually non-trivial. This paper discusses the specific changes made to the IGSCR (iterative guided spectral class rejection) classification algorithm (Wayman et al., 2001; Musy et al., 2006) to produce a shared memory parallel algorithm, with accompanying pseudocode for the classification algorithms. A further goal of this implementation is to create source code that is both portable and open source. The final parallel IGSCR code runs on multiple hardware platforms and operating systems, and it avoids the “black box” character associated with commercial software libraries. A final goal of this work is to demonstrate the utility of the parallel IGSCR implementation by accurately and efficiently classifying Landsat data covering the state of Virginia into forest and non-forest land use informational classes.

The rest of the paper is organized as follows. The second section reviews the work that has led up to this work as background information. The third section describes in detail the serial IGSCR algorithm and the serial K-means and maximum likelihood algorithms that comprise it. The fourth section describes the Hierarchical Data Format version 5 (HDF5) and its Application Programming Interface (API) and how these are used in this implementation. The fifth section describes the modifications made to serial IGSCR to produce a parallel IGSCR algorithm, and the sixth section presents the parallel results and accompanying analysis. The seventh section closes the paper with a discussion of the conclusions reached.

Section snippets

IGSCR

Unsupervised classification is a process by which all pixels or objects with similar spectral values (spectral classes) are identified (clustering) and then subsequently labeled with respect to informational classes (labeling). Supervised classification, in contrast, requires analyst identification of the spectral classes within each informational class beforehand (training). Remaining pixels or objects are then assigned to a spectral class using a decision rule (classification). As with

Algorithm description

K-means clusters $N$ data points in $E^b$ (real $b$-dimensional Euclidean space, all $b$-tuples of real numbers) into $K$ different clusters such that the sum of the distances between each cluster mean and the data points belonging to the cluster cannot be reduced by reassigning any of the data points to any of the existing clusters (Hartigan, 1975). The algorithm begins with $K$ initialized cluster mean points $m^{(1)}, \ldots, m^{(K)} \in E^b$, and each data point $x^{(i)} \in E^b$ is assigned to the nearest cluster mean according to
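The assignment and mean-update steps described here can be sketched in a few lines. This is a minimal illustration, not the authors' Fortran 95 code: the function names `nearest_mean` and `kmeans` are hypothetical, Euclidean distance is assumed as the measure of "nearest", and an empty cluster simply keeps its previous mean.

```python
import math

def nearest_mean(point, means):
    """Index of the cluster mean closest to `point` (Euclidean distance assumed)."""
    return min(range(len(means)), key=lambda k: math.dist(point, means[k]))

def kmeans(points, means, max_iter=100):
    """Alternate assignment and mean-update steps until no point changes cluster."""
    means = [list(m) for m in means]
    labels = [None] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest cluster mean.
        new_labels = [nearest_mean(x, means) for x in points]
        if new_labels == labels:
            break  # converged: no point changed cluster
        labels = new_labels
        # Update step: each mean becomes the centroid of its members.
        for k in range(len(means)):
            members = [x for x, lab in zip(points, labels) if lab == k]
            if members:  # an empty cluster keeps its previous mean
                means[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels, means
```

For example, calling `kmeans` on the four points (0, 0), (0, 1), (10, 10), (10, 11) with initial means (0, 0) and (10, 10) groups the first two points into one cluster and the last two into the other.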

Description of HDF5

One goal of this implementation is to achieve portability by removing dependencies on specific platforms and software, which limits the use of the proprietary data format that was used in the previous implementation. The HDF5 data format and API library were chosen because they are flexible, robust, and already widely used in the scientific community (HDF5, 2005). HDF5 can be used on a variety of operating systems with many C and Fortran compilers (HDF5, 2005).

HDF5 is a standard that defines a

Parallel IGSCR

The strategy for parallelizing IGSCR is to locate operations in the original algorithm that may be run in parallel and to concentrate on those operations that will result in the greatest speedup. These operations that may be run concurrently will fall into one of two categories of parallelism, functional or data parallelism. Data parallelism, or single instruction multiple data (SIMD) parallelism, is exhibited when multiple processes perform identical operations on different members of one data
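Data parallelism of this kind can be illustrated with the nearest-mean assignment used in clustering: the pixels are split into chunks and every worker applies the same operation to its own chunk. The sketch below is an assumption-laden analogy, not the paper's method, which uses OpenMP in Fortran 95. The names `assign_chunk` and `parallel_assign` are hypothetical, and Python threads are used only to show the decomposition (CPython threads do not speed up CPU-bound pure-Python work).

```python
import math
from concurrent.futures import ThreadPoolExecutor

def assign_chunk(chunk, means):
    """Identical operation run by every worker: label each point in its chunk."""
    return [min(range(len(means)), key=lambda k: math.dist(x, means[k]))
            for x in chunk]

def parallel_assign(points, means, workers=4):
    """Split `points` into contiguous chunks and label the chunks concurrently."""
    size = max(1, math.ceil(len(points) / workers))
    chunks = [points[i:i + size] for i in range(0, len(points), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() preserves chunk order, so labels line up with the input points.
        parts = list(pool.map(lambda c: assign_chunk(c, means), chunks))
    return [label for part in parts for label in part]
```

Because the chunks share nothing but the read-only cluster means, the workers need no synchronization until their results are concatenated.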

Data description

Parallel IGSCR was tested using three mosaicked Landsat Enhanced Thematic Mapper Plus (ETM+) satellite images taken from Landsat Worldwide Reference System (WRS) paths 15, 16, and 17, covering the majority of the state of Virginia, USA. These images, hereafter referred to as VA15, VA16, and VA17, were obtained on April 28, 2004; May 8, 2005; and November 2, 2003, respectively. They are roughly similar in size, with VA15, VA16, and VA17 being 1.1, 1.2, and 0.9 GB,

Conclusions

Prior to this work, IGSCR classification was performed using a “black box” proprietary software library, and classification of a full Landsat scene required several hours of processing time. With the creation of a portable and open source serial version of IGSCR, the black box was removed, allowing analysts to verify and modify the algorithms that are used. The large speedup observed was the result of adding both memory and processors, and therefore does not necessarily indicate good parallel
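The speedup figures can be read through the usual definitions: speedup is serial time divided by parallel time, and efficiency is speedup divided by the number of processors. The sketch below is not from the paper. The function names are hypothetical, and it assumes the abstract's speedup of almost 77 was measured on 64 processors, with 77 taken as an upper bound.

```python
def speedup(serial_time, parallel_time):
    """Speedup S = T_1 / T_p."""
    return serial_time / parallel_time

def efficiency(speedup_value, processors):
    """Parallel efficiency E = S / p; values above 1 indicate superlinear speedup."""
    return speedup_value / processors

# Assumed reading of the abstract: speedup of almost 77 on 64 processors.
print(efficiency(77, 64))  # 1.203125
```

An efficiency above 1 cannot come from the added processors alone, which is consistent with the remark that the added memory also contributed to the observed speedup.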

Acknowledgments

This research was supported by NASA (NAG5-10548), the Department of Energy (DE-FG02-06ER25720), and the Department of Computer Science, Virginia Polytechnic Institute and State University. The computation results presented here were in part obtained using Virginia Tech's 128 processor SGI Altix 3700. The authors gratefully acknowledge remote sensing technical assistance by Dr. Christine E. Blinn, Department of Forestry, Virginia Polytechnic Institute and State University and provision of testing

References (26)

  • Amdahl, G.M., 1967. Validity of the single processor approach to achieving large scale computing capabilities. In: AFIPS...
  • Anderson, E., Bai, Z., Bischof, C., Demmel, J., Dongarra, J., Du Croz, J., Greenbaum, A., Hammarling, S., McKenney, A.,...
  • Ball, G.H., Hall, D.J., 1965. A novel method of data analysis and pattern classification. Technical Report AD 699616,...
  • Bauer, M.E., et al., 1994. Satellite inventory of Minnesota forest resources. Photogrammetric Engineering & Remote Sensing.
  • Bechtold, W.A., Scott, C.T., 2005. The forest inventory and analysis plot design. In: Bechtold, W.A., Patterson, P.L....
  • Bernstein, A.J., 1966. Analysis of programs for parallel processing. IEEE Transactions on Computers...
  • Bruzzone, L., et al., 2001. Unsupervised retraining of a maximum likelihood classifier for the analysis of multitemporal remote sensing images. IEEE Transactions on Geoscience and Remote Sensing.
  • Byeungwoo, J., et al., 1999. Partially supervised classification using weighted unsupervised clustering. IEEE Transactions on Geoscience and Remote Sensing.
  • Duda, R.O., Hart, P.E., 1973. Pattern Classification and Scene Analysis. Wiley, New York,...
  • Fortran, 1997. Information technology—programming languages—Fortran. ISO/IEC 1539-1, Geneva, Switzerland,...
  • Hartigan, J.A., 1975. Clustering Algorithms. Wiley, New York,...
  • HDF5, 2005. HDF5 User's Guide. Hierarchical Data Format (HDF) Group, National Center for Supercomputing Applications...
  • Huang, K.Y., 2002. A synergistic automatic clustering technique (SYNERACT) for multispectral image analysis. Photogrammetric Engineering & Remote Sensing.