Hybrid image classification and parameter selection using a shared memory parallel algorithm
Introduction
As remote sensing data sets continue to increase in size, there is a demonstrated need for faster computing resources to decrease processing time. Furthermore, for certain classification algorithms, more accurate results may be obtained by using slightly different input parameters. A significantly faster (parallel) implementation of these classification algorithms would allow an analyst to make several runs with different parameters in the time required for one serial run, potentially producing more accurate classification results. Although parallel computers are increasingly available to the research community, porting existing serial applications to a parallel environment is usually non-trivial. This paper discusses the specific changes made to the IGSCR (iterative guided spectral class rejection) classification algorithm to produce a shared memory parallel algorithm, with accompanying pseudocode for the classification algorithms (Wayman et al., 2001; Musy et al., 2006). A further goal of this implementation is to create source code that is both portable and open source. The final parallel IGSCR code runs on multiple hardware platforms and operating systems, and it avoids the “black box” character associated with commercial software libraries. A final goal of this work is to demonstrate the utility of the parallel IGSCR implementation by accurately and efficiently classifying Landsat data covering the state of Virginia into forest and non-forest land use informational classes.
The rest of the paper is organized as follows. The second section reviews the work that has led up to this paper as background information. The third section gives detailed descriptions of the serial IGSCR algorithm and of the serial K-means and maximum likelihood algorithms that comprise it. The fourth section describes the Hierarchical Data Format version 5 (HDF5) and its Application Programming Interface (API), and how these are used in this implementation. The fifth section describes the modifications made to serial IGSCR to produce a parallel IGSCR algorithm, and the sixth section presents the parallel results and accompanying analysis. The seventh section concludes the paper with a discussion of the conclusions reached.
IGSCR
Unsupervised classification is a process by which all pixels or objects with similar spectral values (spectral classes) are identified (clustering) and then subsequently labeled with respect to informational classes (labeling). Supervised classification, in contrast, requires analyst identification of the spectral classes within each informational class beforehand (training). Remaining pixels or objects are then assigned to a spectral class using a decision rule (classification). As with
Algorithm description
K-means clusters N data points in $\mathbb{R}^b$ (real b-dimensional Euclidean space, all b-tuples of real numbers) into K different clusters such that the sum of the distances between each cluster mean and the data points belonging to the cluster cannot be reduced by reassigning any of the data points to any of the existing clusters (Hartigan, 1975). The algorithm begins with K initialized cluster mean points, and each data point is assigned to the nearest cluster mean according to
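The assignment and update steps described above can be illustrated with a minimal sketch. It assumes squared Euclidean distance as the nearest-mean criterion (the paper's exact assignment rule is cut off in this excerpt) and a simple alternate-until-stable iteration; the function names, the `max_iter` cap, and the handling of empty clusters are illustrative assumptions, not the authors' implementation.

```python
from typing import List, Sequence, Tuple

Point = Sequence[float]


def squared_distance(x: Point, c: Point) -> float:
    """Squared Euclidean distance between a data point and a cluster mean."""
    return sum((xi - ci) ** 2 for xi, ci in zip(x, c))


def assign(points: Sequence[Point], means: Sequence[Point]) -> List[int]:
    """Label each point with the index of its nearest cluster mean."""
    return [
        min(range(len(means)), key=lambda k: squared_distance(x, means[k]))
        for x in points
    ]


def update(points: Sequence[Point], labels: Sequence[int],
           means: Sequence[Point]) -> List[List[float]]:
    """Recompute each cluster mean from the points currently assigned to it."""
    new_means = []
    for k, old in enumerate(means):
        members = [x for x, lab in zip(points, labels) if lab == k]
        if not members:
            # Assumption: an empty cluster keeps its previous mean.
            new_means.append(list(old))
            continue
        new_means.append(
            [sum(x[j] for x in members) / len(members) for j in range(len(old))]
        )
    return new_means


def kmeans(points: Sequence[Point], initial_means: Sequence[Point],
           max_iter: int = 100) -> Tuple[List[int], List[List[float]]]:
    """Alternate update and assignment until no point changes cluster."""
    means = [list(m) for m in initial_means]
    labels = assign(points, means)
    for _ in range(max_iter):
        means = update(points, labels, means)
        new_labels = assign(points, means)
        if new_labels == labels:  # no reassignment possible: converged
            break
        labels = new_labels
    return labels, means
```

For example, calling `kmeans([(0, 0), (0, 1), (10, 10), (10, 11)], [(0, 0), (10, 10)])` separates the two pairs of points into two clusters. The loop stops once a full pass reassigns no point, which mirrors the stopping condition stated in the text.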
Description of HDF5
One goal of this implementation is to achieve portability by removing dependencies on specific platforms and software, which rules out the proprietary data format that was used in the previous implementation. The HDF5 data format and API library were chosen because they are flexible, robust, and already widely used in the scientific community (HDF5, 2005). HDF5 can be used on a variety of operating systems with many C and Fortran compilers (HDF5, 2005).
HDF5 is a standard that defines a
Parallel IGSCR
The strategy for parallelizing IGSCR is to locate operations in the original algorithm that may be run in parallel and to concentrate on those operations that will result in the greatest speedup. These operations that may be run concurrently will fall into one of two categories of parallelism, functional or data parallelism. Data parallelism, or single instruction multiple data (SIMD) parallelism, is exhibited when multiple processes perform identical operations on different members of one data
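To make the notion of data parallelism concrete, the following sketch splits a list of pixels into contiguous chunks and has each worker thread apply the same nearest-mean labeling operation to its own chunk of a shared array. It is only an illustration of the pattern: the function names, the chunking scheme, and the use of Python threads are assumptions made here, not the paper's implementation, and Python threads show the structure of the computation rather than real speedup.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence

Pixel = Sequence[float]


def nearest_mean(pixel: Pixel, means: Sequence[Pixel]) -> int:
    """Index of the cluster mean closest to the pixel (squared Euclidean)."""
    return min(
        range(len(means)),
        key=lambda k: sum((p - m) ** 2 for p, m in zip(pixel, means[k])),
    )


def label_chunk(pixels: Sequence[Pixel], means: Sequence[Pixel],
                start: int, stop: int) -> List[int]:
    """The identical operation every worker runs on its own slice of the data."""
    return [nearest_mean(pixels[i], means) for i in range(start, stop)]


def parallel_label(pixels: Sequence[Pixel], means: Sequence[Pixel],
                   workers: int = 4) -> List[int]:
    """Label all pixels by dividing them into contiguous chunks, one per task."""
    n = len(pixels)
    chunk = max(1, (n + workers - 1) // workers)  # ceiling division
    bounds = [(s, min(s + chunk, n)) for s in range(0, n, chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # All threads read the same shared `pixels` and `means` objects;
        # map() returns the per-chunk results in submission order.
        parts = pool.map(lambda b: label_chunk(pixels, means, b[0], b[1]), bounds)
        return [lab for part in parts for lab in part]
```

Because each chunk is labeled independently and the shared inputs are only read, no locking is needed, and the concatenated result matches what a serial loop over all pixels would produce.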
Data description
Parallel IGSCR was tested using three mosaicked Landsat Enhanced Thematic Mapper Plus (ETM+) satellite images taken from Landsat Worldwide Reference System (WRS) paths 15, 16, and 17, covering the majority of the state of Virginia, USA. These images, hereafter referred to as VA15, VA16, and VA17, were obtained on April 28, 2004; May 8, 2005; and November 2, 2003, respectively. They are roughly similar in size, with VA15, VA16, and VA17 being 1.1, 1.2, and 0.9 GB,
Conclusions
Prior to this work, IGSCR classification was performed using a “black box” proprietary software library, and classification of a full Landsat scene required several hours of processing time. With the creation of a portable and open source serial version of IGSCR, the black box was removed, allowing analysts to verify and modify the algorithms that are used. The large speedup observed was the result of adding both memory and processors, and therefore does not necessarily indicate good parallel
Acknowledgments
This research was supported by NASA (NAG5-10548), the Department of Energy (DE-FG02-06ER25720), and the Department of Computer Science, Virginia Polytechnic Institute and State University. The computational results presented here were obtained in part using Virginia Tech's 128-processor SGI Altix 3700. The authors gratefully acknowledge remote sensing technical assistance by Dr. Christine E. Blinn, Department of Forestry, Virginia Polytechnic Institute and State University and provision of testing
References (26)
- Amdahl, G.M., 1967. Validity of the single processor approach to achieving large scale computing capabilities. In: AFIPS...
- Anderson, E., Bai, Z., Bischof, C., Demmel, J., Dongarra, J., Du Croz, J., Greenbaum, A., Hammarling, S., McKenney, A.,...
- Ball, G.H., Hall, D.J., 1965. A novel method of data analysis and pattern classification. Technical Report AD 699616,...
- et al., 1994. Satellite inventory of Minnesota forest resources. Photogrammetric Engineering & Remote Sensing.
- Bechtold, W.A., Scott, C.T., 2005. The forest inventory and analysis plot design. In: Bechtold, W.A., Patterson, P.L....
- Bernstein, A.J., 1966. Analysis of programs for parallel processing. IEEE Transactions on Computers...
- et al., 2001. Unsupervised retraining of a maximum likelihood classifier for the analysis of multitemporal remote sensing images. IEEE Transactions on Geoscience and Remote Sensing.
- et al., 1999. Partially supervised classification using weighted unsupervised clustering. IEEE Transactions on Geoscience and Remote Sensing.
- Duda, R.O., Hart, P.E., 1973. Pattern Classification and Scene Analysis. Wiley, New York,...
- Fortran, 1997. Information technology—programming languages—Fortran. ISO/IEC 1539-1, Geneva, Switzerland,...
- A synergistic automatic clustering technique (SYNERACT) for multispectral image analysis. Photogrammetric Engineering & Remote Sensing.