
Ecological Informatics

Volume 52, July 2019, Pages 57-68

Semantic region of interest and species classification in the deep neural network feature domain

https://doi.org/10.1016/j.ecoinf.2019.05.006

Highlights

  • Detecting and classifying semantic animal regions in camera-trap images.

  • Generating candidate animal regions by k-means and graph cut semantic segmentation.

  • Accurate species classification is performed on the detected semantic regions.

Abstract

In this paper, we focus on animal object detection and species classification in camera-trap images collected in highly cluttered natural scenes. Using a deep neural network (DNN) model trained for animal-background image classification, we analyze the input camera-trap images to generate a multi-level visual representation of the input image. We detect semantic regions of interest for animals from this representation using k-means clustering and graph cut in the DNN feature domain. These animal regions are then classified into animal species using a multi-class deep neural network model. According to the experimental results, our method achieves 99.75% accuracy in classifying animals versus background and 90.89% accuracy in classifying 26 animal species on the Snapshot Serengeti dataset, outperforming existing image classification methods.

Introduction

Wildlife monitoring with camera-traps allows us to collect data at large scales in space and time to study the impact of climate change, land use, and human actions on wildlife population dynamics and biodiversity (Kays et al., 2011). Camera-traps are stationary camera-sensor systems attached to trees in the field. Triggered by animal motion via on-board infrared motion sensors, they capture short image sequences of animal appearance and activities, together with other sensor data such as light level, moisture, temperature, and GPS readings. Camera-traps are now extensively used in wildlife monitoring due to their relatively low cost, rapid deployment, and easy maintenance (He et al., 2016). From camera-trap images, we can extract useful information such as animal species, motion, appearance, and biometric features (Kays et al., 2014). They are widely used in ecological research to track animal movements (Silveira et al., 2003), study habitat use (Bowkett et al., 2008), assess species behaviors and population dynamics (Karanth et al., 2006), and identify new species (Rovero and Menegon, 2008).

In this work, we focus on automatic animal species recognition in highly cluttered camera-trap images using deep learning methods. Fig. 1 shows some samples of camera-trap images with animals often appearing in a relatively small region of the highly cluttered wooded scene. We recognize that it is not efficient to detect and classify animal species directly on the whole image using image classification or object detection approaches. Instead, it is highly desirable to first locate the animal region in the large image and then perform animal species classification on this smaller image region. In this way, we expect that the animal classification will be more accurate and robust since the interference from the cluttered background is suppressed.

Deep neural networks (DNN) have emerged as a powerful method for image representation in various computer vision and machine learning tasks, such as object detection and classification (Oquab et al., 2014; Razavian et al., 2014). They provide a rich hierarchical set of learned visual features, from low-level pixel statistics to high-level semantic features. In this paper, we propose to explore how this DNN visual representation could be used for semantic animal region detection and species classification in challenging natural scenes. Specifically, we first design and train a DNN for animal-background object classification, which is used to analyze the input image to generate multi-layer feature maps, representing the responses of different image regions to the animal-background classifier. In this DNN feature domain, we perform clustering and graph cut to construct the semantic regions of animals. We then perform animal species classification on these semantic regions. Our experimental results demonstrate that the proposed method significantly outperforms existing classification and detection methods.
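The clustering step described above can be illustrated with a small sketch: spatial feature vectors from a convolutional layer are grouped with k-means, and the cluster responding to the animal-background classifier becomes the candidate animal region. The shapes, the synthetic feature map, and the use of scikit-learn below are illustrative assumptions, not the paper's actual implementation:

```python
# Sketch: clustering spatial DNN feature vectors into candidate regions.
# Hypothetical shapes; the paper's network and layer choices differ.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Pretend feature map from a conv layer: H x W spatial grid, C channels.
H, W, C = 16, 16, 64
feature_map = rng.normal(size=(H, W, C))
# Make a block of cells look "animal-like" by shifting their features.
feature_map[4:10, 5:12] += 3.0

# Cluster the H*W feature vectors into k groups (k=2: animal vs background).
vectors = feature_map.reshape(-1, C)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(vectors)
label_map = labels.reshape(H, W)

# The cluster covering the shifted block is the candidate animal region.
animal_label = label_map[6, 8]
mask = (label_map == animal_label)
print(mask.sum())  # number of grid cells assigned to the candidate region
```

In the method itself, a rough cluster mask of this kind is then refined with a graph cut before species classification.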

The main contributions of this work include: (1) we have developed a new approach for representing camera-trap images using semantic regions and detecting semantic regions of interest for animals in highly cluttered natural scenes in the learned DNN feature domain. (2) We have proposed a method to identify semantic regions of interest for more accurate image classification. (3) We have achieved accurate classification of animal species on a challenging dataset, outperforming existing methods.

The rest of the paper is organized as follows. Section 2 reviews the related work. Section 3 presents our animal species classification method using semantic regions of interest. Experimental results are presented in Section 4. Further discussions are provided in Section 5. Section 6 concludes the paper.

Section snippets

Related work

A number of computer vision and machine learning methods have been developed for animal object detection and classification. A linear support vector machine (SVM) was used by Yu et al. (2013) to classify 18 species of animals on a dataset with over 7000 images. They used sparse-coded spatial pyramid matching (ScSPM) to generate global features, and extracted dense SIFT (scale-invariant feature transform) (Lowe, 2004) descriptors and cell-structured local binary patterns (cLBP) as local features. A…
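The linear-SVM classification step described in this snippet can be sketched as follows, with synthetic Gaussian feature vectors standing in for the ScSPM/SIFT/cLBP features of Yu et al. (2013); the class count, dimensionality, and scikit-learn usage are illustrative assumptions:

```python
# Sketch: linear SVM over per-image feature vectors, one class per species.
# Synthetic features replace the ScSPM global features used in the cited work.
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(1)
n_species, per_class, dim = 3, 40, 128

# One well-separated Gaussian blob of feature vectors per species.
X = np.vstack([rng.normal(loc=i * 2.0, size=(per_class, dim))
               for i in range(n_species)])
y = np.repeat(np.arange(n_species), per_class)

clf = LinearSVC(C=1.0).fit(X, y)
acc = clf.score(X, y)
print(f"training accuracy: {acc:.2f}")
```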

Animal species recognition using semantic region of interest

In this section, we present the proposed method for animal species recognition using semantic region detection in the DNN feature domain.
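The graph-cut refinement this method relies on (per the cited Boykov et al. references) typically minimizes a standard two-term energy over region labels; a generic form, not taken verbatim from this paper, is:

$$E(L) = \sum_{p \in \mathcal{P}} D_p(L_p) + \lambda \sum_{(p,q) \in \mathcal{N}} V_{p,q}(L_p, L_q)$$

where $D_p$ is the data term measuring how well label $L_p$ (animal or background) fits the DNN feature response at site $p$, and $V_{p,q}$ is the smoothness term penalizing label changes between neighboring sites.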

Experimental results

In this section, we show the performance of our algorithm in classifying animal regions into species. First, we used the AlexNet architecture to train the two-class animal-versus-background DCNN model, achieving 99.75% accuracy on this task. Second, we apply our semantic segmentation algorithm to find the animal regions and then use the animal-versus-background DCNN model to suppress the background regions. We train another DCNN model with 26 classes to classify the animal regions into their species. To…
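The two-stage decision described above can be sketched as a simple pipeline: a binary model first rejects background regions, and a species model labels only the survivors. The stand-in model functions, the threshold, and the three-species subset below are illustrative assumptions; the paper uses two trained DCNNs and 26 classes:

```python
# Sketch: two-stage classification with stand-in models in place of DCNNs.
import numpy as np

SPECIES = ["zebra", "wildebeest", "gazelle"]  # placeholder subset of the 26 classes

def classify_regions(regions, is_animal, species_model, threshold=0.5):
    """Stage 1 suppresses background regions; stage 2 labels the rest."""
    results = []
    for region in regions:
        if is_animal(region) < threshold:
            continue  # background region: suppressed, never reaches stage 2
        probs = species_model(region)
        results.append(SPECIES[int(np.argmax(probs))])
    return results

# Toy stand-in models over 4x4 "regions" (mean intensity encodes content).
is_animal = lambda r: r.mean()                        # high mean => animal
species_model = lambda r: np.eye(3)[int(r.max() * 3) % 3]

regions = [np.full((4, 4), 0.9), np.full((4, 4), 0.1)]  # one animal, one background
print(classify_regions(regions, is_animal, species_model))
```

Filtering before species classification is what suppresses the cluttered-background interference the paper targets: the 26-class model only ever sees regions the binary model accepted.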

Further discussions

During our experiments, we found that many animals share features such as colors, lines, horns, and fur, which makes it very difficult to distinguish similar species. Fig. 12 shows some examples of classification errors, with rows (a), (c), and (e) showing the original images of animal species that were classified into different species in rows (b), (d), and (f). We can see that these classification errors are caused by many factors, such as poor illumination, unexpected animal poses, heavy occlusions,…

Conclusions

In this paper, we have successfully developed an animal species classification method for camera-trap images with highly cluttered scenes. We use the DNN trained for animal-background classification to analyze the input image and construct a semantic region representation using k-means clustering and graph cut in the DNN feature domain. With the semantic animal regions detected, we trained a DNN model to perform animal species classification on these regions. Our experimental results on the…

Acknowledgements

This work was supported in part by NSF grant CyberSEES-1539389.

References (35)

  • Y.H.S. Kumar et al., Animal classification system: a block based approach, Procedia Comput. Sci. (2015)
  • S. Matuska et al., Classification of wild animals based on SVM and local descriptors, AASRI Procedia (2014)
  • L. Silveira et al., Camera trap, line transect census and track surveys: a comparative evaluation, Biol. Conserv. (2003)
  • A.E. Bowkett et al., The use of camera-trap data to model habitat use by antelope species in the Udzungwa mountain forests, Tanzania, Afr. J. Ecol. (2008)
  • Y. Boykov et al., An experimental comparison of min-cut/max-flow algorithms for energy minimization in vision, IEEE Trans. Pattern Anal. Mach. Intell. (2004)
  • Y. Boykov et al., Fast approximate energy minimization via graph cuts, IEEE Trans. Pattern Anal. Mach. Intell. (2001)
  • G. Chen et al., Deep convolutional neural network based species recognition for wild animal monitoring
  • X. Cui et al., Data augmentation for deep neural network acoustic modeling
  • D. Erhan et al., Scalable object detection using deep neural networks
  • B.S. Everitt et al., An Introduction to Classification and Clustering, in Cluster Analysis (2011)
  • Z. Ge et al., Fine-grained bird species recognition via hierarchical subset learning
  • R. Girshick, Fast R-CNN
  • A. Gomez et al., Animal identification in low quality camera-trap images using very deep convolutional neural networks and confidence thresholds
  • A. Gomez et al., Towards automatic wild animal monitoring: identification of animal species in camera-trap images using very deep convolutional neural networks, Ecol. Inform. (2017)
  • K. He et al., Deep residual learning for image recognition
  • Z. He et al., Visual informatics tools for supporting large-scale collaborative wildlife monitoring with citizen scientists, IEEE Circ. Syst. Mag. (2016)
  • K.U. Karanth et al., Assessing tiger population dynamics using photographic capture–recapture sampling, Ecology (2006)