Elsevier

Applied Acoustics

Volume 113, 1 December 2016, Pages 193-201

Acoustic classification of Australian frogs based on enhanced features and machine learning algorithms

https://doi.org/10.1016/j.apacoust.2016.06.029

Abstract

Frogs are often considered excellent indicators of the overall state of the natural environment, but a steady decrease in frog populations has been observed worldwide. To monitor this change and optimise protection policy, frog call classification has become an important bioacoustic research topic. However, automatic acoustic classification of frog calls has not been adequately addressed in the literature. In this paper, an enhanced feature representation for frog call classification using temporal, perceptual, and cepstral features is presented. With the enhanced feature representation, the time-frequency information of frog calls can be effectively represented, which gives good classification performance. Specifically, each continuous frog recording is first segmented into individual syllables using Härmä's method. Then, temporal, perceptual, and cepstral features are calculated from each syllable: syllable duration, Shannon entropy, Rényi entropy, zero-crossing rate, averaged energy, oscillation rate, spectral centroid, spectral flatness, spectral roll-off, signal bandwidth, spectral flux, fundamental frequency, linear predictive coding, and Mel-frequency cepstral coefficients. Next, different feature vectors are fused to obtain different enhanced feature representations. Finally, the enhanced feature representations are compared using five machine learning algorithms: linear discriminant analysis, K-nearest neighbour, support vector machines, random forest, and artificial neural network. Experimental results show that our proposed feature representation achieves better classification performance than other methods on twenty-four frog species, which are geographically well distributed throughout Queensland, Australia.

Introduction

Global biodiversity is under great pressure from habitat loss, invasive species, pollution, climate change, and resource overexploitation [1]. Consequently, animal populations, including frog populations, have decreased dramatically. While frog populations are declining, frogs are also regarded as excellent bio-indicators because of their sensitivity to environmental change. Thus, it is becoming ever more necessary to monitor frog populations.

Since frogs are often heard rather than seen and their vocalisations carry acoustic cues for communication, acoustics has long been used to monitor frog species. Frogs make many types of calls, including territorial calls, distress calls, warning calls, release calls, and mating calls [2]. Among them, mating calls are termed advertisement calls and can be used to identify frog species. Advertisement calls of species that are more closely related phylogenetically are predicted to be more similar than those of distant species [3]. Therefore, acoustic information from advertisement calls can be used for frog call classification.

Monitoring frogs’ advertisement calls with a traditional field survey, which requires ecologists to physically visit sites to collect biodiversity data, is both time-consuming and costly. In contrast, recent advances in acoustic sensor techniques provide a new way to monitor environments over larger spatial and temporal scales. However, the use of acoustic sensors leads to rapid growth in acoustic data [4]. Semi-automatic or automatic methods for classifying the acoustic data collected by sensors are thus in high demand and attract considerable research.

Many studies have investigated the recognition or classification of frog calls. Prior frog call classification systems are commonly structured as follows: (1) pre-processing, (2) syllable segmentation, (3) feature extraction, (4) feature fusion, (5) classification. Grigg et al. [5] proposed a system to identify 22 frog species recorded in northern Australia based on peak values (intensity of spectrogram) and Quinlan’s machine learning system. Lee et al. [6] introduced a recognition method based on spectrogram analysis to classify frog and cricket calls. Mel-frequency cepstral coefficients (MFCCs) of each frame were calculated and averaged as the feature, and linear discriminant analysis (LDA) was used to classify 30 kinds of frog calls and 19 kinds of cricket calls. Huang et al. [7] extracted spectral centroid, signal bandwidth, and threshold crossing rate as features, and used a K-nearest neighbour (K-NN) classifier and support vector machines (SVM) to classify frog calls. Acevedo et al. [8] used three classifiers, LDA, decision tree (DT), and SVM, for automated classification of bird and amphibian calls. The best average classification accuracy achieved was 94.95%. Han et al. [9] proposed a method for classifying Australian frogs that achieved high accuracy by using a hybrid spectral-entropy approach with a K-NN classifier. To utilise time-varying information, Chen et al. [10] developed a novel feature named multi-stage average spectrum (MSAS) to classify frog calls. Syllable length was first employed for the pre-classification of frog calls; then MSAS was used to perform the final classification via template matching. In [11], frog calls were classified using linear predictive coding (LPC), MFCCs, and a K-NN classifier. In [3], Gingras et al. presented a system for the classification of frog genera. This automatic system was built on an SVM model, a K-NN algorithm, and a multivariate Gaussian distribution classifier. The three parameters used were the mean values of dominant frequency, coefficient of variation of root-mean-square energy, and spectral flux. Huang et al. [12] developed a method for the classification of anuran vocalisations using fast learning neural networks. The average classification rate reached up to 93.4%. Bedoya et al. [13] used a fuzzy clustering algorithm (Learning Algorithm for Multivariate Data Analysis) for the recognition of anuran calls. Accuracies between 99.38% and 100% were achieved on two datasets.

However, most features used in prior work are either temporal, perceptual, or cepstral features alone. A combination of the three types of features should discriminate a wider variety of species, since species may share similar characteristics in temporal, perceptual, or cepstral information, but not in all three.

In this study, an enhanced feature representation is proposed for frog call classification, which includes temporal, perceptual, and cepstral features, as an extension of our previous paper [14]. Specifically, after continuous frog calls are segmented into individual syllables, temporal, perceptual, and cepstral features are extracted from each syllable. Next, different features are fused to obtain the unified feature representation. Finally, the unified feature representation is fed into five machine learning algorithms to perform the task of frog call classification. Twenty-four frog species, which are geographically well distributed throughout Queensland, Australia, are used in this experiment. Experimental results show that our proposed enhanced feature representation can achieve an average classification accuracy of 99.8%, which outperforms other feature representations.
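The fusion step described above can be illustrated with a minimal sketch. It assumes that fusion means standardising each feature dimension and concatenating the temporal, perceptual, and cepstral vectors of each syllable; the text here does not state the exact fusion rule, so this is one plausible reading rather than the authors' implementation. The function names and the toy values are hypothetical.

```python
from statistics import mean, pstdev


def zscore_columns(rows):
    """Standardise each feature dimension to zero mean and unit variance."""
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    # A constant column has zero standard deviation; fall back to 1.0.
    stds = [pstdev(col) or 1.0 for col in columns]
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


def fuse(*feature_groups):
    """Concatenate the per-syllable vectors of several feature groups.

    Assumption: fusion is plain concatenation after per-dimension scaling.
    """
    n = len(feature_groups[0])
    if any(len(group) != n for group in feature_groups):
        raise ValueError("every feature group needs one vector per syllable")
    normalised = [zscore_columns(group) for group in feature_groups]
    return [sum((group[i] for group in normalised), []) for i in range(n)]


# Toy values for three syllables (illustrative only, not measured data).
temporal = [[0.12, 0.30], [0.20, 0.10], [0.16, 0.20]]  # e.g. duration, zero-crossing rate
perceptual = [[2100.0], [3400.0], [2750.0]]            # e.g. spectral centroid in Hz
cepstral = [[-3.1, 1.2, 0.4], [-2.7, 0.9, 0.1], [-2.9, 1.5, 0.7]]  # e.g. first MFCCs

fused = fuse(temporal, perceptual, cepstral)
print(len(fused), len(fused[0]))  # prints: 3 6
```

Scaling before concatenation keeps features with large numeric ranges, such as a centroid in Hz, from dominating distance-based classifiers like K-NN.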

The main contributions of this work, and its differences with respect to Xie et al. [14], are: (1) the design and realisation of a wider data set covering more frog species, with highly noisy backgrounds and SNRs ranging from −10 dB to 40 dB; (2) a novel feature representation based on feature fusion, which achieves a higher classification accuracy; (3) a post-processing step for syllable segmentation, which reduces the bias introduced by segmentation; (4) a comparison of five machine learning algorithms for the classification; (5) a detailed discussion of various window sizes for MFCCs and perceptual features.

The remainder of this paper is organised as follows. Section 2 describes the methods for frog call classification in detail, covering data description, pre-processing, syllable segmentation, feature extraction, feature fusion, and classification. Section 3 reports the experimental results and discussion. The conclusion and future work are offered in Section 4.

Section snippets

Architecture of the classification system for frog calls

Our frog call classification system consists of six steps (Fig. 1): data description, syllable segmentation, pre-processing, feature extraction, feature fusion, and classification. Each step is described in detail in the following subsections. Unlike previous studies [7], [14], pre-processing is applied to the segmented syllables rather than to continuous recordings.
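As an illustration of the feature extraction step, the sketch below computes three of the temporal features named in the abstract (syllable duration, zero-crossing rate, and averaged energy) for a single segmented syllable. It uses common textbook definitions, which may differ in detail from those used in the paper, and the synthetic tone and the 8 kHz sample rate are assumptions standing in for a real syllable.

```python
import math


def syllable_duration(samples, sample_rate):
    """Duration of the syllable in seconds."""
    return len(samples) / sample_rate


def zero_crossing_rate(samples):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(
        1 for a, b in zip(samples, samples[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(samples) - 1)


def averaged_energy(samples):
    """Mean of the squared sample amplitudes."""
    return sum(x * x for x in samples) / len(samples)


# Synthetic stand-in for a syllable: a 1 kHz tone, 10 ms long, sampled at 8 kHz.
# The small phase offset keeps samples away from exact zeros.
sample_rate = 8000
syllable = [
    math.sin(2 * math.pi * 1000 * n / sample_rate + 0.3) for n in range(80)
]

duration = syllable_duration(syllable, sample_rate)
zcr = zero_crossing_rate(syllable)
energy = averaged_energy(syllable)
print(duration, zcr, energy)
```

For a pure tone the zero-crossing rate is close to twice the tone frequency divided by the sample rate, which gives a quick sanity check on the implementation.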

Experiment results

In this experiment, performance statistics are estimated with fivefold cross-validation. The performance of the proposed frog call classification system is evaluated with quantitative detection metrics, such as average accuracy, precision, and specificity. Sensitivity, specificity, and accuracy are defined as

$$\text{Sensitivity} = \frac{TP}{TP + FN}, \qquad \text{Specificity} = \frac{TN}{TN + FP}, \qquad \text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

where TP is true positive, FP is false positive, TN is true negative, and FN is false negative.
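The sensitivity, specificity, and accuracy definitions above can be checked with a short sketch. It assumes a one-vs-rest treatment, in which one species is the positive class and all others are negative; the text here does not say how the multi-class counts are obtained, so that choice, the function names, and the placeholder labels "A", "B", and "C" are all hypothetical.

```python
def confusion_counts(y_true, y_pred, positive):
    """Count TP, FP, TN, FN for one class treated as the positive class."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def detection_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity, and accuracy from confusion counts.

    Assumes at least one positive and one negative example, so that no
    denominator is zero.
    """
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
    }


# Placeholder labels for eight syllables (not real experimental data).
y_true = ["A", "A", "A", "B", "B", "C", "C", "C"]
y_pred = ["A", "A", "B", "B", "B", "C", "A", "C"]

counts = confusion_counts(y_true, y_pred, positive="A")
scores = detection_metrics(*counts)
print(counts, scores)
```

Here class "A" has two true positives, one false positive, four true negatives, and one false negative, so sensitivity is 2/3, specificity is 4/5, and accuracy is 6/8.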

Discussion

Table 2 shows the classification performance of previous methods. Since previous studies often used different datasets, we implemented the features from those studies and applied them to the dataset with the same classifier (SVM). Under this comparison, the proposed enhanced feature representation significantly outperforms the previous methods. Therefore, it can be concluded that our feature representation can effectively characterise different frog calls. From the

Conclusion and future work

In this paper, we proposed a novel enhanced feature representation to classify frog calls with various machine learning algorithms. After segmenting continuous recordings into individual syllables, a variety of acoustic features are extracted from each syllable. Then, different features are fused to form different feature representations. Finally, various machine learning algorithms are used to classify frog calls with different feature representations. Our proposed enhanced feature

Acknowledgements

Thanks to the QUT Eco-acoustics Research Group for providing the datasets used in this experiment, as well as to the support from the Wet Tropics Management Authority, Queensland, Australia. Thanks to the anonymous reviewers for their careful work and thoughtful suggestions that have helped improve this paper substantially.

All funding for this research was provided by the Queensland University of Technology and the China Scholarship Council (CSC).

References (25)

  • K.D. Wells, The ecology and behavior of amphibians (2010)
  • B. Gingras et al., A three-parameter model for classifying anurans into four genera based on advertisement calls, J Acoust Soc Am (2013)