Acoustic classification of Australian frogs based on enhanced features and machine learning algorithms

doi:10.1016/j.apacoust.2016.06.029

Applied Acoustics

Volume 113, 1 December 2016, Pages 193-201

https://doi.org/10.1016/j.apacoust.2016.06.029

Abstract

Frogs are often considered excellent indicators of the overall state of the natural environment, but a steady decrease in frog populations has been observed worldwide. To monitor this change and optimise protection policy, frog call classification has become an important bioacoustic research topic. However, automatic acoustic classification of frog calls has not been adequately addressed in the literature. In this paper, an enhanced feature representation for frog call classification using temporal, perceptual, and cepstral features is presented. The enhanced feature representation effectively captures the time-frequency information of frog calls, which yields good classification performance. Specifically, each continuous frog recording is first segmented into individual syllables using Härmä's method. Then, temporal, perceptual, and cepstral features are calculated from each syllable: syllable duration, Shannon entropy, Rényi entropy, zero-crossing rate, averaged energy, oscillation rate, spectral centroid, spectral flatness, spectral roll-off, signal bandwidth, spectral flux, fundamental frequency, linear predictive coding, and Mel-frequency cepstral coefficients. Next, different feature vectors are fused to obtain different enhanced feature representations. Finally, these enhanced feature representations are compared using five machine learning algorithms: linear discriminant analysis, K-nearest neighbour, support vector machines, random forest, and artificial neural network. Experimental results on twenty-four frog species, which are geographically well distributed throughout Queensland, Australia, show that the proposed feature representation achieves better classification performance than other methods.

Introduction

Global biodiversity is under great pressure from habitat loss, invasive species, pollution, climate change, and resource overexploitation [1]. Consequently, animal populations, including those of frogs, have decreased dramatically. While frog populations are declining, frogs are also regarded as excellent bio-indicators because of their sensitivity to environmental change. Thus, it is becoming ever more necessary to monitor frog populations.

Since frogs are often heard rather than seen, and their vocalisations carry acoustic cues for communication, acoustics has long been used to monitor frog species. Frogs make many types of calls, including territorial calls, distress calls, warning calls, release calls, and mating calls [2]. Among them, mating calls, also termed advertisement calls, can be used to identify frog species. Advertisement calls of species that are more closely related phylogenetically are predicted to be more similar than those of distant species [3]. Therefore, acoustic information from advertisement calls can be used for frog call classification.

The traditional way to monitor frogs' advertisement calls is a field survey, which requires ecologists to physically visit sites to collect biodiversity data and is both time-consuming and costly. In contrast, recent advances in acoustic sensor techniques provide a new way to monitor environments over larger spatial and temporal scales. However, the use of acoustic sensors leads to rapid growth in the volume of acoustic data [4]. Semi-automatic or automatic methods for classifying the acoustic data collected by sensors are therefore in high demand and attract considerable research.

Many studies have investigated the recognition or classification of frog calls. Prior frog call classification systems are commonly structured as follows: (1) pre-processing, (2) syllable segmentation, (3) feature extraction, (4) feature fusion, (5) classification. Grigg et al. [5] proposed a system to identify 22 frog species recorded in northern Australia based on peak values (intensity of spectrogram) and Quinlan's machine learning system. Lee et al. [6] introduced a recognition method based on spectrogram analysis to classify frog and cricket calls. Mel-frequency cepstral coefficients (MFCCs) of each frame were calculated and averaged as the feature, and linear discriminant analysis (LDA) was used for classifying 30 kinds of frog calls and 19 kinds of cricket calls. Huang et al. [7] extracted spectral centroid, signal bandwidth, and threshold crossing rate as features, and used a K-nearest neighbour (K-NN) classifier and support vector machines (SVM) to classify frog calls. Acevedo et al. [8] used three classifiers, LDA, decision tree (DT), and SVM, for automated classification of bird and amphibian calls. The best average classification accuracy achieved was 94.95%. Han et al. [9] proposed a method for classifying Australian frogs that achieved high accuracy by using a hybrid spectral-entropy approach with a K-NN classifier. To utilise time-varying information, Chen et al. [10] developed a novel feature named multi-stage average spectrum (MSAS) to classify frog calls. Syllable length was first employed for the pre-classification of frog calls; then MSAS was used to perform the final classification via template matching. In [11], frog calls were classified using linear predictive coding (LPC), MFCCs, and a K-NN classifier. In [3], Gingras et al. presented a system for the classification of frog genus. This automatic system was built on an SVM model, a K-NN algorithm, and a multivariate Gaussian distribution classifier. The three parameters used were the mean dominant frequency, the coefficient of variation of root-mean-square energy, and the spectral flux. Huang et al. [12] developed a method for the classification of anuran vocalisations using fast learning neural networks. The average classification rate reached up to 93.4%. Bedoya et al. [13] used a fuzzy clustering algorithm (Learning Algorithm for Multivariate Data Analysis) for the recognition of anuran calls. Accuracies between 99.38% and 100% were achieved on two datasets. However, most prior work relies on only one type of feature: temporal, perceptual, or cepstral. A combination of all three feature types should discriminate a wider variety of species, including those that share similar characteristics in temporal, perceptual, or cepstral information but not in all three.
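Two of the perceptual features mentioned in this review, spectral centroid and signal bandwidth [7], can be illustrated with a short sketch. It assumes one common textbook definition (the magnitude-weighted mean frequency and the magnitude-weighted spread around it); the function names are hypothetical, and the exact formulations used in [7] or in this paper may differ.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """Naive DFT magnitudes for bins 0..N/2 (adequate for short illustrative frames)."""
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        mags.append(abs(acc))
    return mags


def spectral_centroid_and_bandwidth(frame, sample_rate):
    """Return (centroid, bandwidth) in Hz, both weighted by spectral magnitude."""
    n = len(frame)
    mags = magnitude_spectrum(frame)
    freqs = [k * sample_rate / n for k in range(len(mags))]
    total = sum(mags)
    if total == 0:
        return 0.0, 0.0  # silent frame: no meaningful centroid
    centroid = sum(f * m for f, m in zip(freqs, mags)) / total
    variance = sum(((f - centroid) ** 2) * m for f, m in zip(freqs, mags)) / total
    return centroid, math.sqrt(variance)


# Illustrative input (assumption): a pure 1 kHz tone sampled at 8 kHz.
sample_rate = 8000
frame = [math.sin(2 * math.pi * 1000 * t / sample_rate) for t in range(64)]
centroid, bandwidth = spectral_centroid_and_bandwidth(frame, sample_rate)
```

For a pure tone the centroid sits at the tone frequency and the bandwidth is close to zero; a noisy or harmonically rich call would give a larger bandwidth.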

In this study, an enhanced feature representation that includes temporal, perceptual, and cepstral features is proposed for frog call classification, as an extension of our previous paper [14]. Specifically, after continuous frog calls are segmented into individual syllables, temporal, perceptual, and cepstral features are extracted from each syllable. Next, different features are fused to obtain the unified feature representation. Finally, the unified feature representation is fed into five machine learning algorithms to perform frog call classification. Twenty-four frog species, which are geographically well distributed throughout Queensland, Australia, are used in this experiment. Experimental results show that the proposed enhanced feature representation achieves an average classification accuracy of 99.8%, which outperforms other feature representations.
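The fusion step can be sketched as follows. This is a minimal illustration that assumes fusion means concatenating the per-syllable feature vectors and then standardising each column; the snippets shown here do not spell out the paper's exact fusion rule. The function names and all feature values are hypothetical.

```python
import math


def zscore_columns(rows):
    """Standardise each feature column to zero mean and unit variance."""
    n = len(rows)
    dims = len(rows[0])
    means = [sum(r[d] for r in rows) / n for d in range(dims)]
    stds = []
    for d in range(dims):
        var = sum((r[d] - means[d]) ** 2 for r in rows) / n
        stds.append(math.sqrt(var) or 1.0)  # constant columns are left at zero
    return [[(r[d] - means[d]) / stds[d] for d in range(dims)] for r in rows]


def fuse(*feature_sets):
    """Concatenate per-syllable vectors from several feature sets, then standardise."""
    fused = [
        sum((list(vec) for vec in per_syllable), [])
        for per_syllable in zip(*feature_sets)
    ]
    return zscore_columns(fused)


# Hypothetical values for three syllables (not taken from the paper).
temporal = [[0.12, 0.30], [0.20, 0.10], [0.16, 0.20]]   # e.g. duration (s), zero-crossing rate
perceptual = [[2100.0], [3400.0], [2750.0]]              # e.g. spectral centroid (Hz)
cepstral = [[-3.1, 0.8], [-2.7, 1.1], [-2.9, 0.95]]      # e.g. two MFCCs
fused = fuse(temporal, perceptual, cepstral)
```

Standardising after concatenation keeps features measured in Hz from dominating features measured in seconds when a distance-based classifier such as K-NN is applied.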

The main contributions of this work, and its differences with respect to Xie et al. [14], are: (1) the design and realisation of a broader data set covering more frog species, with highly noisy backgrounds and SNRs ranging from −10 dB to 40 dB; (2) a novel feature representation based on feature fusion, which achieves a higher classification accuracy; (3) a post-processing step for syllable segmentation, which reduces the bias introduced by segmentation; (4) a comparison of five machine learning algorithms for the classification; (5) a detailed discussion of various window sizes for MFCCs and perceptual features.

The remainder of this paper is organised as follows: Section 2 describes the methods for frog call classification in detail, which consists of data description, pre-processing, syllable segmentation, feature extraction, feature fusion, and classification. Section 3 reports the experiment results and discussion. The conclusion and future work are offered in Section 4.

Section snippets

Architecture of the classification system for frog calls

Our frog call classification system consists of six steps (Fig. 1): data description, syllable segmentation, pre-processing, feature extraction, feature fusion, and classification. Each step is described in detail in the following subsections. Unlike previous studies [7], [14], pre-processing is applied to the segmented syllables rather than to the continuous recordings.
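Because feature extraction operates on one segmented syllable at a time, the simplest temporal features named in the abstract (syllable duration, zero-crossing rate, and averaged energy) can be illustrated on a single list of samples. The sketch below uses common definitions and hypothetical function names; it is not the authors' implementation, and the paper's normalisation may differ.

```python
def syllable_duration(syllable, sample_rate):
    """Duration of the syllable in seconds."""
    return len(syllable) / sample_rate


def zero_crossing_rate(syllable):
    """Fraction of adjacent sample pairs whose signs differ."""
    if len(syllable) < 2:
        return 0.0
    crossings = sum(
        1 for a, b in zip(syllable, syllable[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(syllable) - 1)


def averaged_energy(syllable):
    """Mean squared amplitude over the syllable."""
    if not syllable:
        return 0.0
    return sum(x * x for x in syllable) / len(syllable)


# Toy syllable (assumption): eight samples alternating in sign, sampled at 8 kHz.
sample_rate = 8000
syllable = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
duration = syllable_duration(syllable, sample_rate)
zcr = zero_crossing_rate(syllable)
energy = averaged_energy(syllable)
```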

Experiment results

In this experiment, performance statistics are estimated with fivefold cross validation. The performance of the proposed frog call classification system is evaluated with quantitative detection metrics: average accuracy, sensitivity, and specificity. These metrics are defined as $\mathrm{Sensitivity} = \frac{TP}{TP + FN}$, $\mathrm{Specificity} = \frac{TN}{TN + FP}$, and $\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$, where TP is the number of true positives, FP the number of false positives, TN the number of true negatives, and FN the number of false negatives.
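As a worked illustration of these definitions, the sketch below derives one-vs-rest counts for a single class from a multi-class confusion matrix and then evaluates the three formulas. The confusion matrix is made up for the example, and the helper names are hypothetical.

```python
def one_vs_rest_counts(confusion, cls):
    """Return (TP, FP, TN, FN) for class index `cls`.

    confusion[i][j] counts syllables of true class i predicted as class j.
    """
    n = len(confusion)
    tp = confusion[cls][cls]
    fn = sum(confusion[cls]) - tp
    fp = sum(confusion[i][cls] for i in range(n)) - tp
    total = sum(sum(row) for row in confusion)
    tn = total - tp - fn - fp
    return tp, fp, tn, fn


def sensitivity(tp, fp, tn, fn):
    return tp / (tp + fn)


def specificity(tp, fp, tn, fn):
    return tn / (tn + fp)


def accuracy(tp, fp, tn, fn):
    return (tp + tn) / (tp + tn + fp + fn)


# Made-up three-species confusion matrix (rows: true class, columns: predicted).
confusion = [
    [8, 2, 0],
    [1, 9, 0],
    [0, 1, 9],
]
tp, fp, tn, fn = one_vs_rest_counts(confusion, 0)
sens = sensitivity(tp, fp, tn, fn)   # 8 / (8 + 2)
spec = specificity(tp, fp, tn, fn)   # 19 / (19 + 1)
acc = accuracy(tp, fp, tn, fn)       # (8 + 19) / 30
```

In a multi-class setting such as this one, the per-class values would typically be averaged over all species and over the cross-validation folds; the functions above do not guard against empty classes, where a denominator would be zero.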

Discussion

Table 2 shows the classification performance of previous methods. Since previous studies often used different datasets for the classification task, we implement their features and apply them to our dataset with the same classifier (SVM). The proposed enhanced feature representation significantly outperforms these previous methods. Therefore, it can be concluded that our feature representation can effectively characterise different frog calls. From the

Conclusion and future work

In this paper, we proposed a novel enhanced feature representation to classify frog calls with various machine learning algorithms. After segmenting continuous recordings into individual syllables, a variety of acoustic features are extracted from each syllable. Then, different features are fused to form different feature representations. Finally, various machine learning algorithms are used to classify frog calls with different feature representations. Our proposed enhanced feature

Acknowledgements

Thanks to the QUT Eco-acoustics Research Group for providing the datasets used in this experiment, as well as to the support from the Wet Tropics Management Authority, Queensland, Australia. Thanks to the anonymous reviewers for their careful work and thoughtful suggestions that have helped improve this paper substantially.

All funding for this research was provided by the Queensland University of Technology and the China Scholarship Council (CSC).

References (25)

J. Wimmer et al.
Analysing environmental acoustic data through collaboration and automation
Future Gener Comput Syst
(2013)
C.-H. Lee et al.
Automatic recognition of animal vocalizations using averaged MFCC and linear discriminant analysis
Pattern Recogn Lett
(2006)
C.-J. Huang et al.
Frog classification using machine learning techniques
Expert Syst Appl
(2009)
M.A. Acevedo et al.
Automated classification of bird and amphibian calls using machine learning: a comparison of methods
Ecol Inform
(2009)
N.C. Han et al.
Acoustic classification of australian anurans based on hybrid spectral-entropy approach
Appl Acoust
(2011)
W.-P. Chen et al.
Automatic recognition of frog calls using a multi-stage average spectrum
Comput Math Appl
(2012)
C.-J. Huang et al.
Intelligent feature extraction and classification of anuran vocalizations
Appl Soft Comput
(2014)
C. Bedoya et al.
Automatic recognition of anuran species based on syllable identification
Ecol Inform
(2014)
S. Wang et al.
Robust underwater noise targets classification using auditory inspired time–frequency analysis
Appl Acoust
(2014)

K.D. Wells
The ecology and behavior of amphibians
(2010)
B. Gingras et al.
A three-parameter model for classifying anurans into four genera based on advertisement calls
J Acoust Soc Am
(2013)

