Marine mammal sound classification based on a parallel recognition model and octave analysis
Introduction
In nature, many animals use sound to exchange information. In the aquatic environment, marine mammals, including whales, depend on sound both for social interaction and for locating prey. Using passive acoustics to detect and classify species in situ therefore provides a means of identifying a species in its habitat and reveals its behavior as well as its population density. Automatic classification of marine mammal sounds is perhaps the most challenging task in the field of animal bioacoustics due to the unknown statistical signal properties, the use of different recording systems, and low signal-to-noise ratio (SNR) conditions, among others. Such discrepancies often lead to sub-optimal system performance.
This work evaluates different architectures for automatic classification of eleven sound classes, eight of which correspond to marine mammal species found in the Gulf of Mexico, which is home to a high diversity of organisms. The model proposed herein could be useful to monitor, reduce, and avoid some human activities which occur in areas inhabited by protected species.
In this paper, sounds belonging to the following species are classified:
1. Two mysticete cetaceans: Minke Whale (Balaenoptera acutostrata) and Humpback Whale (Megaptera novaeangliae).
2. Five odontocete cetaceans: Killer Whale (Orcinus orca), False Killer Whale (Pseudorca crassidens), Atlantic Spotted Dolphin (Stenella frontalis), Common Bottlenose Dolphin (Tursiops truncatus) and Sperm Whale (Physeter macrocephalus).
3. One sirenian: West Indian Manatee (Trichechus manatus).
This work considers multiple types of sounds emitted by each species, including whistles, calls, squeaks, thumps, moans, and others. The identification of a given class is therefore determined by the features extracted from any of these types of sounds, potentially reducing the time required to detect and classify the species. However, given the external conditions that the recording systems are exposed to, three more classes are included: natural (rain, bubbles, etc.), anthropogenic (vessel engines) and unknown (Fig. 1).
Multiple observatories are currently in operation, such as the international program “Listen to the Deep Ocean Environment” led by the Laboratory of Applied Bioacoustics of the Technical University of Catalonia [28] or the ALOHA observatory, operated by the University of Hawaii [2]. As a result, the number of recordings has grown exponentially, which demonstrates the need for automatic methods in both on-site and off-site systems in order to minimize manual interaction or supervision.
Underwater sounds are produced by a variety of natural sources, such as breaking waves, rain and marine life. They are also produced by a variety of man-made sources [23], [4], [18] such as ships and military sonars. Some sounds are present more or less everywhere in the ocean at all times. This background sound is called ambient noise, whereas other sounds are present only during specific periods of time or at specific places in the ocean.
Marine mammals [35] such as whales and dolphins produce sounds over a very wide frequency range, often extending beyond the human hearing range. On the one hand, some large baleen whales (mysticetes) produce sounds with frequencies lower than 10 Hz (below the human hearing range). On the other hand, dolphin echolocation clicks usually contain frequencies greater than 100 kHz (above the human hearing range). Other species, such as toadfish and drums, as well as some marine invertebrates like snapping shrimp, also produce sounds.
Passive acoustic classification is generally performed by a sonar operator, who is presently responsible for the final classification of a given sound [18], [41]. With automatic classification systems, however, the operator can make decisions with more confidence and devote their skills mainly to analyzing the most important or complex sounds.
Most of the works related to automatic classification of the noise produced by ships have dealt with features extracted in the frequency domain using the Fast Fourier Transform (FFT) power spectrum [50], [24], [27], auto-regressive modeling [16], [25] and wavelet transforms [16], [10]. Regarding species detection, early automatic techniques made use of matched filters, Hidden Markov Models, and spectrogram cross-correlation [12]. These methods were later improved by using machine learning approaches such as feedforward neural network classifiers [34], [39], [13], [33], [32], [40]. Other machine learning algorithms, such as classification and regression tree (CART) classifiers, have also been applied to recognizing contact calls made by the North Atlantic Right Whale [14], [15]. Improvements over single recognition methods have been obtained by combining several recognition methods running in parallel [14], [15], [40].
Whales are widely studied, mostly due to their unique communication capabilities. Abousleiman et al. [1] developed an algorithm to pre-process the sound before applying a tree-based hierarchical classifier. The main goal is to determine whether a North Atlantic Right Whale is present or not. They perform this binary classification by identifying a unique call made by the whale, known as the “contact call” or “up-call”, achieving a success rate close to 85%. André et al. [3] and Zaugg et al. [53] detect cetacean emissions by considering specific frequency bands, reaching a classification accuracy above 90%.
Existing schemes rely on the use of cepstral coefficients [9], [42], [37] as the input feature space used for capturing mostly pitch information on different vocalizations. Other approaches, such as auditory perception features, spectrograms, and frequency contours have been used as well [8], [19], [52], [51], [29].
PAMGuard is a freely available, open-source suite of passive-acoustic monitoring software for marine mammals [20]. Oswald et al. [38] developed ROCCA (currently incorporated in PAMGuard), an open-source software package that measures 54 whistle contour features and is able to classify whistles of seven species and one genus: Globicephala macrorhynchus, Pseudorca crassidens, Steno bredanensis, Stenella attenuata, Stenella coeruleoalba, Stenella longirostris, Tursiops truncatus, and Delphinus species. The classifier is based on Random Forest analysis trained on these features, yielding an overall correct classification score of 62%.
The proposed method is built in two main stages. First, an octave analysis is performed, a technique widely used in acoustical analysis and audio signal processing. Although the signals to be classified are transient, octave analysis is still used for feature extraction because of their frequency behavior. Second, a neural network model is used to identify the eleven classes.
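As a rough illustration of the first stage, the sketch below computes fractional-octave band energies from an FFT power spectrum. It is not the authors' implementation: the function names, the 100 Hz lower limit, the 1/6-octave default, and the summation of FFT bins (instead of a bank of band-pass filters) are all assumptions made for the example.

```python
import numpy as np

def octave_band_edges(f_min, f_max, fraction=6):
    """Return (lower, center, upper) edges of 1/fraction-octave bands inside [f_min, f_max]."""
    n_bands = int(np.floor(fraction * np.log2(f_max / f_min)))
    k = np.arange(n_bands)
    lower = f_min * 2.0 ** (k / fraction)
    upper = f_min * 2.0 ** ((k + 1) / fraction)
    center = np.sqrt(lower * upper)  # geometric mean of the band edges
    return lower, center, upper

def octave_band_features(signal, sample_rate, f_min=100.0, f_max=None, fraction=6):
    """Return (center frequencies, band energies in dB) for one signal segment."""
    if f_max is None:
        f_max = sample_rate / 2.0  # Nyquist frequency
    power = np.abs(np.fft.rfft(signal)) ** 2
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    lower, center, upper = octave_band_edges(f_min, f_max, fraction)
    # Sum the power of all FFT bins that fall inside each band.
    energies = np.array([power[(freqs >= lo) & (freqs < hi)].sum()
                         for lo, hi in zip(lower, upper)])
    return center, 10.0 * np.log10(energies + 1e-12)  # small offset avoids log(0)

# Hypothetical usage: a 1 kHz test tone sampled at 16 kHz for one second.
fs = 16000
t = np.arange(fs) / fs
centers, features = octave_band_features(np.sin(2 * np.pi * 1000 * t), fs)
print(len(features), "band features; strongest band near",
      round(float(centers[np.argmax(features)])), "Hz")
```

A feature vector of this kind (one value per band) is the sort of input a neural classifier in the second stage could consume; the actual feature set used in the paper may differ.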
This paper is organized as follows. Section 2 gives a description of typical marine mammal signals, including a detailed explanation of the pre-processing, processing, and feature extraction process. Section 3 describes the neural model while experimental results are presented in Section 4. Finally, the conclusions are drawn in Section 5.
Section snippets
Spectral and temporal properties of marine mammal sounds
Social sounds of marine mammals are usually studied with a spectrographic analyzer, which determines the “instantaneous” frequency and relative amplitude of a signal as a function of time, with the information usually plotted as a spectrogram. Many of the sounds emitted by marine mammals have a pulse-like or burst-like character.
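The spectrographic analysis described here can be approximated in software with a short-time Fourier transform. The following is a minimal sketch under assumed settings (Hann window, 256-sample frames, 50% overlap, and a synthetic upsweep standing in for a whistle); none of these values come from the paper.

```python
import numpy as np

def spectrogram(signal, sample_rate, frame_len=256, hop=128):
    """Return (times, freqs, power_db) from a Hann-windowed short-time Fourier transform."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Slice the signal into overlapping frames and apply the window to each.
    frames = np.stack([signal[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sample_rate)
    times = (np.arange(n_frames) * hop + frame_len / 2) / sample_rate  # frame centers
    return times, freqs, 10.0 * np.log10(power + 1e-12)

# Hypothetical test signal: a tonal upsweep from 2 kHz to 6 kHz over one second.
fs = 16000
t = np.arange(fs) / fs
sweep = np.sin(2 * np.pi * (2000 * t + 2000 * t ** 2))
times, freqs, power_db = spectrogram(sweep, fs)

# The frequency of the strongest bin in each frame traces the contour of the sweep.
contour = freqs[np.argmax(power_db, axis=1)]
print(contour[0], contour[-1])
```

Each row of `power_db` is the relative amplitude across frequency at one instant, which is the information a spectrogram plot displays.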
Sound emissions by odontocetes (toothed whales and dolphins) can be classified into two broad categories, frequency-varying continuous tonal sounds, referred to as
Neural model
Given the complexity of underwater signals, and based on extensive experimental analysis, a coarse-to-fine classification is used, i.e., each level of the hierarchy depicted in Fig. 1 is delegated to a single classifier. Four neural networks are used to perform the entire classification process. Each neural network performs either a coarse or a fine classification based on the corresponding feature set (see Fig. 7), resulting in a different output space. The output from all previous
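The coarse-to-fine idea can be sketched as a routing loop in which each classifier's output either names a final class or selects the next, finer classifier. Everything below is hypothetical: the hierarchy labels, the threshold stubs standing in for trained networks, and the sequential routing are assumptions for illustration, and the decision module used in the paper may combine the network outputs differently.

```python
from typing import Callable, Dict, List, Sequence

Classifier = Callable[[Sequence[float]], str]

def classify_coarse_to_fine(features: Sequence[float],
                            classifiers: Dict[str, Classifier],
                            root: str = "root") -> List[str]:
    """Follow classifier outputs down the hierarchy until no finer classifier exists."""
    path = []
    node = root
    while node in classifiers:
        node = classifiers[node](features)
        path.append(node)
    return path

def make_stub(index: int, threshold: float, low_label: str, high_label: str) -> Classifier:
    """Hypothetical stand-in for a trained network: thresholds a single feature."""
    def stub(features: Sequence[float]) -> str:
        return high_label if features[index] > threshold else low_label
    return stub

# Assumed four-level hierarchy (not taken from Fig. 1), one classifier per level.
classifiers = {
    "root": make_stub(0, 0.5, "non-biological", "cetacean"),
    "cetacean": make_stub(1, 0.5, "mysticete", "odontocete"),
    "odontocete": make_stub(2, 0.5, "sperm whale", "delphinid"),
    "delphinid": make_stub(3, 0.5, "T. truncatus", "O. orca"),
}

print(classify_coarse_to_fine([0.9, 0.8, 0.7, 0.2], classifiers))
# ['cetacean', 'odontocete', 'delphinid', 'T. truncatus']
```

Keeping one classifier per level means each network only has to separate a small output space, which is the motivation the text gives for the hierarchy.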
Results and discussions
After having tested neural networks separately with various octave bands (1/3, 1/6 and 1/12), the classification rate per network using the test set is shown in Table 2. It can be seen that the optimal analysis is 1/6, because the 1/3 analysis fails to identify Delphinidae subclasses (O. orca, P. crassidens, S. frontalis and T. truncatus) and although the 1/12 analysis gives more resolution in frequency, it extracts twice the number of features, which increases the processing time without any
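The trade-off between frequency resolution and feature count can be made concrete by counting bands. The sketch below assumes a hypothetical analysis range of 20 Hz to 20480 Hz (exactly ten octaves); the paper's actual range is not given in this excerpt.

```python
import math

def band_count(f_min: float, f_max: float, fraction: int) -> int:
    """Number of complete 1/fraction-octave bands between f_min and f_max."""
    return math.floor(fraction * math.log2(f_max / f_min))

# Assumed range covering exactly ten octaves.
for fraction in (3, 6, 12):
    print(f"1/{fraction} octave: {band_count(20.0, 20480.0, fraction)} bands")
# 1/3 octave: 30 bands
# 1/6 octave: 60 bands
# 1/12 octave: 120 bands
```

Over any fixed range, halving the band width doubles the number of bands, so a 1/12-octave analysis yields twice as many features per network input as a 1/6-octave analysis.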
Conclusions and future work
In summary, several signal processing and pattern recognition techniques have been evaluated in order to improve the detection and classification of sounds from several marine mammals, along with other common sounds found in passive recordings. Specifically, a computational model combining four parallel neural networks based on a decision module is proposed in this work.
Using parallel neural networks results in a more robust and effective classification model. The advantage of this methodology
References (53)
- et al. Listening to the deep: live monitoring of ocean noise and cetacean acoustic signals. Mar Pollut Bull (2011)
- et al. Real-time aircraft noise likeness detector. Appl Acoust (2010)
- et al. Aircraft take-off noises classification based on human auditory’s matched features extraction. Appl Acoust (2014)
- et al. Detecting marine mammals with an adaptive sub-sampling recorder in the Bering Sea. Appl Acoust (2010)
- et al. Aircraft class identification based on take-off noise signal segmentation in time. Expert Syst Appl (2013)
- et al. Airport take-off noise assessment aimed at identify responsible aircraft classes. Sci Total Environ (2016)
- et al. Dynamic hierarchical aggregation of parallel outputs for aircraft take-off noise identification. Eng Appl Artif Intell (2015)
- et al. Real-time acoustic classification of sperm whale clicks and shipping impulses from deep-sea observatories. Appl Acoust (2010)
(2010) - Abousleiman R, Qu G, Rawashdeh O. North Atlantic right whale contact call detection. In: ICML 2013, Proceedings:...
- ALOHA Cabled Observatory. Retrieved September 22,...
- Ocean ambient sound: comparing the 1960s with the 1990s for a receiver off the California coast. Acoust Res Lett Online
- Principles of marine bioacoustics
- Automatic classification of killer whale vocalizations using dynamic time warping. J Acoust Soc Am
- Hidden Markov and Gaussian mixture models for automatic call classification. J Acoust Soc Am, JASA Express Lett
- Classification of underwater signals using wavelet transforms and neural networks. Math Comput Model
- Acoustic behavior of mysticete whales
- Quantitative analysis of animal vocal phonology: an application to swamp sparrow song. Ethology
- Quantifying complex patterns of bioacoustic variation: use of a neural network to compare killer whale (Orcinus orca) dialects. J Acoust Soc Am
- Vocalization among marine animals
- Preprocessing passive sonar signals for neural classification. IET Radar Sonar Navig
- Detection and classification of right whale calls using an edge detector operating on a smoothed spectrogram. J Can Acoust
- PAMGUARD: semiautomated, open-source software for real-time acoustic detection and localization of cetaceans. J Acoust Soc Am