Elsevier

Applied Acoustics

Volume 119, April 2017, Pages 17-28

Marine mammal sound classification based on a parallel recognition model and octave analysis

https://doi.org/10.1016/j.apacoust.2016.11.016

Highlights

  • Underwater acoustic analysis.

  • Marine mammal classification.

  • Pattern recognition using a decision committee technique.

Abstract

The ocean is full of sounds from natural, biological, and anthropogenic sources. Listening to animal sounds allows scientists to detect, identify, and locate endangered species, and to monitor high-intensity anthropogenic sources that could harm the marine ecosystem. In this work, a new computational model for marine mammal classification is presented and validated with data from an online database. Feature extraction is performed using 1/6-octave analysis, and classification is carried out with an independent ensemble methodology in which the outputs of four parallel feedforward neural networks are combined to classify eleven possible classes (seven marine mammals plus four additional classes). Unlike similar works, this paper considers multiple sounds emitted by each species, such as whistles, calls, and squeaks. The model demonstrated favorable performance, reaching a classification rate of 90% at a low computational cost.

Introduction

In nature, many animals use sound to exchange information. In the aquatic environment, marine mammals, including whales, depend on sound both for social interaction and for locating prey. Using passive acoustics to detect and classify species in situ therefore provides a means of identifying a species in its habitat and reveals its behavior as well as its population density. Automatic classification of marine mammal sounds is perhaps the most challenging task in animal bioacoustics because of unknown statistical signal properties, the use of different recording systems, and low signal-to-noise ratio (SNR) conditions, among other factors. Such discrepancies often lead to sub-optimal system performance.

This work evaluates different architectures for automatic classification of eleven sound classes, including marine mammal species found in the Gulf of Mexico, which is home to a high diversity of organisms. The proposed model could be useful for monitoring, reducing, and avoiding certain human activities in areas inhabited by protected species.

In this paper, sounds belonging to the following species are classified:

  1. Two mysticete cetaceans: Minke Whale (Balaenoptera acutostrata) and Humpback Whale (Megaptera novaeangliae).

  2. Five odontocete cetaceans: Killer Whale (Orcinus orca), False Killer Whale (Pseudorca crassidens), Atlantic Spotted Dolphin (Stenella frontalis), Common Bottlenose Dolphin (Tursiops truncatus) and Sperm Whale (Physeter macrocephalus).

  3. One sirenian: West Indian Manatee (Trichechus manatus).

This work considers multiple types of sounds emitted by each species, including whistles, calls, squeaks, thumps, moans, and others. The identification of a given class is therefore determined by the features extracted from any of these sound types, potentially reducing the time required to detect and classify the species. In addition, given the external conditions to which the recording systems are exposed, three more classes are included: natural (rain, bubbles, etc.), anthropogenic (vessel engines), and unknown (Fig. 1).

Multiple ocean observatories are currently in operation, such as the international program “Listen to the Deep Ocean Environment”, led by the Laboratory of Applied Bioacoustics of the Technical University of Catalonia [28], and the ALOHA observatory, operated by the University of Hawaii [2]. As a result, the number of recordings has grown exponentially, which demonstrates the need to apply automatic methods in both on-site and off-site systems in order to minimize manual interaction or supervision.

Underwater sounds are produced by a variety of natural sources, such as breaking waves, rain, and marine life. They are also produced by a variety of man-made sources [23], [4], [18], such as ships and military sonars. Most sounds are present nearly everywhere in the ocean at all times. This background sound is called ambient noise, whereas other sounds are present only during specific periods of time or at specific places in the ocean.

Marine mammals [35] such as whales and dolphins produce sounds over a very wide frequency range, often outside the human hearing range. On the one hand, some large baleen whales (mysticetes) produce sounds with frequencies lower than 10 Hz (below the human hearing range). On the other hand, dolphin echolocation clicks usually contain frequencies greater than 100 kHz (above the human hearing range). Other animals also produce sounds, including fish such as toadfish and drums, as well as some marine invertebrates such as the snapping shrimp.

Passive acoustic classification is generally performed by a sonar operator, who is currently responsible for the final classification of a given sound [18], [41]. With automatic classification systems, however, the operator can make decisions with more confidence and devote their skills mainly to analyzing the most important or complex sounds.

Most work on automatic classification of ship noise has dealt with features extracted in the frequency domain using the Fast Fourier Transform (FFT) power spectrum [50], [24], [27], auto-regressive modeling [16], [25], and wavelet transforms [16], [10]. Regarding species detection, early automatic techniques made use of matched filters, Hidden Markov Models, and spectrogram cross-correlation [12]. These methods were later improved with machine learning approaches such as feedforward neural network classifiers [34], [39], [13], [33], [32], [40]. Other machine learning algorithms, such as classification and regression tree (CART) classifiers, have also been used to recognize contact calls made by the North Atlantic Right Whale [14], [15]. Improvements over single recognition methods have been obtained by combining several recognition methods running in parallel [14], [15], [40].

Whales are widely studied, mostly because of their unique communication capabilities. Abousleiman et al. [1] developed an algorithm that pre-processes the sound before applying a tree-based hierarchical classifier. The main goal is to determine whether a North Atlantic Right Whale is present. This binary classification is performed by identifying a unique call made by the whale, known as the “contact call” or “up-call”, and achieves a success rate close to 85%. André et al. [3] and Zaugg et al. [53] detect cetacean emissions by considering specific frequency bands, reaching a classification accuracy above 90%.

Existing schemes rely on cepstral coefficients [9], [42], [37] as the input feature space, mostly to capture pitch information in different vocalizations. Other approaches, such as auditory perception features, spectrograms, and frequency contours, have been used as well [8], [19], [52], [51], [29].

PAMGuard is an open-source, freely available suite of passive acoustic monitoring software for marine mammals [20]. Oswald et al. [38] developed ROCCA (currently incorporated in PAMGuard), an open-source software package that measures 54 whistle contour features and classifies whistles of seven species and one genus: Globicephala macrorhynchus, Pseudorca crassidens, Steno bredanensis, Stenella attenuata, Stenella coeruleoalba, Stenella longirostris, Tursiops truncatus, and Delphinus species. The classifier is based on Random Forest analysis trained on these 54 features and yields an overall correct classification score of 62%.

The proposed method is built in two main stages. First, an octave analysis, a technique widely used in acoustical analysis and audio signal processing, is performed. Although the signals to be classified are transient, they are still considered for feature extraction because of their frequency behavior. Second, a neural network model is used to identify the eleven classes.
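The first stage can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes base-2 band centers starting at a hypothetical `f_min` of 100 Hz, and it sums FFT power inside each band instead of applying a standardized fractional-octave filter bank. The function name `fractional_octave_features` and all parameter values are assumptions made for illustration.

```python
import numpy as np


def fractional_octave_features(signal, fs, fraction=6, f_min=100.0, f_max=None):
    """Return (centers, levels_db) for 1/fraction-octave bands of a signal.

    Assumed, not taken from the paper: f_min, the Hann window, and the
    FFT-bin summation used in place of a true band-pass filter bank.
    """
    signal = np.asarray(signal, dtype=float)
    if f_max is None:
        f_max = fs / 2.0  # Nyquist limit

    # One-sided power spectrum; the Hann window limits spectral leakage.
    window = np.hanning(signal.size)
    spectrum = np.abs(np.fft.rfft(signal * window)) ** 2
    freqs = np.fft.rfftfreq(signal.size, d=1.0 / fs)

    # Base-2 band centers: f_c[k] = f_min * 2**(k / fraction).
    n_bands = int(np.floor(fraction * np.log2(f_max / f_min)))
    centers = f_min * 2.0 ** (np.arange(n_bands) / fraction)
    lower = centers * 2.0 ** (-1.0 / (2 * fraction))
    upper = centers * 2.0 ** (1.0 / (2 * fraction))

    levels = np.empty(n_bands)
    for k in range(n_bands):
        in_band = (freqs >= lower[k]) & (freqs < upper[k])
        energy = spectrum[in_band].sum()
        levels[k] = 10.0 * np.log10(energy + 1e-12)  # floor avoids log(0)
    return centers, levels
```

Calling the function with `fraction=6` returns one level per 1/6-octave band; a vector of this kind could then serve as the input of a classifier.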

This paper is organized as follows. Section 2 gives a description of typical marine mammal signals, including a detailed explanation of the pre-processing, processing, and feature extraction process. Section 3 describes the neural model while experimental results are presented in Section 4. Finally, the conclusions are drawn in Section 5.

Section snippets

Spectral and temporal properties of marine mammal sounds

Social sounds of marine mammals are usually studied with a spectrographic analyzer, which determines the “instantaneous” frequency and relative amplitude of a signal as a function of time, with the information usually plotted as a spectrogram. Many of the sounds emitted by marine mammals have a pulse-like or burst-like property.

Sound emissions by odontocetes (toothed whales and dolphins) can be classified into two broad categories, frequency-varying continuous tonal sounds, referred to as

Neural model

Given the complexity of underwater signals, and based on extensive experimental analysis, a coarse-to-fine classification is used, i.e., each level of the hierarchy depicted in Fig. 1 is delegated to a single classifier. Four neural networks perform the entire classification process. Each neural network performs either a coarse or a fine classification based on the corresponding feature set (see Fig. 7), resulting in a different output space. The output from all previous
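As a rough sketch of how a coarse-to-fine decision over parallel classifiers could be organized: the hierarchy below, the node names, and the fixed scores returned by the stub classifiers are all hypothetical, since Fig. 1 is not reproduced here, and every stub receives the same feature vector for simplicity. It only illustrates the routing idea, not the networks of the paper.

```python
from typing import Callable, Dict, Sequence

# A classifier maps a feature vector to a score per output label.
Classifier = Callable[[Sequence[float]], Dict[str, float]]


def make_stub(scores: Dict[str, float]) -> Classifier:
    """Stand-in for a trained network: always returns the same scores."""
    return lambda features: dict(scores)


def decide(features: Sequence[float],
           classifiers: Dict[str, Classifier],
           root: str = "root") -> str:
    """Hypothetical decision module for a coarse-to-fine hierarchy."""
    # Every classifier scores the input independently (the parallel part).
    outputs = {name: clf(features) for name, clf in classifiers.items()}

    # Follow the highest-scoring branch until a label with no classifier
    # of its own (a leaf) is reached. `visited` guards against cycles.
    node, visited = root, set()
    while node in outputs and node not in visited:
        visited.add(node)
        scores = outputs[node]
        node = max(scores, key=scores.get)
    return node


# Four stub classifiers arranged in an assumed hierarchy.
classifiers = {
    "root": make_stub({"biological": 0.80, "natural": 0.10,
                       "anthropogenic": 0.05, "unknown": 0.05}),
    "biological": make_stub({"mysticete": 0.2, "odontocete": 0.7,
                             "Trichechus manatus": 0.1}),
    "mysticete": make_stub({"Balaenoptera acutostrata": 0.5,
                            "Megaptera novaeangliae": 0.5}),
    "odontocete": make_stub({"Orcinus orca": 0.1, "Tursiops truncatus": 0.6,
                             "Physeter macrocephalus": 0.3}),
}

label = decide([0.0, 0.0, 0.0], classifiers)
```

Because each classifier is evaluated independently of the others, the scoring step could run concurrently, and only the final routing depends on all outputs being available.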

Results and discussions

Neural networks were tested separately with various fractional-octave bands (1/3, 1/6 and 1/12); the classification rate per network on the test set is shown in Table 2. It can be seen that the optimal analysis is 1/6: the 1/3 analysis fails to identify the Delphinidae subclasses (O. orca, P. crassidens, S. frontalis and T. truncatus), and although the 1/12 analysis gives more frequency resolution, it extracts twice the number of features, which increases the processing time without any
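The factor of two follows from the band spacing: over a fixed frequency range, the number of 1/b-octave bands is roughly b times the number of octaves spanned. A short sketch, assuming a hypothetical analysis range of 100 Hz to 6400 Hz (six octaves) that is not taken from the paper:

```python
import math


def band_count(fraction: int, f_min: float, f_max: float) -> int:
    """Number of 1/fraction-octave bands that fit between f_min and f_max."""
    # The small epsilon protects the floor against rounding error.
    return int(math.floor(fraction * math.log2(f_max / f_min) + 1e-9))


# Assumed range: 100 Hz to 6400 Hz, i.e. exactly six octaves.
counts = {b: band_count(b, 100.0, 6400.0) for b in (3, 6, 12)}
print(counts)  # {3: 18, 6: 36, 12: 72}
```

Under this assumption, moving from 1/6 to 1/12 octave doubles the feature vector length, and with it the input layer size of any network fed with those features.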

Conclusions and future work

In summary, several processing and pattern recognition techniques have been evaluated in order to improve the detection and classification of sounds from several marine mammals, along with other common sounds found in passive recordings. Specifically, a computational model that combines four parallel neural networks through a decision module is proposed in this work.

Using parallel neural networks results in a more robust and effective classification model. The advantage of this methodology

References (53)

  • R.K. Andrew et al. Ocean ambient sound: comparing the 1960s with the 1990s for a receiver off the California coast. Acoust Res Lett Online (2002)
  • ANSI S1.11-2004. Specification for octave-band and fractional-octave-band analog and digital filters;...
  • W.W.L. Au et al. Principles of marine bioacoustics (2008)
  • J.C. Brown et al. Automatic classification of killer whale vocalizations using dynamic time warping. J Acoust Soc Am (2007)
  • J.C. Brown et al. Hidden Markov and Gaussian mixture models for automatic call classification. J Acoust Soc Am, JASA Express Lett (2009)
  • C. Chen et al. Classification of underwater signals using wavelet transforms and neural networks. Math Comput Model (1998)
  • C.W. Clark. Acoustic behavior of mysticete whales
  • C.W. Clark et al. Quantitative analysis of animal vocal phonology: an application to swamp sparrow song. Ethology (1987)
  • V.B. Deecke et al. Quantifying complex patterns of bioacoustic variation: use of a neural network to compare killer whale (Orcinus orca) dialects. J Acoust Soc Am (1999)
  • Dugan PJ, Rice AN, Urazghildiiev IR, Clark CW. North Atlantic right whale acoustic signal processing: Part I....
  • Dugan PJ, Rice AN, Urazghildiiev IR, Clark CW. North Atlantic right whale acoustic signal processing: Part II. Improved...
  • Eom K, Wellman M, Srour N, Hillis D, Chellappa R. Acoustic target classification using multiscale methods. In:...
  • W.E. Evans. Vocalization among marine animals
  • W.S. Filho et al. Preprocessing passive sonar signals for neural classification. IET Radar Sonar Navig (2011)
  • D. Gillespie. Detection and classification of right whale calls using an edge detector operating on a smoothed spectrogram. J Can Acoust (2004)
  • D. Gillespie et al. PAMGUARD: semiautomated, open-source software for real-time acoustic detection and localization of cetaceans. J Acoust Soc Am (2009)