Functional clustering of mouse ultrasonic vocalization data

Xiaoling Dou; Shingo Shirahata; Hiroki Sugimoto

doi:10.1371/journal.pone.0196834

Abstract

Mouse ultrasonic vocalizations (USVs) are studied in many fields of science. However, various noise and varied USV patterns in observed signals make complete automatic analysis difficult. We improve several methods to reduce noise, detect USV calls and automatically cluster USV calls. After reduction of noise and detection of USV calls, we consider USV calls as functional data and characterize them as USV functions with B-spline basis functions. For discontinuous USV calls, breakpoints in the USV functions are defined using multiple knots in the construction of the B-spline basis functions, and a hierarchical method is used to cluster the USV functions by shape. We finally show the performance of the proposed methods with USV data recorded for laboratory mice.

Citation: Dou X, Shirahata S, Sugimoto H (2018) Functional clustering of mouse ultrasonic vocalization data. PLoS ONE 13(5): e0196834. https://doi.org/10.1371/journal.pone.0196834

Editor: Yael Abreu-Villaça, Universidade do Estado do Rio de Janeiro, BRAZIL

Received: June 24, 2017; Accepted: April 21, 2018; Published: May 9, 2018

Copyright: © 2018 Dou et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: All relevant data are contained within the paper and its Supporting Information files.

Funding: This work was supported by Transdisciplinary Research Integration Center, Research Organization of Information and Systems, Japan (Dr. Xiaoling Dou) (http://www.rois.ac.jp).

Competing interests: The authors have declared that no competing interests exist.

Introduction

Mice are known to emit ultrasonic vocalizations (USVs) in many types of social behaviors, and the USVs seem to differ depending on a variety of contextual determinants; e.g., the age, genetic background and behavioral state affect the vocalization rate, syntax, frequency and duration [1]. This suggests that the USVs of mice convey information in social communication and this makes them important ([2], [3], [4], [5], [6], [7], [8]). It is also known that mice USVs contain different USV call (or syllable) types ([1], [9], [10], [11]). The importance of different patterns of USV calls has recently been reported; e.g., USVs emitted by males were shown to attract females in a playback test. It has been shown that female mice preferentially approach specific types of USVs emitted by males [12] and tend to approach USVs of a type that is emitted by a strain different from their own parents’ strain [13]. However, potentially important factors (e.g., the number of specific USV call types and the specific sequential emission pattern of call types) within different patterns of USVs used for social communication are still unclear. Clarifying differences among USV call types would help address this issue. The detailed characterization of USV call types requires many steps including recording, call detection, call type classification and clustering. These steps are taken using commercially available software. In these steps, call type classification and clustering are still time-consuming because noise reduction and pattern classification are done manually in most cases ([10], [12]). Currently available software (e.g., Avisoft Bioacoustics and the ISOMAP analysis method of [9]) can calculate and display spectrograms and allow automatic signal detection. However, suitable methods for the automatic classification and clustering of USV calls have not yet been developed.

We are interested in the variety of shapes of USV calls and we consider clustering call types. We take USV records from six laboratory male mice of two strains. The USV records contain not only USV phrases of syllables but also background noise, sounds of gnawing, squeaks (also called noisy syllables in [1]), movement noise and bedding noise of mice. USV calls appear in a frequency range higher than 30kHz and have several syllable types. Background noise can be considered uniformly distributed across all frequencies. The other sounds usually appear at frequencies lower than the frequencies of USV calls, but some spike into the range of USV frequencies. This was also pointed out by [9]. To focus on USV calls, we consider all other sounds as noise, estimate the weighted dominant frequency and detect USV calls.

To classify calls into subsets, [9] suggested considering each call as a function of time and clustering USV calls into several types by frequency jumps, but their analysis failed to cluster USV calls into subtypes. There was thus still a need for a new method of further subtype clustering. Similarly, [14] proposed a series of methods for handling continuous nonharmonic mouse USV data. The methods include procedures that reduce noise, find the signal of the calls and group continuous USV calls through functional clustering. Because USV calls are more often discontinuous in some datasets, we propose in this paper a method that employs B-spline basis functions to characterize USV calls as USV functions. Using this method, we can analyze both continuous and discontinuous nonharmonic USV calls. This overcomes the limitation of the orthogonal polynomial method introduced in [14] and reveals more subtypes of syllables with frequency jumps. We also make improvements to noise reduction; these can be considered a generalization of the method given in the same paper.

We consider nonharmonic USV call signals as functional data and represent the segments of frequencies as one-dimensional functions of time. Functional data analysis was proposed by [15, 16] and has been further developed and applied in many fields [17, 18]. Some functional clustering methods have been proposed for the unsupervised learning of functional data. As an example, [19] proposed a clustering method based on Gaussian mixture distributions for sparsely sampled functional data. [20] developed k-center functional clustering with functional principal component analysis. A two-stage clustering method was proposed by [21]; this method allows B-spline basis functions to be fitted to functional data and coefficient vectors to be partitioned through k-means clustering. We here propose a method similar to that in [21], using two-stage functional clustering with B-splines. However, our method is suitable for discontinuous curves. Furthermore, we apply Ward’s method for hierarchical clustering.

The paper is organized as follows. We first introduce our experiments and USV datasets. We then review the method of reducing noise and propose a weighted-frequency method to calculate frequency for each frame when detecting USV calls. After noise reduction, we define the USV calls as functions employing a B-spline method and cluster them employing a functional clustering method. We finally demonstrate our methods with six real USV datasets and present conclusions.

Materials and methods

In experiments on testing mouse social behavior, we recorded USV data for three male mice of the strain BALB/cAnN and three male mice of the strain C57BL/6JJcl. The details of the experiments and the information of USV frames produced by fast Fourier transform of the data are given as follows.

Experiments and datasets

Ethics statement

Mice were maintained in accordance with National Institute of Genetics (NIG) guidelines. This study was carried out in strict accordance with the recommendations in the Guidelines for Proper Conduct of Animal Experiments of the Science Council of Japan. The protocol was approved by the Institutional Committee for Animal Care and Use of the National Institute of Genetics (Permit Numbers: 21-14).

Animals

Ultrasound emissions from male mice C57BL/6JJcl were recorded during male–female interaction. C57BL/6JJcl were purchased from CLEA Japan, Inc. (Tokyo, Japan). Mice were bred at the NIG and used in the experiments. All animals were kept at the NIG under a 12-h light/dark cycle (light from 8:00 to 20:00) in a temperature-controlled room.

Apparatus

An ultrasound microphone (CM16/CMPA Condenser ultrasound microphone, Avisoft Bioacoustics) and recorder (UltraSoundGate 116H, Avisoft Bioacoustics) were used for recording. The microphone was positioned approximately 10 cm above a cage that contained the mice ([12]).

Recording procedure

A 15-week-old male C57BL/6JJcl was paired with a 15-week-old female C57BL/6JJcl for 1 week. Two days before the test day, the male mouse was housed individually. Another 15-week-old female C57BL/6JJcl was injected with pregnant mare serum gonadotropin to control its sexual cycle. On the test day, the male mouse was transferred to a small cage (12 cm ×20 cm) with wood chip bedding, and the second female mouse was then introduced into the small cage. Immediately after the female had been introduced, the recording of sound began. The recording was performed during the early part of the dark phase, 20:00–24:00 pm. Ultrasound data for BALB/cAnN were obtained from [12].

Datasets and fast Fourier transform

The six raw datasets recorded for the two mouse strains were transformed by fast Fourier transform into six sequences of USV frames as follows. Each frame had a duration of 0.85 ms. Each USV frame overlapped 50% with the next frame, and is represented by 256 samples taken at a sampling rate of 300kHz; i.e., the highest measurable frequency (the Nyquist frequency) was 150kHz, which is half the sampling frequency, and the time between adjacent frames was 0.425 ms. Hence for each frame, 126 squared amplitudes (or powers) were computed for the sampling frequencies. The durations and transformed matrix sizes of the datasets are listed in Table 1. Our analysis uses the obtained matrices.

Download:

Table 1. Durations and the transformed matrix sizes of the datasets.

https://doi.org/10.1371/journal.pone.0196834.t001

Each dataset is expressed by a matrix X = {x_i,j}, where i = 1, …, T and j = 1, …, F are for time and frequency, respectively. The entries, x_i,j, are intensities corresponding to the level of vocalization.

Noise reduction

A noise reduction process has been proposed for mouse USV data [14]. The process involves three steps: taking a moving average, determining the minimum USV call intensity and separating the USV call signals from the matrix. This subsection briefly reviews these three steps and then makes an improvement to the third step.

The first step is to use a small matrix of size m × n to calculate the two-dimensional moving average of the original matrix data. This gives the moving-average matrix where with m and n respectively indicating time and frequency. Moving averages are often used in time series to model the trend of data. The purpose of the moving-average step here is to reduce perturbation in the signals, which helps in distinguishing noise from the voice signal and finding the trend of the USV signals.

The second step is to choose a threshold intensity, according to the noise level, and to set all entries smaller than the threshold in the moving-average matrix to zero. This effectively reduces the background noise in the experiment. Low-frequency noise and noise spiking from low frequency to high frequency can be removed by setting the intensities at those frames to zero.

The third step is to detect USV calls. [14] chose the frequency corresponding to the maximum intensity for each frame , i = 1, ⋯, T − m + 1. When a frame has more than one maximum, the median frequency among those with maximum intensity is chosen.

Choosing the frequency from the maximum intensity may result in insufficiently smoothed USV curves. We here propose using a weighted frequency as a generalization of using the maximum intensity. For time i, the weighted frequency can be defined by (1) where is a vector containing elements of larger than the 100p% quantile. We see that the frequency is thereby weighted by the ratio of in the largest 100(1 − p)% quantile intensities in .

This produces a sequence of frequency in time. The sequence contains USV calls and interspaces of different durations. To detect the USV calls from the sequence, we need to determine the interspaces between adjacent USV calls. [14] proposed determining the length of interspaces by considering the total number of USV calls for different lengths of interspace. Choosing a threshold for interspacing that is too short may separate the sequence into too many segments; in contrast, choosing a threshold that is too long may merge distinct USV calls into a long segment. A plot of the total number of USV calls for different lengths of interspace can suggest the proper length of interspace, at which the total number of USV calls becomes stable. The obtained length of interspace, which we will call l, is the shortest length of the interspaces between adjacent frequency USV calls. That is to say, if two adjacent frequency points are separated by l or more units of time, we judge that they are from distinct USV calls; otherwise they are considered to be part of the same USV call.

From this, we obtain USV calls with different lengths. Since very short USV calls may be noise, we remove these short “USV calls” to conclude the noise reduction process.

Definition of functional data

Our purpose is to classify the USV calls by shape. We consider the USV calls as functional data and define them as functions. Many methods are available for estimating USV functions, such as the use of Chebyshev polynomials [22], kernel methods, and wavelet methods. Because almost all USV calls from the BALB/cAnN mouse are continuous, [14] proposed the use of Chebyshev polynomials to model the USV calls of the BALB/cAnN mouse. However, this method cannot be applied to discontinuous USV calls.

Because many USV calls from the C57BL/6JJcl mice are not continuous, they may contain several jumps or breakpoints. We prefer the B-spline basis function method [23] for this reason. The B-spline basis function method is one of the most popular methods of representing data as functions. Libraries of code for working with B-spline functions are available in many programming languages, including R, S-Plus and MATLAB. In this paper, we use the function bs() in the splines package for R.

A B-spline basis function is a spline function defined by its order and a sequence of knots. Let (2) be m + 2d + 1 knots on [0, 1]. We denote by β_k,d(t) the kth B-spline basis function of order d for the knot sequence in (2). β_k,d(t) is then defined recursively in terms of divided differences: with the initial condition

This process generates P(≔ d + m) B-spline basis functions. We know that any linear combination of a set of B-spline basis functions is still a spline function that is a piecewise polynomial in which the adjacent polynomials join smoothly at the knots. An essential property of B-splines is that knot duplication reduces continuity. If we duplicate an interior knot in the construction of the knot sequence, and generate the B-spline sequence, then the linear combination of B-splines has one less continuous derivative at the duplicated knot. This means that constructing functions with breakpoints is straightforward.

Assume that we have N USV calls. There are n_i pairs of (t_i,j, f_i,j), j = 1, ⋯, n_i, in the ith USV call, i = 1, ⋯, N. To define a spline, we first specify the degree of the B-splines and knots over the interval where the function is to be approximated. These choices determine the goodness of fit of the functions. Cubic splines (d = 3) and quantiles of are typical choices for piecewise polynomials and interior knots, respectively. Without getting deeply involved in model selection, we suggest setting single interior knots at the same quantiles for both continuous and discontinuous USV calls. For discontinuous USV calls, we specify a constant κ as the threshold. When |f_{i,j − 1} − f_i,j| > κ, the curve is considered discontinuous at time j, and j is called a breakpoint. To construct a jump at a breakpoint in the regression function, we put d + 1 knots at the breakpoint. Here, d is the degree of the B-spline functions. This allows us to obtain the same length of coefficient vectors for the USV calls with the same number of breakpoints.

For all USV calls, we assume that or in matrix form, where , respectively indicate time and frequency. is then the set of B-spline basis functions with degree d. Using the least-squares method, for each USV call i, we obtain the regression function with coefficient vector

Hence, for each curve, we obtain a coefficient vector of length P + 1 = d + m + 1.

Functional clustering

We propose a two-step clustering method for splitting the obtained USV curves into clusters. In the first step, we group the curves by the number of breakpoints. Curves with the same number of breakpoints have the same length of coefficient vectors and should be gathered into the same cluster. These clusters are called the first-level clusters, or groups.

In the second step, we separate the curves in each first-level cluster by shape. This process is performed by clustering the coefficient vectors, using standard multivariate clustering methods, such as Ward’s method.

Results

Analysis of BALB/cAnN data

Our first example is a dataset (S1 Data) from a BALB/cAnN male mouse, BALB/cAnN 1752. The intensity level range is [5, 5330]. To reduce noise, we first apply the moving-average method with a small matrix of size 15 × 3. We then set intensities less than 200 to 0. Because almost all USV calls are at a frequency larger than 40 × 1172Hz, signals in this range are considered. By considering the largest 5% quantile intensities at each frame, a sequence of weighted frequency (1) over time is obtained. Fig 1 shows part of the original data and the results of the noise reduction method in the first two panels. Fig 2 shows that the total number of USV calls becomes stable when the length of the interspace between two adjacent USV calls is 30. We thus set 30 × 0.425 ms as the shortest length possible between adjacent USV calls. After deleting 33 short “USV calls” (shorter than 15 × 0.425 ms = 6.375 ms), we obtain 393 USV calls for analysis. Fig 2 reproduces the result obtained using the maximum-frequency method in [14]. The figure shows that the weighted frequency method achieves the same result as the maximum frequency method and can be considered a generalization of the maximum frequency method.

Download:

Fig 1. The first panel shows a part of the original data (log transformed).

After the moving average is taken and background noise is reduced, we obtain the sequence of USV signals using weighted frequencies in the second panel and the USV functions generated by the B-spline method in the last panel.

https://doi.org/10.1371/journal.pone.0196834.g001

Download:

Fig 2. The shortest length of interval between adjacent USV calls can be determined from the change in the total number of USV calls while adjusting the length.

https://doi.org/10.1371/journal.pone.0196834.g002

We regard the obtained signals of USV calls as functional data and use the B-spline regression to define them as functions. Because almost all USV patterns are continuous, we set eight equally spaced interior knots on the domain of each USV curve. To define breakpoints, we set κ = 8. We then add four multiple knots for each breakpoint and fit the signals with cubic B-spline basis functions. From this, we finally obtain 386 continuous USV functions, six USV functions with one breakpoint, and one USV function with two breakpoints. The obtained functions for the USV call examples are given in the last panel of Fig 1. A histogram of the roots of mean squared errors (RMSEs) for the 393 USV curves is presented in Fig 3. The mean and variance of the RMSEs are respectively 0.143 and 0.024.

Download:

Fig 3. Histogram of RMSEs for fitting 393 USV curves.

The mean of the RMSEs is 0.143, while the variance is 0.024.

https://doi.org/10.1371/journal.pone.0196834.g003

The final step is functional clustering. The 393 functions are first classified by their number of break points into three groups: continuous curves (Group I), discontinuous curves with one breakpoint (Group II) and discontinuous curves with two breakpoints (Group III).

For continuous curves, we use Ward’s [24] clustering criterion to obtain a dendrogram, as shown in Fig 4. We can divide the 386 continuous curves in Group I into three or five clusters. Because we consider only the shapes of the curves, all the curves are plotted in the interval of time [0, 1]. In the case of splitting Group I into five clusters, we have five types of syllable shapes: downward, flat, mound, upward, and mound-upward (see examples in Fig 5). Group II can be divided into two clusters as jump-up and jump-down shapes (Fig 6). Only one discontinuous USV function has two breakpoints in Group III; this is shown in the last panel of Fig 6. Examples for Groups II and III are given in Fig 7.

Download:

Fig 4. Cluster dendrogram and clusters obtained by clustering the coefficient matrix of continuous USV functions for mouse BALB/cAnN 1752 using Ward’s method.

https://doi.org/10.1371/journal.pone.0196834.g004

Download:

Fig 5. Examples of the five clusters in Group I.

https://doi.org/10.1371/journal.pone.0196834.g005

Download:

Fig 6. Cluster dendrogram and clustering of discontinuous USV functions with one breakpoint for mouse BALB/cAnN 1752 are shown in the first three panels.

The USV function with two breakpoints is shown in the last panel.

https://doi.org/10.1371/journal.pone.0196834.g006

Download:

Fig 7. Examples of the two clusters in Group II and the USV call in Group III.

https://doi.org/10.1371/journal.pone.0196834.g007

USV datasets from the other two BALB/cAnN mice are analyzed in the same way, except that the threshold intensity is chosen as 130 because of the low intensity ranges of the datasets. From data of mouse BALB/cAnN 1563, we detected 25 USV calls, among which 24 are continuous and one is discontinuous. The calls are clustered in S1 File of Supporting information. From the data of mouse BALB/cAnN 1565, we obtain 189 continuous USV calls and four discontinuous USV calls. The clustering is given in S2 File of Supporting information.

Analysis of C57BL/6JJcl data

This subsection makes a detailed analysis with a dataset (S4 Data) for a C57BL/6JJcl male mouse, C57BL/6JJcl 2358. Different strains of mice produce different types of USV calls. The first panel of Fig 8 shows some USV calls of this mouse. We see that some USV curves are continuous; however, many USV curves contain jumps. Considering the intensity level ([0, 32766]), we choose the largest 2% quantile intensities at each frame and obtain 430 USV calls, as seen in Fig 9. For this dataset, we put only one interior knot at the middle of the interval for each curve, and set κ = 5 as the threshold with which to define breakpoints. Using the same method as the first dataset, we reduce the noise and delete 56 short “USV calls.” Finally, 374 USV functional data items are used. Fig 10 presents a histogram of RMSEs of fitting. The mean and variance of the RMSEs are respectively 0.546 and 0.118. According to the number of breakpoints, we divide all USV calls into five groups (Table 2). The curves in the first four groups are then separated further by shape. Cluster dendrograms and clusters are obtained by clustering the coefficient matrix of USV functions of each group using Ward’s method.

Download:

Fig 8. The first panel shows some of the data (log transformed) for mouse C57BL/6JJcl 2358.

The second panel is the result after noise reduction. The last panel shows the USV curves obtained using the B-spline basis function method.

https://doi.org/10.1371/journal.pone.0196834.g008

Download:

Fig 9. Determining the interval length between adjacent USV calls.

https://doi.org/10.1371/journal.pone.0196834.g009

Download:

Fig 10. Histogram of RMSEs for the fitting of 374 USV curves.

The mean of the RMSEs is 0.546 and the variance is 0.118.

https://doi.org/10.1371/journal.pone.0196834.g010

Download:

Table 2. Three-hundred and seventy-four USV curves are divided into five groups by the number of breakpoints.

https://doi.org/10.1371/journal.pone.0196834.t002

Fig 11 shows that the continuous USV functions can be split into two clusters (flat or downward, upward). Their examples are given in Fig 12. Fig 13 presents the upward and downward clusters and two outliers of USV curves with one breakpoint (see examples in Fig 14). USV functions with two breakpoints can be separated into four clusters— upward, downward, concave and convex— as shown in Figs 15 and 16. Figs 17 and 18 present dendrograms and four clusters of the USV curves with three breakpoints. We do not separate curves in Group V. Curves in Group V contain four or more breakpoints and have complicated shapes. We have confirmed that most of them are obtained from harmonic USV calls, which should not be characterized by one-dimensional functions.

Download:

Fig 11. Cluster dendrogram obtained for continuous functions and two clusters.

https://doi.org/10.1371/journal.pone.0196834.g011

Download:

Fig 12. Examples of the two clusters in Group I.

https://doi.org/10.1371/journal.pone.0196834.g012

Download:

Fig 13. Clustering of discontinuous USV functions with one breakpoint for mouse C57BL/6JJcl 2358.

https://doi.org/10.1371/journal.pone.0196834.g013

Download:

Fig 14. Examples of the four clusters in Group II.

https://doi.org/10.1371/journal.pone.0196834.g014

Download:

Fig 15. Clustering of discontinuous USV functions with two breakpoints for mouse C57BL/6JJcl 2358.

https://doi.org/10.1371/journal.pone.0196834.g015

Download:

Fig 16. Examples of the four clusters in Group III.

https://doi.org/10.1371/journal.pone.0196834.g016

Download:

Fig 17. Cluster dendrogram for discontinuous functions with three breakpoints and four clusters.

https://doi.org/10.1371/journal.pone.0196834.g017

Download:

Fig 18. Examples of the four clusters in Group IV.

https://doi.org/10.1371/journal.pone.0196834.g018

S3 File of Supporting information shows the groups and clusters of USV functions obtained for mouse C57BL/6JJcl 2396. Among the 225 USV functions in total, 93 are continuous. Most USV calls from mouse C57BL/6JJcl 2420 are harmonic. In this case, modeling USV calls as one-dimensional functions is not appropriate, and jumps on the obtained curves are unstable. If we forcibly analyze the data using the same method, the method works but the result is less reliable. An analysis result is presented in S4 File of Supporting information. In the result, 839 USV functions are obtained. Among them, 270 are continuous.

From the data analysis of the two mouse strains, we also see a difference in the proportions of discontinuous USV functions. The proportions of discontinuous USV functions given by the six mice are as follows.

Because the six mice can be considered independent, the Mann–Whitney test (also called the Wilcoxon rank-sum test) statistic U = 0 calculated from Table 3 suggests that with a one-tailed test at a significance level of 0.05 (Statistical Table 8.2 (2), [25]), USV syllables emitted by mice of strain BALB/cAnN have fewer breakpoints than those emitted by C57BL/6JJcl.

Download:

Table 3. Proportions of discontinuous USV functions in the six datasets.

https://doi.org/10.1371/journal.pone.0196834.t003

Discussion

We proposed a functional clustering method for mouse USV data as well as a two-dimensional moving average method and a method for weighting frequency to reduce noise. We defined USV syllables as functions with B-spline basis functions and used multiple knots to define breakpoints for discontinuous USV curves. USV functions with the same number of breakpoints were grouped together and then clustered further by shape.

The methods were proposed for nonharmonic USV calls. It is difficult to use these methods to characterize and classify harmonic USV calls because we define these USV calls as one-dimensional functions. A relatively large number of breakpoints in a USV curve suggests that the curve may be from a harmonic USV call.

In data analysis, we empirically specified some parameters, such as the size of the matrix to use for the moving average, the minimum USV intensity, the quantile for large intensity p and the threshold for jump κ. The size of the matrix for the moving average was chosen so as to distinguish the USV signals from noise. The minimum of the USV intensity depends on the level of noise and the experimental environment. In our analysis, it was a larger value of x_i,j at which there is no USV call. The quantile for large intensity was selected so as to avoid extra jumps in the USV curves. In representing the USV functional data by B-spline basis functions, we proposed the use of more interior knots and a higher breakpoint threshold κ for continuous USV calls and the use of fewer interior knots and a lower breakpoint threshold for discontinuous calls. A higher number of interior knots allows the expression of more detailed characteristics of the USV curves and results in good clustering for small differences. However, using fewer interior knots allows characterization of the features of the USV curves simply and the good classification of USV calls with simple shapes. Hence, to cluster USV calls in a broad way, choosing fewer interior knots is better. Finally, the breakpoint threshold κ should be chosen by considering the continuity of USV calls and the sizes of all jumps. As we consider frequency jumps in clustering USV functions, identifying the difference in the proportion of discontinuous USV functions is useful in characterizing the USV calls of different mouse strains.

Supporting information

S1 R Code. R module (usv.R) contains code with which to perform the methods described in the article.

We provide eight functions in file usv.R. Three of the functions are for reducing noise, defining USV functional data and clustering USV functions by shape. The other five functions are used to plot figures of original USV data, detected USV signals, USV functions, figures for finding the minimum interspace of adjacent USVs and histograms of RMSEs. File usv.R contains code with which to demonstrate the methods and to show the examples and results in the paper.

https://doi.org/10.1371/journal.pone.0196834.s001

(R)

S1 Data. Mouse USV dataset balb1752.txt.

https://doi.org/10.1371/journal.pone.0196834.s002

(ZIP)

S2 Data. Mouse USV dataset balb1563.txt.

https://doi.org/10.1371/journal.pone.0196834.s003

(ZIP)

S3 Data. Mouse USV dataset balb1565.txt.

https://doi.org/10.1371/journal.pone.0196834.s004

(ZIP)

S4 Data. Mouse USV dataset B6_2358.txt.

https://doi.org/10.1371/journal.pone.0196834.s005

(ZIP)

S5 Data. Mouse USV dataset B6_2396.txt.

https://doi.org/10.1371/journal.pone.0196834.s006

(BZ2)

S6 Data. Mouse USV dataset B6_2420.txt.

https://doi.org/10.1371/journal.pone.0196834.s007

(BZ2)

S1 File. Analysis result of dataset balb1563.txt.

https://doi.org/10.1371/journal.pone.0196834.s008

(PDF)

S2 File. Analysis result of dataset balb1565.txt.

https://doi.org/10.1371/journal.pone.0196834.s009

(PDF)

S3 File. Analysis result of dataset B6_2396.txt.

https://doi.org/10.1371/journal.pone.0196834.s010

(PDF)

S4 File. Analysis result of dataset B6_2420.txt.

https://doi.org/10.1371/journal.pone.0196834.s011

(PDF)

Acknowledgments

The authors are grateful to the Editor, Academic Editor and the two reviewers for their helpful comments which improve the presentation of the paper. The authors also express their sincere thanks to Professor Tsuyoshi Koide of National Institute of Genetics for his help with the experiments and valuable advice on the paper.

References

1. Heckman J, McGuinness B, Celikel T, Englitz B. Determinants of the mouse ultrasonic vocal structure and repertoire. Neuroscience and Biobehavioral Reviews. 2016;65: 313–325. pmid:27060755
- View Article
- PubMed/NCBI
- Google Scholar
2. Smotherman WP, Bell RW, Starzec J, Elias J, Zachman TA, Maternal responses to infant vocalizations and olfactory cues in rats and mice. Behav Biol. 1974;12: 55–66. pmid:4429513
- View Article
- PubMed/NCBI
- Google Scholar
3. Maggio JC, Whitney G. Ultrasonic vocalizing by adult female mice (Mus musculus). J Comp Psychol. 1985;99: 420–436. pmid:4075780
- View Article
- PubMed/NCBI
- Google Scholar
4. Hahn ME, Lavooy MJ. A review of the methods of studies on infant ultrasound production and maternal retrieval in small rodents. Behav Genet. 2005;35: 31–52. pmid:15674531
- View Article
- PubMed/NCBI
- Google Scholar
5. Nakatani J, Tamada K, Hatanaka F, Ise S, Ohta H, Inoue K, et al. Abnormal behavior in a chromosome-engineered mouse model for human 15q11-13 duplication seen in autism. Cell. 2009;137: 1235–1246. pmid:19563756
- View Article
- PubMed/NCBI
- Google Scholar
6. Scattoni ML, Crawley J, Ricceri L. Ultrasonic vocalizations: a tool for behavioural phenotyping of mouse models of neurodevelopmental disorders. Neuroscience & Biobehavioral Reviews. 2009;33: 508–515.
- View Article
- Google Scholar
7. Scattoni ML, Ricceri L, Crawley JN. Unusual repertoire of vocalizations in adult BTBR T+tf/J mice during three types of social encounters. Genes Brain Behav. 2011;10: 44–56. pmid:20618443
- View Article
- PubMed/NCBI
- Google Scholar
8. Curry T, Egeto P, Wang H, Podnos A, Wasserman D, Yeomans J. Dopamine receptor D2 deficiency reduces mouse pup ultrasonic vocalizations and maternal responsiveness. Genes Brain Behav. 2013;12: 397–404. pmid:23521753
- View Article
- PubMed/NCBI
- Google Scholar
9. Holy TE, Guo Z. Ultrasonic songs of male mice. PLoS Biology. 2005;3(12): 2177–2086.
- View Article
- Google Scholar
10. Scattoni ML, Gandhy SU, Ricceri L, Crawley JN. Unusual repertoire of vocalizations in the BTBR T+tf/J mouse model of autism. PloS ONE. 2008;3(8): e3067. pmid:18728777
- View Article
- PubMed/NCBI
- Google Scholar
11. Grimsley JMS, Monaghan JJM, Wenstrup JJ. Development of social vocalizations in mice. PLoS ONE. 2011;6(3): e17460. pmid:21408007
- View Article
- PubMed/NCBI
- Google Scholar
12. Sugimoto H, Okabe S, Kato M, Koshida N, Shiroishi T, Mogi K, et al. A role for strain differences in waveforms of ultrasonic vocalization during male–female interaction. PLoS ONE. 2011;6(7): e22093. pmid:21818297
- View Article
- PubMed/NCBI
- Google Scholar
13. Asaba A, Okabe S, Nagasawa M, Kato M, Koshida N, Osakada T, et al. Developmental social environment imprints female preference for male song in mice. PLoS ONE. 2014;9: e87186. pmid:24505280
- View Article
- PubMed/NCBI
- Google Scholar
14. Dou X, Shirahata S, Sugimoto H, Koide T. Noise reduction and classification of mouse ultrasonic vocalization data. ASTE Special Issue on the “Financial & Pension Mathematical Science”. 2015;B12: 71–79.
- View Article
- Google Scholar
15. Ramsay JO, Silverman BW. Functional data analysis. Springer; 1997.
16. Ramsay JO, Silverman BW. Functional data analysis. 2nd ed. Springer; 2006.
17. Ferraty T, Vieu P. Nonparametric functional data analysis. Springer; 2006.
18. Ramsay JO, Silverman BW. Applied functional data analysis. Springer; 2002.
19. James GM, Sugar C. Clustering for sparsely sampled functional data. Journal of the American Statistical Association. 2003;98: 397–408.
- View Article
- Google Scholar
20. Chiou JM, Li PL. Functional clustering and identifying substructures of longitudinal data. J R Statist Soc. 2007;B69: 679–699.
- View Article
- Google Scholar
21. Abraham C, Cornillon PA, Matzner-Lober E, Molinari N. Unsupervised curve clustering using B-splines. Scand J Stat. 2003;30: 581–595.
- View Article
- Google Scholar
22. Rivlin TJ. Chebyshev polynomials—From approximation theory to algebra and number theory. 2nd ed. John Wiley & Sons; 1990.
23. de Boor C. A practical guide to splines. Revised Edition. Springer; 2000.
24. WARD JH. Hierarchical grouping to optimize an objective function. Journal of the American Statistical Association. 1963,58: 236–244.
- View Article
- Google Scholar
25. Sani F, Todman J. Experimental design and statistics for psychology. Wiley Online Library; 2006.

[ref1] 1. Heckman J, McGuinness B, Celikel T, Englitz B. Determinants of the mouse ultrasonic vocal structure and repertoire. Neuroscience and Biobehavioral Reviews. 2016;65: 313–325. pmid:27060755
View Article
PubMed/NCBI
Google Scholar

[2] View Article

[3] PubMed/NCBI

[4] Google Scholar

[ref2] 2. Smotherman WP, Bell RW, Starzec J, Elias J, Zachman TA, Maternal responses to infant vocalizations and olfactory cues in rats and mice. Behav Biol. 1974;12: 55–66. pmid:4429513
View Article
PubMed/NCBI
Google Scholar

[6] View Article

[7] PubMed/NCBI

[8] Google Scholar

[ref3] 3. Maggio JC, Whitney G. Ultrasonic vocalizing by adult female mice (Mus musculus). J Comp Psychol. 1985;99: 420–436. pmid:4075780
View Article
PubMed/NCBI
Google Scholar

[10] View Article

[11] PubMed/NCBI

[12] Google Scholar

[ref4] 4. Hahn ME, Lavooy MJ. A review of the methods of studies on infant ultrasound production and maternal retrieval in small rodents. Behav Genet. 2005;35: 31–52. pmid:15674531
View Article
PubMed/NCBI
Google Scholar

[14] View Article

[15] PubMed/NCBI

[16] Google Scholar

[ref5] 5. Nakatani J, Tamada K, Hatanaka F, Ise S, Ohta H, Inoue K, et al. Abnormal behavior in a chromosome-engineered mouse model for human 15q11-13 duplication seen in autism. Cell. 2009;137: 1235–1246. pmid:19563756
View Article
PubMed/NCBI
Google Scholar

[18] View Article

[19] PubMed/NCBI

[20] Google Scholar

[ref6] 6. Scattoni ML, Crawley J, Ricceri L. Ultrasonic vocalizations: a tool for behavioural phenotyping of mouse models of neurodevelopmental disorders. Neuroscience & Biobehavioral Reviews. 2009;33: 508–515.
View Article
Google Scholar

[22] View Article

[23] Google Scholar

[ref7] 7. Scattoni ML, Ricceri L, Crawley JN. Unusual repertoire of vocalizations in adult BTBR T+tf/J mice during three types of social encounters. Genes Brain Behav. 2011;10: 44–56. pmid:20618443
View Article
PubMed/NCBI
Google Scholar

[25] View Article

[26] PubMed/NCBI

[27] Google Scholar

[ref8] 8. Curry T, Egeto P, Wang H, Podnos A, Wasserman D, Yeomans J. Dopamine receptor D2 deficiency reduces mouse pup ultrasonic vocalizations and maternal responsiveness. Genes Brain Behav. 2013;12: 397–404. pmid:23521753
View Article
PubMed/NCBI
Google Scholar

[29] View Article

[30] PubMed/NCBI

[31] Google Scholar

[ref9] 9. Holy TE, Guo Z. Ultrasonic songs of male mice. PLoS Biology. 2005;3(12): 2177–2086.
View Article
Google Scholar

[33] View Article

[34] Google Scholar

[ref10] 10. Scattoni ML, Gandhy SU, Ricceri L, Crawley JN. Unusual repertoire of vocalizations in the BTBR T+tf/J mouse model of autism. PloS ONE. 2008;3(8): e3067. pmid:18728777
View Article
PubMed/NCBI
Google Scholar

[36] View Article

[37] PubMed/NCBI

[38] Google Scholar

[ref11] 11. Grimsley JMS, Monaghan JJM, Wenstrup JJ. Development of social vocalizations in mice. PLoS ONE. 2011;6(3): e17460. pmid:21408007
View Article
PubMed/NCBI
Google Scholar

[40] View Article

[41] PubMed/NCBI

[42] Google Scholar

[ref12] 12. Sugimoto H, Okabe S, Kato M, Koshida N, Shiroishi T, Mogi K, et al. A role for strain differences in waveforms of ultrasonic vocalization during male–female interaction. PLoS ONE. 2011;6(7): e22093. pmid:21818297
View Article
PubMed/NCBI
Google Scholar

[44] View Article

[45] PubMed/NCBI

[46] Google Scholar

[ref13] 13. Asaba A, Okabe S, Nagasawa M, Kato M, Koshida N, Osakada T, et al. Developmental social environment imprints female preference for male song in mice. PLoS ONE. 2014;9: e87186. pmid:24505280
View Article
PubMed/NCBI
Google Scholar

[48] View Article

[49] PubMed/NCBI

[50] Google Scholar

[ref14] 14. Dou X, Shirahata S, Sugimoto H, Koide T. Noise reduction and classification of mouse ultrasonic vocalization data. ASTE Special Issue on the “Financial & Pension Mathematical Science”. 2015;B12: 71–79.
View Article
Google Scholar

[52] View Article

[53] Google Scholar

[ref15] 15. Ramsay JO, Silverman BW. Functional data analysis. Springer; 1997.

[ref16] 16. Ramsay JO, Silverman BW. Functional data analysis. 2nd ed. Springer; 2006.

[ref17] 17. Ferraty T, Vieu P. Nonparametric functional data analysis. Springer; 2006.

[ref18] 18. Ramsay JO, Silverman BW. Applied functional data analysis. Springer; 2002.

[ref19] 19. James GM, Sugar C. Clustering for sparsely sampled functional data. Journal of the American Statistical Association. 2003;98: 397–408.
View Article
Google Scholar

[59] View Article

[60] Google Scholar

[ref20] 20. Chiou JM, Li PL. Functional clustering and identifying substructures of longitudinal data. J R Statist Soc. 2007;B69: 679–699.
View Article
Google Scholar

[62] View Article

[63] Google Scholar

[ref21] 21. Abraham C, Cornillon PA, Matzner-Lober E, Molinari N. Unsupervised curve clustering using B-splines. Scand J Stat. 2003;30: 581–595.
View Article
Google Scholar

[65] View Article

[66] Google Scholar

[ref22] 22. Rivlin TJ. Chebyshev polynomials—From approximation theory to algebra and number theory. 2nd ed. John Wiley & Sons; 1990.

[ref23] 23. de Boor C. A practical guide to splines. Revised Edition. Springer; 2000.

[ref24] 24. WARD JH. Hierarchical grouping to optimize an objective function. Journal of the American Statistical Association. 1963,58: 236–244.
View Article
Google Scholar

[70] View Article

[71] Google Scholar

[ref25] 25. Sani F, Todman J. Experimental design and statistics for psychology. Wiley Online Library; 2006.

Figures

Abstract

Introduction

Materials and methods

Experiments and datasets

Noise reduction

Definition of functional data

Functional clustering

Results

Analysis of BALB/cAnN data

Analysis of C57BL/6JJcl data

Discussion

Supporting information

S1 R Code. R module (usv.R) contains code with which to perform the methods described in the article.

S1 Data. Mouse USV dataset balb1752.txt.

S2 Data. Mouse USV dataset balb1563.txt.

S3 Data. Mouse USV dataset balb1565.txt.

S4 Data. Mouse USV dataset B6_2358.txt.

S5 Data. Mouse USV dataset B6_2396.txt.

S6 Data. Mouse USV dataset B6_2420.txt.

S1 File. Analysis result of dataset balb1563.txt.

S2 File. Analysis result of dataset balb1565.txt.

S3 File. Analysis result of dataset B6_2396.txt.

S4 File. Analysis result of dataset B6_2420.txt.

Acknowledgments

References