Abstract
In the absence of a music score, tempo can only be defined through its perception by listeners. Recent studies have therefore focused on estimating the perceptual tempo defined by listening experiments. So far, algorithms have only been proposed for estimating the tempo when people agree on it. In this paper, we study the case where people disagree on the perception of tempo, and we propose an algorithm to predict this disagreement. For this, we hypothesize that the perception of tempo is correlated with variations of several viewpoints on the audio content: energy, harmony, spectral-balance variations and short-term-similarity rate. We suppose that when these variations are coherent, a shared perception of tempo is favoured, and when they are not, people may perceive different tempi. We then propose several statistical models to predict agreement or disagreement in the perception of tempo from these audio features. Finally, we evaluate the models using a test-set resulting from the perceptual experiment performed at Last-FM in 2011.
Notes
- 1.
Experiment 3 is performed on musical excerpts specifically chosen for their extremely slow or fast tempo and leads to a bi-modal distribution with peaks around 50 and 200 bpm. Because of the specificities of these musical excerpts, we do not consider its results here.
- 2.
As explained in Sect. 2.1, we only keep the principal axes which explain more than 10 % of the overall variance. This leads to a final vector of 34 dimensions instead of \(4 \times 20 = 80\) dimensions.
- 3.
It should be noted that, for ease of understanding, we represent in Fig. 4 the features \(d_i(\lambda )\), while \(\underline{C}\) is computed on \(d_i(b)\).
- 4.
The IQR is a measure of statistical dispersion, equal to the difference between the upper and lower quartiles. It is considered more robust to the presence of outliers than the standard deviation.
- 5.
The log-scale is used to take into account the logarithmic character of tempo perception. In log-scale, the intervals [80–85] bpm and [160–170] bpm are equivalent.
- 6.
\(\text {Recall}=\frac{\text {True Positive}}{\text {True Positive + False Negative}}\).
- 7.
As opposed to Precision, Recall is not sensitive to class distribution, hence the mean-over-class Recall is preferred over the F-Measure.
- 8.
It should be noted that we did not plot the relationship between \(T_{harmo}\) and the other estimated tempi because the effect we wanted to show was less clear. We investigate why in the next paragraph.
- 9.
Firstly, the test-set of our experiment and that of [14] differ largely in their genre distribution. In [14], the tracks are equally distributed between classical, country, dance, hip-hop, jazz, latin, reggae, rock/pop and soul. In our test-set, most of the tracks are pop/rock (50 %), followed by soul and country (about 10 % each); the other genres represent less than 5 % each. The experimental protocols also differ largely. Our test-set comes from a web experiment, done without any strict control over the users, whereas McKinney and Moelants had a rigorous protocol (lab experiment, chosen subjects). The user profiles are therefore very different: in McKinney and Moelants' experiment, the 33 subjects had an average of 7 years of musical education, whereas in our case we reckon that almost nobody had musical training.
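Notes 4–6 above define the dispersion and evaluation measures used in the paper. The following minimal Python sketch illustrates both the IQR of tempo annotations computed on a log scale and the mean-over-class Recall; it is an illustration only, not the authors' code, and the function names are ours:

```python
import numpy as np

def log_tempo_iqr(tempi_bpm):
    """Inter-quartile range of tempo annotations on a log scale.

    The log scale reflects the logarithmic character of tempo
    perception: [80-85] bpm and [160-170] bpm span equal intervals.
    """
    log_tempi = np.log2(tempi_bpm)
    q75, q25 = np.percentile(log_tempi, [75, 25])
    return q75 - q25

def mean_over_class_recall(y_true, y_pred, classes):
    """Recall = TP / (TP + FN), averaged over classes.

    Unlike Precision, Recall is insensitive to class distribution,
    which is why the mean-over-class Recall is preferred over the
    F-Measure on an imbalanced test-set.
    """
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    recalls = []
    for c in classes:
        mask = (y_true == c)
        recalls.append(np.sum(y_pred[mask] == c) / mask.sum())
    return float(np.mean(recalls))

# Annotations around 80-85 bpm and at double tempo around 160-170 bpm
# yield the same log-scale IQR, since log2(2x) = 1 + log2(x):
print(log_tempo_iqr([80, 81, 83, 85]))
print(log_tempo_iqr([160, 162, 166, 170]))
```

The doubling invariance is the point of the log scale: an octave error in annotation shifts all values by a constant and leaves the dispersion unchanged.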
References
Bartsch, M.A., Wakefield, G.H.: To catch a chorus: using chroma-based representations for audio thumbnailing. In: IEEE Workshop on the Applications of Signal Processing to Audio and Acoustics, pp. 15–18 (2001)
Chen, C.W., Cremer, M., Lee, K., DiMaria, P., Wu, H.H.: Improving perceived tempo estimation by statistical modeling of higher-level musical descriptors. In: 126th Audio Engineering Society Convention. Audio Engineering Society, Munich (2009)
Chua, B.Y., Lu, G.: Determination of perceptual tempo of music. In: Wiil, U.K. (ed.) CMMR 2004. LNCS, vol. 3310, pp. 61–70. Springer, Heidelberg (2005)
En-Najjary, T., Rosec, O., Chonavel, T.: A new method for pitch prediction from spectral envelope and its application in voice conversion. In: Proceedings of the INTERSPEECH (2003)
Flandrin, P.: Time-Frequency/Time-Scale Analysis, vol. 10. Academic Press, San Diego (1998)
Foote, J.: Visualizing music and audio using self-similarity. In: Proceedings of the Seventh ACM International Conference on Multimedia (Part 1), pp. 77–80 (1999)
Foote, J.: Automatic audio segmentation using a measure of audio novelty. In: Proceedings of IEEE International Conference on Multimedia and Expo (ICME), vol. 1, pp. 452–455 (2000)
Gkiokas, A., Katsouros, V., Carayannis, G.: Reducing tempo octave errors by periodicity vector coding and SVM learning. In: Proceedings of the 13th International Society for Music Information Retrieval Conference (ISMIR), pp. 301–306 (2012)
Gouyon, F., Klapuri, A., Dixon, S., Alonso, M., Tzanetakis, G., Uhle, C., Cano, P.: An experimental comparison of audio tempo induction algorithms. IEEE Trans. Audio Speech Lang. Process. 14(5), 1832–1844 (2006)
Hockman, J., Fujinaga, I.: Fast vs slow: learning tempo octaves from user data. In: Proceedings of the 11th International Society for Music Information Retrieval Conference (ISMIR), pp. 231–236 (2010)
Laroche, J.: Efficient tempo and beat tracking in audio recordings. J. Audio Eng. Soc. 51(4), 226–233 (2003)
Levy, M.: Improving perceptual tempo estimation with crowd-sourced annotations. In: Proceedings of the 12th International Society for Music Information Retrieval Conference (ISMIR), pp. 317–322 (2011)
McKinney, M.F., Moelants, D.: Extracting the perceptual tempo from music. In: 5th International Conference on Music Information Retrieval (ISMIR) (2004)
Moelants, D., McKinney, M.: Tempo perception and musical content: what makes a piece fast, slow or temporally ambiguous. In: Proceedings of the 8th International Conference on Music Perception and Cognition, pp. 558–562 (2004)
van Noorden, L., Moelants, D.: Resonance in the perception of musical pulse. J. New Music Res. 28(1), 43–66 (1999)
Peeters, G.: Template-based estimation of time-varying tempo. EURASIP J. Adv. Sign. Process. 2007, 067215 (2007). doi:10.1155/2007/67215
Peeters, G.: Sequence representation of music structure using higher-order similarity matrix and maximum-likelihood approach. In: Proceedings of the International Conference on Music Information Retrieval (ISMIR), pp. 35–40 (2007)
Peeters, G., Flocon-Cholet, J.: Perceptual tempo estimation using GMM regression. In: Proceedings of the Second International ACM Workshop on Music Information Retrieval with User-Centered and Multimodal Strategies, pp. 45–50 (2012)
Peeters, G., Papadopoulos, H.: Simultaneous beat and downbeat-tracking using a probabilistic framework: theory and large-scale evaluation. IEEE Trans. Audio Speech Lang. Process. 19(6), 1754–1769 (2011)
Seyerlehner, K., Widmer, G., Schnitzer, D.: From rhythm patterns to perceived tempo. In: Proceedings of the 8th International Conference on Music Information Retrieval (ISMIR), pp. 519–524 (2007)
Xiao, L., Tian, A., Li, W., Zhou, J.: Using statistic model to capture the association between timbre and perceived tempo. In: Proceedings of the 9th International Conference on Music Information Retrieval (ISMIR), pp. 659–662 (2008)
Zapata, J.R., Holzapfel, A., Davies, M.E., Oliveira, J.L., Gouyon, F.: Assigning a confidence threshold on automatic beat annotation in large datasets. In: 13th International Society for Music Information Retrieval Conference (ISMIR), pp. 157–162 (2012)
Acknowledgments
This work was partly supported by the Quaero Program, funded by OSEO, the French State agency for innovation, and by the French government Programme Investissements d’Avenir (PIA) through the Bee Music Project.
Copyright information
© 2014 Springer International Publishing Switzerland
Cite this paper
Peeters, G., Marchand, U. (2014). Predicting Agreement and Disagreement in the Perception of Tempo. In: Aramaki, M., Derrien, O., Kronland-Martinet, R., Ystad, S. (eds) Sound, Music, and Motion. CMMR 2013. Lecture Notes in Computer Science(), vol 8905. Springer, Cham. https://doi.org/10.1007/978-3-319-12976-1_20
Print ISBN: 978-3-319-12975-4
Online ISBN: 978-3-319-12976-1