Abstract
In the absence of a music score, tempo can only be defined through its perception by listeners. Recent studies have therefore focused on estimating the perceptual tempo defined by listening experiments. So far, algorithms have only been proposed for estimating the tempo when people agree on it. In this paper, we study the case where people disagree on the perception of tempo, and we propose an algorithm to predict this disagreement. For this, we hypothesize that the perception of tempo is correlated with variations of several viewpoints on the audio content: energy, harmony, spectral-balance variations and short-term-similarity rate. We suppose that when these variations are coherent, a shared perception of tempo is favoured, and when they are not, people may perceive different tempi. We then propose several statistical models to predict agreement or disagreement in the perception of tempo from these audio features. Finally, we evaluate the models using a test-set resulting from the perceptual experiment performed at Last-FM in 2011.
Notes
- 1.
Experiment 3 is performed on musical excerpts specifically chosen for their extremely slow or fast tempo and leads to a bi-modal distribution with peaks around 50 and 200 bpm. Because of the specificities of these musical excerpts, we do not consider its results here.
- 2.
As explained in Sect. 2.1, we only keep the principal axes which explain more than 10 % of the overall variance. This leads to a final vector of 34 dimensions instead of \(4 \times 20 = 80\) dimensions.
- 3.
It should be noted that, for ease of understanding, we represent in Fig. 4 the features \(d_i(\lambda )\), while \(\underline{C}\) is computed on \(d_i(b)\).
- 4.
The IQR is a measure of statistical dispersion, equal to the difference between the upper and lower quartiles. It is considered more robust to the presence of outliers than the standard deviation.
- 5.
The log-scale is used to take into account the logarithmic character of tempo perception. In log-scale, the intervals [80–85] bpm and [160–170] bpm are equivalent.
- 6.
\(\text {Recall}=\frac{\text {True Positive}}{\text {True Positive + False Negative}}\).
- 7.
As opposed to Precision, Recall is not sensitive to class distribution, hence the mean-over-class Recall is preferred over the F-Measure.
- 8.
It should be noted that we did not plot the relationship between \(T_{harmo}\) and the other estimated tempi because the effect we wanted to show was less clear. We investigate why in the next paragraph.
- 9.
Firstly, the test-set of our experiment and that of [14] differ largely in their genre distribution. In [14], the tracks are equally distributed between classical, country, dance, hip-hop, jazz, latin, reggae, rock/pop and soul. In our test-set, most of the tracks are pop/rock (50 %), followed by soul and country (about 10 % each); the other genres represent less than 5 % each. The experimental protocols also differ largely. Our test-set comes from a web experiment, done without any strict control over the users, whereas McKinney and Moelants had a rigorous protocol (lab experiment, chosen subjects). The user profiles are therefore very different: in McKinney and Moelants' experiment, the 33 subjects had an average of 7 years of musical education, whereas in our case we reckon that almost nobody had musical training.
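Notes 4–6 above define the dispersion and evaluation measures used in the paper. The following minimal Python sketch illustrates both the IQR of tempo annotations computed on a log scale and the mean-over-class Recall; it is an illustration only, not the authors' code, and the function names are ours:

```python
import numpy as np

def log_tempo_iqr(tempi_bpm):
    """Inter-quartile range of tempo annotations on a log scale.

    The log scale reflects the logarithmic character of tempo
    perception: [80-85] bpm and [160-170] bpm span equal intervals.
    """
    log_tempi = np.log2(tempi_bpm)
    q75, q25 = np.percentile(log_tempi, [75, 25])
    return q75 - q25

def mean_over_class_recall(y_true, y_pred, classes):
    """Recall = TP / (TP + FN), averaged over classes.

    Unlike Precision, Recall is insensitive to class distribution,
    which is why the mean-over-class Recall is preferred over the
    F-Measure on an imbalanced test-set.
    """
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    recalls = []
    for c in classes:
        mask = (y_true == c)
        recalls.append(np.sum(y_pred[mask] == c) / mask.sum())
    return float(np.mean(recalls))

# Annotations around 80-85 bpm and at double tempo around 160-170 bpm
# yield the same log-scale IQR, since log2(2x) = 1 + log2(x):
print(log_tempo_iqr([80, 81, 83, 85]))
print(log_tempo_iqr([160, 162, 166, 170]))
```

The doubling invariance is the point of the log scale: an octave error in annotation shifts all values by a constant and leaves the dispersion unchanged.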
References
Bartsch, M.A., Wakefield, G.H.: To catch a chorus: using chroma-based representations for audio thumbnailing. In: IEEE Workshop on the Applications of Signal Processing to Audio and Acoustics, pp. 15–18 (2001)
Chen, C.W., Cremer, M., Lee, K., DiMaria, P., Wu, H.H.: Improving perceived tempo estimation by statistical modeling of higher-level musical descriptors. In: 126th Audio Engineering Society Convention. Audio Engineering Society, Munich (2009)
Chua, B.Y., Lu, G.: Determination of perceptual tempo of music. In: Wiil, U.K. (ed.) CMMR 2004. LNCS, vol. 3310, pp. 61–70. Springer, Heidelberg (2005)
En-Najjary, T., Rosec, O., Chonavel, T.: A new method for pitch prediction from spectral envelope and its application in voice conversion. In: Proceedings of the INTERSPEECH (2003)
Flandrin, P.: Time-Frequency/Time-Scale Analysis, vol. 10. Academic Press, San Diego (1998)
Foote, J.: Visualizing music and audio using self-similarity. In: Proceedings of the Seventh ACM International Conference on Multimedia (Part 1), pp. 77–80 (1999)
Foote, J.: Automatic audio segmentation using a measure of audio novelty. In: Proceedings of IEEE International Conference on Multimedia and Expo (ICME), vol. 1, pp. 452–455 (2000)
Gkiokas, A., Katsouros, V., Carayannis, G.: Reducing tempo octave errors by periodicity vector coding and SVM learning. In: Proceedings of the 13th International Society for Music Information Retrieval Conference (ISMIR), pp. 301–306 (2012)
Gouyon, F., Klapuri, A., Dixon, S., Alonso, M., Tzanetakis, G., Uhle, C., Cano, P.: An experimental comparison of audio tempo induction algorithms. IEEE Trans. Audio Speech Lang. Process. 14(5), 1832–1844 (2006)
Hockman, J., Fujinaga, I.: Fast vs slow: learning tempo octaves from user data. In: Proceedings of the 11th International Society for Music Information Retrieval Conference (ISMIR), pp. 231–236 (2010)
Laroche, J.: Efficient tempo and beat tracking in audio recordings. J. Audio Eng. Soc. 51(4), 226–233 (2003)
Levy, M.: Improving perceptual tempo estimation with crowd-sourced annotations. In: Proceedings of the 12th International Society for Music Information Retrieval Conference (ISMIR), pp. 317–322 (2011)
McKinney, M.F., Moelants, D.: Extracting the perceptual tempo from music. In: 5th International Conference on Music Information Retrieval (ISMIR) (2004)
Moelants, D., McKinney, M.: Tempo perception and musical content: what makes a piece fast, slow or temporally ambiguous. In: Proceedings of the 8th International Conference on Music Perception and Cognition, pp. 558–562 (2004)
van Noorden, L., Moelants, D.: Resonance in the perception of musical pulse. J. New Music Res. 28(1), 43–66 (1999)
Peeters, G.: Template-based estimation of time-varying tempo. EURASIP J. Adv. Sign. Process. 2007, 067215 (2007). doi:10.1155/2007/67215
Peeters, G.: Sequence representation of music structure using higher-order similarity matrix and maximum-likelihood approach. In: Proceedings of the International Conference on Music Information Retrieval (ISMIR), pp. 35–40 (2007)
Peeters, G., Flocon-Cholet, J.: Perceptual tempo estimation using GMM regression. In: Proceedings of the Second International ACM Workshop on Music Information Retrieval with User-Centered and Multimodal Strategies, pp. 45–50 (2012)
Peeters, G., Papadopoulos, H.: Simultaneous beat and downbeat-tracking using a probabilistic framework: theory and large-scale evaluation. IEEE Trans. Audio Speech Lang. Process. 19(6), 1754–1769 (2011)
Seyerlehner, K., Widmer, G., Schnitzer, D.: From rhythm patterns to perceived tempo. In: Proceedings of the 8th International Conference on Music Information Retrieval (ISMIR), pp. 519–524 (2007)
Xiao, L., Tian, A., Li, W., Zhou, J.: Using statistic model to capture the association between timbre and perceived tempo. In: Proceedings of the 9th International Conference on Music Information Retrieval (ISMIR), pp. 659–662 (2008)
Zapata, J.R., Holzapfel, A., Davies, M.E., Oliveira, J.L., Gouyon, F.: Assigning a confidence threshold on automatic beat annotation in large datasets. In: 13th International Society for Music Information Retrieval Conference (ISMIR), pp. 157–162 (2012)
Acknowledgments
This work was partly supported by the Quaero Program, funded by OSEO, the French State agency for innovation, and by the French government Programme Investissements d’Avenir (PIA) through the Bee Music Project.
Copyright information
© 2014 Springer International Publishing Switzerland
Cite this paper
Peeters, G., Marchand, U. (2014). Predicting Agreement and Disagreement in the Perception of Tempo. In: Aramaki, M., Derrien, O., Kronland-Martinet, R., Ystad, S. (eds) Sound, Music, and Motion. CMMR 2013. Lecture Notes in Computer Science(), vol 8905. Springer, Cham. https://doi.org/10.1007/978-3-319-12976-1_20
Print ISBN: 978-3-319-12975-4
Online ISBN: 978-3-319-12976-1