
Predicting Agreement and Disagreement in the Perception of Tempo

Conference paper
Sound, Music, and Motion (CMMR 2013)

Part of the book series: Lecture Notes in Computer Science, volume 8905


Abstract

In the absence of a music score, tempo can only be defined through its perception by listeners. Recent studies have therefore focused on the estimation of perceptual tempo as defined by listening experiments. So far, algorithms have only been proposed to estimate the tempo when people agree on it. In this paper, we study the case when people disagree on the perception of tempo and propose an algorithm to predict this disagreement. For this, we hypothesize that the perception of tempo is correlated with the variations of several viewpoints on the audio content: energy, harmony, spectral balance and short-term similarity rate. We suppose that when these variations are coherent, a shared perception of tempo is favoured, and that when they are not, people may perceive different tempi. We then propose several statistical models to predict agreement or disagreement in the perception of tempo from these audio features. Finally, we evaluate the models using a test-set resulting from the perceptual experiment performed at Last-FM in 2011.
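Read computationally, this hypothesis amounts to extracting one variation function per viewpoint and scoring how strongly the functions co-vary. The sketch below only illustrates that idea; `variation_function`, `coherence` and the fixed threshold are hypothetical stand-ins, not the features or statistical models of the paper.

```python
import numpy as np

def variation_function(frames: np.ndarray) -> np.ndarray:
    """Frame-to-frame variation of one feature sequence (a hypothetical
    stand-in for the energy/harmony/spectral-balance/similarity views)."""
    d = np.linalg.norm(np.diff(frames, axis=0), axis=1)
    return (d - d.mean()) / (d.std() + 1e-9)

def coherence(variations):
    """Mean pairwise correlation between variation functions: high values
    mean the viewpoints vary together, favouring a shared tempo percept."""
    n = len(variations)
    corrs = [np.corrcoef(variations[i], variations[j])[0, 1]
             for i in range(n) for j in range(i + 1, n)]
    return float(np.mean(corrs))

# Toy usage with random features (frames x dimensions) for four viewpoints.
rng = np.random.default_rng(0)
feats = [rng.normal(size=(500, 12)) for _ in range(4)]
score = coherence([variation_function(f) for f in feats])
agreement_predicted = score > 0.5  # a real threshold would be learned
```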


Notes

  1. Experiment 3 is performed on musical excerpts specifically chosen for their extremely slow or fast tempo and leads to a bi-modal distribution with peaks around 50 and 200 bpm. Because of the specificity of these excerpts, we do not consider its results here.

  2. As explained in Sect. 2.1, we only keep the principal axes that explain more than 10% of the overall variance. This leads to a final vector of 34 dimensions instead of \(4 \times 20 = 80\) dimensions.
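A minimal sketch of this selection criterion, assuming (our reading, not stated in the note) that the 10% threshold is applied separately to each of the four 20-dimensional feature groups:

```python
import numpy as np
from sklearn.decomposition import PCA

def reduce_group(X: np.ndarray, min_ratio: float = 0.10) -> np.ndarray:
    """Keep only the principal axes that each explain more than
    `min_ratio` of the group's overall variance."""
    pca = PCA().fit(X)
    k = max(1, int(np.sum(pca.explained_variance_ratio_ > min_ratio)))
    return pca.transform(X)[:, :k]

# Toy usage: four feature groups of 20 dimensions each (tracks x dims).
rng = np.random.default_rng(0)
groups = [rng.normal(size=(200, 20)) for _ in range(4)]
reduced = np.hstack([reduce_group(g) for g in groups])
# On the paper's data this yields 34 dimensions instead of 4 x 20 = 80.
```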

  3. It should be noted that, for ease of understanding, Fig. 4 shows the features \(d_i(\lambda)\), while \(\underline{C}\) is computed on \(d_i(b)\).

  4. The IQR is a measure of statistical dispersion, equal to the difference between the upper and lower quartiles. It is considered more robust to the presence of outliers than the standard deviation.
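A minimal numerical illustration (values are made up):

```python
import numpy as np

x = np.array([2.0, 4.0, 5.0, 7.0, 9.0, 11.0, 100.0])  # one strong outlier
q1, q3 = np.percentile(x, [25, 75])
iqr = q3 - q1           # 5.5  -- barely affected by the outlier
std = x.std(ddof=1)     # ~35.5 -- dominated by the outlier
```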

  5. The log scale is used to take the logarithmic character of tempo into account: in log scale, the intervals [80–85] bpm and [160–170] bpm are equivalent.
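Concretely, the two intervals span the same frequency ratio and therefore have equal width on a log-tempo axis:

\[
\log_2\frac{85}{80} = \log_2\frac{170}{160} = \log_2 1.0625 \approx 0.087 \text{ octave}.
\]

By the same token, an octave ambiguity such as 80 vs. 160 bpm is a constant offset of \(\log_2 2 = 1\) on this axis.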

  6. \(\text{Recall}=\frac{\text{True Positive}}{\text{True Positive}+\text{False Negative}}\).

  7. As opposed to Precision, Recall is not sensitive to class distribution; hence the mean-over-class Recall is preferred over the F-measure.
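A minimal sketch of this mean-over-class Recall, with made-up labels for the two classes (agreement = 1, disagreement = 0):

```python
import numpy as np

def mean_over_class_recall(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Average the per-class recalls so that each class counts equally,
    however rare it is in the test-set."""
    classes = np.unique(y_true)
    recalls = [np.mean(y_pred[y_true == c] == c) for c in classes]
    return float(np.mean(recalls))

# Imbalanced toy labels: class 1 dominates class 0.
y_true = np.array([1, 1, 1, 1, 1, 1, 0, 0])
y_pred = np.array([1, 1, 1, 1, 1, 1, 1, 0])
print(mean_over_class_recall(y_true, y_pred))  # (6/6 + 1/2) / 2 = 0.75
```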

  8. It should be noted that we did not plot the relationship between \(T_{harmo}\) and the other estimated tempi because the effect we wanted to show was less clear there. We investigate why in the next paragraph.

  9. First, the test-set of our experiment and that of [14] differ largely in their genre distribution. In [14], the tracks are equally distributed between classical, country, dance, hip-hop, jazz, latin, reggae, rock/pop and soul. In our test-set, most tracks are pop/rock (50%), followed by soul and country (about 10% each); the other genres represent less than 5% each. The experimental protocols also differ largely: our test-set comes from a web experiment without strict control of the users, whereas McKinney and Moelants followed a rigorous protocol (lab experiment, selected subjects). The user profiles are therefore very different: in McKinney and Moelants' experiment, the 33 subjects had an average of 7 years of musical education, whereas in our case we estimate that almost none of the users had musical training.

References

  1. Bartsch, M.A., Wakefield, G.H.: To catch a chorus: using chroma-based representations for audio thumbnailing. In: IEEE Workshop on the Applications of Signal Processing to Audio and Acoustics, pp. 15–18 (2001)

  2. Chen, C.W., Cremer, M., Lee, K., DiMaria, P., Wu, H.H.: Improving perceived tempo estimation by statistical modeling of higher-level musical descriptors. In: 126th Audio Engineering Society Convention, Munich (2009)

  3. Chua, B.Y., Lu, G.: Determination of perceptual tempo of music. In: Wiil, U.K. (ed.) CMMR 2004. LNCS, vol. 3310, pp. 61–70. Springer, Heidelberg (2005)

  4. En-Najjary, T., Rosec, O., Chonavel, T.: A new method for pitch prediction from spectral envelope and its application in voice conversion. In: Proceedings of INTERSPEECH (2003)

  5. Flandrin, P.: Time-Frequency/Time-Scale Analysis, vol. 10. Academic Press, San Diego (1998)

  6. Foote, J.: Visualizing music and audio using self-similarity. In: Proceedings of the Seventh ACM International Conference on Multimedia (Part 1), pp. 77–80 (1999)

  7. Foote, J.: Automatic audio segmentation using a measure of audio novelty. In: Proceedings of the IEEE International Conference on Multimedia and Expo (ICME), vol. 1, pp. 452–455 (2000)

  8. Gkiokas, A., Katsouros, V., Carayannis, G.: Reducing tempo octave errors by periodicity vector coding and SVM learning. In: Proceedings of the 13th International Society for Music Information Retrieval Conference (ISMIR), pp. 301–306 (2012)

  9. Gouyon, F., Klapuri, A., Dixon, S., Alonso, M., Tzanetakis, G., Uhle, C., Cano, P.: An experimental comparison of audio tempo induction algorithms. IEEE Trans. Audio Speech Lang. Process. 14(5), 1832–1844 (2006)

  10. Hockman, J., Fujinaga, I.: Fast vs slow: learning tempo octaves from user data. In: Proceedings of the 11th International Society for Music Information Retrieval Conference (ISMIR), pp. 231–236 (2010)

  11. Laroche, J.: Efficient tempo and beat tracking in audio recordings. J. Audio Eng. Soc. 51(4), 226–233 (2003)

  12. Levy, M.: Improving perceptual tempo estimation with crowd-sourced annotations. In: Proceedings of the 12th International Society for Music Information Retrieval Conference (ISMIR), pp. 317–322 (2011)

  13. McKinney, M.F., Moelants, D.: Extracting the perceptual tempo from music. In: Proceedings of the 5th International Conference on Music Information Retrieval (ISMIR) (2004)

  14. Moelants, D., McKinney, M.: Tempo perception and musical content: what makes a piece fast, slow or temporally ambiguous. In: Proceedings of the 8th International Conference on Music Perception and Cognition, pp. 558–562 (2004)

  15. van Noorden, L., Moelants, D.: Resonance in the perception of musical pulse. J. New Music Res. 28(1), 43–66 (1999)

  16. Peeters, G.: Template-based estimation of time-varying tempo. EURASIP J. Adv. Signal Process. 2007, 067215 (2007). doi:10.1155/2007/67215

  17. Peeters, G.: Sequence representation of music structure using higher-order similarity matrix and maximum-likelihood approach. In: Proceedings of the International Conference on Music Information Retrieval (ISMIR), pp. 35–40 (2007)

  18. Peeters, G., Flocon-Cholet, J.: Perceptual tempo estimation using GMM-regression. In: Proceedings of the Second International ACM Workshop on Music Information Retrieval with User-Centered and Multimodal Strategies, pp. 45–50 (2012)

  19. Peeters, G., Papadopoulos, H.: Simultaneous beat and downbeat-tracking using a probabilistic framework: theory and large-scale evaluation. IEEE Trans. Audio Speech Lang. Process. 19(6), 1754–1769 (2011)

  20. Seyerlehner, K., Widmer, G., Schnitzer, D.: From rhythm patterns to perceived tempo. In: Proceedings of the 8th International Conference on Music Information Retrieval (ISMIR), pp. 519–524 (2007)

  21. Xiao, L., Tian, A., Li, W., Zhou, J.: Using statistic model to capture the association between timbre and perceived tempo. In: Proceedings of the 9th International Conference on Music Information Retrieval (ISMIR), pp. 659–662 (2008)

  22. Zapata, J.R., Holzapfel, A., Davies, M.E., Oliveira, J.L., Gouyon, F.: Assigning a confidence threshold on automatic beat annotation in large datasets. In: Proceedings of the 13th International Society for Music Information Retrieval Conference (ISMIR), pp. 157–162 (2012)


Acknowledgments

This work was partly supported by the Quaero Program, funded by OSEO (the French State agency for innovation), and by the French government's Programme Investissements d’Avenir (PIA) through the Bee Music Project.

Author information


Correspondence to Geoffroy Peeters.


Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Peeters, G., Marchand, U. (2014). Predicting Agreement and Disagreement in the Perception of Tempo. In: Aramaki, M., Derrien, O., Kronland-Martinet, R., Ystad, S. (eds) Sound, Music, and Motion. CMMR 2013. Lecture Notes in Computer Science, vol. 8905. Springer, Cham. https://doi.org/10.1007/978-3-319-12976-1_20

  • DOI: https://doi.org/10.1007/978-3-319-12976-1_20

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-12975-4

  • Online ISBN: 978-3-319-12976-1
