Music genre profiling based on Fisher manifolds and Probabilistic Quantum Clustering

  • Original Article
Neural Computing and Applications

Abstract

Probabilistic classifiers induce a similarity metric at each location in the space of the data. This is measured by the Fisher Information Matrix. Pairwise distances in this Riemannian space, calculated along geodesic paths, can be used to generate a similarity map of the data. The novelty of the paper is twofold: to improve the methodology for visualising data structures in low-dimensional manifolds, and to illustrate, through an application to music data, the value of inferring the structure from a probabilistic classifier by metric learning. This leads to the discovery of new structures and song similarities beyond the original genre classification labels. These similarities are not directly observable by measuring Euclidean distances between features in the original space; they require the correct metric to reflect similarity based on genre. The results quantify the extent to which music from bands typically associated with one particular genre can, in fact, cross over strongly to another genre.

Notes

  1. http://the.echonest.com/.

  2. http://www.ifs.tuwien.ac.at/mir/msd/.

Author information

Corresponding author

Correspondence to Raúl V. Casaña-Eslava.

Ethics declarations

Conflicts of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

A Derivation of FI matrix as a function of MLP

To obtain \(\mathbf {FI} \left( \mathbf {x} \right)\) as a function of the MLP output estimators, the logarithms of the soft-max outputs and their derivatives are needed:

$$\begin{aligned}&p_j = \frac{ \text {exp} \left( a_j \right) }{\sum _{k=1}^{J} \text {exp} \left( a_k \right) }\end{aligned}$$
(31)
$$\begin{aligned}&\text {log} (p_j) = a_j - \text {log} \left( \sum _{k=1}^{J} \text {exp} \left( a_k \right) \right) \end{aligned}$$
(32)
$$\begin{aligned}&\nabla \text {log} (p_j) = \nabla a_j - \sum _{k=1}^{J} p_k \nabla a_k \end{aligned}$$
(33)

where \(p_j = p(c_j|\mathbf {x})\), \(\nabla = \nabla _{\mathbf {x}} = \frac{d}{d{\mathbf {x}}}\) and \(a_j = a_j( \mathbf {x} )\) to abbreviate the notation. Now, combining Eqs. 2 and 33 and expanding the product:

$$\begin{aligned} \begin{aligned}&\mathbf {FI} \left( \mathbf {x} \right) = \sum _{j=1}^J \left( \nabla a_j - \sum _{k=1}^{J} p_k \nabla a_k \right) \left( \nabla a_j - \sum _{l=1}^J p_l \nabla a_l \right) ^T p_j \\&\quad = \sum _{j=1}^J \Biggl ( (\nabla a_j)(\nabla a_j)^T - \sum _{l=1}^J (\nabla a_j)(\nabla a_l)^T p_l \\&\qquad - \sum _{k=1}^J (\nabla a_k)(\nabla a_j)^T p_k + \sum _{k=1}^J \sum _{l=1}^J (\nabla a_k)(\nabla a_l)^T p_k p_l \Biggr ) p_j \end{aligned} \end{aligned}$$
(34)

Rearranging terms and noting that, because the class probabilities sum to one, any quantity \(t\) that does not depend on the summation index satisfies \(\sum _{i=1}^J t p_i = t \sum _{i=1}^J p_i = t\), we get:

$$\begin{aligned} & \mathbf {FI}\left( \mathbf {x} \right) = \sum _{j=1}^J \Biggl ( \sum _{k=1}^J \sum _{l=1}^J (\nabla a_j)(\nabla a_j)^T p_k p_l \\&\quad - \sum _{k=1}^J \sum _{l=1}^J (\nabla a_j)(\nabla a_l)^T p_k p_l - \sum _{k=1}^J \sum _{l=1}^J (\nabla a_k)(\nabla a_j)^T p_k p_l \\&\quad + \sum _{k=1}^J \sum _{l=1}^J (\nabla a_k)(\nabla a_l)^T p_k p_l \Biggr ) p_j \\& = \sum _{j=1}^J \Biggl ( \sum _{k=1}^J \sum _{l=1}^J \biggl ( (\nabla a_j)(\nabla a_j)^T -(\nabla a_j)(\nabla a_l)^T \\&\quad -(\nabla a_k)(\nabla a_j)^T + (\nabla a_k)(\nabla a_l)^T \biggr ) p_k p_l \Biggr ) p_j \end{aligned}$$
(35)

After merging the summations, the final expression of the \(\mathbf {FI} \left( \mathbf {x} \right)\) for the MLP is obtained:

$$\begin{aligned} \mathbf {FI} \left( \mathbf {x} \right) = \sum _{j=1}^J \sum _{k=1}^J \sum _{l=1}^J \nabla \left( a_j - a_k \right) \nabla \left( a_j - a_l \right) ^T p_j p_k p_l \end{aligned}$$
(36)
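
As a quick sanity check on Eq. 36 (an illustration added here, not part of the original derivation), consider the two-class case \(J = 2\) and write \(\mathbf {g} = \nabla \left( a_1 - a_2 \right)\), a shorthand introduced only for this example. Every term with \(k = j\) or \(l = j\) vanishes, so only the index triples \((j,k,l) = (1,2,2)\) and \((2,1,1)\) contribute:

$$\begin{aligned} \mathbf {FI} \left( \mathbf {x} \right) = \mathbf {g} \mathbf {g}^T p_1 p_2^2 + \left( -\mathbf {g} \right) \left( -\mathbf {g} \right) ^T p_2 p_1^2 = p_1 p_2 \left( p_1 + p_2 \right) \mathbf {g} \mathbf {g}^T = p_1 p_2 \, \mathbf {g} \mathbf {g}^T \end{aligned}$$

In this special case the matrix has rank one, is largest where \(p_1 = p_2 = 1/2\), and tends to zero as either class probability tends to zero, so length is accrued mainly where the classifier is uncertain and only along the direction in which the difference of activations changes.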

With Eq. 36 (equivalent to Eq. 5), the metric is estimated locally, and differential distances are computed as:

$$\begin{aligned} d(\mathbf {x}, \, \mathbf {x}+d\mathbf {x})^2 = d\mathbf {x}^T \cdot \mathbf {FI}(\mathbf {x}) \cdot d\mathbf {x} \end{aligned}$$
(37)
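
The derivation above can be checked numerically. The following is a minimal sketch, not the authors' implementation: it assumes a toy model with linear activations \(a_j(\mathbf {x}) = \mathbf {w}_j \cdot \mathbf {x} + b_j\) (so that \(\nabla a_j = \mathbf {w}_j\)) in place of a trained MLP, and the weights `W`, biases `b`, point `x` and displacement `dx` are made-up values. It evaluates the triple sum of Eq. 36, compares it with the centred form in the first line of Eq. 34, and takes the square root of Eq. 37 to obtain a local distance.

```python
import math

def softmax(a):
    """Soft-max of Eq. 31, shifted by max(a) for numerical stability."""
    m = max(a)
    e = [math.exp(v - m) for v in a]
    s = sum(e)
    return [v / s for v in e]

def outer(u, v):
    """Outer product u v^T as a nested list."""
    return [[ui * vj for vj in v] for ui in u]

def add_scaled(M, N, c):
    """In-place update M += c * N."""
    for r in range(len(M)):
        for s in range(len(M[r])):
            M[r][s] += c * N[r][s]

def fisher_triple_sum(grads, p):
    """Eq. 36: sum over j, k, l of grad(a_j - a_k) grad(a_j - a_l)^T p_j p_k p_l."""
    J, D = len(p), len(grads[0])
    FI = [[0.0] * D for _ in range(D)]
    for j in range(J):
        for k in range(J):
            g_jk = [grads[j][d] - grads[k][d] for d in range(D)]
            for l in range(J):
                g_jl = [grads[j][d] - grads[l][d] for d in range(D)]
                add_scaled(FI, outer(g_jk, g_jl), p[j] * p[k] * p[l])
    return FI

def fisher_centred(grads, p):
    """First line of Eq. 34: sum over j of p_j (grad a_j - mean)(grad a_j - mean)^T."""
    J, D = len(p), len(grads[0])
    mean = [sum(p[k] * grads[k][d] for k in range(J)) for d in range(D)]
    FI = [[0.0] * D for _ in range(D)]
    for j in range(J):
        c = [grads[j][d] - mean[d] for d in range(D)]
        add_scaled(FI, outer(c, c), p[j])
    return FI

def local_distance(FI, dx):
    """Square root of Eq. 37: sqrt(dx^T FI dx)."""
    n = len(dx)
    q = sum(dx[r] * FI[r][s] * dx[s] for r in range(n) for s in range(n))
    return math.sqrt(max(q, 0.0))  # clamp tiny negative rounding errors

# Hypothetical toy classifier with J = 3 classes in a 2-D input space.
W = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]   # grad a_j = W[j] for linear activations
b = [0.0, 0.5, -0.5]
x = [0.2, -0.1]

a = [sum(w_j[d] * x[d] for d in range(2)) + b_j for w_j, b_j in zip(W, b)]
p = softmax(a)

FI_36 = fisher_triple_sum(W, p)   # Eq. 36
FI_34 = fisher_centred(W, p)      # Eq. 34, first line
dx = [1e-3, 2e-3]
dist = local_distance(FI_36, dx)
```

The centred form costs one pass over the classes, whereas the triple sum costs \(J^3\) outer products, so in practice the former may be the more convenient way to evaluate the same matrix.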

B Silhouette figures

The Silhouette metric is especially suitable when the clustering method minimizes pairwise distances between the members of each cluster. However, when the clustering also takes density similarity into account, non-spherical shapes are expected; hence, the Silhouette metric of a PQC solution may be worse than that of K-means even if the PQC clusters better reflect the data profiles. In any case, the Silhouette metric is computed for all cluster solutions based on the cMDs data (Fig. 12).

Fig. 12 Silhouette figures for the different cluster solutions. The genre labels correspond to: Rap (0), Pop-Rock (1), Jazz (2)
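
For readers unfamiliar with the Silhouette metric, the sketch below computes it from a precomputed pairwise distance matrix using the standard definition \(s_i = (b_i - a_i) / \max (a_i, b_i)\), where \(a_i\) is the mean distance from sample \(i\) to the other members of its cluster and \(b_i\) is the smallest mean distance to any other cluster. It is an illustration only: the four 2-D points and both labelings are made up, and Euclidean distances are assumed here in place of the distances used in the paper.

```python
import math

def silhouette_values(D, labels):
    """Per-sample silhouette values from a distance matrix D and cluster labels."""
    n = len(labels)
    clusters = sorted(set(labels))
    values = []
    for i in range(n):
        own = [j for j in range(n) if labels[j] == labels[i] and j != i]
        if not own:
            # Singleton cluster: the silhouette is conventionally set to 0.
            values.append(0.0)
            continue
        a_i = sum(D[i][j] for j in own) / len(own)
        b_i = min(
            sum(D[i][j] for j in range(n) if labels[j] == c) / labels.count(c)
            for c in clusters
            if c != labels[i]
        )
        denom = max(a_i, b_i)
        values.append((b_i - a_i) / denom if denom > 0 else 0.0)
    return values

# Hypothetical toy data: two well-separated pairs of points.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
D = [[math.dist(u, v) for v in points] for u in points]

good = silhouette_values(D, [0, 0, 1, 1])   # labels follow the visible grouping
bad = silhouette_values(D, [0, 1, 0, 1])    # labels cut across the grouping

mean_good = sum(good) / len(good)
mean_bad = sum(bad) / len(bad)
```

Because the score rewards small within-cluster distances relative to the nearest other cluster, it favours compact, roughly spherical groups, which is the reason given above for expecting lower values from density-based solutions.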

Cite this article

Casaña-Eslava, R.V., Jarman, I.H., Ortega-Martorell, S. et al. Music genre profiling based on Fisher manifolds and Probabilistic Quantum Clustering. Neural Comput & Applic 33, 7521–7539 (2021). https://doi.org/10.1007/s00521-020-05499-x

  • DOI: https://doi.org/10.1007/s00521-020-05499-x