Abstract
Do similarity or distance measures ever go wrong? The inherent subjectivity of similarity judgement has long supported the view that all judgements of similarity are equally valid, and that a given similarity measure can only be considered more effective within some chosen domain. This paper presents evidence that this view is incorrect for structural similarity comparisons. Similarity and distance measures do occasionally go wrong, producing judgements that can be regarded as errors. This claim is supported by a novel method for assessing the quality of similarity and distance functions, based on a relative scale of similarity with respect to chosen reference objects. The method may be applied in any domain, and is demonstrated for common measures of structural similarity in graphs. Finally, the paper identifies three distinct kinds of relative similarity judgement error and shows how the distribution of these errors relates to graph properties under common similarity measures.
K. A. Naudé—This research was supported by the National Research Foundation, South Africa.
© 2015 Springer International Publishing Switzerland
Naudé, K.A., Greyling, J.H., Vogts, D. (2015). When Similarity Measures Lie. In: Amato, G., Connor, R., Falchi, F., Gennaro, C. (eds) Similarity Search and Applications. SISAP 2015. Lecture Notes in Computer Science, vol 9371. Springer, Cham. https://doi.org/10.1007/978-3-319-25087-8_11
DOI: https://doi.org/10.1007/978-3-319-25087-8_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-25086-1
Online ISBN: 978-3-319-25087-8
eBook Packages: Computer Science (R0)