Abstract
Within a mathematically rigorous model, we analyse the curse of dimensionality for deterministic exact similarity search in the context of popular indexing schemes: metric trees. The datasets X are sampled randomly from a domain Ω, equipped with a distance, ρ, and an underlying probability distribution, μ. In the asymptotic analysis, we send the intrinsic dimension d of Ω to infinity, and assume that the size of a dataset, n, grows superpolynomially yet subexponentially in d. Exact similarity search refers to finding the nearest neighbour in the dataset X to a query point ω∈Ω, where the query points are subject to the same probability distribution μ as datapoints. Let ℱ denote a class of all 1-Lipschitz functions on Ω that can be used as decision functions in constructing a hierarchical metric tree indexing scheme. Suppose the VC dimension of the class of all sets {ω: f(ω)≥a}, f∈ℱ, a∈ℝ, is o(n^{1/4}/log² n). (In view of a 1995 result of Goldberg and Jerrum, even a stronger complexity assumption d^{O(1)} is reasonable.) We deduce the Ω(n^{1/4}) lower bound on the expected average case performance of hierarchical metric-tree based indexing schemes for exact similarity search in (Ω,X). In particular, since n grows superpolynomially in d, this bound is superpolynomial in d.
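To make the objects in the abstract concrete, here is a minimal sketch of one familiar kind of metric tree, a vantage-point tree. It is not the formal model analysed in the article. Each inner node uses the decision function f(ω) = ρ(p, ω) for a pivot p, which is 1-Lipschitz by the triangle inequality, and a threshold a that splits the data into {f < a} and {f ≥ a}. The names `Node`, `build` and `nearest`, the median split rule and the leaf size are assumptions made for illustration only. The search counts distance evaluations as a simple stand-in for query cost.

```python
import math
import random
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Point = Tuple[float, ...]
Metric = Callable[[Point, Point], float]


@dataclass
class Node:
    pivot: Optional[Point] = None         # defines f(w) = rho(pivot, w), a 1-Lipschitz function
    threshold: float = 0.0                # the cut value a
    inside: Optional["Node"] = None       # points with f(w) < a
    outside: Optional["Node"] = None      # points with f(w) >= a
    bucket: Optional[List[Point]] = None  # set only at leaves


def build(points: List[Point], rho: Metric, leaf_size: int = 8,
          rng: Optional[random.Random] = None) -> Node:
    """Recursively split the data at the median of f (a hypothetical split rule)."""
    rng = rng or random.Random(0)
    if len(points) <= leaf_size:
        return Node(bucket=list(points))
    pivot = rng.choice(points)
    values = [rho(pivot, p) for p in points]
    a = sorted(values)[len(values) // 2]
    inside = [p for p, v in zip(points, values) if v < a]
    outside = [p for p, v in zip(points, values) if v >= a]
    if not inside:  # heavy ties at the median: stop splitting here
        return Node(bucket=list(points))
    return Node(pivot, a,
                build(inside, rho, leaf_size, rng),
                build(outside, rho, leaf_size, rng))


def nearest(root: Node, q: Point, rho: Metric) -> Tuple[Optional[Point], float, int]:
    """Exact nearest neighbour; returns (point, distance, number of distance evaluations)."""
    best_point: Optional[Point] = None
    best_dist = math.inf
    evals = 0

    def visit(node: Node) -> None:
        nonlocal best_point, best_dist, evals
        if node.bucket is not None:
            for p in node.bucket:
                evals += 1
                d = rho(q, p)
                if d < best_dist:
                    best_point, best_dist = p, d
            return
        evals += 1
        t = rho(node.pivot, q)  # t = f(q)
        near, far = ((node.inside, node.outside) if t < node.threshold
                     else (node.outside, node.inside))
        visit(near)
        # Since |f(q) - f(x)| <= rho(q, x), every x on the far side satisfies
        # rho(q, x) >= |t - a|, so that side can be skipped unless |t - a| < best_dist.
        if abs(t - node.threshold) < best_dist:
            visit(far)

    visit(root)
    return best_point, best_dist, evals


if __name__ == "__main__":
    rng = random.Random(42)
    n = 2000
    for dim in (2, 64):
        data = [tuple(rng.gauss(0.0, 1.0) for _ in range(dim)) for _ in range(n)]
        tree = build(data, math.dist)
        queries = [tuple(rng.gauss(0.0, 1.0) for _ in range(dim)) for _ in range(50)]
        mean_evals = sum(nearest(tree, q, math.dist)[2] for q in queries) / len(queries)
        print(f"dim={dim:3d}  n={n}  mean distance evaluations per query: {mean_evals:.1f}")
```

The demo at the bottom draws Gaussian data in a low and a higher dimension and reports the average number of distance evaluations per query. One would expect pruning to become less effective as the dimension grows, in line with the qualitative message of the abstract. The sketch is only an empirical illustration and does not reproduce the Ω(n^{1/4}) bound.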
References
Andoni, A., Indyk, P., Pǎtrascu, M.: On the optimality of the dimensionality reduction method. In: Proc. 47th IEEE Symp. on Foundations of Computer Science, pp. 449–458 (2006)
Barkol, O., Rabani, Y.: Tighter lower bounds for nearest neighbor search and related problems in the cell probe model. In: Proc. 32nd ACM Symp. on the Theory of Computing, pp. 388–396 (2000)
Beyer, K., Goldstein, J., Ramakrishnan, R., Shaft, U.: When is “nearest neighbor” meaningful? In: Proc. 7-th Intern. Conf. on Database Theory, ICDT-99, Jerusalem, pp. 217–235 (1999)
Borodin, A., Ostrovsky, R., Rabani, Y.: Lower bounds for high-dimensional nearest neighbor search and related problems. In: Proc. 31st Annual ACM Sympos. Theory Comput., pp. 312–321 (1999)
Bustos, B., Navarro, G., Chávez, E.: Pivot selection techniques for proximity searching in metric spaces. Pattern Recognit. Lett. 24, 2357–2366 (2003)
Chávez, E., Navarro, G.: A compact space decomposition for effective metric indexing. Pattern Recognit. Lett. 26, 1363–1376 (2005)
Chávez, E., Navarro, G., Baeza-Yates, R., Marroquín, J.L.: Searching in metric spaces. ACM Comput. Surv. 33, 273–321 (2001)
Ciaccia, P., Patella, M., Zezula, P.: M-tree: An efficient access method for similarity search in metric spaces. In: Proc. 23rd Int. Conf. on Very Large Data Bases, VLDB’97, Athens, Greece, pp. 426–435 (1997)
Ciaccia, P., Patella, M., Zezula, P.: A cost model for similarity queries in metric spaces. In: Proc. 17-th ACM Symposium on Principles of Database Systems, PODS’98, Seattle, WA, pp. 59–68 (1998)
Clarkson, K.L.: An algorithm for approximate closest-point queries. In: Proc. 10th Symp. Comp. Geom, pp. 160–164. Stony Brook, New York (1994)
Clarkson, K.L.: Nearest-neighbor searching and metric space dimensions. In: Nearest-Neighbor Methods for Learning and Vision: Theory and Practice, pp. 15–59. MIT Press, New York (2006)
Faragó, A., Linder, T., Lugosi, G.: Fast nearest neighbor search in dissimilarity spaces. IEEE Trans. Pattern Anal. Mach. Intell. 15, 957–962 (1993)
Goldberg, P.W., Jerrum, M.R.: Bounding the Vapnik–Chervonenkis dimension of concept classes parametrised by real numbers. Mach. Learn. 18, 131–148 (1995)
Hellerstein, J.M., Koutsoupias, E., Miranker, D.P., Papadimitriou, C., Samoladas, V.: On a model of indexability and its bounds for range queries. J. ACM 49(1), 35–55 (2002)
Indyk, P.: Nearest neighbours in high-dimensional spaces. In: Goodman, J.E., O’Rourke, J. (eds.) Handbook of Discrete and Computational Geometry, pp. 877–892. Chapman and Hall/CRC, Boca Raton/London/New York/Washington (2004)
Indyk, P., Motwani, R.: Approximate nearest neighbors: towards removing the curse of dimensionality. In: Proceedings of the Thirtieth Annual ACM Symposium on Theory of Computing, Dallas, Texas, pp. 604–613 (1998)
Katayama, N., Satoh, S.: The SR-tree: An index structure for high-dimensional nearest neighbour queries. In: Proc. 16-th Symposium on PODS, pp. 369–380 (1997)
Kushilevitz, E., Ostrovsky, R., Rabani, Y.: Efficient search for approximate nearest neighbor in high dimensional spaces. SIAM J. Comput. 30, 457–474 (2000)
Ledoux, M.: The Concentration of Measure Phenomenon. Mathematical Surveys and Monographs, vol. 89. American Mathematical Society, Providence (2001)
Milman, V.D., Schechtman, G.: Asymptotic Theory of Finite Dimensional Normed Spaces. Lecture Notes in Mathematics, vol. 1200. Springer, Berlin (1986)
Miltersen, P.B.: Cell probe complexity—a survey. In: 19th Conference on the Foundations of Software Technology and Theoretical Computer Science, FSTTCS (1999). Advances in Data Structures Workshop
Navarro, G.: Searching in metric spaces by spatial approximation. VLDB J. 11, 28–46 (2002)
Navarro, G.: Analysing metric space indexes: what for? Invited paper. In: Proc. 2nd Int. Workshop on Similarity Search and Applications, SISAP 2009, Prague, Czech Republic, pp. 3–10 (2009)
Navarro, G., Reyes, N.: Dynamic spatial approximation trees for massive data. In: Proc. 2nd Int. Workshop on Similarity Search and Applications, SISAP 2009, Prague, Czech Republic, pp. 81–88 (2009)
Panigrahy, R., Talwar, K., Wieder, U.: A geometric approach to lower bounds for approximate near-neighbor search and partial match. In: Proc. 49th IEEE Symp. on Foundations of Computer Science, pp. 414–423 (2008)
Panigrahy, R., Talwar, K., Wieder, U.: Lower bounds on near neighbor search via metric expansion. In: Foundations of Computer Science, FOCS, pp. 805–814 (2010)
Pǎtrascu, M., Thorup, M.: Higher lower bounds for near-neighbor and further rich problems. In: Proc. 47th IEEE Symp. on Foundations of Computer Science, pp. 646–654 (2006)
Pestov, V.: On the geometry of similarity search: dimensionality curse and concentration of measure. Inf. Process. Lett. 73, 47–51 (2000)
Pestov, V.: An axiomatic approach to intrinsic dimension of a dataset. Neural Netw. 21, 204–213 (2008)
Pestov, V.: Indexability, concentration, and VC theory. J. Discrete Algorithms (2012). doi:10.1016/j.jda.2011.10.002
Pestov, V., Stojmirović, A.: Indexing schemes for similarity search: an illustrated paradigm. Fundam. Inform. 70, 367–385 (2006)
Samet, H.: Foundations of Multidimensional and Metric Data Structures. Morgan Kaufmann, San Francisco (2005)
Santini, S.: Exploratory Image Databases: Content-Based Retrieval. Academic Press, New York (2001)
Shaft, U., Ramakrishnan, R.: Theory of nearest neighbors indexability. ACM Trans. Database Syst. 31, 814–838 (2006)
Uhlmann, J.K.: Satisfying general proximity/similarity queries with metric trees. Inf. Process. Lett. 40, 175–179 (1991)
Vapnik, V.N.: Statistical Learning Theory. Wiley, New York (1998)
Vempala, S.S.: The Random Projection Method. DIMACS Series in Discrete Mathematics and Theoretical Computer Science, vol. 65. American Mathematical Society, Providence (2004)
Vidyasagar, M.: Learning and Generalization, with Applications to Neural Networks, 2nd edn. Springer, London (2003)
Volnyansky, I., Pestov, V.: Curse of dimensionality in pivot-based indexes. In: Proc. 2nd Int. Workshop on Similarity Search and Applications, SISAP 2009, Prague, Czech Republic, pp. 39–46 (2009)
Weber, R., Schek, H.-J., Blott, S.: A quantitative analysis and performance study for similarity-search methods in high-dimensional spaces. In: Proceedings of the 24-th VLDB Conference, New York, pp. 194–205 (1998)
White, D.A., Jain, R.: Similarity indexing with the SS-tree. In: Proc. 12th Conf. on Data Engineering, ICDE’96, La Jolla, CA, pp. 516–523 (1996)
Yianilos, P.: Data structures and algorithms for nearest neighbor search in general metric spaces. In: Proc. 3rd Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 311–321 (1993)
Zezula, P., Amato, G., Dohnal, V., Batko, M.: Similarity Search. The Metric Space Approach. Springer, New York (2006)
Cite this article
Pestov, V. Lower Bounds on Performance of Metric Tree Indexing Schemes for Exact Similarity Search in High Dimensions. Algorithmica 66, 310–328 (2013). https://doi.org/10.1007/s00453-012-9638-2
DOI: https://doi.org/10.1007/s00453-012-9638-2