Pigeonring: a principle for faster thresholded similarity search

Published: 01 September 2018

Abstract

The pigeonhole principle states that if n items are contained in m boxes, then at least one box has no more than n/m items. It is utilized to solve many data management problems, especially for thresholded similarity searches. Despite many pigeonhole principle-based solutions proposed in the last few decades, the condition stated by the principle is weak. It only constrains the number of items in a single box. By organizing the boxes in a ring, we propose a new principle, called the pigeonring principle, which constrains the number of items in multiple boxes and yields stronger conditions.
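The abstract does not spell out the ring condition formally. One formulation consistent with it, given here as an assumption rather than as the paper's exact theorem, is the following, where $b_0, \dots, b_{m-1}$ are the item counts of the boxes arranged in a ring, $n = \sum_i b_i$, and the symbols $b_i$, $i$, and $l$ are introduced only for illustration:

```latex
% Pigeonhole principle: a condition on a single box
\exists\, i :\quad b_i \le \frac{n}{m}

% Ring form (assumed formulation): a condition on every run of
% l consecutive boxes starting at box i, indices taken modulo m
\exists\, i \;\; \forall\, l \in \{1, \dots, m\} :\quad
  \sum_{j=0}^{l-1} b_{(i+j) \bmod m} \le \frac{l \cdot n}{m}
```

Setting $l = 1$ in the second condition recovers the first, which matches the claim that the pigeonhole principle is a special case. That a suitable starting box always exists follows from the classical cycle lemma applied to the values $b_i - n/m$, whose sum is zero.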

To utilize the new principle, we focus on problems that ask for the data objects whose similarity or distance to a query is constrained by a threshold. Many solutions to these problems use the pigeonhole principle to find candidates that satisfy a filtering condition. The new principle yields stronger filtering conditions. We show that the pigeonhole principle is a special case of the new principle, which suggests that all pigeonhole principle-based solutions can potentially be accelerated by it. We introduce a universal filtering framework that encompasses the solutions to these problems based on the new principle, and we discuss how to quickly find the candidates it specifies. The implementation requires only minor modifications on top of existing pigeonhole principle-based algorithms. Experimental results on real datasets demonstrate the applicability of the new principle as well as the superior performance of the algorithms based on it.
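As a rough illustration of how a ring-based condition can filter more strongly than a single-box condition, the sketch below compares the two on per-part distance counts. It is not the paper's algorithm. It assumes the ring condition reads "some starting box exists such that every run of l consecutive boxes from it sums to at most l·τ/m", where τ is the distance threshold. The function names and the sample data are hypothetical.

```python
def pigeonhole_pass(boxes, tau):
    """Single-box filter: keep the candidate if some box holds at most tau/m."""
    m = len(boxes)
    # b <= tau/m, compared in integers to avoid floating-point error
    return any(b * m <= tau for b in boxes)


def pigeonring_starts(boxes, tau):
    """Ring filter (assumed formulation): return every start index i such that
    each run of l = 1..m consecutive boxes from i sums to at most l*tau/m."""
    m = len(boxes)
    starts = []
    for i in range(m):
        running = 0
        for l in range(1, m + 1):
            running += boxes[(i + l - 1) % m]  # wrap around the ring
            if running * m > l * tau:          # run of length l exceeds its quota
                break
        else:
            starts.append(i)                   # every run from i stayed within quota
    return starts


def pigeonring_pass(boxes, tau):
    """Keep the candidate only if at least one valid start exists."""
    return bool(pigeonring_starts(boxes, tau))


if __name__ == "__main__":
    tau = 4
    within_threshold = [2, 0, 1, 1]    # hypothetical per-part distances, total 4
    beyond_threshold = [0, 5, 0, 5]    # hypothetical per-part distances, total 10
    for boxes in (within_threshold, beyond_threshold):
        print(boxes, pigeonhole_pass(boxes, tau), pigeonring_starts(boxes, tau))
```

In this toy data both objects pass the single-box test because each has a box with at most τ/m = 1, while only the object whose total is within τ has a valid ring start. The ring test is safe in the sense that any object with total at most τ always has a valid start, so it discards no true results; the quadratic scan over starts is kept for clarity and is not meant as an efficient implementation.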

Published in

Proceedings of the VLDB Endowment, Volume 12, Issue 1
September 2018
84 pages
ISSN: 2150-8097

Publisher

VLDB Endowment

Qualifiers

research-article
