research-article

Pigeonring: a principle for faster thresholded similarity search

Authors:
Jianbin Qin, The University of Edinburgh, United Kingdom
Chuan Xiao, Nagoya University, Japan

Proceedings of the VLDB Endowment, Volume 12, Issue 1, pp 28–42, https://doi.org/10.14778/3275536.3275539

Published: 01 September 2018

Abstract

The pigeonhole principle states that if n items are contained in m boxes, then at least one box has no more than n/m items. It is used to solve many data management problems, especially thresholded similarity searches. Although many pigeonhole principle-based solutions have been proposed in the last few decades, the condition stated by the principle is weak: it only constrains the number of items in a single box. By organizing the boxes in a ring, we propose a new principle, called the pigeonring principle, which constrains the number of items in multiple boxes and yields stronger conditions.
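The abstract does not spell out the exact statement of the pigeonring principle. As a minimal sketch, assume one natural reading of "multiple boxes": with the m boxes placed on a ring and at most n items in total, some box starts a chain of consecutive boxes in which every prefix of length l holds no more than l·n/m items. The function names below are hypothetical and not taken from the paper.

```python
def pigeonhole_boxes(boxes, n):
    """Indices of boxes holding no more than n/m items (pigeonhole condition)."""
    m = len(boxes)
    return [i for i, b in enumerate(boxes) if b * m <= n]


def pigeonring_starts(boxes, n, length):
    """Indices i such that every prefix of the chain of `length` consecutive
    boxes starting at i (wrapping around the ring) holds at most l*n/m items.
    This prefix condition is an assumed reading of the principle."""
    m = len(boxes)
    starts = []
    for i in range(m):
        total = 0
        for l in range(1, length + 1):
            total += boxes[(i + l - 1) % m]
            if total * m > l * n:  # integer form of: total > l * n / m
                break
        else:
            # No prefix exceeded its bound, so box i is a valid chain start.
            starts.append(i)
    return starts


boxes = [1, 3, 0, 2, 2]  # n = 8 items in m = 5 boxes, so n/m = 1.6
print(pigeonhole_boxes(boxes, 8))      # [0, 2]
print(pigeonring_starts(boxes, 8, 1))  # [0, 2]
print(pigeonring_starts(boxes, 8, 2))  # [2]
```

With chains of length 1 the two conditions coincide. With chains of length 2, box 0 no longer qualifies because it and its neighbour together hold 4 items, which exceeds 2·1.6 = 3.2.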

To utilize the new principle, we focus on problems defined in the form of identifying data objects whose similarities or distances to the query are constrained by a threshold. Many solutions to these problems use the pigeonhole principle to find candidates that satisfy a filtering condition. With the new principle, stronger filtering conditions can be established. We show that the pigeonhole principle is a special case of the new principle, which suggests that all pigeonhole principle-based solutions can potentially be accelerated by it. We introduce a universal filtering framework that encompasses the solutions to these problems based on the new principle, and we discuss how to quickly find the candidates it specifies. The implementation requires only minor modifications on top of existing pigeonhole principle-based algorithms. Experimental results on real datasets demonstrate the applicability of the new principle as well as the superior performance of the algorithms based on it.
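To make the filtering idea concrete, here is a hedged sketch for one thresholded problem. It assumes Hamming distance over equal-length bit strings split into m equal-width parts, a setting the abstract does not specify, and it assumes the prefix-chain reading of the pigeonring condition. All function names and the data are hypothetical illustrations, not the paper's framework.

```python
def part_distances(x, q, m):
    """Hamming distance between x and q inside each of m equal-width parts."""
    assert len(x) == len(q) and len(x) % m == 0
    w = len(x) // m
    return [
        sum(a != b for a, b in zip(x[i * w:(i + 1) * w], q[i * w:(i + 1) * w]))
        for i in range(m)
    ]


def passes_pigeonhole(dists, tau):
    """Candidate if some single part has distance at most tau/m."""
    m = len(dists)
    return any(d * m <= tau for d in dists)


def passes_pigeonring(dists, tau, length):
    """Candidate if some part starts a chain of `length` consecutive parts
    (wrapping around) whose every prefix of l parts has distance <= l*tau/m."""
    m = len(dists)
    for start in range(m):
        total = 0
        for l in range(1, length + 1):
            total += dists[(start + l - 1) % m]
            if total * m > l * tau:
                break
        else:
            return True
    return False


q = "0000000000000000"
x = "1110000011100000"  # Hamming distance 6 from q: not a result for tau = 4
y = "1000000010000001"  # Hamming distance 3 from q: a true result
tau, m = 4, 4

dx, dy = part_distances(x, q, m), part_distances(y, q, m)
print(dx, passes_pigeonhole(dx, tau), passes_pigeonring(dx, tau, 2))
# [3, 0, 3, 0] True False
print(dy, passes_pigeonhole(dy, tau), passes_pigeonring(dy, tau, 2))
# [1, 0, 1, 1] True True
```

In this toy case the non-result x survives the single-part filter because two of its parts match the query exactly, but it is pruned once pairs of adjacent parts are checked. The true result y passes both filters, so the stronger condition introduces no false negative here.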

Index Terms

1. Information systems
  1. Information retrieval

Published in
Proceedings of the VLDB Endowment, Volume 12, Issue 1
September 2018
84 pages
ISSN: 2150-8097
Editors: Lei Chen (HKUST), Fatma Özcan (IBM Research - Almaden)
Publisher: VLDB Endowment