doi:10.1016/j.is.2005.12.001
Copyright © 2006 Elsevier B.V. All rights reserved.
Branch-and-bound processing of ranked queries
(a) Department of Computer Science, City University of Hong Kong, Tat Chee Avenue, Hong Kong
(b) School of Computing and Information Sciences, Florida International University, Miami, FL 33199, USA
(c) Department of Computer Science, Hong Kong University of Science and Technology, Clearwater Bay, Hong Kong
(d) Department of Computer Science and Engineering, University of California, San Diego, La Jolla, CA, USA
Received 22 April 2003; revised 15 December 2005; accepted 21 December 2005.
Recommended by Y. Ioannidis.
Available online 18 January 2006.
Abstract
Despite the importance of ranked queries in numerous applications involving multi-criteria decision-making, they are not efficiently supported by traditional database systems. In this paper, we propose a simple yet powerful technique for processing such queries based on multi-dimensional access methods and branch-and-bound search. The advantages of the proposed methodology are: (i) it is space efficient, requiring only a single index on the given relation (storing each tuple at most once), (ii) it achieves significant (i.e., orders of magnitude) performance gains over the current state of the art, (iii) it can efficiently handle data updates, and (iv) it is applicable to other important variations of ranked search (including support for non-monotone preference functions) with no extra space overhead. We confirm the superiority of the proposed methods with a detailed experimental study.
Keywords: Databases; Ranked queries; R-tree; Branch-and-bound algorithms
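The abstract describes ranked-query processing as branch-and-bound search over a multi-dimensional index. A minimal Python sketch of that idea follows. It assumes a monotone increasing preference function f (higher score is better) and a hypothetical in-memory tree; `Node` and `brs_topk` are illustrative names, not the paper's code. Each node is scored by its maxscore, i.e. f evaluated at the upper corner of its MBR. For monotone f this bounds every tuple beneath the node. Entries are expanded from a max-heap, so a data point popped from the heap is guaranteed to score at least as high as everything still queued.

```python
import heapq
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

Point = Tuple[float, ...]


@dataclass
class Node:
    # Hypothetical in-memory R-tree-like node: a leaf stores points,
    # an internal node stores child nodes (both lists may be mixed here).
    points: List[Point] = field(default_factory=list)
    children: List["Node"] = field(default_factory=list)

    def upper(self) -> Point:
        # Upper corner of the MBR covering all points below this node.
        corners = list(self.points) + [c.upper() for c in self.children]
        return tuple(max(vals) for vals in zip(*corners))


def brs_topk(root: Node, f: Callable[[Point], float], k: int) -> List[Tuple[float, Point]]:
    """Branch-and-bound top-k search, assuming f is monotone increasing."""
    counter = 0  # unique tie-breaker so heapq never compares nodes or points
    # heapq is a min-heap, so scores are negated to pop the best entry first.
    heap = [(-f(root.upper()), counter, root)]
    result: List[Tuple[float, Point]] = []
    while heap and len(result) < k:
        neg_score, _, item = heapq.heappop(heap)
        if isinstance(item, Node):
            # Expand the node: points get their exact score,
            # child nodes get their maxscore (an upper bound).
            for p in item.points:
                counter += 1
                heapq.heappush(heap, (-f(p), counter, p))
            for child in item.children:
                counter += 1
                heapq.heappush(heap, (-f(child.upper()), counter, child))
        else:
            # A popped point beats every remaining bound, so it is the next result.
            result.append((-neg_score, item))
    return result


leaf1 = Node(points=[(2, 9), (5, 4)])
leaf2 = Node(points=[(8, 7), (9, 1)])
root = Node(children=[leaf1, leaf2])
print(brs_topk(root, lambda t: t[0] + t[1], 2))  # → [(15, (8, 7)), (11, (2, 9))]
```

In this run leaf1 is expanded only after (8, 7) has been reported, because its maxscore (5 + 9 = 14) is below 15. The corner bound holds only for monotone f, so the non-monotone and constrained variants mentioned in the abstract need their own bounding logic, which this sketch does not cover.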
Fig. 1. An example dataset.
Fig. 2. Nearest neighbor (NN) processing using R-trees. (a) The dataset, node MBRs, and heap content in NN search, (b) the R-tree.
Fig. 3. The multi-dimensional representation of data in Fig. 1.
Fig. 4. Algorithm for computing maxscore for monotone functions.
Fig. 5. The BRS algorithm.
Fig. 6. The identical score curves. (a) For top-1 query with f(t)=t.A1+t.A2, (b) for top-3 query with f(t)=t.A1+t.A2.
Fig. 7. Equi-width histograms and procedures in query estimation. (a) HIS_data and the first tested search region, (b) the second tested search region, (c) the third tested search region, (d) HIS_leaf and the final estimated search region.
Fig. 8. A constrained top-1 query.
Fig. 9. The group-by top-k algorithm.
Fig. 10. Speedup of R-trees over Prefer (d=3, N=100k, k=250). (a) Zipf, (b) correlated.
Fig. 11. Comparison of alternative methods. (a) Query cost vs. k (d=3, N=100k), (b) query cost vs. d (k=250, N=100k), (c) query cost vs. N (k=250, d=3), (d) query cost vs. k (d=3, N=100k), (e) query cost vs. d (k=250, N=100k), (f) query cost vs. N (k=250, d=3).
Fig. 12. Performance of BRS vs. buffer size (k=250, d=3, N=100k).
Fig. 13. Accuracy of cost estimation for BRS. (a) Error vs. k (d=3, N=100k), (b) error vs. d (k=250, N=100k), (c) error vs. N (k=250, d=3).
Fig. 14. Estimating time (Zipf data). (a) Time vs. k (d=3, N=100k), (b) time vs. d (k=250, N=100k), (c) time vs. N (k=250, d=3).
Fig. 15. Percentage of reduced space over the dataset. (a) Saving vs. K (d=3, N=100k), (b) saving vs. d (K=250, N=100k), (c) saving vs. N (K=250, d=3).
Fig. 16. Points retained after the space reduction (a) Zipf, (b) correlated.
Fig. 17. Performance of BRS for non-linear functions (d=3, N=100k). (a) Skewed data, (b) correlated data.
Fig. 18. Performance of BRS for constrained ranked queries (k=250, d=3, N=100k). (a) Zipf, (b) correlated.
Fig. 19. Performance of GBRS (k=250, d=3, N=100k). (a) Zipf, (b) correlated.
Fig. 20. Performance of BRS for non-monotone functions (d=3, N=100k). (a) Zipf, (b) correlated.