Abstract
Top-k queries on large multi-attribute data sets are fundamental operations in information retrieval and ranking applications. In this article, we initiate research on the anytime behavior of top-k algorithms on exact and fuzzy data. In particular, given specific top-k algorithms (TA and TA-Sorted), we study their progress toward identifying the correct result at any point during execution. We adopt a probabilistic approach in which we seek to report, at any point during the algorithm's operation, the confidence that the top-k result has been identified. Such functionality is a valuable asset when one is interested in reducing the runtime cost of top-k computations. We present a thorough experimental evaluation to validate our techniques using both synthetic and real data sets.
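To make the anytime question concrete, the following is a minimal sketch assuming that TA denotes the Threshold Algorithm of Fagin, Lotem, and Naor, with a sum aggregate over in-memory sorted lists. The function name `threshold_algorithm`, the `trace` structure, and the toy data are illustrative assumptions, not taken from the article. The trace records the current top-k candidates and the stopping threshold after each round of sorted access; it is not the probabilistic confidence measure the article proposes, only the raw state from which such a measure could be estimated.

```python
import heapq

def threshold_algorithm(data, k):
    """Sketch of TA with a sum aggregate (hypothetical helper).

    data: dict mapping object id -> tuple of m attribute scores.
    Returns (top_k, trace), where trace holds one entry per round:
    (depth, threshold, ids of the current top-k candidates).
    """
    m = len(next(iter(data.values())))
    # One list per attribute, ordered by descending score (sorted access).
    lists = [sorted(data, key=lambda o, i=i: -data[o][i]) for i in range(m)]
    seen = {}   # object id -> aggregate score, filled by random access
    trace = []
    top = []
    for depth in range(len(data)):
        for i in range(m):
            obj = lists[i][depth]
            if obj not in seen:
                seen[obj] = sum(data[obj])
        # Upper bound on the aggregate score of any object not yet seen.
        threshold = sum(data[lists[i][depth]][i] for i in range(m))
        top = heapq.nlargest(k, seen.items(), key=lambda kv: kv[1])
        trace.append((depth + 1, threshold, [o for o, _ in top]))
        # TA stops once k seen objects score at least the threshold.
        if len(top) == k and top[-1][1] >= threshold:
            break
    return top, trace

# Toy data: four objects with two integer attributes each.
data = {"a": (9, 7), "b": (8, 9), "c": (3, 2), "d": (1, 4)}
top, trace = threshold_algorithm(data, k=2)
for depth, threshold, candidates in trace:
    print(depth, threshold, candidates)
```

On this toy input the candidate set is already the final answer after the first round, but the stopping condition only certifies it after the second. That gap between holding the correct result and being able to prove it is the kind of situation in which a confidence estimate reported during execution could justify stopping early.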
Cite this article
Arai, B., Das, G., Gunopulos, D. et al. Anytime measures for top-k algorithms on exact and fuzzy data sets. The VLDB Journal 18, 407–427 (2009). https://doi.org/10.1007/s00778-008-0127-9
DOI: https://doi.org/10.1007/s00778-008-0127-9