
Anytime measures for top-k algorithms on exact and fuzzy data sets

  • Special Issue Paper
The VLDB Journal

Abstract

Top-k queries on large multi-attribute data sets are fundamental operations in information retrieval and ranking applications. In this article, we initiate research on the anytime behavior of top-k algorithms on exact and fuzzy data. In particular, given specific top-k algorithms (TA and TA-Sorted), we study their progress toward identifying the correct result at any point during their execution. We adopt a probabilistic approach: at any point during the algorithm's operation, we seek to report the confidence that the top-k result has already been identified. Such functionality can be valuable when one wants to reduce the runtime cost of top-k computations, for example by halting once the reported confidence is sufficiently high. We present a thorough experimental evaluation that validates our techniques on both synthetic and real data sets.
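
To make "progress at any point during execution" concrete, here is a minimal sketch of the classic Threshold Algorithm (TA) of Fagin, Lotem and Naor. It assumes sum aggregation, that every object appears in every attribute list, and that random access is simulated by a dictionary lookup. The names `sorted_lists`, `threshold_algorithm` and the per-round `trace` are hypothetical illustrations, not the authors' code, and the paper's probabilistic confidence estimate is not reproduced here.

```python
import heapq
from typing import Dict, List, Tuple

Entry = Tuple[str, float]  # (object_id, attribute value)


def sorted_lists(scores: Dict[str, List[float]]) -> List[List[Entry]]:
    """Build one list per attribute, sorted by descending value (sorted access)."""
    m = len(next(iter(scores.values())))
    return [
        sorted(((obj, vals[i]) for obj, vals in scores.items()), key=lambda e: -e[1])
        for i in range(m)
    ]


def threshold_algorithm(
    lists: List[List[Entry]], scores: Dict[str, List[float]], k: int
) -> Tuple[List[Tuple[float, str]], List[Tuple[int, float, float]]]:
    """Threshold Algorithm with sum aggregation (illustrative sketch).

    Assumes all lists have equal length and scores[obj] simulates random
    access. Returns the top-k as (score, obj) pairs in descending order,
    plus a per-round trace of (depth, threshold, kth_best_score).
    """
    seen: Dict[str, float] = {}
    top: List[Tuple[float, str]] = []
    trace: List[Tuple[int, float, float]] = []
    for depth in range(len(lists[0])):
        threshold = 0.0
        for lst in lists:
            obj, value = lst[depth]           # one sorted access per list
            threshold += value                # threshold = sum of last values seen
            if obj not in seen:
                seen[obj] = sum(scores[obj])  # random access for the full score
        top = heapq.nlargest(k, ((s, o) for o, s in seen.items()))
        kth = top[-1][0] if len(top) == k else float("-inf")
        trace.append((depth + 1, threshold, kth))
        # An unseen object cannot score above the threshold, so once the
        # k-th best seen score reaches it, the current top-k is final.
        if kth >= threshold:
            break
    return top, trace


scores = {"a": [0.875, 0.25], "b": [0.75, 0.875], "c": [0.375, 0.75], "d": [0.125, 0.0625]}
top, trace = threshold_algorithm(sorted_lists(scores), scores, k=1)
print(top)    # [(1.625, 'b')]
print(trace)  # [(1, 1.75, 1.625), (2, 1.5, 1.625)]
```

In this toy run the correct answer `b` is already in hand after the first round, but TA cannot stop until the threshold falls to the k-th score. The gap between `threshold` and the k-th score in each trace row is the kind of intermediate state an anytime measure reports on: TA by itself only knows "finished" or "not finished", not how likely the current answer is to be correct.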



Author information


Correspondence to Benjamin Arai.


About this article

Cite this article

Arai, B., Das, G., Gunopulos, D. et al. Anytime measures for top-k algorithms on exact and fuzzy data sets. The VLDB Journal 18, 407–427 (2009). https://doi.org/10.1007/s00778-008-0127-9


  • DOI: https://doi.org/10.1007/s00778-008-0127-9
