
Ranking distributed database in tuple-level uncertainty

  • Methodologies and Application
Soft Computing

Abstract

Ranking in uncertain database environments has recently gained considerable importance. Many techniques have been introduced for ranking uncertain databases, and others for ranking distributed certain databases. However, few techniques address ranking in distributed uncertain databases. This paper proposes a framework that improves ranking processing for databases that are both uncertain and distributed. Within the proposed framework, new communication- and computation-efficient algorithms are investigated for retrieving the top-k tuples from distributed sites. These algorithms are applied under tuple-level uncertainty. Their main goal is to reduce the number of communication rounds and the amount of data transmitted while still ranking efficiently. Experimental results show that both proposed algorithms substantially reduce communication cost. The results also indicate that the first algorithm is efficient when the number of sites is low, while the second performs better when the number of sites is higher.
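To make the setting concrete, the following minimal Python sketch illustrates tuple-level uncertainty and a naive one-round distributed top-k baseline. It is not one of the two algorithms proposed in the paper, whose details are not given in this abstract. The ranking function (expected score, i.e. score multiplied by existence probability), the names `UncertainTuple`, `local_top_k`, and `distributed_top_k`, and the sample data are all assumptions made for illustration. Because the assumed ranking function depends only on the tuple itself, the global top-k is always contained in the union of the local top-k lists, so each site needs to send at most k tuples.

```python
import heapq
from typing import NamedTuple


class UncertainTuple(NamedTuple):
    """A tuple under tuple-level uncertainty (hypothetical layout)."""
    tuple_id: str
    score: float        # value of the ranking attribute
    probability: float  # probability that the tuple exists at all


def expected_score(t: UncertainTuple) -> float:
    # Assumed ranking semantics: score weighted by existence probability.
    return t.score * t.probability


def local_top_k(site, k):
    # Each site ranks its own tuples and keeps only its k best.
    return heapq.nlargest(k, site, key=expected_score)


def distributed_top_k(sites, k):
    # One communication round: every site ships at most k tuples to the
    # coordinator, which merges the candidates into the global top-k.
    candidates = []
    transmitted = 0
    for site in sites:
        partial = local_top_k(site, k)
        transmitted += len(partial)  # crude proxy for communication cost
        candidates.extend(partial)
    return heapq.nlargest(k, candidates, key=expected_score), transmitted


# Made-up sample data for two sites.
sites = [
    [
        UncertainTuple("a1", 90.0, 0.5),
        UncertainTuple("a2", 60.0, 0.9),
        UncertainTuple("a3", 20.0, 1.0),
    ],
    [
        UncertainTuple("b1", 80.0, 0.8),
        UncertainTuple("b2", 70.0, 0.3),
    ],
]

top, sent = distributed_top_k(sites, k=2)
print([t.tuple_id for t in top], sent)
```

In this sketch the cost grows linearly with the number of sites (up to k tuples per site), which is the kind of communication overhead that the algorithms described in the abstract aim to reduce.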



Author information

Corresponding author

Correspondence to Yousry M. AbdulAzeem.

Additional information

Communicated by A. Lotfi.


About this article


Cite this article

AbdulAzeem, Y.M., Eldesouky, A.I., Ali, H.A. et al. Ranking distributed database in tuple-level uncertainty. Soft Comput 19, 965–980 (2015). https://doi.org/10.1007/s00500-014-1306-9


  • DOI: https://doi.org/10.1007/s00500-014-1306-9
