Parallelizing Skyline Queries for Scalable Distribution

  • Conference paper
Advances in Database Technology - EDBT 2006 (EDBT 2006)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 3896)

Abstract

Skyline queries help users make intelligent decisions over complex data, where different and often conflicting criteria are considered. Current skyline computation methods are restricted to centralized query processors, limiting scalability and imposing a single point of failure. In this paper, we address the problem of parallelizing skyline query execution over a large number of machines by leveraging content-based data partitioning. We present a novel distributed skyline query processing algorithm (DSL) that discovers skyline points progressively. We propose two mechanisms, recursive region partitioning and dynamic region encoding, to enforce a partial order on query propagation in order to pipeline query execution. Our analysis shows that DSL is optimal in terms of the total number of local query invocations across all machines. In addition, simulations and measurements of a deployed system show that our system load balances communication and processing costs across cluster machines, providing incremental scalability and significant performance improvement over alternative distribution mechanisms.
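To make the notion of a skyline concrete, the following is a minimal sketch in Python, not the DSL algorithm from the paper. It assumes that every attribute is minimized (smaller is better) and uses hypothetical names (`dominates`, `skyline`, `partitioned_skyline`). The last function illustrates only the naive idea of computing local skylines per data partition and merging them; it does not model the recursive region partitioning or dynamic region encoding that the abstract describes.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(a: Point, b: Point) -> bool:
    """Return True if a is no worse than b in every dimension
    and strictly better in at least one (assumption: smaller is better)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def skyline(points: Sequence[Point]) -> List[Point]:
    """Centralized skyline: keep every point that no other point dominates."""
    result: List[Point] = []
    for p in points:
        # Skip p if a point already in the candidate set dominates it.
        if any(dominates(q, p) for q in result):
            continue
        # Otherwise p evicts any candidates it dominates, then joins the set.
        result = [q for q in result if not dominates(p, q)]
        result.append(p)
    return result


def partitioned_skyline(partitions: Sequence[Sequence[Point]]) -> List[Point]:
    """Naive distribution sketch: each partition computes a local skyline,
    and the local results are merged with one final skyline pass."""
    local_results = [skyline(part) for part in partitions]
    merged = [p for local in local_results for p in local]
    return skyline(merged)


if __name__ == "__main__":
    # Hypothetical data: (price, distance to beach) for five hotels.
    hotels = [(50, 8), (60, 3), (80, 1), (70, 5), (90, 2)]
    print(skyline(hotels))
    print(partitioned_skyline([hotels[:2], hotels[2:]]))
```

In this naive merge, a partition may spend effort on points that another partition's results would have eliminated. Ordering query propagation across machines, as the abstract describes, is aimed at exactly that kind of wasted local work.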

This work was supported in part by NSF under grants IIS 02-23022, IIS 02-20152, and CNF 04-23336.

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Wu, P., Zhang, C., Feng, Y., Zhao, B.Y., Agrawal, D., El Abbadi, A. (2006). Parallelizing Skyline Queries for Scalable Distribution. In: Ioannidis, Y., et al. Advances in Database Technology - EDBT 2006. EDBT 2006. Lecture Notes in Computer Science, vol 3896. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11687238_10

  • DOI: https://doi.org/10.1007/11687238_10

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-32960-2

  • Online ISBN: 978-3-540-32961-9

  • eBook Packages: Computer Science, Computer Science (R0)
