Abstract
Skyline queries help users make intelligent decisions over complex data in which different and often conflicting criteria must be weighed. Current skyline computation methods are restricted to centralized query processors, which limits scalability and creates a single point of failure. In this paper, we address the problem of parallelizing skyline query execution over a large number of machines by leveraging content-based data partitioning. We present a novel distributed skyline query processing algorithm (DSL) that discovers skyline points progressively. We propose two mechanisms, recursive region partitioning and dynamic region encoding, that enforce a partial order on query propagation so that query execution can be pipelined. Our analysis shows that DSL is optimal in terms of the total number of local query invocations across all machines. In addition, simulations and measurements of a deployed system show that our system balances communication and processing costs across cluster machines, providing incremental scalability and significant performance improvement over alternative distribution mechanisms.
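To make the notion of a skyline concrete, the following is a minimal sketch of the skyline operator and of a naive way to combine results from partitioned data. It is not the DSL algorithm described in the abstract: it does no recursive region partitioning, region encoding, or pipelining. The function names (`dominates`, `skyline`, `merge_local_skylines`), the "smaller is better" convention on every dimension, and the sample hotel data are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(a: Point, b: Point) -> bool:
    """True if a is no worse than b in every dimension and strictly
    better in at least one (assumption: smaller values are better)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def skyline(points: Sequence[Point]) -> List[Point]:
    """Quadratic-time skyline: keep each point that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def merge_local_skylines(partitions: Sequence[Sequence[Point]]) -> List[Point]:
    """Naive distributed scheme (illustrative only): each partition computes
    its local skyline, then the candidates are filtered once more globally."""
    candidates = [p for part in partitions for p in skyline(part)]
    return skyline(candidates)


# Hypothetical data: (price, distance to beach); lower is better for both.
hotels = [(50, 8), (80, 2), (60, 5), (90, 6), (70, 9)]
partitions = [hotels[:2], hotels[2:]]

global_result = skyline(hotels)
merged_result = merge_local_skylines(partitions)
print(global_result)
print(merged_result)
```

Because a point dominated inside its own partition is also dominated globally, filtering the union of local skylines yields the same set as computing the skyline over all the data. The sketch still evaluates every partition for every query, which is the kind of cost a partial order on query propagation is meant to reduce.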
This work was supported in part by NSF under grants IIS 02-23022, IIS 02-20152, and CNF 04-23336.
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Wu, P., Zhang, C., Feng, Y., Zhao, B.Y., Agrawal, D., El Abbadi, A. (2006). Parallelizing Skyline Queries for Scalable Distribution. In: Ioannidis, Y., et al. Advances in Database Technology - EDBT 2006. EDBT 2006. Lecture Notes in Computer Science, vol 3896. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11687238_10
DOI: https://doi.org/10.1007/11687238_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-32960-2
Online ISBN: 978-3-540-32961-9
eBook Packages: Computer Science (R0)