Elsevier

Information Systems

Volume 40, March 2014, Pages 67-83
MSSQ: Manhattan Spatial Skyline Queries

https://doi.org/10.1016/j.is.2013.10.001

Highlights

  • We develop an efficient algorithm for spatial skyline queries in L1 metric.

  • We also present an algorithm for queries moving vertically or horizontally.

  • Our algorithms can easily be parallelized by computing each skyline independently.

  • Our algorithms extend straightforwardly to L∞ distance.

  • Evaluations show that our algorithms are faster than the current approaches.

Abstract

Skyline queries have gained attention lately for supporting effective retrieval over massive spatial data. While efficient algorithms have been studied for spatial skyline queries using the Euclidean distance, these algorithms are (1) still quite computationally intensive and (2) unaware of the road constraints. Our goal is to develop a more efficient algorithm for L1 distance, also known as Manhattan distance, which closely reflects road network distance for metro areas. We present a simple and efficient algorithm which, given a set P of data points and a set Q of query points in the plane, returns the set of spatial skyline points in just O(|P|log|P|) time, assuming that |Q| ≤ |P|. This is significantly lower in complexity than the best known method. In addition to efficiency and applicability, our algorithm has two other desirable properties, independent computation and extensibility to L∞ norm distance, which naturally invite parallelism and widen applicability. Our extensive empirical results suggest that our algorithm outperforms the state-of-the-art approaches by orders of magnitude. We also present efficient algorithms that report the changes of the skyline points when single or multiple query points move along the x- or y-axis.

Introduction

Skyline queries have gained attention [1], [2], [3], [4], [5] because of their ability to retrieve “desirable” objects that are not worse than any other object in the database. Recently, these queries have been applied to spatial data, as we illustrate with the example below.

Consider a hotel search scenario for a conference trip to Minneapolis, where the user marks two locations of interest, e.g., the conference venue and an airport, as Fig. 1(a) illustrates. Given these two query locations, one option is to identify hotels that are close to both locations. When considering the Euclidean distance, we can say that hotel H5, located in the middle of the two query points, is more desirable than H4, i.e., H5 “dominates” H4. The goal is to narrow down the choice of hotels to a few desirable hotels that are not dominated by any other objects, i.e., no other object is closer to all the given query points simultaneously.

However, as Fig. 1(b) shows, considering these query and data points on the map, the Euclidean distance, quantifying the length of the line segment between H5 and the query points, does not consider the road constraints and thus severely underestimates the actual distance.

Going back to Fig. 1(a), we can now assume that the dotted lines represent the underlying road network and revisit the problem to identify desirable objects with respect to L1 distance. In this new problem, H4 and H5 are equally desirable, as both are three blocks away from the conference venue and two blocks from the airport.
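The contrast between the two metrics can be checked with a brute-force sketch. The coordinates below are hypothetical (the figure's actual positions are not given in this text); they are chosen only so that H4 and H5 are each three blocks from the venue and two blocks from the airport. The dominance test assumes the usual definition: p dominates p′ when p is at least as close to every query point and strictly closer to at least one.

```python
import math

def l1(a, b):
    """Manhattan (L1) distance between two points in the plane."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def l2(a, b):
    """Euclidean (L2) distance between two points in the plane."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

def dominates(p, other, queries, dist):
    """True if p is no farther than `other` from every query point
    and strictly closer to at least one (assumed definition)."""
    d_p = [dist(p, q) for q in queries]
    d_o = [dist(other, q) for q in queries]
    return (all(a <= b for a, b in zip(d_p, d_o))
            and any(a < b for a, b in zip(d_p, d_o)))

def spatial_skyline(points, queries, dist):
    """Brute-force skyline: names of points that no other point dominates."""
    return {
        name
        for name, p in points.items()
        if not any(dominates(o, p, queries, dist)
                   for other_name, o in points.items() if other_name != name)
    }

# Hypothetical grid coordinates, for illustration only.
queries = [(0, 0), (3, 2)]             # conference venue, airport
hotels = {"H4": (3, 0), "H5": (2, 1)}

skyline_l2 = spatial_skyline(hotels, queries, l2)
skyline_l1 = spatial_skyline(hotels, queries, l1)
```

With these coordinates, H5 dominates H4 under L2, while under L1 both hotels have distances (3, 2) to the two query points, so neither dominates the other and both remain in the skyline.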

In general, the Manhattan distance, or L1 distance, reflects actual road network distances well for well-connected metro areas such as Pasadena and Ontario (Fig. 2) in California. The experimental results for real road networks, summarized in Table 1, support this claim. In the experiment, we repeated the following procedure 1000 times for each network. We chose a node at random and constructed two sorted lists of the nodes of the network, one in ascending order of network distance and the other in ascending order of L1 distance from the chosen node. Then we counted the number of inversions between the two lists. Table 1 shows the average inversion ratio of each road network, which is less than 7%. For Pasadena and Ontario, the inversion ratios are even less than 5%.
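The inversion count described above can be sketched as follows. The exact normalization behind the reported ratio is not stated in this text; the sketch assumes it is the number of inverted pairs divided by the total number of pairs. The node labels and distance values are made up for illustration.

```python
def count_inversions(seq):
    """Return (sorted copy, number of pairs i < j with seq[i] > seq[j]),
    computed by merge sort in O(n log n)."""
    if len(seq) <= 1:
        return list(seq), 0
    mid = len(seq) // 2
    left, inv_left = count_inversions(seq[:mid])
    right, inv_right = count_inversions(seq[mid:])
    merged, inv = [], inv_left + inv_right
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            # right[j] precedes every remaining element of left
            merged.append(right[j])
            j += 1
            inv += len(left) - i
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged, inv

def inversion_ratio(network_dist, l1_dist):
    """Fraction of node pairs ordered differently by the two distances
    (assumed normalization: inversions / total pairs)."""
    by_network = sorted(network_dist, key=network_dist.get)
    rank_l1 = {node: r for r, node in enumerate(sorted(l1_dist, key=l1_dist.get))}
    seq = [rank_l1[node] for node in by_network]
    _, inv = count_inversions(seq)
    n = len(seq)
    return inv / (n * (n - 1) / 2) if n > 1 else 0.0

# Hypothetical distances from one chosen node to four other nodes.
network_dist = {"a": 1.0, "b": 2.5, "c": 3.0, "d": 4.2}
l1_dist = {"a": 1.0, "b": 2.0, "c": 1.8, "d": 4.0}
ratio = inversion_ratio(network_dist, l1_dist)
```

Here only the pair (b, c) is ordered differently by the two distances, so one of the six pairs is inverted.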

Skyline queries have been actively studied for Euclidean distance [6], [7], [8], [9]. Given a set P of data points and a set Q of query points in the plane, the most efficient algorithm known so far has the time complexity of O(|P|(|S|log|CH(Q)|+log|P|)) [8], [9]. Here S denotes the set of spatial skyline points, and CH(Q) denotes the standard convex hull of Q in the underlying metric. These algorithms are based on a geometric interpretation of spatial dominance of a point p′ over another point p: p is not spatially dominated by p′ if and only if there is at least one query point on the side of the bisecting line of p and p′ that contains p. From this observation, they showed that every data point p lying in CH(Q) is a skyline point, because there is at least one query point on the side of the line bisecting p and any other data point that contains p. They also showed, using a similar argument, that a site of the Voronoi diagram of P is a skyline point if its Voronoi cell has a nonempty intersection with CH(Q).

The geometric interpretation of spatial dominance also holds for L1, because the bisecting line of two points p and p′ in L1 norm distance is the set of points equidistant from p and p′, and therefore there is at least one query point on the side of the line containing p if and only if p is not spatially dominated by p′. This implies that (a) every data point p lying in the orthogonal convex hull of Q is a skyline point and (b) a site of the Voronoi diagram of P in L1 metric is a skyline point if its Voronoi cell has a nonempty intersection with the convex hull. Therefore, we can compute a “subset” of the spatial skyline points by constructing the convex hull and the Voronoi diagram, which can be done in O(|Q|log|Q|) time and O(|P|log|P|) time, respectively.

However, Fig. 3 shows that there are still some skyline points not belonging to the two cases above. For example, p2 is skyline, because none of the other points dominates it. But p2 is not contained in the orthogonal convex hull of queries and its Voronoi cell (gray region) does not intersect the orthogonal convex hull of Q. This example suggests that we need not only to maintain the subset of skyline points for cases (a) and (b), but also to check whether the remaining data points are skyline or not. This takes O(|P||S|log|CH(Q)|) time, which is exactly the same as the total time complexity required for Euclidean distance.

In clear contrast, we develop a simple and efficient algorithm that computes skyline points in just O(|P|log|P|) time for L1 metric, assuming |Q| ≤ |P|. Our extensive empirical results suggest that our algorithm significantly outperforms the state-of-the-art algorithms for spatial and general skyline problems. Our contributions can be summarized as follows:

  • We study the Manhattan Spatial Skyline Queries (MSSQ) problem, which arises in advanced query semantics, such as ranking and skyline queries of massive spatial datasets. We show that a straight-forward extension of the existing algorithm under L2 distance is inefficient for our problem, and present a simple and efficient algorithm that computes skyline points in just O(|P|log|P|) time.

  • We also propose an algorithm for MSSQ when query points move either vertically or horizontally. Our algorithm runs in O(|P|log|P|) time when only one query point moves and in O(|P|²|Q|) time when more than one query point moves.

  • We show that our algorithm can easily be parallelized by computing each skyline point independently. Our algorithm also extends straightforwardly to the Chebyshev distance, also known as L∞ distance, which is used extensively for spatial logistics in warehouses [10].

  • We evaluate our framework using synthetic data and show that our algorithms are faster by orders of magnitude than the current state-of-the-art approaches.
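The independence and L∞ properties in the list above can be illustrated together. This text does not spell out how the L∞ extension works; the sketch below relies instead on a well-known identity: mapping (x, y) to (x + y, x − y) turns L∞ distance into L1 distance scaled by 2, and a uniform scale does not change dominance. The per-point test is brute force, not the O(log|P|) test of the paper, and only shows that each point's skyline status can be decided without looking at the status of any other point. All function names and coordinates are illustrative assumptions.

```python
from concurrent.futures import ThreadPoolExecutor

def l1(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def linf(a, b):
    return max(abs(a[0] - b[0]), abs(a[1] - b[1]))

def rotate(p):
    """Map (x, y) to (x + y, x - y); then l1(rotate(a), rotate(b)) == 2 * linf(a, b)."""
    return (p[0] + p[1], p[0] - p[1])

def is_skyline(i, points, queries, dist):
    """Brute-force test for points[i]; reads only the input sets,
    never the result for another point."""
    mine = [dist(points[i], q) for q in queries]
    for j, other in enumerate(points):
        if j == i:
            continue
        theirs = [dist(other, q) for q in queries]
        if (all(t <= m for t, m in zip(theirs, mine))
                and any(t < m for t, m in zip(theirs, mine))):
            return False  # points[j] dominates points[i]
    return True

def skyline(points, queries, dist, workers=4):
    """Indices of skyline points, with each point tested as an independent task."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        flags = list(pool.map(
            lambda i: is_skyline(i, points, queries, dist),
            range(len(points))))
    return [i for i, keep in enumerate(flags) if keep]

# Hypothetical data and query points.
P = [(0, 0), (4, 1), (2, 3), (6, 6), (1, 5), (5, 2)]
Q = [(1, 1), (4, 3)]

sky_linf = skyline(P, Q, linf)
sky_l1_rotated = skyline([rotate(p) for p in P], [rotate(q) for q in Q], l1)
```

The two index lists coincide, because dominance under L∞ on the original points is the same relation as dominance under L1 on the transformed points. The thread pool only demonstrates task independence; in CPython, a process pool or a compiled implementation would be needed for an actual CPU-bound speedup.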

Section snippets

Related work

This section provides a brief survey of work related to spatial query processing. Skyline queries were introduced in the context of finding the maximum vectors [1]. Since then they have been studied in database applications, both to enhance the efficiency of computation [2], [3], [11], [4], [5], [12] and to enhance the quality of results [13], [14], [15], by narrowing down skyline results using properties such as frequency, k-dominance, and k-representativeness of

Problem definition

In the spatial skyline query problem, we are given two point sets: a set P of data points and a set Q of query points in the plane, assuming that |Q| ≤ |P|. In general, the purpose of querying on a data set is to extract a subset of the data set with respect to the query set, and the query set behaves as a set of constraints which each skyline point must satisfy. In many practical situations, the size of constraints is much smaller than the size of data under consideration, and therefore the

Observation

The basic idea of our algorithm is as follows. To determine whether p ∈ P is a skyline point or not, the approach under L2 distance performs dominance tests with the current skyline points (which we later discuss in detail, denoted as baseline algorithm PSQ, in Section 7).

Under L1 distance, we use a different approach in which we check the existence of a point that dominates p. To do this, we introduce another definition (below) on spatial dominance between two points which is equivalent to Definition 1.

Algorithm

In this section, we show how to handle each of the three cases efficiently so as to achieve an O(log|P|) time algorithm for determining whether a data point is skyline or not.

Tracing moving query points

In this section, we introduce a variation of the MSSQ problem, where data points are fixed and each query point qi moves either vertically or horizontally at unit speed. More precisely, for a nonnegative real number t, let qi(t) denote the translation of qi at time t, that is, qi(t) = qi + (t,0) (or qi(t) = qi − (t,0)) if qi moves along a horizontal line, and qi(t) = qi + (0,t) (or qi(t) = qi − (0,t)) if qi moves along a vertical line. Let Q(t) be the set of query points at time t, and let R(p,t) be R(p)

Implementation

In our implementation of MSSQ, an R-tree is used to efficiently prune out nonskyline points from P. More specifically, we first find a range bounding Q and read a constant number of points in this region from the R-tree. For each such point p, we identify the bounding box of the union of C(p,qi) over i = 1, …, |Q|. Any point outside of this bounding box can be safely pruned as it would be dominated by p. We intersect such bounding boxes and retrieve the points falling into this region, which can be efficiently

Experimental evaluation

In this section, we outline our experimental settings, and present evaluation results to validate the efficiency and effectiveness of our framework. We compare our algorithm (MSSQ) with PSQ and BBS. As datasets, we use both synthetic datasets and a real dataset of points of interest (POI) in California. We carry out our experiments on Linux with Intel Q6600 CPU and 3 GB memory, and the algorithms are coded in C++.

Conclusion

We have studied Manhattan spatial skyline query processing and presented an efficient algorithm. We showed that our algorithm can identify the correct result in O(|P|log|P|) time with desirable properties of easy parallelizability and extensibility.

We also propose an algorithm for spatial skyline queries when query points move either vertically or horizontally. Our algorithm runs in O(|P|log|P|) time when only one query point moves and in O(|P|²|Q|) time when more than one query point moves.

References (25)

  • H.T. Kung et al., On finding the maxima of a set of vectors, J. Assoc. Comput. Mach. (1975)
  • S. Börzsönyi, D. Kossmann, K. Stocker, The skyline operator, in: ICDE '01: Proceedings of the 17th International...
  • K. Tan, P. Eng, B.C. Ooi, Efficient progressive skyline computation, in: VLDB '01: Proceedings of the 27th...
  • D. Papadias, Y. Tao, G. Fu, B. Seeger, An optimal and progressive algorithm for skyline queries, in: SIGMOD '03:...
  • J. Chomicki, P. Godfrey, J. Gryz, D. Liang, Skyline with presorting, in: ICDE '03: Proceedings of the 19th...
  • M. Sharifzadeh, C. Shahabi, The spatial skyline queries, in: VLDB '06: Proceedings of the 32nd International Conference...
  • M. Sharifzadeh et al., Processing spatial skyline queries in both vector spaces and spatial network databases, ACM Trans. Database Syst. (2009)
  • W. Son, M.-W. Lee, H.-K. Ahn, S.-w. Hwang, Spatial skyline queries: an efficient geometric algorithm, in: SSTD '09:...
  • M.-W. Lee et al., Spatial skyline queries: exact and approximation algorithms, GeoInformatica (2011)
  • G. Cormier, Operational research methods for efficient warehousing
  • D. Kossmann, F. Ramsak, S. Rost, Shooting stars in the sky: an online algorithm for skyline queries, in: VLDB '02:...
  • P. Godfrey, R. Shipley, J. Gryz, Maximal vector computation in large data sets, in: VLDB '05: Proceedings of the 31st...
Work by Son and Ahn was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIP) (No. 2011-0030044). Work by Hwang was supported by Microsoft Research Asia.
