MSSQ: Manhattan Spatial Skyline Queries☆
Introduction
Skyline queries have gained attention [1], [2], [3], [4], [5] because of their ability to retrieve “desirable” objects that are not worse than any other object in the database. Recently, these queries have been applied to spatial data, as we illustrate with the example below.
Consider a hotel search scenario for a conference trip to Minneapolis, where the user marks two locations of interest, e.g., the conference venue and an airport, as Fig. 1(a) illustrates. Given these two query locations, one option is to identify hotels that are close to both locations. When considering the Euclidean distance, we can say that hotel H5, located in the middle of the two query points, is more desirable than H4, i.e., H5 “dominates” H4. The goal is to narrow down the choice of hotels to a few desirable hotels that are not dominated by any other objects, i.e., no other object is closer to all the given query points simultaneously.
However, as Fig. 1(b) shows, considering these query and data points on the map, the Euclidean distance, quantifying the length of the line segment between H5 and the query points, does not consider the road constraints and thus severely underestimates the actual distance.
Going back to Fig. 1(a), we can now assume that the dotted lines represent the underlying road network and revisit the problem to identify desirable objects with respect to L1 distance. In this new problem, H4 and H5 are equally desirable, as both are three blocks away from the conference venue and two blocks from the airport.
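The dominance notion used in this example can be made concrete with a brute-force check. The sketch below is illustrative only and is not the algorithm developed in this paper: it compares every pair of data points, and it assumes the usual strict form of dominance (at least as close to every query point and strictly closer to at least one). The function names and the coordinates in the usage note are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def l1(a: Point, b: Point) -> float:
    """Manhattan (L1) distance between two points in the plane."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def dominates(p: Point, r: Point, queries: List[Point]) -> bool:
    """True if p is at least as close as r to every query point and
    strictly closer to at least one (assumed definition of dominance)."""
    dp = [l1(p, q) for q in queries]
    dr = [l1(r, q) for q in queries]
    return all(a <= b for a, b in zip(dp, dr)) and any(a < b for a, b in zip(dp, dr))


def brute_force_skyline(data: List[Point], queries: List[Point]) -> List[Point]:
    """Quadratic-time reference: keep the points that no other point dominates."""
    return [p for p in data
            if not any(dominates(r, p, queries) for r in data if r != p)]
```

With a venue at (0, 0) and an airport at (3, 2), hypothetical hotels at (3, 0) and (1, 2) are each three blocks from the venue and two blocks from the airport, so neither dominates the other and both are returned by `brute_force_skyline`.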
In general, the Manhattan distance, or L1 distance, reflects actual road network distances well for well-connected metro areas such as Pasadena and Ontario (Fig. 2) in California. The experimental results for real road networks, summarized in Table 1, support this claim. In the experiment, we repeated the following 1000 times for each network. We chose a node randomly and constructed two sorted lists of the nodes of the network, one in ascending order of network distance and the other in ascending order of L1 distance from the chosen node. Then we counted the number of inversions between the two lists. Table 1 shows the average inversion ratio of each road network, which is less than 7%. For Pasadena and Ontario, the inversion ratios are even less than 5%.
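The inversion count in this experiment can be sketched as follows. This is a minimal illustration, not the code used for Table 1: it assumes that the inversion ratio is the number of inverted pairs divided by the total number of node pairs (the normalization is not stated here), and that both lists contain the same nodes without ties. The function names are hypothetical.

```python
from typing import Hashable, List, Sequence


def count_inversions(seq: List[int]) -> int:
    """Count pairs (i, j) with i < j and seq[i] > seq[j] by merge sort.
    Sorts seq in place as a side effect."""
    if len(seq) < 2:
        return 0
    mid = len(seq) // 2
    left, right = seq[:mid], seq[mid:]
    inv = count_inversions(left) + count_inversions(right)
    i = j = k = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            seq[k] = left[i]
            i += 1
        else:
            # Every item still waiting in `left` is larger than right[j].
            inv += len(left) - i
            seq[k] = right[j]
            j += 1
        k += 1
    seq[k:] = left[i:] + right[j:]
    return inv


def inversion_ratio(order_a: Sequence[Hashable], order_b: Sequence[Hashable]) -> float:
    """Fraction of node pairs that the two orderings rank in opposite order."""
    rank_in_b = {node: r for r, node in enumerate(order_b)}
    ranks = [rank_in_b[node] for node in order_a]
    n = len(ranks)
    pairs = n * (n - 1) // 2
    return count_inversions(ranks) / pairs if pairs else 0.0
```

Here `order_a` would be the node list sorted by network distance and `order_b` the list sorted by L1 distance from the same randomly chosen node.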
Skyline queries have been actively studied for Euclidean distance [6], [7], [8], [9]. Given a set P of data points and a set Q of query points in the plane, the most efficient algorithms known so far are those of [8], [9], whose time complexity is expressed in terms of the set S of spatial skyline points and the standard convex hull of Q in the underlying metric. These algorithms are based on a geometric interpretation of spatial dominance between two data points: a point p is not spatially dominated by another point if and only if at least one query point lies on the side of their bisecting line that contains p. From this observation, they showed that every data point p lying in the convex hull of Q is a skyline point, because at least one query point lies on the side of the line bisecting p and any other data point that contains p. They also showed, using a similar argument, that a site of the Voronoi diagram of P is a skyline point if its Voronoi cell has a nonempty intersection with the convex hull of Q.
The geometric interpretation of spatial dominance also holds for L1, because the bisecting line of two points under the L1 norm is the set of points equidistant from both. Therefore, at least one query point lies on the side of the bisecting line containing p if and only if p is not spatially dominated by the other point. This implies that (a) every data point p lying in the orthogonal convex hull of Q is a skyline point and (b) a site of the Voronoi diagram of P in the L1 metric is a skyline point if its Voronoi cell has a nonempty intersection with the convex hull. Therefore, we can compute a “subset” of the spatial skyline points by constructing the convex hull and the Voronoi diagram, which can be done in O(|Q| log |Q|) time and O(|P| log |P|) time, respectively.
However, Fig. 3 shows that there are still some skyline points not belonging to the two cases above. For example, p2 is a skyline point, because none of the other points dominates it. But p2 is not contained in the orthogonal convex hull of the query points, and its Voronoi cell (gray region) does not intersect the orthogonal convex hull of Q. This example suggests that we need not only to maintain the subset of skyline points for cases (a) and (b), but also to check whether each remaining data point is a skyline point. The resulting time complexity is exactly the same as the total time complexity required for Euclidean distance.
In a clear contrast, we develop a simple and efficient algorithm that computes skyline points in just time for L1 metric, assuming . Our extensive empirical results suggest that our algorithm outperforms the state-of-the-art algorithms in spatial and general skyline problems significantly. Our contributions can be summarized as follows:
- We study the Manhattan Spatial Skyline Queries (MSSQ) problem, which arises in advanced query semantics, such as ranking and skyline queries of massive spatial datasets. We show that a straightforward extension of the existing algorithm under L2 distance is inefficient for our problem, and present a simple and efficient algorithm that computes skyline points in just time.
- We also propose an algorithm for MSSQ when query points move either vertically or horizontally. Our algorithm runs in time when only one query point moves and in time when more than one query point moves.
- We show that our algorithm can easily be parallelized by computing each skyline point independently. Our algorithm also extends straightforwardly to the Chebyshev distance, also known as the L∞ distance, which is used extensively for spatial logistics in warehouses [10].
- We evaluate our framework using synthetic data and show that our algorithms are faster by orders of magnitude than the current state-of-the-art approaches.
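One standard way to see why a planar L1 method can carry over to the Chebyshev distance is a change of coordinates: mapping (x, y) to ((x + y)/2, (x − y)/2) turns L∞ distances into L1 distances. The sketch below only checks this identity numerically. The text above does not state that the extension is carried out this way, so treat the reduction as an assumption; the function names are hypothetical.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def l1(a: Point, b: Point) -> float:
    """Manhattan (L1) distance."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def linf(a: Point, b: Point) -> float:
    """Chebyshev (L-infinity) distance."""
    return max(abs(a[0] - b[0]), abs(a[1] - b[1]))


def to_l1_frame(p: Point) -> Point:
    """Rotate and scale so that L-infinity distances become L1 distances."""
    return ((p[0] + p[1]) / 2.0, (p[0] - p[1]) / 2.0)


a, b = (1.0, 4.0), (6.0, 2.0)
# The Chebyshev distance in the original frame equals the Manhattan
# distance between the transformed points.
assert math.isclose(linf(a, b), l1(to_l1_frame(a), to_l1_frame(b)))
```

Under this assumption, an L∞ instance could be handled by transforming both P and Q with `to_l1_frame` and then running any L1 spatial skyline routine on the transformed points.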
Section snippets
Related work
This section provides a brief survey of work related to spatial query processing. Skyline queries were introduced in the context of finding the maximum vectors [1]. Since then they have been studied in database applications, both to improve the efficiency of computation [2], [3], [11], [4], [5], [12] and to improve the quality of results [13], [14], [15], by narrowing down skyline results using properties such as frequency, k-dominance, and k-representativeness of
Problem definition
In the spatial skyline query problem, we are given two point sets: a set P of data points and a set Q of query points in the plane, assuming that . In general, the purpose of querying on a data set is to extract a subset of the data set with respect to the query set and the query set behaves as a set of constraints which each skyline point must satisfy. In many practical situations, the size of constraints is much smaller than the size of data under consideration, and therefore the
Observation
The basic idea of our algorithm is as follows. To determine whether a data point p is skyline or not, the approach under L2 distance performs dominance tests with the current skyline points (we discuss this approach in detail in Section 7, where it is denoted as the baseline algorithm PSQ).
Under L1 distance, we use a different approach in which we check the existence of a point that dominates p. To do this, we introduce another definition (below) on spatial dominance between two points which is equivalent to Definition 1.
Algorithm
In this section, we show how to handle each of the three cases efficiently so as to achieve an time algorithm for determining whether a data point is skyline or not.
Tracing moving query points
In this section, we introduce a variation of the MSSQ problem, where data points are fixed and each query point qi moves either vertically or horizontally at unit speed. More precisely, for a nonnegative real number t, let qi(t) denote the translation of qi at time t, that is, (or ) if qi moves along a horizontal line, and (or ) if qi moves along a vertical line. Let Q(t) be the set of query points at time t, and let be R(p)
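The motion model in this variation can be sketched as follows. The sketch assumes that moving at unit speed along a horizontal or vertical line means adding or subtracting t from one coordinate of the starting position. It only produces the query set at a given time and does not implement the tracing algorithm itself. The direction labels and function names are hypothetical.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]

# Unit direction vectors for axis-parallel motion (hypothetical labels).
DIRECTIONS: Dict[str, Point] = {
    "right": (1.0, 0.0),
    "left": (-1.0, 0.0),
    "up": (0.0, 1.0),
    "down": (0.0, -1.0),
}


def position_at(start: Point, direction: str, t: float) -> Point:
    """Position of a query point at time t >= 0 when it moves at unit speed."""
    if t < 0:
        raise ValueError("t must be nonnegative")
    dx, dy = DIRECTIONS[direction]
    return (start[0] + dx * t, start[1] + dy * t)


def queries_at(moving: List[Tuple[Point, str]], t: float) -> List[Point]:
    """Query set at time t for a list of (start position, direction) pairs."""
    return [position_at(start, direction, t) for start, direction in moving]
```

For example, `queries_at([((0.0, 0.0), "up"), ((5.0, 1.0), "right")], 2.0)` places the first query point two units above its start and the second two units to the right of its start.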
Implementation
In our implementation of MSSQ, an R-tree is used to efficiently prune out nonskyline points from P. More specifically, we first find a range bounding Q and read a constant number of points in this region from the R-tree. For each such point p, we identified the bounding box for . Any point outside of this bounding box can be safely pruned as it would be dominated by p. We intersect such bounding boxes and retrieve the points falling into this region, which can be efficiently
Experimental evaluation
In this section, we outline our experimental settings, and present evaluation results to validate the efficiency and effectiveness of our framework. We compare our algorithm (MSSQ) with PSQ and BBS. As datasets, we use both synthetic datasets and a real dataset of points of interest (POI) in California. We carry out our experiments on Linux with Intel Q6600 CPU and 3 GB memory, and the algorithms are coded in C++.
Conclusion
We have studied Manhattan spatial skyline query processing and presented an efficient algorithm. We showed that our algorithm can identify the correct result in time with desirable properties of easy parallelizability and extensibility.
We also propose an algorithm for spatial skyline queries when query points move either vertically or horizontally. Our algorithm runs in time when only one query point moves and in time when more than one query point moves.
References (25)
- et al., On finding the maxima of a set of vectors, J. Assoc. Comput. Mach. (1975)
- S. Börzsönyi, D. Kossmann, K. Stocker, The skyline operator, in: ICDE '01: Proceedings of the 17th International...
- K. Tan, P. Eng, B.C. Ooi, Efficient progressive skyline computation, in: VLDB '01: Proceedings of the 27th...
- D. Papadias, Y. Tao, G. Fu, B. Seeger, An optimal and progressive algorithm for skyline queries, in: SIGMOD '03:...
- J. Chomicki, P. Godfrey, J. Gryz, D. Liang, Skyline with presorting, in: ICDE '03: Proceedings of the 19th...
- M. Sharifzadeh, C. Shahabi, The spatial skyline queries, in: VLDB '06: Proceedings of the 32nd International Conference...
- et al., Processing spatial skyline queries in both vector spaces and spatial network databases, ACM Trans. Database Syst. (2009)
- W. Son, M.-W. Lee, H.-K. Ahn, S.-w. Hwang, Spatial skyline queries: an efficient geometric algorithm, in: SSTD '09:...
- et al., Spatial skyline queries: exact and approximation algorithms, GeoInformatica (2011)
- Operational research methods for efficient warehousing
☆ Work by Son and Ahn was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIP) (No. 2011-0030044). Work by Hwang was supported by Microsoft Research Asia.