MSSQ: Manhattan Spatial Skyline Queries

doi:10.1016/j.is.2013.10.001

Information Systems

Volume 40, March 2014, Pages 67-83

https://doi.org/10.1016/j.is.2013.10.001

Highlights

• We develop an efficient algorithm for spatial skyline queries in the L₁ metric.
• We also present an algorithm for queries moving vertically or horizontally.
• Our algorithms can easily be parallelized by computing each skyline point independently.
• Our algorithms extend straightforwardly to the $L_{\infty}$ distance.
• Evaluations show that our algorithms are faster than the current approaches.

Abstract

Skyline queries have gained attention lately for supporting effective retrieval over massive spatial data. While efficient algorithms have been studied for spatial skyline queries using the Euclidean distance, these algorithms are (1) still quite computationally intensive and (2) unaware of the road constraints. Our goal is to develop a more efficient algorithm for L₁ distance, also known as Manhattan distance, which closely reflects road network distance for metro areas. We present a simple and efficient algorithm which, given a set P of data points and a set Q of query points in the plane, returns the set of spatial skyline points in just $O (| P | \log | P |)$ time, assuming that $| Q | \leq | P |$ . This is significantly lower in complexity than the best known method. In addition to efficiency and applicability, our algorithm has another desirable property of independent computation and extensibility to $L_{\infty}$ norm distance, which naturally invites parallelism and widens applicability. Our extensive empirical results suggest that our algorithm outperforms the state-of-the-art approaches by orders of magnitude. We also present efficient algorithms that report the changes of the skyline points when single or multiple query points move along the x- or y-axis.

Introduction

Skyline queries have gained attention [1], [2], [3], [4], [5] because of their ability to retrieve “desirable” objects that are not worse than any other object in the database. Recently, these queries have been applied to spatial data, as we illustrate with the example below.

Consider a hotel search scenario for a conference trip to Minneapolis, where the user marks two locations of interest, e.g., the conference venue and an airport, as Fig. 1(a) illustrates. Given these two query locations, one option is to identify hotels that are close to both locations. When considering the Euclidean distance, we can say that hotel H5, located in the middle of the two query points, is more desirable than H4, i.e., H5 “dominates” H4. The goal is to narrow down the choice of hotels to a few desirable hotels that are not dominated by any other objects, i.e., no other object is closer to all the given query points simultaneously.

However, as Fig. 1(b) shows, considering these query and data points on the map, the Euclidean distance, quantifying the length of the line segment between H5 and the query points, does not consider the road constraints and thus severely underestimates the actual distance.

Going back to Fig. 1(a), we can now assume that the dotted lines represent the underlying road network and revisit the problem to identify desirable objects with respect to L₁ distance. In this new problem, H4 and H5 are equally desirable, as both are three blocks away from the conference venue and two blocks from the airport.
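The dominance relation in this example can be made concrete with a brute-force sketch. This is not the paper's $O (| P | \log | P |)$ algorithm; it simply tests every pair of points. It assumes the usual skyline definition (a point dominates another if it is no farther from every query point and strictly closer to at least one), since Definition 1 is not reproduced in this excerpt. The coordinates are made up so that H4 and H5 are both three blocks from the venue and two from the airport; the names `l1`, `dominates`, and `manhattan_skyline` are hypothetical.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def l1(a: Point, b: Point) -> float:
    """Manhattan (L1) distance between two points in the plane."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def dominates(p: Point, other: Point, queries: List[Point]) -> bool:
    """Assumed definition: p is no farther than `other` from every query
    point and strictly closer to at least one of them."""
    d_p = [l1(p, q) for q in queries]
    d_o = [l1(other, q) for q in queries]
    no_farther = all(a <= b for a, b in zip(d_p, d_o))
    strictly_closer = any(a < b for a, b in zip(d_p, d_o))
    return no_farther and strictly_closer


def manhattan_skyline(points: List[Point], queries: List[Point]) -> List[Point]:
    """Brute force, O(|P|^2 |Q|): keep the points that nothing dominates."""
    return [
        p for p in points
        if not any(dominates(o, p, queries) for o in points if o != p)
    ]


# Hypothetical coordinates, chosen only to mimic the example in the text.
venue, airport = (0, 0), (3, 2)
hotels: Dict[str, Point] = {"H1": (6, 4), "H4": (3, 0), "H5": (2, 1)}

skyline = manhattan_skyline(list(hotels.values()), [venue, airport])
names = [name for name, xy in hotels.items() if xy in skyline]
print(names)
```

With these coordinates, H4 and H5 tie on both distances, so neither dominates the other, while H1 is farther from both query points and drops out.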

In general, the Manhattan distance, or L₁ distance, reflects actual road network distances well for well-connected metro areas such as Pasadena and Ontario (Fig. 2) in California. The experimental results for real road networks, summarized in Table 1, support this claim.¹ In the experiment, we repeated the following 1000 times for each network. We chose a node randomly and constructed two sorted lists of the nodes of the network, one in the ascending order of network distance and the other in the ascending order of L₁ distance from the chosen node. Then we counted the number of inversions between the two lists. Table 1 shows the average inversion ratio of each road network, which is less than 7%. For Pasadena and Ontario, the inversion ratios are even less than 5%.
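The inversion count between the two sorted lists can be computed as in the following sketch. It assumes that the inversion ratio is the number of inverted pairs divided by the total number of pairs, $n (n - 1) / 2$; the excerpt does not state the normalization, so this is an assumption, and the function names are hypothetical.

```python
from typing import Hashable, List, Sequence


def count_inversions(seq: List[int]) -> int:
    """Count pairs i < j with seq[i] > seq[j] by merge sort, O(n log n).
    Sorts `seq` in place as a side effect."""
    if len(seq) <= 1:
        return 0
    mid = len(seq) // 2
    left, right = seq[:mid], seq[mid:]
    inversions = count_inversions(left) + count_inversions(right)
    i = j = k = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            seq[k] = left[i]
            i += 1
        else:
            # right[j] precedes every remaining element of `left`.
            seq[k] = right[j]
            j += 1
            inversions += len(left) - i
        k += 1
    seq[k:] = left[i:] + right[j:]
    return inversions


def inversion_ratio(order_a: Sequence[Hashable],
                    order_b: Sequence[Hashable]) -> float:
    """Fraction of node pairs ordered differently in the two lists
    (assumed normalization: divide by n * (n - 1) / 2)."""
    rank = {node: i for i, node in enumerate(order_a)}
    perm = [rank[node] for node in order_b]
    n = len(perm)
    if n < 2:
        return 0.0
    return count_inversions(perm) / (n * (n - 1) / 2)


# Toy lists: nodes sorted by network distance and by L1 distance.
by_network = ["a", "b", "c", "d"]
by_l1 = ["a", "b", "d", "c"]
print(inversion_ratio(by_network, by_l1))
```

In the toy lists only the pair (c, d) is swapped, so one of the six pairs is inverted.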

Skyline queries have been actively studied for Euclidean distance [6], [7], [8], [9]. Given a set P of data points and a set Q of query points in the plane, the most efficient algorithm known so far has the time complexity of $O (| P | (| S | \log | CH (Q) | + \log | P |))$ [8], [9]. Here S denotes the set of spatial skyline points, and $CH (Q)$ denotes the standard convex hull of Q in the underlying metric. These algorithms are based on a geometric interpretation of spatial dominance of a point p over another point $p'$ : p is not spatially dominated by $p'$ if and only if there is at least one query point in the side of the bisecting line of p and $p'$ that contains p. From this observation, they showed that every data point p lying in $CH (Q)$ is a skyline point, because there is at least one query point in the side of the line bisecting p and any other data point that contains p. They also showed, using a similar argument, that a site of the Voronoi diagram of P is a skyline point if its Voronoi cell makes nonempty intersection with $CH (Q)$ .

The geometric interpretation of spatial dominance also holds for L₁, because the bisecting line of two points p and $p'$ in L₁ norm distance is the set of points at equidistance from p and $p'$ , and therefore there is at least one query point in the side of the line containing p if and only if p is not spatially dominated by $p'$ . This implies that (a) every data point p lying in the orthogonal convex hull of Q is a skyline point and (b) a site of the Voronoi diagram of P in L₁ metric is a skyline point if its Voronoi cell makes nonempty intersection with the convex hull. Therefore, we can compute a “subset” of the spatial skyline points by constructing the convex hull and the Voronoi diagram, which can be done in $O (| Q | \log | Q |)$ time and $O (| P | \log | P |)$ time, respectively.

However, Fig. 3 shows that there are still some skyline points not belonging to the two cases above. For example, p₂ is skyline, because none of the other points dominates it. But p₂ is not contained in the orthogonal convex hull of queries and its Voronoi cell (gray region) does not intersect the orthogonal convex hull of Q. This example suggests that we need not only to maintain the subset of skyline points for cases (a) and (b), but also to check whether the remaining data points are skyline or not. This takes $O (| P | | S | \log | CH (Q) |)$ time, which is exactly the same as the total time complexity required for Euclidean distance.

In a clear contrast, we develop a simple and efficient algorithm that computes skyline points in just $O (| P | \log | P |)$ time for L₁ metric, assuming $| Q | \leq | P |$ . Our extensive empirical results suggest that our algorithm outperforms the state-of-the-art algorithms in spatial and general skyline problems significantly. Our contributions can be summarized as follows:

• We study the Manhattan Spatial Skyline Queries (MSSQ) problem, which arises in advanced query semantics, such as ranking and skyline queries of massive spatial datasets. We show that a straightforward extension of the existing algorithm under L₂ distance is inefficient for our problem, and present a simple and efficient algorithm that computes skyline points in just $O (| P | \log | P |)$ time.
• We also propose an algorithm for MSSQ when query points move either vertically or horizontally. Our algorithm runs in $O (| P | \log | P |)$ time when only one query point moves and in $O (| P |^{2} | Q |)$ time when more than one query point moves.
• We show that our algorithm can easily be parallelized by computing each skyline point independently. Our algorithm also extends straightforwardly to the Chebyshev distance, also known as $L_{\infty}$ distance, which is used extensively for spatial logistics in warehouses [10].
• We evaluate our framework using synthetic data and show that our algorithms are faster by orders of magnitude than the current state-of-the-art approaches.
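One standard way to see why an L₁ method carries over to $L_{\infty}$ in the plane is the change of coordinates $(x, y) \mapsto (x + y, x - y)$, under which the $L_{\infty}$ distance between two points equals half the L₁ distance between their images. The excerpt does not show how the paper itself performs the extension, so the sketch below only checks this identity; the function names are hypothetical.

```python
from typing import Tuple

Point = Tuple[float, float]


def l1(a: Point, b: Point) -> float:
    """Manhattan (L1) distance."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def linf(a: Point, b: Point) -> float:
    """Chebyshev (L-infinity) distance."""
    return max(abs(a[0] - b[0]), abs(a[1] - b[1]))


def rotate(p: Point) -> Point:
    """Map (x, y) to (x + y, x - y): a 45-degree rotation with scaling."""
    x, y = p
    return (x + y, x - y)


# Identity: 2 * linf(a, b) == l1(rotate(a), rotate(b)).
a, b = (1, 2), (4, -3)
print(linf(a, b), l1(rotate(a), rotate(b)))
```

Because the factor of two is the same for every pair of points, comparisons between distances (and therefore dominance) are unchanged, so an L₁ skyline computed on the transformed data and query points corresponds to the $L_{\infty}$ skyline of the originals.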

Section snippets

Related work

This section provides a brief survey of work related to spatial query processing. Skyline queries were introduced in the context of finding the maximum vectors [1]. Since then they have been studied in database applications, both in the course of enhancing the efficiency of computation [2], [3], [11], [4], [5], [12] and in the course of enhancing the quality of results [13], [14], [15], by narrowing down skyline results using properties such as frequency, k-dominance, and k-representativeness of

Problem definition

In the spatial skyline query problem, we are given two point sets: a set P of data points and a set Q of query points in the plane, assuming that $| Q | \leq | P |$ . In general, the purpose of querying on a data set is to extract a subset of the data set with respect to the query set and the query set behaves as a set of constraints which each skyline point must satisfy. In many practical situations, the size of constraints is much smaller than the size of data under consideration, and therefore the

Observation

The basic idea of our algorithm is as follows. To determine whether $p \in P$ is skyline or not, the approach under L₂ distance performs dominance tests with the current skyline points (which we later discuss in detail, denoted as baseline algorithm PSQ, in Section 7).

Under L₁ distance, we use a different approach in which we check the existence of a point that dominates p. To do this, we introduce another definition (below) on spatial dominance between two points which is equivalent to Definition 1.

Algorithm

In this section, we show how to handle each of the three cases efficiently so as to achieve an $O (\log | P |)$ time algorithm for determining whether a data point is skyline or not.

Tracing moving query points

In this section, we introduce a variation of the MSSQ problem, where data points are fixed and each query point $q_{i}$ moves either vertically or horizontally at unit speed. More precisely, for a nonnegative real number t, let $q_{i} (t)$ denote the translation of $q_{i}$ at time t, that is, $q_{i} (t) := q_{i} + (t, 0)$ (or $q_{i} (t) := q_{i} - (t, 0)$ ) if $q_{i}$ moves along a horizontal line, and $q_{i} (t) := q_{i} + (0, t)$ (or $q_{i} (t) := q_{i} - (0, t)$ ) if $q_{i}$ moves along a vertical line. Let $Q (t)$ be the set of query points at time t, and let $R (p, t)$ be R(p)

Implementation

In our implementation of MSSQ, an R-tree is used to efficiently prune out nonskyline points from P. More specifically, we first find a range bounding Q and read a constant number of points in this region from the R-tree. For each such point p, we identify the bounding box for $\cup_{i = 1}^{| Q |} C (p, q_{i})$ . Any point outside of this bounding box can be safely pruned as it would be dominated by p. We intersect such bounding boxes and retrieve the points falling into this region, which can be efficiently

Experimental evaluation

In this section, we outline our experimental settings, and present evaluation results to validate the efficiency and effectiveness of our framework. We compare our algorithm (MSSQ) with PSQ and BBS. As datasets, we use both synthetic datasets and a real dataset of points of interest (POI) in California. We carry out our experiments on Linux with Intel Q6600 CPU and 3 GB memory, and the algorithms are coded in C++.

Conclusion

We have studied Manhattan spatial skyline query processing and presented an efficient algorithm. We showed that our algorithm can identify the correct result in $O (| P | \log | P |)$ time with desirable properties of easy parallelizability and extensibility.
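The parallelizability claim rests on the fact that deciding whether one data point is a skyline point only reads P and Q and never depends on the decision for another point. The sketch below illustrates that independence with a brute-force per-point test (not the paper's $O (\log | P |)$ test) and assumes the usual dominance definition. It uses a thread pool for simplicity; in CPython, a process pool would be needed for real CPU speedup. All names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple

Point = Tuple[float, float]


def l1(a: Point, b: Point) -> float:
    """Manhattan (L1) distance."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def is_skyline(p: Point, points: List[Point], queries: List[Point]) -> bool:
    """Brute-force test for a single point; reads shared data only.
    Assumed dominance: no farther from every query, closer to one."""
    d_p = [l1(p, q) for q in queries]
    for other in points:
        if other == p:
            continue
        d_o = [l1(other, q) for q in queries]
        if all(a <= b for a, b in zip(d_o, d_p)) and \
                any(a < b for a, b in zip(d_o, d_p)):
            return False
    return True


def parallel_skyline(points: List[Point], queries: List[Point],
                     workers: int = 4) -> List[Point]:
    """Run the independent per-point tests concurrently."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        flags = list(pool.map(lambda p: is_skyline(p, points, queries),
                              points))
    return [p for p, keep in zip(points, flags) if keep]


# Made-up data: three data points and two query points.
data = [(6, 4), (3, 0), (2, 1)]
queries = [(0, 0), (3, 2)]
print(parallel_skyline(data, queries))
```

Since `pool.map` preserves input order, the result lists the surviving points in the same order as the input.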

We also propose an algorithm for spatial skyline queries when query points move either vertically or horizontally. Our algorithm runs in $O (| P | \log | P |)$ time when only one query point moves and in $O (| P |^{2} | Q |)$ time when more than one query point moves.

References (25)

H.T. Kung et al., On finding the maxima of a set of vectors, J. Assoc. Comput. Mach. (1975)
S. Börzsönyi, D. Kossmann, K. Stocker, The skyline operator, in: ICDE '01: Proceedings of the 17th International...
K. Tan, P. Eng, B.C. Ooi, Efficient progressive skyline computation, in: VLDB '01: Proceedings of the 27th...
D. Papadias, Y. Tao, G. Fu, B. Seeger, An optimal and progressive algorithm for skyline queries, in: SIGMOD '03:...
J. Chomicki, P. Godfrey, J. Gryz, D. Liang, Skyline with presorting, in: ICDE '03: Proceedings of the 19th...
M. Sharifzadeh, C. Shahabi, The spatial skyline queries, in: VLDB '06: Proceedings of the 32nd International Conference...
M. Sharifzadeh et al., Processing spatial skyline queries in both vector spaces and spatial network databases, ACM Trans. Database Syst. (2009)
W. Son, M.-W. Lee, H.-K. Ahn, S.-w. Hwang, Spatial skyline queries: an efficient geometric algorithm, in: SSTD '09:...
M.-W. Lee et al., Spatial skyline queries: exact and approximation algorithms, GeoInformatica (2011)
G. Cormier, Operational research methods for efficient warehousing
D. Kossmann, F. Ramsak, S. Rost, Shooting stars in the sky: an online algorithm for skyline queries, in: VLDB '02:...
P. Godfrey, R. Shipley, J. Gryz, Maximal vector computation in large data sets, in: VLDB '05: Proceedings of the 31st...


☆ Work by Son and Ahn was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIP) (No. 2011-0030044). Work by Hwang was supported by Microsoft Research Asia.
