Abstract
Order preserving estimation is an estimation method that retains the original order of the population parameters of interest. It is an important tool in many applications, such as data visualization. In this paper, we focus on the population mean as our primary estimation function and propose an effective query processing strategy that preserves the correct order of the estimates with probabilistic guarantees. We define the cost function as the total number of samples taken across all groups, and our goal is to make this sample size as small as possible. We compare our methods with the state-of-the-art near-optimal algorithm in the literature and achieve up to \(80\,\%\) reduction in the total sample size.
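To make the problem setting concrete, the following is a minimal sketch of a generic confidence-interval baseline. It is not the strategy proposed in this paper. It assumes that all values lie in a known range (`value_range`), uses Hoeffding's inequality with a union bound over groups and sample counts, and keeps sampling only the groups whose intervals still overlap another group's interval. All function and parameter names are hypothetical.

```python
import math
import random


def half_width(n, k, delta, value_range):
    """Hoeffding half-width for the mean of n samples.

    Assumption: the failure budget delta is split over k groups and over
    every possible sample count n (the terms 1/(n(n+1)) sum to 1).
    """
    per_test = delta / (k * n * (n + 1))
    return value_range * math.sqrt(math.log(2.0 / per_test) / (2.0 * n))


def order_preserving_estimate(groups, delta=0.05, batch=50,
                              value_range=1.0, seed=0):
    """Return (order, means, total_samples) for non-empty groups."""
    rng = random.Random(seed)
    k = len(groups)
    # Progressive sampling without replacement: shuffle once, read prefixes.
    shuffled = [rng.sample(g, len(g)) for g in groups]
    taken = [0] * k
    sums = [0.0] * k
    means = [0.0] * k
    eps = [math.inf] * k  # nothing is known before the first batch

    def overlaps(i):
        # True if the interval of group i intersects any other interval.
        return any(j != i and abs(means[i] - means[j]) <= eps[i] + eps[j]
                   for j in range(k))

    while True:
        active = [i for i in range(k)
                  if taken[i] < len(groups[i]) and overlaps(i)]
        if not active:
            break  # every pair is separated, or the data is exhausted
        for i in active:
            new = shuffled[i][taken[i]:taken[i] + batch]
            sums[i] += sum(new)
            taken[i] += len(new)
            means[i] = sums[i] / taken[i]
            # A fully scanned group has an exact mean.
            exhausted = taken[i] == len(groups[i])
            eps[i] = 0.0 if exhausted else half_width(
                taken[i], k, delta, value_range)

    order = sorted(range(k), key=lambda i: means[i])  # ascending by mean
    return order, means, sum(taken)
```

The returned `total_samples` corresponds to the cost function described above: the number of samples taken over all groups. Groups whose means are far apart separate after a few batches, while groups with close means dominate the cost.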
Notes
1. The order is induced by the progressive sampling-without-replacement process, i.e., \(Rank(X_i) < Rank(X_j)\) if \(i < j\).
2. As before, in the rest of Sect. 4 we use \(g_i\) to denote the group with rank \(i\) in a candidate order, to keep the notation clean.
Copyright information
© 2016 Springer International Publishing AG
About this paper
Cite this paper
Chen, C., Wang, W., Wang, X., Yang, S. (2016). Effective Order Preserving Estimation Method. In: Cheema, M., Zhang, W., Chang, L. (eds) Databases Theory and Applications. ADC 2016. Lecture Notes in Computer Science, vol 9877. Springer, Cham. https://doi.org/10.1007/978-3-319-46922-5_29
DOI: https://doi.org/10.1007/978-3-319-46922-5_29
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-46921-8
Online ISBN: 978-3-319-46922-5
eBook Packages: Computer Science, Computer Science (R0)