Abstract
In many applications, ranging from OLAP databases to query optimization, an approximate distribution of the values in a data set allows substantial savings of time or resources during computation. Histograms are a good solution, offering a good balance between computational cost and accuracy. Multidimensional data require more sophisticated handling to keep both cost and accuracy within useful bounds. In this paper we propose an improvement of the MHIST-p algorithm for generating multidimensional histograms and compare it with other approaches from the literature.
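To make the cost and accuracy trade-off concrete, the following is a minimal sketch of how a two-dimensional histogram can stand in for the full data when answering a range-count query. It is not the MHIST-p algorithm or the improvement proposed in the paper: it uses a plain equi-width grid and assumes that points are spread uniformly inside each bucket. The function names (`build_histogram`, `estimate_count`) and the square domain `[lo, hi]` are hypothetical choices made only for this illustration.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Query = Tuple[float, float, float, float]  # (x_lo, x_hi, y_lo, y_hi)


def build_histogram(points: Sequence[Point], bins: int,
                    lo: float, hi: float) -> List[List[int]]:
    """Count points per cell of a bins x bins equi-width grid.

    Assumption for this sketch: every point lies in [lo, hi] on both axes.
    """
    width = (hi - lo) / bins
    counts = [[0] * bins for _ in range(bins)]
    for x, y in points:
        # Clamp so that a coordinate equal to hi falls into the last bucket.
        i = min(int((x - lo) / width), bins - 1)
        j = min(int((y - lo) / width), bins - 1)
        counts[i][j] += 1
    return counts


def _overlap(a_lo: float, a_hi: float, b_lo: float, b_hi: float) -> float:
    """Length of the intersection of intervals [a_lo, a_hi] and [b_lo, b_hi]."""
    return max(0.0, min(a_hi, b_hi) - max(a_lo, b_lo))


def estimate_count(counts: List[List[int]], lo: float, hi: float,
                   query: Query) -> float:
    """Estimate how many points fall inside the query rectangle.

    Each bucket contributes its count scaled by the fraction of its area
    covered by the query (uniform-within-bucket assumption).
    """
    bins = len(counts)
    width = (hi - lo) / bins
    x_lo, x_hi, y_lo, y_hi = query
    total = 0.0
    for i in range(bins):
        fx = _overlap(lo + i * width, lo + (i + 1) * width, x_lo, x_hi) / width
        if fx == 0.0:
            continue  # this column of buckets does not intersect the query
        for j in range(bins):
            fy = _overlap(lo + j * width, lo + (j + 1) * width, y_lo, y_hi) / width
            total += counts[i][j] * fx * fy
    return total
```

The estimate is exact when the query is aligned with bucket boundaries and approximate otherwise, which is where the choice of partitioning matters. Once the histogram is built, the cost of a query depends on the number of buckets rather than on the number of points.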
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Iacono, M., Irpino, A. (2011). Improving the MHIST-p Algorithm for Multivariate Histograms of Continuous Data. In: Fichet, B., Piccolo, D., Verde, R., Vichi, M. (eds) Classification and Multivariate Analysis for Complex Data Structures. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-13312-1_15
DOI: https://doi.org/10.1007/978-3-642-13312-1_15
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-13311-4
Online ISBN: 978-3-642-13312-1
eBook Packages: Mathematics and Statistics (R0)