Improving the MHIST-p Algorithm for Multivariate Histograms of Continuous Data

Iacono, Mauro; Irpino, Antonio

doi:10.1007/978-3-642-13312-1_15

Part of the book series: Studies in Classification, Data Analysis, and Knowledge Organization (STUDIES CLASS)

Abstract

In many applications, ranging from OLAP databases to query optimization, an approximate distribution of the values in a data set can save considerable time and resources during computation. Histograms are a good solution, balancing computation cost against accuracy. Multidimensional data require more elaborate handling to keep both cost and accuracy within useful bounds. In this paper we propose an improvement of the MHIST-p algorithm for generating multidimensional histograms and compare it with other approaches from the literature.
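As a rough illustration of the cost/accuracy trade-off mentioned in the abstract, the sketch below summarizes two-dimensional points with a plain equi-width grid histogram and uses it to estimate how many points fall inside a range query. This is not MHIST-p and not the improvement proposed in the chapter; it is the simplest grid-based baseline, and every function name in it is hypothetical. It assumes the data lie in the unit square and that points are uniformly spread inside each bucket.

```python
import random
from collections import Counter


def build_grid_histogram(points, bins, lo=0.0, hi=1.0):
    """Count points per cell of a bins x bins equi-width grid over [lo, hi)^2."""
    width = (hi - lo) / bins
    counts = Counter()
    for x, y in points:
        # Clamp so that a coordinate equal to hi lands in the last cell.
        i = min(int((x - lo) / width), bins - 1)
        j = min(int((y - lo) / width), bins - 1)
        counts[(i, j)] += 1
    return counts, width


def overlap(a_lo, a_hi, b_lo, b_hi):
    """Length of the intersection of intervals [a_lo, a_hi) and [b_lo, b_hi)."""
    return max(0.0, min(a_hi, b_hi) - max(a_lo, b_lo))


def estimate_range_count(counts, width, query, lo=0.0):
    """Estimate the number of points in query = (x_lo, x_hi, y_lo, y_hi).

    Each bucket contributes its count scaled by the fraction of its area
    covered by the query (uniformity assumption inside the bucket).
    """
    x_lo, x_hi, y_lo, y_hi = query
    total = 0.0
    for (i, j), count in counts.items():
        cell_x = lo + i * width
        cell_y = lo + j * width
        covered = (overlap(cell_x, cell_x + width, x_lo, x_hi)
                   * overlap(cell_y, cell_y + width, y_lo, y_hi))
        total += count * covered / (width * width)
    return total


def exact_range_count(points, query):
    """Scan all points; the costly operation the histogram tries to avoid."""
    x_lo, x_hi, y_lo, y_hi = query
    return sum(1 for x, y in points if x_lo <= x < x_hi and y_lo <= y < y_hi)


if __name__ == "__main__":
    random.seed(0)
    data = [(random.random(), random.random()) for _ in range(10_000)]
    hist, cell_width = build_grid_histogram(data, bins=8)
    box = (0.1, 0.45, 0.3, 0.9)
    print("estimate:", estimate_range_count(hist, cell_width, box))
    print("exact:   ", exact_range_count(data, box))
```

The estimate touches at most `bins * bins` buckets instead of every point, and its error comes from buckets that the query covers only partially. More buckets reduce that error at a higher storage and construction cost, and in higher dimensions a regular grid grows exponentially, which is one reason data-driven partitioning schemes are studied.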

Author information

Authors and Affiliations

DEM, Seconda Università degli Studi di Napoli, Belvedere Reale di San Leucio, Caserta, Italy
Mauro Iacono & Antonio Irpino

Corresponding author

Correspondence to Mauro Iacono.

Editor information

Editors and Affiliations

Laboratoire d'Informatique, Université d'Aix-Marseille II, Avenue de Luminy 163, case 901, Marseille cedex 9, 13288, France
Bernard Fichet
Dipartimento di Scienze Statistiche, Università di Napoli "Federico II", Via Leopoldo Rodinò 22, Naples, 80138, Italy
Domenico Piccolo
Facoltà di Studi Politici "Jean Monnet", Seconda Università di Napoli, Via del Setificio 15, Caserta, 81100, Italy
Rosanna Verde
Facoltà di Scienze Statistiche, Università di Roma "La Sapienza", P.le Aldo Moro 5, Rome, 00185, Italy
Maurizio Vichi

About this paper

Cite this paper

Iacono, M., Irpino, A. (2011). Improving the MHIST-p Algorithm for Multivariate Histograms of Continuous Data. In: Fichet, B., Piccolo, D., Verde, R., Vichi, M. (eds) Classification and Multivariate Analysis for Complex Data Structures. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-13312-1_15

DOI: https://doi.org/10.1007/978-3-642-13312-1_15
Published: 08 November 2010
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-13311-4
Online ISBN: 978-3-642-13312-1
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)
