
Improving the MHIST-p Algorithm for Multivariate Histograms of Continuous Data

  • Conference paper
Classification and Multivariate Analysis for Complex Data Structures

Abstract

In many applications, ranging from OLAP databases to query optimization, an approximation of the distribution of values in a data set can save considerable time or resources during computation. Histograms are a good solution, offering a favorable balance between computational cost and accuracy. Multidimensional data require more elaborate handling to keep both cost and accuracy within useful bounds. In this paper we propose an improvement of the MHIST-p algorithm for generating multidimensional histograms and compare it with other approaches from the literature.
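The abstract does not describe the proposed improvement, so the sketch below only illustrates the general idea behind the MHIST family of histograms (introduced by Poosala and Ioannidis at VLDB 1997), not the authors' method. It starts from one bucket that holds every point, repeatedly picks the bucket and dimension that most need splitting, and cuts that bucket into at most p parts until a bucket budget is reached. It then estimates range counts by assuming points are spread uniformly inside each bucket. As a simplifying assumption, the split criterion is the widest gap between consecutive distinct values along one dimension, a stand-in for the MaxDiff-style partition constraints of the original MHIST. The names `Bucket`, `mhist_p` and `estimate_count` are hypothetical.

```python
import numpy as np


class Bucket:
    """Axis-aligned bucket: point count plus the tight bounding box of its points."""

    def __init__(self, points):
        self.points = points            # raw points, kept only while building
        self.count = len(points)
        self.lo = points.min(axis=0)
        self.hi = points.max(axis=0)

    def best_split(self, q):
        """Return (score, dim, cuts) for splitting into at most q parts.

        Score = widest gap between consecutive distinct values in one dimension
        (a simplified stand-in for a MaxDiff-style criterion); dim is None if
        the bucket cannot be split.
        """
        best = (0.0, None, None)
        for d in range(self.points.shape[1]):
            vals = np.unique(self.points[:, d])          # sorted distinct values
            if len(vals) < 2:
                continue
            gaps = np.diff(vals)
            k = min(q - 1, len(gaps))                     # number of cut points
            idx = np.sort(np.argsort(gaps)[-k:])          # k widest gaps, in value order
            score = gaps[idx].max()
            if score > best[0]:
                cuts = (vals[idx] + vals[idx + 1]) / 2.0  # cut in the middle of each gap
                best = (score, d, cuts)
        return best


def mhist_p(points, max_buckets, p=2):
    """Greedy MHIST-p-style partitioning of an (n, d) array into <= max_buckets buckets."""
    if p < 2:
        raise ValueError("p must be at least 2")
    buckets = [Bucket(np.asarray(points, dtype=float))]
    while len(buckets) < max_buckets:
        # never create more parts than the remaining bucket budget allows
        q = min(p, max_buckets - len(buckets) + 1)
        scored = [(b.best_split(q), i) for i, b in enumerate(buckets)]
        (score, dim, cuts), i = max(scored, key=lambda s: s[0][0])
        if dim is None:                                   # no bucket has two distinct values
            break
        b = buckets.pop(i)
        part = np.searchsorted(cuts, b.points[:, dim])    # part index 0..len(cuts)
        for j in range(len(cuts) + 1):
            sub = b.points[part == j]
            if len(sub):
                buckets.append(Bucket(sub))
    for b in buckets:
        b.points = None                                   # keep only the summary
    return buckets


def estimate_count(buckets, q_lo, q_hi):
    """Approximate number of points in the box [q_lo, q_hi], assuming uniform spread per bucket."""
    q_lo, q_hi = np.asarray(q_lo, float), np.asarray(q_hi, float)
    total = 0.0
    for b in buckets:
        frac = 1.0
        for d in range(len(b.lo)):
            lo, hi = max(b.lo[d], q_lo[d]), min(b.hi[d], q_hi[d])
            if lo > hi:                                   # no overlap in this dimension
                frac = 0.0
                break
            width = b.hi[d] - b.lo[d]
            frac *= (hi - lo) / width if width > 0 else 1.0
        total += frac * b.count
    return total


rng = np.random.default_rng(0)
data = np.vstack([rng.normal(0, 1, (500, 2)), rng.normal(6, 1, (500, 2))])
hist = mhist_p(data, max_buckets=8, p=3)
print(len(hist), sum(b.count for b in hist))   # 8 1000
print(estimate_count(hist, [-1, -1], [1, 1]))  # approximate count near the first cluster's centre
```

Gaps are compared in raw units, so in this sketch the dimensions should be on comparable scales (for example, standardized) before building. Because a p-way split adds up to p - 1 buckets per step, the last split is capped so that the histogram never exceeds the bucket budget.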



Author information

Corresponding author

Correspondence to Mauro Iacono.


Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Iacono, M., Irpino, A. (2011). Improving the MHIST-p Algorithm for Multivariate Histograms of Continuous Data. In: Fichet, B., Piccolo, D., Verde, R., Vichi, M. (eds) Classification and Multivariate Analysis for Complex Data Structures. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-13312-1_15
