DOI: 10.1145/1066677.1066793
Article

A hybrid approach for multiresolution modeling of large-scale scientific data

Published: 13 March 2005

ABSTRACT

Simulations of complex scientific phenomena involve the execution of massively parallel computer programs. These simulation programs generate large-scale multidimensional data sets over a spatio-temporal region. Analyzing such massive data sets is an essential step in helping scientists glean new information. To this end, efficient and effective data models are needed. In this paper, we present a hybrid approach for constructing data models from large-scale multidimensional scientific data sets. Our models not only provide descriptive information about the data but also allow users to subsequently examine the data by querying the data models. Our approach combines a multiresolution-topological model of the data with a multivariate-physical model of the data to generate one hierarchical data model that efficiently captures both the spatio-temporal and the physical aspects of the data. In particular, this hybrid approach consists of three phases. In the first phase, we build a multiresolution model that encapsulates the data set's spatial information (i.e., topology and spatial connectivity). In the second phase, we build a multivariate model from the physical dimensions of the data set. Physical dimensions refer to those dimensions that are neither spatial (x, y, z) nor temporal (time). The exclusion of the spatio-temporal dimensions from this clustering phase is important since "similar" characteristics could be located (spatially) far from each other. Finally, in the third phase, we connect the multivariate-physical model to the multiresolution-topological model by utilizing ideas from information retrieval. The third phase is essential since the multivariate-physical model does not contain any topological information, without which the model lacks accurate spatial context. Experimental evaluations on two large-scale multidimensional scientific data sets illustrate the value of our hybrid approach.
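
The three phases described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not name the spatial structure, the clustering algorithm, or the information-retrieval technique, so the quadtree, the plain k-means, and the per-node label histogram (a term-frequency vector with cluster labels as "terms") are all assumptions chosen for brevity. The data is restricted to two spatial dimensions with no time axis, and every function name and sample value is hypothetical.

```python
import math
from collections import Counter

def build_tree(idx, pts, box, depth=0, max_depth=3, leaf_size=2):
    """Phase 1 (assumed quadtree): each node records the point indices it covers."""
    node = {"box": box, "idx": idx, "children": []}
    if depth < max_depth and len(idx) > leaf_size:
        x0, y0, x1, y1 = box
        xm, ym = (x0 + x1) / 2, (y0 + y1) / 2
        quadrants = ((x0, y0, xm, ym), (xm, y0, x1, ym),
                     (x0, ym, xm, y1), (xm, ym, x1, y1))
        for q in quadrants:
            # Half-open intervals: points must lie strictly below the upper bounds.
            sub = [i for i in idx
                   if q[0] <= pts[i][0] < q[2] and q[1] <= pts[i][1] < q[3]]
            if sub:
                node["children"].append(
                    build_tree(sub, pts, q, depth + 1, max_depth, leaf_size))
    return node

def kmeans(vectors, k, iters=20):
    """Phase 2 (assumed k-means): cluster on physical variables only, never on x, y."""
    centers = [list(v) for v in vectors[:k]]  # deterministic init, for the sketch only
    labels = [0] * len(vectors)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: math.dist(v, centers[c]))
                  for v in vectors]
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def annotate(node, labels):
    """Phase 3 (assumed): attach a cluster-label histogram to every spatial node."""
    node["tf"] = Counter(labels[i] for i in node["idx"])
    for child in node["children"]:
        annotate(child, labels)

def regions_with(node, label):
    """Query: descend only into nodes whose histogram contains the label."""
    if node["tf"][label] == 0:
        return []
    if not node["children"]:
        return [node["box"]]
    return [box for child in node["children"] for box in regions_with(child, label)]

# Hypothetical samples: (x, y, (temperature, pressure)).
# Points 0 and 3 are physically similar but spatially far apart.
points = [
    (0.10, 0.10, (100.0, 5.0)),
    (0.20, 0.15, (1.0, 0.5)),
    (0.30, 0.20, (1.2, 0.4)),
    (0.90, 0.90, (101.0, 5.2)),
    (0.80, 0.85, (0.9, 0.6)),
    (0.70, 0.20, (1.1, 0.5)),
]

root = build_tree(list(range(len(points))), points, (0, 0, 1, 1))
labels = kmeans([p[2] for p in points], k=2)
annotate(root, labels)
hot_regions = regions_with(root, labels[0])  # leaf boxes holding point 0's cluster
```

Because clustering ignores position, the two distant "hot" points receive the same label, and the histograms stored on the tree nodes supply the spatial context needed to locate both regions with a single top-down query.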


Published in

SAC '05: Proceedings of the 2005 ACM symposium on Applied computing
March 2005
1814 pages
ISBN: 1581139640
DOI: 10.1145/1066677

Copyright © 2005 ACM

Publisher

Association for Computing Machinery, New York, NY, United States

Acceptance Rates

Overall acceptance rate: 1,650 of 6,669 submissions, 25%