Article

A hybrid approach for multiresolution modeling of large-scale scientific data

Authors:
Tina Eliassi-Rad, Lawrence Livermore National Laboratory, Livermore, CA
Terence Critchlow, Lawrence Livermore National Laboratory, Livermore, CA

SAC '05: Proceedings of the 2005 ACM symposium on Applied computing, March 2005, Pages 511–518
https://doi.org/10.1145/1066677.1066793

Published: 13 March 2005

ABSTRACT

Simulations of complex scientific phenomena involve the execution of massively parallel computer programs. These simulation programs generate large-scale multidimensional data sets over the spatio-temporal region. Analyzing such massive data sets is an essential step in helping scientists glean new information. To this end, efficient and effective data models are needed. In this paper, we present a hybrid approach for constructing data models from large-scale multidimensional scientific data sets. Our models not only provide descriptive information about the data but also allow users to subsequently examine the data by querying the data models. Our approach combines a multiresolution-topological model of the data with a multivariate-physical model of the data to generate one hierarchical data model that efficiently captures both the spatio-temporal and the physical aspects of the data. In particular, this hybrid approach consists of three phases. In the first phase, we build a multiresolution model that encapsulates the data set's spatial information (i.e., topology and spatial connectivity). In the second phase, we build a multivariate model from the physical dimensions of the data set. Physical dimensions refer to those dimensions that are neither spatial (x, y, z) nor temporal (time). The exclusion of the spatial-temporal dimensions from the clustering phase is important since "similar" characteristics could be located (spatially) far from each other. Finally, in the third phase, we connect the multivariate-physical model to the multiresolution-topological model by utilizing ideas from information retrieval. The third phase is essential since the multivariate-physical model does not contain any topological information (without which the model does not have accurate spatial context information). Experimental evaluations on two large-scale multidimensional scientific data sets illustrate the value of our hybrid approach.
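
The three phases described in the abstract can be illustrated with a minimal sketch. Nothing below is the authors' implementation: the quadtree-style partition, the small k-means routine, the per-node cluster counts (a loose analogue of term frequencies in information retrieval), the toy 2-D data, and every name (`build_tree`, `kmeans`, `link`, `query`) are assumptions chosen only to show how a spatial hierarchy and a physical clustering might be tied together.

```python
from collections import Counter

def build_tree(points, bounds, depth=0, max_depth=2):
    """Phase 1 (assumed form): quadtree built from spatial coordinates only."""
    x0, y0, x1, y1 = bounds
    node = {"bounds": bounds, "ids": [i for i, _ in points], "children": []}
    if depth == max_depth or len(points) <= 1:
        return node
    xm, ym = (x0 + x1) / 2, (y0 + y1) / 2
    quads = [(x0, y0, xm, ym), (xm, y0, x1, ym), (x0, ym, xm, y1), (xm, ym, x1, y1)]
    for q in quads:
        # Half-open cells, so coordinates are assumed to lie in [x0, x1) x [y0, y1).
        inside = [(i, p) for i, p in points
                  if q[0] <= p[0] < q[2] and q[1] <= p[1] < q[3]]
        if inside:
            node["children"].append(build_tree(inside, q, depth + 1, max_depth))
    return node

def kmeans(vectors, k, iters=20):
    """Phase 2 (assumed form): k-means on physical variables, no x/y/z/time."""
    step = max(1, len(vectors) // k)
    centers = [list(vectors[i * step]) for i in range(k)]  # deterministic init
    labels = [0] * len(vectors)
    for _ in range(iters):
        labels = [min(range(k),
                      key=lambda c: sum((a - b) ** 2 for a, b in zip(v, centers[c])))
                  for v in vectors]
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def link(node, labels):
    """Phase 3 (assumed form): attach cluster counts to every spatial node."""
    node["counts"] = Counter(labels[i] for i in node["ids"])
    for child in node["children"]:
        link(child, labels)

def query(node, cluster, min_frac=0.8):
    """Return bounds of the coarsest nodes in which one cluster dominates."""
    if node["counts"][cluster] / len(node["ids"]) >= min_frac:
        return [node["bounds"]]
    hits = []
    for child in node["children"]:
        hits.extend(query(child, cluster, min_frac))
    return hits

# Toy data: an 8 x 8 grid whose "hot" state occupies two diagonal quadrants,
# so physically similar points are spatially far apart.
records = []
for i in range(8):
    for j in range(8):
        x, y = i / 8, j / 8
        physical = (100.0, 5.0) if (x < 0.5) == (y < 0.5) else (10.0, 1.0)
        records.append((x, y, physical))

points = [(idx, (x, y)) for idx, (x, y, _) in enumerate(records)]
tree = build_tree(points, (0.0, 0.0, 1.0, 1.0))
labels = kmeans([phys for _, _, phys in records], k=2)
link(tree, labels)
hot = labels[0]             # cluster id of the point at (0, 0)
regions = query(tree, hot)  # spatial cells dominated by that cluster
```

In this toy setup the clustering never sees coordinates, yet the query still returns spatial regions because the counts stored on each tree node supply the spatial context that the cluster labels alone lack.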

Published in
SAC '05: Proceedings of the 2005 ACM symposium on Applied computing
March 2005
1814 pages
ISBN: 1581139640
DOI: 10.1145/1066677
Conference Chair: Hisham M. Haddad, Kennesaw State University
Editor: Lorie M. Liebrock, New Mexico Institute of Mining and Technology, Socorro, NM
Program Chairs: Andrea Omicini, Alma Mater Studiorum, Università di Bologna, Italy; Roger L. Wainwright, University of Tulsa, OK
Copyright © 2005 ACM
Publisher
Association for Computing Machinery
New York, NY, United States

Author Tags
information retrieval
large-scale scientific data sets
multiresolution indices
multivariate clusters
topological models
Acceptance Rates
Overall Acceptance Rate: 1,650 of 6,669 submissions, 25%