TuckerMPI: A Parallel C++/MPI Software Package for Large-scale Data Compression via the Tucker Tensor Decomposition

Authors:
Grey Ballard, Wake Forest University, Winston-Salem, NC (ORCID: 0000-0003-1557-8027)
Alicia Klinvex, Sandia National Laboratories, Livermore, CA
Tamara G. Kolda, Sandia National Laboratories, Livermore, CA

ACM Transactions on Mathematical Software, Volume 46, Issue 2, Article No. 13, pp. 1–31. https://doi.org/10.1145/3378445

Published: 01 June 2020

Abstract

Our goal is compression of massive-scale grid-structured data, such as the multi-terabyte output of a high-fidelity computational simulation. For such data sets, we have developed TuckerMPI, a parallel C++/MPI software package for compressing distributed data. The approach is based on treating the data as a tensor, i.e., a multidimensional array, and computing its truncated Tucker decomposition, a higher-order analogue to the truncated singular value decomposition of a matrix. The result is a low-rank approximation of the original tensor-structured data. Compression efficiency is achieved by detecting latent global structure within the data, in contrast to most compression methods, which focus on local structure. In this work, we describe TuckerMPI, our implementation of the truncated Tucker decomposition, including details of the data distribution and in-memory layouts, the parallel and serial implementations of the key kernels, and analysis of the storage, communication, and computational costs. We test the software on 4.5 and 6.7 terabyte data sets distributed across hundreds of nodes (thousands of MPI processes), achieving compression ratios between 100× and 200,000×, which equates to 99% to 99.999% compression (depending on the desired accuracy), in substantially less time than it would take to even read the same data set from a parallel file system. Moreover, we show that our method also allows for reconstruction of partial or down-sampled data on a single node, without a parallel computer, so long as the reconstructed portion is small enough to fit on a single machine, e.g., when reconstructing or visualizing a single down-sampled time step or computing summary statistics. The code is available at https://gitlab.com/tensors/TuckerMPI.
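
The storage and reconstruction claims in the abstract can be made concrete with a small sketch. It is not taken from TuckerMPI (which is C++/MPI); it is a plain-Python illustration under the standard Tucker model, in which an N-way tensor of size I_1 × ... × I_N is approximated by a core of size R_1 × ... × R_N plus one I_n × R_n factor matrix per mode. All function names (`tucker_storage`, `compression_ratio`, `percent_reduction`, `reconstruct_entry`) and the example sizes are hypothetical, chosen only for illustration.

```python
from math import prod


def tucker_storage(dims, ranks):
    """Values stored by a Tucker approximation: core entries plus factor matrices."""
    if len(dims) != len(ranks):
        raise ValueError("dims and ranks must have the same length")
    core_entries = prod(ranks)
    factor_entries = sum(i * r for i, r in zip(dims, ranks))
    return core_entries + factor_entries


def compression_ratio(dims, ranks):
    """Full tensor size divided by the size of its Tucker representation."""
    return prod(dims) / tucker_storage(dims, ranks)


def percent_reduction(ratio):
    """Storage saved, in percent, for a given compression ratio."""
    return 100.0 * (1.0 - 1.0 / ratio)


def reconstruct_entry(core, u1, u2, u3, i, j, k):
    """Rebuild one entry of a 3-way tensor from its Tucker factors.

    core is a nested list of shape R1 x R2 x R3; u1, u2, u3 are nested
    lists of shapes I1 x R1, I2 x R2, I3 x R3. Only row i of u1, row j
    of u2, and row k of u3 are touched, so the full tensor is never formed.
    """
    return sum(
        core[a][b][c] * u1[i][a] * u2[j][b] * u3[k][c]
        for a in range(len(core))
        for b in range(len(core[0]))
        for c in range(len(core[0][0]))
    )


if __name__ == "__main__":
    # Illustrative sizes only; these are not the data sets from the paper.
    dims, ranks = (100, 100, 100), (10, 10, 10)
    print(tucker_storage(dims, ranks))
    print(compression_ratio(dims, ranks))
    print(percent_reduction(compression_ratio(dims, ranks)))
```

For the illustrative 100 × 100 × 100 tensor with ranks 10 × 10 × 10, the representation holds 4,000 values in place of 1,000,000, a ratio of 250×. Applying the same percentage formula to the ratios quoted in the abstract, 100× corresponds to 99% savings and 200,000× to roughly 99.9995%. The entry-wise reconstruction reads only one row of each factor matrix, which is why a small slice or a down-sampled subset can be rebuilt on a single machine.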

Index Terms

1. Mathematics of computing
  1. Mathematical analysis
    1. Numerical analysis
      1. Computations on matrices
  2. Mathematical software
    1. Mathematical software performance

Published in
ACM Transactions on Mathematical Software, Volume 46, Issue 2
June 2020
274 pages
ISSN: 0098-3500
EISSN: 1557-7295
DOI: 10.1145/3401021
Editors:
Zhaojun Bai, University of California at Davis, USA
Wolfgang Bangerth, Colorado State University, USA
Copyright © 2020 ACM
Publication rights licensed to ACM. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of the United States government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.
Publisher
Association for Computing Machinery, New York, NY, United States
Publication History
- Published: 1 June 2020
- Online AM: 7 May 2020
- Revised: 1 January 2020
- Accepted: 1 January 2020
- Received: 1 January 2019

Author Tags
Tucker decomposition
higher-order singular value decomposition (HOSVD)
tensor decomposition
Qualifiers
- research-article
- Research
- Refereed