Abstract
Our goal is compression of massive-scale grid-structured data, such as the multi-terabyte output of a high-fidelity computational simulation. For such data sets, we have developed TuckerMPI, a parallel C++/MPI software package for compressing distributed data. The approach is based on treating the data as a tensor, i.e., a multidimensional array, and computing its truncated Tucker decomposition, a higher-order analogue of the truncated singular value decomposition of a matrix. The result is a low-rank approximation of the original tensor-structured data. Compression efficiency is achieved by detecting latent global structure within the data, in contrast to most compression methods, which focus on local structure. In this work, we describe TuckerMPI, our implementation of the truncated Tucker decomposition, including details of the data distribution and in-memory layouts, the parallel and serial implementations of the key kernels, and analysis of the storage, communication, and computational costs. We test the software on 4.5 and 6.7 terabyte data sets distributed across hundreds of nodes (thousands of MPI processes), achieving compression ratios between 100 and 200,000×, which equates to 99--99.999% compression (depending on the desired accuracy), in substantially less time than it would take to even read the same data set from a parallel file system. Moreover, we show that our method also allows for reconstruction of partial or down-sampled data on a single node, without a parallel computer, so long as the reconstructed portion is small enough to fit on a single machine, e.g., when reconstructing or visualizing a single down-sampled time step or computing summary statistics. The code is available at https://gitlab.com/tensors/TuckerMPI.
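To make the relation between a compression ratio and a percentage concrete, the following minimal Python sketch counts the values stored by a Tucker approximation (one core tensor plus one factor matrix per mode) and converts the resulting ratio into a space saving. It is an illustration only and not part of TuckerMPI: the function names and the example dimensions and ranks are hypothetical, and the count assumes dense storage of the core and of every factor matrix.

```python
from math import prod


def tucker_storage(dims, ranks):
    """Values stored by a Tucker approximation (assumed dense storage).

    The core tensor has prod(ranks) entries and the factor matrix for
    mode n has dims[n] * ranks[n] entries.
    """
    core = prod(ranks)
    factors = sum(i * r for i, r in zip(dims, ranks))
    return core + factors


def compression_ratio(dims, ranks):
    """Original number of values divided by the compressed number."""
    return prod(dims) / tucker_storage(dims, ranks)


def space_saving_percent(ratio):
    """Percentage of storage removed at a given compression ratio."""
    return 100.0 * (1.0 - 1.0 / ratio)


if __name__ == "__main__":
    # Hypothetical 5-way tensor (three spatial modes, variables, time steps)
    # and hypothetical truncation ranks; these are not figures from the paper.
    dims = (500, 500, 500, 11, 400)
    ranks = (30, 30, 30, 5, 20)
    ratio = compression_ratio(dims, ranks)
    print(f"compression ratio: {ratio:.1f}x")
    print(f"space saving: {space_saving_percent(ratio):.4f}%")
```

Under this definition, a ratio of 100 corresponds to a 99% saving and a ratio of 200,000 to about 99.9995%, consistent with the range quoted in the abstract.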
Index Terms
- TuckerMPI: A Parallel C++/MPI Software Package for Large-scale Data Compression via the Tucker Tensor Decomposition