research-article

TuckerMPI: A Parallel C++/MPI Software Package for Large-scale Data Compression via the Tucker Tensor Decomposition

Published: 01 June 2020

Abstract

Our goal is compression of massive-scale grid-structured data, such as the multi-terabyte output of a high-fidelity computational simulation. For such data sets, we have developed TuckerMPI, a parallel C++/MPI software package for compressing distributed data. The approach is based on treating the data as a tensor, i.e., a multidimensional array, and computing its truncated Tucker decomposition, a higher-order analogue to the truncated singular value decomposition of a matrix. The result is a low-rank approximation of the original tensor-structured data. Compression efficiency is achieved by detecting latent global structure within the data, in contrast to most compression methods, which focus on local structure. In this work, we describe TuckerMPI, our implementation of the truncated Tucker decomposition, including details of the data distribution and in-memory layouts, the parallel and serial implementations of the key kernels, and analysis of the storage, communication, and computational costs. We test the software on 4.5 and 6.7 terabyte data sets distributed across hundreds of nodes (thousands of MPI processes), achieving compression ratios between 100× and 200,000×, which equates to 99% to 99.999% compression (depending on the desired accuracy), in substantially less time than it would take to even read the same data set from a parallel file system. Moreover, we show that our method also allows for reconstruction of partial or down-sampled data on a single node, without a parallel computer, so long as the reconstructed portion is small enough to fit on a single machine, e.g., when reconstructing or visualizing a single down-sampled time step or computing summary statistics. The code is available at https://gitlab.com/tensors/TuckerMPI.
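
To make the relationship between compression ratio and percentage reduction concrete, the following is a minimal sketch and not part of TuckerMPI. It assumes the usual storage model for a Tucker decomposition, in which a tensor of size I_1 × ... × I_N with ranks R_1, ..., R_N is stored as a core of R_1 × ... × R_N values plus one I_n × R_n factor matrix per mode. The function names, tensor dimensions, and ranks below are hypothetical and chosen only for illustration.

```python
from math import prod


def tucker_storage(dims, ranks):
    """Count stored values: the core tensor plus one factor matrix per mode."""
    if len(dims) != len(ranks):
        raise ValueError("dims and ranks must have the same length")
    core = prod(ranks)                                  # R_1 * ... * R_N
    factors = sum(i * r for i, r in zip(dims, ranks))   # sum of I_n * R_n
    return core + factors


def compression_ratio(dims, ranks):
    """Ratio of original entries to entries stored in Tucker form."""
    return prod(dims) / tucker_storage(dims, ranks)


def space_savings_percent(ratio):
    """Convert a compression ratio into a percentage reduction in size."""
    return 100.0 * (1.0 - 1.0 / ratio)


if __name__ == "__main__":
    # Hypothetical 4-way data set and ranks (assumed, not taken from the paper).
    dims = (500, 500, 500, 400)
    ranks = (50, 50, 50, 20)
    ratio = compression_ratio(dims, ranks)
    print(f"compression ratio: {ratio:.1f}x")
    print(f"space savings: {space_savings_percent(ratio):.4f}%")

    # The two ends of the range quoted in the abstract.
    for r in (100, 200_000):
        print(f"{r}x -> {space_savings_percent(r):.4f}% reduction")
```

Under this model a ratio of 100× corresponds to a 99% reduction and a ratio of 200,000× to roughly a 99.9995% reduction, in line with the range quoted in the abstract.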

Published in

ACM Transactions on Mathematical Software, Volume 46, Issue 2
June 2020
274 pages
ISSN: 0098-3500
EISSN: 1557-7295
DOI: 10.1145/3401021

Copyright © 2020 ACM

Publication rights licensed to ACM. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of the United States government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

Publisher

Association for Computing Machinery
New York, NY, United States

Publication History

• Published: 1 June 2020
• Online AM: 7 May 2020
• Revised: 1 January 2020
• Accepted: 1 January 2020
• Received: 1 January 2019

Qualifiers

• research-article
• Research
• Refereed
