Research article · Open Access · Artifacts Available · Artifacts Evaluated & Functional

The tensor algebra compiler

Published: 12 October 2017

Abstract

Tensor algebra is a powerful tool with applications in machine learning, data analytics, engineering and the physical sciences. Tensors are often sparse and compound operations must frequently be computed in a single kernel for performance and to save memory. Programmers are left to write kernels for every operation of interest, with different mixes of dense and sparse tensors in different formats. The combinations are infinite, which makes it impossible to manually implement and optimize them all. This paper introduces the first compiler technique to automatically generate kernels for any compound tensor algebra operation on dense and sparse tensors. The technique is implemented in a C++ library called taco. Its performance is competitive with best-in-class hand-optimized kernels in popular libraries, while supporting far more tensor operations.
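As a concrete illustration of the programming model, the sketch below expresses a compound sparse kernel, y = Ax + z with A stored in CSR, in taco's C++ API. The calls follow the usage documented for the taco library, but exact names and signatures vary across versions, so treat the details as illustrative rather than definitive.

```cpp
#include "taco.h"
using namespace taco;

int main() {
  // Declare storage formats: A is CSR (dense rows, sparse columns);
  // the vectors are dense.
  Format csr({Dense, Sparse});
  Format dv({Dense});

  // Declare tensors with their dimensions and formats.
  Tensor<double> A("A", {512, 64}, csr);
  Tensor<double> x("x", {64}, dv);
  Tensor<double> z("z", {512}, dv);
  Tensor<double> y("y", {512}, dv);

  // Insert a few nonzeros, then pack them into the chosen formats.
  A.insert({0, 1}, 2.0);
  A.insert({1, 63}, 3.0);
  x.insert({1}, 4.0);
  z.insert({0}, 5.0);
  A.pack(); x.pack(); z.pack();

  // State the whole compound operation y = Ax + z in index notation;
  // taco generates one kernel for the entire expression.
  IndexVar i, j;
  y(i) = A(i, j) * x(j) + z(i);

  y.compile();   // generate and compile the kernel
  y.assemble();  // compute the sparsity structure of the output
  y.compute();   // compute the output values

  return 0;
}
```

Because the whole expression is stated and compiled at once, the generated code can fuse the multiply and add into a single loop nest rather than materializing the temporary Ax, which is the point the abstract makes about computing compound operations in a single kernel.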


