research-article

Open Access

The tensor algebra compiler

Authors:
Fredrik Kjolstad (Massachusetts Institute of Technology, USA)
Shoaib Kamil (Adobe, USA)
Stephen Chou (Massachusetts Institute of Technology, USA)
David Lugato (CEA, France)
Saman Amarasinghe (Massachusetts Institute of Technology, USA)

Proceedings of the ACM on Programming Languages, Volume 1, Issue OOPSLA, Article No. 77, pp. 1–29. https://doi.org/10.1145/3133901

Published: 12 October 2017

Abstract

Tensor algebra is a powerful tool with applications in machine learning, data analytics, engineering, and the physical sciences. Tensors are often sparse, and compound operations must frequently be computed in a single kernel for performance and to save memory. Programmers are left to write kernels for every operation of interest, with different mixes of dense and sparse tensors in different formats. The number of combinations is unbounded, which makes it impossible to manually implement and optimize them all. This paper introduces the first compiler technique to automatically generate kernels for any compound tensor algebra operation on dense and sparse tensors. The technique is implemented in a C++ library called taco. Its performance is competitive with best-in-class hand-optimized kernels in popular libraries, while supporting far more tensor operations.
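To make concrete what a hand-written "kernel for a compound operation" looks like, here is a minimal C++ sketch of two such kernels. It is an illustration under stated assumptions, not code generated by taco and not taco's API: the `CsrMatrix` and `SparseVec` structs and the functions `spmv_add` and `sparse_add` are hypothetical. The first fuses a sparse matrix-vector product with a dense vector addition, a = Bc + d, so the product Bc is never stored in a temporary vector. The second adds two sparse vectors by co-iterating their sorted coordinate lists, the kind of loop that combining sparse operands requires.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical compressed sparse row (CSR) matrix, for illustration only.
struct CsrMatrix {
    std::size_t rows = 0, cols = 0;
    std::vector<std::size_t> pos;  // row i occupies crd/vals[pos[i], pos[i+1])
    std::vector<std::size_t> crd;  // column index of each stored entry
    std::vector<double> vals;      // value of each stored entry
};

// Fused kernel for a = B*c + d, with B sparse (CSR) and c, d dense.
// Adding d inside the row loop avoids materializing B*c as a temporary.
std::vector<double> spmv_add(const CsrMatrix& B,
                             const std::vector<double>& c,
                             const std::vector<double>& d) {
    std::vector<double> a(B.rows);
    for (std::size_t i = 0; i < B.rows; ++i) {
        double sum = d[i];
        for (std::size_t p = B.pos[i]; p < B.pos[i + 1]; ++p) {
            sum += B.vals[p] * c[B.crd[p]];
        }
        a[i] = sum;
    }
    return a;
}

// Hypothetical sparse vector: strictly increasing coordinates plus values.
struct SparseVec {
    std::vector<std::size_t> crd;
    std::vector<double> vals;
};

// a = b + c with both operands sparse: merge the two sorted coordinate
// lists so the result holds the union of their nonzero positions.
SparseVec sparse_add(const SparseVec& b, const SparseVec& c) {
    SparseVec a;
    std::size_t pb = 0, pc = 0;
    while (pb < b.crd.size() && pc < c.crd.size()) {
        std::size_t ib = b.crd[pb], ic = c.crd[pc];
        if (ib == ic) {  // both operands are nonzero here
            a.crd.push_back(ib);
            a.vals.push_back(b.vals[pb++] + c.vals[pc++]);
        } else if (ib < ic) {  // only b is nonzero here
            a.crd.push_back(ib);
            a.vals.push_back(b.vals[pb++]);
        } else {  // only c is nonzero here
            a.crd.push_back(ic);
            a.vals.push_back(c.vals[pc++]);
        }
    }
    // Copy whatever remains of the operand that was not exhausted.
    for (; pb < b.crd.size(); ++pb) {
        a.crd.push_back(b.crd[pb]);
        a.vals.push_back(b.vals[pb]);
    }
    for (; pc < c.crd.size(); ++pc) {
        a.crd.push_back(c.crd[pc]);
        a.vals.push_back(c.vals[pc]);
    }
    return a;
}
```

Even these two small cases need structurally different loops (a bounded loop over each row's stored entries versus a two-way merge), and every further change of operation or storage format changes the loop structure again.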

Supplemental Material

Auxiliary Archive (zip, 26.2 MB)

Index Terms

- Mathematics of computing → Mathematical software → Mathematical software performance
- Software and its engineering → Software notations and tools

Published in

Proceedings of the ACM on Programming Languages, Volume 1, Issue OOPSLA
October 2017, 1786 pages
EISSN: 2475-1421
DOI: 10.1145/3152284

Copyright © 2017 Owner/Author
This work is licensed under a Creative Commons Attribution International 4.0 License.
Publisher
Association for Computing Machinery
New York, NY, United States
Badges
- Artifacts Available
- Artifacts Evaluated & Functional
Author Tags
code generation
iteration graphs
linear algebra
merge lattices
parallelism
performance
sparse data structures
tensor algebra
tensors

Article Metrics
- Total citations: 209
- Total downloads: 5,126
- Downloads (last 12 months): 1,253
- Downloads (last 6 weeks): 183
