Parallel Sparse Matrix Vector Multiplication on Intel MIC: Performance Analysis

  • Conference paper
Smart Societies, Infrastructure, Technologies and Applications (SCITA 2017)

Abstract

Numerous important scientific and engineering applications rely on, and are hindered by, the intensive computational and storage requirements of the sparse matrix-vector multiplication (SpMV) operation. SpMV also forms a key part of many stationary and non-stationary iterative methods for solving systems of linear equations. Its performance is affected by factors including the storage format used for the sparse matrix, the computational algorithm, and its implementation. While SpMV performance has been studied extensively on conventional CPU architectures, research on its performance on emerging architectures, such as the Intel Many Integrated Core (MIC) architecture, is still in its infancy. In this paper, we provide a performance analysis of a parallel implementation of SpMV on the first-generation Intel Xeon Phi coprocessor (Intel MIC), codenamed Knights Corner (KNC). We use the offload programming model with OpenMP to offload the SpMV computations to the MIC. We measure performance in terms of execution time, offloading time, and memory usage. Compared to the sequential implementation, we achieve speedups of up to 11.63x in execution time and 3.62x in offloading time using up to 240 threads. The memory usage varies with the size of the sparse matrix and the number of non-zero elements it contains.
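
To illustrate the approach described in the abstract, the sketch below shows how an SpMV kernel could be offloaded to a KNC coprocessor and parallelised with OpenMP. It is a minimal example under assumptions not stated above: the matrix is held in compressed sparse row (CSR) format and the offload is expressed with the Intel compiler's #pragma offload extensions; the function and variable names are illustrative, not taken from the paper.

/* Minimal sketch: y = A*x with A in CSR format, offloaded to an Intel
 * Xeon Phi (KNC) coprocessor and parallelised over rows with OpenMP.
 * Assumptions (not from the paper): CSR storage and Intel offload pragmas.
 * Build with the Intel compiler, e.g.: icc -qopenmp spmv.c
 */
#include <omp.h>

void spmv_offload(int n, int nnz,
                  int *row_ptr,    /* n+1 row offsets             */
                  int *col_idx,    /* nnz column indices          */
                  double *val,     /* nnz non-zero values         */
                  double *x,       /* dense input vector, size n  */
                  double *y)       /* dense output vector, size n */
{
    /* Copy the matrix and input vector to the coprocessor, bring y back. */
    #pragma offload target(mic:0) \
        in(row_ptr : length(n + 1)) \
        in(col_idx : length(nnz))   \
        in(val     : length(nnz))   \
        in(x       : length(n))     \
        out(y      : length(n))
    {
        /* One row per iteration; rows are independent, so no reduction
         * or synchronisation across threads is needed. */
        #pragma omp parallel for schedule(static)
        for (int i = 0; i < n; i++) {
            double sum = 0.0;
            for (int j = row_ptr[i]; j < row_ptr[i + 1]; j++)
                sum += val[j] * x[col_idx[j]];
            y[i] = sum;
        }
    }
}

The thread count on the coprocessor (up to 240 in the experiments reported above) can be controlled from the host, for example by exporting MIC_ENV_PREFIX=MIC and MIC_OMP_NUM_THREADS=240 before running the host binary.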

Acknowledgments

The experiments reported in this paper were performed on the Aziz supercomputer at King Abdulaziz University, Jeddah, Saudi Arabia.

Author information

Corresponding author

Correspondence to Hana Alyahya.

Copyright information

© 2018 ICST Institute for Computer Sciences, Social Informatics and Telecommunications Engineering

About this paper

Cite this paper

Alyahya, H., Mehmood, R., Katib, I. (2018). Parallel Sparse Matrix Vector Multiplication on Intel MIC: Performance Analysis. In: Mehmood, R., Bhaduri, B., Katib, I., Chlamtac, I. (eds) Smart Societies, Infrastructure, Technologies and Applications. SCITA 2017. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 224. Springer, Cham. https://doi.org/10.1007/978-3-319-94180-6_29

  • DOI: https://doi.org/10.1007/978-3-319-94180-6_29

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-94179-0

  • Online ISBN: 978-3-319-94180-6

  • eBook Packages: Computer Science, Computer Science (R0)
