DOI: 10.1145/2807591.2807631
Research article

Full correlation matrix analysis of fMRI data on Intel® Xeon Phi™ coprocessors

Published: 15 November 2015

ABSTRACT

Full correlation matrix analysis (FCMA) is an unbiased approach for exhaustively studying interactions among brain regions in functional magnetic resonance imaging (fMRI) data from human participants. To answer neuroscientific questions efficiently, we are developing a closed-loop analysis system with FCMA on a cluster of nodes with Intel® Xeon Phi™ coprocessors. Here we propose several data-driven algorithmic modifications that improve performance on the coprocessor. Our experiments with real datasets show that the optimized single-node code runs 5x to 16x faster than the baseline implementation using the well-known Intel® MKL and LibSVM libraries, and that the cluster implementation achieves near-linear speedup on 5760 cores.
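
As a rough illustration of the "full correlation matrix" named in the abstract, the sketch below computes the Pearson correlation between every pair of voxel time series. It is a minimal standard-library example and not the authors' optimized implementation; the function name `full_correlation_matrix` and the toy data are hypothetical. It relies on a standard identity: once each time series is mean-centered and scaled to unit length, the correlation of two series is their dot product, so the whole matrix is one matrix product. That is why a matrix-multiplication library such as MKL is a natural baseline for this step.

```python
import math


def full_correlation_matrix(data):
    """Pearson correlation between every pair of time series.

    data: list of V time series (lists of floats), all of the same length.
    Returns a V x V list of lists with entries in [-1, 1].
    """
    normalized = []
    for series in data:
        mean = sum(series) / len(series)
        centered = [x - mean for x in series]
        norm = math.sqrt(sum(c * c for c in centered))
        if norm == 0.0:
            # A constant series has zero variance, so correlation is undefined.
            raise ValueError("constant time series has undefined correlation")
        normalized.append([c / norm for c in centered])

    # With unit-length, mean-centered rows, correlation is a dot product,
    # so the full matrix is the normalized data times its own transpose.
    return [
        [sum(a * b for a, b in zip(row_i, row_j)) for row_j in normalized]
        for row_i in normalized
    ]


# Toy data (hypothetical): three "voxels" observed at four time points.
voxels = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],  # scaled copy of the first series
    [4.0, 3.0, 2.0, 1.0],  # reversed copy of the first series
]
corr = full_correlation_matrix(voxels)
```

The output has one entry per pair of voxels, so its size grows quadratically with the voxel count. This pure-Python version is only meant to make the definition concrete, not to suggest how a performant version would be written.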


Published in

SC '15: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis
November 2015
985 pages
ISBN: 9781450337236
DOI: 10.1145/2807591
General Chair: Jackie Kern
Program Chair: Jeffrey S. Vetter

Copyright © 2015 ACM


Publisher: Association for Computing Machinery, New York, NY, United States


Acceptance Rates

SC '15 paper acceptance rate: 79 of 358 submissions, 22%
Overall acceptance rate: 1,516 of 6,373 submissions, 24%
