A fast scalable universal matrix multiplication algorithm on distributed-memory concurrent computers | IEEE Conference Publication | IEEE Xplore