ABSTRACT
Saturated arithmetic is a typical operation in multimedia applications, and most multimedia extensions to the instruction set architecture (ISA) of modern processors provide saturation instructions for it. Consequently, extensive research has focused on how to use saturation instructions to optimize programs. Previous algorithms focus mainly on purely saturated arithmetic; however, saturated arithmetic is often mingled with first-order linear recurrence (FOLR) in real-life applications. When an FOLR pattern appears in the program, previous algorithms fail to identify the saturated arithmetic.
In fact, saturated arithmetic with FOLR (SAWF) is a new and significant pattern; in particular, SAWF with a coefficient of one is frequently used in multimedia applications. Hence, a method that vectorizes this pattern efficiently is needed. This paper discusses how to vectorize SAWF, explores an efficient method for vectorizing SAWF with a coefficient of one, evaluates that method, and implements the optimizing technique as a library. Implementing it this way makes the technique easier for compilers to exploit. The experimental results show that the optimizing technique achieves a speedup of 1.19 to 1.46 on a Pentium IV processor. The same techniques can also be used to develop a library for SAWF, so programmers can benefit even without changing the compiler.
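To make the two patterns concrete, the following is a minimal scalar sketch in C, assuming signed 16-bit saturation. The function names `sat16`, `sat_add`, and `sawf_one`, the element type, and the seed parameter `x0` are illustrative assumptions and do not come from the paper. It shows only the patterns themselves, not the paper's vectorization method.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: clamp a 32-bit intermediate to the int16_t range. */
static int16_t sat16(int32_t v)
{
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

/* Purely saturated arithmetic: every iteration is independent,
 * so the loop maps directly onto a packed saturating add. */
void sat_add(int16_t *c, const int16_t *a, const int16_t *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        c[i] = sat16((int32_t)a[i] + (int32_t)b[i]);
}

/* SAWF with a coefficient of one (assumed form):
 *   x[i] = sat(x[i-1] + b[i]), seeded with x0.
 * Each result depends on the previous saturated result, which is
 * the loop-carried dependence that blocks straightforward vectorization. */
void sawf_one(int16_t *x, const int16_t *b, int16_t x0, size_t n)
{
    int16_t prev = x0;
    for (size_t i = 0; i < n; i++) {
        prev = sat16((int32_t)prev + (int32_t)b[i]);
        x[i] = prev;
    }
}
```

Because clamping happens at every step, the recurrence is not equivalent to computing ordinary prefix sums and clamping them afterwards. For `b = {30000, 10000, -5000}` and `x0 = 0`, the third element is 27767 under `sawf_one`, whereas clamping the plain running sum 35000 would give 32767.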
Index Terms
- Optimizing techniques for saturated arithmetic with first-order linear recurrence