Abstract
The deblocking filter is one of the most time-consuming modules in the H.264/AVC decoder, as many studies have indicated. Accelerating the deblocking filter is therefore critical to improving overall decoding performance. This paper proposes a novel parallel algorithm for the H.264/AVC deblocking filter to speed up the H.264/AVC decoder. We exploit pixel-level data parallelism among filtering steps and observe that the result of each filtering step affects only a limited region of pixels. We call this “the limited propagation effect”. Based on this observation, the proposed algorithm can partition a frame into multiple independent rectangles with arbitrary granularity. The proposed parallel deblocking filter algorithm requires very little synchronization overhead and scales well. Experimental results show that applying the proposed parallelization method to a SIMD-optimized sequential deblocking filter achieves speedups of up to 95.31% and 224.07% on two-core and four-core processors, respectively. We also observed a significant speedup for overall H.264/AVC decoding: 21% and 34% on two-core and four-core processors, respectively.
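The limited propagation effect rests on a property of the H.264/AVC edge filter itself: filtering one edge reads at most four samples on each side of it (p3..p0, q0..q3) and modifies at most three on each side (p2..p0, q0..q2). The sketch below illustrates that footprint with the bS = 4 (strong) luma filter applied to a single line of samples. It is not the authors' parallel algorithm. The helper names are hypothetical, alpha and beta are passed in directly (the standard derives them from QP-indexed tables, which are not reproduced here), and the bS < 4 filter path is omitted.

```python
def strong_luma_filter(p, q, alpha, beta):
    """Hypothetical helper: bS = 4 luma filter for one line across one edge.

    p = [p0, p1, p2, p3] and q = [q0, q1, q2, q3], where index 0 is the
    sample nearest the edge. alpha and beta are assumed inputs here.
    Returns new (p, q) lists; p3 and q3 are read but never modified.
    """
    p0, p1, p2, p3 = p
    q0, q1, q2, q3 = q
    new_p, new_q = list(p), list(q)
    # Edge activity test: a large step is treated as a real image edge.
    if not (abs(p0 - q0) < alpha and abs(p1 - p0) < beta and abs(q1 - q0) < beta):
        return new_p, new_q
    small_gap = abs(p0 - q0) < ((alpha >> 2) + 2)
    # P side: strong smoothing touches p0..p2, otherwise only p0.
    if abs(p2 - p0) < beta and small_gap:
        new_p[0] = (p2 + 2 * p1 + 2 * p0 + 2 * q0 + q1 + 4) >> 3
        new_p[1] = (p2 + p1 + p0 + q0 + 2) >> 2
        new_p[2] = (2 * p3 + 3 * p2 + p1 + p0 + q0 + 4) >> 3
    else:
        new_p[0] = (2 * p1 + p0 + q1 + 2) >> 2
    # Q side: symmetric to the P side.
    if abs(q2 - q0) < beta and small_gap:
        new_q[0] = (p1 + 2 * p0 + 2 * q0 + 2 * q1 + q2 + 4) >> 3
        new_q[1] = (p0 + q0 + q1 + q2 + 2) >> 2
        new_q[2] = (2 * q3 + 3 * q2 + q1 + q0 + p0 + 4) >> 3
    else:
        new_q[0] = (2 * q1 + q0 + p1 + 2) >> 2
    return new_p, new_q


def filter_edge(line, e, alpha, beta):
    """Hypothetical helper: filter the edge between line[e-1] and line[e].

    Reads only line[e-4 .. e+3] and writes only line[e-3 .. e+2].
    """
    p = [line[e - 1 - i] for i in range(4)]  # p0 = line[e-1] ... p3 = line[e-4]
    q = [line[e + i] for i in range(4)]      # q0 = line[e] ... q3 = line[e+3]
    new_p, new_q = strong_luma_filter(p, q, alpha, beta)
    out = list(line)
    for i in range(3):  # only p0..p2 and q0..q2 can change
        out[e - 1 - i] = new_p[i]
        out[e + i] = new_q[i]
    return out


if __name__ == "__main__":
    line = [60] * 8 + [70] * 8  # a small blocking step at index 8
    out = filter_edge(line, 8, alpha=40, beta=10)
    changed = [i for i, (a, b) in enumerate(zip(line, out)) if a != b]
    print(changed)  # -> [5, 6, 7, 8, 9, 10]
```

Because each edge writes a bounded window and reads a slightly larger one, the influence of any single filtering step is confined to a few samples around its edge; samples outside line[e-4 .. e+3] cannot affect the result for that edge.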
Notes
For this set of analyses, the input frames to the deblocking filter have already passed through all decoding stages that precede the deblocking filter.
Additional information
Ja-Ling Wu is an IEEE Fellow.
About this article
Cite this article
Wang, SW., Yang, SS., Chen, HM. et al. A Multi-core Architecture Based Parallel Framework for H.264/AVC Deblocking Filters. J Sign Process Syst Sign Image Video Technol 57, 195–211 (2009). https://doi.org/10.1007/s11265-008-0321-4
DOI: https://doi.org/10.1007/s11265-008-0321-4