ABSTRACT
Entropy coding is essential to data compression and to image and video coding. The Range variant of Asymmetric Numeral Systems (rANS) is a modern entropy coder offering both high speed and a high compression rate. Because rANS is not designed for parallel execution, the conventional approach to parallelizing it partitions the input symbol sequence and encodes the partitions with independent codecs, so each additional partition adds metadata overhead. This approach appears in state-of-the-art implementations such as DietGPU. It is unsuitable for content-delivery applications: if the decoder cannot decode all the partitions in parallel, the parallelism is wasted, yet all of the overhead is still transferred.
To solve this, we propose Recoil, a parallel rANS decoding approach with decoder-adaptive scalability. We observe that a single rANS-encoded bitstream can be decoded from any position if the intermediate decoder states are known. After renormalization, these states also have a small upper bound, so they can be stored compactly. We then split the encoded bitstream using a heuristic that evenly distributes the decoding workload, and store the intermediate states and the corresponding symbol indices as metadata. The splits can later be combined simply by eliminating the extra metadata entries.
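The key observation above can be illustrated with a minimal, self-contained rANS codec sketch. This is not Recoil's actual implementation; all constants and names are illustrative. The point it demonstrates: a decoder's progress is fully captured by its state and bitstream offset, so recording that pair at a split point lets an independent decoder resume there, and renormalization keeps the recorded states within a bounded range.

```python
# Minimal rANS codec sketch illustrating checkpointed decoding.
# Constants and names are illustrative, not Recoil's actual implementation.

PROB_BITS = 8                 # symbol frequencies sum to 2**PROB_BITS
RANS_L = 1 << 16              # renorm lower bound: states stay in [L, 256*L)

def rans_encode(symbols, freq, cum):
    """Encode symbols into (final_state, byte_list); rANS is LIFO, so the
    symbols are processed in reverse."""
    x, emitted = RANS_L, []
    for s in reversed(symbols):
        x_max = ((RANS_L >> PROB_BITS) << 8) * freq[s]
        while x >= x_max:     # renormalize: shift low bytes out
            emitted.append(x & 0xFF)
            x >>= 8
        x = (x // freq[s] << PROB_BITS) + (x % freq[s]) + cum[s]
    # The decoder consumes bytes in reverse emission order:
    return x, emitted[::-1]

def rans_decode(data, state, pos, count, freq, cum, sym_of_slot):
    """Decode `count` symbols starting from a checkpoint (state, pos).
    Passing the encoder's final state with pos=0 decodes from the
    beginning; passing a recorded intermediate (state, pos) resumes
    mid-stream, which is what enables decoder-side splitting."""
    out, x = [], state
    for _ in range(count):
        slot = x & ((1 << PROB_BITS) - 1)
        s = sym_of_slot[slot]                      # symbol lookup by slot
        x = freq[s] * (x >> PROB_BITS) + slot - cum[s]
        while x < RANS_L:                          # renormalize: pull bytes in
            x = (x << 8) | data[pos]
            pos += 1
        out.append(s)
    return out, x, pos

# Toy model: frequencies over {'a', 'b', 'c'} summing to 256.
freq = {'a': 128, 'b': 64, 'c': 64}
cum = {'a': 0, 'b': 128, 'c': 192}
sym_of_slot = ['a'] * 128 + ['b'] * 64 + ['c'] * 64
msg = list('abacabcabbacca' * 20)

state0, data = rans_encode(msg, freq, cum)
half = len(msg) // 2
# Decode the first half and record the checkpoint reached at the split:
first, ck_state, ck_pos = rans_decode(data, state0, 0, half,
                                      freq, cum, sym_of_slot)
# An independent decoder resumes from (ck_state, ck_pos) alone:
second, _, _ = rans_decode(data, ck_state, ck_pos, len(msg) - half,
                           freq, cum, sym_of_slot)
assert first + second == msg
assert RANS_L <= ck_state < 256 * RANS_L   # renormalized states are bounded
```

Here the checkpoint is obtained during a sequential pass only for simplicity; as the abstract describes, the encoder can record such states and symbol indices as metadata at split points, so each decoder thread starts directly from its assigned checkpoint.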
The main contribution of Recoil is reducing unnecessary data transfer by adaptively scaling the parallelism overhead to match the decoder's capability. Experiments show that Recoil's decoding throughput is comparable to that of the conventional approach, scaling massively on CPUs and GPUs and greatly outperforming various other ANS-based codecs.
REFERENCES
- Eirikur Agustsson and Radu Timofte. 2017. NTIRE 2017 Challenge on Single Image Super-Resolution: Dataset and Study. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops.
- Kasidis Arunruangsirilert, Pasapong Wongprasert, and Jiro Katto. 2023. Performance Evaluations of C-Band 5G NR FR1 (Sub-6 GHz) Uplink MIMO on Urban Train. In 2023 IEEE Wireless Communications and Networking Conference (WCNC). 1–6. https://doi.org/10.1109/WCNC55385.2023.10118777
- Johannes Ballé, David Minnen, Saurabh Singh, Sung Jin Hwang, and Nick Johnston. 2018. Variational image compression with a scale hyperprior. In International Conference on Learning Representations.
- Yann Collet. 2023. New Generation Entropy coders. Retrieved 2023-04-01 from https://github.com/Cyan4973/FiniteStateEntropy
- Yann Collet. 2023. Zstandard - Real-time data compression algorithm. Retrieved 2023-04-03 from http://facebook.github.io/zstd/
- Sebastian Deorowicz. 2020. Silesia compression corpus. Retrieved 2023-04-10 from https://sun.aei.polsl.pl/~sdeor/index.php?page=silesia
- Jarek Duda. 2009. Asymmetric numeral systems. arXiv:0902.0271 [cs.IT]
- Shunji Funasaka, Koji Nakano, and Yasuaki Ito. 2016. Light Loss-Less Data Compression, with GPU Implementation. Vol. 10048. 281–294. https://doi.org/10.1007/978-3-319-49583-5_22
- Fabian Giesen. 2014. Interleaved entropy coders. arXiv:1402.3392 [cs.IT]
- Fabian Giesen. 2018. Simple rANS encoder/decoder (arithmetic coding-ish entropy coder). Retrieved 2023-04-10 from https://github.com/rygorous/ryg_rans
- Jeff Johnson. 2022. DietGPU: GPU-based lossless compression for numerical data. https://github.com/facebookresearch/dietgpu
- Joint Photographic Experts Group. 2022. JPEG - JPEG XL. Retrieved 2023-04-03 from https://jpeg.org/jpegxl/
- Fabian Knorr, Peter Thoman, and Thomas Fahringer. 2021. Ndzip-Gpu: Efficient Lossless Compression of Scientific Floating-Point Data on GPUs. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (St. Louis, Missouri) (SC ’21). Association for Computing Machinery, New York, NY, USA, Article 93, 14 pages. https://doi.org/10.1145/3458817.3476224
- Pavel Krajcevski, Srihari Pratapa, and Dinesh Manocha. 2016. GST: GPU-Decodable Supercompressed Textures. ACM Trans. Graph. 35, 6, Article 230 (Dec. 2016), 10 pages. https://doi.org/10.1145/2980179.2982439
- Fangzheng Lin, Heming Sun, Jinming Liu, and Jiro Katto. 2023. Multistage Spatial Context Models for Learned Image Compression. In ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). 1–5. https://doi.org/10.1109/ICASSP49357.2023.10095875
- Matt Mahoney. 2023. Large Text Compression Benchmark. Retrieved 2023-04-03 from https://mattmahoney.net/dc/text.html
- David Minnen, Johannes Ballé, and George D. Toderici. 2018. Joint Autoregressive and Hierarchical Priors for Learned Image Compression. In Advances in Neural Information Processing Systems.
- Seyyed Mahdi Najmabadi, Trung-Hieu Tran, Sherif Eissa, Harsimran Singh Tungal, and Sven Simon. 2019. An Architecture for Asymmetric Numeral Systems Entropy Decoder - A Comparison with a Canonical Huffman Decoder. J. Signal Process. Syst. 91, 7 (Jul. 2019), 805–817. https://doi.org/10.1007/s11265-018-1421-4
- NVIDIA. 2023. NVCOMP. Retrieved 2023-04-10 from https://developer.nvidia.com/nvcomp
- NVIDIA. 2023. nvJPEG. Retrieved 2023-04-10 from https://developer.nvidia.com/nvjpeg
- Adnan Ozsoy and Martin Swany. 2011. CULZSS: LZSS Lossless Data Compression on CUDA. In 2011 IEEE International Conference on Cluster Computing. 403–411. https://doi.org/10.1109/CLUSTER.2011.52
- Ritesh A. Patel, Yao Zhang, Jason Mak, Andrew Davidson, and John D. Owens. 2012. Parallel lossless data compression on the GPU. In 2012 Innovative Parallel Computing (InPar). 1–9. https://doi.org/10.1109/InPar.2012.6339599
- Evangelia Sitaridi, Rene Mueller, Tim Kaldewey, Guy Lohman, and Kenneth A. Ross. 2016. Massively-Parallel Lossless Data Decompression. In 2016 45th International Conference on Parallel Processing (ICPP). 242–247. https://doi.org/10.1109/ICPP.2016.35
- André Weißenberger and Bertil Schmidt. 2018. Massively Parallel Huffman Decoding on GPUs. In Proceedings of the 47th International Conference on Parallel Processing (Eugene, OR, USA) (ICPP ’18). Association for Computing Machinery, New York, NY, USA, Article 27, 10 pages. https://doi.org/10.1145/3225058.3225076
- André Weißenberger and Bertil Schmidt. 2019. Massively Parallel ANS Decoding on GPUs. In Proceedings of the 48th International Conference on Parallel Processing (Kyoto, Japan) (ICPP ’19). Association for Computing Machinery, New York, NY, USA, Article 100, 10 pages. https://doi.org/10.1145/3337821.3337888