A compression scheme for radio data in high performance computing

Full length article, Astronomy and Computing, Volume 12, September 2015, Pages 181–190
https://doi.org/10.1016/j.ascom.2015.07.002

Abstract

We present a procedure for efficiently compressing astronomical radio data for high performance applications. Integrated, post-correlation data are first passed through a nearly lossless rounding step which compares the precision of the data to a generalized and calibration-independent form of the radiometer equation. This allows the precision of the data to be reduced in a way that has an insignificant impact on the data. The newly developed Bitshuffle lossless compression algorithm is subsequently applied. When the algorithm is used in conjunction with the HDF5 library and data format, data produced by the CHIME Pathfinder telescope is compressed to 28% of its original size and decompression throughputs in excess of 1 GB/s are obtained on a single core.

Introduction

The simultaneous drives to wider fields and higher sensitivity have led radio astronomy to the cusp of a big-data revolution. There is a multitude of instruments, including 21 cm cosmology experiments (Pober et al., 2013; Battye et al., 2013; Pober et al., 2014; Greenhill et al., 2012; van Haarlem et al., 2013; Zheng et al., 2013; Parsons et al., 2010; Chen, 2012), Square Kilometer Array precursors (Johnston et al., 2008; Lonsdale et al., 2009; Booth et al., 2009), and ultimately the Square Kilometer Array itself (SKA Organization, 2015), whose rate of data production will be orders of magnitude higher than that of any existing radio telescope. An early example is the CHIME Pathfinder (Bandura et al., 2014; Newburgh et al., 2014), which will soon be producing data at a steady rate of over 4 TB per day. The cost associated with storing and handling these data can be considerable, so it is desirable to reduce the size of the data as much as possible using compression. At the same time, these data volumes pose a significant data-processing challenge. Any compression/decompression scheme must be fast enough not to hinder data processing, and would ideally lead to a net increase in performance through the reduced time required to read the data from disk.

Here, after discussing some general considerations for designing data storage formats in Section 2, we present a scheme for compressing astronomical radio data. Our procedure has two steps: a controlled (relative to thermal noise) reduction of the precision of the data which reduces its information entropy (Section 3), and a lossless compression algorithm, Bitshuffle, which exploits this reduction in entropy to achieve a very high compression ratio (Section 4). These two steps are independent in that, while they work very well together, either of them can be used without the other. When we evaluate our method in Section 5 we show that the precision reduction improves compression ratios for most lossless compressors. Likewise, Bitshuffle outperforms most other lossless compressors even in the absence of precision reduction.

Section snippets

Characteristics of radio-astronomy data and usage patterns

Integrated, post-correlation radio-astronomy data are typically at least three dimensional, containing axes representing spectral frequency, correlation product, and time. The correlation product refers to the correlation of all antenna
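To make this layout concrete, such a dataset might be declared in HDF5 along the following lines (a minimal h5py sketch; the axis sizes, chunk shape, and dataset name are illustrative assumptions, not the CHIME Pathfinder's actual file schema):

    import h5py
    import numpy as np

    # Illustrative axis sizes only; not the real instrument's dimensions.
    n_freq, n_prod, n_time = 1024, 8256, 512

    with h5py.File("vis.h5", "w") as f:
        # The chunk shape trades off the two common access patterns:
        # all frequencies at a few times, and all times at a few frequencies.
        vis = f.create_dataset(
            "vis",
            shape=(n_freq, n_prod, n_time),
            chunks=(64, 256, 32),
            dtype=np.complex64,
        )
        vis[:, :, 0] = np.zeros((n_freq, n_prod), dtype=np.complex64)

Chunked storage of this kind is also what allows a compression filter to be applied transparently on write and read, since HDF5 filters operate chunk by chunk.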

Lossy entropy reduction: reduction of precision

All experiments must perform some amount of lossy compression simply by virtue of having to choose a finite-width data type, which reduces precision by truncation. Here, we focus on performing a reduction of precision in a manner that is both controlled, in that it has a well-understood effect on the data; and efficient, in that only the required precision is kept, allowing for better compression.
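As an illustration of noise-controlled rounding, the sketch below rounds to a power-of-two granularity chosen so that the added quantization noise stays below a fraction f of the thermal noise. The helper and the parameter f are hypothetical; the paper's actual routine derives the noise level from a generalized radiometer equation rather than taking it as an input:

    import numpy as np

    def round_to_noise(data, sigma, f=0.1):
        # Largest power-of-two granularity g whose quantization noise
        # variance, g**2 / 12, stays below (f * sigma)**2, where sigma > 0
        # is the thermal-noise standard deviation.
        g = 2.0 ** np.floor(np.log2(np.sqrt(12.0) * f * sigma))
        # Rounding to a multiple of a power of two clears the trailing
        # mantissa bits, lowering the entropy seen by a lossless compressor.
        return np.round(data / g) * g

Since rounding to the nearest multiple of g adds noise of variance at most g**2 / 12, choosing f = 0.1 inflates the total noise variance by at most 1%.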

Reducing the precision of the data involves discarding some number of the least significant bits of

Lossless compression: Bitshuffle

Here we discuss lossless data compressors in the context of radio astronomical data. We seek a compressor that is fast enough for high performance applications but also obtains high compression ratios, especially in the context of the precision reduction discussed in the previous section. Satisfying both criteria is difficult and existing compressors are found to be inadequate. Therefore, a custom compression algorithm, Bitshuffle, was developed; it is fast and obtains high compression
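The core transform behind Bitshuffle can be conveyed in a few lines of NumPy: transpose the bits of a block of elements so that each bit position is stored contiguously, turning rarely used low-order bits into long, highly compressible runs. This toy version (hypothetical helper names; it ignores the real library's blocking, SIMD kernels, and byte-order subtleties) shows the idea:

    import numpy as np

    def bit_transpose(a):
        # Unpack each element of the 1-D array `a` into bits, then store
        # bit plane 0 of every element together, then bit plane 1, etc.
        # After precision reduction, the cleared low-order planes become
        # long runs of zeros that LZ-style compressors encode efficiently.
        n = len(a)
        bits = np.unpackbits(a.view(np.uint8).reshape(n, -1), axis=1)
        return np.packbits(bits.T)

    def bit_untranspose(packed, dtype, n):
        # Exact inverse of bit_transpose for n elements of `dtype`.
        width = 8 * np.dtype(dtype).itemsize
        bits = np.unpackbits(packed)[: width * n].reshape(width, n)
        return np.packbits(bits.T).view(dtype)

    # Round trip: values with their low bits cleared survive unchanged.
    x = np.arange(16, dtype=np.uint16) << 4
    assert np.array_equal(bit_untranspose(bit_transpose(x), np.uint16, len(x)), x)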

Evaluation of method

In this section we apply the compression algorithm described above to data from the CHIME Pathfinder to assess the algorithm’s performance and to compare it with other compression schemes. The Pathfinder comprises two parabolic cylinders, each 20 m wide by 35 m long, with their axes running in a north–south direction. Sixty-four identical dual-polarization feeds are located at 0.3 m intervals along the central portion of each focal line.

The data used for the following comparisons were collected on
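For reference, the Bitshuffle filter plugs into HDF5 at dataset-creation time. The sketch below follows the usage pattern documented for the bitshuffle package's h5py integration; the file name, dataset name, shape, and chunking are placeholders rather than the actual CHIME archive layout:

    import h5py
    import numpy as np
    import bitshuffle.h5  # importing this module registers the HDF5 filter

    with h5py.File("vis_compressed.h5", "w") as f:
        dset = f.create_dataset(
            "vis",
            shape=(1024, 8256, 512),   # placeholder shape
            dtype=np.float32,
            chunks=(64, 256, 32),      # placeholder chunking
            compression=bitshuffle.h5.H5FILTER,
            # Block size 0 lets the filter choose; LZ4 runs inside the
            # filter after the bit transpose.
            compression_opts=(0, bitshuffle.h5.H5_COMPRESS_LZ4),
        )
        dset[:, :, 0] = np.zeros((1024, 8256), dtype=np.float32)

Because the filter is attached to the dataset itself, decompression on read is transparent to any HDF5-aware reader with the plugin installed.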

Summary and conclusions

We have presented a high-throughput data compression scheme for astronomical radio data that obtains a very high compression ratio. Our scheme has two parts: reducing the precision of the data in a controlled manner to discard noisy bits, hence reducing the entropy of the data; and losslessly compressing the data using the Bitshuffle algorithm.

The entire compression algorithm consists of the following steps, starting with the precision reduction (a sketch combining the two stages follows the list):

1. Estimate the thermal noise on a
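Putting the two stages together, here is a minimal end-to-end sketch of the scheme, reusing the hypothetical round_to_noise helper from Section 3 and assuming the bitshuffle package's documented in-memory API (all names and parameters are illustrative):

    import numpy as np
    import bitshuffle

    def compress_block(vis, sigma, f=0.1):
        # Stage 1: lossy, noise-controlled rounding (Section 3 sketch).
        reduced = round_to_noise(vis, sigma, f).astype(np.float32)
        # Stage 2: lossless Bitshuffle followed by LZ4.
        packed = bitshuffle.compress_lz4(reduced)
        return packed, reduced.shape

    def decompress_block(packed, shape):
        # Inverse of stage 2 only; stage 1 is irreversible by design.
        return bitshuffle.decompress_lz4(packed, shape, np.dtype(np.float32))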

Acknowledgments

We are very grateful for the warm reception and skillful help we have received from the staff of the Dominion Radio Astrophysical Observatory, operated by the National Research Council Canada.

CHIME is Leading Edge Fund project 31170, funded by the Canada Foundation for Innovation, the B.C. Knowledge Development Fund, le Cofinancement gouvernement du Québec-FCI, and the Ontario Research Fund. K. Masui is supported by the Canadian Institute for Advanced Research, Global Scholars Program. M. Deng

References (31)

• K. Bandura et al., Canadian Hydrogen Intensity Mapping Experiment (CHIME) pathfinder
• R.A. Battye et al., HI intensity mapping: a single dish approach, Mon. Not. R. Astron. Soc. (2013)
• Booth, R.S., de Blok, W.J.G., Jonas, J.L., Fanaroff, B., 2009. MeerKAT key project science, specifications, and...
• X. Chen, The Tianlai project: a 21CM cosmology experiment, Int. J. Mod. Phys. Conf. Ser. (2012)
• Denman, N., Amiri, M., Bandura, K., Cliche, J.-F., Connor, L., Dobbs, M., Fandino, M., Halpern, M., Hincks, A.,...
• Deutsch, L.P., DEFLATE Compressed Data Format Specification version 1.3, RFC 1951, RFC Editor (May 1996). URL...
• L.J. Greenhill et al., A broadband 512-element full correlation imaging array at VHF (LEDA)
• N. Hübbe et al., Reducing the HPC-datastorage footprint with MAFISC—multidimensional adaptive filtering improved scientific data compression, Comput. Sci. Res. Dev. (2012)
• D. Huffman, A method for the construction of minimum-redundancy codes, Proc. IRE (1952)
• The IEEE, 2008. Standard for floating-point arithmetic, IEEE Std. 754-2008....
• S. Johnston et al., Science with ASKAP. The Australian square-kilometre-array pathfinder, Exp. Astron. (2008)
• Klages, P., Bandura, K., Denman, N., Recnik, A., Sievers, J., Vanderlinde, K., GPU kernels for high-speed 4-bit...
• S.R. Kulkarni, Self-noise in interferometers — radio and infrared, Astron. J. (1989)
• B. Lathi
• C.J. Lonsdale et al., The Murchison Widefield Array: design overview, Proc. IEEE (2009)