Abstract
Multiple Symbol Detection (MSD) is an important technique in digital signal processing. It estimates the sequence of the received signal using the maximum-likelihood principle. Because of its high computational complexity, MSD is currently implemented on specialized signal processing devices such as Field Programmable Gate Arrays (FPGAs). With the rapid development of CUDA, GPUs have successfully accelerated applications in a variety of domains. In this paper, we explore the use of CUDA-enabled GPUs to accelerate the MSD algorithm. The computational core of MSD, the sliding correlation problem, is formulated, and an efficient CUDA parallelization scheme is proposed for it. A CUDA-enabled MSD (CU-MSD) algorithm is then implemented by adapting this CUDA-enabled sliding correlation. To further improve the scalability of CU-MSD, an implementation on multiple GPUs is proposed as well. Various optimization techniques are used to maximize performance. The performance of CU-MSD is evaluated with an MSD-based demodulator for a PCM/FM telemetry system, using four data sets from a real aerospace PCM/FM integrated baseband system. The experimental results demonstrate a speedup of up to 133.3\(\times\) on a single GPU and 514.64\(\times\) on 4 GPUs in a single server.
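To make the notion of sliding correlation concrete, the following is a minimal sequential sketch in Python. It uses the generic textbook definition (correlating a reference pattern against every full-overlap window of the received samples), which is an assumption and may differ from the formulation used in the paper. The names `sliding_correlation` and `best_offset` are hypothetical and chosen only for illustration.

```python
# Minimal sketch of a sliding correlation.
# Assumption: generic textbook definition, not necessarily the paper's formulation.


def sliding_correlation(received, reference):
    """Correlate `reference` against every full-overlap window of `received`.

    Returns a list of len(received) - len(reference) + 1 complex values with
    out[k] = sum over n of received[k + n] * conj(reference[n]).
    """
    n_ref = len(reference)
    n_out = len(received) - n_ref + 1
    if n_ref == 0 or n_out <= 0:
        raise ValueError("reference must be non-empty and no longer than received")

    out = []
    # Each offset k is computed independently of the others, which is the
    # property a data-parallel (e.g. one thread per offset) version would exploit.
    for k in range(n_out):
        acc = 0j
        for n in range(n_ref):
            acc += received[k + n] * reference[n].conjugate()
        out.append(acc)
    return out


def best_offset(received, reference):
    """Return the offset whose correlation magnitude is largest."""
    corr = sliding_correlation(received, reference)
    return max(range(len(corr)), key=lambda k: abs(corr[k]))


if __name__ == "__main__":
    # Toy data: the reference pattern is embedded at offset 2.
    reference = [1 + 1j, 1 - 1j, -1 + 1j]
    received = [0j, 0j, 1 + 1j, 1 - 1j, -1 + 1j, 0j]
    print(best_offset(received, reference))
```

Because the outer loop iterations share no state, the work maps naturally onto many parallel workers; how the paper actually assigns offsets to CUDA threads and blocks is not stated in the abstract.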
Acknowledgments
This project was partially supported by grants from the Natural Science Foundation of China (#61202321/70921061) and by the open project of the Key Lab of Big Data Mining and Knowledge Management.
Cite this article
Liu, Y., Zheng, H., Zhao, R. et al. Design and evaluation of multi-GPU enabled Multiple Symbol Detection algorithm. J Supercomput 72, 2111–2131 (2016). https://doi.org/10.1007/s11227-015-1475-z
DOI: https://doi.org/10.1007/s11227-015-1475-z