Suitability of recent hardware accelerators (DSPs, FPGAs, and GPUs) for computer vision and image processing algorithms

https://doi.org/10.1016/j.image.2018.07.007

Highlights

  • Important considerations when selecting hardware accelerators are discussed.

  • Practical information about state-of-the-art DSPs, FPGAs, and GPUs is presented.

  • Relative advantages and disadvantages of DSPs, FPGAs, and GPUs are explained.

  • Several recent examples from the literature are reviewed and compared.

Abstract

Computer vision and image processing algorithms form essential components of many industrial, medical, commercial, and research-related applications. Modern imaging systems provide high resolution images at high frame rates, and are often required to perform complex computations to process image data. However, in many applications rapid processing is required, or it is important to minimise delays for analysis results. In these applications, central processing units (CPUs) are inadequate, as they cannot perform the calculations with sufficient speed. To reduce the computation time, algorithms can be implemented in hardware accelerators such as digital signal processors (DSPs), field-programmable gate arrays (FPGAs), and graphics processing units (GPUs). However, the selection of a suitable hardware accelerator for a specific application is challenging. Numerous families of DSPs, FPGAs, and GPUs are available, and the technical differences between various hardware accelerators make comparisons difficult. It is also important to know what speed can be achieved using a specific hardware accelerator for a particular algorithm, as the choice of hardware accelerator may depend on both the algorithm and the application. The technical details of hardware accelerators and their performance have been discussed in previous publications. However, there are limitations in many of these presentations, including: inadequate technical details to enable selection of a suitable hardware accelerator; comparisons of hardware accelerators at two different technological levels; and discussion of old technologies.

To address these issues, we introduce and discuss important considerations when selecting suitable hardware accelerators for computer vision and image processing tasks, and present a comprehensive review of hardware accelerators. We discuss the practical details of chip architectures, available tools and utilities, development time, and the relative advantages and disadvantages of using DSPs, FPGAs, and GPUs. We provide practical information about state-of-the-art DSPs, FPGAs, and GPUs as well as examples from the literature. Our goal is to enable developers to make a comprehensive comparison between various hardware accelerators, and to select a hardware accelerator that is most suitable for their specific application.

Introduction

Computer vision and image processing algorithms are used in a variety of applications in experimental mechanics [1], medical technologies [2], and human action recognition [3]. Many of the algorithms that have been used in these applications are computationally demanding, and in practical applications it is necessary to rapidly analyse the data. One of the main techniques for decreasing computation time is to use hardware with high computational power. Although the processing power of the central processing units (CPUs) in personal computers (PCs) is increasing, it remains insufficient for many applications. In addition, PCs cannot be used for computer vision tasks in mobile or portable devices. Hardware accelerators (e.g. digital signal processors (DSPs), field programmable gate arrays (FPGAs), and graphics processing units (GPUs)) are designed to address the increasing need for performing fast calculations in complicated algorithms. Furthermore, some hardware accelerators can be used in portable systems where it is not feasible to use PC-based systems.

Although DSPs, FPGAs, and GPUs have markedly different chip architectures, requiring different software development techniques, each can be used as a hardware accelerator to speed up computations. Microarchitecture and fabrication technologies are rapidly evolving, and commercial competition has motivated major hardware accelerator vendors to update and increase the capabilities of their products using the latest technological advances. However, different hardware accelerators are designed in ways that make them efficient for some algorithms but not others. Furthermore, the choice of a hardware accelerator is typically a trade-off between computational power, speed, development time, power consumption, and price. Identifying a suitable hardware accelerator for a specific algorithm or application can thus be very challenging.

Previously published reviews have investigated different aspects of using hardware accelerators in computer vision and image processing tasks. These review papers can be divided into four main groups, which are discussed here.

In the first group of review papers, a specific algorithm or application is chosen and various hardware accelerators for that task are compared. An example is stereo vision algorithms for real-time systems, as in [4]. These review papers may help with the choice of a suitable hardware accelerator for specific applications. However, the system requirements can vary considerably for other applications or algorithms. For example, in some applications real-time execution is important (see [4]), while for other applications it may be adequate to simply increase the processing speed. The choice of a suitable hardware accelerator depends significantly on the application and the algorithm.

In the second group of reviews, specific hardware accelerators are chosen to test the performance of algorithms and their implementations. For instance, algorithm implementations on a single FPGA and a single GPU for sliding-window applications are discussed in [5]. These hardware-oriented reviews do not account for the fact that newer technologies have many advantages over their older versions, which makes it difficult for developers to find suitable modern hardware accelerators for their own applications. Furthermore, a specific FPGA or a specific GPU does not necessarily represent the capability of that type of hardware accelerator in general. Therefore, these review papers may not provide researchers with an accurate comparison between hardware accelerators, unless researchers choose a hardware accelerator specifically from those that have been reviewed.

In the third group of reviews, a broader application area is chosen and different hardware accelerators are discussed for that purpose. Some examples are: parallel computing with multicore CPUs, FPGAs, and GPUs in experimental mechanics [6]; medical image processing on GPUs [7,8]; and medical image registration on GPUs [9] or multicore CPUs and GPUs [10]. These papers also include some technical details about the chip architectures. Even though they can provide useful information, some of them (such as [7–10]) only discuss GPUs and do not cover FPGAs or DSPs. In addition, the hardware details are usually restricted to specific devices and are of limited use for comparing different hardware accelerators.

In the fourth group of reviews, the chip architecture and software tools of hardware accelerators are discussed in detail. An example is heterogeneous computing (i.e. the combination of CPUs with FPGAs or GPUs) for general applications [11]. Even though such reviews provide useful information, there is a need to update and simplify the technical details to provide practical advice for researchers on the choice of suitable hardware accelerators for computer vision and image processing applications.

This review combines the approach of the third and fourth groups of review papers described above. Our goal was to provide sufficient information and practical examples to enable researchers to choose the most suitable hardware accelerator for computer vision and image processing applications. To this end, DSPs, FPGAs, and GPUs are discussed in separate sections, followed by examples that demonstrate the performance of the various devices in different computer vision and image processing applications.

One of the main challenges in reviewing different hardware accelerators is to provide a fair comparison. Since the model names of DSPs, FPGAs, and GPUs are not indicative of their performance, a ‘speed normalisation’ factor was proposed [4] in an effort to improve the accuracy of comparisons within the same chip-architecture family. However, hardware accelerators are too complex for comparisons to be limited to processing speed alone: speed by itself cannot indicate the advantage of one hardware accelerator over another, especially when the devices do not belong to the same family. Moreover, the processing speed of an algorithm depends not only on the hardware accelerator but also on the programmer’s skill. In order to provide a practical comparison between hardware accelerators in this review, the most important features of DSPs, FPGAs, and GPUs for computer vision and image processing algorithms are introduced and discussed. Then, based on the technical specifications, hardware accelerators are divided into groups with similar levels of performance.
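
As a simple arithmetic illustration of why a single speed figure is not enough, the hypothetical sketch below (our own example, not taken from [4]) converts the frame time of two invented devices into pixel throughput and a naive clock-normalised figure; such a normalisation says nothing about memory bandwidth, parallel width, power consumption, or the quality of the implementation, which is precisely the limitation discussed above.

    #include <cstdio>

    // Hypothetical figures for two accelerators (invented for illustration only).
    struct Device {
        const char* name;
        double frame_time_ms;   // measured time to process one 1920x1080 frame
        double clock_mhz;       // core clock frequency
    };

    int main() {
        const double pixels = 1920.0 * 1080.0;
        const Device devices[] = {
            {"Accelerator A", 8.0, 1000.0},
            {"Accelerator B", 5.0, 1500.0},
        };
        for (const Device& d : devices) {
            double mpix_per_s = pixels / (d.frame_time_ms * 1e-3) / 1e6;
            // A naive 'speed normalisation': megapixels per second divided by
            // clock frequency in MHz, i.e. pixels processed per clock cycle.
            double pix_per_cycle = mpix_per_s / d.clock_mhz;
            std::printf("%s: %.1f Mpixel/s, %.3f pixels per cycle\n",
                        d.name, mpix_per_s, pix_per_cycle);
        }
        return 0;
    }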

Another limitation of some review papers (such as [6]) is the discussion of outdated hardware technologies, which offer little help in assessing the performance and capabilities of modern hardware accelerators. This review addresses this issue by reporting on the latest improvements, and covers recent papers (published since 2009) with a focus on the latest hardware technologies.

This review is organised as follows. DSPs, FPGAs, and GPUs are discussed in Sections 2, 3, and 4, respectively. In each section, and for each hardware accelerator, the different families, available development tools and utilities, development time, and the advantages and disadvantages of using that type of hardware accelerator are discussed. Each section concludes with a separate literature review and summary, and each literature review presents tables summarising the application, the algorithms implemented, the hardware used, and the performance (or data throughput) achieved. In addition, the reviewed papers are sorted chronologically, and the year of introduction of each FPGA and GPU (as an indicator of its hardware technology level) is reported. Since FPGAs and GPUs have both been widely used in computer vision and image processing tasks, Section 5 is devoted to the comparison of GPUs and FPGAs. Finally, Section 6 summarises this review.

Section snippets

Digital signal processors (DSPs)

DSPs are microprocessors with an architecture that is specifically designed for performing signal processing tasks. Texas Instruments (TI) and Analog Devices (AD) are the two major companies in the DSP production market. TI-DSPs are more common in the computer vision and image processing research community than AD-DSPs, so this review focuses on TI-DSPs.
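
To make the signal-processing orientation of these architectures concrete, the sketch below shows the kind of fixed-point multiply-accumulate (MAC) inner loop, here a generic Q15 FIR filter written in plain C, that DSP datapaths (hardware MAC units, wide accumulators, zero-overhead loops) and their C compilers are designed to execute in very few cycles per tap. It is a minimal, device-independent sketch rather than code for any particular TI-DSP.

    #include <stdint.h>

    // Generic Q15 fixed-point FIR filter: the multiply-accumulate pattern that
    // DSP architectures accelerate. The input buffer x is assumed to contain
    // num_samples + num_taps - 1 samples.
    void fir_q15(const int16_t* x, const int16_t* h, int16_t* y,
                 int num_samples, int num_taps)
    {
        for (int n = 0; n < num_samples; ++n) {
            int32_t acc = 0;                         // wide accumulator
            for (int k = 0; k < num_taps; ++k) {
                acc += (int32_t)x[n + k] * h[k];     // one MAC per tap
            }
            y[n] = (int16_t)(acc >> 15);             // back to Q15
        }
    }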

TI has designed various DSPs with different processing power ranges and capabilities for different purposes. TI-DSPs can be divided into 4

Field-programmable gate arrays (FPGAs)

The FPGA chip incorporates arrays of reprogrammable logic gates. Unlike CPUs, DSPs, and GPUs, FPGA fabrics do not have a pre-structured chip architecture or a central processing unit. Thus, before programming a reconfigurable FPGA, the programmer must design a hardware architecture for their specific application using the logic gates inside the FPGA.
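
As an illustration of what ‘designing a hardware architecture’ can look like in practice, the sketch below describes a streaming 3 × 3 mean filter in the high-level-synthesis (HLS) style of C++, where line buffers and a shift window express the datapath that a synthesis tool maps onto FPGA logic. The image size, function name, and structure are our own assumptions, and vendor-specific synthesis directives are only indicated as comments because their exact form differs between tools.

    #include <stdint.h>

    const int WIDTH  = 640;   // assumed image width (a compile-time constant,
    const int HEIGHT = 480;   // as synthesis generally requires)

    // Streaming 3x3 mean filter: two line buffers hold the previous rows and a
    // 3x3 window is shifted by one pixel per iteration, mirroring the registers
    // and block RAMs the design would occupy on the FPGA. Border pixels are
    // left unfiltered for simplicity.
    void mean3x3(const uint8_t* in, uint8_t* out)
    {
        static uint8_t line0[WIDTH] = {0};
        static uint8_t line1[WIDTH] = {0};
        uint8_t window[3][3] = {{0}};

        for (int y = 0; y < HEIGHT; ++y) {
            for (int x = 0; x < WIDTH; ++x) {
                // A pipeline directive (tool-specific) would go here so that
                // one pixel is processed per clock cycle.
                uint8_t newest = in[y * WIDTH + x];

                // Shift the 3x3 window left and insert the new column
                for (int r = 0; r < 3; ++r) {
                    window[r][0] = window[r][1];
                    window[r][1] = window[r][2];
                }
                window[0][2] = line0[x];   // pixel from row y-2
                window[1][2] = line1[x];   // pixel from row y-1
                window[2][2] = newest;     // pixel from row y

                // Update the line buffers for the next row
                line0[x] = line1[x];
                line1[x] = newest;

                // Write the mean of the window, centred at (y-1, x-1)
                if (y >= 2 && x >= 2) {
                    uint16_t sum = 0;
                    for (int r = 0; r < 3; ++r)
                        for (int c = 0; c < 3; ++c)
                            sum += window[r][c];
                    out[(y - 1) * WIDTH + (x - 1)] = (uint8_t)(sum / 9);
                }
            }
        }
    }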

The FPGA hardware architecture is configured by interconnecting FPGA logic gates to perform a specific task, and requires

Graphics processing units (GPUs)

The first graphics accelerators were built for professional graphics workstations, such as the Infinite Reality for the Onyx series [105]. GPUs consist of many processing cores, and are accelerators that are optimised for performing fast matrix calculations in parallel (images are in the form of 2D matrices). These devices are typically very affordable, since their development is motivated by the gaming industry. GPUs are thus cost-effective hardware accelerators for massively parallel
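
To illustrate the one-thread-per-pixel model that makes GPUs effective for this kind of matrix-shaped data, the sketch below shows a minimal CUDA kernel that converts an interleaved RGB image to greyscale, with each GPU thread computing one output pixel; the kernel name, launch configuration, and integer-weighted conversion are our own assumptions for the example.

    #include <cuda_runtime.h>
    #include <stdint.h>

    // Minimal CUDA kernel: one thread per output pixel. Each thread reads three
    // interleaved RGB bytes and writes one greyscale byte using integer weights
    // that approximate the usual luminance conversion (77 + 150 + 29 = 256).
    __global__ void rgb_to_grey(const uint8_t* rgb, uint8_t* grey,
                                int width, int height)
    {
        int x = blockIdx.x * blockDim.x + threadIdx.x;
        int y = blockIdx.y * blockDim.y + threadIdx.y;
        if (x >= width || y >= height) return;

        int idx = y * width + x;
        int r = rgb[3 * idx + 0];
        int g = rgb[3 * idx + 1];
        int b = rgb[3 * idx + 2];
        grey[idx] = (uint8_t)((77 * r + 150 * g + 29 * b) >> 8);
    }

    // Host-side launch: 16x16 blocks of threads tile the image.
    void launch_rgb_to_grey(const uint8_t* d_rgb, uint8_t* d_grey,
                            int width, int height)
    {
        dim3 block(16, 16);
        dim3 grid((width + block.x - 1) / block.x,
                  (height + block.y - 1) / block.y);
        rgb_to_grey<<<grid, block>>>(d_rgb, d_grey, width, height);
    }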

Portability of software over different hardware

It is sometimes necessary to transfer code from one hardware accelerator to another of the same type, such as when upgrading to new-generation hardware or when testing the code on another device. The transfer process may be challenging if the available code is crafted to take advantage of the specific architecture of the original hardware. In this section, we discuss how code can be transferred, and the potential challenges involved, for DSPs, FPGAs, and GPUs.
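
On the GPU side, one way to reduce this architecture lock-in is to query device properties at run time through the standard CUDA runtime API rather than hard-coding them, as in the minimal sketch below; the threshold used to pick a block size is an arbitrary example of such an adaptive choice, and the corresponding parameters on DSPs and FPGAs (core counts, on-chip memory, logic resources) usually have to be handled at build or synthesis time instead.

    #include <cuda_runtime.h>
    #include <cstdio>

    // Query the installed GPU instead of assuming the architecture of the
    // device the code was originally written for.
    int main()
    {
        cudaDeviceProp prop;
        if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) {
            std::printf("No CUDA device found\n");
            return 1;
        }
        std::printf("Device: %s (compute capability %d.%d)\n",
                    prop.name, prop.major, prop.minor);
        std::printf("Multiprocessors: %d, shared memory per block: %zu bytes\n",
                    prop.multiProcessorCount, prop.sharedMemPerBlock);

        // Example of an adaptive choice: select a launch parameter from the
        // queried capability rather than from the original target device.
        int threads_per_block = (prop.major >= 3) ? 256 : 128;
        std::printf("Chosen threads per block: %d\n", threads_per_block);
        return 0;
    }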

Heterogeneous hardware accelerators

Heterogeneous hardware accelerators are designed to exploit the advantages of one hardware accelerator while offsetting its disadvantages by fusing its functionality with that of another. For instance, as discussed in Section 4.5, one of the disadvantages of GPUs is the data transfer time between the host PC and the GPU. This time is decreased in heterogeneous CPU–GPU computing architectures, such as the accelerated processing units (APUs) designed by AMD (formerly known as Fusion). An AMD
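
To make the transfer-time overhead concrete, the sketch below uses CUDA events to time the host-to-device copy of one assumed 1920 × 1080 RGB frame separately from a trivial kernel on a conventional discrete GPU; on an APU-style device with memory shared between the CPU and GPU cores, it is largely this copy that disappears. The frame size and the add_one kernel are our own illustrative assumptions.

    #include <cuda_runtime.h>
    #include <stdint.h>
    #include <cstdio>
    #include <vector>

    // Trivial kernel used only so that there is some device work to time.
    __global__ void add_one(uint8_t* data, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) data[i] = data[i] + 1;
    }

    int main()
    {
        const int n = 1920 * 1080 * 3;              // one assumed RGB frame
        std::vector<uint8_t> host(n, 0);
        uint8_t* dev = nullptr;
        cudaMalloc(reinterpret_cast<void**>(&dev), n);

        cudaEvent_t t0, t1, t2;
        cudaEventCreate(&t0); cudaEventCreate(&t1); cudaEventCreate(&t2);

        cudaEventRecord(t0);
        cudaMemcpy(dev, host.data(), n, cudaMemcpyHostToDevice);  // PCIe transfer
        cudaEventRecord(t1);
        add_one<<<(n + 255) / 256, 256>>>(dev, n);                // device compute
        cudaEventRecord(t2);
        cudaEventSynchronize(t2);

        float copy_ms = 0.0f, kernel_ms = 0.0f;
        cudaEventElapsedTime(&copy_ms, t0, t1);
        cudaEventElapsedTime(&kernel_ms, t1, t2);
        std::printf("Host-to-device copy: %.3f ms, kernel: %.3f ms\n",
                    copy_ms, kernel_ms);

        cudaFree(dev);
        return 0;
    }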

Comparison of FPGAs and GPUs for implementing image processing and computer vision algorithms

NVidia GPUs have been used more than FPGAs for high performance applications in recent years. For instance, the second fastest supercomputer in the world, named Titan, includes 18,688 NVidia Tesla GPUs, and has a processing power of more than 2 × 10¹⁶ calculations per second [175].

Among computer vision and image processing algorithms, stereo vision algorithms are the most common application implemented in hardware accelerators. Tippetts et al. [4] reviewed the implementation of various stereo

Hardware accelerators designed for machine learning

In recent years, the application of machine learning techniques has been growing very rapidly. In particular, deep neural networks (i.e. deep learning) and convolutional neural networks have been used extensively in various applications. Image processing and computer vision applications have also taken advantage of machine learning [181] and deep learning [182,184] techniques. GPUs are naturally suited to the implementation of neural networks because of the similarity between the
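
The core computations of such networks are large matrix multiplications and convolutions, which map almost directly onto the GPU's grid of threads. As a minimal sketch of this mapping (our own example; real deployments use tuned libraries such as cuBLAS or cuDNN), the naive CUDA kernel below computes the dense matrix product underlying a fully connected layer with one thread per output element.

    #include <cuda_runtime.h>

    // Naive dense matrix multiply C = A * B (row-major), one thread per output
    // element: the kind of operation fully connected neural-network layers
    // reduce to. A is M x K, B is K x N, C is M x N.
    __global__ void matmul(const float* A, const float* B, float* C,
                           int M, int N, int K)
    {
        int row = blockIdx.y * blockDim.y + threadIdx.y;
        int col = blockIdx.x * blockDim.x + threadIdx.x;
        if (row >= M || col >= N) return;

        float acc = 0.0f;
        for (int k = 0; k < K; ++k)
            acc += A[row * K + k] * B[k * N + col];
        C[row * N + col] = acc;
    }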

Summary and conclusions

In this review, we have provided practical information for selecting suitable hardware accelerators for computer vision and image processing algorithms. We discussed the hardware architectures of the most recent DSPs, FPGAs, and GPUs, and the important features of these hardware accelerators for computer vision and image processing algorithms. For each hardware accelerator, available tools and utilities, development time, advantages, and disadvantages were discussed in an attempt to help

References (191)

  • C. Georgoulas, et al., FPGA based disparity map computation with vergence control, Microprocess. Microsyst. (2010)
  • M.A. Sutton, Computer vision-based, noncontacting deformation measurements in mechanics: A generational transformation, Appl. Mech. Rev. (2013)
  • B. Tippetts, et al., Review of stereo vision algorithms and their suitability for resource-limited systems, J. Real-Time Image Process. (2013)
  • J. Fowers, G. Brown, P. Cooke, G. Stitt, A performance and energy comparison of FPGAs, GPUs, and multicores for...
  • L. Shi, et al., A survey of GPU-based medical image computing techniques, Quant. Imaging Med. Surg. (2012)
  • O. Fluck, et al., A survey of medical image registration on graphics hardware, Comput. Methods Programs Biomed. (2011)
  • R. Shams, et al., A survey of medical image registration on multicore and the GPU, IEEE Signal Process. Mag. (2010)
  • A.R. Brodtkorb, et al., State-of-the-art in heterogeneous computing, Sci. Program. (2010)
  • ...
  • ...
  • ...
  • ...
  • ...
  • ...
  • ...
  • ...
  • ...
  • ...
  • ...
  • ...
  • ...
  • ...
  • ...
  • ...
  • DSP boards:...
  • S.B. Goldberg, L. Matthies, Stereo and IMU assisted visual odometry on an OMAP3530 for small robots, in: Computer...
  • Z. Jun, G.-O. Liu, Design and implementation of networked real-time control system with image processing capability,...
  • Y. Chen, B. Wu, S. Member, H. Huang, C. Fan, A real-time vision system for nighttime vehicle detection and traffic...
  • Y.F. Cao, M. Ding, L.K. Zhuang, Y.K. Cao, Vision-based Guidance, Navigation and Control for Unmanned Aerial Vehicle...
  • V. Gonzalez-Huitron, E. Ramos-Diaz, V. Kravchenko, V. Ponomaryov, 2D to 3D conversion based on disparity map...
  • S.-J. Huang, et al., Stereo vision system for moving object detecting and locating based on CMOS image sensor and DSP chip, Pattern Anal. Appl. (2011)
  • F.D. Igual, et al., Robust motion estimation on a low-power multi-core DSP, EURASIP J. Adv. Signal Process. (2013)
  • R. Berg, et al., Highly efficient image registration for embedded systems using a distributed multicore DSP architecture, J. Real-Time Image Process. (2014)
  • J. Karam, et al., Trends in multicore DSP platforms, IEEE Signal Process. Mag. (2009)
  • ...
  • ...
  • ...
  • J. Agron, Domain-specific language for HW/SW Co-design for FPGAs
  • T.A., et al., High-level synthesis revised: Generation of FPGA accelerators from a domain-specific language using the polyhedron model, Parallel Comput. Accel. Comput. Sci. Eng. (2014)
  • N. George, H. Lee, D. Novo, T. Rompf, K.J. Brown, A.K. Sujeeth, M. Odersky, K. Olukotun, P. Ienne, Hardware system...