Research Article

Implementation of distributed arithmetic-based symmetrical 2-D block finite impulse response filter architectures

[version 1; peer review: 2 approved]
PUBLISHED 21 Sep 2023

This article is included in the Computational Modelling and Numerical Aspects in Engineering collection.

Abstract

Background: This paper presents an efficient two-dimensional (2-D) finite impulse response (FIR) filter using block processing for two different symmetries. Architectures for a general filter (without symmetry) and two symmetrical filters (diagonal and quadrantal symmetry) are implemented. The proposed architectures need fewer multipliers because of the symmetry of the filter coefficients.

Methods: A distributed arithmetic (DA)-based multiplication method is used in the proposed architecture. A dual-port memory-based lookup table (DP-MLUT) is used in the multiplication instead of a conventional lookup table (LUT) to reduce the area and power of the FIR filter. The filter's throughput is increased by using block processing. Memory reuse and memory sharing methods are introduced, which reduce the number of registers needed and hence the circuit complexity. The architectures are written in Verilog Hardware Description Language and synthesized using Genus Synthesis tool-19.1 in 45nm technology with a generic library of Cadence vendor constraints. The synthesis tool generates the area, delay, and power reports. Power consumption of the architectures is calculated with an image size of 64 × 64 at a frequency of 20 MHz.

Results: Compared to existing architectures, the synthesis results show improvements in power, area, area delay product (ADP), and power delay product (PDP). The proposed MLUT-based 2-D block Quadrantal Symmetry Filter (QSF) for length 8 with block size 4 consumes 58.94% less power, occupies 59.5% less area, and has 48.44% lower ADP and 47.78% lower PDP than the best existing method.

Conclusions: A novel DA-based 2-D block FIR filter architecture with various symmetries is realized. Symmetry is incorporated into the filter coefficients to minimize the number of multipliers. The LUT size is optimized by odd multiples or even multiples storage techniques. Also, the overall area of the architecture is decreased by DP-LUT-based multipliers. The proposed filter architecture is area-power-efficient. It is best suited for applications that have fixed coefficients.

Keywords

Systolic architecture, block processing, distributed arithmetic, 2-D finite impulse response, symmetries in FIR filter

Introduction

Many image and video processing applications, including image enhancement, template matching, image restoration, and video communication, use 2-D digital filters.1,2 Finite impulse response (FIR) filters are preferred over infinite impulse response (IIR) filters when numerical stability, ease of design, and linear phase are the primary concerns.1 Because 2-D FIR filters need numerous computations, designing an efficient structure is challenging. Parhi1 proposed a systolic structure for a 2-D FIR filter and suggested many techniques to optimize the implementation of 1-D and 2-D FIR and IIR filter architectures with more computational blocks. Block-based 2-D FIR filter banks consisting of separable and non-separable architectures with a significant reduction in memory have been described.3,4 In ref. 3, conventional multipliers, which consume considerable power, are used for the convolution of input samples and filter coefficients, and the internal architectures of symmetry filters are not considered.

The power-efficient and memory-efficient 2-D FIR filter architectures (FIRAs) are constructed with high-speed multipliers and a parallel prefix modified carry look ahead adder (MCLAA).5 A low-area, memory-based, non-symmetry type 2-D FIRA is proposed with a new multiplication technique.6 In the above works, no symmetry concept is considered. The arithmetic computations are decreased by coefficient symmetry in the systolic filter architecture.7,8 The low-power multimode architectures for 2-D IIR filters are designed and implemented with four symmetries. The critical path analysis is addressed for symmetry filters, but the architectures are implemented only for single input processing. Another single input processing-based quadrantal symmetry is implemented using the 2-D L1- technique to minimize the filter coefficients and hardware blocks.9 Recently, Chowdari et al.24–26 have proposed an efficient implementation of a DA-based adaptive filter.

Mohanty et al.10 proposed a 1-D block filter for narrowband applications using a Distributed Arithmetic (DA)-based reconfigurable filter for the software-defined radio (SDR) channelizer. They introduced the memory sharing concept to implement a 1-D finite impulse response (FIR) filter with low area, power, and delay. Several authors have implemented only DA-based 1-D filters. In recent years, DA techniques have attained great importance in FIR filter implementation because they reduce the complexity of the architecture while offering high throughput and regularity. Kumar et al.11,12 recently proposed block-based 2-D FIR and IIR filter architectures using DA with a memory-sharing approach but did not discuss the symmetry of coefficients. DA-based FIRAs are described in refs. 13 and 14, where DA methods for cost-effective and efficient FIRAs are also reviewed. Park et al.15 have suggested a reconfigurable FIR architecture using DA.

In all the DA-based filter implementation schemes, the authors focused only on decreasing the number of adders and the multiplier complexity. Memory complexity is also a key factor in filter design because it affects power consumption and area. Many researchers have addressed 1-D and 2-D filters using symmetry or block processing.16,17 A few researchers have realized filter structures with Lookup Table (LUT)-based or DA multipliers, but without block processing or symmetry.

A new approach to memory-based DA multiplication is proposed by Meher et al.18,19 This memory-based LUT (MLUT) multiplication approach is used to realize the 1-D FIR filters. The comparison analysis is presented with conventional multiplier-based filter architectures. Vinitha et al.6,20 also developed the LUT-based multiplication and incorporated it into the filter architectures with fewer hardware blocks. Chiper et al.21 suggested the dual-port concept in the MLUT-based DA multiplication rather than Single-Port LUT (SPLUT) multipliers. The modified memory-based multipliers are realized to implement an efficient filter architecture by Sharma et al.22 Alawad et al.23 presented a stochastic-based 2-D FIRA with low hardware complexity and high throughput. The probabilistic convolution theorem is used for the proposed non-separable systolic 2-D FIRA. The proposed work solves this problem within a predetermined accuracy range. The probability density function represents the 2-D input signal kernels by exploiting the convolution theorem. This well-known probabilistic convolution theorem replaces the expensive multipliers with simple adders. The memory storage complexity is also reduced by memory sharing and memory reuse. This work is more suitable for applications like perception-based image processing, which can inherently tolerate some computing inaccuracy.

The addressed points motivate developing and implementing the block-based 2-D FIRAs using various symmetries and multiplier-less DA-based approaches. In this research, two types of symmetries, diagonal symmetry and quadrantal symmetry, are considered to reduce the multipliers. The hardware in adders is increased by block processing in symmetry filters, although multipliers are more complex than adders. A novel MLUT multiplication approach is introduced in the 2-D block FIRAs. Two types of symmetries for 2-D FIRA and one non-symmetry filter are explored to decrease the number of multipliers. Conventional multipliers are replaced with MLUT multipliers to decrease each symmetry filter's power consumption, delay, and area.

The paper is organized as follows: The novel approach to designing the two types of symmetries and an optimized memory-based multiplication approach for 2-D FIRAs are discussed in the background section. The next section describes the proposed 2-D FIRAs, and the individual symmetry filter architectures are explored according to the block processing using enhanced Dual-Port Memory-based LUT (DP-MLUT)-based multipliers.

Background: Block-based design and symmetry of 2-D FIR filters

This section explains the various coefficient symmetry concepts and the MLUT multiplication approach to replacing normal multipliers.

Block processing and memory reuse

In digital filters, block processing increases the throughput of the architecture. If the input block size is N, the filter produces N outputs per iteration, so the throughput increases N-fold. The input matrix $X(n_1,n_2)$ needed at the different systolic stages to generate a 2-D filter output $Y(n_1,n_2)$ is of size $L \times L$, where L is the length of the filter.3

(Eq. 1)
$$X(n_1,n_2)=\begin{bmatrix} x(n_1,n_2) & x(n_1,n_2-1) & \cdots & x(n_1,n_2-L+1)\\ x(n_1-1,n_2) & x(n_1-1,n_2-1) & \cdots & x(n_1-1,n_2-L+1)\\ \vdots & \vdots & \ddots & \vdots\\ x(n_1-L+1,n_2) & x(n_1-L+1,n_2-1) & \cdots & x(n_1-L+1,n_2-L+1)\end{bmatrix}$$

Let $x(n_1,n_2-m)$ be the $(m+1)$th input of the 2-D FIR filter, and let $w(p,q)$ represent the filter coefficients. The $(m+1)$th output of the filter is expressed as3:

(Eq. 2)
$$Y_m(n_1,n_2)=\sum_{p=0}^{L-1}\sum_{q=0}^{L-1} x(n_1-p,\,n_2-m-q)\,w(p,q)$$
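
The double sum of equation (2) can be sketched in Python as follows. This is a minimal software illustration, not the authors' Verilog design: the function name `fir2d_block_output` and the choice to treat samples outside the image as zero are assumptions made here.

```python
def fir2d_block_output(x, w, n1, n2, N):
    """Return [Y_0, ..., Y_(N-1)] at position (n1, n2), following Eq. (2).

    x is a 2-D list of input pixels and w is an L x L list of coefficients.
    Samples that fall outside the image are treated as zero (an assumption).
    """
    L = len(w)
    rows, cols = len(x), len(x[0])

    def pixel(r, c):
        # Zero padding for out-of-range indices.
        return x[r][c] if 0 <= r < rows and 0 <= c < cols else 0

    outputs = []
    for m in range(N):                 # one output per sample of the block
        acc = 0
        for p in range(L):
            for q in range(L):
                acc += pixel(n1 - p, n2 - m - q) * w[p][q]
        outputs.append(acc)
    return outputs
```

With a block size of N, one call yields the N outputs that the block architecture would produce in a single iteration.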

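$w(p,q)$ is expressed as equation (3).3

(Eq. 3)
$$w(p,q)=\begin{bmatrix} w(0,0) & w(0,1) & \cdots & w(0,L-1)\\ w(1,0) & w(1,1) & \cdots & w(1,L-1)\\ \vdots & \vdots & \ddots & \vdots\\ w(L-1,0) & w(L-1,1) & \cdots & w(L-1,L-1)\end{bmatrix}$$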

Thus, the 2-D FIR filter block output at each systolic stage is expressed as11:

(Eq. 4)
$$Y=\sum_{p=0}^{L-1} G(n_1,n_2)\,w_p^{T}$$

The filter coefficient vector required at each stage is expressed as11:

(Eq. 5)
$$w_p=\begin{bmatrix} w(p,0) & w(p,1) & w(p,2) & \cdots & w(p,L-1)\end{bmatrix}_{1\times L}$$

Each iteration of the 2-D block FIRA needs the parallel calculation of a block of input samples and produces a block of outputs. At each systolic stage, a set of $L-1$ delayed inputs is required to generate a block of input. The input pixels at the $(p+1)$th stage are represented by $G(n_1,n_2)$, which is given in matrix form as11:

(Eq. 6)
$$G(n_1,n_2)=\begin{bmatrix} x(n_1-p,n_2) & x(n_1-p,n_2-1) & \cdots & x(n_1-p,n_2-L+1)\\ x(n_1-p,n_2-1) & x(n_1-p,n_2-2) & \cdots & x(n_1-p,n_2-L)\\ \vdots & \vdots & \ddots & \vdots\\ x(n_1-p,n_2-N+1) & x(n_1-p,n_2-N) & \cdots & x(n_1-p,n_2-N-L+2)\end{bmatrix}$$

To facilitate parallelism, we further decompose the input pixel matrix $G(n_1,n_2)$ and the coefficient vector by a factor of S. The input pixel matrix $G(n_1,n_2)$ is decomposed into $L/S$ sub-matrices $X_{pq}$ of dimension $N \times S$, and the coefficient vector into sub-vectors $w_{pq}$ of dimension $1 \times S$, with $0 \le q \le L/S-1$. Equation (4) is modified as11

(Eq. 7)
$$Y_{N\times 1}=\sum_{p=0}^{L-1}\sum_{q=0}^{L/S-1} X_{pq}\,w_{pq}^{T}$$

Where11

(Eq. 8)
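$$X_{pq}=\begin{bmatrix} x(u,v) & x(u,v-1) & x(u,v-2) & x(u,v-3)\\ x(u,v-1) & x(u,v-2) & x(u,v-3) & x(u,v-4)\\ \vdots & \vdots & \vdots & \vdots\\ x(u,v-N+1) & x(u,v-N) & x(u,v-N-1) & x(u,v-N-2)\end{bmatrix}$$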

$w_{pq}=\begin{bmatrix} w(p,Sq) & w(p,Sq+1) & \cdots & w(p,Sq+S-1)\end{bmatrix}$, $u=n_1-p$ and $v=n_2-Sq$. Equation (7) is rewritten as

(Eq. 9)
$$Y_{N\times 1}=\sum_{q=0}^{L/S-1} y_{pq}$$

Where

(Eq. 10)
$$y_{pq}=\sum_{p=0}^{L-1} X_{pq}\,w_{pq}^{T}$$
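
As an illustrative check of the decomposition in equations (7) to (10), the sketch below computes the contribution of one systolic stage (one value of p) in two ways: with the full N × L matrix of equation (6), and with L/S sub-matrices of width S. The helper names (`build_G`, `stage_output_full`, `stage_output_split`), the use of a single image row as a Python list, and zero padding outside the row are assumptions made for this example.

```python
def build_G(row, v0, N, width):
    """N x width matrix whose entry (m, k) is row[v0 - m - k] (zero outside the row)."""
    def sample(i):
        return row[i] if 0 <= i < len(row) else 0
    return [[sample(v0 - m - k) for k in range(width)] for m in range(N)]


def matvec(A, v):
    """Multiply matrix A (list of rows) by vector v."""
    return [sum(a * b for a, b in zip(r, v)) for r in A]


def stage_output_full(row, w_p, v0, N):
    """One stage of Eq. (4): G times w_p^T, with G built as in Eq. (6)."""
    return matvec(build_G(row, v0, N, len(w_p)), w_p)


def stage_output_split(row, w_p, v0, N, S):
    """Same stage computed from L/S sub-blocks, as in Eqs. (7) to (10)."""
    L = len(w_p)
    assert L % S == 0, "S must divide the filter length L"
    y = [0] * N
    for q in range(L // S):
        X_pq = build_G(row, v0 - S * q, N, S)   # sub-matrix with v = n2 - S*q
        w_pq = w_p[S * q: S * q + S]            # matching slice of coefficients
        y = [a + b for a, b in zip(y, matvec(X_pq, w_pq))]
    return y
```

Both functions return the same N partial outputs; summing these over all L stages gives the block output of equation (4).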

Symmetry concepts of 2-D FIR filter structure

The symmetry concept is considered for the reduction of complex multipliers. In this paper, two types of symmetry, Diagonal Symmetry Filter (DSF) and Quadrantal Symmetry Filter (QSF)7,8 for 2-D FIRAs, are studied and explored. The following transfer functions are used to design the two types of symmetries in the 2-D FIRA.7,8

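  • A) DSF 2-D FIR Filter: The magnitude response of the DSF satisfies $H(z_1,z_2)=H(z_2,z_1)$, where $z_1=e^{j\theta_1}$ and $z_2=e^{j\theta_2}$, $\forall\,\theta_1,\theta_2$. The filter coefficients are related as $h_{ij}=h_{ji}$ for all i, j. Equation (11) expresses the transfer function of diagonal symmetry.17

(Eq. 11)
$$\frac{Y}{X}=\sum_{i=0}^{L} h_{ii}\,z_1^{-i}z_2^{-i}+\sum_{i=0}^{L-1}\sum_{j=i+1}^{L} h_{ij}\left(z_1^{-i}z_2^{-j}+z_1^{-j}z_2^{-i}\right)$$
  • B) QSF 2-D FIR Filter: The magnitude response of the QSF satisfies $H(z_1,z_2)=H(z_1^{-1},z_2)$, where $z_1=e^{j\theta_1}$ and $z_2=e^{j\theta_2}$, $\forall\,\theta_1,\theta_2$. The filter coefficient symmetry is given by $h_{ij}=h_{(L-i)j}$ for all i, j. Equation (12) expresses the transfer function of the filter.17

(Eq. 12)
$$\frac{Y}{X}=\sum_{j=0}^{L} h_{uj}\,z_1^{-u}z_2^{-j}+\sum_{i=0}^{L-1}\sum_{j=0}^{L} h_{ij}\left(z_1^{-i}z_2^{-j}+z_1^{-(L-i)}z_2^{-j}\right)$$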

The general filter coefficients and two types of symmetry coefficient matrices are shown in Figure 1.

Figure 1. Filter coefficient matrices of (a) General filter (b) Diagonal Symmetry Filter (c) Quadrantal Symmetry Filter.

The proposed work implements two efficient symmetrical 2-D FIRAs and one generic filter architecture. Because of the symmetry of the filter coefficient, fewer multipliers are needed to design the filter.
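
The multiplier saving from coefficient symmetry can be illustrated with a short Python sketch: samples that share a coefficient are added first and multiplied once. The function names are hypothetical, and the indexing is an assumption: an L × L coefficient array indexed 0 to L−1 with even L, so that quadrantal symmetry is written as h[i][j] == h[L-1-i][j].

```python
def full_mac(window, h):
    """Plain multiply-accumulate over an L x L window: L*L multiplications."""
    L = len(h)
    return sum(window[i][j] * h[i][j] for i in range(L) for j in range(L))


def dsf_mac(window, h):
    """Diagonal symmetry (h[i][j] == h[j][i]): pre-add mirrored samples."""
    L = len(h)
    acc, mults = 0, 0
    for i in range(L):
        acc += window[i][i] * h[i][i]          # diagonal terms have no partner
        mults += 1
        for j in range(i + 1, L):
            acc += (window[i][j] + window[j][i]) * h[i][j]
            mults += 1
    return acc, mults


def qsf_mac(window, h):
    """Quadrantal symmetry (h[i][j] == h[L-1-i][j], even L assumed)."""
    L = len(h)
    acc, mults = 0, 0
    for i in range(L // 2):
        for j in range(L):
            acc += (window[i][j] + window[L - 1 - i][j]) * h[i][j]
            mults += 1
    return acc, mults
```

For L = 4 the folded forms use 10 and 8 multiplications per output instead of 16, at the cost of the extra pre-adders.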

The LUT-DA multiplication process

A LUT is treated as memory in memory-based multiplication, and the precomputed products of the filter coefficient are saved in the LUT. DA multiplication is the process of shifting and accumulating LUT output values. In memory-based multiplication, the product of the input sample and the coefficient is obtained by reading the LUT. For a binary input of word length w bits and a coefficient of c bits, the LUT must hold $2^w$ possible values. Standard LUT-based multiplication therefore requires $2^w$ words to save the precomputed partial products in the LUT.

Even multiples can be obtained from memory using left shift operations on odd multiples. This work uses $2^w/2$ words to save the odd multiples of coefficient C. This approach is shown for w = 4 bits of input sample in Table 1. In this table, the eight address locations store the odd multiples of coefficient C, namely C, 3C, 5C, 7C, 9C, 11C, 13C, and 15C. Even multiples such as 2C, 4C, and 8C are obtained by left-shifting C by 1, 2, and 3 positions, respectively. Next, 6C and 12C are produced by left shifts of 3C; 10C is derived from 5C, and 14C is derived from 7C. The product for the all-zero input sample, x = 0000, is produced by resetting the LUT output.

Table 1. Memory-based Lookup Table – Distributed Arithmetic (MLUT-DA) multiplication approach.17

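| Address (A2 A1 A0) | Name of the word | Value stored in LUT | Input bits (x3 x2 x1 x0) | Result | Number of shifts required | Control lines (S1 S0) |
|---|---|---|---|---|---|---|
| 0 0 0 | W0 | C | 0 0 0 1 | C | 0 | 0 0 |
| | | | 0 0 1 0 | 2^1 × C | 1 | 0 1 |
| | | | 0 1 0 0 | 2^2 × C | 2 | 1 0 |
| | | | 1 0 0 0 | 2^3 × C | 3 | 1 1 |
| 0 0 1 | W1 | 3C | 0 0 1 1 | 3C | 0 | 0 0 |
| | | | 0 1 1 0 | 2^1 × 3C | 1 | 0 1 |
| | | | 1 1 0 0 | 2^2 × 3C | 2 | 1 0 |
| 0 1 0 | W2 | 5C | 0 1 0 1 | 5C | 0 | 0 0 |
| | | | 1 0 1 0 | 2^1 × 5C | 1 | 0 1 |
| 0 1 1 | W3 | 7C | 0 1 1 1 | 7C | 0 | 0 0 |
| | | | 1 1 1 0 | 2^1 × 7C | 1 | 0 1 |
| 1 0 0 | W4 | 9C | 1 0 0 1 | 9C | 0 | 0 0 |
| 1 0 1 | W5 | 11C | 1 0 1 1 | 11C | 0 | 0 0 |
| 1 1 0 | W6 | 13C | 1 1 0 1 | 13C | 0 | 0 0 |
| 1 1 1 | W7 | 15C | 1 1 1 1 | 15C | 0 | 0 0 |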

The single-port MLUT-DA multiplier is realized with reference to Table 1, as shown in Figure 2A. The structure has one 4-to-3 encoder block, one 3-to-8 decoder block, one control logic to produce Reset (RST), and control lines {S0, S1} to accommodate the shifts required for the computation of even multiples of coefficients such as 2C, 4C, 8C, 10C, 12C, and 14C. A maximum of three shifts are required, so two bits of control signals are contemplated in the structure.
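
The odd-multiple storage scheme of Table 1 can be modelled behaviourally in a few lines of Python. This is a sketch under stated assumptions, not the hardware description: the names `build_odd_lut` and `lut_multiply_odd` are hypothetical, and the address and shift count are derived arithmetically here rather than by the encoder and control logic of the actual structure.

```python
def build_odd_lut(C, w=4):
    """LUT with 2**w / 2 words holding C, 3C, 5C, ..., (2**w - 1)C."""
    return [(2 * k + 1) * C for k in range(2 ** (w - 1))]


def lut_multiply_odd(x, lut):
    """Multiply x by the coefficient using the odd-multiple LUT and shifts."""
    if x == 0:
        return 0                 # RST case: an all-zero input clears the output
    shifts = 0
    while x % 2 == 0:            # each trailing zero is one left shift
        x //= 2
        shifts += 1
    address = (x - 1) // 2       # the odd value 2k + 1 is stored at address k
    return lut[address] << shifts
```

For example, with x = 0110 the odd part is 3 (address 001) and one left shift gives 6C, matching the corresponding row of Table 1.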

Figure 2. (A) Structure of conventional Memory-based Lookup Table (MLUT) multiplier for odd multiples (B) Modified MLUT multiplier for even multiples.

Where MUX is multiplexer and RST is reset.

Using a control logic block, the RST signal is formed from the applied input sample. The LUT holds eight odd multiples of the coefficient, each c + 4 bits wide. The extra 4 bits are needed to hold the largest odd multiple, 15C, which is precomputed and stored in the LUT. The location selected by the decoder output is read and fed to the NOR cell, which is made up of c + 4 NOR gates with one common RST input. The NOR cell outputs are shifted by a barrel shifter according to the control signals {S0, S1} coming from the control logic. The barrel shifter has 2 × (c + 4) AOI (AND-OR-INVERT) gates or 2 × 1 multiplexers (MUXs). Finally, the barrel shifter output is the product of the input sample and the coefficient.

The combinational logic expression of the 4-to-3 encoder, employed in the LUT multiplier, is indicated in equations (13), (14), and (15).

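(Eq. 13)
$$A_0=\overline{\overline{(x_0 x_1)}\cdot\overline{(x_1 x_2)}\cdot\left(x_0+\overline{x_2 x_3}\right)}$$
(Eq. 14)
$$A_1=\overline{\overline{(x_0 x_2)}\cdot\left(x_0+\overline{x_1 x_3}\right)}$$
(Eq. 15)
$$A_2=x_0\cdot x_3$$
where A2 A1 A0 are address bits derived from the actual input bits x3 x2 x1 x0. The control logic signals (RST and S0, S1) are given by equations (16), (17), and (18).
(Eq. 16)
$$S_0=\overline{x_0+\overline{\left(x_1+\overline{x_2}\right)}}$$
(Eq. 17)
$$S_1=\overline{(x_0+x_1)}$$
(Eq. 18)
$$RST=\overline{(x_0+x_1)}\cdot\overline{(x_2+x_3)}$$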
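
As a sanity check, the encoder and control expressions can be evaluated in Python for all sixteen inputs and compared with the address and shift columns of Table 1. The sketch assumes the bar placement of equations (13) to (18) as reconstructed from Table 1 (the address is that of the odd part of x, the shift count is the number of trailing zeros); the function names `encode` and `expected_fields` are hypothetical.

```python
def encode(x):
    """Return (address A2A1A0, shift count S1S0, RST) for a 4-bit input x."""
    x0, x1, x2, x3 = x & 1, (x >> 1) & 1, (x >> 2) & 1, (x >> 3) & 1

    def inv(b):
        return 1 - b             # logical NOT on a single bit

    a0 = inv(inv(x0 & x1) & inv(x1 & x2) & (x0 | inv(x2 & x3)))   # Eq. (13)
    a1 = inv(inv(x0 & x2) & (x0 | inv(x1 & x3)))                  # Eq. (14)
    a2 = x0 & x3                                                  # Eq. (15)
    s0 = inv(x0 | inv(x1 | inv(x2)))                              # Eq. (16)
    s1 = inv(x0 | x1)                                             # Eq. (17)
    rst = inv(x0 | x1) & inv(x2 | x3)                             # Eq. (18)
    return (a2 << 2) | (a1 << 1) | a0, (s1 << 1) | s0, rst


def expected_fields(x):
    """Reference values for nonzero x: address of the odd part, trailing zeros."""
    shifts = 0
    while x % 2 == 0:
        x //= 2
        shifts += 1
    return (x - 1) // 2, shifts, 0
```

Comparing `encode(x)` with `expected_fields(x)` for x = 1 to 15 reproduces every row of Table 1, and x = 0 is the only input that asserts RST.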

In very large scale integration (VLSI) design, the conventional multipliers consume more power and occupy more area, whereas the LUT-based multipliers save area and power consumption. Hence, a further reduction in the hardware is achieved by LUT-DA multipliers.

In the modified multiplier, only the even multiples of the coefficient are saved in the LUT. Hence, only $2^w/2$ words are required instead of all $2^w$ words. An even multiple is converted into the next odd multiple by adding the filter coefficient once. The barrel shifter and encoder blocks are not required for this modified multiplier, and one 2 × 1 MUX is required to choose between the odd and even multiples. Table 2 depicts the even multiples storing technique for w = 4. The even multiples of the constant coefficient, {0, 2C, 4C, ..., 12C, 14C}, are precomputed, saved in the eight LUT locations, and addressed by x3 x2 x1 through the 3-to-8 decoder. One input of the 2 × 1 MUX is the even output of the LUT, and the other input is the odd output from the adder. The selection line of the MUX is the least significant bit (LSB), x0, of the input sample. Whether an even or an odd multiple is selected therefore depends on the LSB of the input sample. Figure 2B represents the modified LUT-based multiplier.

Table 2. The Memory-based Lookup Table (MLUT) multiplier using even multiples.

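| Name of the word | Address (x3 x2 x1) | Value stored in LUT | Input bits (x3 x2 x1 x0) | Result |
|---|---|---|---|---|
| W0 | 0 0 0 | 0 | 0 0 0 0 | 0 |
| | | | 0 0 0 1 | 1C |
| W1 | 0 0 1 | 2C | 0 0 1 0 | 2C |
| | | | 0 0 1 1 | 3C |
| W2 | 0 1 0 | 4C | 0 1 0 0 | 4C |
| | | | 0 1 0 1 | 5C |
| W3 | 0 1 1 | 6C | 0 1 1 0 | 6C |
| | | | 0 1 1 1 | 7C |
| W4 | 1 0 0 | 8C | 1 0 0 0 | 8C |
| | | | 1 0 0 1 | 9C |
| W5 | 1 0 1 | 10C | 1 0 1 0 | 10C |
| | | | 1 0 1 1 | 11C |
| W6 | 1 1 0 | 12C | 1 1 0 0 | 12C |
| | | | 1 1 0 1 | 13C |
| W7 | 1 1 1 | 14C | 1 1 1 0 | 14C |
| | | | 1 1 1 1 | 15C |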

Likewise, odd multiples of the coefficient can be saved in an improved LUT-DA multiplier by using a subtractor to generate the required even multiples. In the proposed work, the SPLUT multiplier is converted into a DPLUT multiplier using the DA approach. When the input samples have a larger word length, the dual-port memory helps decrease the LUT size. A DPMLUT-based multiplier multiplies a common filter coefficient with two separate input samples simultaneously. The following section explains how the proposed filters use an improved MLUT-DA multiplier with even multiples storage.
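
The even-multiple storage of Table 2 and the dual-port idea can be sketched together in Python. The model below is behavioural only and rests on assumptions: the names `build_even_lut`, `lut_multiply_even`, and `dual_port_multiply` are hypothetical, and the two ports are modelled simply as two independent reads of the same list, without the clocked control logic of the real multiplier.

```python
def build_even_lut(C, w=4):
    """LUT with 2**w / 2 words holding 0, 2C, 4C, ..., (2**w - 2)C."""
    return [2 * k * C for k in range(2 ** (w - 1))]


def lut_multiply_even(x, lut, C):
    """One port: read the even multiple, add C once if the LSB of x is set."""
    even = lut[x >> 1]           # address is formed by the upper bits x3 x2 x1
    odd = even + C               # adder output: the next odd multiple
    return odd if x & 1 else even  # 2x1 MUX selected by x0


def dual_port_multiply(xa, xb, lut, C):
    """Two input samples share one coefficient LUT (two read ports)."""
    return lut_multiply_even(xa, lut, C), lut_multiply_even(xb, lut, C)
```

Because both products come from the same stored words, one table serves two samples of the input block that share a coefficient.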

Methods

Proposed architectures of block-based 2-D FIR filters

The block-based 2-D FIRA without any symmetry, considered as the general filter, is shown in Figure 3 for L = 4 with N = 2. The input samples {xk0, xk1} come from the same row of the input image matrix and are given to the shift register unit (SRU) array in serial order, block by block and row by row. The SRU array contains (L − 1) SRUs, each with L shift registers of M words. Here, SRU1 is termed as {SR1, SR2}. Likewise, SRU2 and SRU3 are placed in array form, giving the SRU array for order L = 4. Each of the L Delay Unit Blocks (DUBs) produces NL samples by applying each set of past N and present samples to the total L sets.

Figure 3. Conventional 2-D finite impulse response filter architecture (FIRA) for L = 4 with N = 2.

Where PU is Processing Unit, Xkm is input and Ykm is output.

The input block of L input samples from the M × M image matrix is applied as the present input. The (L − 1) SRU array receives these parallel inputs. Figure 4A represents the structure of an SRU using L registers. The present and past input sample blocks are applied to the N-DUBs of the DUB array. Each DUB consists of (L − 1) flip-flops. It produces the present and past samples required for block processing. As shown in Figure 4B, each DUB generates LN samples. The L DUBs give L × LN input samples to the filter’s arithmetic module.

Figure 4. (A) Shift register unit (SRU) Array 2 (B) Delay Unit Block for L = 4 with N= 2.

Structures of block-based symmetric 2-D FIR filter arithmetic modules

This section explores two symmetry-type 2-D FIR filters and one general filter of L = 4 with N = 2.

General filter structure with MLUT multipliers

The arithmetic module of the general filter architecture is realized by the L number of Processing Units (PU) and an Adder Tree (AT) block, which receives LN samples from DUB.

Each PU block is constructed from N Product Cells (PCs), which multiply the input samples by the corresponding filter coefficients. Generally, the product is computed by conventional multipliers. MLUT multipliers are used in place of these power-hungry conventional multipliers. Finally, the AT adds the outputs of the PU blocks and generates the N filter outputs corresponding to the block of N inputs.

This general filter architecture is modified to use DPMLUT multipliers, as presented in Figure 5. In this architecture, the inputs that are multiplied with a common filter coefficient are given to one DPLUT multiplier. Hence, a total of L × L DPLUT multipliers are needed to process the complete multiplication of input samples and filter coefficients for L = 4. If SPLUT-based multipliers are used, 2L × L multipliers are needed. The DPLUT-based multipliers save 50% of the area compared to SPLUT multipliers. Each DPLUT-based multiplier produces L filter outputs. In total, the L memory multipliers generate L × LN outputs, which are added in parallel by N AT blocks to give N outputs of (c + 4) bits each.

Figure 5. General 2-D filter architectures (FIRA) with dual-port look-up table (DPLUT) multipliers.

SRU, shift register unit; DUB, Delay Unit Block; LUT, look up table.

In this work, the multiplier quantity is decreased by symmetry in the filter coefficients. Two different symmetries are described in this section, and these symmetry filters can be used to design circular symmetry, fan-type and diamond filters.

Structure of 2-D FIR Diagonal Symmetry Filter (DSF)

In the DSF coefficient matrix, the sixteen coefficients are reduced to ten, such as {h00,h01,h02,h03,h11,h12,h13,h22,h23,h33} for L = 4 as shown in Figure 1B. Figure 6 represents the arithmetic module of the DSF-based 2-D FIRA, and it is designed by diagonal symmetry. Before the multiplication process, the input samples to be multiplied with common filter coefficients are added.

Figure 6. Structure of a diagonal symmetry 2-D Finite Impulse Response (FIR) filter with dual-port look-up table (DPLUT) multipliers.

SRU, shift register unit; DUB, Delay Unit Block; LUT, look up table.

For one input of the N = 2 block, seven adders are required to add the input samples that share a coefficient. The seven highlighted colored adders serve the other input sample. An adder is a simpler block than a multiplier. The diagonal symmetry filter requires (2L + 2)N multipliers instead of L × L × N, but an extra (L − 1)N adders are required. Next, these (2L + 2)N multipliers are realized as (2L + 2)N/2 DPLUT-based multipliers. Hence, half of the multiplier area is saved. Finally, all the multiplier output samples are accumulated by N AT blocks to produce N outputs.

Because multipliers are responsible for most of the power consumption, DPLUT-based multipliers are used to optimize them. Hence, ten DPMLUT multipliers are needed to produce the N = 2 outputs from the diagonal symmetry filter. With SPLUT multipliers, the DSF architecture for L = 4 with N = 2 would need 20 individual multipliers.

DPLUT decreases the LUT size for input samples with greater bit lengths by adding an additional shifter. Because of parallel block processing, two inputs are multiplied with the common filter coefficient in a 2-D FIR filter. This concept can be used to replace two SPLUT multipliers with a single DPLUT multiplier. The internal structure of the conventional DPLUT-based multiplier and the modified DPLUT-based multiplier are shown in Figure 7A and B.

Figure 7. (A) Conventional dual-port look-up table (DPLUT) multiplier (B) Modified DPLUT multiplier.

RST, reset.

The common filter odd coefficient multiples are precomputed and placed in the LUT memory. According to the input bits, the address of the location in the LUT is determined by the address encoder and address decoder. DPLUT fetches the corresponding locations based on the given addresses of two ports and provides two parallel outputs. Furthermore, each output is shifted by barrel shifters after passing through the corresponding NOR gate. The control lines for shifting are generated from the input sample bits handled by some control circuit logic, as explained earlier.

This conventional DPLUT-DA multiplier is revised as shown in Figure 7B for w = 4 input bits, using even multiples storage in the LUT. It can be observed that the modified even multiples storage LUT-DA multipliers need less memory and area. The control logic for RST, the barrel shifter, the NOR cell, the 4-to-3 encoder, and the barrel shifter control signals {S0, S1} are not needed in the enhanced DPLUT multiplier, which reduces the area further.

The conversion of SPLUT into DPLUT is a critical task. The common filter coefficients stored in the LUT must be shared by two inputs simultaneously. For this, the control logic is introduced related to the clock signal to choose the address locations with a slight delay. Figure 8 represents the control logic using multiplexers for a DPLUT-DA multiplier.

Figure 8. Dual-port look-up table (DPLUT) control logic.

MUX, multiplexer; LUT, look up table.

Structure of a 2-D FIR Quadrantal Symmetry Filter (QSF)

The QSF has eight unique filter coefficients, given as {h00, h01, h02, h03, h10, h11, h12, h13}. Figure 9 represents the architecture of the QSF for L = 4 with N = 2. A total of 16 SPLUT multipliers are needed for this structure, and it is modified to use eight DPLUT multipliers to produce the N block outputs.

Figure 9. Structure of 2-D Finite Impulse Response (FIR) Quadrantal Symmetry Filter (QSF) with dual-port look-up table (DPLUT) multipliers.

DUB, Delay Unit Block; SRU, shift register unit; LUT, look up table.

The summary of the number of single-port and dual-port multipliers needed for each symmetry is presented in Table 3.

Table 3. The multipliers count for constructing various symmetry filters for L = 4 with N =2.

| Name of the filter | Single-port look-up table (SPLUT) multipliers | Dual-port look-up table (DPLUT) multipliers |
|---|---|---|
| General filter | 32 | 16 |
| Diagonal Symmetry Filter (DSF) | 20 | 10 |
| Quadrantal Symmetry Filter (QSF) | 16 | 8 |
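
The counts in Table 3 can be reproduced by enumerating the distinct coefficients under each symmetry. The sketch below is illustrative and makes two assumptions that go beyond the text: the coefficient array is indexed 0 to L−1, and each DPLUT multiplier serves exactly two samples of the block, so N is taken to be even. The function names are hypothetical.

```python
def unique_coefficients(L, symmetry):
    """Count distinct coefficients of an L x L filter under a given symmetry."""
    seen = set()
    for i in range(L):
        for j in range(L):
            if symmetry == "none":
                key = (i, j)
            elif symmetry == "diagonal":        # h[i][j] == h[j][i]
                key = (min(i, j), max(i, j))
            elif symmetry == "quadrantal":      # h[i][j] == h[L-1-i][j]
                key = (min(i, L - 1 - i), j)
            else:
                raise ValueError("unknown symmetry: " + symmetry)
            seen.add(key)
    return len(seen)


def multiplier_counts(L, N, symmetry):
    """Return (SPLUT count, DPLUT count) for block size N (N assumed even)."""
    splut = unique_coefficients(L, symmetry) * N   # one multiplier per sample
    dplut = splut // 2                             # one dual-port unit per pair
    return splut, dplut
```

For L = 4 and N = 2 this gives (32, 16), (20, 10), and (16, 8) for the general, diagonal, and quadrantal cases.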

Experiment/validation

This section analyzes the implementation and results for the proposed 2-D FIRAs. The architecture of the proposed filters is built from multipliers, registers, and adders. The complexity of the hardware blocks depends on the number of input sample bits, the filter length L, the input block size N, and the filter coefficients. Hence, DSF and QSF symmetry-based 2-D FIR filters are designed and explored to reduce the number of multipliers. Next, the multiplier architectures are optimized by dual-port even multiples storage LUT-based multipliers. The architectures are synthesized using the Genus Synthesis tool-19.1 in 45nm technology with a generic library of Cadence vendor constraints. Free synthesis tools, such as the Xilinx Integrated Synthesis Environment, can be used instead of the Genus Synthesis tool in Cadence to replicate our methods. Power consumption of the architectures is calculated with an image size of 64 × 64 at a frequency of 20 MHz. The synthesized results (reports in Underlying data27) have been analyzed and compared with the results of existing architectures. All Verilog code associated with the work is available in Software availability.28

Results and discussion

The data associated with the results is available in Underlying data.27 Table 4 presents synthesis results of two individual types of symmetry 2-D FIR filters and general filters for L = 4 with N = 2.

Table 4. Power, delay, and area parameters of different multipliers for various symmetry filters for L = 4, N = 2 and w = 4.

SPLUT, Single-port look-up table; DPLUT, Dual-port look-up table; DSF, Diagonal Symmetry Filter; QSF, Quadrantal Symmetry Filter.

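| Name of the filter | Normal multipliers: Power (mW) | Normal multipliers: Delay (ns) | Normal multipliers: Area (μm²) | SPLUT multipliers: Power (mW) | SPLUT multipliers: Delay (ns) | SPLUT multipliers: Area (μm²) | DPLUT multipliers: Power (mW) | DPLUT multipliers: Delay (ns) | DPLUT multipliers: Area (μm²) |
|---|---|---|---|---|---|---|---|---|---|
| General Filter | 1.2815 | 14.601 | 29711 | 0.9972 | 11.875 | 26122 | 0.775 | 10.22 | 23722 |
| DSF | 1.0624 | 14.209 | 22652 | 0.7212 | 11.432 | 19116 | 0.6981 | 10.238 | 17999 |
| QSF | 1.0103 | 14.112 | 21764 | 0.6996 | 11.228 | 17694 | 0.5591 | 10.623 | 14389 |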

The power consumption, delay, and area results are shown as graphs in Figures 10, 11, and 12, respectively. The proposed DPLUT-based 2-D FIR DSF architecture needs 20.54% and 5.84% less area than the normal and SPLUT multiplier-based filter architectures, respectively. It saves 34.2% and 3.2% of power compared to the normal and single-port multiplier-based filter architectures, respectively. The proposed DSF architecture has 27.9% and 10.4% less delay than the normal multiplier and SPLUT-based architectures, respectively. Similarly, for the proposed QSF 2-D FIRA, power is decreased by 44% and 20%, area by 33% and 18.6%, and delay by 24.7% and 5.3% relative to the normal and SPLUT multipliers, respectively.
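
The percentages quoted above follow from the Table 4 entries as (baseline − proposed) / baseline. A small Python sketch of that arithmetic is given below; the dictionary layout and the function names are assumptions made for illustration, while the numbers are copied from Table 4.

```python
# (power in mW, delay in ns, area in um^2), copied from Table 4.
TABLE4 = {
    "General": {"normal": (1.2815, 14.601, 29711),
                "splut": (0.9972, 11.875, 26122),
                "dplut": (0.775, 10.22, 23722)},
    "DSF": {"normal": (1.0624, 14.209, 22652),
            "splut": (0.7212, 11.432, 19116),
            "dplut": (0.6981, 10.238, 17999)},
    "QSF": {"normal": (1.0103, 14.112, 21764),
            "splut": (0.6996, 11.228, 17694),
            "dplut": (0.5591, 10.623, 14389)},
}
METRICS = ("power", "delay", "area")


def reduction_percent(baseline, proposed):
    """Relative saving of the proposed value with respect to the baseline."""
    return 100.0 * (baseline - proposed) / baseline


def dplut_savings(filter_name):
    """Map each metric to (saving vs normal, saving vs SPLUT) in percent."""
    row = TABLE4[filter_name]
    return {
        metric: (reduction_percent(row["normal"][k], row["dplut"][k]),
                 reduction_percent(row["splut"][k], row["dplut"][k]))
        for k, metric in enumerate(METRICS)
    }
```

Calling `dplut_savings("DSF")["area"]` gives roughly 20.54 and 5.84, the area savings stated for the DSF.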

Figure 10. The power consumption comparison of different proposed 2-D Finite Impulse Response filters with different multiplier techniques.

Where DSF is diagonal symmetry filter, QSF is quadrantal symmetry filter and LUT is look up table.

Figure 11. The delay comparison of different proposed 2-D Finite Impulse Response filters with different multiplier techniques.

Where DSF is diagonal symmetry filter, QSF is quadrantal symmetry filter and LUT is look up table.

Figure 12. The area comparison of different proposed 2-D Finite Impulse Response filters with different multiplier techniques.

Where DSF is diagonal symmetry filter, QSF is quadrantal symmetry filter and LUT is look up table.

The filter architectures of 2-D FIR with two symmetries and one general filter are implemented by block processing and dual-port memory-based multipliers. Here, the memory reuse concept is used to get the filter outputs, and memory saving is obtained. The VLSI performance metrics, such as area, delay, and power values of the proposed filters, are compared for input bits w = 4 and 8 in Table 5.

Table 5. Comparison of proposed architectures for L = 4, N = 2, w = 4 and 8.

| Name of the filter | Power (mW), w = 4 | Power (mW), w = 8 | Delay (ns), w = 4 | Delay (ns), w = 8 | Area (μm²), w = 4 | Area (μm²), w = 8 |
|---|---|---|---|---|---|---|
| General Filter with dual port lookup table multipliers | 0.775 | 0.854 | 10.22 | 10.22 | 23722 | 28963 |
| Diagonal symmetry filter with dual port lookup table multipliers | 0.698 | 0.7458 | 10.238 | 10.22 | 17999 | 20186 |
| Quadrantal symmetry filter with dual port lookup table multipliers | 0.559 | 0.6253 | 10.623 | 10.8 | 14389 | 16453 |

The area of the proposed DSF and QSF architectures is 24.1% and 39.3% smaller, respectively, than that of the general 2-D FIRA for w = 4. The DSF and QSF consume 9.9% and 27.8% less power, respectively, than the general filter architecture. The delay values of the DSF, QSF, and general filter are almost the same. Figure 13 compares the area, delay, and power consumption of the proposed DPLUT-based 2-D FIRAs for w = 4 and 8. It can be observed that area and power increase when the filter's input word length increases, while the delay stays nearly unchanged.

Figure 13. The area, delay, and power consumption comparison of proposed filters for w = 4 and 8.

DSF, diagonal symmetry filter; QSF, quadrantal symmetry filter; LUT, look up table.

The proposed symmetry 2-D FIR filters with DPLUT-based multipliers are compared to previous works. The performance metrics obtained from the synthesis tool are tabulated in Table 6.

Table 6. Comparison of the proposed filters with existing filters for L = 8 with N = 4.

DSF, Diagonal Symmetry Filter; QSF, Quadrantal Symmetry Filter; ADP, area delay product; PDP, power delay product.

| Architecture | Area (μm²) | Delay (ns) | Power (mW) | ADP (μm²·ms) | PDP (mW·ns) |
| --- | --- | --- | --- | --- | --- |
| Alawad et al. [44] | 489271 | 14.16 | 7.96 | 6.928 | 112.71 |
| Mohanty et al. [45] | 791361 | 16.25 | 5.0934 | 12.85 | 82.76 |
| Kumar et al. [48] | 405825 | 8.72 | 4.13 | 3.53 | 36.01 |
| Proposed Filter (DSF) | 183457 | 11.258 | 1.8793 | 2.057 | 21.157 |
| Proposed Filter (QSF) | 164052 | 11.091 | 1.6955 | 1.82 | 18.804 |

The implementation of the proposed filter architecture is extended to L = 8 with N = 4 and compared with state-of-the-art works in Table 6. The proposed architectures improve on all three existing architectures in area, power, ADP, and PDP. Their delay is lower than that of Alawad et al. and Mohanty et al. but higher than that of Kumar et al.
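The ADP and PDP columns of Table 6 are products of the area, delay, and power columns. The following Python sketch recomputes them under the assumption that ADP is reported in μm²·ms (so the delay in ns is scaled by 10⁻⁶) and PDP in mW·ns. The names `adp_um2_ms` and `pdp_mw_ns` are illustrative only.

```python
# Area (um^2), delay (ns), and power (mW) transcribed from Table 6.
TABLE6 = {
    "Alawad et al.": (489271, 14.16, 7.96),
    "Mohanty et al.": (791361, 16.25, 5.0934),
    "Kumar et al.": (405825, 8.72, 4.13),
    "Proposed (DSF)": (183457, 11.258, 1.8793),
    "Proposed (QSF)": (164052, 11.091, 1.6955),
}


def adp_um2_ms(area_um2: float, delay_ns: float) -> float:
    """Area delay product in um^2 * ms (1 ns = 1e-6 ms)."""
    return area_um2 * delay_ns * 1e-6


def pdp_mw_ns(power_mw: float, delay_ns: float) -> float:
    """Power delay product in mW * ns."""
    return power_mw * delay_ns


if __name__ == "__main__":
    print(f"{'Architecture':16s} {'ADP':>8s} {'PDP':>9s}")
    for name, (area, delay, power) in TABLE6.items():
        print(f"{name:16s} {adp_um2_ms(area, delay):8.3f} "
              f"{pdp_mw_ns(power, delay):9.3f}")
```

The recomputed products agree with the tabulated ADP and PDP values to within about 1%. The small differences are presumably due to rounding in the reported figures.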

A graphical comparison of the area, power consumption, delay, ADP, and PDP of the proposed structures with the existing filter architectures for L = 8 with N = 4 is shown in Figure 14.


Figure 14. Area, delay, power consumption, ADP, and PDP comparison of proposed DSF and QSF architectures with existing architectures for L = 8 with N = 4.

The proposed MLUT-based 2-D block DSF for L = 8 with N = 4 requires 62.50%, 76.81%, and 54.79% less area than [23], [3], and [11], respectively. It has 20.49% and 30.72% less delay than [23] and [3], respectively. It consumes 76.39%, 63.10%, and 54.49% less power than [23], [3], and [11], respectively. It has 70.3%, 83.99%, and 41.72% less ADP than [23], [3], and [11], respectively. It has 81.22%, 74.43%, and 41.24% less PDP than [23], [3], and [11], respectively.

The proposed MLUT-based 2-D block QSF for L = 8 with N = 4 requires 66.47%, 79.26%, and 59.57% less area than [23], [3], and [11], respectively. It has 21.67% and 31.74% less delay than [23] and [3], respectively. It consumes 78.69%, 66.71%, and 58.94% less power than [23], [3], and [11], respectively. It has 73.72%, 85.83%, and 48.44% less ADP than [23], [3], and [11], respectively. It has 83.31%, 77.27%, and 47.78% less PDP than [23], [3], and [11], respectively.

Conclusions

This paper implements two novel symmetric 2-D block FIRAs, based on QSF and DSF, and one general filter (without symmetry), all with DPMLUT-based multipliers. Conventional multipliers are replaced with MLUT multipliers; DPMLUT-based multipliers save power and area compared to SPMLUT-based multipliers. Each symmetry filter is implemented using fewer multipliers than the general filter. Block processing is used to achieve memory reuse.
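To illustrate how a lookup table can stand in for conventional multipliers, the sketch below shows a generic, textbook-style distributed arithmetic inner product in Python. It is not the authors' DPMLUT architecture: the function names, the unsigned input format, and the single-LUT organisation are assumptions made purely for illustration.

```python
from typing import List, Sequence


def build_da_lut(coeffs: Sequence[int]) -> List[int]:
    """Precompute the DA lookup table for fixed integer coefficients.

    Entry `addr` holds the sum of the coefficients whose address bit is 1,
    so the table has 2**len(coeffs) entries.
    """
    k = len(coeffs)
    return [
        sum(c for i, c in enumerate(coeffs) if (addr >> i) & 1)
        for addr in range(1 << k)
    ]


def da_inner_product(lut: Sequence[int], samples: Sequence[int], width: int) -> int:
    """Compute sum(coeffs[i] * samples[i]) using only lookups, shifts, and adds.

    Assumes each sample is an unsigned integer of `width` bits.
    """
    for x in samples:
        if not 0 <= x < (1 << width):
            raise ValueError("sample does not fit in the given bit width")
    acc = 0
    for bit in range(width):
        # Form the LUT address from bit `bit` of every input sample.
        addr = 0
        for i, x in enumerate(samples):
            addr |= ((x >> bit) & 1) << i
        # Weight the looked-up partial sum by 2**bit and accumulate.
        acc += lut[addr] << bit
    return acc


if __name__ == "__main__":
    coeffs = [3, -1, 4, 2]       # hypothetical filter coefficients
    samples = [5, 9, 0, 15]      # hypothetical 4-bit input samples
    lut = build_da_lut(coeffs)
    print(da_inner_product(lut, samples, width=4))
    print(sum(c * x for c, x in zip(coeffs, samples)))
```

Both printed values should agree, since the bit-serial accumulation is only a reordering of the direct sum of products. In hardware, the table would be held in memory and addressed by the input bits, which is the general idea behind memory-based multiplication.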

The proposed MLUT-based 2-D block DSF for L = 8 with N = 4 requires 54.79% less area, consumes 54.49% less power, and has 41.72% less ADP and 41.24% less PDP, but has 29% more delay than the existing HLUT-based 2-D block FIR filter [11].

On the other hand, the proposed MLUT-based 2-D block QSF for L = 8 with N = 4 requires 59.5% less area, consumes 58.94% less power, and has 48.44% less ADP and 47.78% less PDP, but has 27% more delay than the existing HLUT-based 2-D block FIR filter [11]. The 2-D block FIRA using QSF has fewer unique coefficients than the 2-D block FIRA using DSF. Hence, the QSF achieves better area, power, ADP, and PDP than the DSF.
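The difference in unique coefficient counts can be illustrated with a small Python sketch. It assumes the commonly used definitions for an L × L impulse response h: diagonal symmetry means h(i, j) = h(j, i), and quadrantal symmetry means h(i, j) = h(L-1-i, j) = h(i, L-1-j). These definitions and the function name `unique_coefficients` are assumptions for illustration and may differ in detail from the formulation used in the paper.

```python
def unique_coefficients(length: int, symmetry: str) -> int:
    """Count distinct coefficients of a length x length 2-D FIR filter.

    Each position (i, j) is mapped to a canonical representative of its
    symmetry class; the number of distinct representatives is the count.
    """
    seen = set()
    for i in range(length):
        for j in range(length):
            if symmetry == "general":
                key = (i, j)
            elif symmetry == "diagonal":
                # Assumed: h(i, j) == h(j, i)
                key = (min(i, j), max(i, j))
            elif symmetry == "quadrantal":
                # Assumed: h(i, j) == h(L-1-i, j) == h(i, L-1-j)
                key = (min(i, length - 1 - i), min(j, length - 1 - j))
            else:
                raise ValueError(f"unknown symmetry: {symmetry}")
            seen.add(key)
    return len(seen)


if __name__ == "__main__":
    for kind in ("general", "diagonal", "quadrantal"):
        print(kind, unique_coefficients(8, kind))
```

Under these assumed definitions, a filter with L = 8 has 64 unique coefficients without symmetry, 36 with diagonal symmetry, and 16 with quadrantal symmetry, which is consistent with the QSF needing fewer multipliers than the DSF.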

How to cite this article
Chowdari Ch P and Seventline JB. Implementation of distributed arithmetic-based symmetrical 2-D block finite impulse response filter architectures [version 1; peer review: 2 approved] F1000Research 2023, 12:1182 (https://doi.org/10.12688/f1000research.126067.1)
