
1 Introduction

In the age of big data and IoT, error-resilient data storage, analysis and transmission are crucial in fields such as social media, health care, deep-space exploration and underwater surveillance. Semiconductor-based storage devices such as random access memory (RAM), read only memory (ROM) and flash memory still dominate the memory industry, even though the likelihood of data corruption in silicon-based semiconductor memory has grown as technology nodes shrink [1]. To prevent data corruption in semiconductor memories, traditional error mitigation techniques such as triple modular redundancy (TMR), concurrent error detection (CED) [2] and readback with scrubbing [3] are generally used. These methods consume large area and power and are not suitable for real-time applications. Interleaving is sometimes used for error mitigation in memory, but it increases the complexity of the memory and is not useful for small memory devices.

To alleviate the drawbacks of TMR, CED and scrubbing, various error detecting and correcting (EDAC) codes are used for error mitigation in data memory as well as in communication channels. In general, single-bit errors in memory are corrected using a single-bit error correcting code such as the Hamming or Hsiao code. To correct multiple erroneous bits, multi-bit error correcting block codes like the Bose-Chaudhuri-Hocquenghem (BCH) code [4] and the Reed-Solomon (RS) code [5] are used. These have greater decoding complexity and larger overhead, due to the larger number of redundant bits, compared to single-bit error correcting codes. Since data in memory is arranged as a matrix, product codes built from two low-complexity component block codes are also used for error mitigation in memory. Product codes using only Hamming codes as components [6], or a Hamming code and a parity code as components [6], are used to correct multi-bit upsets in SRAM-based semiconductor memory. The error detection capability of more complex EDAC codes can be concatenated with a Hamming code to obtain a low-complexity multi-bit error correcting code, such as an RS code concatenated with a Hamming code [7] or a BCH code concatenated with a Hamming code [8]. In addition to block codes, memory-based convolutional codes [9] are also used for error mitigation in storage devices.

The error detection and correction methods discussed so far are implemented as separate units that read data from memory, perform encoding and decoding, and write the data back into memory. With the rise of emerging technologies, computing can be performed in the memory itself, alongside data storage, unlike in traditional von Neumann computing models [10]. Redox-based Random Access Memory (ReRAM) is a non-volatile storage technology that supports such in-memory computing [11]. Due to its high circuit density, high retention capability and low power consumption, ReRAM is a candidate replacement for NAND or NOR flash in industry. Unlike CMOS- or TTL-based semiconductor memory technology, ReRAM uses dielectric materials to build its crossbar structure. ReRAM demonstrates good switching characteristics between high and low resistance states compared to other emerging memories such as magnetic random access memory (MRAM) and ferroelectric random access memory (FRAM) [12]. ReRAM-based memory technology is compatible with the conventional CMOS design flow and provides inherent parallelism due to its crossbar structure. The working principle of ReRAM involves the formation of a low-resistance conducting path through a dielectric by applying a high voltage across it. The conducting path arises from multiple mechanisms such as metal defects and vacancies [13]. The conducting tunnel through the insulator can be controlled by an external voltage source to perform SET or RESET operations on the device.

Several in-memory computation platforms have already been proposed using ReRAM, such as general purpose in-memory arithmetic circuit implementations [14], neuromorphic computing platforms [15] and the general purpose Programmable Logic-in-Memory (PLiM) computer [16]. Apart from these general purpose applications, ReRAM-based computation platforms are also used to implement domain-specific algorithms such as machine learning [17, 18], encryption [19] and compression [20].

The authors in [21] proposed an efficient hardware implementation of BCH codes. Hardware implementations of non-binary BCH and RS codes were proposed in [22]. The basic building block of an error correcting code is finite field arithmetic. Hardware implementations of high-throughput finite field multiplier circuits on field programmable gate arrays (FPGA) and application specific integrated circuits (ASIC) are discussed in [23]. Recently, ReRAM-based in-memory computation of Galois field (GF) arithmetic was described in [24]. In this work, we propose the first in-memory BCH encoding and decoding library. Specifically, our contributions are as follows:

  • This work presents the first in-memory implementation of encoding and decoding operation of BCH code using ReRAM crossbar array.

  • The proposed mapping harnesses the bit-level parallelism offered by ReRAM crossbar arrays and supports a wide variety of crossbar dimensions.

  • In order to perform matrix multiplication during the encoding and decoding operations, we propose a new method for implementing binary matrix multiplication using the ReRAM crossbar array. We refer to the method as BiBLAS-3, since it is a level-3 binary basic linear algebra subprogram.

  • The proposed implementation has a very low footprint in terms of the number of devices required as well as energy, which makes it suitable as a building block for different applications.

The rest of the paper is organized as follows. Section 2 presents the fundamentals of GF arithmetic, basics of encoding and decoding operations using BCH code along with a succinct introduction to ReVAMP, a state-of-the-art ReRAM based in-memory computing platform. Section 3 presents detailed implementation of element generation of GF, encoding and decoding operations for the ReVAMP platform using BiBLAS-3. Experimental results are described in Sect. 4, followed by conclusion in Sect. 5.

2 Preliminaries

In this section, we present the fundamentals of the encoding and decoding operations of BCH codes and introduce the preliminaries of logic operations using the ReVAMP architecture. The encoding and decoding operations of the BCH code are performed over the binary GF, which we describe briefly.

2.1 Galois Field Arithmetic

A field is a set of elements on which basic mathematical operations like addition and multiplication can be performed without leaving the set. These operations must satisfy the distributive, associative and commutative laws [25]. The order of a field is the number of elements in the field. A field with a finite number of elements is known as a GF. The order of a GF is always a prime number or the power of a prime number. If p is a prime number and m a positive integer, then the GF contains \(p^m\) elements and is represented as \(GF(p^m)\).

For m = 1 and p = 2, the elements of the GF are {0,1}; this is known as the binary field. Here we consider the GF of \(2^m\) elements extended from the binary field GF(2), where \(m > 1\). If U is the set of elements of the field and \(\alpha \) an element of \(GF(2^m)\), then U can be represented by Eq. (1).

$$\begin{aligned} U = [0, \alpha ^0, \alpha ^1, \alpha ^2, \alpha ^3, \ldots ,\alpha ^{2^m-2}] \end{aligned}$$
(1)

Let f(x) be a polynomial over GF(2); it is said to be irreducible if f(x) is not divisible by any polynomial over GF(2) of degree less than m but greater than zero [26]. An irreducible polynomial is a primitive polynomial if the smallest positive integer q for which f(x) divides \(x^q+1\) is \(q = 2^m-1\). For each value of m there can be multiple primitive polynomials, but we use the primitive polynomial with the least number of terms for computation over the GF.
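The primitivity condition above can be checked mechanically. The following is a minimal sketch (our own illustration, with helper names of our choosing), representing polynomials over GF(2) as bit masks whose least significant bit holds the \(x^0\) coefficient:

```python
# Sketch (ours): polynomials over GF(2) as bit masks, LSB = x^0 coefficient.

def gf2_mod(a, f, m):
    """Remainder of polynomial a modulo f over GF(2), where f has degree m."""
    while a and a.bit_length() - 1 >= m:
        a ^= f << (a.bit_length() - 1 - m)
    return a

def is_primitive(f, m):
    """True iff the smallest q with f(x) | x^q + 1 is q = 2^m - 1."""
    for q in range(1, 2 ** m):
        if gf2_mod((1 << q) | 1, f, m) == 0:   # (1 << q) | 1 is x^q + 1
            return q == 2 ** m - 1
    return False
```

For m = 4, `is_primitive(0b10011, 4)` holds for \(x^4+x+1\), while \(x^4+x^3+x^2+x+1\) (`0b11111`) is irreducible but divides \(x^5+1\) and is therefore not primitive.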

Fig. 1.
figure 1

(a) Primitive polynomial for various order GF. (b) Representation of elements in \(GF(2^4)\).

The primitive polynomials for different values of m are listed in Fig. 1a. These primitive polynomials are the basis of computation with the elements of the GF. For the generation of the elements of the GF, we start from the two basic elements 0 and 1 and a new element \(\alpha \).

In this paper, we discuss the encoding and decoding operations of single-bit error correcting BCH codes over \(GF(2^m)\), where m varies from 3 to 7. As \(\alpha \) is an element of \(GF(2^m)\), it must satisfy the primitive polynomial corresponding to \(GF(2^m)\). With a change in m, not only does the primitive polynomial change but also the dimension of the BCH code, as shown in Table 1. If \(\alpha \) is an element of \(GF(2^m)\), then \(\alpha ^k\) (where k is a positive integer) is also an element of \(GF(2^m)\); the recursive expressions used to calculate \(\alpha ^k\) for different values of m in \(GF(2^m)\) are shown in Table 1. Figure 1b illustrates the power, polynomial and 4-tuple representations of all the elements of \(GF(2^4)\). Based on the elements of the GF, the encoding and decoding operations of the BCH code are performed.
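The element generation just described can be sketched in software. The following illustration (function name is ours) reproduces the table of Fig. 1b for \(GF(2^4)\) with the primitive polynomial \(x^4+x+1\): multiplying by \(\alpha \) is a left shift of the bit-mask representation, reduced by the primitive polynomial whenever the degree reaches m.

```python
# Sketch (ours): nonzero elements of GF(2^m) as successive powers of alpha,
# stored as bit masks over the primitive polynomial (LSB = x^0 coefficient).

def gf_elements(m, prim):
    """Return [alpha^0, alpha^1, ..., alpha^{2^m - 2}] as bit masks."""
    elems, a = [], 1                  # alpha^0 = 1
    for _ in range(2 ** m - 1):
        elems.append(a)
        a <<= 1                       # multiply by alpha (i.e., by x)
        if a & (1 << m):              # degree reached m: reduce by prim
            a ^= prim
    return elems

elems = gf_elements(4, 0b10011)       # GF(2^4) with x^4 + x + 1
# elems[4] == 0b0011, i.e., alpha^4 = alpha + 1, matching Fig. 1b.
```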

Table 1. Variation of dimension of single bit error correcting BCH code with the order of GF.

2.2 Basics of BCH Encoding and Decoding Operation

BCH codes are powerful random error correcting cyclic codes that generalize Hamming codes to multi-bit error correction. Given two integers m and t such that \(m \ge 3\) and \(t<2^{m-1}\), there exists a binary BCH code with block length \(n=2^m-1\), number of parity check bits \((n-k) \le mt\) and minimum distance \(d_{min}\ge (2t +1)\). This is a t error correcting BCH code. If \(\alpha \) is a primitive element of \(GF (2^m)\), then the generator polynomial g(x) of the t error correcting BCH code of length \(2^m-1\) is the lowest degree polynomial over GF(2) having \(\alpha \), \(\alpha ^2\),\(\ldots \),\(\alpha ^{2t}\) as its roots. Hence, Eq. (2) must be satisfied.

$$\begin{aligned} g(\alpha ^i) = 0\quad \forall i \in \{1,2, \ldots , 2t\} \end{aligned}$$
(2)

If \(\phi _i(x)\) is the minimal polynomial of \(\alpha ^i\), then g(x) is formed using Eq. (3).

$$\begin{aligned} g(x) = LCM \{\phi _1(x), \phi _2(x),\ldots ,\phi _{2t}(x)\} \end{aligned}$$
(3)

As \(\alpha ^i\) and \(\alpha ^{i'}\) (where \(i= i'2^l\), \(i'\) is odd and \(l\ge 1\)) are conjugates of each other, \(\phi _i(x) = \phi _{i'}(x)\). Hence, g(x) can be formed using Eq. (4).

$$\begin{aligned} g(x) = LCM \{\phi _1(x), \phi _3(x),\ldots ,\phi _{2t-1}(x)\} \end{aligned}$$
(4)

Since we use a single-bit error correcting BCH code, the generator polynomial g(x) for \(GF(2^4)\) is given by

$$\begin{aligned} g(x) = \phi _1(x)= x^4 + x + 1 \end{aligned}$$

The degree of g(x) is at most mt and the number of parity bits is \((n-k)\). After the generation of g(x), the encoding operation involves multiplication of the input data D(x) with g(x), i.e., \(C(x) = D(x) \times g(x)\).
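The non-systematic encoding \(C(x) = D(x) \times g(x)\) amounts to carry-less polynomial multiplication over GF(2). A minimal reference sketch (our own illustration, with polynomials as bit masks, LSB = \(x^0\)):

```python
# Sketch (ours): carry-less (GF(2)) polynomial multiplication,
# computing C(x) = D(x) * g(x) on bit-mask representations.

def gf2_poly_mul(a, b):
    res = 0
    while b:
        if b & 1:
            res ^= a                  # add (XOR) the shifted copy of a
        a <<= 1
        b >>= 1
    return res

g = 0b10011                           # g(x) = x^4 + x + 1
d = 0b10100000101                     # an arbitrary 11-bit data word D(x)
c = gf2_poly_mul(d, g)                # 15-bit codeword polynomial C(x)
```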

The decoding operation of a BCH code involves the following steps:

  1. Syndrome computation.

  2. Determination of the error locator polynomial \(\lambda (x)\) from the syndrome components \(S_1\), \(S_2\),\(\ldots \),\(S_{2t}\).

  3. Location of the errors by solving the error locator polynomial \(\lambda (x)\).

Let \(r(x) = r_0 + r_1x + r_2x^2+\ldots +r_{n-1}x^{n-1}\) be the received data and e(x) the error pattern; then \(r(x) = C(x) + e(x)\). For a t error correcting BCH code, the parity check matrix is

$$ H = \begin{bmatrix} 1&\alpha&\alpha ^2&\alpha ^3&\ldots&\alpha ^{(n-1)} \\ 1&\alpha ^3&(\alpha ^3)^2&(\alpha ^3)^3&\ldots&(\alpha ^3)^{(n-1)}\\ 1&\alpha ^5&(\alpha ^5)^2&(\alpha ^5)^3&\ldots&(\alpha ^5)^{(n-1)} \\ \vdots&\vdots&\vdots&\vdots&\ddots&\vdots \\ 1&\alpha ^{(2t-1)}&(\alpha ^{(2t-1)})^2&(\alpha ^{(2t-1)})^3&\ldots&(\alpha ^{(2t-1)})^{(n-1)} \end{bmatrix} $$

The syndrome is a 2t-tuple \(S = (S_1, S_2,\dots ,S_{2t}) = r\times H^T\), where H is the parity check matrix. Since we consider a single-bit error correcting BCH code, t equals 1 and \(S= S_1= r\times H^T\).

In the next step, 2t nonlinear equations are formed from the syndrome values and solved using either the Berlekamp-Massey or Euclid's algorithm [27], and an error locator polynomial is formed from the roots obtained by solving these equations. Finally, the roots of the error locator polynomial are found using the Chien search algorithm [27]. A single-bit error correcting BCH code generates only one syndrome, whose value directly locates the position of the erroneous bit; hence, we do not discuss the detailed implementation of steps 2 and 3 of BCH decoding.
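For the single-bit error correcting case, decoding therefore collapses to the syndrome check: \(S_1 = r(\alpha )\), and if \(S_1 = \alpha ^e\) the error is at position e. A software sketch of this behavior for the (15, 11) code (our own illustration; names are hypothetical):

```python
# Sketch (ours): syndrome S1 = r(alpha) and single-bit correction for the
# (15, 11) BCH code; field elements are bit masks over x^4 + x + 1.

M, PRIM = 4, 0b10011
ALPHA, a = [], 1                      # ALPHA[i] holds alpha^i
for _ in range(2 ** M - 1):
    ALPHA.append(a)
    a <<= 1
    if a & (1 << M):
        a ^= PRIM

def syndrome(r):
    """S1 = r(alpha): XOR of alpha^j over the set bit positions j of r."""
    s = 0
    for j in range(2 ** M - 1):
        if (r >> j) & 1:
            s ^= ALPHA[j]
    return s

def correct_single(r):
    s = syndrome(r)
    if s == 0:
        return r                          # no error detected
    return r ^ (1 << ALPHA.index(s))      # S1 = alpha^e: flip bit e
```

Any codeword is a multiple of g(x) and has zero syndrome; for instance, g(x) itself (`0b10011`) satisfies `syndrome(0b10011) == 0`, and flipping any single bit of it is corrected back.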

Fig. 2.
figure 2

ReVAMP architecture.

Fig. 3.
figure 3

ReVAMP instruction format.

2.3 In-Memory Computing Using ReRAM

In this subsection, we describe ReVAMP, the ReRAM-based in-memory computing platform introduced in [28]. The architecture, presented in Fig. 2, utilizes a ReRAM crossbar with lightweight peripheral circuitry for in-memory computing. The ReRAM crossbar memory is used as data storage and computation memory (DCM); this is where in-memory computation using ReRAM devices takes place. A ReRAM crossbar memory consists of multiple 1-Select 1-Resistance (1S1R) ReRAM devices [29] arranged in the form of a crossbar [30]. A V/2 scheme is used for programming the ReRAM array, with unselected lines kept at ground. In the readout phase, the presence of a high current (\({\approx }5~\mu A\)) is interpreted as logic '1' while a low current (\({<}2~\mu A\)) is interpreted as logic '0'. Like conventional RAM arrays, ReRAM memories are accessed as \(w_D\)-bit wide words. Each ReRAM device has two input terminals, the wordline wl and the bitline bl. The internal resistive state Z of the ReRAM acts as a third input and stores a bit. The next state \(Z^n\) of the device can be expressed as a three-input Boolean majority function in which the bitline input is inverted.

$$\begin{aligned} Z^n = M_3(Z, wl, \overline{bl}) \end{aligned}$$
(5)

This forms the fundamental logic operation realizable using ReRAM devices. Using this intrinsic function, the inversion operation can be realized. Since majority and inversion together form a functionally complete set, any Boolean function can be realized using \(Z^n\).
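A truth-table sketch of the intrinsic operation of Eq. (5) (our own illustration) makes the SET, RESET, hold and inversion behavior explicit:

```python
# Sketch (ours): next-state function of Eq. (5), Z^n = M3(Z, wl, NOT bl),
# a three-input Boolean majority with the bitline input inverted.

def next_state(z, wl, bl):
    a, b, c = z, wl, 1 - bl               # bitline enters inverted
    return (a & b) | (b & c) | (a & c)    # Boolean majority of a, b, c

# wl=1, bl=0 SETs the device; wl=0, bl=1 RESETs it; wl=0, bl=0 holds Z.
# Inversion: with Z preset to 1 and wl = 0, the next state equals NOT bl.
```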

The ReVAMP architecture has a three-stage pipeline with Instruction Fetch, Instruction Decode and Execution stages, as shown in Fig. 2. The instruction memory (IM) can be a regular memory or a ReRAM memory, with the program counter being used to address and fetch the stored instructions in the Instruction Fetch stage.

The architecture supports two instructions, Read and Apply, as presented in Fig. 3. The Read instruction reads a specified word wl from the DCM and stores it in the Data Memory Register (DMR). The read-out word, available in the DMR, can be used as input by the following instructions. The Apply instruction is used for computation in the DCM. The address wl specifies the word in the DCM to be computed upon. A bit flag s chooses whether the inputs come from the primary input register (PIR) or the DMR. The two-bit flag ws selects the wordline input: 11 selects '1', 10 selects '0', 00 selects bit wb of the chosen data source as the wordline input, and 01 is an invalid value. The pairs (v, val) specify the individual bitline inputs: the bit flag v indicates whether the input is a NOP or a valid input and, similar to wb, the bits val select the bit of the chosen data source used as the bitline input.

Fig. 4.
figure 4

A \(2\times 3\) ReRAM crossbar, i.e., a crossbar with two rows and three bitlines. \(Z_{ij}\) represents the state of device at wordline i and bitline j. (a) Computation on \(0^{th}\) row with ‘0’ and {\(bl_0\), \(bl_1\), \(bl_2\)} as the wordline and bitline inputs respectively. Valid inputs can be either {‘1’ (\(+2.4\) V) or ‘0’ (\(-2.4\) V) }. (b) \(0^{th}\) row is being read out, by setting wordline to ‘1’ (\(+2.4\) V) and the bitlines to 0 (0 V).

Figure 4 shows a \(2\times 3\) ReRAM crossbar array, which can act as the DCM. The operation in Fig. 4(a) can be expressed as an Apply instruction,

$$\text {Apply}~0~00~00~00~1~00~1~01~1~10$$

and the PIR contents are set to \(bl_0~bl_1~bl_2\). The operation in Fig. 4(b) can be expressed as Read 0. From here on, we express the in-memory compute operations in the crossbar representation.

3 Methodology

The ReVAMP architecture performs in-memory computing using ReRAM devices capable of computing a three-input Boolean majority with a single input inverted. Boolean majority together with inversion is a functionally complete set and can therefore be used to compute arbitrary Boolean functions. ReVAMP allows simultaneous computation on all devices that share a common wordline. A separate read operation is required to read out the contents of a word. A recent work demonstrated multiple mathematical operations on the elements of a GF using the ReVAMP architecture [24]. In this section, we present the mapping of the encoding and decoding operations of BCH codes onto the ReVAMP architecture using these mathematical operations. It involves three steps: generation of the GF elements, encoding using the BCH code and decoding using the BCH code. The encoding and decoding operations of the BCH code essentially involve matrix multiplication, which we implement on the ReRAM crossbar with the help of the BiBLAS operations proposed in [31]. We use the terms wordline and row interchangeably; similarly, the terms bitline and column are used interchangeably.

Table 2. Generation operation for elements in \(GF(2^3)\) using \(8\times 3\) DCM.

3.1 Generation of GF Elements

Here, we illustrate the generation of the elements of \(GF(2^3)\) as an example. As each element of \(GF(2^3)\) is a 3-tuple, the number of columns in the DCM should be three or a multiple of three. For this purpose, we need a DCM with 8 wordlines and 3 bitlines. Table 2 presents the intermediate states of the DCM and the inputs used for the generation of the GF elements. In Step 1, '1' is applied on the \(7^{th}\) wordline and \(\overline{a_{00}}\), \(\overline{a_{01}}\), \(\overline{a_{02}}\), which represent 1, 1 and 0 respectively (from the 3-tuple representation shown in Fig. 1b), are applied to the bitlines. Thus 0, 0 and 1 are loaded into the \(7^{th}\) row, which represents \(\alpha ^0\). In the next two steps, \(\alpha \) and \(\alpha ^2\) are loaded into the sixth and fifth rows by applying (\(\overline{a_{10}}\), \(\overline{a_{11}}\), \(\overline{a_{12}}\)) and (\(\overline{a_{20}}\), \(\overline{a_{21}}\), \(\overline{a_{22}}\)), representing (1, 0, 1) and (0, 1, 1), to the bitlines respectively.

Now \(\alpha ^3\) is calculated by modulo-2 addition of the elements in the \(7^{th}\) and \(6^{th}\) rows. Modulo-2 addition between \(a_{i,j}\) and \(a_{(i+1),j}\) can be broken down into two operations.

$$\begin{aligned} a_{i,j}\oplus a_{(i+1),j}&= a_{i,j}.\overline{a_{(i+1),j}} + (\overline{a_{i,j}+\overline{a_{(i+1),j}}}) \end{aligned}$$
(6)
$$\begin{aligned}&=f_{i,j} + \overline{g_{i+1,j}} \end{aligned}$$
(7)

To compute \(f_{i,j}\) and \(g_{i+1,j}\), we require two copies of \(a_{i,j}\). Here i represents the power of \(\alpha \) and j the position of a bit when \(\alpha ^i\) is expressed in 3-tuple format. Hence, in Steps 5 and 6, we load \(a_{00}\), \(a_{01}\) and \(a_{02}\) into the \(4^{th}\) and \(3^{rd}\) rows. \(f_{0,j}\) and \(g_{1,j}\) are calculated by applying '0' on the \(4^{th}\) row, '1' on the \(3^{rd}\) row and 0 along all the bitlines, in Steps 7 and 8 respectively. To OR \(f_{0,j}\) with \(\overline{g_{1,j}}\), \(g_{1,j}\) is read from the \(3^{rd}\) row in Step 9; then, in Step 10, \(g_{1,j}\) is applied along all the bitlines and '1' on the wordline of the \(4^{th}\) row. Finally, the \(4^{th}\) row stores the value \(\alpha ^3\). Following the same procedure, \(\alpha ^4\), \(\alpha ^5\) and \(\alpha ^6\) are calculated and stored in the \(3^{rd}\), \(2^{nd}\) and \(1^{st}\) rows of the crossbar respectively. The flowchart in Fig. 5 describes the generation of the elements of \(GF(2^m)\) using a DCM with u wordlines and m bitlines.
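The two-step XOR decomposition of Eqs. (6) and (7) can be verified bitwise. A minimal sketch (our own check, not the crossbar mapping itself):

```python
# Sketch (ours): check of Eqs. (6)-(7), a XOR b = f + g'
# with f = a.b' and g = a + b'.

def xor_via_fg(a, b):
    f = a & (1 - b)          # f: AND with the inverted second operand
    g = a | (1 - b)          # g: OR with the inverted second operand
    return f | (1 - g)       # f + g' equals a XOR b
```

For all four input combinations, `xor_via_fg(a, b)` agrees with `a ^ b`.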

Fig. 5.
figure 5

Flowchart for the generation of elements of \(GF(2^m)\).

3.2 Encoding and Decoding Operations

The encoding and decoding operations primarily involve binary matrix multiplications. During encoding, the input data is multiplied with the generator matrix to give the encoded data. Similarly, during decoding, the received data is multiplied with the transpose of the parity check matrix to generate the syndromes, which help locate errors in the received data. Even though the decoding of BCH codes involves multiple steps, as presented in Sect. 2, we only require syndrome computation to achieve single-bit error correction.

Suppose A and B are two \(2\times 2\) matrices and C is the matrix obtained by multiplying A and B.

$$ A = \begin{bmatrix} a_{11}&a_{12}\\ a_{21}&a_{22} \end{bmatrix} B = \begin{bmatrix} b_{11}&b_{12}\\ b_{21}&b_{22} \end{bmatrix} C = \begin{bmatrix} c_{11}&c_{12}\\ c_{21}&c_{22} \end{bmatrix} = \begin{bmatrix} a_{11}b_{11}\oplus a_{12}b_{21}&a_{11}b_{12}\oplus a_{12}b_{22}\\ a_{21}b_{11}\oplus a_{22}b_{21}&a_{21}b_{12}\oplus a_{22}b_{22} \end{bmatrix} $$
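Each entry of C above is an XOR of AND dot products. As a plain software reference (our own sketch, independent of the crossbar mapping), binary matrix multiplication over GF(2) can be written as:

```python
# Sketch (ours): binary matrix multiplication over GF(2); each entry of C
# is an XOR accumulation of AND dot products, as in the expansion above.

def bin_matmul(A, B):
    n, m, p = len(A), len(B), len(B[0])
    C = [[0] * p for _ in range(n)]
    for i in range(n):
        for j in range(p):
            acc = 0
            for k in range(m):
                acc ^= A[i][k] & B[k][j]   # XOR-accumulated dot product
            C[i][j] = acc
    return C
```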

We explain the mapping of BiBLAS-3 on the ReRAM crossbar using \(2\times 2\) matrices. The parameters involved in the BiBLAS-3 implementation are described in Table 3. We consider the following configuration for the BiBLAS-3 implementation.

  • Matrices A and B are available within the crossbar, and the product matrix C is stored in the crossbar after computation.

  • The first five rows (\(r_1\),\(r_2\),\(\ldots \),\(r_5\)) are reserved for storage and the last four rows (reserve1,\(\ldots \),reserve4) are used for computation.

Table 3. BiBLAS-3 parameters.

The minimum dimension of a matrix in the BiBLAS-3 mapping is \(1\times 2\). Algorithm 1 shows the step-by-step mapping of BiBLAS-3. In step 1, the elements of the product matrix C are divided into groups of size c. In this example, C is of size \(2\times 2\) with a total of 4 product terms. As the column size of the chosen crossbar is 3, the product terms are divided into two groups, with elements 1, 3 and 2 in group 1 and element 4 in group 2. In step 2, the terms from matrices A and B responsible for the formation of each product term in a group are obtained. Each product is a series of XORed dot products. The first and second dot products of each element in group 1 are computed in the reserve1 and reserve2 rows respectively. Before computing the third dot product, the contents of reserve1 and reserve2 are XORed into reserve1, followed by a clear operation on reserve2. Likewise, the following dot products are computed in reserve2 and accumulated into reserve1 by XOR. This is repeated until all the dot products in the chosen group are completed. By the end of step 2, reserve1 contains the products of group 1. In step 3, the products in reserve1 are copied to free memory and reserve1 is cleared. Steps 2 and 3 are iterated until all the groups have been computed on the crossbar. A detailed example is shown in Table 4.
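The grouping procedure above can be summarized functionally. The sketch below (our abstraction: the crossbar's reserve rows are modeled as plain bit lists, and the row-level read/write mechanics are omitted) follows the three steps of Algorithm 1:

```python
# Functional sketch (ours) of BiBLAS-3: group the product terms by the
# bitline count c, accumulate each group's dot products by XOR in reserve1
# using reserve2 as scratch, then copy the finished group out.

def biblas3(A, B, c):
    n, m, p = len(A), len(B), len(B[0])
    terms = [(i, j) for i in range(n) for j in range(p)]   # product entries
    out = {}
    for g in range(0, len(terms), c):                      # step 1: grouping
        group = terms[g:g + c]
        reserve1 = [0] * len(group)
        for k in range(m):                                 # step 2: dot products
            reserve2 = [A[i][k] & B[k][j] for (i, j) in group]
            reserve1 = [x ^ y for x, y in zip(reserve1, reserve2)]
        for (i, j), v in zip(group, reserve1):             # step 3: copy out
            out[(i, j)] = v
    return [[out[(i, j)] for j in range(p)] for i in range(n)]
```

The result is independent of the group size c; c only determines how many product terms are computed in parallel per pass.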

figure a
Table 4. BiBLAS-3 computation with the input matrices A and B to compute the product matrix C.

4 Experiment

In this section, we analyze the performance of in-memory computation of the encoding and decoding operations of BCH codes over GFs of various orders m and for various crossbar dimensions. The experiments were performed using Cadence Virtuoso with a device-accurate model of the ReRAM devices [29]. The dimension of the single-bit error correcting BCH code varies with the order m of the GF. Figures 6 and 7 show, respectively, the delay in number of clock cycles and the area in number of ReRAM devices required to perform the encoding and decoding operations, along with the generation of the GF elements, using ReVAMP. The encoding operation involves generation of the generator matrix from the parity check matrix and multiplication of the input data with the generator matrix, whereas the decoding operation involves only multiplication of the received data with the transpose of the parity check matrix. Hence, the number of cycles required for encoding is larger than for decoding, as can be verified from Fig. 6. Generation of the generator matrix from the parity check matrix and multiplication of the input data with the generator matrix are sequential operations, so the same ReRAM devices are used for both. Thus, the area required for both encoding and decoding is essentially the ReRAM devices used for matrix multiplication; hence, the areas for encoding and decoding are the same, as shown in Fig. 7.

Fig. 6.
figure 6

Delay of mapping for various operations on different dimension of single bit error correcting BCH code.

Fig. 7.
figure 7

Area or the number of ReRAM devices required for generation of the elements of GF, encoding and decoding operation of BCH code on that GF.

An increase in the dimension of the BCH code implies an increase in the order of the GF, and the delay as well as the area required for computation grow exponentially. Specifically, the number of instructions \(I_{m}\) required for generating all the elements of \(GF(2^m)\) is given by Eq. (8).

$$\begin{aligned} I_{m}= m+ 11(2^m-m-1) \end{aligned}$$
(8)

Similarly, the numbers of instructions required for the encoding (\(I_{en}\)) and decoding (\(I_{de}\)) operations of the BCH code are given by Eqs. (9) and (10) respectively, where n and k are the lengths of the encoded data and input data respectively.

$$\begin{aligned} I_{en}= 2n + 12(n-1) + k(n-k) \end{aligned}$$
(9)
$$\begin{aligned} I_{de}= 2n + 12(n-1) \end{aligned}$$
(10)
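As a quick sanity check of Eqs. (8)-(10) (our own script; the formulas themselves are from the text above), the instruction counts for the (15, 11) code over \(GF(2^4)\) can be evaluated directly:

```python
# Quick check (ours) of the instruction-count formulas of Eqs. (8)-(10).

def i_gen(m):                  # Eq. (8): GF element generation
    return m + 11 * (2 ** m - m - 1)

def i_enc(n, k):               # Eq. (9): encoding
    return 2 * n + 12 * (n - 1) + k * (n - k)

def i_dec(n):                  # Eq. (10): decoding
    return 2 * n + 12 * (n - 1)

# For the (15, 11) code over GF(2^4):
# i_gen(4) = 125, i_dec(15) = 198, i_enc(15, 11) = 242.
```

The encode/decode gap, k(n-k) instructions, is exactly the cost of the extra generator-matrix multiplication noted in the experimental discussion.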

Addition involves XORing the individual bits of the input operands, which is done in parallel using the rows of the ReRAM crossbar. Hence, the delay of XORing the elements of two crossbar rows remains constant even as the number of bitlines increases. Changes in the length of the encoded data n and the input data k change both the delay and the number of ReRAM devices required for the mapping.

The number of bit-level parallel operations on a ReRAM crossbar array depends on the number of bitlines in the crossbar. Figure 8 demonstrates the impact of the number of bitlines on the mapping delay for operations of the (15,11) BCH code. As the number of bitlines increases, the delay of the operations reduces, which demonstrates the effectiveness of the proposed mapping in harnessing the bit-level parallelism offered by the ReRAM crossbar array.

Fig. 8.
figure 8

Impact of the number of bitlines on delay for computation of (15,11) BCH code.

Fig. 9.
figure 9

Impact of BCH code dimensions on energy required for computation.

Table 5. Comparison of area and delay for BCH computation on ReVAMP, ASIC and FPGA.

Figure 9 presents the mean energy required for element generation and for encoding together with decoding. Due to the large runtime of device-accurate simulations over all possible input combinations, we report energy numbers only for operations up to the BCH code of dimension (63,57). With an increase in the dimension of the BCH code, the dimensions of the parity check and generator matrices also increase, and hence the total number of multiplications increases. This drastically increases the delay, area and energy consumption with the dimension of the BCH code.

Even though contemporary technologies such as ASIC and FPGA cannot be directly compared with a ReRAM-based implementation, we report a coarse comparison for the sake of completeness in Table 5. For ReRAM, we assume a mature ReRAM technology with 1 ns read/write times [32]. The DCM column presents the size, in bits, of the DCM used for in-memory computation, calculated by multiplying the number of wordlines by the number of bitlines. The Instruction Memory (IM) column in Table 5 presents the number of bits required for instruction storage in the IM of the ReVAMP architecture. The FPGA implementation (synthesized at 100 MHz) is on a Kintex-7 evaluation board. The ASIC implementation (synthesized at \({\approx }\)1 GHz) was performed using Synopsys Design Compiler with the TSMC 65 nm technology library. As computing using ReRAM is inherently sequential, an increase in the dimension of the BCH code leads to a direct increase in delay, along with a corresponding increase in area (DCM size). For ASIC and FPGA, the increase in delay is relatively smaller. The main advantage of ReRAM is its low area requirement in terms of the number of devices needed for implementation. For example, the encoding and decoding operations of the (31,26) BCH code require \(\approx \)8k GE for ASIC and 510 LUTs for FPGA, but only 590 devices for ReVAMP.

5 Conclusion

In this work, an efficient mapping of the encoding and decoding operations of single-bit error correcting BCH codes onto a state-of-the-art ReRAM-based in-memory computing platform was proposed. Further, we devised a technique for efficient in-memory realization of BiBLAS-3 operations using the ReRAM crossbar array. We explored multiple crossbar dimensions and demonstrated the performance trade-offs while varying the dimensions of the BCH code. The proposed implementation has a low energy footprint and shows good improvements in area requirements compared to traditional ASIC- and FPGA-based designs. In the future, this work can be extended to the in-memory implementation of multi-bit error correcting BCH codes.