
1 Introduction

In the age of big data and IoT, error-resilient data storage, analysis and transmission are crucial in fields such as social media, health care, deep-space exploration and underwater surveillance. Semiconductor-based storage devices such as random access memory (RAM), read only memory (ROM) and flash memory still dominate the memory industry, even though the likelihood of data corruption in silicon-based semiconductor memory has grown as technology nodes shrink [1]. To prevent data corruption in semiconductor memories, traditional error mitigation techniques such as triple modular redundancy (TMR), concurrent error detection (CED) [2] and readback with scrubbing [3] are generally used. These methods consume large area and power and are not suitable for real-time applications. Interleaving is sometimes used for error mitigation in memory, but it increases the complexity of the memory and is not useful for small memory devices.

To alleviate the drawbacks of TMR, CED and scrubbing, various error detecting and correcting (EDAC) codes are used for error mitigation in data memory as well as in communication channels. In general, single-bit errors in memory are corrected using a single-bit error correcting code such as the Hamming or Hsiao code. To correct multiple erroneous bits, multi-bit error correcting block codes like the Bose-Chaudhuri-Hocquenghem (BCH) code [4] and the Reed-Solomon (RS) code [5] are used. These have greater decoding complexity and larger overhead, due to the larger number of redundant bits, compared to single-bit error correcting codes. Since data in memory is arranged as a matrix, product codes built from two low-complexity component block codes are also used for error mitigation in memory. Product codes using only Hamming codes as components [6], or a Hamming code and a parity code as components [6], are used to correct multi-bit upsets in SRAM-based semiconductor memory. The error detection capability of more complex EDAC codes can be concatenated with a Hamming code to obtain a low-complexity multi-bit error correcting code, such as an RS code concatenated with a Hamming code [7] or a BCH code concatenated with a Hamming code [8]. In addition to block codes, memory-based convolutional codes [9] are also used for error mitigation in storage devices.

The error detection and correction methods discussed so far are implemented as separate units that read data from memory, perform encoding and decoding, and write the data back into memory. With the rise of emerging technologies, computing can be performed in the memory itself, alongside data storage, unlike in traditional von Neumann computing models [10]. Redox-based Random Access Memory (ReRAM) is a non-volatile storage technology that supports such in-memory computing [11]. Due to its high circuit density, high retention capability and low power consumption, ReRAM is a candidate replacement for NAND or NOR flash in industry. Unlike CMOS- or TTL-based semiconductor memory technology, ReRAM uses dielectric materials to build its crossbar structure. ReRAM demonstrates good switching characteristics between high and low resistance states compared to other emerging memories such as magnetic random access memory (MRAM) and ferroelectric random access memory (FRAM) [12]. ReRAM-based memory technology is compatible with the conventional CMOS design flow and provides inherent parallelism due to its crossbar structure. The working principle of ReRAM involves the formation of a low-resistance conducting path through a dielectric by applying a high voltage across it. The conducting path arises from multiple mechanisms such as metal defects and vacancies [13]. The conducting tunnel through the insulator can be controlled by an external voltage source to perform SET or RESET operations on the device.

Several in-memory computation platforms have already been proposed using ReRAM, such as general purpose in-memory arithmetic circuit implementations [14], neuromorphic computing platforms [15] and the general purpose Programmable Logic-in-Memory (PLiM) computer [16]. Apart from these general purpose applications, ReRAM-based computation platforms are also used to implement domain-specific algorithms such as machine learning [17, 18], encryption [19] and compression [20].

The authors in [21] proposed an efficient hardware implementation of BCH codes. Hardware implementations of non-binary BCH and RS codes were proposed in [22]. The basic building block of an error correcting code is finite field arithmetic. Hardware implementations of high-throughput finite field multiplier circuits on field programmable gate arrays (FPGA) and application specific integrated circuits (ASIC) are discussed in [23]. Recently, ReRAM-based in-memory computation of Galois field (GF) arithmetic was described in [24]. In this work, we propose the first in-memory BCH encoding and decoding library. Specifically, our contributions are as follows:

  • This work presents the first in-memory implementation of encoding and decoding operation of BCH code using ReRAM crossbar array.

  • The proposed mapping harnesses the bit-level parallelism offered by ReRAM crossbar arrays and supports a wide variety of crossbar dimensions.

  • In order to perform matrix multiplication during the encoding and decoding operations, we propose a new method for implementing binary matrix multiplication using the ReRAM crossbar array. We refer to the method as BiBLAS-3, since it is a level-3 binary basic linear algebra subprogram.

  • The proposed implementation has a very low footprint in terms of the number of devices required as well as energy, which makes it suitable as a building block for different applications.

The rest of the paper is organized as follows. Section 2 presents the fundamentals of GF arithmetic, basics of encoding and decoding operations using BCH code along with a succinct introduction to ReVAMP, a state-of-the-art ReRAM based in-memory computing platform. Section 3 presents detailed implementation of element generation of GF, encoding and decoding operations for the ReVAMP platform using BiBLAS-3. Experimental results are described in Sect. 4, followed by conclusion in Sect. 5.

2 Preliminaries

In this section, we present the fundamentals of the encoding and decoding operations of BCH codes and introduce the preliminaries of logic operations using the ReVAMP architecture. The encoding and decoding operations of the BCH code are performed over the binary GF, which we describe briefly.

2.1 Galois Field Arithmetic

A field is a set of elements on which basic mathematical operations like addition and multiplication can be performed without leaving the set. These operations must satisfy the distributive, associative and commutative laws [25]. The order of a field is the number of elements in the field. A field with a finite number of elements is known as a GF. The order of a GF is always a prime number or the power of a prime number. If p is a prime number and m a positive integer, then the GF contains \(p^m\) elements and is represented as \(GF(p^m)\).

For m = 1 and p = 2, the elements of the GF are {0,1}; this is known as the binary field. Here we consider the GF of \(2^m\) elements extended from the binary field GF(2), where \(m > 1\). If U is the set of elements of the field and \(\alpha \) an element of \(GF(2^m)\), then U can be represented by Eq. (1).

$$\begin{aligned} U = [0, \alpha ^0, \alpha ^1, \alpha ^2, \alpha ^3, \ldots ,\alpha ^{2^m-2}] \end{aligned}$$
(1)

Let f(x) be a polynomial over GF(2); it is said to be irreducible if f(x) is not divisible by any polynomial over GF(2) of degree less than m but greater than zero [26]. An irreducible polynomial is a primitive polynomial if the smallest positive integer q for which f(x) divides \(x^q+1\) is \(q = 2^m-1\). For each value of m there can be multiple primitive polynomials, but we use the primitive polynomial with the least number of terms for computation over the GF.
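The primitivity condition above can be checked mechanically. The following is a minimal sketch (our own illustration, with helper names of our choosing), representing polynomials over GF(2) as bit masks whose least significant bit holds the \(x^0\) coefficient:

```python
# Sketch (ours): polynomials over GF(2) as bit masks, LSB = x^0 coefficient.

def gf2_mod(a, f, m):
    """Remainder of polynomial a modulo f over GF(2), where f has degree m."""
    while a and a.bit_length() - 1 >= m:
        a ^= f << (a.bit_length() - 1 - m)
    return a

def is_primitive(f, m):
    """True iff the smallest q with f(x) | x^q + 1 is q = 2^m - 1."""
    for q in range(1, 2 ** m):
        if gf2_mod((1 << q) | 1, f, m) == 0:   # (1 << q) | 1 is x^q + 1
            return q == 2 ** m - 1
    return False
```

For m = 4, `is_primitive(0b10011, 4)` holds for \(x^4+x+1\), while \(x^4+x^3+x^2+x+1\) (`0b11111`) is irreducible but divides \(x^5+1\) and is therefore not primitive.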

Fig. 1.
figure 1

(a) Primitive polynomial for various order GF. (b) Representation of elements in \(GF(2^4)\).

The primitive polynomials for different values of m are listed in Fig. 1a. These primitive polynomials are the basis of computation with the elements of the GF. For the generation of the elements of the GF, we start from the two basic elements 0 and 1 and a new element \(\alpha \).

In this paper, we discuss the encoding and decoding operations of single-bit error correcting BCH codes over \(GF(2^m)\), where m varies from 3 to 7. As \(\alpha \) is an element of \(GF(2^m)\), it must satisfy the primitive polynomial corresponding to \(GF(2^m)\). With a change in m, not only does the primitive polynomial change but also the dimension of the BCH code, as shown in Table 1. If \(\alpha \) is an element of \(GF(2^m)\), then \(\alpha ^k\) (where k is a positive integer) is also an element of \(GF(2^m)\); the recursive expressions used to calculate \(\alpha ^k\) for different values of m in \(GF(2^m)\) are shown in Table 1. Figure 1b illustrates the power, polynomial and 4-tuple representations of all the elements of \(GF(2^4)\). Based on the elements of the GF, the encoding and decoding operations of the BCH code are performed.
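The element generation just described can be sketched in software. The following illustration (function name is ours) reproduces the table of Fig. 1b for \(GF(2^4)\) with the primitive polynomial \(x^4+x+1\): multiplying by \(\alpha \) is a left shift of the bit-mask representation, reduced by the primitive polynomial whenever the degree reaches m.

```python
# Sketch (ours): nonzero elements of GF(2^m) as successive powers of alpha,
# stored as bit masks over the primitive polynomial (LSB = x^0 coefficient).

def gf_elements(m, prim):
    """Return [alpha^0, alpha^1, ..., alpha^{2^m - 2}] as bit masks."""
    elems, a = [], 1                  # alpha^0 = 1
    for _ in range(2 ** m - 1):
        elems.append(a)
        a <<= 1                       # multiply by alpha (i.e., by x)
        if a & (1 << m):              # degree reached m: reduce by prim
            a ^= prim
    return elems

elems = gf_elements(4, 0b10011)       # GF(2^4) with x^4 + x + 1
# elems[4] == 0b0011, i.e., alpha^4 = alpha + 1, matching Fig. 1b.
```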

Table 1. Variation of dimension of single bit error correcting BCH code with the order of GF.

2.2 Basics of BCH Encoding and Decoding Operation

BCH codes are powerful random error correcting cyclic codes that generalize Hamming codes to multi-bit error correction. Given two integers m and t such that \(m \ge 3\) and \(t<2^{m-1}\), there exists a binary BCH code with block length \(n=2^m-1\), number of parity check bits \((n-k) \le mt\) and minimum distance \(d_{min}\ge (2t +1)\). This is a t error correcting BCH code. If \(\alpha \) is a primitive element of \(GF (2^m)\), then the generator polynomial g(x) of the t error correcting BCH code of length \(2^m-1\) is the lowest degree polynomial over GF(2) having \(\alpha \), \(\alpha ^2\),\(\ldots \),\(\alpha ^{2t}\) as its roots. Hence, Eq. (2) must be satisfied.

$$\begin{aligned} g(\alpha ^i) = 0\quad \forall i \in \{1,2, \ldots , 2t\} \end{aligned}$$
(2)

If \(\phi _i(x)\) is the minimal polynomial of \(\alpha ^i\), then g(x) is formed using Eq. (3).

$$\begin{aligned} g(x) = LCM \{\phi _1(x), \phi _2(x),\ldots ,\phi _{2t}(x)\} \end{aligned}$$
(3)

As \(\alpha ^i\) and \(\alpha ^{i'}\) (where \(i= i'2^l\), \(i'\) is odd and \(l\ge 1\)) are conjugates of each other, \(\phi _i(x) = \phi _{i'}(x)\). Hence, g(x) can be formed using Eq. (4).

$$\begin{aligned} g(x) = LCM \{\phi _1(x), \phi _3(x),\ldots ,\phi _{2t-1}(x)\} \end{aligned}$$
(4)

Since we use a single-bit error correcting BCH code, the generator polynomial g(x) for \(GF(2^4)\) is given by

$$\begin{aligned} g(x) = \phi _1(x)= x^4 + x + 1 \end{aligned}$$

The degree of g(x) is at most mt and the number of parity bits is \((n-k)\). After the generation of g(x), the encoding operation involves multiplication of the input data D(x) with g(x), i.e., \(C(x) = D(x) \times g(x)\).
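The non-systematic encoding \(C(x) = D(x) \times g(x)\) amounts to carry-less polynomial multiplication over GF(2). A minimal reference sketch (our own illustration, with polynomials as bit masks, LSB = \(x^0\)):

```python
# Sketch (ours): carry-less (GF(2)) polynomial multiplication,
# computing C(x) = D(x) * g(x) on bit-mask representations.

def gf2_poly_mul(a, b):
    res = 0
    while b:
        if b & 1:
            res ^= a                  # add (XOR) the shifted copy of a
        a <<= 1
        b >>= 1
    return res

g = 0b10011                           # g(x) = x^4 + x + 1
d = 0b10100000101                     # an arbitrary 11-bit data word D(x)
c = gf2_poly_mul(d, g)                # 15-bit codeword polynomial C(x)
```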

The decoding operation of a BCH code involves the following steps:

  1. Syndrome computation.

  2. Determination of the error locator polynomial \(\lambda (x)\) from the syndrome components \(S_1\), \(S_2\),\(\ldots \),\(S_{2t}\).

  3. Location of the errors by solving the error locator polynomial \(\lambda (x)\).

Let \(r(x) = r_0 + r_1x + r_2x^2+\ldots +r_{n-1}x^{n-1}\) be the received data and e(x) the error pattern; then \(r(x) = C(x) + e(x)\). For a t error correcting BCH code, the parity check matrix is

$$ H = \begin{bmatrix} 1&\alpha&\alpha ^2&\alpha ^3&\ldots&\alpha ^{(n-1)} \\ 1&\alpha ^3&(\alpha ^3)^2&(\alpha ^3)^3&\ldots&(\alpha ^3)^{(n-1)}\\ 1&\alpha ^5&(\alpha ^5)^2&(\alpha ^5)^3&\ldots&(\alpha ^5)^{(n-1)} \\ \vdots&\vdots&\vdots&\vdots&\ddots&\vdots \\ 1&\alpha ^{(2t-1)}&(\alpha ^{(2t-1)})^2&(\alpha ^{(2t-1)})^3&\ldots&(\alpha ^{(2t-1)})^{(n-1)} \end{bmatrix} $$

The syndrome is a 2t-tuple \(S = (S_1, S_2,\dots ,S_{2t}) = r\times H^T\), where H is the parity check matrix. Since we consider a single-bit error correcting BCH code, t equals 1 and \(S= S_1= r\times H^T\).

In the next step, 2t nonlinear equations are formed from the syndrome values and solved using either the Berlekamp-Massey or Euclid's algorithm [27], and an error locator polynomial is formed from the roots obtained by solving these equations. Finally, the roots of the error locator polynomial are found using the Chien search algorithm [27]. A single-bit error correcting BCH code generates only one syndrome, whose value directly locates the position of the erroneous bit; hence, we do not discuss the detailed implementation of steps 2 and 3 of BCH decoding.
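For the single-bit error correcting case, decoding therefore collapses to the syndrome check: \(S_1 = r(\alpha )\), and if \(S_1 = \alpha ^e\) the error is at position e. A software sketch of this behavior for the (15, 11) code (our own illustration; names are hypothetical):

```python
# Sketch (ours): syndrome S1 = r(alpha) and single-bit correction for the
# (15, 11) BCH code; field elements are bit masks over x^4 + x + 1.

M, PRIM = 4, 0b10011
ALPHA, a = [], 1                      # ALPHA[i] holds alpha^i
for _ in range(2 ** M - 1):
    ALPHA.append(a)
    a <<= 1
    if a & (1 << M):
        a ^= PRIM

def syndrome(r):
    """S1 = r(alpha): XOR of alpha^j over the set bit positions j of r."""
    s = 0
    for j in range(2 ** M - 1):
        if (r >> j) & 1:
            s ^= ALPHA[j]
    return s

def correct_single(r):
    s = syndrome(r)
    if s == 0:
        return r                          # no error detected
    return r ^ (1 << ALPHA.index(s))      # S1 = alpha^e: flip bit e
```

Any codeword is a multiple of g(x) and has zero syndrome; for instance, g(x) itself (`0b10011`) satisfies `syndrome(0b10011) == 0`, and flipping any single bit of it is corrected back.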

Fig. 2.
figure 2

ReVAMP architecture.

Fig. 3.
figure 3

ReVAMP instruction format.

2.3 In-Memory Computing Using ReRAM

In this subsection, we describe ReVAMP, the ReRAM-based in-memory computing platform introduced in [28]. The architecture, presented in Fig. 2, utilizes a ReRAM crossbar with lightweight peripheral circuitry for in-memory computing. The ReRAM crossbar memory is used as data storage and computation memory (DCM); this is where in-memory computation using ReRAM devices takes place. A ReRAM crossbar memory consists of multiple 1-Select 1-Resistance (1S1R) ReRAM devices [29] arranged in the form of a crossbar [30]. A V/2 scheme is used for programming the ReRAM array, with unselected lines kept at ground. In the readout phase, the presence of a high current (\({\approx }5~\mu A\)) is interpreted as logic '1' while a low current (\({<}2~\mu A\)) is interpreted as logic '0'. Like conventional RAM arrays, ReRAM memories are accessed as \(w_D\)-bit wide words. Each ReRAM device has two input terminals, the wordline wl and the bitline bl. The internal resistive state Z of the ReRAM acts as a third input and stores a bit. The next state \(Z^n\) of the device can be expressed as a three-input Boolean majority function in which the bitline input is inverted.

$$\begin{aligned} Z^n = M_3(Z, wl, \overline{bl}) \end{aligned}$$
(5)

This forms the fundamental logic operation realizable using ReRAM devices. Using this intrinsic function, the inversion operation can be realized. Since majority and inversion together form a functionally complete set, any Boolean function can be realized using \(Z^n\).
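A truth-table sketch of the intrinsic operation of Eq. (5) (our own illustration) makes the SET, RESET, hold and inversion behavior explicit:

```python
# Sketch (ours): next-state function of Eq. (5), Z^n = M3(Z, wl, NOT bl),
# a three-input Boolean majority with the bitline input inverted.

def next_state(z, wl, bl):
    a, b, c = z, wl, 1 - bl               # bitline enters inverted
    return (a & b) | (b & c) | (a & c)    # Boolean majority of a, b, c

# wl=1, bl=0 SETs the device; wl=0, bl=1 RESETs it; wl=0, bl=0 holds Z.
# Inversion: with Z preset to 1 and wl = 0, the next state equals NOT bl.
```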

The ReVAMP architecture has a three-stage pipeline with Instruction Fetch, Instruction Decode and Execution stages, as shown in Fig. 2. The instruction memory (IM) can be a regular memory or a ReRAM memory, with the program counter being used to address and fetch the stored instructions in the Instruction Fetch stage.

The architecture supports two instructions, Read and Apply, as presented in Fig. 3. The Read instruction reads a specified word wl from the DCM and stores it in the Data Memory Register (DMR). The read-out word, available in the DMR, can be used as input by the following instructions. The Apply instruction is used for computation in the DCM. The address wl specifies the word in the DCM to be computed upon. A bit flag s chooses whether the inputs come from the primary input register (PIR) or the DMR. The two-bit flag ws selects the wordline input: 11 selects '1', 10 selects '0', 00 selects bit wb of the chosen data source as the wordline input, and 01 is an invalid value. The pairs (v, val) specify the individual bitline inputs: the bit flag v indicates whether the input is a NOP or a valid input and, similar to wb, the bits val select the bit of the chosen data source used as the bitline input.

Fig. 4.
figure 4

A \(2\times 3\) ReRAM crossbar, i.e., a crossbar with two rows and three bitlines. \(Z_{ij}\) represents the state of device at wordline i and bitline j. (a) Computation on \(0^{th}\) row with ‘0’ and {\(bl_0\), \(bl_1\), \(bl_2\)} as the wordline and bitline inputs respectively. Valid inputs can be either {‘1’ (\(+2.4\) V) or ‘0’ (\(-2.4\) V) }. (b) \(0^{th}\) row is being read out, by setting wordline to ‘1’ (\(+2.4\) V) and the bitlines to 0 (0 V).

Figure 4 shows a \(2\times 3\) ReRAM crossbar array, which can act as the DCM. The operation in Fig. 4(a) can be expressed as an Apply instruction,

$$\text {Apply}~0~00~00~00~1~00~1~01~1~10$$

and the PIR contents are set to \(bl_0~bl_1~bl_2\). The operation in Fig. 4(b) can be expressed as Read 0. From here on, we express the in-memory compute operations in the crossbar representation.

3 Methodology

The ReVAMP architecture performs in-memory computing using ReRAM devices capable of computing a three-input Boolean majority with a single input inverted. Boolean majority together with inversion is a functionally complete set and can therefore be used to compute arbitrary Boolean functions. ReVAMP allows simultaneous computation on all devices that share a common wordline. A separate read operation is required to read out the contents of a word. A recent work demonstrated multiple mathematical operations on the elements of a GF using the ReVAMP architecture [24]. In this section, we present the mapping of the encoding and decoding operations of BCH codes onto the ReVAMP architecture using these mathematical operations. It involves three steps: generation of the GF elements, encoding using the BCH code and decoding using the BCH code. The encoding and decoding operations of the BCH code essentially involve matrix multiplication, which we implement on the ReRAM crossbar with the help of the BiBLAS operations proposed in [31]. We use the terms wordline and row interchangeably; similarly, the terms bitline and column are used interchangeably.

Table 2. Generation operation for elements in \(GF(2^3)\) using \(8\times 3\) DCM.

3.1 Generation of GF Elements

Here, we illustrate the generation of the elements of \(GF(2^3)\) as an example. As each element of \(GF(2^3)\) is a 3-tuple, the number of columns in the DCM should be three or a multiple of three. For this purpose, we need a DCM with 8 wordlines and 3 bitlines. Table 2 presents the intermediate states of the DCM and the inputs used for the generation of the GF elements. In Step 1, '1' is applied on the \(7^{th}\) wordline and \(\overline{a_{00}}\), \(\overline{a_{01}}\), \(\overline{a_{02}}\), which represent 1, 1 and 0 respectively (from the 3-tuple representation shown in Fig. 1b), are applied to the bitlines. Thus 0, 0 and 1 are loaded into the \(7^{th}\) row, which represents \(\alpha ^0\). In the next two steps, \(\alpha \) and \(\alpha ^2\) are loaded into the sixth and fifth rows by applying (\(\overline{a_{10}}\), \(\overline{a_{11}}\), \(\overline{a_{12}}\)) and (\(\overline{a_{20}}\), \(\overline{a_{21}}\), \(\overline{a_{22}}\)), representing (1, 0, 1) and (0, 1, 1), to the bitlines respectively.

Now \(\alpha ^3\) is calculated by modulo-2 addition of the elements in the \(7^{th}\) and \(6^{th}\) rows. Modulo-2 addition between \(a_{i,j}\) and \(a_{(i+1),j}\) can be broken down into two operations.

$$\begin{aligned} a_{i,j}\oplus a_{(i+1),j}&= a_{i,j}.\overline{a_{(i+1),j}} + (\overline{a_{i,j}+\overline{a_{(i+1),j}}}) \end{aligned}$$
(6)
$$\begin{aligned}&=f_{i,j} + \overline{g_{i+1,j}} \end{aligned}$$
(7)

To compute \(f_{i,j}\) and \(g_{i+1,j}\), we require two copies of \(a_{i,j}\). Here i represents the power of \(\alpha \) and j the position of a bit when \(\alpha ^i\) is expressed in 3-tuple format. Hence, in Steps 5 and 6, we load \(a_{00}\), \(a_{01}\) and \(a_{02}\) into the \(4^{th}\) and \(3^{rd}\) rows. \(f_{0,j}\) and \(g_{1,j}\) are calculated by applying '0' on the \(4^{th}\) row, '1' on the \(3^{rd}\) row and 0 along all the bitlines, in Steps 7 and 8 respectively. To OR \(f_{0,j}\) with \(\overline{g_{1,j}}\), \(g_{1,j}\) is read from the \(3^{rd}\) row in Step 9; then, in Step 10, \(g_{1,j}\) is applied along all the bitlines and '1' on the wordline of the \(4^{th}\) row. Finally, the \(4^{th}\) row stores the value \(\alpha ^3\). Following the same procedure, \(\alpha ^4\), \(\alpha ^5\) and \(\alpha ^6\) are calculated and stored in the \(3^{rd}\), \(2^{nd}\) and \(1^{st}\) rows of the crossbar respectively. The flowchart in Fig. 5 describes the generation of the elements of \(GF(2^m)\) using a DCM with u wordlines and m bitlines.
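The two-step XOR decomposition of Eqs. (6) and (7) can be verified bitwise. A minimal sketch (our own check, not the crossbar mapping itself):

```python
# Sketch (ours): check of Eqs. (6)-(7), a XOR b = f + g'
# with f = a.b' and g = a + b'.

def xor_via_fg(a, b):
    f = a & (1 - b)          # f: AND with the inverted second operand
    g = a | (1 - b)          # g: OR with the inverted second operand
    return f | (1 - g)       # f + g' equals a XOR b
```

For all four input combinations, `xor_via_fg(a, b)` agrees with `a ^ b`.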

Fig. 5.
figure 5

Flowchart for the generation of elements of \(GF(2^m)\).

3.2 Encoding and Decoding Operations

The encoding and decoding operations primarily involve binary matrix multiplications. During encoding, the input data is multiplied with the generator matrix to give the encoded data. Similarly, during decoding, the received data is multiplied with the transpose of the parity check matrix to generate the syndromes, which help locate errors in the received data. Even though the decoding of BCH codes involves multiple steps, as presented in Sect. 2, we only require syndrome computation to achieve single-bit error correction.

Suppose A and B are two \(2\times 2\) matrices and C is the matrix obtained by multiplying A and B.

$$ A = \begin{bmatrix} a_{11}&a_{12}\\ a_{21}&a_{22} \end{bmatrix} B = \begin{bmatrix} b_{11}&b_{12}\\ b_{21}&b_{22} \end{bmatrix} C = \begin{bmatrix} c_{11}&c_{12}\\ c_{21}&c_{22} \end{bmatrix} = \begin{bmatrix} a_{11}b_{11}\oplus a_{12}b_{21}&a_{11}b_{12}\oplus a_{12}b_{22}\\ a_{21}b_{11}\oplus a_{22}b_{21}&a_{21}b_{12}\oplus a_{22}b_{22} \end{bmatrix} $$
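Each entry of C above is an XOR of AND dot products. As a plain software reference (our own sketch, independent of the crossbar mapping), binary matrix multiplication over GF(2) can be written as:

```python
# Sketch (ours): binary matrix multiplication over GF(2); each entry of C
# is an XOR accumulation of AND dot products, as in the expansion above.

def bin_matmul(A, B):
    n, m, p = len(A), len(B), len(B[0])
    C = [[0] * p for _ in range(n)]
    for i in range(n):
        for j in range(p):
            acc = 0
            for k in range(m):
                acc ^= A[i][k] & B[k][j]   # XOR-accumulated dot product
            C[i][j] = acc
    return C
```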

We explain the mapping of BiBLAS-3 on the ReRAM crossbar using \(2\times 2\) matrices. The parameters involved in the BiBLAS-3 implementation are described in Table 3. We consider the following configuration for the BiBLAS-3 implementation.

  • Matrices A and B are available within the crossbar, and the product matrix C is stored in the crossbar after computation.

  • The first five rows (\(r_1\),\(r_2\),\(\ldots \),\(r_5\)) are reserved for storage and the last four rows (reserve1,\(\ldots \),reserve4) are used for computation.

Table 3. BiBLAS-3 parameters.

The minimum dimension of a matrix in the BiBLAS-3 mapping is \(1\times 2\). Algorithm 1 shows the step-by-step mapping of BiBLAS-3. In step 1, the elements of the product matrix C are divided into groups of size c. In this example, C is of size \(2\times 2\) with a total of 4 product terms. As the column size of the chosen crossbar is 3, the product terms are divided into two groups, with elements 1, 3 and 2 in group 1 and element 4 in group 2. In step 2, the terms from matrices A and B responsible for the formation of each product term in a group are obtained. Each product is a series of XORed dot products. The first and second dot products of each element in group 1 are computed in the reserve1 and reserve2 rows respectively. Before computing the third dot product, the contents of reserve1 and reserve2 are XORed into reserve1, followed by a clear operation on reserve2. Likewise, the following dot products are computed in reserve2 and accumulated into reserve1 by XOR. This is repeated until all the dot products in the chosen group are completed. By the end of step 2, reserve1 contains the products of group 1. In step 3, the products in reserve1 are copied to free memory and reserve1 is cleared. Steps 2 and 3 are iterated until all the groups have been computed on the crossbar. A detailed example is shown in Table 4.
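The grouping procedure above can be summarized functionally. The sketch below (our abstraction: the crossbar's reserve rows are modeled as plain bit lists, and the row-level read/write mechanics are omitted) follows the three steps of Algorithm 1:

```python
# Functional sketch (ours) of BiBLAS-3: group the product terms by the
# bitline count c, accumulate each group's dot products by XOR in reserve1
# using reserve2 as scratch, then copy the finished group out.

def biblas3(A, B, c):
    n, m, p = len(A), len(B), len(B[0])
    terms = [(i, j) for i in range(n) for j in range(p)]   # product entries
    out = {}
    for g in range(0, len(terms), c):                      # step 1: grouping
        group = terms[g:g + c]
        reserve1 = [0] * len(group)
        for k in range(m):                                 # step 2: dot products
            reserve2 = [A[i][k] & B[k][j] for (i, j) in group]
            reserve1 = [x ^ y for x, y in zip(reserve1, reserve2)]
        for (i, j), v in zip(group, reserve1):             # step 3: copy out
            out[(i, j)] = v
    return [[out[(i, j)] for j in range(p)] for i in range(n)]
```

The result is independent of the group size c; c only determines how many product terms are computed in parallel per pass.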

figure a
Table 4. BiBLAS-3 computation with the input matrices A and B to compute the product matrix C.

4 Experiment

In this section, we analyze the performance of in-memory computation of the encoding and decoding operations of BCH codes over GFs of various orders m and for various crossbar dimensions. The experiments were performed using Cadence Virtuoso with a device-accurate model of the ReRAM devices [29]. The dimension of the single-bit error correcting BCH code varies with the order m of the GF. Figures 6 and 7 show, respectively, the delay in number of clock cycles and the area in number of ReRAM devices required to perform the encoding and decoding operations, along with the generation of the GF elements, using ReVAMP. The encoding operation involves generation of the generator matrix from the parity check matrix and multiplication of the input data with the generator matrix, whereas the decoding operation involves only multiplication of the received data with the transpose of the parity check matrix. Hence, the number of cycles required for encoding is larger than for decoding, as can be verified from Fig. 6. Generation of the generator matrix from the parity check matrix and multiplication of the input data with the generator matrix are sequential operations, so the same ReRAM devices are used for both. Thus, the area required for both encoding and decoding is essentially the ReRAM devices used for matrix multiplication; hence, the areas for encoding and decoding are the same, as shown in Fig. 7.

Fig. 6.
figure 6

Delay of mapping for various operations on different dimension of single bit error correcting BCH code.

Fig. 7.
figure 7

Area or the number of ReRAM devices required for generation of the elements of GF, encoding and decoding operation of BCH code on that GF.

An increase in the dimension of the BCH code implies an increase in the order of the GF, and the delay as well as the area required for computation grow exponentially. Specifically, the number of instructions \(I_{m}\) required for generating all the elements of \(GF(2^m)\) is given by Eq. (8).

$$\begin{aligned} I_{m}= m+ 11(2^m-m-1) \end{aligned}$$
(8)

Similarly, the numbers of instructions required for the encoding (\(I_{en}\)) and decoding (\(I_{de}\)) operations of the BCH code are given by Eqs. (9) and (10) respectively, where n and k are the lengths of the encoded data and input data respectively.

$$\begin{aligned} I_{en}= 2n + 12(n-1) + k(n-k) \end{aligned}$$
(9)
$$\begin{aligned} I_{de}= 2n + 12(n-1) \end{aligned}$$
(10)
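As a quick sanity check of Eqs. (8)-(10) (our own script; the formulas themselves are from the text above), the instruction counts for the (15, 11) code over \(GF(2^4)\) can be evaluated directly:

```python
# Quick check (ours) of the instruction-count formulas of Eqs. (8)-(10).

def i_gen(m):                  # Eq. (8): GF element generation
    return m + 11 * (2 ** m - m - 1)

def i_enc(n, k):               # Eq. (9): encoding
    return 2 * n + 12 * (n - 1) + k * (n - k)

def i_dec(n):                  # Eq. (10): decoding
    return 2 * n + 12 * (n - 1)

# For the (15, 11) code over GF(2^4):
# i_gen(4) = 125, i_dec(15) = 198, i_enc(15, 11) = 242.
```

The encode/decode gap, k(n-k) instructions, is exactly the cost of the extra generator-matrix multiplication noted in the experimental discussion.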

Addition involves XORing the individual bits of the input operands, which is done in parallel using the rows of the ReRAM crossbar. Hence, the delay of XORing the elements of two crossbar rows remains constant even as the number of bitlines increases. Changes in the length of the encoded data n and the input data k change both the delay and the number of ReRAM devices required for the mapping.

The number of bit-level parallel operations on a ReRAM crossbar array depends on the number of bitlines in the crossbar. Figure 8 demonstrates the impact of the number of bitlines on the mapping delay for operations of the (15,11) BCH code. As the number of bitlines increases, the delay of the operations reduces, which demonstrates the effectiveness of the proposed mapping in harnessing the bit-level parallelism offered by the ReRAM crossbar array.

Fig. 8.
figure 8

Impact of the number of bitlines on delay for computation of (15,11) BCH code.

Fig. 9.
figure 9

Impact of BCH code dimensions on energy required for computation.

Table 5. Comparison of area and delay for BCH computation on ReVAMP, ASIC and FPGA.

Figure 9 presents the mean energy required for element generation and for encoding together with decoding. Due to the large runtime of device-accurate simulations over all possible input combinations, we report energy numbers only for operations up to the BCH code of dimension (63,57). With an increase in the dimension of the BCH code, the dimensions of the parity check and generator matrices also increase, and hence the total number of multiplications increases. This drastically increases the delay, area and energy consumption with the dimension of the BCH code.

Even though contemporary technologies such as ASIC and FPGA cannot be directly compared with a ReRAM-based implementation, we report a coarse comparison for the sake of completeness in Table 5. For ReRAM, we assume a mature ReRAM technology with 1 ns read/write times [32]. The DCM column presents the size, in bits, of the DCM used for in-memory computation, calculated by multiplying the number of wordlines by the number of bitlines. The Instruction Memory (IM) column in Table 5 presents the number of bits required for instruction storage in the IM of the ReVAMP architecture. The FPGA implementation (synthesized at 100 MHz) is on a Kintex-7 evaluation board. The ASIC implementation (synthesized at \({\approx }\)1 GHz) was performed using Synopsys Design Compiler with the TSMC 65 nm technology library. As computing using ReRAM is inherently sequential, an increase in the dimension of the BCH code leads to a direct increase in delay, along with a corresponding increase in area (DCM size). For ASIC and FPGA, the increase in delay is relatively smaller. The main advantage of ReRAM is its low area requirement in terms of the number of devices needed for implementation. For example, the encoding and decoding operations of the (31,26) BCH code require \(\approx \)8k GE for ASIC and 510 LUTs for FPGA, but only 590 devices for ReVAMP.

5 Conclusion

In this work, an efficient mapping of the encoding and decoding operations of single-bit error correcting BCH codes onto a state-of-the-art ReRAM-based in-memory computing platform was proposed. Further, we devised a technique for efficient in-memory realization of BiBLAS-3 operations using the ReRAM crossbar array. We explored multiple crossbar dimensions and demonstrated the performance trade-offs while varying the dimensions of the BCH code. The proposed implementation has a low energy footprint and shows good improvements in area requirements compared to traditional ASIC- and FPGA-based designs. In the future, this work can be extended to the in-memory implementation of multi-bit error correcting BCH codes.