1. Introduction
Internet of Things (IoT) devices are connected electronic devices, vehicles, buildings, and various social infrastructures that communicate with each other and process data in real time. As IoT becomes more popular, the number of connected devices and sensors (we call these “edges”) is sharply increasing, and data are also overflowing from these devices [1,2]. The types of devices are growing more varied, and the computations needed to process the data are becoming more complicated as applications diversify across industry fields [3,4,5]. Thus, the complexity of IoT systems has increased.
Traditionally, these various edge devices have performed such operations as sensing data, controlling target systems and environments, and communicating with the servers of IoT systems. Recently, edge devices have also come to require the ability to process sensed data [1,6]. However, the edge environment cannot provide sufficient computing power, due to constraints such as an insufficient power supply, unstable operating environments, and insufficient memory. The edge environment is also demanding in terms of maintenance, involving frequent device changes, firmware updates, and physical access difficulties [7,8,9].
Cloud computing has been proposed to resolve the slow performance of the edge, which derives from these constraints [10]. Cloud computing can process various data successfully and flexibly using the powerful hardware resources of a server. However, servers used in cloud computing face two constraints: network bandwidth and server workload. As shown in Figure 1, the more devices are connected and the more data are transmitted to the cloud server, the slower the service processing speed becomes due to these constraints, so real-time processing cannot be guaranteed. Recently, with the development of wireless communication, represented by 5G and Bluetooth, and wired communication, represented by optical fiber, network speed has increased and the network bottleneck has been significantly reduced. On the other hand, the workload problem caused by the limited resources of a server is getting serious. In addition, the entire IoT system is dependent on the network, making it vulnerable to network errors that can cause the entire system to go offline [9,11,12].
To reduce network dependency, edge computing, which processes data at a local computer, was introduced. Edge computing processes data on a computer near the edge device, resulting in low network dependency and the ability to use additional hardware accelerators to expedite data processing. However, IoT systems operating in harsh environments and small-scale IoT systems cannot use local computers, so certain edge platforms, such as MCU (Micro Controller Unit)-based platforms, require the ability to process data on their own. These MCU-based devices compensate for their slow processing speed with hardware accelerators, which perform complex tasks quickly but support only a limited set of operations. IoT edge devices therefore require a stable edge system with a flexible hardware accelerator that can perform the various services required in various environments within a small hardware space.
Recently released FPGAs (Field Programmable Gate Arrays), being small, low-power, and low-cost, are well suited for edge computing, which requires faster processing speeds and flexible accelerators. The ASIC-FPGA (Application-Specific Integrated Circuit-Field Programmable Gate Array) co-design architecture was proposed to combine the small size and low power consumption of the ASIC with the flexibility of the FPGA [13,14,15]. The functional hardware blocks are partitioned between the ASIC and the FPGA; the entire software application is operated and controlled on the ASIC, and the device-specific hardware is configured on the FPGA. Although FPGAs offer some flexibility, they are difficult to apply to IoT devices because they require physical access for reprogramming. Storing synthesized hardware modules in the device’s memory allows reprogramming without physical access, but the larger memory also increases power consumption. Due to the limited memory size, it is not possible to store a large number of hardware modules, so high flexibility cannot be guaranteed.
Hardware-as-a-service (HaaS), which shares hardware in the cloud, has been studied to increase the hardware flexibility of IoT devices [16]. HaaS is a system that allows remote hardware devices distributed across various regions to be easily accessed through cloud middleware. Using HaaS, the edge device does not require an additional hardware accelerator and can draw on an unlimited variety of hardware services. However, to use remote hardware, the data to be processed must be transmitted through the cloud middleware, as in cloud computing, which makes the response speed of the system dependent on network latency. Also, because the hardware is attached to other devices, only general-purpose hardware, not edge-specific hardware, can be used.
This paper extends our previous work. Our first study presented our initial concept as a reconfigurable fault-safe processor platform [17]. Our second approach, related to on-demand software replacement, demonstrated the possibility of real-time on-demand hardware execution [18]. In this paper, by integrating the two approaches, we propose the metamorphic IoT (mIoT) platform, which combines the flexible operation of cloud computing with the powerful operation of hardware accelerators and the real-time processing enabled by the network independence of edge computing, as shown in Figure 2. Unlike previous research, which sent data to be processed at the server, the mIoT platform receives from the server a hardware configuration that accelerates data processing. By generating the hardware bitstream using the configuration parameters of the edge, the edge device can use edge-specific hardware. Transmitting the necessary hardware accelerators allows the edge device to enhance processing speed without network dependencies and gives the hardware the flexibility to cope with many situations. In addition, since the functions of various hardware can be executed in a small reconfigurable region, the overall chip size can be reduced, resulting in a significant reduction in power consumption. When reconfiguring the hardware, only a part of the FPGA is reconfigured, so hardware reconfiguration and program execution proceed simultaneously. The callability-based hardware prediction system minimizes time overhead by pre-reconfiguring the next required hardware according to the program execution flow. Therefore, mIoT can rapidly process various data by reconstructing, in real time, the diverse hardware functions required in different environments on edge devices with limited hardware size.
3. Proposed Architecture
3.1. mIoT Edge Device
Figure 6 shows the structure of the edge device in the proposed mIoT platform. The edge device is a co-design architecture that consists of an ASIC that acts as a processor, an FPGA that acts as an accelerator, and an external flash memory that stores embedded applications. The ASIC is a RISC-V architecture-based processor. The FPGA is divided into a dynamic region and a static region by applying DPR (dynamic partial reconfiguration), so the dynamic region is reconfigured in real time through requests to the server, according to the operation of the application.
As mentioned above, the ASIC is good in terms of operating speed, power consumption, and area, but it cannot be modified once it is manufactured. To accelerate the program while maintaining the advantages of the ASIC, it is important to separate the SM part from the RM part. In this case study, the metamorphic fault monitor implemented on the FPGA observes the RISC-V processor implemented on the ASIC. To observe various points of the ASIC processor, the metamorphic fault monitor is reconfigured in real time. In this paper, we adopted the Freedom E300 platform, an open-source hardware platform based on the RISC-V architecture and managed by SiFive, as the ASIC processor. The Freedom chip platform is designed using Chisel, which parameterizes each hardware component and compiles each module into the Verilog hardware description language. The criteria for selecting the processor of the mIoT platform are as follows.
3.1.1. Easy to Re-Design
As the IoT trend changes, hardware must be adapted for devices optimized for various environments and operations. If the processor is designed only for a specific operation, power consumption increases due to the unnecessary hardware modules carried along when operating in different environments. Chisel makes it easy to modify the entire design and generate efficient hardware for various environments by describing the hardware as objects at a high level and parameterizing the specifications of the hardware modules.
3.1.2. Ownership of Design
Even if a designer modifies a commercial processor as required, the designer cannot claim ownership of the design. Companies such as SiFive (Freedom), Cadence (Tensilica), and Synopsys (ARC) have their own architectures and commercialize platforms that can create hardware designs based on them. However, with this method, only the hardware provided by the company can be used, and it cannot be customized by the user. For licensing reasons, we used the RISC-V architecture, an open-source ISA.
3.1.3. SW/HW Integrated Platform
To create a program that can be executed on a custom processor, processor-specific software build tools, such as a compiler, a linker, and a locator, are required. The Freedom platform makes it easy to create hardware-optimized software, because it builds the processor-optimized build tools when the processor is built.
3.2. mIoT Server
The mIoT platform has the advantage of being able to reconfigure the edge device in real time upon request. However, to guarantee real-time program execution at the edge, the time required for hardware reconfiguration must be managed. Therefore, the mIoT server must minimize the overhead of transmitting and reconfiguring the hardware required by multiple edges.
Figure 7 shows the overall operation of the mIoT platform. The mIoT server consists of edge servers, each connecting a group of IoT devices, and a main server that connects the edge servers. The task allocator receives the reconfiguration requests sent by the IoT devices and assigns them to the queue of a reconfiguration processing engine (RPE). The RPE consists of a Vivado programming engine that can implement hardware in the edge’s FPGA, a BCA unit that manages the hardware to be reconfigured using callability-based prediction, and a decoder that interprets the request. While the embedded program is running, the edge’s hardware is pre-programmed based on callability. When the pre-programmed hardware matches the request, denoted edge-hit, the MCU of the edge can use the accelerator directly, so no time overhead for hardware reconfiguration is incurred.
If the pre-programmed hardware does not match the request, denoted edge-miss, the edge device asks the edge server to reconfigure the desired hardware. The transmitted reconfiguration request is allocated to an idle queue by the task allocator. The decoder fetches the queued request and decodes the necessary hardware information and the identification of the edge node. The BCA uses the decoded information to determine which hardware to reconfigure. If the bitstream of the hardware being reconfigured is in the edge server’s bit storage, denoted server-hit, the IoT device is immediately reconfigured using the Vivado programming engine.
If the required bitstream is not in its storage, the edge server requests the required hardware from the main server, and the main server synthesizes the required hardware and transmits the bitstream. While the bitstream is implemented and executed at the edge device, the main server predicts the hardware to be used next, based on callability, generates the corresponding bitstreams, and stores them in the edge server’s storage. In the mIoT platform, edge devices have very little space to store hardware bitstreams; the edge server has more storage than the edge device, and the main server has more storage than the edge server. To allow the edge device to access multiple bitstreams with the least time overhead, the mIoT platform uses the least-recently-used cache replacement algorithm.
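The least-recently-used replacement mentioned above can be sketched as a small cache keyed by hardware module ID. The class and method names below are our own illustration, not the paper’s implementation:

```python
from collections import OrderedDict

class BitstreamCache:
    """LRU cache for partial bitstreams, keyed by hardware module ID.

    The capacity models the limited bit storage of an edge server;
    the main server would use a larger capacity. Illustrative sketch only.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self._store = OrderedDict()

    def get(self, hw_id):
        """Return the cached bitstream (server-hit) or None (server-miss)."""
        if hw_id not in self._store:
            return None
        self._store.move_to_end(hw_id)   # mark as most recently used
        return self._store[hw_id]

    def put(self, hw_id, bitstream):
        """Insert a freshly synthesized bitstream, evicting the LRU entry."""
        if hw_id in self._store:
            self._store.move_to_end(hw_id)
        self._store[hw_id] = bitstream
        if len(self._store) > self.capacity:
            self._store.popitem(last=False)  # evict least recently used
```

With a capacity of two, putting bitstreams A and B, touching A, and then putting C evicts B, since A was used more recently.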
Algorithm 1 represents the operation of the edge server according to the operation of the device. When execution of the current hardware task finishes, the edge server programs the predicted bitstream into the RM. If the pre-programmed bitstream is not the desired hardware, the edge device requests the desired hardware with a request signal. If the desired bitstream is in the edge server’s storage, it is programmed into the RM. Otherwise, the edge server requests hardware synthesis from the main server and programs the received bitstream into the RM. While the programmed hardware is running at the edge, the edge server updates its bitstream storage and the callability information.
The bitstream generation flow is shown in Algorithm 2. On the main server, modules that are independent of the properties of the edge device are pre-synthesized. If the requested hardware is in the set of pre-synthesized bitstreams, it is transmitted to the edge server and immediately programmed at the edge. Otherwise, to generate a bitstream optimized for the edge, the optimized Verilog code is synthesized and implemented. The generated bitstream is transmitted to the edge server and programmed at the edge, and the main server predicts the next bitstream to be executed according to the caching algorithm and synthesizes it to update the bitstream storage of the edge server.
Algorithm 1: Edge bitstream caching algorithm.
Algorithm 2: Generation flow of partial bitstream.
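Since the algorithm listings are not reproduced in this text, the following minimal sketch (function and parameter names are our own) captures the edge-hit / server-hit / server-miss decision described for Algorithm 1:

```python
def handle_request(wanted, predicted, edge_storage, synthesize):
    """One edge-server decision for a reconfiguration request.

    `wanted` is the hardware the edge asks for, `predicted` the bitstream
    pre-programmed from callability, `edge_storage` the server's bit
    storage (dict of hw_id -> bitstream), and `synthesize` the main
    server's generation step. Illustrative sketch, not the paper's listing.
    """
    if wanted == predicted:
        return "edge-hit"       # accelerator usable with no reconfiguration delay
    if wanted in edge_storage:
        return "server-hit"     # program the cached bitstream into the RM
    # server-miss: ask the main server to synthesize, then cache and program it
    edge_storage[wanted] = synthesize(wanted)
    return "server-miss"
```

For example, a request for hardware that was correctly pre-programmed returns an edge-hit; a request missing from both the RM and the storage triggers synthesis at the main server and is cached for later server-hits.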
Figure 8 shows the overall execution scenario of each case that can occur in the proposed mIoT platform.
Figure 8a shows an example of callability from the perspective of tasks requiring hardware reconfiguration. When task B is executed, the callabilities of tasks D, E, and F are 75%, 5%, and 20%, respectively. After task D is called, the callabilities that task I and task J will be selected after task H are 95% and 5%, respectively. Each path has a different callability.
Figure 8b shows the operation scenarios of the server, according to the given example.
Case A shows an example of edge-hit with hardware pre-programmed based on callability. When the application starts to execute, the RM A stored on the edge server is configured on the FPGA. The edge server requests RM D, which has the highest callability, from the main server, which generates it. When task A ends, the edge server pre-programs the prepared RM D into the FPGA. If the processor calls the pre-programmed task D after task B, an edge-hit occurs and the edge device executes the application without the time overhead of reconfiguration.
In case B, an edge-miss occurs because the pre-programmed RM D does not match, but a server-hit occurs because the requested RM F exists in the storage of the edge server. When an edge-miss occurs after task B finishes, the edge server searches its storage for the requested hardware and the FPGA is immediately reconfigured. In the case of a server-hit, the processor pauses due to the time overhead of partially reconfiguring the FPGA during application execution.
In case C, the bitstream previously generated by the edge server is also not hit, expressed as server-miss. When a server-miss occurs, the edge server requests hardware generation from the main server and receives the result to reconfigure the FPGA. The main server generates bitstreams by synthesizing and implementing Verilog code. The processor therefore incurs a long overhead, because the requested hardware must be generated from Verilog code before it can be reconfigured in the FPGA. After the FPGA is reconfigured, the edge server and main server update the bitstream sets in their storage according to the callability.
The mIoT server can perform synthesis and reconfiguration in parallel, in response to multiple requests. The number of tasks that can be executed in parallel depends on the server’s specification. If the number of requests exceeds this bandwidth, each request is managed by the task queue. Also, as shown in Equation (1), the server-miss reconfiguration overhead occurs for each device at the beginning of the operation, and consequently, the average reconfiguration overhead converges to the edge-miss overhead:

$\bar{t} = \dfrac{t_{sm} + (n - 1)\, t_{em}}{n}$ (1)

where:
n = number of executions
$\bar{t}$ = average overhead for reconfiguration
$t_{sm}$ = reconfiguration overhead according to server-miss
$t_{em}$ = reconfiguration overhead according to edge-miss

The edge-miss frequency can be reduced by increasing the edge-hit frequency, which requires increasing the observation depth used to determine callability. By combining the above operations, it is possible to ensure real-time performance by reducing the overhead required to reconfigure edge devices and to effectively manage multiple IoT edge devices.
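This convergence can be checked numerically. The sketch below is our own model, assuming one initial server-miss followed by edge-miss overheads; the values 57.4 s and 6.5 s are the averages reported in the experiment section, used here for illustration:

```python
def avg_reconfiguration_overhead(n, t_sm, t_em):
    """Average overhead after n executions: one initial server-miss,
    then edge-miss overhead for the remaining calls.
    Converges to t_em as n grows. Illustrative model only."""
    return (t_sm + (n - 1) * t_em) / n

# n = 1   -> 57.4 s (pure server-miss)
# n = 100 -> about 7.0 s, already close to the 6.5 s edge-miss overhead
```

The server-miss cost is paid once per bitstream, so its contribution to the average vanishes as the number of executions increases.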
6. Experiment
To verify the mIoT platform proposed in this paper, we combined the RISC-V processor, implemented in a Hynix/Magnachip 350 nm process, with the Xilinx Arty-7 35T FPGA (AVNET, Tokyo, Japan), which can be partially reconfigured in real time, to construct the mFSP edge device presented in the case study. As shown in Figure 13, the main server and the edge server for managing the reconfiguration of the edge device were built in a Windows Vivado environment. In the proposed platform architecture, the edge device and the server are connected wirelessly to manage the reconfiguration operation. Several studies have been published on the wireless configuration of FPGAs [28,29]; in an actual commercialization stage, the wireless configuration methods proposed in those studies can be adopted, and we used JTAG configuration in this experiment.
The control flow of the software application that determines the operation of the processor is shown in Figure 14. The white circles in the figure are function blocks that operate in software without hardware support, and the gray circles are function blocks that require additional hardware operation. When a gray block is executed, the processor requests the server to reconfigure the FPGA with the required hardware, and the processor waits until the requested hardware is implemented. Each block that requires hardware has a certain level of callability, and the caching operations for efficient hardware reconfiguration are based on this callability. We designed the monitoring application to observe various parts of the FSP and prepared difficult cases, in which hardware is called excessively, and enough cases, which require one hardware call per execution.
The callability of the application used in the experiment was determined by repeating the execution 1000 times, and the result is shown in Figure 15. The blocks that need hardware reconfiguration are B, C, E, F, G, L, M, N, P, and Q, and blocks 5 and 6 affect the callability of the blocks that require hardware reconfiguration. Figure 15a shows the callability with an observation depth of 1, in which past execution does not affect the current execution. Figure 15b shows the callability with an observation depth of 2, in which only the immediately preceding execution affects the current execution. Finally, Figure 15c shows the callability with an observation depth of 3, observing up to two previous stages. Based on the callability shown in these tables, the BCA predicts the next hardware.
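A callability table of this kind can be estimated from recorded execution traces by counting which block follows each context of preceding blocks. The sketch below is our own illustration of this idea, not the paper’s implementation:

```python
from collections import Counter, defaultdict

def callability(traces, depth):
    """Estimate callability from execution traces.

    depth = 1 conditions only on the current block; depth = 2 also on the
    immediately preceding block, and so on. Returns a mapping from a
    context tuple of `depth` blocks to {next_block: probability}.
    Illustrative sketch only.
    """
    counts = defaultdict(Counter)
    for trace in traces:
        for i in range(len(trace) - depth):
            context = tuple(trace[i:i + depth])
            counts[context][trace[i + depth]] += 1
    return {ctx: {blk: c / sum(nxt.values()) for blk, c in nxt.items()}
            for ctx, nxt in counts.items()}

def predict_next(table, context):
    """Pick the block with the highest callability for the given context."""
    return max(table[context], key=table[context].get)
```

For instance, if block D follows block B in three of four recorded runs, the depth-1 callability of D after B is 75%, and the BCA would pre-program D’s bitstream.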
As shown in Figure 16a, the average time required for hardware reconfiguration after an edge-miss is 6.5 s. However, the time actually used to reconfigure the hardware is step C, which averages about 0.8 s. Steps A and B represent the time required to start the platform, so they can be ignored in the context of long program execution. When a server-miss occurs, as shown in Figure 16b, the server takes, on average, 57.4 s to generate the Verilog code and to synthesize, place, and route (PNR) it into a bitstream. Step F combines the SM and the RM, and if the module has been made previously, the D and E steps that create the module are not executed.
Figure 17a shows the edge-hit ratio according to the observation depth that determines the callability of the platform. The edge-hit is the most important factor, because it eliminates the time overhead of hardware configuration. We measured the edge-hit ratio at depths of 0, 1, 2, and 3, according to the number of executions of the program. A depth of 0 means that callability is not applied: the caching algorithm arbitrarily determines the next hardware, considering only the operation sequence. For example, after B or C is executed, the callabilities of L, M, and N are each 33%. Figure 17a indicates that the higher the observation depth, the higher the edge-hit ratio. Figure 17b shows the server-hit ratio according to the depth of observation. A server-hit occurs when the requested hardware is in the server’s bitstream storage. When the number of program executions is small, the server-hit ratio increases as the depth increases. However, as program execution continues, the server-hit ratio saturates, resulting in similar values regardless of depth.
Figure 18 shows the edge-hit ratio according to the applied caching algorithm and the observation depth, as well as the time overhead when the hardware is reconfigured 18 times. This experiment assumed that the application had been executed for long enough that no server-miss occurs. If edge caching was not applied, the server-hit overhead was incurred for every reconfiguration operation, requiring the largest reconfiguration overhead (13.34 s). When the caching algorithm was applied without callability, the edge-hit ratio was 16.67% and the reconfiguration overhead was 10.64 s. With an observation depth of 1, the edge-hit ratio was 27.78% and the overhead was 9.93 s. With depths of 2 and 3, the edge-hit ratios were 55.56% and 66.67%, and the overheads were 6.04 s and 5.12 s, respectively. Therefore, the average overhead to reconfigure the hardware 18 times with an edge-hit ratio of 66.67% was 0.28 s per reconfiguration.
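The reported averages follow directly from the measured totals. A small helper (our own; the hit counts of 3, 5, 10, and 12 out of 18 are inferred from the reported percentages) reproduces the figures:

```python
def average_overhead(total_overhead_s, n_reconfigurations):
    """Average reconfiguration overhead per call (illustrative helper)."""
    return total_overhead_s / n_reconfigurations

def edge_hit_ratio(edge_hits, n_reconfigurations):
    """Fraction of reconfigurations served with no time overhead."""
    return edge_hits / n_reconfigurations

# For 18 reconfigurations at observation depth 3:
# 12 edge-hits -> 12/18 = 66.67 %, and 5.12 s / 18 = 0.28 s per reconfiguration
```

The same helper yields 16.67%, 27.78%, and 55.56% for 3, 5, and 10 hits out of 18, matching the ratios reported for the other configurations.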
The frequency of occurrence of edge-miss and server-miss, according to the number of executions, and the reconfiguration overhead are shown in Figure 19: the red graph represents the instantaneous reconfiguration overhead; the black dotted line represents the accumulated server-miss overhead; and the solid black line represents the accumulated total overhead. The black dotted line shows that all instances of server-miss appear at the beginning of the run and no longer occur after all partial bitstreams have been generated. Therefore, we only need to observe how frequently edge-miss occurs. We confirmed that more edge-miss overhead is required when edge caching and callability are not applied (Figure 19a) than when they are applied (Figure 19b). Also, when the observation depth is high, we confirmed a reduction in edge-miss overhead (Figure 19b).
We conclude from the results in Figure 18 that a reconfiguration overhead of approximately 0.28 s per reconfiguration is required to achieve a roughly 70% hit ratio. This is a reasonable reconfiguration overhead at the single-device level, but not for large-scale IoT systems. For example, with a 70% hit ratio, edge-misses occur every cycle at 30% of the nodes; for a system with 100 nodes, reconfiguration requests from 30 nodes are sent to the edge server at one time, and the latency rapidly increases. Therefore, it is necessary to increase the hit ratio to apply the platform to large-scale IoT systems. Ways to increase the hit ratio include observing longer historical hardware call flows to extract callability more accurately, and fetching multiple candidate RM modules to the edge device. In the previous experiment, we confirmed that the hit ratio is proportional to the depth of observation. In this experiment, we implemented two dynamic regions in the FPGA with a control block to increase the edge-hit ratio, as shown in Figure 20b. Both dynamic regions are reconfigured when an edge-miss occurs. When one of the two reconfigured RMs is selected to start, the other region is reconfigured according to the new callability. In the structure with double RMs, we can apply the enough-case application of Figure 14b, in which reconfiguration occurs once per program execution cycle.
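The double-RM scheme can be sketched as follows: one region executes while the other holds the next-best candidate, and whichever region is idle is immediately re-targeted at the next prediction. All names here are our own illustration, not the paper’s implementation:

```python
class DoubleRM:
    """Two dynamic regions: one executes while the other prefetches.

    Holding the two most callable candidates raises the chance of an
    edge-hit compared to a single RM. Illustrative sketch only.
    """

    def __init__(self, predictor):
        self.predictor = predictor   # returns candidate hardware, best first
        self.regions = [None, None]  # hardware currently held in each RM

    def prefetch(self, context):
        """Fill both regions with the two most callable candidates."""
        self.regions = self.predictor(context)[:2]

    def run(self, wanted, context):
        """Return True on edge-hit; re-target the idle region either way."""
        hit = wanted in self.regions
        if not hit:                  # edge-miss: reconfigure both regions
            self.prefetch(context)
            self.regions[0] = wanted  # one region must hold the wanted HW
        idle = 1 - self.regions.index(wanted)
        # the idle region is reconfigured with the next-best prediction
        candidates = [c for c in self.predictor(context) if c != wanted]
        self.regions[idle] = candidates[0] if candidates else None
        return hit
```

With a predictor that ranks D, F, E in that order, prefetching loads D and F; a request for D is an edge-hit, while a request for E is an edge-miss that triggers reconfiguration of both regions.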
Figure 21 shows the result of applying the double-RM structure to a single device, as in the previous experiments. In this experiment, we observed a long-term view consisting of 1000 execution cycles after the program had been running for a long time. The hit ratio in the single-RM case is 76%, and the hit ratio in the double-RM case is 99%. The results of a large-scale experiment, using 100 edge devices with a 99% hit ratio and three RPEs, are shown in Figure 22. One device was implemented as an actual mIoT edge device, consisting of a RISC-V processor and an FPGA, and the other devices were emulated by allocating applications in the edge server. Each device required zero to four reconfigurations during 200 executions, and the average reconfiguration overhead was between 0.1 and 0.35 s. The reconfiguration overhead for a given number of RPEs and a given number of edge devices is shown in Figure 23. The gray boxes indicate overhead results of less than 0.3 s. The appropriate number of RPEs and the number of edge devices connected to the edge network are determined differently, depending on the application.
Based on the above experimental results, we confirmed that the proposed platform and the callability concept can manage the operation of each device with a reasonable reconfiguration overhead. Also, as the callability becomes more accurate, the correct RM candidate is pre-programmed in the edge device, reducing the overall reconfiguration overhead of the IoT system.
8. Conclusions
In this paper, we proposed a metamorphic IoT platform that combines the following components and concepts to enhance stability in diverse IoT operating environments and processing capability at the edge: an ASIC-FPGA co-design-based edge device able to provide various hardware with limited resources; a server system to manage a large number of edge devices connected to a network; and callability, used to reconfigure edge devices with minimal time overhead. The IoT edge device is optimized for the operating environment, based on the Chisel language and the RISC-V architecture, and the FPGA is partially reconfigured in real time according to the operation of the processor. The server predicts the next hardware to be used at the edge, based on callability, greatly reducing the time required to reconfigure the edge device. The proposed platform broadens the range of hardware that one edge device can use and facilitates the management and updating of reconfigurable hardware, thereby extending the operating life of a given edge device. In addition, it processes data using hardware accelerators at the edge, so the data transfer time is less than that of existing platforms, which speeds up operation and relieves the server bottleneck. With the experiment, which used actual devices and an interworking server, we confirmed that the overhead required for reconfiguration is reasonable and that the proposed callability-based reconfiguration system is efficient. Using the proposed platform, we can construct an IoT system that guarantees flexible, real-time operation in a complicated IoT environment.
This paper achieved real-time reconfiguration of various accelerators within a limited hardware size to increase the processing speed and reduce the power consumption of the edge device. The mIoT platform has several security issues, such as man-in-the-middle attacks and hardware Trojans, while transmitting hardware information for reconfiguration over the network. In the future, this study will be expanded to apply encryption to hide the bitstream from attackers.