3.1. State-of-the-Art of AIoT
Nowadays, not all IIoT/IoT devices have the capability to make decisions. Due to resource constraints, low-end IIoT/IoT devices only sample environmental data, send the sensory data to the IoT cloud through an edge router and wait for the IoT cloud to decide which action to perform. High-end IIoT/IoT devices, by contrast, can integrate a relatively complex deep learning algorithm and perform (intelligent) decision making by interacting directly and locally with defined actuators or neighboring devices. This capability is essential to enable the large-scale deployment of IIoT/IoT devices (millions of them), since it avoids Internet congestion and improves security, safety and privacy. In general, the terms IIoT/IoT do not indicate whether or not the devices have cognitive capability.
Hou et al. discussed and highlighted the concept of and difference between a simple/low-end IoT device and a smart object (a high-end IoT or AIoT device) [9].
Figure 2 presents the deployment of a basic AIoT platform for a smart irrigation application, including local and/or remote servers. Given its resource constraints, how smart can a smart object actually be?
The use of artificial intelligence (AI) helps to understand, learn, reason and interact, and thus increases efficiency. AI technologies such as machine learning can correlate large amounts of structured and unstructured data from multiple sources to extract knowledge and take action. AI was officially born in 1956, but the first neuron model was defined in 1943 by McCulloch and Pitts [10]. Initially, AI investigated propositional logic and the representation of knowledge, i.e., expert systems [11]. The developed methods and tools were related to knowledge-based reasoning, represented by facts and rules. Over the decades, AI has grown into a multidisciplinary science encompassing machine learning, adaptive control theory, information theory, the theory of computation and game theory [12,13]. Deep learning is a subset of machine learning based on multilayer neural networks. Today, deep learning outperforms other algorithms for gesture analysis, video image processing (object detection and recognition) and speech processing (speech recognition and translation). It should be noted that Google DeepMind's AlphaGo ran on a supercomputer built around 1202 CPUs and 176 GPUs [14]. Admittedly, deep learning algorithms based on the von Neumann CMOS architecture are efficient, but they consume a lot of energy. For example, next-generation exascale computers are expected to perform 10^18 calculations per second while consuming between 20 and 30 megawatts of power [15]. Currently, about 5–15% of the world's energy is spent on some form of data manipulation, transmission or processing [16]. By contrast, the human brain only consumes about 20 W. The brain is based on a large collection of neurons (approximately 86 billion), each of which has a cell body (soma), an axon and dendrites. Information, in the form of action potentials, is transmitted from one neuron to another through synapses, as illustrated in Figure 3. The neuronal action potential, or spike, consists of short electrical pulses with amplitudes of about 100 mV and a typical duration of 1–2 ms; it is generated by the soma as soon as its membrane potential reaches a critical value ϑ [17]. In the 1990s, Carver Mead investigated an unconventional computer architecture for mimicking brain function, called neuromorphic computing [18]. Due to the inherent asynchrony of in-memory computing (synapses) and the sparseness of spike trains, neuromorphic computing is energy efficient for performing cognitive tasks (deep learning algorithms). Note that deep learning is based on multilayer neural networks, which are classified into two types: artificial neural networks (ANNs) and spiking neural networks (SNNs). SNNs are considered the third generation of neural networks.
The neuron model of an ANN is based on the McCulloch and Pitts model. An artificial neuron takes in some number of inputs (x_1, x_2, …, x_n), each of which is multiplied by a specific weight (w_1, w_2, …, w_n). The logit of the neuron is

$$z = \sum_{i=1}^{n} w_i x_i + b,$$

where b is the bias of the neuron. The output of the neuron is expressed as

$$y = f(z),$$

where f is the activation function.
There are four major activation functions: sigmoid, tanh, softmax and the rectified linear unit (ReLU). The ReLU activation function is defined as f(z) = max(0, z). In general, to train an ANN, the ReLU activation function, the backpropagation algorithm and stochastic gradient descent (SGD) are used to minimize the ANN output error. The effectiveness of training ANNs on large databases in different applications has been proven; however, this is not the case for SNNs.
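As a minimal illustrative sketch (the inputs, weights and squared-error loss below are our own example choices, not taken from the cited references), the following NumPy snippet computes the logit and ReLU output of a single artificial neuron and applies one SGD update of the kind used in backpropagation training:

```python
import numpy as np

def relu(z):
    # ReLU activation: f(z) = max(0, z)
    return np.maximum(0.0, z)

# One artificial neuron: inputs x_1..x_n, weights w_1..w_n, bias b
x = np.array([0.5, -1.2, 3.0])   # example inputs
w = np.array([0.1, 0.4, 0.2])    # weights (normally learned)
b = 0.05                         # bias

z = np.dot(w, x) + b             # logit: z = sum_i w_i * x_i + b
y = relu(z)                      # neuron output: y = f(z)

# One stochastic gradient descent (SGD) step on a squared-error
# loss L = 0.5 * (y - t)^2, with target t and learning rate eta.
t, eta = 1.0, 0.01
dL_dy = y - t                    # dL/dy
dy_dz = 1.0 if z > 0 else 0.0    # derivative of ReLU at z
w -= eta * dL_dy * dy_dz * x     # chain rule: dL/dw_i
b -= eta * dL_dy * dy_dz         # dL/db
```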
Unlike ANNs, it is difficult to train SNNs with direct backpropagation due to the non-differentiable nature of their activation functions. One of the most popular models used for ANN-SNN conversion is the linear leaky integrate-and-fire (LIF) model [19]. It is expressed as follows [20]:

$$\tau_m \frac{dV(t)}{dt} = -V(t) + R_m I_{\mathrm{inj}}(t), \qquad \tau_m = R_m C_m,$$

where τ_m is the membrane time constant, I_inj(t) is the input current, C_m is the membrane capacitance, R_m is the membrane resistance and V(t_0) is the initial membrane potential [20]. When V(t) reaches the critical value ϑ, the neuron emits a spike and its membrane potential is reset. The output spike train and the mean firing rate can then be written as

$$s(t) = \sum_{k=1}^{N} \delta(t - t_k), \qquad r = \frac{N}{T_w},$$

where δ is the delta function and N is the number of spikes in the time window T_w.
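As an illustrative numerical sketch (the parameter values and the Euler integration scheme are our own toy choices, not those of [20]), the following Python snippet integrates the LIF equation, resets the membrane potential at each threshold crossing and estimates the firing rate over a window T_w:

```python
# Toy LIF parameters (illustrative values, not taken from [20])
R_m   = 10e6          # membrane resistance (ohm)
C_m   = 1e-9          # membrane capacitance (farad)
tau_m = R_m * C_m     # membrane time constant (s)
theta = 20e-3         # spike threshold (V)
V     = 0.0           # initial membrane potential V(t0)
I_inj = 2.5e-9        # constant injected current (A)

dt, T_w = 1e-5, 0.1   # Euler step and observation window (s)
n_spikes = 0

for _ in range(int(T_w / dt)):
    # Euler step of tau_m * dV/dt = -V + R_m * I_inj
    V += dt / tau_m * (-V + R_m * I_inj)
    if V >= theta:    # threshold crossing: emit a spike and reset
        n_spikes += 1
        V = 0.0

rate = n_spikes / T_w  # mean firing rate r = N / T_w
print(f"{n_spikes} spikes in {T_w} s -> rate = {rate:.1f} Hz")
```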
In [20], the authors demonstrated the equivalence of the linear LIF/SNN and the ReLU-ANN model. This proof enabled ANN-SNN conversion, backpropagation with surrogate gradients and direct supervised learning. Moreover, the LIF neuron has been implemented with a few arithmetic components, such as an adder and a comparator [21]. Yamazaki et al. presented an overview of different SNN neuron models for different applications [22]. While the spike timing-dependent plasticity (STDP) model is biologically plausible and allows unsupervised learning, its lack of global information hinders the convergence of large SNNs trained with large and complex datasets.
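To see why a LIF neuron maps onto such simple hardware, consider the hedged sketch below: one membrane update reduces to an add for integration, a shift-and-subtract for the leak and a compare for the threshold. The fixed-point format and constants are our own illustrative assumptions, not the design of [21].

```python
# Integer-only LIF update of the kind that maps onto an adder and a
# comparator in hardware. Format and constants are illustrative
# assumptions, not the design of [21].
THRESHOLD = 1 << 12          # spike threshold in fixed-point units
LEAK_SHIFT = 4               # leak: V -= V >> 4 (approx. V * 1/16)

def lif_step(v, weighted_input):
    """One time step: accumulate input, leak, compare, reset."""
    v += weighted_input       # adder: integrate synaptic input
    v -= v >> LEAK_SHIFT      # shift + subtract: cheap leaky decay
    if v >= THRESHOLD:        # comparator: threshold crossing
        return 0, 1           # reset potential, emit a spike
    return v, 0

# Feed a constant weighted input and count spikes over 100 steps
v, spikes = 0, 0
for _ in range(100):
    v, s = lif_step(v, 300)
    spikes += s
print(f"{spikes} spikes in 100 steps")
```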
The architecture of neuromorphic computing exploits the properties of low-power, new nonvolatile memories based on resistive switching materials, such as phase-change memory (PCM), ferroelectric devices, valence change memory (VCM), electrochemical metallization cells and spintronics [15], to locally implement the integrate-and-fire function (neuron synapse) in the memory cell, i.e., in-memory computing (IMC). IMC avoids the intensive backward and forward data transfers between memory and processing units (CPU, GPU, TPU, etc.) of the conventional von Neumann architecture. Consequently, IMC reduces power consumption and latency. The different technologies of new memory materials (memristors) are well described in [15].
Figure 3. A typical structure of a biological neuron and synapse [22].
Today, there are four categories of neuromorphic computing. The first category (DeepSouth, IBM TrueNorth and Intel Loihi) uses digital CMOS to emulate brain functions. The second category (SpiNNaker, Tianjic, etc.) is based on a software approach (a neural network communication protocol and a specific hardware architecture) to accelerate ANN and SNN execution. The third category (BrainScaleS, Neurogrid and MNIFAT) uses analogue and mixed-signal CMOS to reproduce a real neuron model [23]. The fourth category uses FPGA-based neuromorphic platforms, which outperform the previous platforms in terms of power consumption, response time and number of implemented neurons. Yang et al. proposed the CerebelluMorphic system, which uses six Intel Stratix III EP3SL340 FPGAs to realize a large-scale neuromorphic cerebellar network with approximately 3.5 million neurons and 218.3 million synapses [24]. Wang et al. presented a new abstraction of a neuromorphic architecture into clusters represented by minicolumns and hypercolumns, analogous to the fundamental structural units observed in neurobiology; as a result, an implementation on one Altera Stratix V FPGA was able to simulate from 20 million to 2.6 billion leaky integrate-and-fire (LIF) neurons in real time [25]. The Intel Programmable Solutions Group (neuromorphic computing) and the International Centre for Neuromorphic Systems at Western Sydney University (WSU) are building a neuromorphic platform using 168 Intel Stratix 10 FPGAs with high-bandwidth memory (HBM) and a configurable network protocol accelerator (COPA) to simulate the human cortex (LIF model). It is estimated that the human cerebral cortex has from 10 to 20 billion neurons and from 60 to 240 trillion synapses [26]. Proofs of concept (PoCs) developed on FPGA-based neuromorphic and memristor IMC platforms will lead to the next significant advances in the SoC design of low-cost, low-power AIoT devices.
Although there has been significant progress in neuromorphic computing over the last decade, its market remains insignificant (USD 200 million in 2025) compared to that of conventional computing [15].
Nowadays, there are two emerging trends in AI systems that aim at implementing low-power embedded AI devices: neuromorphic computing and TinyML. In this article, we focus only on the TinyML technology ecosystem for implementing low-cost, low-power AIoT/IIoT/IoT devices and provide an assessment of it.
Hence, it is important to investigate the cognitive techniques that can be embedded to build an AIoT device, considering its resource constraints. The computing power required by an AIoT device is related to the size of the input sensory data, the sampling frequency (i.e., the duty cycle) and the application algorithm. In fact, we can define two categories of AIoT devices: scalar and multimedia. In general, a scalar AIoT device requires less computing power than a multimedia AIoT device and needs less than 0.5 TMACs to run a simple embedded deep learning algorithm (e.g., linear regression). A multimedia AIoT device requires more computing power to embed deep learning inference, depending on the application: face recognition (0.5–2 TMACs), AR/VR (1–4 TMACs), smart surveillance (2–10 TMACs) and autonomous vehicles (tens to hundreds of TMACs) [27]. Deep learning has the ability to learn without being explicitly programmed, and it outperforms other algorithms for gesture analysis, video image processing (e.g., object detection and recognition) and speech processing (e.g., speech recognition and translation). These functions play key roles in the time-sensitive human-machine interface (HMI) requirements of metaverse, digital twin, autonomous vehicle and Industry 4.0 applications. We introduce the available frameworks for developing deep learning applications to be embedded on AIoT/IIoT/IoT devices in Section 4.
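Returning to the computing-power budgets above, a rough, hypothetical sizing exercise shows where such TMAC figures come from (the layer shape and frame rate are our own example numbers, not taken from [27]):

```python
def conv2d_macs(h_out, w_out, c_in, c_out, k):
    """MACs for one conv layer: every output pixel needs k*k*c_in
    multiply-accumulates for each of the c_out output channels."""
    return h_out * w_out * c_out * (k * k * c_in)

# Hypothetical small face-recognition front end at 30 fps:
# one 3x3 convolution, 112x112 output, 32 -> 64 channels.
macs_per_frame = conv2d_macs(112, 112, 32, 64, 3)
fps = 30
tmacs_per_s = macs_per_frame * fps / 1e12

print(f"{macs_per_frame/1e6:.0f} MMACs/frame "
      f"-> {tmacs_per_s:.4f} TMACs/s for one layer")
# A full network stacks dozens of such layers, which is how multimedia
# workloads approach the TMAC-scale budgets quoted for face recognition.
```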
3.2. AIoT/IIoT/IoT Device Hardware
In general, AIoT/IIoT/IoT devices have four main low-power components:
- A processing unit and sensor interfaces based on a single-core or multicore MCU, which also implements AIoT/IIoT/IoT device safety and security;
- Low-cost and low-power sensors, usually based on MEMS/NEMS technology;
- A power management unit, which is essential to increase the lifetime of the AIoT device by minimizing its power consumption;
- A wireless access medium, either single or multiple.
Figure 4 illustrates the basic hardware architecture of AIoT/IIoT/IoT devices.
Due to resource and cost constraints, the design of AIoT/IIoT/IoT devices is usually application specific in order to meet requirements such as safety, form factor, time sensitivity (real-time constraints) and power consumption (lifetime). In fact, resource constraints condition the implementation decision to adopt a specific hardware (MCU and wireless access medium) and firmware architecture (e.g., a smartwatch).
Today, for the conventional von Neumann hardware implementation of AIoT/IIoT/IoT devices, two technologies are available: system-on-chip (SoC) and commercial off-the-shelf (COTS). SoC is used to implement small-form-factor (e.g., smartwatch), low-power and low-cost (mass production) AIoT/IIoT/IoT devices such as the Fitbit Sense, Apple Watch, Huawei Watch and medical patches (e.g., Medtronic's SEEQ cardiac monitoring system), whereas COTS is applied to small series production, i.e., for testing and validation, or to more complex AIoT/IIoT/IoT devices.
Note that the difference between the two approaches is mainly in the use of manufacturing technology because in terms of hardware and firmware architectures, the concepts applied are similar.
3.2.1. Processing Unit of an AIoT/IIoT/IoT Device
Over the past five years, IoT and AI technologies have made tremendous progress in the implementation of IoT devices, driven by the requirements of smartphones, autonomous vehicles (real-time object detection and recognition), smartwatches and the metaverse (VR/AR). The current market trend focuses on implementing high-performance AIoT devices, which require a symmetric/asymmetric multicore architecture for fault tolerance and computing power (deep learning). MCU manufacturers such as ST and NXP provide MCU portfolios based on ARM IP (e.g., ARM Cortex-M and Cortex-A) for each application domain: smartwatch, smart care, smart home, smart car and factory automation. Note that ARM MCU IP currently dominates the embedded market, but this dominance will be reduced in the coming years due to the open-source RISC-V IP. By 2025, 40% of application-specific integrated circuits (ASICs) will be designed by OEMs, up from around 30% today [28].
New emerging microcontroller manufacturers such as Espressif [29], StarFive Technology [30] and Andes Technology [31] are offering new microcontroller portfolios dedicated to implementing low-cost AIoT devices based on the open-source RISC-V IP. Moreover, new MCUs based on subthreshold technology (e.g., the Ambiq subthreshold power optimized technology (SPOT)) and MRAM significantly increase AIoT device resources (memory and computing power) while minimizing power consumption (e.g., the Apollo4 from Ambiq) [32]. In addition, today's SoC co-design CAD tools (Figure 5), such as those from Cadence, provide a wide choice of processing units, from single core to symmetric, specific symmetric (DSP) and asymmetric multicore, to ease the SoC implementation of AIoT/IIoT/IoT devices [33].
3.2.2. AIoT/IIoT/IoT Wireless Access Medium
A wireless access medium is a critical component of AIoT/IIoT/IoT devices in terms of power consumption, data reliability and latency. Many advances in the field have been achieved, but they are still not sufficient for widespread use in terms of safety (reliability) and security, communication range and bandwidth, power consumption and interoperability, despite the many available IEEE standards. Therefore, new low-energy and high-bandwidth wireless access media, i.e., IEEE 802.11ax (Wi-Fi 6), IEEE 802.11ah (Wi-Fi HaLow) and BLE 5.x, are being developed to meet the needs of different application domains such as VR and AR. Wi-Fi and Bluetooth (Classic and Low Energy) are set to dominate throughout the forecast period, with almost three-quarters of connected devices using these technologies due to smartphones and smartwatches; as a result, local wireless communication represents 53% of wireless access media, according to the review in [34].
Table 1 indicates the maximum Tx current of each medium, keeping in mind that wireless power consumption depends on the duty cycle and the message size of the AIoT/IIoT/IoT application.
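As a hedged back-of-the-envelope illustration (the current and timing figures below are invented for the example, not taken from Table 1), the average current of a duty-cycled radio can be estimated as the time-weighted mix of Tx and sleep currents:

```python
# Hypothetical radio figures for illustration (not from Table 1)
i_tx_mA    = 15.0     # current while transmitting
i_sleep_uA = 2.0      # current while sleeping
t_tx_ms    = 5.0      # airtime per message (depends on message size)
period_s   = 60.0     # one message per minute

duty = (t_tx_ms / 1000.0) / period_s            # fraction of time in Tx
i_avg_mA = duty * i_tx_mA + (1 - duty) * (i_sleep_uA / 1000.0)

battery_mAh = 220.0   # e.g., a CR2032 coin cell
lifetime_h = battery_mAh / i_avg_mA
print(f"duty={duty:.2e}, avg={i_avg_mA*1000:.2f} uA, "
      f"lifetime ~ {lifetime_h/24/365:.1f} years")
```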
Note that Bluetooth and Wi-Fi are the most commonly used wireless access media for implementing smart object devices, driven by smartphones and smartwatches. Hence, the IoT ecosystem, in particular the wireless access medium, continues to evolve quickly, steadily driven by 5G/6G and Wi-Fi 6/7.
3.2.3. AIoT/IIoT/IoT Sensor
Like wireless access media, embedded sensors constitute key components of AIoT/IIoT/IoT devices and significantly impact their form factor and power consumption (lifetime). According to a recent Allied Market Research report, the global smart sensor market will grow at a compound annual growth rate (CAGR) of 18.6% from 2020 to reach USD 143.65 billion by 2027. A sensor quantifies a measurand of the surrounding environment and provides an electrical output signal corresponding to the amount of measurand present. There are different classifications of sensor devices [45].
Table 2 presents a sensor classification by considering their signal stimulus type and the measurand (attribute).
Nowadays, different technologies are used to implement different types of sensors, but from our point of view, MEMS/NEMS technologies are the most promising for implementing many types of sensors with low cost, low power consumption and a small form factor for various applications, from smart wearable sensors to Industry 4.0. In fact, MEMS/NEMS technologies can be used to implement low-cost sensors that measure the stimulation signals shown in Table 2. These sensors can detect a wide spectrum of signals from the physical world. Despite economic and technological challenges, MEMS/NEMS technologies continue to make very rapid and significant progress [46]. CMOS and MEMS/NEMS technologies will be more and more tightly coupled, which will enable the implementation of SoC-based, low-cost and low-power smart object devices [47]. In the near future, CMOS VLSI circuits, MEMS/NEMS sensors (Figure 6a) [47] and CMOS-compatible MEMS/NEMS switches (Figure 6b) [48] may be implemented on the same substrate. These achievements will reduce the power consumption, cost and form factor of AIoT/IIoT/IoT devices: on the one hand, they will allow removal of the interface between the sensors and the MCU; on the other hand, the MCU will only be activated (powered on or woken up) via a CMOS-compatible MEMS/NEMS switch when a relevant sensory input has occurred.