1 Introduction

The Internet of Things (IoT) is experiencing rapid growth and assuming an increasingly significant role in our everyday existence. IoT nodes can establish a connection with the Internet using an Internet Protocol (IP) address [1]. The past decade has witnessed a significant surge in the level of interconnectivity among individuals, machines, and services, ultimately leading to the emergence of a novel communication paradigm referred to as the IoT [2]. The proliferation of self-configured smart nodes is fueling the development of a wide range of innovative applications, including but not limited to home automation, process automation, smart automobiles, health-care systems, decision analytics, smart grids, industrial development, and autonomous cars [3]. It is predicted by analysts that in the future, the number of interconnected devices will surpass that of the human population on Earth. As per the International Data Corporation's projections, by the year 2025, a total of 41.6 billion interconnected IoT devices are expected to generate a staggering amount of 79.4 zettabytes of data, in contrast with the anticipated global population of 8.1 billion individuals [4].

The IoT is vulnerable to a range of security threats and presents significant security challenges for end-users, particularly as it continues to expand into various aspects of communal life, as shown in Fig. 1. The IoT is a complex system of various networks that include security measures for sensor data, Internet and mobile network connectivity, privacy protection, network authentication, access control, and information management, as noted in the source [5]. In recent years, the occurrence of anomalies and security breaches on IoT devices has become increasingly prevalent. The Internet of Things infrastructure framework is becoming increasingly complex, which is resulting in the introduction of undesired vulnerabilities into its systems. The IoT has the potential to facilitate the seamless integration of physical objects into networks, thereby providing advanced information services to individuals. A multitude of IoT services and applications that utilize ML have emerged across various domains such as security, surveillance, health care, transportation, control, and object monitoring. Preventative security measures are often limited by inadequate planning and implementation, and given the inevitability of attacks, machine learning systems can offer essential services and resilient security strategies for safeguarding IoT devices [6].

Fig. 1 The Internet of Things scenario

Attack detection systems are classified as either signature-based or anomaly-based. Signature-based systems compare certain patterns, such as byte sequences or harmful instruction sequences, in malware-infected network traffic against known attack types stored in a database [7]. Anomaly-based systems detect unknown threats or deviations from the typical traffic flow. Unlike signature-based detection systems, machine learning-based solutions have the potential to detect unknown attacks. However, the ML models must be sufficiently precise to achieve high accuracy, a high detection rate (DR), and a high ROC score while minimizing false alarms [6]. They must be trained and assessed on genuine datasets to demonstrate their efficacy in real-world deployments. The basic strategy is to use ML to build a model of legitimate behavior and then evaluate new behavior against that model.

As a result, numerous approaches and strategies such as data encryption, firewalls, and user verification via the fog computing model have been created and implemented to defend the IoT platform. These attack channels and risks continue to evolve, rendering traditional security solutions inefficient and ineffective at addressing the IoT safety challenge, paving the way for a new wave of IDS based on ML. A substantial amount of work and study has been undertaken to determine the optimum intelligent IDS for various types of applications in IoT-based environments [8]. As IDS is one of the key remedies used to ensure IoT security, there is a propensity to employ multiple techniques concurrently [9]. Alharbi et al. [10] proposed an IoT security proof-of-concept system built into the fog computing layer. Each unit defends against a specific type of attack. The IDS of traffic analyzer components was employed to spot DDoS and DoS attacks with a classification engine based on the decision tree ML technique. To authenticate the IDS's answer, the challenge-response component sends a challenge communication in the event of intrusion detection. As a result of the system's failure to respond to this message, the firewall unit disables the connection. Pajouh et al. [11] introduced a unique layered IDS for IoT mainstay networks that use a two-tier (2-tier) dimensionality reduction and classification phase. The dimension reduction engine is built of component analysis and LDA units, while the classification engine is composed of NB and a cascaded version of the CF-KNN units. The NB was utilized to classify attack records, which were further improved using the CF-KNN algorithm as a secondary filter layer. Using the NSLKDD [12] dataset, the suggested model demonstrated modest uncovering performance for difficult-to-catch attacks, specifically those belonging to the U2R and R2L classes. 
Zhang et al. [13] used the UNSW-NB [14] standard dataset to illustrate the efficacy of ML-based intrusion detection using a full depiction of modern IoT attack scenarios. They employed a new feature selection engine that applied a DAE based on a biased loss function, while using only a simple MLP as the classifier. This unique feature selection technique resulted in an increased emphasis on attack-representative features. In another application of the UNSW-NB dataset, Koroniotis et al. [15] proposed an IoT network forensic framework consisting of C4.5, ARM, NB, and ANN ML approaches to detect novel and complex forms of current botnet attacks.

Traditional ML techniques have been widely used because of their high attack detection accuracy and low false alarm rates, but they have been criticized for their inability to detect novel threats, including composite and new attacks. Most modern cyberattacks are minor mutations of known attacks, and novel attacks build on prior logic and concepts. Typical ML models fail to recognize such minute mutations because they cannot abstract information well enough to discern novel threats [16]. Hence, a more robust, intelligent method for IoT attack detection is needed, and this paper proposes an ensemble learning method. Ensemble learning is a strategy for solving a specific artificial intelligence-based challenge by combining different models or expertise. In the intrusion detection problem, combining models through ensemble strategies such as voting enhances generalization, resulting in higher detection performance than individual models [17]. The paper's primary contributions are as follows:

  • Propose a new voting ensemble learning approach for IoT intrusion detection (To the best of our knowledge, this is the first voting GWO-optimized ensemble model for intrusion detection in the IoT).

  • Analyze the model using feature extraction (principal component analysis) and feature selection (information gain) for dimensionality reduction. We created a hybrid IG + PCA technique for feature selection, feature extraction, and GWO-optimized ensemble models for classification tasks.

  • Propose low-cost, deployable cyber intrusion detection for the IoT based on network traffic characteristics.

  • Suggest several realistic datasets for IDS in the IoT environment.

  • Develop a voting ensemble model based on the average of probability to increase the detection accuracy and decrease the false alarm rate to detect cyberattacks in the IoT.

  • Leverage the realistic BoT-IoT and UNSW-NB15 datasets, which reflect modern-day attacks, are representative of real-world IoT attack scenarios, and satisfy IoT protocol requirements, in contrast to the outdated and non-representative datasets used in some previous studies.

The paper is divided into seven sections. The literature and existing works are presented in Sect. 2. The proposed methodology is detailed in Sect. 3. Section 4 presents the GWO-optimized ensemble models, and Sect. 5 presents the experimental setup. The findings and discussions are given in Sect. 6. Section 7 contains the conclusion and recommendations for future work.

2 Related work

The study [18] employed Bloom filtering for signature matching and offered a dynamic coding mechanism for constructing a decentralized signature-based IDS in IP-USN. The study [19] created a virtual test platform to mimic an actual network environment, installing a Snort IDS for traffic control and attack discovery by mirroring traffic to the server, and constructed a stream-based intelligent IDS using ML. A further study developed a specification-based IDS capable of identifying a novel kind of threat, the topology attack; its authors suggested an IDS architecture built on top of a network monitor and explained its monitoring techniques using an RPL FSM. Roy et al. [20] presented the use of a Bi-LSTM RNN for intrusion detection, performing binary classification of normal and malicious traffic. The model was trained on the UNSW-NB15 dataset and had a detection accuracy above 95% on IoT attacks. The work [21] devised a deep packet anomaly detection approach for resource-constrained settings that distinguishes between regular and anomalous payloads. Xu et al. [22] presented a unique IDS that examined the implementation of several basic hybrid RNN models and an MLP to protect against IoT threats. Both the NSL-KDD and KDD Cup 99 datasets were used for training and assessing the described models. The study [23] developed a multi-layered RNN model that could be deployed on IoT devices. Using the NSL-KDD dataset, the detection rates were 98.27% for DoS, 97.35% for Probe, 64.93% for U2R, and 77.25% for R2L attacks. Li et al. [24] used the NSL-KDD dataset to build GRU, LSTM, BLS, and Bi-LSTM algorithms for several known intrusion classification tasks. According to the performance study, the BLS significantly reduces training time while maintaining accuracies of 72.64% and 84.15% on the KDDTest-21 and KDDTest+ data, respectively. 
The author [25] demonstrated an accuracy of 85.5–95.25% for RNN-IDS using a heuristic technique for intrusion detection. The IDS is initially trained using the gradient descent approach and then retrained and tested using the KDD20+ and KDDTest+ datasets. RNN-IDS outperforms various applied algorithms, including SVM, J48, NB, MLP, NB tree, RF, ANN, and RF tree. In ref. [26], a DoS detection design for 6LoWPAN was presented. This design incorporated an IDS into the ebbits framework created under the EU FP7 program. The paper [27] conducted an experimental investigation on intrusion detection utilizing DJ, DF, DNN, LSTM-RNN, DBN, GRU-RNN, and RNN ML and deep learning models. Four datasets, namely, KDD Cup 99, NSL-KDD, CICIDS2017, and CICIDS, were used to evaluate the algorithms' effectiveness in detecting and classifying anomalies using 22 distinct evaluation measures. The experimental results indicate that when DL models, notably DBN, are combined with machine learning models, the detection accuracy rate increases by 5 to 10%. The study [26] set out to detect DoS attacks against CoAP and 6LoWPAN communication and to offer an IDS architecture for detecting and blocking attacks in an Internet-connected environment. Jiang et al. [28] experimented with a mixed sampling-based intrusion detection method using the UNSW-NB15 and NSL-KDD datasets separately. OSS and SMOTE are combined to create balanced data for training models built with CNN, AlexNet, BiLSTM, LeNet-5, and RF algorithms. According to the statistical results, CNN-BiLSTM surpassed the other classifiers with an accuracy of 83.58%. Hasan et al. [29] addressed many paradigmatic machine learning strategies for detecting intrusions into IoT networks that result in system failure. On the DS2OS data, fivefold cross-validation was performed using LR, SVM, DT, RF, and ANN. Cheng et al. [30] developed an HS-TCN for detecting anomalous communication in the Internet of Things. 
The experiment was controlled using two variants of the unique dataset DS2OS: data collected over eleven (11) days and the DS2OS-UA. For both adjusted datasets, the HS-TCN model outperforms the LSTM and SVM models. The author [31] suggested an intrusion detection approach founded on node usage analysis in 6LowPAN. Sahu et al. [32] developed another machine learning-based method for detecting anomalies by combining LR and ANN classification methods. Both the ANN and LR achieve approximately 99.4 percent accuracy when the entire dataset is used and 99.99 percent accuracy when approximately 105,952 data points are omitted from the unique data. In both situations, the data are divided into 75 percent and 25% subsets. In reference [33], an event-processing IDS architecture based on CEP technology was described. Kalis [34], an adaptive expert IDS that can supervise several protocols without modifying existing IoT software, is a thorough approach for detecting IoT intrusions. Reddy et al. [35] described a DNN architecture for securing the apps of future smart cities. The findings demonstrate that this DNN technique achieves an accuracy of approximately 98.26 percent when compared to standard machine learning classifiers with a variable layer and neurons. The authors [36] developed a novel method for detecting network intrusions in IoT networks that are built on a conditional variational autoencoder with a specialized design that incorporates intrusion tags. To detect malicious activity, ref. [37] employed a single-class SVM equipped with characteristics such as memory utilization and CPU utilization. The study [38] examined the efficacy of many community detection methods for detecting P2P bots, particularly when only incomplete information is available. They demonstrated that the approach may be used with approximately half of the nodes, presenting their connection graphs with only a slight upsurge in detection mistakes. 
Table 1 summarizes the assessed studies on IoT security as per their datasets, models, best accuracy results, and gaps.

Table 1 Summary of existing IoT attack detection using machine learning and deep learning

As seen from the review of the existing studies, the focus of some of the research is solely on detecting DDoS attacks. Other sizable attacks are not taken into account. Also, a simple ANN with only one hidden layer was deployed in one case with no optimization techniques applied. The majority of the work also lacks comparative analysis with other ML and DL models. In another study, it was difficult to replicate the research work. The implementation details of the machine learning model are absent, with obsolete datasets that do not reflect contemporary IoT attacks. Finally, the suggested approach is policy-based and relies on known attack signatures; hence, it will not be up-to-date with the most recent attack trends until signatures are upgraded.

Unlike the past efforts, we investigate intrusion detection for IoT resource-constrained devices in the network in this research. The difference is that our technique is divided into three stages. The first is hybrid dimensionality reduction, which involves using PCA and IG to choose the relevant attributes. The proposed GWO ensemble intrusion detection model includes two important engines in the second phase: a traffic analyzer and a classification phase engine. In the third phase, voting was utilized to merge the base learners' probability averages.

2.1 Motivation for the intelligent threat model on the Internet of Things

As IoT grows, so does the number of cybersecurity threats that investigators must address and examine to develop a reliable IDS. Numerous forms of malevolent action attempt to compromise the privacy and security of IoT gadgets, and all smart appliances connected to the Internet are potentially vulnerable. For a variety of reasons, the IoT is vulnerable to cyberattacks. For starters, IoT appliances are frequently unattended (for example, sensors located in remote places), making it relatively uncomplicated for an assailant to get admittance to them physically. Second, the vast majority of data transfers are wireless, making eavesdropping easier. Finally, most IoT devices have limited storage and computing capabilities [45]. Additional anti-virus protection, for example, cannot be deployed on IoT gadgets. Using numerous hacking tactics, hackers can disrupt or manipulate the functionality of smart gadgets [46]. In light of the physically insecure nature of a large number of IoT gadgets, some hacking approaches require active access to smart gadgets, making an attack more difficult but not impossible. Other attacks could be carried out remotely over the Internet. Table 2 shows the main kinds of attacks targeting smart devices.

Table 2 Common types of attacks against smart IoT devices

Intrusion attacks can build an IoT botnet out of unsecured IoT devices such as electrical appliances, security systems, automobiles, thermostats, lights in homes or commercial locations, speaker systems, and wall timers. These attacks give a cybercriminal the ability to take control of the sensors. Unlike traditional botnets, compromised IoT devices actively seek to propagate their malicious behavior to a growing range of devices. While a traditional botnet may consist of hundreds of bots, an IoT botnet is far larger in scope, involving a large number of connected devices [51]. For instance, on October 21, 2016, cybercriminals targeted a prominent DNS firm named Dyn. This attack was initiated by a massive flood of DNS lookup queries from millions of IP addresses [52]. The botnet had to infect a significant number of devices linked to the Internet, including printers, camcorders, and other gadgets. This IoT botnet attack was carried out with malicious software known as Mirai. Devices infected with Mirai continually search the Internet for susceptible devices and log in using default usernames and passwords, infecting them with malicious programs. At Black Hat 2015, security researchers described how they targeted the Chrysler Jeep Cherokee: by hacking the Jeep's IoT device and sensor network, they could remotely access the vehicle as it drove down the motorway [53]. The specific security challenges addressed in this research, which involves developing an IDS for the IoT using a hybrid approach of feature extraction via PCA, feature selection via IG, and parameter optimization using GWO for ensemble models, relate to the cybersecurity aspects of IoT environments. First, regarding vulnerabilities in IoT devices, it is important to note that these devices frequently have limited resources and may lack comprehensive security measures. 
The primary objective of the IDS suggested in this study is to identify and address vulnerabilities present in these devices, hence thwarting unauthorized access and control. Furthermore, it is imperative to periodically upgrade the firmware and software of IoT devices to ensure their security. The suggested approach has the potential to facilitate monitoring and ensure the timely implementation of changes. Authentication and access control play a vital role in safeguarding IoT systems, as they are responsible for ensuring that solely authorized individuals or devices are granted access. The proposed IDS has the potential to effectively detect and identify unauthorized access attempts.

3 Methodology

This section discusses the framework, philosophy, and design principles of our proposed method. In this research, a hybrid IG-PCA-based feature selection and extraction method employing optimized voting gray wolf optimizer (GWO)-based ensemble learning models is proposed for intrusion detection in the IoT. The general design of the suggested model, portrayed in Fig. 2, is made up of three phases. The first phase is dimensionality reduction using PCA and IG to determine the relevant attributes. In the second phase, two key engines comprise the proposed ensemble intrusion detection model: a traffic analyzer and a classification engine (RF, DT, MLP, KNN, and voting ensemble). The GWO evolutionary-based optimization is used to optimize the parameters of the ensemble models. The traffic processing unit preprocesses traffic connection records into a format suitable for the ensemble models of the classification phase, and the GWO ensemble intrusion detection then classifies these connections as normal or attack. In the third phase, voting is used to combine the base classifiers by averaging their predicted probabilities. The new voting methodology employs GWO ensemble models to improve the predictive capacity of the legitimate/intrusion classification. A probability average offers a rapid response and effective immediate security management for the IoT system. Voting is a critical phase of the proposed classification-based traffic analysis; it analyzes network traffic that seeks to reach the IoT system and generates a security alert if an intrusion is identified. In the framework illustrated in Fig. 2, the IG approach is applied to the training data, where the IG entropy is estimated. Following this, the eigenvalues of the PCA covariance matrix are calculated. 
During the testing phase, the voting process is conducted by calculating the average of the probabilities obtained from the GWO-optimized ensembles, namely, RF, DT, MLP, and KNN. The voting mechanism is further enhanced by the GWO vectors alpha, beta, and delta, which are responsible for updating the voting process. In an IoT setting, data collection encompasses not only the reception of data from IoT devices, but also the transmission of commands, updates, or responses back to these devices. This bidirectional flow of information is of utmost importance in facilitating real-time interaction and control inside IoT devices.
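The probability-averaging vote described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `average_probability_vote`, the 0.5 decision threshold, and the example probabilities are all hypothetical, and the four score lists merely stand in for the outputs of the GWO-optimized RF, DT, MLP, and KNN models.

```python
from statistics import fmean

def average_probability_vote(attack_probs_per_model, threshold=0.5):
    """Soft voting: average each model's attack probability per record.

    attack_probs_per_model: one list per base learner, each holding
    P(attack) for the same sequence of traffic records.
    Returns (averaged probabilities, labels), where 1 = attack, 0 = normal.
    """
    n_records = len(attack_probs_per_model[0])
    if any(len(p) != n_records for p in attack_probs_per_model):
        raise ValueError("every base learner must score the same records")
    # zip(*...) regroups the scores so each tuple holds one record's scores
    averaged = [fmean(scores) for scores in zip(*attack_probs_per_model)]
    labels = [1 if p >= threshold else 0 for p in averaged]
    return averaged, labels

# Hypothetical P(attack) outputs of four base learners for three records
rf = [0.90, 0.20, 0.60]
dt = [1.00, 0.00, 0.40]
mlp = [0.80, 0.30, 0.55]
knn = [0.70, 0.10, 0.35]
averaged, labels = average_probability_vote([rf, dt, mlp, knn])
```

In this toy example the third record is scored above 0.5 by two of the four models, yet the averaged probability falls below the threshold, which shows how averaging weighs confidence rather than counting votes.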

Fig. 2 The framework of the proposed GWO ensemble models for IoT

3.1 Data preprocessing

Normalization is a technique for scaling attributes so that all attribute values lie on the same scale. Normalization techniques include the standardized approach, min–max normalization, and z-score normalization [54, 55]. We selected the min–max technique because the majority of the features had a normal distribution, and we applied it so as to prevent information from the test data from leaking into training.

3.2 Normalization technique

The min–max approach [56] modifies a feature so that all of its values lie inside the interval [0,1]. Equation 1 depicts the fundamental formula for min–max normalization.

$$ Y_{{{\text{new}}}} = \frac{y - \min \;(y)}{{\max \;(y) - \min \;(y)}} $$
(1)

where y represents the value of a certain feature, min(y) represents its minimum value, and max(y) represents its maximum value.
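Eq. (1) can be illustrated with a short sketch. The helper names `fit_min_max` and `apply_min_max` and the sample values are hypothetical. Learning the minimum and maximum from the training split only is one common way to avoid test-data leakage and is an assumption here, not a detail confirmed by the text.

```python
def fit_min_max(train_column):
    """Learn min(y) and max(y) from the training values of one feature."""
    return min(train_column), max(train_column)

def apply_min_max(column, y_min, y_max):
    """Apply Eq. (1): y_new = (y - min(y)) / (max(y) - min(y))."""
    span = y_max - y_min
    if span == 0:
        # Constant feature: map everything to 0.0 to avoid dividing by zero
        return [0.0 for _ in column]
    return [(y - y_min) / span for y in column]

# Hypothetical values of a single feature
train = [2.0, 4.0, 6.0, 10.0]
test = [3.0, 12.0]

y_min, y_max = fit_min_max(train)          # statistics come from training data only
train_scaled = apply_min_max(train, y_min, y_max)
test_scaled = apply_min_max(test, y_min, y_max)
```

With this choice, training values land in [0, 1], while an unseen test value larger than the training maximum (12.0 here) maps slightly above 1.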

3.3 Feature selection

The IoT ecosystem comprises intelligent devices with limited computing power, energy, communication range, and memory. One issue with IDSs is handling numerous irrelevant features, which can result in system overhead. Thus, the objective of feature evaluation is to discover key attributes that the IDS can use to detect a variety of attacks efficiently. The characteristics are examined for both normal and anomalous behaviors using the retrieved labels to select the most important features. We used information gain (IG) for feature selection and principal component analysis (PCA) for feature extraction.

3.4 Feature selection with IG

IG is a frequently used entropy-based feature evaluation approach in ML [57]. IG is fast to compute, and it was used here to extract the model's optimal feature set. IG has frequently been used in the literature to determine how well each attribute discriminates the given data. The first phase in this research is to use IG with ranking as a filter strategy to lower the datasets' dimensionality. The primary idea behind this method is to score the features by their IG and rank them in decreasing order, so that each feature receives a score from most relevant to least relevant. The attributes with the best scores are used as the input set of attributes for the next dimensionality reduction stage. Following [58], the overall entropy “K” of a given dataset “D” is defined as follows:

$$ K\left( D \right) = - \sum\limits_{i = 1}^{e} {p_{i} \log_{2} p_{i} } $$
(2)

where “e” signifies the number of classes, and \(p_{i}\) denotes the proportion of cases belonging to class i. The reduction in entropy is estimated for each feature A using the following formula:

$$ {\text{IG}}\left( {D,A} \right) = K\left( D \right) - \sum\limits_{{w \in {\text{values}}(A)}} {\frac{{|D_{A,w} |}}{|D|}} K\left( {D_{A,w} } \right) $$
(3)

where w ranges over the values of feature A, and \(D_{A,w}\) is the subset of D in which A takes the value w.
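The entropy and information gain computations of Eqs. (2) and (3), followed by the ranking step, can be sketched as below. This is an illustrative sketch: the function names, the toy feature names `proto` and `flag`, and the sample values are hypothetical, and the features are assumed to be categorical (continuous traffic features would need to be discretized first).

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Eq. (2): entropy of the class labels, in bits."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())

def information_gain(feature_values, labels):
    """Eq. (3): entropy of D minus the weighted entropy of each partition."""
    total = len(labels)
    partitions = {}
    for value, label in zip(feature_values, labels):
        partitions.setdefault(value, []).append(label)
    remainder = sum(len(part) / total * entropy(part)
                    for part in partitions.values())
    return entropy(labels) - remainder

def rank_features(columns, labels):
    """Return feature names sorted from highest to lowest information gain."""
    gains = {name: information_gain(vals, labels) for name, vals in columns.items()}
    return sorted(gains, key=gains.get, reverse=True), gains

# Hypothetical toy data: four records, two categorical features
labels = ["attack", "attack", "normal", "normal"]
columns = {
    "proto": ["udp", "udp", "tcp", "tcp"],  # perfectly separates the classes
    "flag": ["a", "b", "a", "b"],           # carries no class information
}
ranking, gains = rank_features(columns, labels)
```

Here `proto` removes all uncertainty about the class and so receives the full one bit of gain, while `flag` leaves the class distribution unchanged in every partition and receives zero.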

3.5 Feature extraction with PCA

The attributes selected by the IG method can be used directly for classification. However, one of the most typical issues with IG is a bias toward attributes with many possible values [59]. In this scenario, such features have a close-to-zero eigenvalue, which inflates their gain relative to other attributes. As a result, the true importance of these attributes to the training examples may not be represented in their ranking. To overcome this constraint, the features from the attribute selection phase are passed on for additional reduction using the PCA method to identify the best subset of features. This allows the PCA to narrow the search area from the whole subspace to the pre-selected features [60]. The purpose of using PCA is to reduce dimensionality while retaining the important attribute information in the data. It decreases the number of variables by employing orthogonal combinations with significant variance. Table 3 shows the proposed hybrid dimensionality reduction for our suggested models.

Table 3 Hybrid feature dimensionality reduction

Two phases are employed to reduce the dimensionality of features from m dimensions to j dimensions: preprocessing and dimensionality reduction. During the preprocessing phase (steps 1 through 4 below), the mean and variance of the data are standardized using Eqs. (4) and (5). During the second phase (steps 5–8), the covariance matrix Covn, its eigenvectors, and its eigenvalues are computed and the data are projected using Eqs. (6) and (7).

  1. Standardize the initial input feature values by their mean and standard deviation, beginning with the mean given by Eq. (4), where n is the number of cases and Y(i) are the data points.

    $$ \mu = \frac{1}{n}\sum\limits_{i = 1}^{n} Y_{(i)} $$
    (4)

  2. Substitute Y(i) with \(Y_{(i)} - \mu\).

  3. Using Eq. (5), compute the variance of each feature k so that each vector Yk(i) can be rescaled to unit variance.

    $$ \sigma_{k}^{2} = \frac{1}{n}\sum\limits_{i = 1}^{n} {\left( {Y_{k(i)} } \right)^{2} } $$
    (5)

  4. Substitute each Yk(i) with \(\frac{Y_{k(i)}}{\sigma_{k}}\).

  5. Compute the covariance matrix Covn:

    $$ {\text{Cov}}_{n} = \frac{1}{n}\sum\limits_{i = 1}^{n} {Y_{(i)} \left( {Y_{(i)} } \right)^{T} } $$
    (6)

  6. Calculate the eigenvectors and eigenvalues of Covn.

  7. Sort the eigenvectors by decreasing eigenvalue and select the j eigenvectors with the greatest eigenvalues to form S.

  8. Using S and Eq. (7), convert the data to the new subspace.

    $$ y = S \times Y $$
    (7)

where Y is the e-dimensional vector representing one sample, S is the j \(\times\) e matrix whose rows are the selected eigenvectors, and y is the converted j \(\times\) 1 sample in the new subspace.
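The eight steps above can be sketched with NumPy as follows. This is a minimal sketch, not the authors' implementation: the function name `pca_project`, the handling of constant features, and the synthetic data are assumptions made for illustration.

```python
import numpy as np

def pca_project(Y, j):
    """Project the rows of Y (n samples x e features) onto j principal components."""
    Y = np.asarray(Y, dtype=float)
    n = Y.shape[0]
    # Steps 1-2: subtract the per-feature mean (Eq. 4)
    Y = Y - Y.mean(axis=0)
    # Steps 3-4: divide by the per-feature standard deviation (Eq. 5)
    sigma = np.sqrt((Y ** 2).mean(axis=0))
    sigma[sigma == 0] = 1.0  # assumption: leave constant features unscaled
    Y = Y / sigma
    # Step 5: covariance matrix of the standardized data (Eq. 6)
    cov = (Y.T @ Y) / n
    # Step 6: eigen-decomposition; eigh is meant for symmetric matrices
    eigenvalues, eigenvectors = np.linalg.eigh(cov)
    # Step 7: sort by decreasing eigenvalue and keep the top j eigenvectors
    order = np.argsort(eigenvalues)[::-1][:j]
    S = eigenvectors[:, order].T  # j x e, one eigenvector per row
    # Step 8: project every sample onto the selected eigenvectors (Eq. 7)
    return Y @ S.T, eigenvalues[order]

# Hypothetical data: two strongly correlated features plus one noise feature
rng = np.random.default_rng(0)
base = rng.normal(size=(200, 1))
data = np.hstack([
    base,
    2 * base + 0.01 * rng.normal(size=(200, 1)),
    rng.normal(size=(200, 1)),
])
projected, top_eigenvalues = pca_project(data, j=2)
```

Because the first two features are nearly collinear, most of the variance is captured by one component, so two components retain almost all of the information in the three original features.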

The computational complexity of performing the specified PCA grows cubically with the number of attributes F representing each data point:

$$ O \, \left( {F^{3} } \right) $$
(8)

In this study, PCA is utilized to reduce the dimensionality of the BoT-IoT and UNSW-NB15 datasets by compressing the attribute space to the ten (10) and nine (9) top-ranked features, respectively. To identify the most effective features, we employed information gain in our feature selection process; it quantifies the importance of each feature based on its ability to discriminate between different classes (e.g., normal and intrusion). Features with higher information gain were considered more effective in distinguishing between classes. The design principles of PCA are given in Table 4.

Table 4 Design principles of PCA

Parameter ranking typically refers to the process of assessing and ranking the importance or influence of different parameters or hyperparameters on a machine learning model's performance. These parameters are settings or configurations that can be adjusted to influence how a model learns from data and makes predictions. In our research, the parameter ranking in the settings is set to true. The num to select parameter in PCA is set to the value 6. The threshold value is set to 0.5, and the variance is set to 1.832. The design principle revolves around finding a new set of orthogonal axes, called principal components, that capture the maximum variance in the data while reducing its dimensionality.

Ten (10) new features were selected from the BoT-IoT dataset, and nine (9) features were chosen from the UNSW-NB15 which are subsequently fed and passed to the GWO-optimized ensemble models (RF, DT, MLP, and KNN). The information gain efficiently identifies the most relevant features based on their contribution to the target variable, while PCA optimally captures the variance within the dataset to create a reduced set of orthogonal features. By combining these two methods, we achieve a balanced feature reduction approach that maximizes the preservation of informative features while minimizing computational overhead.

PCA aims to transform the original high-dimensional feature space into a lower-dimensional space while retaining as much of the variance in the IoT network traffic data as possible. This dimensionality reduction can lead to several benefits:

  i. Curse of dimensionality: High-dimensional IoT network traffic data can suffer from the "curse of dimensionality," where the number of features greatly exceeds the number of samples. This can lead to increased computational complexity, overfitting, and difficulty in visualization. PCA helps mitigate these issues by reducing the dimensionality.

  ii. Noise reduction: High-dimensional IoT network data often contain noise and irrelevant features. PCA helps remove and down-weight such noisy dimensions by identifying and emphasizing the dimensions with the most significant information.

  iii. Improved model performance: Reducing dimensionality leads to faster training and inference times for machine learning models, as well as potentially reducing overfitting.

3.6 Handling the class imbalance problem

Addressing class imbalance is a prevalent issue encountered in the field of machine learning, particularly in the context of intrusion detection systems. This challenge arises due to the substantial disparity between the abundance of normal instances and the scarcity of attack instances. In this research, we employed the synthetic minority oversampling technique (SMOTE) as a method to tackle the aforementioned concern. The SMOTE is a method that produces artificial cases for the underrepresented class by interpolating between the available data points. We ensure that the data are preprocessed properly, including removing irrelevant features, handling missing values, and encoding categorical variables. Subsequently, we divide the datasets into features (x) and corresponding labels (y) for both training and testing datasets. Thus, we create an instance of the SMOTE and apply it to the training data. The mathematical representation is given in Eq. (9).

$$ x_{\text{synthetic}} = x_{\text{minority}} + \text{random\_number} \times \left( n - x_{\text{minority}} \right) $$
(9)

Assume a dataset with features x and labels y. For each minority instance x_minority, its k nearest neighbors within the minority class are found. The distance metric used for finding neighbors (such as Euclidean distance) can vary. Let N(x_minority) denote the set of k nearest neighbors. For each neighbor n in N(x_minority), a synthetic instance x_synthetic is generated using Eq. (9).

Here, random_number is a random value between 0 and 1 that controls the interpolation between x_minority and n. The formula in Eq. (9) is applied to each feature of x_minority and n to generate the corresponding feature of x_synthetic.
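The interpolation in Eq. (9) can be illustrated with a minimal sketch. This is not the implementation used in the experiments; the function name `smote_from_point`, the toy minority points, and the choice of Euclidean distance are assumptions made only for illustration.

```python
import math
import random


def smote_from_point(x_minority, minority_points, k=5, rng=None):
    """Return one synthetic point per nearest minority neighbour, using Eq. (9)."""
    rng = rng or random.Random()
    # k nearest minority-class neighbours of x_minority (Euclidean distance),
    # excluding the point itself.
    others = [p for p in minority_points if p is not x_minority]
    neighbours = sorted(others, key=lambda p: math.dist(p, x_minority))[:k]
    synthetic = []
    for n in neighbours:
        random_number = rng.random()  # value in [0, 1)
        # Eq. (9), applied feature by feature.
        synthetic.append(
            [xi + random_number * (ni - xi) for xi, ni in zip(x_minority, n)]
        )
    return synthetic


# Hypothetical two-feature minority (attack) samples.
minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]
synthetic_points = smote_from_point(minority[0], minority, k=2, rng=random.Random(0))
```

Each synthetic point lies on the line segment between the original minority instance and one of its neighbors, so the new samples stay inside the region already occupied by the minority class.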

3.7 Optimization of the ensemble learning models (ELM) with gray wolf optimizer

GWO is a metaheuristic algorithm that mimics the leadership hierarchy and hunting mechanism of gray wolves [61]. In the mathematical model of GWO, the best solution is denoted alpha (\(\alpha\)). Beta (\(\beta\)) and delta (\(\delta\)) denote the second- and third-best solutions, respectively. The remaining candidate solutions are termed omega (\(\omega\)). The hunt is guided by \(\alpha\), \(\beta\), and \(\delta\), with \(\alpha\) as the leader, and the \(\omega\) wolves follow these three.

Gray wolves encircle the prey during the hunt. Eqs. (10)–(13) model this encircling behavior mathematically.

$$ \overrightarrow{Z}\left( r + 1 \right) = \overrightarrow{Z}_{p}\left( r \right) - \overrightarrow{B} \cdot \overrightarrow{E} $$
(10)

where \(\overrightarrow{Z}_{p}\) is the position of the prey, \(\overrightarrow{Z}\) is the position of a gray wolf, \(\overrightarrow{B}\) and \(\overrightarrow{D}\) are coefficient vectors, r is the iteration number, and \(\overrightarrow{E}\) is given by Eq. (11).

$$ \overrightarrow{E} = \left| \overrightarrow{D} \cdot \overrightarrow{Z}_{p}\left( r \right) - \overrightarrow{Z}\left( r \right) \right| $$
(11)
$$ \overrightarrow{B} = 2b \cdot \overrightarrow{t}_{1} - b $$
(12)
$$ \overrightarrow{D} = 2\overrightarrow{t}_{2} $$
(13)

The parameter b is decreased linearly from 2 to 0 over the course of the iterations, while t1 and t2 are random vectors in the interval [0, 1]. Typically, the alpha leads the hunt, and the beta and the delta may occasionally take part in it. To simulate the hunting behavior of gray wolves mathematically, the alpha (the best candidate solution), beta (the second-best candidate solution), and delta (the third-best candidate solution) are assumed to have better knowledge of the likely position of the prey. The three best solutions obtained so far are therefore retained, and the other search agents are required to update their positions according to the positions of these best agents. The position update of the wolves is given by Eq. (14):

$$ \overrightarrow{Z}\left( r + 1 \right) = \frac{\overrightarrow{Z}_{1} + \overrightarrow{Z}_{2} + \overrightarrow{Z}_{3}}{3} $$
(14)
$$ \overrightarrow{Z}_{1} = \overrightarrow{Z}_{\alpha} - \overrightarrow{B}_{1} \cdot \overrightarrow{E}_{\alpha} $$
(15)
$$ \overrightarrow{Z}_{2} = \overrightarrow{Z}_{\beta} - \overrightarrow{B}_{2} \cdot \overrightarrow{E}_{\beta} $$
(16)
$$ \overrightarrow{Z}_{3} = \overrightarrow{Z}_{\delta} - \overrightarrow{B}_{3} \cdot \overrightarrow{E}_{\delta} $$
(17)

where \(\overrightarrow{Z}_{\alpha}\), \(\overrightarrow{Z}_{\beta}\), and \(\overrightarrow{Z}_{\delta}\) are the three best solutions at iteration r, \(\overrightarrow{B}_{1}\), \(\overrightarrow{B}_{2}\), and \(\overrightarrow{B}_{3}\) are computed as in Eq. (12), and \(\overrightarrow{E}_{\alpha}\), \(\overrightarrow{E}_{\beta}\), and \(\overrightarrow{E}_{\delta}\) are given by Eqs. (18)–(20), respectively.

$$ \overrightarrow{E}_{\alpha} = \left| \overrightarrow{D}_{1} \cdot \overrightarrow{Z}_{\alpha} - \overrightarrow{Z} \right| $$
(18)
$$ \overrightarrow{E}_{\beta} = \left| \overrightarrow{D}_{2} \cdot \overrightarrow{Z}_{\beta} - \overrightarrow{Z} \right| $$
(19)
$$ \overrightarrow{E}_{\delta} = \left| \overrightarrow{D}_{3} \cdot \overrightarrow{Z}_{\delta} - \overrightarrow{Z} \right| $$
(20)

where \(\overrightarrow{D}_{1}\), \(\overrightarrow{D}_{2}\), and \(\overrightarrow{D}_{3}\) are computed as in Eq. (13).

A final remark on GWO concerns the update of the parameter b, which regulates the exploration-exploitation tradeoff. The parameter is decreased linearly at each iteration from 2 to 0 according to Eq. (21).

$$ b = 2 - r\frac{2}{\text{MaxIter}} $$
(21)

where MaxIter is the maximum number of optimization iterations allowed, and r is the current iteration number. The positions of the gray wolves are required to be updated as binary values {1, 0}. The gray wolf optimization pseudocode is described in Table 5.
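The update rules can be put together in a short continuous-valued sketch. It is not a reproduction of Table 5 or of the authors' code: the function name `gwo_minimize`, the pack size, the iteration budget, the clipping to a bounding box, and the sphere test function are all assumptions chosen for illustration. The leader-guided candidates are computed in the standard GWO form (leader position minus B times E), and the binary position encoding mentioned above is not covered.

```python
import random


def gwo_minimize(fitness, bounds, n_wolves=20, max_iter=100, seed=0):
    """Minimize `fitness` inside the box `bounds` with a basic GWO loop."""
    rng = random.Random(seed)
    wolves = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_wolves)]
    leaders = []  # best three positions found so far: alpha, beta, delta

    for r in range(max_iter):
        # Keep the three best positions seen so far as alpha, beta, delta.
        leaders = sorted(leaders + wolves, key=fitness)[:3]
        b = 2 - r * (2 / max_iter)  # Eq. (21): linear decrease starting at 2

        updated = []
        for z in wolves:
            position = []
            for d, (lo, hi) in enumerate(bounds):
                guided = []
                for leader in leaders:
                    B = 2 * b * rng.random() - b   # coefficient B, random in [-b, b]
                    D = 2 * rng.random()           # coefficient D, Eq. (13)
                    E = abs(D * leader[d] - z[d])  # distance to the leader
                    guided.append(leader[d] - B * E)  # leader-guided candidate
                value = sum(guided) / 3            # Eq. (14): average of the three
                position.append(min(max(value, lo), hi))  # stay inside the box
            updated.append(position)
        wolves = updated

    best = min(leaders + wolves, key=fitness)
    return best, fitness(best)


def sphere(z):
    """Toy objective with its minimum (zero) at the origin."""
    return sum(v * v for v in z)


best_position, best_value = gwo_minimize(sphere, [(-5.0, 5.0)] * 3)
```

Early iterations use large values of b, so the wolves can move far from the leaders (exploration); as b shrinks, each new position approaches the average of the three leaders (exploitation).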

Table 5 Pseudocode of gray wolf optimization

We chose GWO to optimize the parameters of the ensemble algorithms because of three merits it offers over other algorithms: its balance of exploration and exploitation, its convergence speed, and its handling of constraints. GWO has gained prominence among swarm intelligence methods because it has few parameters to tune, is simple and easy to use, scales well, and, most notably, converges quickly by maintaining a balance between exploration (searching the solution space) and exploitation (refining promising solutions) during the search. It uses the hierarchy of alpha, beta, delta, and omega wolves to strike this balance, which can lead to more efficient optimization than other algorithms. In some cases, GWO converges to a global optimum faster than several other algorithms. The hierarchical structure and the hunting behavior of gray wolves, such as encircling prey, also promote diverse exploration of the solution space, which can help the search avoid getting stuck in local optima.

In our research, we used the GWO pseudocode to optimize the hyperparameters of the ensemble learning models: random forest (RF), decision tree (DT), multilayer perceptron (MLP), and the number of neighbors n for K-nearest neighbor (KNN) [62]. GWO was integrated with the ensemble models as follows:

  1. Initialize a population of gray wolves with random hyperparameter settings for the ensemble models.

  2. Define a fitness function that evaluates the performance of the ensemble model with the given hyperparameters. The fitness function used appropriate evaluation metrics.

  3. In each iteration of the GWO loop, evaluate the fitness of each wolf (hyperparameter set) using the ensemble model. Update the positions of the alpha, beta, and delta wolves based on their fitness values. These wolves represent the best solutions found so far.

  4. Update the positions of the other wolves using predefined formulas that simulate the hunting behavior of gray wolves. This step helps explore the search space efficiently.

  5. Apply boundary constraints to ensure that hyperparameters remain within valid ranges for the ensemble models.

  6. After a certain number of iterations or when a stopping criterion is met, select the best solution found so far based on fitness values.

  7. Perform cross-validation to assess the performance of the ensemble model with the selected hyperparameters on a validation set.

  8. If the new solution (hyperparameters) is better than the previous best solution, update the best solution.

  9. Continue the optimization process until the stopping criterion is met.

  10. Finally, return the best solution, which represents the optimal hyperparameters for the ensemble learning models.

By integrating GWO with ensemble models in this way, we effectively search for the best hyperparameters to maximize the ensemble's performance, improving its accuracy and effectiveness in real-world applications.
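Steps 2 and 5 can be sketched as follows: a wolf's continuous position is clipped to valid ranges, decoded into a hyperparameter set, and scored by a fitness function. The search ranges, the parameter names in `SEARCH_SPACE`, and the `toy_evaluate` scorer are hypothetical; the paper does not report the ranges it used, and in practice the scorer would be a cross-validated metric of the trained model.

```python
# Hypothetical search ranges (low, high, type); not taken from the paper.
SEARCH_SPACE = {
    "rf_n_estimators": (10, 200, int),
    "dt_max_depth": (1, 30, int),
    "mlp_learning_rate": (1e-4, 1e-1, float),
    "knn_n_neighbors": (1, 25, int),
}


def decode(position):
    """Map a wolf's continuous position to a valid hyperparameter set."""
    params = {}
    for value, (name, (low, high, kind)) in zip(position, SEARCH_SPACE.items()):
        clipped = min(max(value, low), high)  # step 5: boundary constraint
        params[name] = int(round(clipped)) if kind is int else float(clipped)
    return params


def make_fitness(evaluate):
    """Step 2: wrap a scoring callable into a fitness value to minimize."""
    def fitness(position):
        return 1.0 - evaluate(decode(position))
    return fitness


def toy_evaluate(params):
    # Stand-in for a cross-validated score; NOT a real model evaluation.
    return 1.0 / (1.0 + abs(params["knn_n_neighbors"] - 5))


fitness = make_fitness(toy_evaluate)
wolf = [350.7, 12.4, 0.5, 4.6]  # partly outside the valid ranges
params = decode(wolf)
score = fitness(wolf)
```

Decoding inside the fitness function lets the optimizer work in a continuous space while the models only ever receive valid integer or bounded values.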

3.8 Mathematical formulation of the ensemble method for classification

Let {y(u)}, u = 1,…, m, be a zero-mean random dataset of samples and their features. Equation (22) gives the sample covariance matrix of y(u). Algorithm 1 summarizes the selection procedure of the hybrid IG-PCA approach.

$$ Z = \frac{1}{m - 1}\sum\limits_{u = 1}^{m} y\left( u \right) y\left( u \right)^{T} $$
(22)

In PCA, the linear transformation from y(u) to x(u) is calculated as follows:

$$ x\left( u \right) = N^{T} y\left( u \right) $$
(23)

where N denotes an m × m orthogonal matrix whose jth column is the jth eigenvector of the sample covariance matrix Z. PCA first solves the eigenvalue problem stated in Eq. (24).

$$ \beta_{j} k_{j} = Z k_{j} $$
(24)

where \(\beta_{j}\) is an eigenvalue of Z (say \(\beta_{1}\) > \(\beta_{2}\) > ... > \(\beta_{m}\)), and kj is the corresponding eigenvector. The principal components are then obtained using Eq. (25):

$$ x_{j}\left( u \right) = k_{j}^{T} y\left( u \right), \quad j = 1, 2, \ldots, m. $$
(25)

The jth principal component is denoted by xj(u). The projection of a new sample y(u) onto the principal subspace spanned by the first q eigenvectors is given in Eq. (26):

$$ \mathop Y\limits^{\prime } \left( u \right) = \sum\nolimits_{j = 1}^{q} \left( e_{j}^{T} y\left( u \right) \right) e_{j} , $$
(26)

where A = {ej: ej = kj, j = 1,…, q}. Equation (27) calculates the distance f between y(u) and Ý(u) to determine the projection error b of y(u):

$$ b = f\left( y\left( u \right),\mathop Y\limits^{\prime } \left( u \right) \right) $$
(27)
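The chain from covariance matrix to projection error can be traced on a toy example. The sketch below keeps only the leading component (q = 1), finds it by power iteration rather than a full eigendecomposition, and takes the distance f to be Euclidean; these choices, the function names, and the data are assumptions for illustration, not the procedure used in the experiments.

```python
import math


def covariance(samples):
    """Sample covariance Z of zero-mean samples, Eq. (22)."""
    m, d = len(samples), len(samples[0])
    return [[sum(y[a] * y[b] for y in samples) / (m - 1) for b in range(d)]
            for a in range(d)]


def leading_eigenvector(Z, iters=200):
    """Power iteration for the eigenvector of Z with the largest eigenvalue."""
    k = [1.0] * len(Z)
    for _ in range(iters):
        k = [sum(row[b] * k[b] for b in range(len(k))) for row in Z]
        norm = math.sqrt(sum(v * v for v in k))
        k = [v / norm for v in k]
    return k


def project(y, k):
    """First principal component k^T y (Eq. 25) and its reconstruction (Eq. 26)."""
    x1 = sum(ki * yi for ki, yi in zip(k, y))
    return x1, [x1 * ki for ki in k]


# Zero-mean toy data lying close to the direction (1, 2).
data = [[-2.0, -4.1], [-1.0, -1.9], [0.0, 0.0], [1.0, 2.1], [2.0, 3.9]]
Z = covariance(data)
k1 = leading_eigenvector(Z)
x1, y_hat = project(data[3], k1)
error = math.dist(data[3], y_hat)  # Eq. (27) with Euclidean distance as f
```

A small projection error means the sample is well described by the retained component, while a large error indicates that the sample deviates from the dominant structure of the data.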

3.9 Ensemble model

Ensemble methods improve the prediction outcome of the overall model by developing several independent models and combining them to produce more accurate results [63]. Ensemble learning approaches include boosting, bagging, Bayesian parameter averaging, and stacking [64]. This work proposes an ensemble classifier that employs RF, DT, MLP, and KNN learners to improve intrusion detection accuracy in IoT. These algorithms were combined in a voting scheme using the average of probabilities method. To improve the performance of each model, GWO was used to optimize the parameters of each ensemble member (RF, DT, MLP, and KNN).

Assume we have \(\phi\) classifiers A = {A1, A2,…, A\(\phi\)} and l class labels {h1, h2,…, hl}. For the classifiers given above, \(\phi\) = 4, and l = 2 (that is, non-attack and attack) for the datasets analyzed in this work. Each classifier \(A_{j}: Z^{m} \to [0, 1]^{l}\) takes an object \(y \in Z^{m}\) and returns a vector \([J_{A_{j}}(h_{1}|y), \ldots, J_{A_{j}}(h_{l}|y)]\), where \(J_{A_{j}}(h_{i}|y)\) represents the probability assigned by Aj to the hypothesis that object y belongs to class hi. Let ni be the average of the probabilities provided by the different classifiers for class hi:

$$ n_{i} = \frac{1}{\phi }\sum\nolimits_{j = 1}^{\phi } J_{A_{j}} \left( h_{i} | y \right) $$
(28)

Let N denote the collection of mean probabilities for the classes (n1, n2,…, nl). Object y is assigned to the class with the highest mean probability in N, i.e., y is allocated to class g if and only if

$$ n_{g} = \, \max \, N $$
(29)
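Equations (28) and (29) amount to averaging the per-class probabilities and taking the largest mean. A minimal sketch follows; the probability values are made up for illustration, and the function names are hypothetical.

```python
def average_probabilities(prob_vectors):
    """Eq. (28): mean probability per class over all classifiers."""
    phi = len(prob_vectors)
    n_classes = len(prob_vectors[0])
    return [sum(p[i] for p in prob_vectors) / phi for i in range(n_classes)]


def vote(prob_vectors, labels):
    """Eq. (29): pick the class with the highest mean probability."""
    means = average_probabilities(prob_vectors)
    g = max(range(len(means)), key=means.__getitem__)
    return labels[g], means


# Made-up outputs of four classifiers (RF, DT, MLP, KNN) for one record,
# ordered as [P(non-attack), P(attack)].
outputs = [[0.30, 0.70], [0.60, 0.40], [0.20, 0.80], [0.45, 0.55]]
label, means = vote(outputs, ["non-attack", "attack"])
```

In this example one classifier favors the non-attack class, but the averaged probabilities still favor the attack class, which is the behavior that distinguishes averaging of probabilities from relying on any single learner.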

The performance of the proposed ensemble approach is evaluated on two well-known intrusion detection benchmark datasets that are well suited to IoT, namely, BoT-IoT and UNSW-NB15.

3.10 Ensemble learning strategy

Ensemble learning is a powerful technique that combines multiple individual learning algorithms to create a stronger, more accurate predictive model. Voting-based ensembles are a popular approach within ensemble learning. In this research, we averaged the probabilities from multiple models for intrusion detection in the IoT using the BoT-IoT and UNSW-NB15 datasets. The procedure is as follows:

Step 1: Data Preparation We preprocessed the datasets (BoT-IoT and UNSW-NB15) and split each into training and testing subsets, each comprising the features and the corresponding target labels (intrusion or non-intrusion).

Step 2: Individual Learning Algorithms We selected RF, DT, MLP, and KNN as the individual learning algorithms to combine in the ensemble.

Step 3: Train Individual Models We trained each selected algorithm (RF, DT, MLP, and KNN) on the training data from both datasets (BoT-IoT and UNSW-NB15). This gave us a set of trained models, each capable of making intrusion detection predictions.

Step 4: Probability Prediction We used each trained model to make predictions on the testing data. Instead of taking only the final predicted label, we obtained the predicted probability of intrusion (class attack) for each instance.

Step 5: Ensemble Voting For each instance in the testing data, we calculated the average of the predicted probabilities from all the individual models. This average is computed for class 1 (intrusion).

Step 6: Evaluation We evaluated the performance of the voting ensemble models on the testing data using standard metrics such as accuracy, DR, precision, ROC, and FAR. We also compared these results with the performance of the individual models to assess the effectiveness of the ensemble.

3.11 Benefits of the proposed voting-based ensembles model

  • Reduced Bias Combining multiple models can help reduce bias present in any individual model.

  • Improved Generalization Ensembles often perform better on unseen data compared to individual models.

  • Robustness Ensemble methods are more robust against overfitting, especially if the individual models are diverse.

  • Model Diversity Using different learning algorithms ensures that the ensemble captures different aspects of the data.

4 Experimental setup with the software and hardware requirements

The simulations were executed on a laptop with an Intel Core (TM) i5-8250U processor clocked at 1.60 GHz and 8 GB of RAM. To demonstrate the efficacy of the proposed approach, four GWO-optimized models (RF, DT, MLP, and KNN) were combined by averaging their probabilities. The algorithms were used to classify and identify threats and anomalies in the BoT-IoT and UNSW-NB15 datasets. Scikit-learn was used to implement the models.

4.1 Metrics used for performance evaluation

This study evaluated the performance of the proposed system using multiple performance measures, including precision, recall, detection rate (DR), and accuracy (Acc), as well as the time required to create the model. These metrics are defined below in terms of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN).

Detection rate (DR): The DR is the proportion of identified attacks relative to the total number of attack events in the dataset. Equation (30) can be utilized to estimate DR.

$$ {\text{DR }} = \frac{{{\text{TP}}}}{{{\text{TP}} + {\text{FN}}}} $$
(30)

Accuracy is the measure of the classifier's ability to correctly classify an object as normal or as an attack. The accuracy is defined by Eq. (31).

$$ {\text{Accuracy }} = \frac{{{\text{TP}} + {\text{TN}}}}{{{\text{TP}} + {\text{FN}} + {\text{FP}} + {\text{TN}}}} $$
(31)

Precision is the ratio of correctly predicted positives to the total number of predicted positives. It is considered a measure of the classifier's exactness. A low value indicates a high number of FP. The precision is computed using Eq. (32).

$$ {\text{Precision }} = \frac{{{\text{TP}}}}{{{\text{TP}} + {\text{FP}}}} $$
(32)

The recall is calculated by dividing the number of TP by the sum of TP and FN. The recall is regarded as a measure of a classifier's completeness, with a low recall value indicating a large number of FN [65]. Recall is computed using Eq. (33).

$$ {\text{Recall }} = \frac{{{\text{TP}}}}{{{\text{TP}} + {\text{FN}}}} $$
(33)
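Equations (30)–(33) can be checked on a small worked example. The confusion-matrix counts below are invented for illustration and are not results from this study; the function name is hypothetical.

```python
def detection_metrics(tp, tn, fp, fn):
    """Compute DR, accuracy, precision, and recall from confusion-matrix counts."""
    return {
        "DR": tp / (tp + fn),                          # Eq. (30)
        "accuracy": (tp + tn) / (tp + fn + fp + tn),   # Eq. (31)
        "precision": tp / (tp + fp),                   # Eq. (32)
        "recall": tp / (tp + fn),                      # Eq. (33)
    }


# Invented counts: 100 attack records and 900 normal records.
metrics = detection_metrics(tp=90, tn=880, fp=20, fn=10)
```

With these counts, 90 of the 100 attacks are detected and 970 of the 1000 records are classified correctly. DR and recall share the same formula, so they always coincide.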

4.2 Description of the dataset

One of the primary challenges in anomaly detection research is obtaining or generating a suitable dataset for experiments. In this study, we analyzed pre-existing datasets to identify those most appropriate for further exploration. We defined the dataset requirements for identifying anomalies in the IoT using the following six criteria:

C1 The acquisition of the dataset ought to be conducted from the IoT;

C2 It is recommended that the dataset includes anomalies;

C3 The dataset must be appropriately labeled to distinguish between normal and abnormal data;

C4 It is recommended that the dataset utilized in the study closely approximates real-world data, specifically data derived from authentic or partially authentic systems.

C5 It is recommended that the datasets encompass a diverse range of attack scenarios and network conditions. A key criterion was the inclusion of a wide variety of attack types and patterns to ensure a comprehensive evaluation of our intrusion detection system.

C6 The datasets should be accessible and available to the research community. It was important to select datasets that are publicly accessible, well documented, and readily available for replication and validation by other researchers.

The datasets that meet the specified criteria, namely, those that comprise labeled sensors, actuators, and network data, include the recently developed BoT-IoT and the UNSW-NB15 dataset. These datasets were subjected to a comprehensive analysis by the authors. The particulars of each dataset are delineated as follows;

4.2.1 BoT-IoT dataset

The BoT-IoT dataset contains both normal IoT network traffic and a range of attacks. These data were used to test our system. The dataset was chosen because it accurately depicts an IoT ecosystem. DoS, DDoS, data exfiltration, keylogging, service scan, and OS fingerprinting attacks are included in the dataset. The BoT-IoT is available at https://www.unsw.adfa.edu.au/unsw-canberra-cyber/cybersecurity/ADFA-NB15-Datasets/bot_iot.php. All of these data were preprocessed to establish network-level patterns for the varied types of traffic generated by devices and to use these patterns to spot attack behavior in the IoT architecture [51]. Table 6 summarizes the number of benign and attack samples in the collection.

Table 6 Attack and normal behavior statistics from the BoT-IoT dataset

4.2.2 UNSW-NB15 dataset

The researchers in [14] created the UNSW-NB15 dataset at UNSW Canberra. They used the IXIA PerfectStorm tool to create a mix of benign and malicious traffic, yielding a 100 GB dataset in the form of PCAP files, including many novel attributes. The generated data were intended for the development and validation of intrusion detection systems. Nevertheless, the attack activity was generated in a simulated environment. The record distribution of the UNSW-NB15 dataset is specified in Table 7.

Table 7 UNSW-NB15 data records

5 Results and discussion

In this section, we present the detailed findings of the experiments conducted with the proposed framework. The approach was tested on the datasets described above. Sampling without replacement was used to divide each dataset's selected samples into two disjoint subsets for training and testing. The model is fitted on the training subset, and the testing subset is held out so that it provides an estimate of model performance on previously unseen data. For this reason, generating folds for cross-validation, which can be time-consuming with large datasets, was not essential here. Two experiments were conducted to evaluate the efficiency of the presented technique. The following evaluation metrics were derived from the confusion matrix shown in Table 8: precision, accuracy, detection rate, ROC, and FAR. The authors of [66] explain the mathematical computations for these measures.

Table 8 Confusion matrix

Here, TP is the number of actual attacks classified as attacks, TN is the number of normal records classified as normal, FN is the number of attacks classified as normal, and FP is the number of normal records classified as attacks.

5.1 Experimental analysis based on BoT-IoT dataset

The BoT-IoT dataset was used in the first experiment. First, the most informative attributes were identified by computing the IG entropy for every feature and ranking the features in descending order. From the original thirty-one (31) candidate features, ten (10) were selected for the next step. Using IG alone was observed to produce an elevated FAR. To overcome this limitation, a second reduction phase was applied to the selected attributes, using PCA for feature extraction. To avoid bias, the PCA was fitted using only the training set, ensuring that no information from the test data leaked into the training dataset. If the complete dataset were used to construct the principal components, the model would not perform as well when genuinely unseen data are introduced. Similarly, computing the principal components on the two sets independently would produce two mismatched feature spaces, and a classifier built in one feature space cannot be applied in another. The transformation fitted on the training set was therefore used to map the testing dataset into the same feature space using the batch-filtering method. The transformed datasets were used to assess the efficiency of the presented method: five separate classifiers were built on the training data and evaluated on the testing dataset. On the BoT-IoT dataset, Table 9 compares the performance of standard ML models IG + PCA-RF, IG + PCA-DT, IG + PCA-MLP, IG + PCA-KNN, and the proposed voting GWO ensemble model. The results indicate that the voting GWO ensemble model performs the best, with an accuracy of 99.98%, DR of 99.97%, precision of 99.94%, ROC of 99.99%, and FAR of 1.30.

Table 9 The performance of standard ML approaches and the proposed voting ensemble model on BoT-IoT

5.2 Experimental analysis based on UNSW-NB15 dataset

Additional tests on the UNSW-NB15 dataset were carried out to demonstrate the efficiency of the suggested feature dimensionality reduction (IG + PCA) GWO ensemble model. As in the first experiment, IG and PCAs were computed during the preprocessing step of these datasets. In this second experiment, nine (9) candidate features were chosen from UNSW-NB15 by computing the entropy of the IG and, subsequently, the PCA feature extraction. Table 10 shows the best results obtained using the reduction of dimension approaches on the dataset. Our proposed model produces promising classification results, as seen in the result. Table 10 compares the performance of the IG + PCA-RF, IG + PCA-DT, IG + PCA-MLP, IG + PCA-KNN, and the proposed GWO ensemble model on the UNSW-NB15 dataset. The voting GWO ensemble technique outperforms all other approaches, with an accuracy attaining 100%, DR of 99.99%, precision of 99.59%, ROC of 99.40%, and FAR of 1.15.

Table 10 The performance of standard ML techniques and voting ensemble model on the UNSW-NB15

5.3 Multiclass experimental analysis on the BoT-IoT dataset

The first step was to compute the IG entropy for each feature and to arrange the resulting values in descending order to identify the most significant features. Out of the initial set of thirty-one (31) candidate features, a subset of ten (10) features was selected for the subsequent stage. Using IG in isolation was observed to produce an elevated FAR. To address this limitation, a second reduction phase was applied to the selected features, employing PCA as a feature extraction technique. To mitigate bias, the PCA was fitted exclusively on the training dataset to prevent any leakage of information from the test data into the training set.

Table 11 shows the performance of the proposed voting GWO ensemble model on BoT-IoT in a multiclass scenario. The results indicate that the voting GWO ensemble model performed on DDoS HTTP achieved an accuracy of 99.87% and DR of 99.89%, precision of 99.60%, ROC of 99.56%, and FAR of 1.20.

Table 11 Performance of the voting GWO ensemble model relative to the different attack types and benign in terms of DR, accuracy, and training time on the BoT-IoT dataset

5.4 Multiclass experimental analysis on the UNSW-NB15

Further experiments were conducted on the UNSW-NB15 dataset to showcase the effectiveness of the proposed ensemble model, which combines feature dimensionality reduction techniques (IG + PCA) with the GWO. As in the initial experiment, the dataset underwent preprocessing in which IG and PCAs were computed. In this experiment, a total of nine (9) candidate features were selected from the UNSW-NB15 dataset by evaluating the entropy of the information gain (IG) and subsequently applying PCA for feature extraction. Table 12 shows the performance of the proposed voting GWO ensemble model on UNSW-NB15 in a multiclass scenario. The results indicate that, for the reconnaissance class, the voting GWO ensemble model achieved an accuracy of 99.91%, DR of 99.75%, precision of 97.08%, ROC of 98.80%, and FAR of 1.80.

Table 12 Performance of the voting GWO ensemble model relative to the different attack types and benign in terms of accuracy, DR, precision, ROC, and FAR on the UNSW-NB15 dataset

5.5 Evaluation and comparison of current datasets suitability for IoT network

To determine the essential qualities of a valuable and realistic dataset for an IoT network, some of the current IDS datasets were evaluated in this part.

5.5.1 DARPA

This dataset was created for the purpose of analyzing network security. Researchers have criticized DARPA because of problems with the artificial injection of attacks and benign traffic. DARPA covers activities such as sending and receiving mail, surfing the web, sending and receiving files via FTP, using telnet to log into remote systems and carry out work, sending and receiving IRC messages, and remotely monitoring the router using SNMP. The dataset comprises various types of attacks, including but not limited to denial of service (DoS), password guessing, buffer overflow, remote file transfer protocol (FTP), SYN flood, network mapper (Nmap), and rootkit. Regrettably, the dataset does not accurately reflect real-world IoT network traffic and exhibits irregularities such as the absence of false positives. Furthermore, it is too outdated to provide a comprehensive assessment of IDSs for contemporary network infrastructures and attack types. It also lacks actual attack data records [67].

5.5.2 KDD Cup 99

The dataset known as KDD Cup 1999 was derived by analyzing the tcpdump component of the 1998 DARPA dataset. However, it is important to note that the KDD Cup 1999 dataset is not immune to the same issues as its predecessor. The KDD99 dataset encompasses over twenty distinct types of attacks, including but not limited to neptune-dos, pod-dos, smurf-dos, buffer-overflow, rootkit, satan, and teardrop. The amalgamation of network traffic records of both normal and attack traffic within a simulated environment yields a dataset that contains a substantial amount of superfluous records, which are also tainted with data corruption. This, in turn, results in testing outcomes that are biased, as reported in reference [68]. NSL-KDD was developed as a means of addressing certain limitations of the KDD dataset [68], which had been identified in the previous research [67].

5.5.3 CDX

The utilization of network warfare competitions for the creation of contemporary labeled datasets is demonstrated by the CDX dataset. The dataset reveals that attackers have utilized widely recognized attack tools such as Nikto, Nessus, and WebScarab to conduct automated reconnaissance and attacks. Benign network traffic encompasses essential services such as web browsing, email communication, DNS queries, and other necessary functions. According to source [69], CDX has limitations in terms of traffic diversity and volume, although it can still serve as a tool for testing IDS alert rules.

5.5.4 Kyoto

This dataset was generated using honeypots, so no manual labeling or anonymization was involved. However, the dataset's scope is restricted to the attacks that were directed at the honeypots. Compared with earlier datasets, it offers ten additional features, including IDS detection, malware detection, and Ashula detection, which are useful for NIDS evaluation and analysis. Because normal traffic was simulated repeatedly during the attacks, the resulting DNS and mail traffic does not accurately reflect real-world normal traffic, and consequently the dataset contains no false positives. False positives matter in evaluation because minimizing them reduces the number of alerts [70].

5.5.5 Twente

To generate the dataset, three distinct services, namely, OpenSSH, Apache web server, and Proftp utilizing auth/ident on port 113, were deployed to gather information from a honeypot network via netflow. Certain types of traffic, including auth/ident, ICMP, and irc traffic, may produce side effects that are neither entirely benign nor malicious. In addition, the dataset includes alert traffic that is both unidentified and lacking correlations. The labeled dataset under consideration is deemed more realistic; however, its deficiency in terms of the volume and variety of attacks is a conspicuous limitation as noted in reference [71].

5.5.6 ISCX2012

The authors of this dataset proposed a dynamic approach for producing realistic and useful IDS evaluation datasets, and the dataset was generated using this approach. Their methodology is divided into two components, the alpha and beta profiles. The alpha profile executes multistage attack scenarios to generate the anomalous segment of the dataset. The beta profile, a benign traffic generator, produces realistic network traffic with background noise. Real traces are used to construct profiles that simulate realistic traffic for protocols such as HTTP, SMTP, SSH, IMAP, POP3, and FTP. The resulting dataset comprises network traces that include complete packet payloads and the relevant profiles. Nevertheless, the dataset does not cover newer network protocols: approximately 70% of contemporary network traffic is HTTPS, and the dataset contains no HTTPS traces. Furthermore, the distribution of the simulated attacks is not based on real-world statistics [72]. Table 13 shows some popular realistic datasets for IoT networks.

Table 13 A comparative analysis of the datasets currently accessible for detecting attacks in IoT

As can be seen, only the datasets used in this study meet all criteria. Tables 13 and 14 list and explain the datasets' flaws and strengths based on relevant documents and research, as well as their suitability for IoT networks. Some feature values are not presented because of inadequate documentation and a lack of metadata. We evaluated the proposed model using two well-known datasets, UNSW-NB15 and BoT-IoT. In contrast with the datasets used in several existing models, which neither reflect contemporary attacks on IoT networks nor adhere to IoT protocol requirements, the chosen datasets are appropriate and realistic for IoT network traffic.

Table 14 Summary of representative (realistic) and non-representative (non-realistic) datasets for IoT

6 Discussion of findings

6.1 Comparison with the existing studies

In this section, we compare the performance of the proposed GWO ensemble model with existing state-of-the-art models in Table 15. The majority of the state-of-the-art models concentrated on the NSL-KDD and KDD Cup 99 datasets. These are unrealistic intrusion detection datasets for evaluating IoT systems, so models trained and evaluated on them fail in practical use because the data are non-representative. On the other hand, several existing techniques address these issues but provide low accuracy, DR, precision, and ROC together with a high FAR, which prevents them from being implemented in commercial systems. It is also worth mentioning that the existing state-of-the-art models paid no attention to feature dimensionality, even though dimensionality reduction is regarded as the most crucial stage and is particularly time- and labor-intensive. This paper addresses that stage by proposing a hybridized IG + PCA for dimensionality reduction and provides a novel GWO ensemble model for classification. Additionally, the proposed ensemble model was evaluated on the realistic BoT-IoT and UNSW-NB15 datasets, which makes it suitable for commercial and industrial applications. As shown in Fig. 3, the best state-of-the-art model reports 100% accuracy on the BoT-IoT data, but its ROC and F-measure were not reported. On the same BoT-IoT data, the proposed voting GWO ensemble model achieved an accuracy of 99.98%, DR of 99.97%, precision of 99.94%, ROC of 99.99%, and FAR of 1.30.
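To make the compared metrics concrete, the following minimal sketch computes accuracy, DR, precision, and FAR from binary confusion-matrix counts. It assumes the standard definitions (DR as the true positive rate, FAR as the false positive rate), since this section does not restate them; the function name and the counts are hypothetical and are not taken from the experiments.

```python
def detection_metrics(tp, fp, tn, fn):
    """Return common IDS metrics as fractions in [0, 1].

    tp/fn: attack records flagged / missed.
    fp/tn: normal records flagged / passed.
    Assumes every denominator is non-zero.
    """
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "detection_rate": tp / (tp + fn),    # share of attacks that were caught
        "precision": tp / (tp + fp),         # share of alerts that were real attacks
        "false_alarm_rate": fp / (fp + tn),  # share of normal traffic that raised an alert
    }


# Hypothetical counts chosen only for illustration.
m = detection_metrics(tp=90, fp=5, tn=95, fn=10)
for name, value in m.items():
    print(f"{name}: {100 * value:.2f}%")
```

A high accuracy alone can hide a poor FAR when normal traffic dominates the data, which is why the comparison reports the metrics side by side.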

Table 15 Comparison with the state-of-the-art models
Fig. 3 Comparison of the proposed models with the existing models

6.2 Computational compatibility across IoT devices

When designing a machine learning model for intrusion detection in IoT environments, it is important to consider the computational compatibility of the proposed model, especially given the heterogeneity in computational power among IoT devices. A model that works well on high-power devices might struggle or be impractical to implement on resource-constrained IoT devices. Consider a scenario in which the proposed model is deployed for real-time anomaly detection in a smart city environment that uses various types of IoT devices, ranging from resource-constrained sensors to more powerful edge devices. In this scenario, the lightweight nature of our voting GWO ensemble model enables seamless integration across these devices. Resource-intensive tasks are offloaded to devices with higher computational power, while less resource-intensive tasks are managed by lower-powered devices. Our model's architecture is designed to dynamically adjust its computational requirements based on the available resources, ensuring effective and efficient operation across the heterogeneous IoT landscape.
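The offloading idea described above can be illustrated with a simple greedy placement. This is only a sketch under stated assumptions: the paper does not specify a placement algorithm, and the device names, task names, and capacity units below are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Device:
    name: str
    free_cpu: float  # hypothetical capacity units


@dataclass
class Task:
    name: str
    cpu_cost: float  # same hypothetical units


def place_tasks(tasks, devices):
    """Greedy placement: heaviest task first, onto the device with most spare capacity."""
    remaining = {d.name: d.free_cpu for d in devices}
    placement = {}
    for task in sorted(tasks, key=lambda t: t.cpu_cost, reverse=True):
        best = max(remaining, key=remaining.get)
        if remaining[best] < task.cpu_cost:
            raise ValueError(f"no device can host task {task.name!r}")
        placement[task.name] = best
        remaining[best] -= task.cpu_cost
    return placement


devices = [Device("sensor_node", 1.0), Device("edge_gateway", 8.0)]
tasks = [
    Task("feature_scaling", 0.5),
    Task("pca_projection", 2.5),
    Task("ensemble_inference", 5.0),
]
placement = place_tasks(tasks, devices)
print(placement)
```

With these illustrative numbers, the two heavier stages land on the edge gateway and the light preprocessing step stays on the sensor node. A real deployment would also need to account for memory, energy, and network latency.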

6.3 Transferability of the proposed research to real-world IoT applications

Our research is designed with a strong focus on practical applicability in real-world IoT environments. Here are key points highlighting the transferability of our research to real-world IoT applications:

  a. IoT-Centric Approach: We developed our intrusion detection system with a deep understanding of the unique characteristics and challenges of IoT networks. This approach ensures that our research is directly relevant to the specific requirements and constraints of IoT applications.

  b. Dataset Selection: We utilized datasets, such as BoT-IoT and UNSW-NB15, that are representative of real-world IoT network traffic and intrusions. This dataset selection ensures that our research is grounded in the realities of IoT security.

  c. Hybrid Approach: Our research combines feature extraction via principal component analysis (PCA), feature selection via IG, and GWO-based ensemble models. This hybrid approach is designed to enhance the robustness and effectiveness of intrusion detection in real-world IoT scenarios.

  d. Generalization: We conducted experiments and evaluations on multiple datasets to ensure the generalizability of our proposed model to diverse IoT applications. Our research demonstrates the adaptability and transferability of our approach across various IoT contexts.

  e. Performance Metrics: We evaluated our intrusion detection system using well-established performance metrics, such as accuracy, DR, precision, and FAR. These metrics reflect the real-world effectiveness of our approach in identifying and mitigating security threats.

  f. Scalability: We addressed the scalability challenges often encountered in IoT environments, ensuring that our research can handle growing numbers of devices and data volumes while maintaining effectiveness.

  g. Practical Deployment Considerations: We discussed the practical considerations of deploying our intrusion detection system in real-world IoT applications, including the optimization of model parameters and the importance of network segmentation.

  h. Security Challenges: Our research explicitly addresses a range of security challenges and threats in IoT environments, making it directly applicable to scenarios where IoT security is a concern.

This research is built on a foundation that prioritizes real-world relevance and practicality. We have conducted experiments and evaluations that demonstrate the effectiveness and transferability of our IDS to various IoT applications. By addressing the unique challenges of IoT security and employing a hybrid approach that combines feature extraction, feature selection, and optimization techniques, we aim to provide a solution that can be readily applied in real-world IoT environments.
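As an illustration of the feature-selection step in the hybrid approach, the sketch below ranks discrete features by information gain (IG), computed as the reduction in label entropy after splitting on a feature. It is a minimal example under stated assumptions: the feature names and toy records are hypothetical, continuous features would first need discretization, and the subsequent PCA projection and the exact way the two steps are combined in the proposed model are not shown.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """IG = H(labels) - H(labels | feature) for a discrete feature."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


# Toy records (hypothetical): 1 = attack, 0 = normal.
labels = [1, 1, 0, 0]
features = {
    "protocol": ["tcp", "tcp", "udp", "udp"],  # separates the classes perfectly
    "flag": ["a", "b", "a", "b"],              # carries no class information
}

# Rank features from most to least informative.
ranked = sorted(features, key=lambda f: information_gain(features[f], labels), reverse=True)
print(ranked)
```

Features with an IG near zero are candidates for removal before the remaining features are passed to PCA.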

6.4 Threats to validity

The main threat to validity is random sampling, which makes it difficult to reproduce the exact experiment. To validate the reliability of the proposed approach, the experiments were repeated on two separate realistic IoT datasets with substantial sample sizes. Finally, while the presented approach performed well in binary classification, it deserves additional investigation on multi-class classification problems.
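One common way to reduce the reproducibility issue caused by random sampling is to fix and report the random seed used for splitting the data. The sketch below shows the idea with a seeded train/test split; the function name, split fraction, and seed value are hypothetical and are not the settings used in the experiments.

```python
import random


def split_indices(n_samples, test_fraction, seed):
    """Shuffle sample indices with a dedicated seeded generator and split them."""
    rng = random.Random(seed)  # local generator, so global random state is untouched
    indices = list(range(n_samples))
    rng.shuffle(indices)
    n_test = int(n_samples * test_fraction)
    return indices[n_test:], indices[:n_test]  # (train, test)


# The same seed always yields the same partition.
train_a, test_a = split_indices(100, 0.2, seed=42)
train_b, test_b = split_indices(100, 0.2, seed=42)
print(test_a == test_b)
```

Repeating the experiment over several reported seeds and averaging the results would additionally show how sensitive the metrics are to the sampling.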

7 Conclusion and future work

This paper proposes a novel voting GWO ensemble learning model for detecting attacks in an IoT environment. The proposed system detects various forms of IoT threats by leveraging the feature set retrieved from the IoT ecosystem. The strengths of this paper are the voting GWO ensemble model, which is the first of its kind, the hybridization of IG + PCA for dimensionality reduction, and the use of realistic datasets that reflect real-time attacks in the IoT context. To construct a successful ensemble IDS for detecting IoT attacks, a collection of relevant features was selected. The experimental findings indicate that the voting GWO ensemble model in the proposed framework increases detection accuracy when using the average probability technique. On the UNSW-NB15 dataset, the proposed voting ensemble model outperforms the other ML and DL approaches from earlier studies, attaining an accuracy of 100%, DR of 99.99%, precision of 99.59%, ROC of 99.40%, and FAR of 1.15. This indicates that the presented method will be beneficial in designing contemporary IDS for the IoT environment. In future work, the proposed model will be extended to multi-class classification problems, and a deep learning model for classifying additional forms of attacks may also be considered.
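The average probability technique mentioned above is commonly known as soft voting: each base classifier outputs a probability that a record is an attack, and the ensemble averages these probabilities before thresholding. The following is a minimal sketch with equal weights; the function name, the 0.5 threshold, and the example probabilities are assumptions, and the GWO component of the proposed model is not represented here.

```python
def soft_vote(attack_probabilities, threshold=0.5):
    """Average per-classifier attack probabilities and threshold the mean.

    Returns (mean_probability, label), where label 1 = attack and 0 = normal.
    """
    if not attack_probabilities:
        raise ValueError("at least one classifier output is required")
    mean = sum(attack_probabilities) / len(attack_probabilities)
    return mean, int(mean >= threshold)


# Hypothetical outputs of three base classifiers for one network record.
mean, label = soft_vote([0.9, 0.6, 0.3])
print(f"mean attack probability: {mean:.2f}, predicted label: {label}")
```

In this example a majority (hard) vote and the averaged probability agree, but soft voting can differ from hard voting when one confident classifier outweighs two uncertain ones.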