Article

Analysis of Multimodal Sensor Systems for Identifying Basic Walking Activities

by John C. Mitchell 1,*, Abbas A. Dehghani-Sanij 1, Sheng Q. Xie 2 and Rory J. O’Connor 3,4
1 School of Mechanical Engineering, University of Leeds, Leeds LS2 9JT, UK
2 School of Electronic and Electrical Engineering, University of Leeds, Leeds LS2 9JT, UK
3 Academic Department of Rehabilitation Medicine, University of Leeds, Leeds LS1 3EX, UK
4 NIHR Devices for Dignity, Sheffield Teaching Hospitals NHS Trust, Sheffield S10 2JF, UK
* Author to whom correspondence should be addressed.
Technologies 2025, 13(4), 152; https://doi.org/10.3390/technologies13040152
Submission received: 24 February 2025 / Revised: 23 March 2025 / Accepted: 1 April 2025 / Published: 10 April 2025

Abstract

Falls are a major health issue globally and the second leading cause of unintentional death worldwide. To address this issue, many studies aim to remotely monitor gait to prevent falls. However, the activity data collected in these studies must be labelled with the appropriate environmental context through Human Activity Recognition (HAR). Multimodal HAR datasets often achieve high accuracies at the cost of cumbersome sensor systems, creating a need for these datasets to be analysed to identify the sensor types and locations that enable high-accuracy HAR. This paper analyses four datasets, USC-HAD, HuGaDB, Camargo et al.’s dataset, and CSL-SHARE, to find optimal models, methods, and sensors across multiple datasets. Regarding window size, optimal windows are found to depend on the sensor modality of a dataset but mostly occur in the 2–5 s range. Support Vector Machines (SVMs) and Artificial Neural Networks (ANNs) are found to be the highest-performing models overall. ANNs are further used to create models trained on the features from individual sensors of each dataset. From this analysis, Inertial Measurement Units (IMUs) and three-axis goniometers are shown to be individually capable of high classification accuracy, whereas Electromyography (EMG) sensors exhibit inconsistent and reduced accuracies. Finally, the thigh is shown to be the optimal location for IMU sensors, with accuracy decreasing as IMUs are placed further down the leg, away from the thigh.

1. Introduction

Falling is a significant health issue in society. The World Health Organisation (WHO) estimates that each year 37.3 million falls require medical attention, while 684,000 falls are fatal [1], making falls the second leading cause of unintentional death worldwide. Among people who fall, certain groups are at a higher risk due to cognitive or physical impairments, which can be attributed to factors including age [1,2], recent surgery [3], or conditions such as Parkinson’s disease [4], dementia [5], stroke [6], multiple sclerosis [7], and amputation [8].
Many technological developments in recent years have led to an increased capability for monitoring gait in people at a high risk of falling, such as the widespread adoption of smartphones and smartwatches containing sensors, the Internet of Things (IoT) and body sensor networks, and improvements in wearable sensors. With these advances, many studies aim to automate the process of gait analysis by collecting real-time data from wearable sensors during tasks such as level-ground walking, navigating ramps, or ascending and descending stairs [9]. The data from these sensors can be analysed to aid healthcare professionals in diagnosing conditions affecting gait [10], performing gait analysis [11], or for use in detecting fall events so that the severity of future falls can be reduced [9,12,13].
However, to enable remote, real-time gait analysis, the context from which the data are extracted must be provided to the specialist who is reviewing the data. Typically, this context is obtained through the process of Human Activity Recognition (HAR), where classification methods are used to determine walking activity in real time from the collected data [9,14]. As many of these classification methods are supervised [9,14,15,16,17], a training dataset is required to build models capable of identifying activities with high accuracy. Past studies have created such datasets with a wide array of sensors, pre-processing techniques, classification methods, and validation methods, resulting in difficulty determining the most important factors that contribute towards obtaining high accuracy when designing novel sensor systems [9,14,18].
In the literature, Human Activity Recognition (HAR) studies can be separated into two categories: those that prioritise convenience, typically making use of a smartphone or smartwatch [9,19], and those that prioritise accuracy by implementing a multimodal sensor system, which can be cumbersome to wear [9,20,21]. In addition to their potential for accuracy, multimodal systems typically collect more appropriate quantities of data for remote gait analysis, as a body sensor network allows data to be gathered from multiple areas of interest [22].
Existing studies on finding the optimal sliding window parameters for HAR have demonstrated a range of results in different contexts. Banos et al. [23] studied the effect of window size on classification performance for a single dataset featuring accelerometers placed on each thigh, shank, upper arm, and forearm, and on the back [24]. This work highlights the need for a balance between high accuracy and rapid decision times and finds that larger window sizes do not correlate to increased classification performance, with the optimal window sizes occurring below 2 s using Decision Trees (DTs), K-Nearest Neighbors (KNN), naïve Bayes, and a nearest-centroid classifier. Similarly, Niazi et al. [25] analysed the co-dependency of window size and sample rate to determine what parameters enable the highest classification accuracy using Random Forests (RFs) and a single hip-worn accelerometer. This study found that window sizes of 2–10 s were optimal, contrasting the results of Banos et al. [23]. Both of these studies highlight that future work is needed to consider additional technologies and sensor types. Li et al. [26] discuss the difficulty of determining an optimal window size for a given application, instead choosing to use different window sizes for each activity based on the temporal properties of that activity, which increases classification performance. Finally, Dehghani et al. [27] considered the effects of using overlapping sliding windows against non-overlapping sliding windows with both subject-dependent and subject-independent cross-validation on HAR performance using data collected using inertial sensors with DTs, KNN, naïve Bayes, and a nearest-centroid classifier. This study found that performance across all classifiers was reduced when using subject-independent cross-validation and that, under this condition, the use of overlapping sliding windows did not improve the performance of the models when compared to non-overlapping windows [27].
Regarding sensor placement, Duan et al. [28] placed seven accelerometers on the upper arm, wrists, thighs, and chest to determine how sensor location affected classification accuracy. This study found that sensors placed on the subjects’ dominant side, the right side in all cases for this study, exhibited increased performance, with the right wrist being the highest-performing sensor location when used alone. Furthermore, this study evaluated the use of RF models alongside deep learning techniques such as convolutional neural networks, transformers, and long short-term memory models. Kulchyk et al. [29] analysed the performance of sensors positioned on the sternum, left thigh, right ankle, and right shoulder using a convolutional neural network for both subject-dependent and subject-independent cross-validation. This study found the right ankle to be the optimal sensor location, with multiple pairs of sensors including the ankle sensor resulting in 100% classification accuracy [29]. Finally, Khan et al. [30] placed five sensor nodes consisting of accelerometers and gyroscopes on each forearm, the waist, and each ankle and performed HAR using simple logistic regression, naïve Bayes, and sequential minimal optimisation classifiers. The study found that individual sensor performance was dependent on activity type, with sensors on the chest and thigh being optimal for stationary tasks, whilst sensors on the thigh, lower back, and ankle performed better at movement tasks [30]. Many studies that consider sensor placement for HAR consider only accelerometers or Inertial Measurement Units (IMUs) [28,29,30,31,32], leaving much room for sensor position analysis using additional technologies which can capture motion data.
Overall, these studies highlight a gap in the literature for multi-dataset studies which aim to identify trends in both optimal window size and optimal sensor placement across multiple datasets and with additional motion-related technologies and sensors. As stated by Banos et al. [23], such studies form a guideline for future researchers faced with determining sensor locations and sliding window parameters, and they contribute towards a knowledge database of the interactions between analytical parameters, sensors, and classifiers in HAR, so that researchers and system designers can avoid performing lengthy brute-force searches across high-dimensional search spaces for individual applications of HAR.
The contributions of this study, therefore, are to identify these optimal analytical methods, sensor placements, and sensor types which will contribute towards existing knowledge of HAR classification co-dependencies such as window size, sensor type, and sensor location. This novel approach using a normalised cross-comparison of different datasets by controlling variables such as the number of participants, activity types, the sample rate, and window size for the sliding window technique creates a robust analysis that can identify trends with increased generalisability when compared with the current state-of-the-art. Therefore, the results of this study will offer reliable insights into the performance capabilities of individual sensor types and how these differ based on their locations on the body. The results of this analysis will help future researchers effectively design more lightweight sensor systems which decrease the computational burden of HAR while maintaining high levels of accuracy, comfort, and convenience.

2. Materials and Methods

Four datasets were selected for this study which feature a wide variety of sensor systems, an appropriate number of participants for sufficient model generalisation, and walking activities comparable between datasets. A description of each dataset along with the reasons it was chosen for this analysis follows.

2.1. Dataset 1: USC-HAD

The USC-HAD dataset [33] was published in 2012 and features 14 participants with a mean (standard deviation; std) age, height, and weight of 30.1 (std: 7.2) years, 170 (std: 6.8) cm, and 64.6 (std: 12.1) kg, respectively. Each subject was equipped with a single ‘MotionNode’ IMU containing a 3-axis accelerometer, gyroscope, and magnetometer, totalling 9 data channels. The IMU was mounted to the participants’ anterior right hip in a pouch designed for mobile phones. Data were recorded using a laptop which was held under the arm, pressed to the waist by the subject and connected to the IMU via a cable.
The USC-HAD dataset features 12 activities which were performed at the participants’ own pace [33]. These activities were walking forwards, left, and right, walking upstairs and downstairs, running, jumping, sitting, standing, sleeping, and going up and down in a lift.
USC-HAD was chosen because this dataset has been widely explored in the literature since its publication [15,16,34]. Therefore, this dataset acts as a control for the newer datasets to validate the chosen methods and models.

2.2. Dataset 2: HuGaDB

The HuGaDB dataset [35] was published in 2017 and features 18 participants with a mean age, height, and weight of 23.67 (std: 3.69) years, 179.06 (std: 9.85) cm, and 73.44 (std: 16.67) kg, respectively. The sensor system worn by each participant consisted of IMU sensors placed at the thigh, shank, and foot and an Electromyography (EMG) sensor placed on the vastus lateralis, each of which was sampled at around 60 Hz. This setup was mirrored on each leg, for a total of six IMUs and two EMG sensors.
Participants were asked to perform the following 12 activities at a usual pace: walking, running, navigating stairs, sitting (stationary), sitting down, standing up, standing (stationary), cycling, going up and down in a lift, and sitting in a car [35].

2.3. Dataset 3: Camargo et al.

Camargo et al. [36] created an open-source dataset for the study of lower-limb biomechanics in 2021, featuring 22 healthy participants with a mean age, height, and weight of 21 (std: 3.4) years, 170 (std: 7.0) cm, and 68.3 (std: 10.83) kg, respectively. Subjects were equipped with 11 EMG sensors, 3 goniometers, and 4 six-axis IMUs on their right side only. Sensor locations and sample rates can be found in Table 1.
Whilst participants performed only six basic activities, the transition states were also labelled, raising the activity count to 19 [36]. With the ‘idle’ class removed, as no activities were performed during it, 18 walking activities remained, consisting of the core activities and the transitions between them. These core activities were ramp ascent, ramp descent, stair ascent, stair descent, standing, turning, and walking.

2.4. Dataset 4: CSL-SHARE

CSL-SHARE is a dataset published in 2021 for the purpose of exploring activity recognition for common sport-related movements [37]. The sensor system is a multimodal, knee-mounted system featuring two six-axis IMUs placed on the thigh and shank, four EMG sensors placed on the vastus medialis, tibialis anterior, biceps femoris, and gastrocnemius, a goniometer placed on the lateral knee, and an airborne microphone. Like the Camargo et al. dataset, these sensors were placed on the right leg only. The CSL-SHARE dataset features 22 activities and was upsampled to 1000 Hz due to differing sample rates across the various sensors [37].

2.5. Summary of Datasets

The datasets chosen for this study cover a variety of environments, activities, and sensor configurations. Analysis of the datasets with the same Machine Learning (ML) models and pre-processing methods will provide insight into how sensor configuration and type affect classification accuracy in HAR. A comparison of these datasets can be found in Table 2.

2.6. Dataset Preprocessing

2.6.1. Normalisation Between Datasets

As this study focuses on the sensor types in the HAR datasets, steps were taken to remove the variations between datasets. Of the variables in Table 2, participant numbers, activity types, and sample rates were normalised. To achieve this, the number of participants in each dataset was limited to the minimum number available across all datasets, which was 14, with additional participants being excluded from the datasets where appropriate to maintain a fair comparison between the datasets. For example, in CSL-SHARE, participants 2, 11, and 16 contained different data due to varying protocol versions, device communication issues, and a participant stopping early due to knee pain. As such, these participants were removed, before cropping the number of participants down to 14. Of the activities included in the chosen datasets, only walking, standing, stair ascent, and stair descent were common across all datasets and are activities of interest with respect to fall-related research [38,39]. Therefore, the additional activities were removed from each dataset. Finally, 100 Hz was chosen as the common sample rate, resulting in the sample rate for the Camargo et al. and CSL-SHARE datasets being subsampled to 100 Hz, whilst HuGaDB was interpolated up to 300 Hz with 5th-order polynomial interpolation, before being subsampled to 100 Hz.
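As an illustration of this resampling step, the sketch below brings a single uniformly sampled channel to the common 100 Hz rate using SciPy. The function name and defaults are ours, and the single-stage 5th-order spline interpolation shown here stands in for the two-stage upsample-then-subsample procedure described above; it is a sketch, not the exact implementation used in this study.

```python
# Minimal resampling sketch, assuming a uniformly sampled 1-D NumPy signal.
import numpy as np
from scipy.interpolate import interp1d

def resample_signal(signal: np.ndarray, fs_in: float, fs_out: float = 100.0) -> np.ndarray:
    """Resample a 1-D signal from fs_in to fs_out using 5th-order spline interpolation."""
    duration = (len(signal) - 1) / fs_in
    t_in = np.linspace(0.0, duration, num=len(signal))   # original sample times
    t_out = np.arange(0.0, duration, 1.0 / fs_out)       # target sample times at fs_out
    interpolator = interp1d(t_in, signal, kind=5)        # 5th-order spline interpolation
    return interpolator(t_out)

# Example: bring a ~60 Hz HuGaDB channel to the common 100 Hz rate.
# resampled = resample_signal(raw_channel, fs_in=60.0, fs_out=100.0)
```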

2.6.2. Filtering

Before the data could be presented to the Machine Learning models, a series of pre-processing steps was performed. This process began with a 4th-order low-pass Butterworth filter with a cut-off frequency of 7 Hz, applied before windowing and feature extraction. This cut-off frequency was chosen through testing and lies close to the 10 Hz value typical for analyses using inertial sensors [19].
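The filtering step can be sketched with SciPy as follows; the zero-phase (forward-backward) application and the 100 Hz default sample rate are assumptions for illustration rather than confirmed details of the study's implementation.

```python
# Sketch of the 4th-order, 7 Hz low-pass Butterworth filter described above.
import numpy as np
from scipy.signal import butter, filtfilt

def lowpass_filter(data: np.ndarray, fs: float = 100.0, cutoff: float = 7.0, order: int = 4) -> np.ndarray:
    """Apply a Butterworth low-pass filter to each column (channel) of `data`."""
    b, a = butter(order, cutoff, btype="low", fs=fs)  # design the filter
    return filtfilt(b, a, data, axis=0)               # zero-phase filtering along time axis
```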

2.7. Feature Extraction

As is typical when performing classification with time-series data, semi-overlapping sliding windows are used to extract statistical features such that a single sample represents a larger time window of raw data. The size of these windows and the amount of overlap vary between studies, with smaller window sizes being preferable for real-time classification, whilst larger window sizes capture more of the gait cycle per sample, which may result in higher classification accuracies. For this study, a search was performed to identify trends in accuracy across window sizes from 1 s to 10 s, with a 75% window overlap for each window size. This fixed overlap was chosen to combine the co-dependent sliding window parameters and reduce computation times.
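A minimal sketch of this windowing follows, assuming the data are held in a NumPy array of shape (samples, channels); the function name and parameter defaults are illustrative.

```python
# Semi-overlapping sliding windows with 75% overlap (step = 25% of the window).
import numpy as np

def sliding_windows(data: np.ndarray, fs: int = 100, window_s: float = 2.0, overlap: float = 0.75):
    """Yield windows of shape (window_samples, n_channels) over the time axis."""
    window_samples = int(window_s * fs)
    step = int(window_samples * (1.0 - overlap))  # 75% overlap -> advance by 25% of the window
    for start in range(0, len(data) - window_samples + 1, step):
        yield data[start:start + window_samples]
```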
For each window of the time-series data, a wide array of statistical features were extracted to enable the ML models to make accurate predictions. There is little consensus on which features are necessary for accurate HAR, with many studies considering a mean of 15 features [15,40,41,42,43,44,45,46]. This analysis included 22 features from each sensor, including commonly chosen features from existing research [15,42,43,44,45,47]. Most of these features were extracted from the raw data in the time domain, with Fourier transforms being used to obtain additional features from the frequency domain. Feature selection methods were then used to eliminate noisy features before classification. This combination of an increased number of features with appropriate feature selection techniques ensured that relevant data from each sensor were present to allow a sensor-focussed analysis. The list of included features is as follows, with a brief extraction sketch after the list:
  • Maximum value.
  • Minimum value.
  • Mean.
  • Median.
  • Standard deviation.
  • Mean absolute deviation.
  • Median absolute deviation.
  • Number of zero crossings.
  • Root mean square.
  • Maximum gradient.
  • Kurtosis.
  • Skewness.
  • Variance.
  • Interquartile range.
  • Entropy.
  • Energy.
  • Maximum frequency amplitude.
  • Mean frequency amplitude.
  • Maximum power spectral density.
  • Mean power spectral density.
  • Frequency kurtosis.
  • Frequency skewness.
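The sketch below illustrates how several of the listed features could be computed for one window of a single channel; the remaining features follow the same pattern, and the exact implementations used in this study may differ.

```python
# Illustrative feature extraction for one window of one sensor channel.
import numpy as np
from scipy.stats import kurtosis, skew, iqr

def window_features(x: np.ndarray, fs: float = 100.0) -> dict:
    """Compute a subset of the 22 statistical features for a 1-D window `x`."""
    spectrum = np.abs(np.fft.rfft(x))  # magnitude spectrum for frequency-domain features
    return {
        "max": float(np.max(x)),
        "min": float(np.min(x)),
        "mean": float(np.mean(x)),
        "std": float(np.std(x)),
        "rms": float(np.sqrt(np.mean(x ** 2))),
        # number of sign changes between consecutive samples
        "zero_crossings": int(np.sum(np.signbit(x[:-1]) != np.signbit(x[1:]))),
        "kurtosis": float(kurtosis(x)),
        "skewness": float(skew(x)),
        "iqr": float(iqr(x)),
        "max_freq_amplitude": float(np.max(spectrum[1:])),   # exclude the DC component
        "mean_freq_amplitude": float(np.mean(spectrum[1:])),
    }
```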
After feature extraction, the data were split into train and test data by leaving out the data from a single subject. Scikit-Learn’s ‘MinMaxScaler’ function was then fit to the train set and applied separately to the train and test sets to scale each feature between 0 and 1. Principal Component Analysis (PCA) was performed to reduce the number of features. As with the scaler, the PCA was fit to the train set and applied separately to the train and test sets. The number of selected principal components varied for each dataset, due to the different feature sets resulting from each sensor configuration, but was controlled by choosing the minimum number required to retain 95% of the variance of the full feature set. Finally, another round of scaling was performed to prepare the data for the Machine Learning algorithms.
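A sketch of this scaling and PCA pipeline with Scikit-Learn is shown below; X_train and X_test are assumed to be the feature matrices produced by the previous step, and fitting on the training split only mirrors the procedure described above.

```python
# Per-split min-max scaling, PCA retaining 95% of the variance, then a second scaling.
from sklearn.preprocessing import MinMaxScaler
from sklearn.decomposition import PCA

def scale_and_reduce(X_train, X_test, variance: float = 0.95):
    """Fit scaler and PCA on the training data only and transform both splits."""
    scaler = MinMaxScaler().fit(X_train)                      # fit on the train split only
    X_train_s, X_test_s = scaler.transform(X_train), scaler.transform(X_test)

    pca = PCA(n_components=variance).fit(X_train_s)           # minimum components for 95% variance
    X_train_p, X_test_p = pca.transform(X_train_s), pca.transform(X_test_s)

    scaler2 = MinMaxScaler().fit(X_train_p)                   # second round of scaling
    return scaler2.transform(X_train_p), scaler2.transform(X_test_p)
```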

2.8. Cross-Validation and Test Data

Two methods of cross-validation and testing are prevalent in the literature for gait- and fall-related studies: subject-dependent analysis using Train-Test Split (TTS) cross-validation and subject-independent analysis using Leave-One-Subject-Out (LOSO) cross-validation [27,48]. TTS cross-validation uses a set percentage of the total data from all subjects as test and validation data, whilst LOSO leaves out the data from a specific subject. Each of these methods of cross-validation offers differing advantages and disadvantages, with TTS creating models with higher accuracies at the cost of poor generalisation, whilst LOSO typically creates models with lower accuracies that perform better with data from new subjects. For this study, both TTS and LOSO cross-validations are used to make the results applicable to both types of devices and to be more comparable with existing and future studies.
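For reference, LOSO splitting as described here maps directly onto Scikit-Learn's LeaveOneGroupOut; the per-window array of subject identifiers is an assumed input, and the helper below is a sketch rather than the study's exact code.

```python
# Leave-One-Subject-Out splits using subject identifiers as the grouping variable.
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut

def loso_splits(X: np.ndarray, y: np.ndarray, subject_ids: np.ndarray):
    """Yield (train_idx, test_idx) pairs, each leaving out one subject's windows."""
    logo = LeaveOneGroupOut()
    for train_idx, test_idx in logo.split(X, y, groups=subject_ids):
        yield train_idx, test_idx
```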

2.9. Models

For classification, the KNN, Support Vector Machine (SVM), DT, RF, and Artificial Neural Network (ANN) models, an ensemble voting classifier, and an ensemble stacking classifier were chosen due to their prevalence in the literature. Ensemble models were constructed from each of the individual models (KNN, SVM, DT, RF, and ANN), with either a voting or a logistic regression classifier fusing the decisions. This inclusion of a variety of ML models reduced variations in classifier performance that could be introduced due to the various properties of each model, such as how prone they are to overfitting and how dataset size affects their classification performance.
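A sketch of how the two ensembles could be assembled from the five base classifiers in Scikit-Learn follows; the base models are shown with near-default hyperparameters, which were in practice tuned as described below, so the configuration here is illustrative only.

```python
# Voting and stacking ensembles built from the five base classifiers.
from sklearn.ensemble import VotingClassifier, StackingClassifier, RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.linear_model import LogisticRegression

base_models = [
    ("knn", KNeighborsClassifier()),
    ("svm", SVC(probability=True)),   # probability outputs allow probability-based fusion
    ("dt", DecisionTreeClassifier()),
    ("rf", RandomForestClassifier()),
    ("ann", MLPClassifier(max_iter=1000)),
]

voting_ensemble = VotingClassifier(estimators=base_models, voting="hard")
stacking_ensemble = StackingClassifier(estimators=base_models,
                                       final_estimator=LogisticRegression())
```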
Hyperparameter tuning was performed using 25 iterations of the Scikit-Optimize Bayesian hyperparameter search. All models were trained on a computer with 32 GB of RAM, a 12th Generation Intel i9-12900K processor, and a 12 GB Nvidia RTX 3060 GPU using the Scikit-Learn library for Python version 3.9.18.
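The tuning step can be illustrated with scikit-optimize's BayesSearchCV, mirroring the 25-iteration search described above; the SVM search space shown here is an assumption for illustration, not the exact space used in the study.

```python
# 25-iteration Bayesian hyperparameter search over an SVM with scikit-optimize.
from skopt import BayesSearchCV
from skopt.space import Real, Categorical
from sklearn.svm import SVC

def tune_svm(X_train, y_train):
    """Return the best SVM found by a 25-iteration Bayesian search."""
    search = BayesSearchCV(
        estimator=SVC(),
        search_spaces={
            "C": Real(1e-2, 1e3, prior="log-uniform"),
            "gamma": Real(1e-4, 1e1, prior="log-uniform"),
            "kernel": Categorical(["rbf", "poly"]),
        },
        n_iter=25,  # matches the 25 iterations described above
        cv=3,
    )
    search.fit(X_train, y_train)
    return search.best_estimator_
```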

2.10. Performance Metrics and Evaluation

To assess the performance of each model, this study considered both macro-average accuracy and the F1-score. While macro-average accuracy provides a straightforward overview of a model by reporting the mean classification accuracy across all classes, it can be misleading in the presence of large class imbalances, as it does not account for differences in class distribution. To address this, the macro-average F1-score was also reported, which provides a more balanced measure of performance across classes. For each dataset, walking was the primary class, with around 10× more walking data than stair ascent and stair descent data. Standing data varied between datasets but were typically around 2–3× more numerous than data in the stair ascent and stair descent classes.
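The reported metrics can be computed with Scikit-Learn as sketched below; interpreting macro-average accuracy as the mean per-class recall is our assumption about the definition used, and the arrays of true and predicted labels are assumed inputs.

```python
# Macro-averaged metrics: mean per-class recall as macro-average accuracy, plus macro F1.
from sklearn.metrics import f1_score, recall_score

def macro_metrics(y_true, y_pred) -> dict:
    """Return the macro-average accuracy and macro F1-score for one model."""
    return {
        "macro_accuracy": recall_score(y_true, y_pred, average="macro"),
        "macro_f1": f1_score(y_true, y_pred, average="macro"),
    }
```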

3. Results

To determine the optimal window size for sliding window feature extraction, each model was trained using the PCA-reduced feature set for each window size, ranging from 1 to 10 s. We selected 10 s as the maximum time due to issues with class distributions and the number of samples in each class at larger window sizes. This process was repeated three times for each model to reduce the impact of random initialisations, which can lead to models becoming stuck in local minima during training. The results for subject-dependent cross-validation can be seen in Figure 1, Figure 2, Figure 3 and Figure 4, whilst the results for subject-independent cross-validation can be found in Figure 5, Figure 6, Figure 7 and Figure 8. A full list of performance metrics for each dataset and window size can be found in Appendix A.

3.1. Subject-Dependent Cross-Validation

3.1.1. Determining Optimal Window Sizes

Figure 1 and Figure 2 show the mean performance of each model over the three repeat trials for each window size. The trend lines present in these figures demonstrate an increase in both accuracy and the F1-score with window size for subject-dependent cross-validation using TTS across all models and all datasets. The exceptions to this trend suggest that overfitting may have occurred as the number of samples decreased, with some models decreasing in performance at 9 and 10 s window sizes, where the amount of data in each class was at a minimum. This issue was most prevalent with the ANNs among the smaller datasets, whilst the Camargo et al. dataset was the only one in which the ANN performance metrics did not drop at higher window size values. Although performance generally trended upwards with window size, all datasets except for CSL-SHARE, which exhibited 100% accuracy and a 100% F1-score for most models at all window sizes, plateaued at around 4–5 s. Furthermore, CSL-SHARE appeared to exhibit reduced performance at higher window sizes for both the ANN and SVM, likely due to a lack of data.
Figure 3 and Figure 4 show the average highest-performing model among all window sizes, along with the average accuracy and F1-score at each window size across all models. These figures highlight the SVM and the stacking ensemble classifier as the most capable models across all window sizes and that the best model performances occurred at window sizes of 4–8 s.
Regarding the individual (non-ensemble) highest-performing model, all models performed fairly similarly between datasets, with the SVM being the only model that performed significantly better than the others, with average accuracies of 99.6%, 83.7%, 99.8%, and 100% and average F1-scores of 99.7%, 90.9%, 99.8%, and 100% on each of the USC-HAD, Camargo et al., HuGaDB, and CSL-SHARE datasets, respectively. However, these results also suggest there may be an issue with the Camargo et al. dataset, as the average accuracies for all models and window sizes were substantially lower for this dataset than for the others. An overview of the highest-performing individual models can be found in Table 3.

3.1.2. Individual Sensor Analysis

The optimal window sizes for each dataset were used to determine the sensor importance for achieving high accuracies among the four core activities. As USC-HAD contained just a single sensor, it was excluded from this analysis. Due to the ANN’s high performance across all datasets, and because the SVM failed to converge on these reduced datasets, an ANN was trained to classify between the four activities using data from individual sensors.
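A sketch of this per-sensor evaluation is shown below, assuming the feature matrix columns can be grouped by sensor; the helper function, its column-index argument, and the ANN hyperparameters are illustrative rather than the exact configuration used.

```python
# Train the ANN (MLPClassifier) on the feature columns of a single sensor only.
import numpy as np
from sklearn.neural_network import MLPClassifier

def evaluate_single_sensor(X_train, y_train, X_test, y_test, sensor_columns):
    """Fit an ANN on one sensor's features and return its test accuracy."""
    ann = MLPClassifier(hidden_layer_sizes=(100,), max_iter=1000)
    ann.fit(X_train[:, sensor_columns], y_train)
    return ann.score(X_test[:, sensor_columns], y_test)
```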
Table 4, Table 5 and Table 6 show the precision, recall, F1-score, and accuracy of the ANN trained from features extracted from each sensor in the Camargo et al., HuGaDB, and CSL-SHARE datasets, respectively. These tables highlight IMUs as the most effective individual sensors, exhibiting accuracies of 87.4–100% and F1-scores of 74.4–100% across all datasets. Goniometers also appear as high-performing sensors, with the three-axis goniometers at the hip and ankle in the Camargo et al. dataset exhibiting performance metrics marginally lower than those of the IMUs, with accuracies of 86.8% and 87.4% and F1-scores of 74.2% and 70.8%, respectively. Following the three-axis goniometers, both the Camargo et al. and CSL-SHARE datasets feature two-axis goniometers at the knee, which enabled accuracies of 74.2% and 99.6%, respectively. However, with an F1-score of just 44.5% for the Camargo et al. knee goniometer, this may suggest that two-axis goniometers lacked the data dimensionality for high-accuracy HAR. Finally, the EMG sensors exhibited the lowest performance metrics across all datasets. Among the EMG sensors, placement heavily affected classification accuracy, with the vastus lateralis and biceps femoris performing extremely poorly, whilst the tibialis anterior, soleus, gastrocnemius, and vastus medialis generally outperformed EMG sensors placed on other muscles. However, even the highest-performing EMG sensors in each dataset exhibit F1-scores significantly lower than those of the IMUs.

3.2. Subject-Independent Cross-Validation

3.2.1. Determining Optimal Window Sizes

Figure 5 shows the performance trends of each model at each window size for the four datasets in this study using LOSO cross-validation. The maximum accuracy for USC-HAD occurred at a 10 s window size, with the SVM exhibiting an accuracy of 91.9% and an F1-score of 81.2%, whilst the Camargo et al. dataset achieved a maximum accuracy of 80.8% and an F1-score of 85.2% at 9 s using the ANN. Both the CSL-SHARE and HuGaDB datasets achieved 100% classification accuracy and F1-scores with multiple model types at 1 and 2 s window sizes, respectively, which was maintained up to a window size of 10 s. The DT, RF, and KNN models performed erratically across all datasets and window sizes, which caused the stacking and voting ensemble methods to underperform when compared to the ANN and SVM.
Figure 7 shows the mean accuracies across all time windows and models. From Figure 7a, the SVMs and ANNs appear as the classifiers with the highest classification accuracy where there is a statistically significant difference between classifier performances, with the SVMs achieving 79.1%, 68.4%, 98.8%, and 99.9% accuracies and F1-scores of 66.9%, 66.1%, 99.2%, and 100%, whilst the ANNs achieved 75.4%, 73.6%, 99.9%, and 100% accuracies and F1-scores of 66.3%, 76.7%, 99.9% and 100% on each of the USC-HAD, Camargo et al., HuGaDB, and CSL-SHARE datasets, respectively. As such, the ANN and SVM can clearly be identified as the highest-performing model types across all datasets, as seen in Table 3. Concerning window size, each dataset presented a different window size at which the maximum mean accuracy occurred. For USC-HAD, the highest mean accuracy and F1-score across all models occurred at 2–3 s window sizes, whilst for the Camargo et al. dataset, these occurred at 5 s, both of which were similar to the time at which model accuracy plateaued using subject-dependent cross-validation. Both HuGaDB and CSL-SHARE achieved accuracies of 100% with several models, but due to the lower accuracies with other models, their highest mean performances occurred at 8 s for HuGaDB and any value from 3 to 10 s for CSL-SHARE.

3.2.2. Individual Sensor Analysis

As with the subject-dependent individual sensor analysis, the ANN was trained on the features extracted from each individual sensor. Table 7, Table 8 and Table 9 show the performance metrics for each sensor used in the Camargo et al., HuGaDB, and CSL-SHARE datasets, respectively. As with the subject-dependent analysis, the IMUs achieved the highest accuracies across two of the three datasets, whilst the EMG sensors exhibited consistently poor performances. In this scenario, performance metrics were generally reduced, with only the EMG sensors placed on the gastrocnemius medialis and gluteus medius for the Camargo et al. dataset and the vastus medialis for the CSL-SHARE dataset achieving accuracies and F1-scores above 50%. The three-axis hip goniometer from the Camargo et al. dataset exhibited higher performance metrics than the IMUs in this case, with the ankle goniometer outperforming all but the foot IMU, whilst the two-axis goniometers positioned on the knee in the Camargo et al. and CSL-SHARE datasets exhibited much lower performance metrics.
Overall, the trends among these sensors were largely the same as with the subject-dependent analysis, with the main difference being the high performance of the three-axis goniometers, along with an overall reduction in accuracy for the two-axis goniometers and EMG sensors, further highlighting the volatility of performance when using these sensors.

4. Discussion

The results of the window size analysis did not exhibit a consistent peak or plateau, with accuracies appearing volatile across the four datasets for each window size and trend lines displaying misaligned peaks. Furthermore, the averaging of accuracies across all models at each window size showed no clear single optimal window size across the four datasets and methods of cross-validation.
It must be noted that the performance metrics of the Camargo et al. dataset did not align with the other multimodal datasets in terms of overall classification accuracy. These systems all made use of the same six-axis IMU positioned on the thigh, yet the Camargo et al. dataset achieved significantly reduced accuracies when trained on only this sensor when compared to HuGaDB and CSL-SHARE. Given the large number of controlled variables in this study, this indicates a difference in experimental procedure or activity data distribution, which negatively affects the results of the Camargo et al. dataset. Figure 9a shows the confusion matrix for an SVM trained on the Camargo et al. dataset, which shows that the misclassifications are between the stair ascent and stair descent classes. This is also shown not to be caused by sample weighting, as Figure 9b,c show the confusion matrices for the HuGaDB and CSL-SHARE datasets, respectively, which feature more extreme sample weightings than the Camargo et al. dataset whilst achieving 100% accuracy.
Figure 9 highlights SVMs as the most effective individual models for HAR using subject-dependent cross-validation, with ANNs proving more effective when using subject-independent cross-validation. This is likely due to the tendency for ANNs to overfit, which was further pronounced by the use of a TTS in creating test data for subject-dependent cross-validation, whereas SVMs typically perform well in these scenarios due to the maximisation of the margin when creating a decision boundary.
For subject-dependent cross-validation, peak accuracies occurred at smaller window sizes, ranging from 2–5 s. The trend lines in Figure 1 and Figure 5 also exhibit rises in accuracy for some models as they approach a 10-s window size, indicating that, if the dataset contains enough samples in each class for this to be viable, larger window sizes offer richer features which lead to higher classification accuracies. For subject-independent cross-validation, the highest-performing model accuracies occurred at 2, 3, 5, and 10 s for the HuGaDB, CSL-SHARE, Camargo et al., and USC-HAD datasets, respectively. Apart from USC-HAD, this further highlights the range of 2–5 s as an effective range of window sizes in achieving high classification accuracy for the core activities of HAR.
Aside from the Camargo et al. dataset, the multimodal datasets achieved much higher classification accuracies when using the same models and window sizes, which allowed high accuracies to be obtained with much smaller window sizes. This has significant implications when considering the delay time, portability, and convenience of systems, as increasing the number of sensors can enable high-accuracy HAR using very computationally inexpensive methods such as DT. These computationally low-cost methods can also allow designers of real-time HAR systems to incorporate low-power computational devices with reduced size profiles and battery consumption, therefore increasing the comfort and convenience of the devices. Additionally, the fact that high accuracies can be obtained in multimodal systems with low window sizes means that much faster response times can be achieved for real-time HAR systems, as some models trained on the CSL-SHARE dataset achieved 100% accuracy using just 1 s windows with a 0.25 s fixed delay time caused by the step size. Whilst it was shown that accuracy at each window size was dependent on the sensor types used in each dataset, further work is needed to identify how model performance varies with window size for each individual sensor type. This will enable the building of a knowledge database to help future researchers choose a window size given a sensor system without the need for lengthy, brute-force approaches to finding the most appropriate window size, combination of sensors, and choice of model for each novel dataset produced in this field.
Regarding individual sensor types, the IMUs and three-axis goniometers generally exhibited the highest accuracies, followed by the two-axis goniometers and finally the EMG sensors. Accuracy varied among the different IMU locations, with no clear ranking across all datasets. Only the Camargo et al. and CSL-SHARE datasets featured goniometers, with the three-axis goniometers at the hip and ankle in the Camargo et al. dataset showing large performance improvements over the two-axis goniometers located on the knee in both the Camargo et al. and CSL-SHARE datasets. Goniometers are low-power devices with fewer data dimensions than IMUs, and they can be incorporated into smart clothing to improve comfort and convenience. Given the competitive performance of goniometers in this study, three-axis goniometers should be considered in future datasets and HAR systems. On the other hand, EMG sensor performance was volatile between locations and datasets, which may be due to differences in filtering methods, varying placements on muscles, or changes in experimental procedures. As such, it is not currently possible to compare the locations of these sensors, particularly with so few datasets for reference. More datasets are required to accurately rank the locations of these sensors so that the impact of differences in experimental setup can be minimised.
Regarding the sample rates of each dataset, no correlation was present between the native sample rates of each dataset and the final classification accuracy, with the HuGaDB dataset exhibiting far higher accuracies than USC-HAD and the Camargo et al. dataset, despite having the lowest native sample rate of 60 Hz. As such, whilst sample rate is expected to have an effect at even lower values, 60 Hz can be considered a sufficient sample rate for high-accuracy HAR.
These results align with the findings of Banos et al. [23], who found that increased window size does not necessarily increase activity classification performance across many datasets. However, our study also offers insight into the reason for this, with subject-dependent cross-validation demonstrating rising performance with window size until accuracy and the F1-score began to reduce at larger window sizes due to insufficient sample sizes. Crucially, this work considers both subject-dependent and subject-independent methods of cross-validation, which highlights how the choice of cross-validation method impacts the selection of an optimal window size, a factor that was not considered in that study [23]. Niazi et al. [25] considered the effect of window size and sample rate on classification accuracy using an RF classifier, where it was reported that window sizes between 2 and 10 s could appear optimal using subject-dependent cross-validation. Our results support these findings and demonstrate that this also applies to additional classical Machine Learning models such as the ANN, SVM, KNN, and DT. Duan et al. [28] considered the optimal placement of sensors using deep learning techniques for a single dataset, finding that sensors placed on the right leg exhibited increased performance. Our results align with the findings of this study, with the HuGaDB dataset demonstrating that, when subject-independent cross-validation was used, the performance metrics of the right leg were higher than those of the left. Finally, Khan et al. [30] report that sensor performance is dependent on the activities being performed in the dataset. By removing the variation between datasets, our study controlled for this factor, resulting in a reliable ranking of sensor locations that achieved high performances and offering future researchers the information necessary to build effective HAR systems.
Finally, this study featured several limitations due to the computational cost of performing this analysis. The first of these limitations was the lack of investigation into the effects of window step size, which was set to 25% of the total window size. This could have been set to a fixed time value for all window sizes or have been individually analysed to explore the co-dependent effects of step size and window size. Furthermore, the availability of datasets which feature a sufficiently large number of participants and sensors, along with the core activities included in this study, was limited, resulting in the inclusion of just four datasets.

5. Conclusions and Future Work

This study is the first of its kind in providing a bias-reduced, normalised, cross-dataset analysis to determine and rank the highest-performing sensor types for Human Activity Recognition. First, ANNs were found to be the highest-performing models across multiple multimodal HAR datasets, closely followed by SVMs, with the optimal window size being in the range of 2–5 s when using the semi-overlapping sliding window approach to feature engineering with a 75% overlap. Where datasets were large enough to reduce the impact of class imbalance, or models were sufficiently powerful to generalise with smaller sample numbers, accuracies were also shown to trend upwards with larger window sizes of 9–10 s. Regarding the contributions of individual sensor types to classification accuracy, IMUs placed on the thigh and three-axis goniometers on the hip and ankle were the overall largest contributors to high-accuracy HAR, whilst EMG sensors were found to exhibit volatile accuracies, likely due to the difficulty of ensuring that the sensors were placed identically and calibrated equally for different subjects. It remains appropriate for researchers to collect large HAR datasets and to investigate alternative methods of HAR using multimodal sensor systems and smart clothing, exploring how the size and inconvenience of these systems can be minimised whilst maintaining high accuracy using low-computational-complexity classification methods.
This study was limited by the scarcity of open multimodal gait datasets with large numbers of sensors and common activities. As a result, future work in this area should consider more datasets, activities (including fall-related activities), and sensor types to investigate how classifier performance in HAR is affected by these properties. Additionally, elements such as step size, the proportion of data for each activity, and time-series features should be investigated for their contribution towards achieving efficient and convenient high-accuracy HAR. Finally, the time and space complexity of these algorithms should be considered under the various window sizes to evaluate the feasibility of deploying these optimised models in real-world HAR applications.

Author Contributions

Conceptualisation, J.C.M., A.A.D.-S., S.Q.X., R.J.O.; methodology, J.C.M.; software, J.C.M.; validation, J.C.M.; formal analysis, J.C.M.; investigation, J.C.M.; resources, J.C.M.; data curation, J.C.M.; writing—original draft preparation, J.C.M.; writing—review and editing, A.A.D.-S., S.Q.X., R.J.O.; visualisation, J.C.M.; supervision, A.A.D.-S., S.Q.X., R.J.O.; project administration, A.A.D.-S., S.Q.X., R.J.O.; funding acquisition, A.A.D.-S. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the United Kingdom Research and Innovation (UKRI)—Engineering and Physical Sciences Research Council (EPSRC) (grant number EP/T517860/1).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data are contained within the article and available upon request by contacting the authors.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
ANN   Artificial Neural Network
DT   Decision Tree
EMG   Electromyography
HAR   Human Activity Recognition
IMU   Inertial Measurement Unit
IoT   Internet of Things
KNN   K-Nearest Neighbors
LOSO   Leave-One-Subject-Out
ML   Machine Learning
PCA   Principal Component Analysis
RF   Random Forest
SVM   Support Vector Machine
TTS   Train–Test Split

Appendix A

Figure A1 and Figure A2 along with Table A1, Table A2, Table A3, Table A4, Table A5, Table A6, Table A7 and Table A8 show the performance metrics of each dataset and method of cross-validation, including the mean, standard deviation, and 95% confidence intervals.
Figure A1. Trend graphs showing the macro-averaged performance metrics across all models and window sizes for the four datasets in this analysis when using TTS cross-validation. (a) USC-HAD. (b) Camargo et al. (c) HuGaDB. (d) CSL-SHARE.
Figure A2. Trend graphs showing the macro-averaged performance metrics across all models and window sizes for the four datasets in this analysis when using TTS cross-validation. (a) USC-HAD. (b) Camargo et al. (c) HuGaDB. (d) CSL-SHARE.
Table A1. Performance metrics for the USC-HAD dataset using TTS cross-validation with 95% confidence intervals.
Window (s) | Metric | Mean | Std | CI Low | CI High
1 | Accuracy | 0.9577 | 0.0328 | 0.9273 | 0.9881
1 | Precision | 0.9654 | 0.0341 | 0.9339 | 0.9970
1 | Recall | 0.9577 | 0.0328 | 0.9273 | 0.9881
1 | F1-score | 0.9613 | 0.0332 | 0.9306 | 0.9920
2 | Accuracy | 0.9749 | 0.0240 | 0.9527 | 0.9971
2 | Precision | 0.9791 | 0.0262 | 0.9549 | 1.0034
2 | Recall | 0.9749 | 0.0240 | 0.9527 | 0.9971
2 | F1-score | 0.9769 | 0.0248 | 0.9540 | 0.9999
3 | Accuracy | 0.9820 | 0.0224 | 0.9613 | 1.0027
3 | Precision | 0.9838 | 0.0221 | 0.9633 | 1.0043
3 | Recall | 0.9820 | 0.0224 | 0.9613 | 1.0027
3 | F1-score | 0.9829 | 0.0222 | 0.9624 | 1.0034
4 | Accuracy | 0.9885 | 0.0185 | 0.9714 | 1.0057
4 | Precision | 0.9886 | 0.0197 | 0.9704 | 1.0068
4 | Recall | 0.9885 | 0.0185 | 0.9714 | 1.0057
4 | F1-score | 0.9886 | 0.0191 | 0.9709 | 1.0062
5 | Accuracy | 0.9910 | 0.0172 | 0.9750 | 1.0069
5 | Precision | 0.9908 | 0.0186 | 0.9736 | 1.0080
5 | Recall | 0.9910 | 0.0172 | 0.9750 | 1.0069
5 | F1-score | 0.9909 | 0.0179 | 0.9743 | 1.0074
6 | Accuracy | 0.9889 | 0.0126 | 0.9773 | 1.0005
6 | Precision | 0.9895 | 0.0158 | 0.9750 | 1.0041
6 | Recall | 0.9889 | 0.0126 | 0.9773 | 1.0005
6 | F1-score | 0.9892 | 0.0140 | 0.9762 | 1.0022
7 | Accuracy | 0.9907 | 0.0146 | 0.9772 | 1.0042
7 | Precision | 0.9886 | 0.0191 | 0.9710 | 1.0063
7 | Recall | 0.9907 | 0.0146 | 0.9772 | 1.0042
7 | F1-score | 0.9896 | 0.0168 | 0.9740 | 1.0051
8 | Accuracy | 0.9893 | 0.0155 | 0.9749 | 1.0036
8 | Precision | 0.9906 | 0.0147 | 0.9771 | 1.0042
8 | Recall | 0.9893 | 0.0155 | 0.9749 | 1.0036
8 | F1-score | 0.9899 | 0.0152 | 0.9758 | 1.0040
9 | Accuracy | 0.9894 | 0.0152 | 0.9754 | 1.0035
9 | Precision | 0.9895 | 0.0166 | 0.9741 | 1.0049
9 | Recall | 0.9894 | 0.0152 | 0.9754 | 1.0035
9 | F1-score | 0.9894 | 0.0159 | 0.9747 | 1.0041
10 | Accuracy | 0.9898 | 0.0170 | 0.9741 | 1.0055
10 | Precision | 0.9909 | 0.0174 | 0.9749 | 1.0070
10 | Recall | 0.9898 | 0.0170 | 0.9741 | 1.0055
10 | F1-score | 0.9903 | 0.0172 | 0.9744 | 1.0062
Table A2. Performance metrics for the USC-HAD dataset using LOSO cross-validation with 95% confidence intervals.
Window (s) | Metric | Mean | Std | CI Low | CI High
1 | Accuracy | 0.6083 | 0.1353 | 0.4831 | 0.7334
1 | Precision | 0.5992 | 0.1117 | 0.4959 | 0.7025
1 | Recall | 0.6083 | 0.1353 | 0.4831 | 0.7334
1 | F1-score | 0.5589 | 0.1424 | 0.4272 | 0.6907
2 | Accuracy | 0.6521 | 0.1317 | 0.5303 | 0.7739
2 | Precision | 0.6319 | 0.1831 | 0.4626 | 0.8012
2 | Recall | 0.6521 | 0.1317 | 0.5303 | 0.7739
2 | F1-score | 0.5817 | 0.1563 | 0.4372 | 0.7262
3 | Accuracy | 0.6579 | 0.1900 | 0.4821 | 0.8336
3 | Precision | 0.6092 | 0.1416 | 0.4783 | 0.7402
3 | Recall | 0.6579 | 0.1900 | 0.4821 | 0.8336
3 | F1-score | 0.5705 | 0.1804 | 0.4036 | 0.7373
4 | Accuracy | 0.6004 | 0.1481 | 0.4635 | 0.7374
4 | Precision | 0.6065 | 0.1163 | 0.4990 | 0.7141
4 | Recall | 0.6004 | 0.1481 | 0.4635 | 0.7374
4 | F1-score | 0.5561 | 0.1524 | 0.4152 | 0.6970
5 | Accuracy | 0.5795 | 0.1451 | 0.4453 | 0.7137
5 | Precision | 0.6181 | 0.1726 | 0.4584 | 0.7777
5 | Recall | 0.5795 | 0.1451 | 0.4453 | 0.7137
5 | F1-score | 0.5079 | 0.1443 | 0.3745 | 0.6414
6 | Accuracy | 0.5372 | 0.1601 | 0.3891 | 0.6852
6 | Precision | 0.5928 | 0.1511 | 0.4530 | 0.7326
6 | Recall | 0.5372 | 0.1601 | 0.3891 | 0.6852
6 | F1-score | 0.4586 | 0.1453 | 0.3241 | 0.5930
7 | Accuracy | 0.5799 | 0.1415 | 0.4490 | 0.7108
7 | Precision | 0.5434 | 0.1746 | 0.3819 | 0.7049
7 | Recall | 0.5799 | 0.1415 | 0.4490 | 0.7108
7 | F1-score | 0.4982 | 0.1205 | 0.3868 | 0.6096
8 | Accuracy | 0.5279 | 0.1938 | 0.3486 | 0.7071
8 | Precision | 0.5455 | 0.1416 | 0.4145 | 0.6764
8 | Recall | 0.5279 | 0.1938 | 0.3486 | 0.7071
8 | F1-score | 0.4304 | 0.1727 | 0.2706 | 0.5901
9 | Accuracy | 0.6061 | 0.1817 | 0.4380 | 0.7741
9 | Precision | 0.5663 | 0.1864 | 0.3939 | 0.7387
9 | Recall | 0.6061 | 0.1817 | 0.4380 | 0.7741
9 | F1-score | 0.5227 | 0.1661 | 0.3691 | 0.6764
10 | Accuracy | 0.6150 | 0.2031 | 0.4271 | 0.8028
10 | Precision | 0.5895 | 0.2052 | 0.3997 | 0.7793
10 | Recall | 0.6150 | 0.2031 | 0.4271 | 0.8028
10 | F1-score | 0.5448 | 0.1925 | 0.3668 | 0.7228
Table A3. Performance metrics for the CSL-SHARE dataset using TTS cross-validation with 95% confidence intervals.
Window (s) | Metric | Mean | Std | CI Low | CI High
1 | Accuracy | 0.9984 | 0.0015 | 0.9970 | 0.9998
1 | Precision | 0.9988 | 0.0012 | 0.9977 | 0.9999
1 | Recall | 0.9988 | 0.0012 | 0.9977 | 0.9999
1 | F1-score | 0.9988 | 0.0012 | 0.9977 | 0.9999
2 | Accuracy | 1.0000 | 0.0000 | N/A | N/A
2 | Precision | 1.0000 | 0.0000 | N/A | N/A
2 | Recall | 1.0000 | 0.0000 | N/A | N/A
2 | F1-score | 1.0000 | 0.0000 | N/A | N/A
3 | Accuracy | 1.0000 | 0.0000 | N/A | N/A
3 | Precision | 1.0000 | 0.0000 | N/A | N/A
3 | Recall | 1.0000 | 0.0000 | N/A | N/A
3 | F1-score | 1.0000 | 0.0000 | N/A | N/A
4 | Accuracy | 1.0000 | 0.0000 | N/A | N/A
4 | Precision | 1.0000 | 0.0000 | N/A | N/A
4 | Recall | 1.0000 | 0.0000 | N/A | N/A
4 | F1-score | 1.0000 | 0.0000 | N/A | N/A
5 | Accuracy | 1.0000 | 0.0000 | N/A | N/A
5 | Precision | 1.0000 | 0.0000 | N/A | N/A
5 | Recall | 1.0000 | 0.0000 | N/A | N/A
5 | F1-score | 1.0000 | 0.0000 | N/A | N/A
6 | Accuracy | 1.0000 | 0.0000 | N/A | N/A
6 | Precision | 1.0000 | 0.0000 | N/A | N/A
6 | Recall | 1.0000 | 0.0000 | N/A | N/A
6 | F1-score | 1.0000 | 0.0000 | N/A | N/A
7 | Accuracy | 1.0000 | 0.0000 | N/A | N/A
7 | Precision | 1.0000 | 0.0000 | N/A | N/A
7 | Recall | 1.0000 | 0.0000 | N/A | N/A
7 | F1-score | 1.0000 | 0.0000 | N/A | N/A
8 | Accuracy | 1.0000 | 0.0000 | N/A | N/A
8 | Precision | 1.0000 | 0.0000 | N/A | N/A
8 | Recall | 1.0000 | 0.0000 | N/A | N/A
8 | F1-score | 1.0000 | 0.0000 | N/A | N/A
9 | Accuracy | 0.9778 | 0.0514 | 0.9302 | 1.0254
9 | Precision | 0.9904 | 0.0222 | 0.9698 | 1.0109
9 | Recall | 0.9896 | 0.0241 | 0.9674 | 1.0119
9 | F1-score | 0.9891 | 0.0255 | 0.9655 | 1.0127
10 | Accuracy | 0.9277 | 0.1839 | 0.7576 | 1.0977
10 | Precision | 0.9523 | 0.1223 | 0.8391 | 1.0654
10 | Recall | 0.9607 | 0.1000 | 0.8682 | 1.0532
10 | F1-score | 0.9469 | 0.1366 | 0.8206 | 1.0732
Table A4. Performance metrics for the CSL-SHARE dataset using LOSO cross-validation with 95% confidence intervals.
Window (s) | Metric | Mean | Std | CI Low | CI High
1 | Accuracy | 0.9762 | 0.0609 | 0.9199 | 1.0325
1 | Precision | 0.9837 | 0.0421 | 0.9448 | 1.0227
1 | Recall | 0.9841 | 0.0412 | 0.9460 | 1.0222
1 | F1-score | 0.9838 | 0.0419 | 0.9451 | 1.0225
2 | Accuracy | 0.9708 | 0.0773 | 0.8993 | 1.0423
2 | Precision | 0.9762 | 0.0629 | 0.9181 | 1.0344
2 | Recall | 0.9683 | 0.0840 | 0.8906 | 1.0459
2 | F1-score | 0.9701 | 0.0790 | 0.8971 | 1.0432
3 | Accuracy | 1.0000 | 0.0000 | N/A | N/A
3 | Precision | 1.0000 | 0.0000 | N/A | N/A
3 | Recall | 1.0000 | 0.0000 | N/A | N/A
3 | F1-score | 1.0000 | 0.0000 | N/A | N/A
4 | Accuracy | 1.0000 | 0.0000 | N/A | N/A
4 | Precision | 1.0000 | 0.0000 | N/A | N/A
4 | Recall | 1.0000 | 0.0000 | N/A | N/A
4 | F1-score | 1.0000 | 0.0000 | N/A | N/A
5 | Accuracy | 1.0000 | 0.0000 | N/A | N/A
5 | Precision | 1.0000 | 0.0000 | N/A | N/A
5 | Recall | 1.0000 | 0.0000 | N/A | N/A
5 | F1-score | 1.0000 | 0.0000 | N/A | N/A
6 | Accuracy | 1.0000 | 0.0000 | N/A | N/A
6 | Precision | 1.0000 | 0.0000 | N/A | N/A
6 | Recall | 1.0000 | 0.0000 | N/A | N/A
6 | F1-score | 1.0000 | 0.0000 | N/A | N/A
7 | Accuracy | 1.0000 | 0.0000 | N/A | N/A
7 | Precision | 1.0000 | 0.0000 | N/A | N/A
7 | Recall | 1.0000 | 0.0000 | N/A | N/A
7 | F1-score | 1.0000 | 0.0000 | N/A | N/A
8 | Accuracy | 1.0000 | 0.0000 | N/A | N/A
8 | Precision | 1.0000 | 0.0000 | N/A | N/A
8 | Recall | 1.0000 | 0.0000 | N/A | N/A
8 | F1-score | 1.0000 | 0.0000 | N/A | N/A
9 | Accuracy | 1.0000 | 0.0000 | N/A | N/A
9 | Precision | 1.0000 | 0.0000 | N/A | N/A
9 | Recall | 1.0000 | 0.0000 | N/A | N/A
9 | F1-score | 1.0000 | 0.0000 | N/A | N/A
10 | Accuracy | 1.0000 | 0.0000 | N/A | N/A
10 | Precision | 1.0000 | 0.0000 | N/A | N/A
10 | Recall | 1.0000 | 0.0000 | N/A | N/A
10 | F1-score | 1.0000 | 0.0000 | N/A | N/A
Table A5. Performance metrics for the Camargo et al. dataset using TTS cross-validation with 95% confidence intervals.
Window (s) | Metric | Mean | Std | CI Low | CI High
1 | Accuracy | 0.7379 | 0.0347 | 0.7058 | 0.7701
1 | Precision | 0.8294 | 0.0274 | 0.8041 | 0.8547
1 | Recall | 0.8394 | 0.0250 | 0.8163 | 0.8625
1 | F1-score | 0.8336 | 0.0262 | 0.8093 | 0.8579
2 | Accuracy | 0.7939 | 0.0311 | 0.7651 | 0.8227
2 | Precision | 0.8764 | 0.0212 | 0.8568 | 0.8960
2 | Recall | 0.8796 | 0.0190 | 0.8621 | 0.8972
2 | F1-score | 0.8776 | 0.0203 | 0.8589 | 0.8964
3 | Accuracy | 0.7916 | 0.0331 | 0.7610 | 0.8223
3 | Precision | 0.8819 | 0.0199 | 0.8634 | 0.9003
3 | Recall | 0.8830 | 0.0186 | 0.8658 | 0.9003
3 | F1-score | 0.8784 | 0.0249 | 0.8554 | 0.9013
4 | Accuracy | 0.8158 | 0.0329 | 0.7854 | 0.8462
4 | Precision | 0.9012 | 0.0185 | 0.8840 | 0.9183
4 | Recall | 0.9002 | 0.0179 | 0.8837 | 0.9168
4 | F1-score | 0.8985 | 0.0194 | 0.8805 | 0.9164
5 | Accuracy | 0.7994 | 0.0303 | 0.7714 | 0.8274
5 | Precision | 0.8827 | 0.0188 | 0.8653 | 0.9001
5 | Recall | 0.8836 | 0.0169 | 0.8679 | 0.8992
5 | F1-score | 0.8813 | 0.0205 | 0.8623 | 0.9002
6 | Accuracy | 0.8140 | 0.0343 | 0.7823 | 0.8458
6 | Precision | 0.8981 | 0.0180 | 0.8815 | 0.9148
6 | Recall | 0.8959 | 0.0205 | 0.8769 | 0.9149
6 | F1-score | 0.8908 | 0.0317 | 0.8615 | 0.9201
7 | Accuracy | 0.8159 | 0.0424 | 0.7767 | 0.8552
7 | Precision | 0.8952 | 0.0238 | 0.8732 | 0.9172
7 | Recall | 0.8958 | 0.0236 | 0.8740 | 0.9176
7 | F1-score | 0.8932 | 0.0265 | 0.8687 | 0.9177
8 | Accuracy | 0.8151 | 0.0359 | 0.7819 | 0.8484
8 | Precision | 0.9066 | 0.0200 | 0.8881 | 0.9251
8 | Recall | 0.9057 | 0.0194 | 0.8877 | 0.9237
8 | F1-score | 0.9042 | 0.0191 | 0.8866 | 0.9219
9 | Accuracy | 0.8058 | 0.0266 | 0.7811 | 0.8304
9 | Precision | 0.9125 | 0.0159 | 0.8978 | 0.9272
9 | Recall | 0.9046 | 0.0137 | 0.8919 | 0.9173
9 | F1-score | 0.9001 | 0.0228 | 0.8790 | 0.9211
10 | Accuracy | 0.8093 | 0.0223 | 0.7887 | 0.8300
10 | Precision | 0.9018 | 0.0113 | 0.8913 | 0.9123
10 | Recall | 0.9014 | 0.0116 | 0.8906 | 0.9121
10 | F1-score | 0.9010 | 0.0118 | 0.8901 | 0.9120
Table A6. Performance metrics for the Camargo et al. dataset using LOSO cross-validation with 95% confidence intervals.
Window (s) | Metric | Mean | Std | CI Low | CI High
1 | Accuracy | 0.5844 | 0.0941 | 0.4973 | 0.6714
1 | Precision | 0.5760 | 0.1825 | 0.4072 | 0.7448
1 | Recall | 0.6194 | 0.1948 | 0.4392 | 0.7996
1 | F1-score | 0.5517 | 0.1966 | 0.3699 | 0.7336
2 | Accuracy | 0.6351 | 0.0917 | 0.5503 | 0.7199
2 | Precision | 0.6892 | 0.0738 | 0.6210 | 0.7575
2 | Recall | 0.6983 | 0.0883 | 0.6167 | 0.7800
2 | F1-score | 0.6495 | 0.0901 | 0.5662 | 0.7328
3 | Accuracy | 0.6623 | 0.0382 | 0.6270 | 0.6976
3 | Precision | 0.6961 | 0.0564 | 0.6440 | 0.7482
3 | Recall | 0.7183 | 0.0935 | 0.6318 | 0.8047
3 | F1-score | 0.6613 | 0.0882 | 0.5798 | 0.7429
4 | Accuracy | 0.6531 | 0.0817 | 0.5775 | 0.7287
4 | Precision | 0.7135 | 0.0765 | 0.6428 | 0.7842
4 | Recall | 0.7507 | 0.0598 | 0.6954 | 0.8059
4 | F1-score | 0.6896 | 0.0900 | 0.6063 | 0.7728
5 | Accuracy | 0.6970 | 0.0703 | 0.6320 | 0.7620
5 | Precision | 0.7428 | 0.0871 | 0.6623 | 0.8234
5 | Recall | 0.7879 | 0.0502 | 0.7414 | 0.8343
5 | F1-score | 0.7339 | 0.0735 | 0.6660 | 0.8019
6 | Accuracy | 0.6865 | 0.0630 | 0.6282 | 0.7448
6 | Precision | 0.7385 | 0.1009 | 0.6452 | 0.8318
6 | Recall | 0.7794 | 0.0447 | 0.7380 | 0.8208
6 | F1-score | 0.7249 | 0.0657 | 0.6641 | 0.7857
7 | Accuracy | 0.6463 | 0.0531 | 0.5971 | 0.6954
7 | Precision | 0.6538 | 0.0197 | 0.6356 | 0.6720
7 | Recall | 0.7547 | 0.0395 | 0.7182 | 0.7913
7 | F1-score | 0.6793 | 0.0430 | 0.6396 | 0.7191
8 | Accuracy | 0.6622 | 0.0513 | 0.6148 | 0.7096
8 | Precision | 0.6638 | 0.0183 | 0.6469 | 0.6808
8 | Recall | 0.7717 | 0.0334 | 0.7409 | 0.8026
8 | F1-score | 0.7010 | 0.0360 | 0.6678 | 0.7343
9 | Accuracy | 0.6735 | 0.1338 | 0.5497 | 0.7972
9 | Precision | 0.6843 | 0.1205 | 0.5729 | 0.7958
9 | Recall | 0.7890 | 0.0740 | 0.7206 | 0.8575
9 | F1-score | 0.7225 | 0.0965 | 0.6332 | 0.8117
10 | Accuracy | 0.6429 | 0.1241 | 0.5280 | 0.7577
10 | Precision | 0.6465 | 0.0943 | 0.5593 | 0.7337
10 | Recall | 0.7657 | 0.0818 | 0.6901 | 0.8413
10 | F1-score | 0.6843 | 0.1025 | 0.5895 | 0.7791
Table A7. Performance metrics for the HuGaDB dataset using TTS cross-validation with 95% confidence intervals.
Window (s) | Metric | Mean | Std | CI Low | CI High
1 | Accuracy | 0.9826 | 0.0123 | 0.9712 | 0.9940
1 | Precision | 0.9846 | 0.0126 | 0.9729 | 0.9963
1 | Recall | 0.9826 | 0.0123 | 0.9712 | 0.9940
1 | F1-score | 0.9836 | 0.0124 | 0.9721 | 0.9951
2 | Accuracy | 0.9899 | 0.0089 | 0.9817 | 0.9982
2 | Precision | 0.9904 | 0.0086 | 0.9825 | 0.9984
2 | Recall | 0.9899 | 0.0089 | 0.9817 | 0.9982
2 | F1-score | 0.9902 | 0.0087 | 0.9822 | 0.9982
3 | Accuracy | 0.9924 | 0.0058 | 0.9870 | 0.9978
3 | Precision | 0.9931 | 0.0067 | 0.9869 | 0.9993
3 | Recall | 0.9924 | 0.0058 | 0.9870 | 0.9978
3 | F1-score | 0.9927 | 0.0062 | 0.9870 | 0.9984
4 | Accuracy | 0.9966 | 0.0035 | 0.9934 | 0.9999
4 | Precision | 0.9948 | 0.0052 | 0.9900 | 0.9996
4 | Recall | 0.9966 | 0.0035 | 0.9934 | 0.9999
4 | F1-score | 0.9957 | 0.0043 | 0.9917 | 0.9997
5 | Accuracy | 0.9956 | 0.0032 | 0.9927 | 0.9985
5 | Precision | 0.9958 | 0.0038 | 0.9922 | 0.9993
5 | Recall | 0.9956 | 0.0032 | 0.9927 | 0.9985
5 | F1-score | 0.9957 | 0.0035 | 0.9924 | 0.9989
6 | Accuracy | 0.9952 | 0.0056 | 0.9900 | 1.0004
6 | Precision | 0.9978 | 0.0034 | 0.9947 | 1.0010
6 | Recall | 0.9952 | 0.0056 | 0.9900 | 1.0004
6 | F1-score | 0.9965 | 0.0045 | 0.9924 | 1.0007
7 | Accuracy | 0.9943 | 0.0043 | 0.9903 | 0.9983
7 | Precision | 0.9927 | 0.0104 | 0.9831 | 1.0023
7 | Recall | 0.9943 | 0.0043 | 0.9903 | 0.9983
7 | F1-score | 0.9935 | 0.0074 | 0.9866 | 1.0003
8 | Accuracy | 0.9953 | 0.0049 | 0.9908 | 0.9998
8 | Precision | 0.9955 | 0.0043 | 0.9915 | 0.9995
8 | Recall | 0.9953 | 0.0049 | 0.9908 | 0.9998
8 | F1-score | 0.9954 | 0.0045 | 0.9912 | 0.9995
9 | Accuracy | 0.9956 | 0.0057 | 0.9903 | 1.0009
9 | Precision | 0.9959 | 0.0080 | 0.9886 | 1.0033
9 | Recall | 0.9956 | 0.0057 | 0.9903 | 1.0009
9 | F1-score | 0.9958 | 0.0068 | 0.9895 | 1.0020
10 | Accuracy | 0.9954 | 0.0071 | 0.9889 | 1.0020
10 | Precision | 0.9954 | 0.0075 | 0.9884 | 1.0023
10 | Recall | 0.9954 | 0.0071 | 0.9889 | 1.0020
10 | F1-score | 0.9953 | 0.0073 | 0.9885 | 1.0021
Table A8. Performance metrics for the HuGaDB dataset using LOSO cross-validation with 95% confidence intervals.
Window (s) | Metric | Mean | Std | CI Low | CI High
1 | Accuracy | 0.9789 | 0.0244 | 0.9564 | 1.0015
1 | Precision | 0.9777 | 0.0279 | 0.9519 | 1.0035
1 | Recall | 0.9789 | 0.0244 | 0.9564 | 1.0015
1 | F1-score | 0.9781 | 0.0251 | 0.9549 | 1.0013
2 | Accuracy | 0.9852 | 0.0244 | 0.9627 | 1.0077
2 | Precision | 0.9827 | 0.0284 | 0.9564 | 1.0090
2 | Recall | 0.9852 | 0.0244 | 0.9627 | 1.0077
2 | F1-score | 0.9828 | 0.0267 | 0.9581 | 1.0075
3 | Accuracy | 0.9917 | 0.0115 | 0.9811 | 1.0024
3 | Precision | 0.9846 | 0.0322 | 0.9548 | 1.0144
3 | Recall | 0.9917 | 0.0115 | 0.9811 | 1.0024
3 | F1-score | 0.9874 | 0.0204 | 0.9685 | 1.0063
4 | Accuracy | 0.9884 | 0.0188 | 0.9711 | 1.0058
4 | Precision | 0.9714 | 0.0674 | 0.9090 | 1.0337
4 | Recall | 0.9884 | 0.0188 | 0.9711 | 1.0058
4 | F1-score | 0.9758 | 0.0529 | 0.9269 | 1.0247
5 | Accuracy | 0.9967 | 0.0047 | 0.9924 | 1.0011
5 | Precision | 0.9886 | 0.0217 | 0.9685 | 1.0087
5 | Recall | 0.9967 | 0.0047 | 0.9924 | 1.0011
5 | F1-score | 0.9921 | 0.0135 | 0.9796 | 1.0046
6 | Accuracy | 0.9981 | 0.0026 | 0.9956 | 1.0005
6 | Precision | 0.9889 | 0.0189 | 0.9715 | 1.0064
6 | Recall | 0.9981 | 0.0026 | 0.9956 | 1.0005
6 | F1-score | 0.9932 | 0.0112 | 0.9829 | 1.0035
7 | Accuracy | 0.9975 | 0.0050 | 0.9929 | 1.0021
7 | Precision | 0.9845 | 0.0313 | 0.9556 | 1.0135
7 | Recall | 0.9975 | 0.0050 | 0.9929 | 1.0021
7 | F1-score | 0.9899 | 0.0208 | 0.9707 | 1.0091
8 | Accuracy | 0.9992 | 0.0020 | 0.9974 | 1.0010
8 | Precision | 0.9942 | 0.0137 | 0.9815 | 1.0069
8 | Recall | 0.9992 | 0.0020 | 0.9974 | 1.0010
8 | F1-score | 0.9966 | 0.0081 | 0.9891 | 1.0041
9 | Accuracy | 0.9981 | 0.0039 | 0.9945 | 1.0018
9 | Precision | 0.9878 | 0.0248 | 0.9649 | 1.0107
9 | Recall | 0.9981 | 0.0039 | 0.9945 | 1.0018
9 | F1-score | 0.9926 | 0.0153 | 0.9784 | 1.0067
10 | Accuracy | 0.9875 | 0.0327 | 0.9573 | 1.0177
10 | Precision | 0.9951 | 0.0104 | 0.9855 | 1.0047
10 | Recall | 0.9875 | 0.0327 | 0.9573 | 1.0177
10 | F1-score | 0.9905 | 0.0235 | 0.9687 | 1.0123
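For reference, the appendix tables report each metric as a mean with its standard deviation and a 95% confidence interval across evaluation runs. The sketch below shows one plausible way to produce such a summary from a list of per-run scores; the t-based interval, the sample standard deviation, and the number of runs are illustrative assumptions rather than details taken from the study.

```python
import numpy as np
from scipy import stats

def summarise_metric(scores, confidence=0.95):
    """Return (mean, std, ci_low, ci_high) for a list of per-run metric values."""
    scores = np.asarray(scores, dtype=float)
    n = scores.size
    mean = scores.mean()
    std = scores.std(ddof=1)                      # sample standard deviation
    t_crit = stats.t.ppf(0.5 + confidence / 2.0, df=n - 1)
    half_width = t_crit * std / np.sqrt(n)        # t-based half-width (assumed form)
    # A symmetric interval can exceed 1.0 when the metric sits close to its
    # upper bound, as in several rows of Tables A7 and A8.
    return mean, std, mean - half_width, mean + half_width

# Example: accuracies from seven hypothetical cross-validation runs
print(summarise_metric([0.981, 0.975, 0.990, 0.984, 0.979, 0.986, 0.983]))
```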

References

  1. World Health Organization (WHO). Falls; WHO: Geneva, Switzerland, 2021. [Google Scholar]
  2. Pfortmueller, C.A.; Lindner, G.; Exadaktylos, A.K. Reducing fall risk in the elderly: Risk factors and fall prevention, a systematic review. Minerva Med. 2014, 105, 275–281. [Google Scholar] [PubMed]
  3. Lo, C.W.T.; Tsang, W.W.N.; Yan, C.H.; Lord, S.R.; Hill, K.D.; Wong, A.Y.L. Risk factors for falls in patients with total hip arthroplasty and total knee arthroplasty: A systematic review and meta-analysis. Osteoarthr. Cartil. 2019, 27, 979–993. [Google Scholar] [CrossRef] [PubMed]
  4. Fasano, A.; Canning, C.G.; Hausdorff, J.M.; Lord, S.; Rochester, L. Falls in Parkinson’s disease: A complex and evolving picture. Mov. Disord. 2017, 32, 1524–1536. [Google Scholar] [CrossRef] [PubMed]
  5. Härlein, J.; Dassen, T.; Halfens, R.J.G.; Heinze, C. Fall risk factors in older people with dementia or cognitive impairment: A systematic review. J. Adv. Nurs. 2009, 65, 922–933. [Google Scholar] [CrossRef]
  6. Batchelor, F.A.; Mackintosh, S.F.; Said, C.M.; Hill, K.D. Falls after stroke. Int. J. Stroke 2012, 7, 482–490. [Google Scholar] [CrossRef]
  7. Gunn, H.J.; Newell, P.; Haas, B.; Marsden, J.F.; Freeman, J.A. Identification of risk factors for falls in multiple sclerosis: A systematic review and meta-analysis. Phys. Ther. 2013, 93, 504–513. [Google Scholar] [CrossRef]
  8. Hunter, S.W.; Batchelor, F.; Hill, K.D.; Hill, A.M.; Mackintosh, S.; Payne, M. Risk factors for falls in people with a lower limb amputation: A systematic review. PM R 2017, 9, 170–180.e1. [Google Scholar] [CrossRef]
  9. Wang, Y.; Cang, S.; Yu, H. A survey on wearable sensor modality centred human activity recognition in health care. Expert Syst. Appl. 2019, 137, 167–190. [Google Scholar] [CrossRef]
  10. Zhao, H.; Wang, R.; Qi, D.; Xie, J.; Cao, J.; Liao, W.H. Wearable gait monitoring for diagnosis of neurodegenerative diseases. Measurement 2022, 202, 111839. [Google Scholar] [CrossRef]
  11. Chen, S.; Lach, J.; Lo, B.; Yang, G.Z. Toward pervasive gait analysis with wearable sensors: A systematic review. IEEE J. Biomed. Health Inform. 2016, 20, 1521–1537. [Google Scholar] [CrossRef]
  12. Hu, X.; Qu, X. Pre-impact fall detection. Biomed. Eng. Online 2016, 15, 61. [Google Scholar] [CrossRef] [PubMed]
  13. Tamura, T.; Yoshimura, T.; Sekine, M.; Uchida, M.; Tanaka, O. A wearable airbag to prevent fall injuries. IEEE Trans. Inf. Technol. Biomed. 2009, 13, 910–914. [Google Scholar] [CrossRef] [PubMed]
  14. De-La-Hoz-Franco, E.; Ariza-Colpas, P.; Quero, J.M.; Espinilla, M. Sensor-Based Datasets for Human Activity Recognition—A Systematic Review of Literature. IEEE Access 2018, 6, 59192–59210. [Google Scholar] [CrossRef]
  15. Nguyen, H.D.; Tran, K.P.; Zeng, X.; Koehl, L.; Tartare, G. Wearable Sensor Data Based Human Activity Recognition using Machine Learning: A new approach. arXiv 2019, arXiv:1905.03809. [Google Scholar] [CrossRef]
  16. Murad, A.; Pyun, J.Y. Deep Recurrent Neural Networks for Human Activity Recognition. Sensors 2017, 17, 2556. [Google Scholar] [CrossRef]
  17. Jiang, W.; Yin, Z. Human Activity Recognition Using Wearable Sensors by Deep Convolutional Neural Networks. In Proceedings of the 23rd ACM International Conference on Multimedia (MM ’15), Brisbane, Australia, 26–30 October 2015; pp. 1307–1310. [Google Scholar] [CrossRef]
  18. Das Antar, A.; Ahmed, M.; Ahad, M.A.R. Challenges in Sensor-based Human Activity Recognition and a Comparative Analysis of Benchmark Datasets: A Review. In Proceedings of the 2019 Joint 8th International Conference on Informatics, Electronics & Vision (ICIEV) and 2019 3rd International Conference on Imaging, Vision & Pattern Recognition (icIVPR), Spokane, WA, USA, 30 May–2 June 2019; pp. 134–139. [Google Scholar] [CrossRef]
  19. Straczkiewicz, M.; James, P.; Onnela, J.P. A systematic review of smartphone-based human activity recognition methods for health research. NPJ Digit. Med. 2021, 4, 148. [Google Scholar] [CrossRef]
  20. Chung, S.; Lim, J.; Noh, K.J.; Kim, G.; Jeong, H. Sensor Data Acquisition and Multimodal Sensor Fusion for Human Activity Recognition Using Deep Learning. Sensors 2019, 19, 1716. [Google Scholar] [CrossRef]
  21. Diraco, G.; Rescio, G.; Siciliano, P.; Leone, A. Review on human action recognition in smart living: Sensing Technology, Multimodality, Real-time Processing, Interoperability, and resource-Constrained Processing. Sensors 2023, 23, 5281. [Google Scholar] [CrossRef]
  22. Majumder, S.; Mondal, T.; Deen, M.J. Wearable sensors for remote health monitoring. Sensors 2017, 17, 130. [Google Scholar] [CrossRef]
  23. Banos, O.; Galvez, J.M.; Damas, M.; Pomares, H.; Rojas, I. Window Size Impact in Human Activity Recognition. Sensors 2014, 14, 6474–6499. [Google Scholar] [CrossRef]
  24. Baños, O.; Damas, M.; Pomares, H.; Rojas, I.; Tóth, M.A.; Amft, O. A benchmark dataset to evaluate sensor displacement in activity recognition. In Proceedings of the 2012 ACM Conference on Ubiquitous Computing (UbiComp ’12), Pittsburgh, PA, USA, 5–8 September 2012; pp. 1026–1035. [Google Scholar] [CrossRef]
  25. Niazi, A.H.; Yazdansepas, D.; Gay, J.L.; Maier, F.W.; Ramaswamy, L.; Rasheed, K.; Buman, M.P. Statistical Analysis of Window Sizes and Sampling Rates in Human Activity Recognition. In Proceedings of the HEALTHINF, Porto, Portugal, 21–23 February 2017; pp. 319–325. [Google Scholar]
  26. Li, H.; Abowd, G.D.; Plötz, T. On specialized window lengths and detector based human activity recognition. In Proceedings of the 2018 ACM International Symposium on Wearable Computers (ISWC ’18), Singapore, 8–12 October 2018; pp. 68–71. [Google Scholar] [CrossRef]
  27. Dehghani, A.; Sarbishei, O.; Glatard, T.; Shihab, E. A Quantitative Comparison of Overlapping and Non-Overlapping Sliding Windows for Human Activity Recognition Using Inertial Sensors. Sensors 2019, 19, 5026. [Google Scholar] [CrossRef]
  28. Duan, Y.; Fujinami, K. Effect of Combinations of Sensor Positions on Wearable-sensor-based Human Activity Recognition. Sens. Mater. 2023, 35, 2175–2193. [Google Scholar] [CrossRef]
  29. Kulchyk, J.; Etemad, A. Activity Recognition with Wearable Accelerometers using Deep Convolutional Neural Network and the Effect of Sensor Placement. In Proceedings of the 2019 IEEE SENSORS, Montreal, QC, Canada, 27–30 October 2019; pp. 1–4. [Google Scholar] [CrossRef]
  30. Khan, M.U.S.; Abbas, A.; Ali, M.; Jawad, M.; Khan, S.U.; Li, K.; Zomaya, A.Y. On the Correlation of Sensor Location and Human Activity Recognition in Body Area Networks (BANs). IEEE Syst. J. 2018, 12, 82–91. [Google Scholar] [CrossRef]
  31. Maurer, U.; Smailagic, A.; Siewiorek, D.; Deisher, M. Activity recognition and monitoring using multiple sensors on different body positions. In Proceedings of the International Workshop on Wearable and Implantable Body Sensor Networks (BSN’06), Cambridge, MA, USA, 3–5 April 2006; pp. 4–116. [Google Scholar] [CrossRef]
  32. Orha, I.; Oniga, S. Study regarding the optimal sensors placement on the body for human activity recognition. In Proceedings of the 2014 IEEE 20th International Symposium for Design and Technology in Electronic Packaging (SIITME), Bucharest, Romania, 23–26 October 2014; pp. 203–206. [Google Scholar] [CrossRef]
  33. Zhang, M.; Sawchuk, A.A. USC-HAD: A daily activity dataset for ubiquitous activity recognition using wearable sensors. In Proceedings of the 2012 ACM Conference on Ubiquitous Computing (UbiComp ’12), Pittsburgh, PA, USA, 5–8 September 2012; pp. 1036–1043. [Google Scholar] [CrossRef]
  34. Han, C.; Zhang, L.; Tang, Y.; Huang, W.; Min, F.; He, J. Human activity recognition using wearable sensors by heterogeneous convolutional neural networks. Expert Syst. Appl. 2022, 198, 116764. [Google Scholar] [CrossRef]
  35. Chereshnev, R.; Kertesz-Farkas, A. HuGaDB: Human Gait Database for Activity Recognition from Wearable Inertial Sensor Networks. arXiv 2017, arXiv:1705.08506. Available online: http://arxiv.org/abs/1705.08506 (accessed on 16 September 2023).
  36. Camargo, J.; Ramanathan, A.; Flanagan, W.; Young, A. A comprehensive, open-source dataset of lower limb biomechanics in multiple conditions of stairs, ramps, and level-ground ambulation and transitions. J. Biomech. 2021, 119, 110320. [Google Scholar] [CrossRef]
  37. Liu, H.; Hartmann, Y.; Schultz, T. CSL-SHARE: A Multimodal Wearable Sensor-Based Human Activity Dataset. Front. Comput. Sci. 2021, 3, 90. [Google Scholar] [CrossRef]
  38. Gay, J.L.; Cherof, S.A.; LaFlamme, C.C.; O’Connor, P.J. Psychological Aspects of Stair Use: A Systematic Review. Am. J. Lifestyle Med. 2019, 16, 109–121. [Google Scholar] [CrossRef]
  39. Bridenbaugh, S.A.; Kressig, R.W. Laboratory Review: The Role of Gait Analysis in Seniors’ Mobility and Fall Prevention. Gerontology 2010, 57, 256–264. [Google Scholar] [CrossRef]
  40. Zhang, M.; Sawchuk, A. A Feature Selection-Based Framework for Human Activity Recognition Using Wearable Multimodal Sensors. In Proceedings of the 6th International ICST Conference on Body Area Networks (BodyNets ’11), Beijing, China, 7–8 November 2011; ACM: New York, NY, USA, 2011. [Google Scholar] [CrossRef]
  41. Li, F.; Shirahama, K.; Nisar, M.; Köping, L.; Grzegorzek, M. Comparison of Feature Learning Methods for Human Activity Recognition Using Wearable Sensors. Sensors 2018, 18, 679. [Google Scholar] [CrossRef]
  42. Zurbuchen, N.; Bruegger, P.; Wilde, A. A comparison of machine learning algorithms for fall detection using wearable sensors. In Proceedings of the 2020 International Conference on Artificial Intelligence in Information and Communication (ICAIIC), Fukuoka, Japan, 19–21 February 2020. [Google Scholar]
  43. Zia, U.; Khalil, W.; Khan, S.; Ahmad, I.; Khan, M.N. Towards human activity recognition for ubiquitous health care using data from a waist-mounted smartphone. Turk. J. Electr. Eng. Comput. Sci. 2020, 28, 646–663. [Google Scholar] [CrossRef]
  44. Zhang, H.; Chen, Z.; Zanotto, D.; Guo, Y. Robot-Assisted and Wearable Sensor-Mediated Autonomous Gait Analysis. In Proceedings of the 2020 IEEE International Conference on Robotics and Automation (ICRA), Paris, France, 31 May–31 August 2020. [Google Scholar] [CrossRef]
  45. Reches, T.; Dagan, M.; Herman, T.; Gazit, E.; Gouskova, N.; Giladi, N.; Manor, B.; Hausdorff, J. Using Wearable Sensors and Machine Learning to Automatically Detect Freezing of Gait during a FOG-Provoking Test. Sensors 2020, 20, 4474. [Google Scholar] [CrossRef] [PubMed]
  46. Chen, Z.; Zhang, L.; Cao, Z.; Guo, J. Distilling the Knowledge From Handcrafted Features for Human Activity Recognition. IEEE Trans. Ind. Inform. 2018, 14, 4334–4342. [Google Scholar] [CrossRef]
  47. Hassan, M.M.; Uddin, M.Z.; Mohamed, A.; Almogren, A. A robust human activity recognition system using smartphone sensors and deep learning. Future Gener. Comput. Syst. 2018, 81, 307–313. [Google Scholar] [CrossRef]
  48. Ferrari, A.; Micucci, D.; Mobilio, M.; Napoletano, P. On the Personalization of Classification Models for Human Activity Recognition. IEEE Access 2020, 8, 32066–32079. [Google Scholar] [CrossRef]
Figure 1. Trend graphs showing the mean accuracy across all models and window sizes for the four datasets in this analysis when using TTS cross-validation. (a) USC-HAD. (b) Camargo et al. (c) HuGaDB. (d) CSL-SHARE.
Figure 2. Trend graphs showing the mean F1-score across all models and window sizes for the four datasets in this analysis when using TTS cross-validation. (a) USC-HAD. (b) Camargo et al. (c) HuGaDB. (d) CSL-SHARE.
Figure 3. Model and window size effect on classification accuracy across all four datasets using TTS cross-validation. The highest-performing model for each dataset and window size is marked in bold. (a) Average accuracy for each model across all window sizes for each dataset. (b) Average accuracy across all models at each window size from 1 to 10 s for each dataset.
Figure 4. Model and window size effect on F1-score across all four datasets using TTS cross-validation. The highest-performing model for each dataset and window size is marked in bold. (a) Average F1-score for each model across all window sizes for each dataset. (b) Average F1-score across all models at each window size from 1 to 10 s for each dataset.
Figure 5. Trend graphs showing the mean accuracy across all models and window sizes for the four datasets in this analysis when using LOSO cross-validation. (a) USC-HAD. (b) Camargo et al. (c) HuGaDB. (d) CSL-SHARE.
Figure 6. Trend graphs showing the mean F1-score across all models and window sizes for the four datasets in this analysis when using LOSO cross-validation. (a) USC-HAD. (b) Camargo et al. (c) HuGaDB. (d) CSL-SHARE.
Figure 7. Model and window size effect on classification accuracy across all four datasets using LOSO cross-validation. The highest-performing model for each dataset and window size is marked in bold. (a) Average accuracy for each model across all window sizes for each dataset. (b) Average accuracy across all models at each window size from 1 to 10 s for each dataset.
Figure 8. Model and window size effect on F1-score across all four datasets using LOSO cross-validation. The highest-performing model for each dataset and window size is marked in bold. (a) Average F1-score for each model across all window sizes for each dataset. (b) Average F1-score across all models at each window size from 1 to 10 s for each dataset.
Figure 9. Confusion matrices of an SVM trained on data from a single EMG sensor using LOSO cross-validation. (a) Camargo et al. Vastus Lateralis. (b) HuGaDB Vastus Lateralis. (c) CSL-SHARE Vastus Medialis EMG.
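Figure 9 pools the per-subject predictions of a single-EMG-sensor SVM into one confusion matrix per dataset. A minimal sketch of that pooling under LOSO cross-validation is given below; the feature matrix, label set, and kernel choice are placeholders, not the study's exact configuration.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay
from sklearn.model_selection import LeaveOneGroupOut
from sklearn.svm import SVC

# Placeholder data: features from one EMG channel, activity labels, subject IDs
rng = np.random.default_rng(1)
X_emg = rng.normal(size=(400, 8))
y = rng.integers(0, 4, size=400)
subjects = rng.integers(0, 8, size=400)

y_true, y_pred = [], []
for train_idx, test_idx in LeaveOneGroupOut().split(X_emg, y, groups=subjects):
    clf = SVC(kernel="rbf").fit(X_emg[train_idx], y[train_idx])
    y_true.extend(y[test_idx])                    # collect held-out labels
    y_pred.extend(clf.predict(X_emg[test_idx]))   # and the matching predictions

cm = confusion_matrix(y_true, y_pred, normalize="true")
ConfusionMatrixDisplay(cm).plot()                 # heat map in the spirit of Figure 9
```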
Table 1. The sensor type, position, and sample rate of each sensor in the Camargo et al. dataset.
Sensor | Position | Sample Rate
Goniometer | Hip, Knee, Ankle | 1000 Hz
Inertial Measurement Unit | Trunk, Thigh, Shank, Foot | 200 Hz
Electromyography Sensor | Gastrocnemius Medialis, Tibialis Anterior, Soleus, Vastus Medialis, Vastus Lateralis, Rectus Femoris, Biceps Femoris, Semitendinosus, Gracilis, Gluteus Medius, Right External Oblique | 1000 Hz
Table 2. A summary of the properties of each dataset in this analysis.
Dataset Features | USC-HAD | Camargo et al. | HuGaDB | CSL-SHARE
Participants | 14 | 22 | 18 | 20
Mean Age (Years) | 30.1 | 21 | 23.67 | 30.5
Mean Height (cm) | 170 | 170 | 179.06 | N/A
Mean Weight (kg) | 64.6 | 68.3 | 73.44 | N/A
IMU Sensors | 1 | 4 | 6 | 2
EMG Sensors | 0 | 11 | 2 | 4
Goniometers | 0 | 3 | 0 | 1
Acoustic Sensors | 0 | 0 | 0 | 1
Activities | 12 | 18 | 12 | 22
Sample Rate | 100 Hz | 200 Hz/1000 Hz | 60 Hz | 100 Hz/1000 Hz
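Because the datasets are sampled at different rates, a window of a given duration contains a different number of samples in each dataset; a 2 s window spans 200 samples at 100 Hz but only 120 samples at 60 Hz. The sketch below illustrates a generic overlapping sliding-window segmentation of this kind; the 50% overlap and the function name are assumptions for illustration, not the study's exact pipeline.

```python
import numpy as np

def sliding_windows(signal, sample_rate_hz, window_s, overlap=0.5):
    """Split a (samples, channels) array into fixed-length, possibly overlapping windows."""
    window_len = int(round(window_s * sample_rate_hz))          # samples per window
    step = max(1, int(round(window_len * (1.0 - overlap))))     # hop between window starts
    starts = range(0, signal.shape[0] - window_len + 1, step)
    return np.stack([signal[s:s + window_len] for s in starts])

# Placeholder signal: 10 s of six-channel IMU data at 100 Hz
dummy = np.random.randn(1000, 6)
print(sliding_windows(dummy, sample_rate_hz=100, window_s=2).shape)   # (9, 200, 6)
```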
Table 3. Maximum accuracy, precision, recall, and F1-Score for each dataset, non-ensemble model, and method of cross-validation.
Dataset | Model | Window Size (s) | Acc (%) | Prec (%) | Rec (%) | F1-Score (%)
USC-HAD TTS | SVM | 5 | 99.90 | 99.73 | 99.90 | 99.81
USC-HAD LOSO | SVM | 10 | 91.89 | 79.29 | 91.89 | 81.17
Camargo et al. TTS | SVM | 4 | 86.15 | 92.56 | 92.52 | 92.51
Camargo et al. LOSO | ANN | 5 | 80.41 | 86.66 | 86.06 | 85.19
HuGaDB TTS | SVM | 4 | 99.97 | 99.82 | 99.97 | 99.90
HuGaDB LOSO | ANN | 2 | 100 | 100 | 100 | 100
CSL-SHARE TTS | ALL | 2 | 100 | 100 | 100 | 100
CSL-SHARE LOSO | ALL | 3 | 100 | 100 | 100 | 100
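Table 3 contrasts two evaluation protocols: a train-test split (TTS), in which windows from every participant can appear in both partitions, and leave-one-subject-out (LOSO) cross-validation, in which each participant is held out in turn. The sketch below shows how both protocols can be set up with scikit-learn; the placeholder feature matrix, the RBF-kernel SVM, and the split sizes are assumptions for illustration.

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.model_selection import LeaveOneGroupOut, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Placeholder data: window-level features, activity labels, and subject IDs
rng = np.random.default_rng(0)
X = rng.normal(size=(600, 40))
y = rng.integers(0, 5, size=600)
subjects = rng.integers(0, 10, size=600)

model = make_pipeline(StandardScaler(), SVC(kernel="rbf"))

# TTS: a single random split, so the same subjects contribute to train and test
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
tts_acc = accuracy_score(y_te, model.fit(X_tr, y_tr).predict(X_te))

# LOSO: every subject is held out once, giving a subject-independent estimate
loso_accs = []
for train_idx, test_idx in LeaveOneGroupOut().split(X, y, groups=subjects):
    model.fit(X[train_idx], y[train_idx])
    loso_accs.append(accuracy_score(y[test_idx], model.predict(X[test_idx])))

print(f"TTS accuracy: {tts_acc:.3f}, mean LOSO accuracy: {np.mean(loso_accs):.3f}")
```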
Table 4. Subject-dependent performance metrics of each individual sensor in the Camargo et al. dataset.
Sensor | Precision | Recall | F1-Score | Accuracy
Trunk IMU | 0.801 | 0.798 | 0.799 | 0.897
Thigh IMU | 0.753 | 0.751 | 0.744 | 0.874
Shank IMU | 0.778 | 0.769 | 0.772 | 0.881
Foot IMU | 0.814 | 0.787 | 0.774 | 0.894
Gastrocnemius Medialis EMG | 0.716 | 0.630 | 0.621 | 0.758
Tibialis Anterior EMG | 0.636 | 0.547 | 0.523 | 0.755
Soleus EMG | 0.676 | 0.620 | 0.629 | 0.774
Vastus Medialis EMG | 0.459 | 0.493 | 0.470 | 0.652
Vastus Lateralis EMG | 0.158 | 0.256 | 0.169 | 0.458
Rectus Femoris EMG | 0.185 | 0.252 | 0.212 | 0.374
Biceps Femoris EMG | 0.296 | 0.348 | 0.302 | 0.561
Semitendinosus EMG | 0.216 | 0.296 | 0.242 | 0.423
Gracilis EMG | 0.763 | 0.460 | 0.456 | 0.652
Gluteus Medius EMG | 0.348 | 0.357 | 0.316 | 0.577
Right External Oblique EMG | 0.372 | 0.372 | 0.336 | 0.594
Ankle Goniometer | 0.741 | 0.747 | 0.708 | 0.874
Knee Goniometer | 0.410 | 0.500 | 0.445 | 0.742
Hip Goniometer | 0.753 | 0.744 | 0.742 | 0.868
Table 5. Subject-dependent performance metrics of each individual sensor in the HuGaDB dataset.
Sensor | Precision | Recall | F1-Score | Accuracy
Right Thigh IMU | 0.990 | 0.994 | 0.992 | 0.995
Left Thigh IMU | 0.993 | 0.996 | 0.995 | 0.997
Right Shank IMU | 0.995 | 0.997 | 0.996 | 0.998
Left Shank IMU | 0.989 | 0.990 | 0.989 | 0.993
Right Foot IMU | 0.973 | 0.979 | 0.976 | 0.987
Left Foot IMU | 0.978 | 0.984 | 0.981 | 0.991
Right Vastus Lateralis EMG | 0.669 | 0.509 | 0.506 | 0.775
Left Vastus Lateralis EMG | 0.597 | 0.478 | 0.457 | 0.783
Table 6. Subject-dependent performance metrics of each individual sensor in the CSL-SHARE dataset.
Sensor | Precision | Recall | F1-Score | Accuracy
Vastus Medialis EMG | 0.691 | 0.699 | 0.695 | 0.661
Tibialis Anterior EMG | 0.659 | 0.648 | 0.644 | 0.592
Biceps Femoris EMG | 0.430 | 0.383 | 0.391 | 0.367
Gastrocnemius EMG | 0.582 | 0.550 | 0.534 | 0.475
Airborne Microphone | 0.550 | 0.536 | 0.534 | 0.454
Thigh IMU | 1.000 | 1.000 | 1.000 | 1.000
Shank IMU | 1.000 | 1.000 | 1.000 | 1.000
Knee Goniometer | 0.997 | 0.996 | 0.997 | 0.996
Table 7. Subject-independent performance metrics of each individual sensor in the Camargo et al. dataset.
Sensor | Precision | Recall | F1-Score | Accuracy
Trunk IMU | 0.781 | 0.787 | 0.754 | 0.787
Thigh IMU | 0.299 | 0.547 | 0.386 | 0.547
Shank IMU | 0.680 | 0.720 | 0.679 | 0.720
Foot IMU | 0.795 | 0.800 | 0.788 | 0.800
Gastrocnemius Medialis EMG | 0.513 | 0.600 | 0.532 | 0.600
Tibialis Anterior EMG | 0.272 | 0.227 | 0.226 | 0.227
Soleus EMG | 0.599 | 0.347 | 0.381 | 0.347
Vastus Medialis EMG | 0.110 | 0.173 | 0.120 | 0.173
Vastus Lateralis EMG | 0.453 | 0.307 | 0.361 | 0.307
Rectus Femoris EMG | 0.072 | 0.147 | 0.085 | 0.147
Biceps Femoris EMG | 0.475 | 0.400 | 0.409 | 0.400
Semitendinosus EMG | 0.404 | 0.307 | 0.307 | 0.307
Gracilis EMG | 0.080 | 0.173 | 0.110 | 0.173
Gluteus Medius EMG | 0.548 | 0.667 | 0.556 | 0.667
Right External Oblique EMG | 0.419 | 0.187 | 0.157 | 0.187
Ankle Goniometer | 0.738 | 0.800 | 0.759 | 0.800
Knee Goniometer | 0.285 | 0.267 | 0.229 | 0.267
Hip Goniometer | 0.927 | 0.880 | 0.859 | 0.880
Table 8. Subject-independent performance metrics of each individual sensor in the HuGaDB dataset.
Sensor | Precision | Recall | F1-Score | Accuracy
Right Thigh IMU | 1.000 | 1.000 | 1.000 | 1.000
Left Thigh IMU | 0.970 | 0.966 | 0.966 | 0.984
Right Shank IMU | 1.000 | 1.000 | 1.000 | 1.000
Left Shank IMU | 0.976 | 0.997 | 0.986 | 0.992
Right Foot IMU | 0.953 | 0.960 | 0.952 | 0.982
Left Foot IMU | 0.874 | 0.824 | 0.779 | 0.923
Right Vastus Lateralis EMG | 0.211 | 0.290 | 0.229 | 0.478
Left Vastus Lateralis EMG | 0.428 | 0.330 | 0.330 | 0.726
Table 9. Subject-independent performance metrics of each individual sensor in the CSL-SHARE dataset.
Sensor | Precision | Recall | F1-Score | Accuracy
Vastus Medialis EMG | 0.846 | 0.634 | 0.624 | 0.757
Tibialis Anterior EMG | 0.475 | 0.375 | 0.332 | 0.456
Biceps Femoris EMG | 0.366 | 0.361 | 0.270 | 0.417
Gastrocnemius EMG | 0.300 | 0.458 | 0.354 | 0.573
Airborne Microphone | 0.525 | 0.517 | 0.475 | 0.427
Thigh IMU | 0.992 | 0.993 | 0.992 | 0.990
Shank IMU | 0.935 | 0.931 | 0.924 | 0.903
Knee Goniometer | 0.884 | 0.767 | 0.706 | 0.738
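Tables 4 to 9 score each sensor in isolation by training a model only on the feature columns derived from that sensor. A minimal sketch of such a per-sensor loop is shown below, using scikit-learn's MLPClassifier as a stand-in for the ANN; the column grouping, network size, and five-fold evaluation are assumptions for illustration.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier

# Hypothetical mapping from each sensor to the feature columns it contributes
sensor_columns = {
    "Thigh IMU": list(range(0, 24)),
    "Shank IMU": list(range(24, 48)),
    "Knee Goniometer": list(range(48, 54)),
}

# Placeholder feature matrix and activity labels
rng = np.random.default_rng(2)
X = rng.normal(size=(500, 54))
y = rng.integers(0, 6, size=500)

for sensor, cols in sensor_columns.items():
    ann = MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=500, random_state=0)
    scores = cross_val_score(ann, X[:, cols], y, cv=5)   # subject-dependent folds here
    print(f"{sensor}: mean accuracy {scores.mean():.3f}")
```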