Effect of parameter selection on entropy calculation for long walking trials
Introduction
The use of entropy methods to calculate regularity or predictability in a time series has vastly increased over the last twenty years. Claude Shannon was the first scientist to introduce an algorithm for calculating information entropy within a time series [1]. Modifications to the original Shannon entropy algorithm have been made, including approximate [2], [3], sample [4], [5], multiscale [6], [7], increment [8], permutation [9], and multiscale permutation [10] entropy, just to name a few. The use of entropy analysis has been wide ranging from investigations in financial markets to weather patterns to thermodynamics to biology.
Each entropy algorithm requires parameter selection that may or may not affect the calculation of entropy [11], [12]. The parameter m is the number of data points to be compared, sometimes called the window or vector length. The parameter r is the tolerance, sometimes called the radius, used to determine whether two vectors are considered similar. The last parameter, N, is the length of the entire data set.
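To make the roles of m, r, and N concrete, the following minimal sketch builds the overlapping vectors of length m from a short series and tests two of them for similarity within r. The step times, the helper names `templates` and `is_match`, and the use of the maximum pointwise difference with a non-strict comparison are illustrative assumptions, not details taken from the study.

```python
def templates(series, m):
    """Return all overlapping windows (vectors) of length m."""
    return [series[i:i + m] for i in range(len(series) - m + 1)]


def is_match(a, b, r):
    """Treat two vectors as similar if every pair of corresponding
    points differs by at most r (maximum-difference distance)."""
    return max(abs(x - y) for x, y in zip(a, b)) <= r


# Made-up step times in seconds, for illustration only.
step_times = [1.02, 1.05, 1.01, 1.04, 1.00, 1.03]
N = len(step_times)   # length of the entire data set
m = 2                 # number of points compared at a time
r = 0.02              # tolerance in seconds (hypothetical value)

vecs = templates(step_times, m)
print(N, len(vecs))
print(is_match(vecs[0], vecs[2], r), is_match(vecs[0], vecs[1], r))
```

A series of N points yields N - m + 1 such vectors, so a larger m leaves fewer vectors to compare and makes matches rarer for the same r.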
The two most popular entropy algorithms utilized for human movement analysis are approximate entropy (ApEn) and sample entropy (SampEn). ApEn was originally developed as a method to measure regularity within a time series [2], [3], [13]. However, a limitation of ApEn is that it is biased toward regularity because it includes a self-match count to avoid taking the logarithm of zero [4], [5]. Furthermore, it requires fixed parameters when comparing data [13] and lacks relative consistency [13], a term used to describe how stable the output of the algorithm is when input parameter selection is changed slightly. Our previous work has demonstrated that ApEn is more prone to inconsistent output than SampEn when utilized for time series of 200 data points or fewer [12]. However, both algorithms were sensitive to certain combinations of parameters. When reporting entropy results, it is essential that the results are not an artifact of parameter choice. In addition, due to the sensitivity of ApEn to parameter choice, comparisons between studies and data can only be made if the parameter choices are fixed. This is very difficult, as each data set requires careful selection of parameters based upon that unique set of data. SampEn was developed to overcome the limitations of ApEn [4], [5]. The calculation of SampEn does not include self-matches, and a single logarithm is taken of the ratio of match counts summed over all templates, whereas ApEn averages the logarithms of template-wise conditional probabilities. As stated by Richman and Moorman [5], ApEn calculates probabilities “in a template-wise fashion” whereas SampEn calculates “the negative logarithm of a probability associated with the time series as a whole” (p. H2042).
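The difference between the two algorithms can be sketched in a few lines of NumPy. This is a simplified illustration based on the standard published definitions, not the code used in this study; the helper names, the non-strict comparison `d <= r`, and the choice to return infinity when no matches are found are assumptions. The pairwise distance matrix uses memory that grows with the square of N, so the sketch is only practical for short to moderate series.

```python
import numpy as np


def _embed(x, m, count):
    """Stack the first `count` overlapping templates of length m as rows."""
    return np.array([x[i:i + m] for i in range(count)])


def _match_counts(t, r):
    """For each template, count templates within maximum-difference
    distance r. The count includes the template itself (self-match)."""
    d = np.max(np.abs(t[:, None, :] - t[None, :, :]), axis=2)
    return np.sum(d <= r, axis=1)


def apen(x, m, r):
    """Approximate entropy: average of template-wise log probabilities."""
    x = np.asarray(x, dtype=float)
    n = len(x)

    def phi(k):
        t = _embed(x, k, n - k + 1)
        c = _match_counts(t, r) / (n - k + 1)  # self-match keeps c > 0
        return np.mean(np.log(c))

    return phi(m) - phi(m + 1)


def sampen(x, m, r):
    """Sample entropy: one log of the ratio of total match counts."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    # Use the same n - m templates for both lengths; subtract 1 per
    # template to exclude self-matches.
    b = np.sum(_match_counts(_embed(x, m, n - m), r) - 1)
    a = np.sum(_match_counts(_embed(x, m + 1, n - m), r) - 1)
    if a == 0 or b == 0:
        return float("inf")  # assumption: report "no matches" as infinity
    return float(-np.log(a / b))


# Illustrative inputs only: a perfectly periodic series and white noise.
periodic = [1.0, 2.0] * 10
noise = np.random.default_rng(0).normal(size=200)

print(sampen(periodic, m=2, r=0.5), apen(periodic, m=2, r=0.5))
print(sampen(noise, m=2, r=0.2 * np.std(noise)))
```

For the periodic series SampEn is exactly zero, while ApEn is close to but not exactly zero; for the noise series both are clearly larger.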
One of the most difficult parameters to select is the tolerance value, r. If the tolerance is large compared to the differences between sequential points, the probability values used to calculate entropy will be high, and vice versa. Therefore, selection of the tolerance has a crucial impact on the outcome. Many methods have been proposed to determine r, including using the standard deviation (SD) of the entire time series [2], [13], using the standard error of the entropy values [4], and using fixed tolerance values [14], [15].
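Two of these approaches can be contrasted with a short sketch: a tolerance scaled to the SD of the series, and a grid of fixed tolerances expressed as multiples of the sampling period, as the Conclusion suggests. The fraction 0.2, the use of the sample SD, the 1000 Hz sampling rate, and the function names are hypothetical choices made for illustration.

```python
import statistics


def tolerance_from_sd(series, fraction=0.2):
    """Tolerance as a fraction of the series' sample standard deviation.
    The default fraction is an assumed, commonly used value."""
    return fraction * statistics.stdev(series)


def fixed_r_grid(n, sampling_rate_hz):
    """Fixed tolerances from 0 to n / sampling rate in steps of
    1 / sampling rate (one step per sampling period)."""
    return [k / sampling_rate_hz for k in range(n + 1)]


# Made-up step times in seconds with low and high variability.
low_var = [1.00, 1.02, 1.04]
high_var = [1.00, 1.10, 1.20]

r_low = tolerance_from_sd(low_var)
r_high = tolerance_from_sd(high_var)
grid = fixed_r_grid(3, 1000.0)  # hypothetical 1000 Hz foot switch signal

print(r_low, r_high)
print(grid)
```

The SD-based tolerance changes with the variability of each series, so two trials are judged against different thresholds, whereas a fixed tolerance applies the same threshold in seconds to every trial.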
The length of the data set is a parameter of concern for human movement scientists. When dealing with pathological populations, it is sometimes difficult to obtain data sets long enough for analysis. Data sets of fewer than 200 points appear to be too short for entropy analysis, yet it may take up to 2000 data points for entropy values to stabilize [12]. It is currently unclear how many data points are needed for reliable entropy analysis.
Another limitation of entropy analysis is the need for collection of uninterrupted data [2], [3]. Because the analysis requires a large amount of uninterrupted data, whether continuous or discrete, many researchers have subjects walk on a treadmill rather than overground. This allows the researcher to collect uninterrupted steps without the concern of space and/or equipment constraints. However, the treadmill could itself be considered a constraint, as it limits the speed fluctuations that are normally present in overground walking. In addition, there are physiological and biomechanical differences between overground and treadmill walking [16], [17], [18], [19], [20], [21], [22], [23], [24], [25]. Nonlinear measures have shown conflicting results regarding the difference in the structure of variability between treadmill and overground walking [26], [27], [28], [29], [30].
Thus, the purpose of this research was to determine the effect of changing parameter values on entropy calculations for long gait data sets (i.e., step time) using two different modes of walking. In addition, to understand the effect of changing parameters on entropy calculations, an examination of tolerance, r, was completed. It was hypothesized that SampEn would maintain relative consistency across all data lengths and be resistant to changes in parameter values.
Materials and methods
Twenty-one subjects participated in this research study. Foot switch data was collected from subjects, but subjects whose foot switch data contained any signal dropout were excluded. Therefore, 14 subjects’ data from the original cohort were included in analysis (7 males; 24.9 ± 4.2 years; 1.71 ± 0.12 m; 69.3 ± 16.8 kg). All participants were in excellent health, had no conditions that would inhibit their ability to walk for one hour, and reported physical activity at or above the currently recommended
rSD
A significant interaction was found between r and N (p = 0.006) and the interaction between N and m was marginally significant (p = 0.05), meaning ApEn differed depending on the combination of r, m, and N (Fig. 2, Fig. 3). Treadmill walking was more regular than overground walking.
For SampEn, there was a significant effect of r (p < 0.001); as r increased, the value of SampEn decreased. When r was increased to a critical level, the tolerance was large enough to allow for additional matches to be
Discussion
The overall objective of this research study was to determine the effect of changing parameter values of m, r, and N on entropy calculations for long data sets using two different walking conditions. It was hypothesized that SampEn would maintain relative consistency across all data lengths and be resistant to changes in parameter values. Our hypotheses were partially supported.
Relative consistency was defined by Richman and Moorman [5], “if [entropy] of one data set is higher than that of
Conclusion
Overall, a proper combination of r, m, and N should be selected to provide relative consistency. For greatest relative consistency of step time data, it was best to use a constant r value and SampEn. When using a constant r, we suggest examining r values from 0 to n*1/sampling rate in increments of 1/sampling rate, with n being the number of different values to be tested. Due to relative consistency issues for both algorithms, we encourage all future work using entropy analysis to publish
Conflict of interest
The authors have no conflicts of interest to disclose.
Acknowledgements
We would like to thank Mr. Eric Pisciotta and Mr. Josh Pickhinke for their assistance with data collections. Funding for this project was provided by NASA Nebraska Space Grant & EPSCoR and the National Institutes of Health (P20 GM109090).
References (31)
- et al. Multiscale entropy analysis of human gait dynamics. Phys. A-Stat. Mech. Appl. (2003)
- et al. Permutation and weighted-permutation entropy analysis for the complexity of nonlinear time series. Commun. Nonlin. Sci. Numer. Simulat. (2016)
- et al. Improved multiscale permutation entropy for biomedical signal analysis: interpretation and application to electroencephalogram recordings. Biomed. Signal Process. Control (2016)
- et al. The effect of signal acquisition and processing choices on ApEn values: towards a gold standard for distinguishing effort levels from isometric force records. Med. Eng. Phys. (2014)
- et al. A kinematic comparison of overground and treadmill walking. Clin. Biomech. (1998)
- et al. Determination of preferred walking speed on treadmill may lead to high oxygen cost on treadmill walking. Gait Posture (2010)
- et al. Ageing and limb dominance effects on foot-ground clearance during treadmill and overground walking. Clin. Biomech. (2011)
- et al. Comparison of pelvic complex kinematics during treadmill and overground walking. Arch. Phys. Med. Rehabil. (2012)
- et al. Familiarisation to treadmill walking in unimpaired older people. Gait Posture (2005)
- et al. Effect of treadmill walking on the stride interval dynamics of human gait. Gait Posture (2009)