Elsevier

Gait & Posture

Volume 60, February 2018, Pages 128-134

Full length article
Effect of parameter selection on entropy calculation for long walking trials

https://doi.org/10.1016/j.gaitpost.2017.11.023

Highlights

  • Parameter selection influences both approximate and sample entropy.

  • Multiplying r by the standard deviation can influence relative consistency.

  • Several parameters must be examined and reported.

Abstract

It is sometimes difficult to obtain uninterrupted data sets that are long enough to perform nonlinear analysis, especially in pathological populations. It is currently unclear how many data points are needed for reliable entropy analysis. The aims of this study were to determine the effect of changing parameter values of m, r, and N on entropy calculations for long gait data sets using two different modes of walking (i.e., overground versus treadmill). Fourteen young adults walked overground and on a treadmill at their preferred walking speed for one hour while step time was collected via heel switches. Approximate (ApEn) and sample entropy (SampEn) were calculated using multiple parameter combinations of m, N, and r. Further, r was tested under two cases: r multiplied by the standard deviation (rSD) and r held constant (rConstant). ApEn differed depending on the combination of r, m, and N. ApEn demonstrated relative consistency except when m = 2 and the smallest r values were used (rSD = 0.015*SD, 0.20*SD; rConstant = 0 and 0.003). For SampEn, as r increased, SampEn decreased. When r was constant, SampEn demonstrated excellent relative consistency for all combinations of r, m, and N. When r constant was used, overground walking was more regular than treadmill walking. However, treadmill walking was found to be more regular when using rSD for both ApEn and SampEn. For greatest relative consistency of step time data, it was best to use a constant r value and SampEn. When using entropy, several r values must be examined and reported to ensure that results are not an artifact of parameter choice.

Introduction

The use of entropy methods to calculate regularity or predictability in a time series has vastly increased over the last twenty years. Claude Shannon was the first scientist to introduce an algorithm for calculating information entropy within a time series [1]. Modifications to the original Shannon entropy algorithm have been made, including approximate [2], [3], sample [4], [5], multiscale [6], [7], increment [8], permutation [9], and multiscale permutation [10] entropy, to name a few. Entropy analysis has been applied widely, from financial markets to weather patterns to thermodynamics to biology.

Each entropy algorithm requires parameter selection that may or may not affect the calculation of entropy [11], [12]. The parameter m is the number of data points that are to be compared, sometimes called the window or vector of data. The parameter r is the set tolerance, sometimes called the radius, that is used to determine whether two vectors are considered similar. The last parameter, N, is the length of the entire data set.
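
The three parameters can be stated compactly in the notation commonly used in the ApEn and SampEn literature. The symbols u, x_m(i), and d below are assumed here for illustration and do not appear in the text above, and the maximum-difference distance is the usual choice rather than one stated in this section.

```latex
\begin{aligned}
&\text{time series: } u(1), u(2), \dots, u(N) \\
&\text{template vector: } x_m(i) = \bigl(u(i), u(i+1), \dots, u(i+m-1)\bigr) \\
&\text{distance: } d\bigl[x_m(i), x_m(j)\bigr] = \max_{0 \le k \le m-1} \lvert u(i+k) - u(j+k) \rvert \\
&\text{match: } d\bigl[x_m(i), x_m(j)\bigr] \le r
\end{aligned}
```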

The two most popular entropy algorithms utilized for human movement analysis are approximate entropy (ApEn) and sample entropy (SampEn). ApEn was originally developed as a method to measure regularity within a time series [2], [3], [13]. However, a limitation of ApEn is that it is biased toward regularity because it includes a self-match count to avoid taking the logarithm of zero [4], [5]. Furthermore, it requires fixed parameters when comparing data [13] and lacks relative consistency [13], a term used to describe how stable the output of the algorithm is when input parameter selection is changed slightly. Our previous work has demonstrated that ApEn is more prone to inconsistent output than SampEn when utilized for time series of 200 data points or fewer [12]. However, both algorithms were sensitive to certain combinations of parameters. When reporting entropy results, the highest priority is to ensure that the results are not an artifact of parameter choice. In addition, due to the sensitivity of ApEn to parameter choice, comparisons between studies and data can be made only if the parameter choices are fixed. This is very difficult, as each data set requires careful selection of parameters based upon that unique set of data. SampEn was developed to overcome the limitations of ApEn [4], [5]. Unlike ApEn, the calculation of SampEn does not include a self-match, and the logarithm is taken of the sum of conditional probabilities. As stated by Richman and Moorman [5], ApEn calculates probabilities “in a template-wise fashion” whereas SampEn calculates “the negative logarithm of a probability associated with the time series as a whole” (p. H2042).
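
As a rough illustration of the self-match difference described above, the following Python sketch computes both measures for a short series. It is a minimal version under stated assumptions and not the authors' implementation: the function names are hypothetical, distance is taken as the maximum absolute difference between template elements, a match is counted when that distance is at most r, and all pairwise distances are held in memory, which suits only short series.

```python
import numpy as np


def _templates(u, m, count):
    # Stack the first `count` windows of length m as rows of a 2-D array.
    return np.array([u[i:i + m] for i in range(count)])


def _match_counts(templates, r):
    # Maximum absolute difference between every pair of templates.
    diff = np.abs(templates[:, None, :] - templates[None, :, :])
    dist = diff.max(axis=2)
    # Number of templates within tolerance r of each template.
    # The count for row i includes the self-match (distance 0).
    return np.sum(dist <= r, axis=1)


def approximate_entropy(u, m, r):
    """ApEn sketch: self-matches are kept, so log(0) never occurs."""
    u = np.asarray(u, dtype=float)
    n = len(u)

    def phi(k):
        count = n - k + 1
        c = _match_counts(_templates(u, k, count), r) / count
        # Average of template-wise log probabilities.
        return np.mean(np.log(c))

    return phi(m) - phi(m + 1)


def sample_entropy(u, m, r):
    """SampEn sketch: self-matches are removed and one logarithm is taken."""
    u = np.asarray(u, dtype=float)
    n = len(u)
    count = n - m  # same number of templates for lengths m and m + 1
    b = np.sum(_match_counts(_templates(u, m, count), r) - 1)
    a = np.sum(_match_counts(_templates(u, m + 1, count), r) - 1)
    if a == 0 or b == 0:
        # Assumption of this sketch: report "undefined" as NaN.
        return float("nan")
    return -np.log(a / b)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    noise = rng.standard_normal(200)        # synthetic data, not gait data
    periodic = np.tile([1.0, 2.0], 100)     # perfectly regular series
    for name, series in [("noise", noise), ("periodic", periodic)]:
        r = 0.2 * np.std(series)
        print(name, approximate_entropy(series, 2, r), sample_entropy(series, 2, r))
```

With r = 0 and a series that contains no repeated values, the ApEn sketch still returns a finite number because every template matches itself, whereas the SampEn sketch reports the result as undefined.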

One of the most difficult parameters to select is the tolerance, r. If the tolerance is large compared to the differences in values of sequential points, the probability values used to calculate entropy will be high, and vice versa. Therefore, selection of the tolerance has a crucial impact on the outcome. Many methods have been proposed to determine r, including using the standard deviation (SD) of the entire time series [2], [13], the standard error of the entropy values [4], and fixed tolerance values [14], [15].
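
The sketch below illustrates, on synthetic data, how the choice between an SD-scaled tolerance and a fixed tolerance changes the share of template pairs that count as matches. Everything in it is assumed for illustration: the function names, the use of the sample SD (ddof=1), the step-time values in seconds, and the tolerance values are hypothetical and are not taken from the study.

```python
import numpy as np


def tolerance_from_sd(series, factor):
    # SD-scaled tolerance: the absolute tolerance follows the spread of the series.
    return factor * np.std(series, ddof=1)


def match_fraction(series, m, r):
    """Fraction of distinct template pairs of length m that lie within r."""
    x = np.asarray(series, dtype=float)
    t = np.array([x[i:i + m] for i in range(len(x) - m + 1)])
    # Maximum absolute difference between every pair of templates.
    dist = np.abs(t[:, None, :] - t[None, :, :]).max(axis=2)
    k = len(t)
    # Remove the k self-matches on the diagonal before normalizing.
    return (np.sum(dist <= r) - k) / (k * (k - 1))


rng = np.random.default_rng(0)
# Two hypothetical step-time series (seconds) with different variability.
narrow = 0.55 + 0.01 * rng.standard_normal(300)
wide = 0.55 + 0.03 * rng.standard_normal(300)

r_constant = 0.005  # one fixed tolerance applied to both series
for name, series in [("narrow", narrow), ("wide", wide)]:
    r_sd = tolerance_from_sd(series, 0.2)
    print(
        name,
        "r_sd =", round(r_sd, 5),
        "matches with r_sd =", round(match_fraction(series, 2, r_sd), 4),
        "matches with r_constant =", round(match_fraction(series, 2, r_constant), 4),
    )
```

With the SD-scaled tolerance, each series is compared against its own spread, so the absolute tolerance differs between the two series. With a fixed tolerance, the same absolute threshold is applied to both, and the more variable series yields fewer matches.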

The length of the data set is a parameter of concern for human movement scientists. When dealing with pathological populations, it is sometimes difficult to obtain data sets long enough to perform analysis. Data sets of fewer than 200 points appear to be too short for entropy analysis, yet it may take up to 2000 data points for entropy values to stabilize [12]. It is currently unclear how many data points are needed for reliable entropy analysis.

Another limitation of entropy analysis is the need for collection of uninterrupted data [2], [3]. To collect uninterrupted data, whether continuous or discrete, in the amount required for analysis, many researchers have subjects walk on a treadmill rather than overground. This allows the researcher to collect uninterrupted steps without the concern of space and/or equipment constraints. However, the treadmill could itself be considered a constraint, as it limits the speed fluctuations that are normally present in overground walking. In addition, there are physiological and biomechanical differences between overground and treadmill walking [16], [17], [18], [19], [20], [21], [22], [23], [24], [25]. Nonlinear measures have shown conflicting results regarding the difference in the structure of variability between treadmill and overground walking [26], [27], [28], [29], [30].

Thus, the purpose of this research was to determine the effect of changing parameter values on entropy calculations for long gait data sets (i.e., step time) using two different modes of walking. In addition, to understand the effect of changing parameters on entropy calculations, an examination of tolerance, r, was completed. It was hypothesized that SampEn would maintain relative consistency across all data lengths and be resistant to changes in parameter values.

Materials and methods

Twenty-one subjects participated in this research study. Foot switch data was collected from subjects, but subjects whose foot switch data contained any signal dropout were excluded. Therefore, 14 subjects’ data from the original cohort were included in analysis (7 males; 24.9 ± 4.2 years; 1.71 ± 0.12 m; 69.3 ± 16.8 kg). All participants were in excellent health, had no conditions that would inhibit their ability to walk for one hour, and reported physical activity at or above the currently recommended

rSD

A significant interaction was found between r and N (p = 0.006) and the interaction between N and m was marginally significant (p = 0.05), meaning ApEn differed depending on the combination of r, m, and N (Fig. 2, Fig. 3). Treadmill walking was more regular than overground walking.

For SampEn, there was a significant effect of r (p < 0.001); as r increased, the value of SampEn decreased. When r was increased to a critical level, the tolerance was large enough to allow for additional matches to be

Discussion

The overall objective of this research study was to determine the effect of changing parameter values of m, r, and N on entropy calculations for long data sets using two different walking conditions. It was hypothesized that SampEn would maintain relative consistency across all data lengths and be resistant to changes in parameter values. Our hypotheses were partially supported.

Relative consistency was defined by Richman and Moorman [5], “if [entropy] of one data set is higher than that of

Conclusion

Overall, a proper combination of r, m, and N should be selected to provide relative consistency. For greatest relative consistency of step time data, it was best to use a constant r value and SampEn. When using a constant r, we suggest examining r values from 0 to n*1/sampling rate in increments of 1/sampling rate, with n being the number of different values to be tested. Due to relative consistency issues for both algorithms, we encourage all future work using entropy analysis to publish

Conflict of interest

The authors have no conflicts of interest to disclose.

Acknowledgements

We would like to thank Mr. Eric Pisciotta and Mr. Josh Pickhinke for their assistance with data collections. Funding for this project was provided by NASA Nebraska Space Grant & EPSCoR and the National Institutes of Health (P20 GM109090).

References (31)

  • L.V. Ojeda et al.

    Influence of contextual task constraints on preferred stride parameters and their variabilities during human walking

    Med. Eng. Phys.

    (2015)
  • C. Shannon

    A mathematical theory of communication

    Bell Syst. Tech. J.

    (1948)
  • S. Pincus

    Approximate entropy as a measure of system complexity

    Proc. Natl. Acad. Sci. U. S. A.

    (1991)
  • S. Pincus et al.

    Approximate entropy – statistical properties and applications

    Commun. Stat.-Theory Methods

    (1992)
  • D.E. Lake et al.

    Sample entropy analysis of neonatal heart rate variability

    Am. J. Physiol. Regul. Integr. Comp. Physiol.

    (2002)