1 Introduction

Since the seminal work of Mandelbrot and Van Ness [27], the characterization of data in terms of fractal properties has found near ubiquitous and enduring use in diverse research areas, including research within the fields of engineering [48], hydrology [21, 50], geology [4, 34], physics [40], space science [7, 41], medicine [17, 28], economics [13], financial markets [44] and many more. Fractal properties in nature and human dynamics arguably have served to yield increased understanding and improvement on human society.

Higuchi’s method [18] is a widely applied time-domain technique to determine fractal properties of complex non-periodic, nonstationary physical data [12, 35, 49]. That is, the method can accurately calculate the fractal dimension of time series. Higuchi initially developed it to study large-scale turbulent fluctuations of the interplanetary magnetic field. It is a modification to the method of Burlaga and Klein [5] in which fluctuation properties of turbulent space plasmas can be studied beyond the inertial range. It is simple to implement, efficient, and can rapidly achieve accurate and stable values of fractal dimension, even in noisy, nonstationary data [25]. The fractal dimension calculated via the Higuchi method is called the Higuchi fractal dimension (HFD). Since its initial development, the Higuchi method has been applied to numerous fields of research. In medicine, for instance, it has found widespread use to detect and classify epileptic EEG signals [26], human locomotion [36], and in engineering, it has been used to detect faults in rolling bearings [48]. One difficulty in using the Higuchi method is that certain parameters must be applied to the method, and inappropriate parameter selection results in spurious calculation of fractal properties. Although the method has been used for decades and is widely employed at present, there is an absence of consensus of the appropriate method to determine the parameters that must be input. In this paper, we expose this weakness of the Higuchi method so that there is wider appreciation of its limits and suggests how to solve the drawbacks of this method when applied to different types of scientific data.

The HFD computed depends on the length of the time series, and an internal tuning factor kmax. Higuchi’s original paper did not elaborate on the selection of the tuning factor but illustrated the method with kmax = 211 for time series having length N = 217. Subsequent authors used similar values for the tuning factor but we will show that the tuning factor plays a crucial role in estimation the HFD. Higuchi’s method, if applied appropriately, can reliably find the time series fractal dimension. However, if the tuning factor is incorrectly selected, the method is compromised from the outset.

How is the researcher to determine the appropriate tuning factor for their study that will optimize the calculation of a stable HFD, if it exists? In addition, how does the selection of the factor influence the value of the computed HFD? The literature is vague in answering these questions, and to do so is the main thrust of our research. Multiple studies have addressed the issue of proper selection of tuning factor kmax. Accardo et al. [2] applied the method in their study of electroencephalograms and sought the most suitable pair of (kmax, N). They experimented with kmax = 3–10 on time series with lengths from N = 50–1000 and settled on an optimum kmax = 6. Some papers recommend plotting the HFD versus a range of kmax, and then selecting the appropriate kmax at the location where the calculated HFD approaches a local maximum or asymptote, which can be considered a saturation point [11, 39]. However, there is no reason that in every instance, the HFD will reach a saturation point. Paramanathan and Uthayakumar [31] proposed to determine the tuning factor kmax based on a size–measure relationship that employed a recursive length of the signal from different scales of measurement. Gomolka et al. [16] selected kmax on the basis of statistical tests that allowed the best discrimination between already known diabetic and healthy subjects. But in the absence of such additional data between systems in different dynamic states (e.g. health or pathology), how can one select the correct tuning parameter?

In this paper, we will try to answer these questions in a general way that is helpful to the community of researchers who utilize the Higuchi method. The organization of the paper is as follows. We will generate artificial time series with well-specified fractal dimension and then compare the HFD computed from these data for different values of the tuning parameter kmax. We will demonstrate the results on several examples of physical data.

2 Data and method

In order to investigate the optimization of the Higuchi method, we turn to the generation of synthetic time series with known fractal properties, to see how well the method performs. One difficulty resides in the production of truly fractal time series of given dimension, which is a non-trivial task [20]. Therefore, studies must concern themselves with the adequacy of the data-generating algorithms in addition to the fractal dimension estimation algorithms. We will consider synthetic time series realizations of processes with perfect and controlled scale invariance, viz. signals that have only a single type of scaling. Many other theoretical data types exist that have been used to analyze signals that lack local scaling regularity, but rather have a regularity which varies in time or space [24, 42]. There is also a recent effort to generalize the Higuchi method to distinguish monofractal from multifractal dynamics based on relatively short time series [6].

In this paper, we will limit the research to study of well-understood synthetic data with monofractal scaling. To illustrate how a monofractal scaling exponent can be derived, we consider fractional Brownian motion (fBm) which is characterized by a single stable fractal dimension and is a continuous-time random process [27]. Next, we research these data and compare the fractal dimension recovered using the Higuchi algorithm with the theoretical fractal dimension. The synthetic data time series can be written in terms of stochastic integrals of time integrations of fractional Gaussian noise:

$${\varvec{B}}_{{\varvec{H}}} \left( {\varvec{t}} \right) = \frac{1}{{{{\varvec{\Gamma}}}\left( {{\varvec{H}} + \frac{1}{2}} \right)}}\left\{ {\mathop \int \limits_{ - \infty }^{0} [({\varvec{t}} - {\varvec{s}})^{{{\varvec{H}} - \frac{1}{2}}} - \left( { - {\varvec{s}})^{{{\varvec{H}} - \frac{1}{2}}} } \right]{\varvec{dW}}\left( {\varvec{s}} \right) + \mathop \int \limits_{0}^{{\varvec{t}}} ({\varvec{t}} - {\varvec{s}})^{{{\varvec{H}} - \frac{1}{2}}} {\varvec{dW}}\left( {\varvec{s}} \right)} \right\}.$$

here W is a stationary and ergodic random white noise process with zero mean defined on (− , ). In the above equation, \({\varvec{H}}\in (0, 1)\) is known as the Hurst exponent. The time series Hurst exponent is related to signal roughness averaged over multiple length scales. The higher the value of H, the smoother is the time series, and the longer trends tend to continue. For values closer to zero, the time series rapidly fluctuates, as shown in Fig. 1. The covariance function of the noisy signal can be expressed by:

$${\mathbf{cov}}\left\{ {{\varvec{B}}_{{\varvec{H}}} \left( {\varvec{s}} \right),{\varvec{B}}_{{\varvec{H}}} \left( {\varvec{t}} \right)} \right\} = \frac{1}{2}\left\{ {\left| {\left| {\varvec{s}} \right|^{{2{\varvec{H}}}} + } \right|\left| {\varvec{t}} \right|^{{2{\varvec{H}}}} - \left| {{\varvec{s}} - {\varvec{t}}} \right|^{{2{\varvec{H}}}} } \right\},$$

so that \({\varvec{B}}_{{\varvec{H}}} \left( 0 \right) \equiv 0\) and \({\text{var}} \{ {\varvec{B}}_{{\varvec{H}}} \left( {\varvec{t}} \right)\} = {\varvec{t}}^{{2{\varvec{H}}}}\). For H = 1/2, the white noise process reduces to the well-known random walk. The theoretical relationship between the Hurst exponent, H, and the Higuchi fractal dimension, HFD, is HFD \(= 2 - {\varvec{H}}\), with values of HFD between 1 and 2.

Fig. 1
figure 1

Examples of synthetic time series from the Davies and Harte [9] method, characterized by Hurst exponent H = 0.3, 0.5, 0.7, 0.9 (top to bottom)

We consider four different method generators of processes having long-range dependence to generate synthetic series with exact fractal dimension. First, we consider an exact wavelet-based method. This is based on a biorthogonal wavelet method proposed by Meyer and Sellan [1, 3] and implemented in MATLAB software and the wfbm calling function. The second is the method of Davies and Harte [9] whose generation process uses a fast Fourier transform basis and embeds the covariance matrix of the increments of the fractional Brownian motion in a circulant matrix. The third category of synthetic simulated data is produced using the Wood-Chan circulant matrix method [45], which is a generalization of the previous method [8]. The fourth set of data are simulated using the Hosking method [19], also known as the Durbin or Levinson method [23], which utilizes the well-known conditional distribution of the multivariate Gaussian distribution on a recursive scheme to generate samples based on the explicit covariance structure. All these methods of producing simulated data are considered exact methods because they completely capture the covariance structure and produce a true realization of series with a single scaling parameter.

Figure 1 shows various examples of time series produced via the Davies and Harte [9] method. The smoothest curve corresponds to H = 0.9, which implies high probability to observe long periods with increments of same sign. The roughest curve corresponds to H = 0.1, which is sub-diffusive, with high probability that increments feature long sequences of oscillating sign. The curves show data for Hurst exponents H = 0.3, 0.5, 0.7, 0.9, from top to bottom.

For each of the four data-generating methods, we create 100 unique time series, of differing lengths up to maximum length 500,000 data points, for Hurst exponents H = 0.1, 0.3, 0.5, 0.7, 0.9. Thus, for each time series length N, we have 500 unique simulations of fBm for each method. This produces 44,000 data sets in total, for experimentation. We next apply the Higuchi method to each of these time series with an exact fractal dimension (FD) to determine how well the Higuchi method is able to accurately recover the theoretical value compared to the derived HFD.

Next, we describe the Higuchi method. The Higuchi method takes a signal, discretized into the form of a time series, \({\varvec{x}}\left( 1 \right), {\varvec{x}}\left( 2 \right), \ldots , {\varvec{x}}\left( {\varvec{N}} \right)\) and, from this series, derives a new time series, \({\varvec{X}}_{{\varvec{k}}}^{{\varvec{m}}}\), defined as:

$${\varvec{X}}_{{\varvec{k}}}^{{\varvec{m}}} : {\varvec{x}}\left( {\varvec{m}} \right), {\varvec{x}}\left( {{\varvec{m}} + {\varvec{k}}} \right),{\varvec{x}}\left( {{\varvec{m}} + 2{\varvec{k}}} \right), \ldots , {\varvec{x}}\left( {{\varvec{m}} + \left[ {\frac{{{\varvec{N}} - {\varvec{k}}}}{{\varvec{k}}}} \right] \cdot {\varvec{k}})} \right),$$

here [] represents the integer part of the enclosed value. The integer \({\varvec{m}} = 1,2, \ldots ,{\varvec{k}}\) is the start time, and \({\varvec{k}}\) is the time interval, with \({\varvec{k}} = 1, \ldots ,{\varvec{k}}_{{{\mathbf{max}}}} ;{\varvec{k}}_{{{\mathbf{max}}}}\) is a free tuning parameter. This means that given time interval equal to \({\varvec{k}}\), spawns \({\varvec{k}}\)-sets of new time series. For instance, if \({\varvec{k}} = 10\) and the time series has length \({\varvec{N}} = 1000\), the following new time series are derived from the original data:

$$\begin{gathered} {\varvec{X}}_{10}^{1} : {\varvec{x}}\left( 1 \right), {\varvec{x}}\left( {11} \right),{\varvec{x}}\left( {21} \right), \ldots , {\varvec{x}}\left( {991)} \right), \hfill \\ {\varvec{X}}_{10}^{2} : {\varvec{x}}\left( 2 \right), {\varvec{x}}\left( {12} \right),{\varvec{x}}\left( {22} \right), \ldots , {\varvec{x}}\left( {992)} \right), \hfill \\ \vdots \hfill \\ {\varvec{X}}_{10}^{10} : {\varvec{x}}\left( {10} \right), {\varvec{x}}\left( {20} \right),{\varvec{x}}\left( {30} \right), \ldots , {\varvec{x}}\left( {1000} \right). \hfill \\ \end{gathered}$$

These curves have lengths defined by:

$${\varvec{L}}_{{\varvec{m}}} \left( {\varvec{k}} \right) = \frac{{\left\{ {\left( {\mathop \sum \nolimits_{{{\varvec{i}} = 1}}^{{\left[ {\frac{{{\varvec{N}} - {\varvec{m}}}}{{\varvec{k}}}} \right]}} \left| {{\varvec{x}}\left( {{\varvec{m}} + {\varvec{ik}}} \right) - {\varvec{x}}({\varvec{m}} + \left( {{\varvec{i}} - 1} \right) \cdot {\varvec{k}}} \right|} \right)\frac{{{\varvec{N}} - 1}}{{\left[ {\frac{{{\varvec{N}} - {\varvec{m}}}}{{\varvec{k}}}} \right] \cdot {\varvec{k}}}}} \right\}}}{{\varvec{k}}}.$$

The final term in the numerator is a normalization factor, \({\varvec{N}} - \frac{1}{{\left[ {\frac{{{\varvec{N}} - {\varvec{m}}}}{{\varvec{k}}}} \right]}} \cdot {\varvec{k}}\). The length of the curve for the time interval \({\varvec{k}}\) is then defined as the average over the \({\varvec{k}}\) sets of \({\varvec{L}}_{{\varvec{m}}} \left( {\varvec{k}} \right):\)

$${\varvec{L}}\left( {\varvec{k}} \right) = {\varvec{L}}_{{\varvec{m}}} \left( {\varvec{k}} \right).$$

In cases when this equation scales according to the rule \({\varvec{L}}\left( {\varvec{k}} \right) \propto {\varvec{k}}^{{ - {\mathbf{HF}}{\varvec{D}}}} ,\) we consider the time series to behave as a fractal with dimension HFD. Thus, the HFD is the slope of the straight line that fits the curve of ln(L(k)) versus ln(1/k). Figure 2 shows the L(k) curve from simulated data for the fractal dimension FD = 1.7 (corresponding to H = 0.3) time series data in Fig. 1. The corresponding curve of HFD(kmax) is shown in Fig. 3.

Fig. 2
figure 2

Average curve length versus scale size, k, for the time series with HFD = 1.7

Fig. 3
figure 3

Curve showing the relationship between HFD and kmax for the FD = 1.7 time series shown in Fig. 1

We now turn to finding the best tuning parameter, kmax, for the set of data we have simulated. As discussed in the Introduction, a common way to determine the tuning parameter relies upon finding the location, in plots like Fig. 3, of HFD versus a range of kmax, where the calculated HFD approaches a local maximum or asymptote [11, 39]. We will call this a tuning curve. In Fig. 3, which is for the time series with HFD = 1.7, there is only one local maximum which is located at kmax = 7 which produces a negligible error of 0.5%. There are three places where the Higuchi method finds a best value is achieved for this simulation, viz. kmax = 4, 14, 727. In this particular instantiation of a fBm, the most effective tuning parameter would thus be kmax = 4, 14, or 727. The easiest method would be to use the smallest kmax since this results in the least computational effort. However, in this case, using the local maxima method yields an acceptable estimate result with little additional effort.

3 Results

There is no reason to expect that a local maxima exists in every case in a tuning curve and is therefore searching through these curves for asymptotes is not a general or practical method to determine the best tuning parameter kmax. For instance, Fig. 4 shows the tuning curves for HFD = 1.9, 1.5, 1.3, 1.1 computed from the simulated data of Fig. 1. The black horizontal dashed line in each subplot shows the theoretical value of the fractal dimension. There is not always a local maximum or an asymptotic convergence to a set value of HFD. For HFD = 1.9, a peak occurs but only near kmax ~ 5000; the region of the plateau is found at the tuning parameter that yields the largest error in fractal dimension. This indicates that in this fBm realization, a much smaller kmax would be appropriate.

Fig. 4
figure 4

Curves showing the relationship between HFD and kmax for the HFD = 1.9, 1.5, 1.3, 1.1 time series shown in Fig. 1. The dashed horizontal curves show the theoretical value for the HFD

We now turn to analyzing the simulated realizations of fBm. The smallest time series length we select has N = 1000, and the largest has N = 500,000 data points and compute the HFD for each of these series, as a function of tuning parameter kmax. We use values between kmax = 2 and kmax = N/2. This gives a new data set comprised of HFD values as a function of the time series length, and the tuning parameter, yielding HFD = HFD(N, kmax). The error to be minimized is written by:

$$E\left( {N,k_{{{\text{max}}}} } \right) = 100*\frac{{\left[ {{\text{HFD}} - {\text{FD}}_{{{\text{theory}}}} } \right]}}{{{\text{FD}}_{{{\text{theory}}}} }}.$$

The previous equation gives the percentage error to be averaged over all synthetic time series simulations to yield a general result for all simulation data considered. As researchers do not generally know a priori which method of simulating artificial data most closely follows the statistics of any particular physical or research data set, it is appropriate to use a range of synthetic simulated time series with known fractal dimension, as an average result gives the most general answer. Figure 5 shows surface plots comparing the percentage error HFD versus the tuning parameter kmax and time series length, N for the theoretical HFD = 1.7 for each of the four simulation methods described in the previous section. The curve of least error is shown as a thick grey line.

Fig. 5
figure 5

Surface showing the average percentage error between the Higuchi method fractal dimension and theoretical FD = 1.7 averaged over 100 datasets of different lengths, N. The curve of least error is shown a Wavelet generation method, b Wood-Chan method, c Davies–Harte method, and d Hosking method. The curve of least error is shown as a thick grey line

Each method of simulation yields a different curve of least error. Figure 5b, which is the curve for the Wood-Chan circulant matrix method [45], is the lowest overall error. The Hosking [19] method yields HFD values with the greatest errors (Fig. 5d). Overall, the location of the minimum error curve varies widely depending on the generation algorithm for the synthetic data.

By taking a geometric mean of these minimum error curves for all HFD values, we derive a best-fit curve using a sum of sines function since this gave a simple function with few terms and a fit with small sum squared error. Figure 6 shows the relationship between the time series length and the tuning parameter, for different HFD values, and the dashed curve shows the best fit, given by the following equation:

$$k_{{{\text{max}}}} = \left[ {A_{1} \sin \left( {B_{1} *N + C_{1} } \right) + A_{2} \sin \left( {B_{2} *N + C_{2} } \right)} \right].$$
((*))

here [] represents the integer part of the enclosed function value. Table 1 shows the parameter values for the best-fit. Figure 6 shows that for short time series, the use of a plateau criterion to select the kmax tuning parameter will result in the use of values smaller than those proposed by this generalized study. For example, in Fig. 1, a time series of length N = 20,000 is used. Figure 3 shows the curve of HFD versus kmax. Our fitting function yields kmax = 47 for this length data set.

Fig. 6
figure 6

Comparison of the average minimum error curve (solid) and the best-fit sum of sines function (dashed)

Table 1 Fitting parameters for best-fit sum of sin function, \(k_{\max } = \left[ {A_{1} \sin \left( {B_{1} *N + C_{1} } \right) + A_{2} \sin \left( {B_{2} *N + C_{2} } \right)} \right]\)

4 Applications

In this section, we present two applications of the Higuchi method with the corrections applied to determining the appropriate tuning parameter. The first is a shell model of the nonlinear dynamics of MHD turbulence. We effect this via simplified approximations of the Navier–Stokes fluid equations [15, 30, 47]. We use the MHD Gledzer–Ohkitani–Yamada (GOY) shell model, which captures the intermittent dynamics of the energy cascade in MHD turbulence [22] as it moves along through the shells in a front-like manner.

Shell models of MHD turbulence are an example of dynamical systems incorporating simplified versions of the Navier–Stokes or MHD turbulence equations. They attempt to conserve some of the invariants in the limit of no dissipation. We use the SHELL-ATM code [4] to produce a time series of length N = 500,000 of the magnetic energy dissipation rate (\(\in_{b}\)) as a function of time obtained in the MHD shell model (Fig. 7a). The model is described in detail in Lepreti et al. [22]. In short, the SHELL-ATM model makes it possible to obtain rapid simulations of MHD turbulence in volumes in which a longitudinal magnetic field dominates. Model construction begins via division of the wave-vector space (k-space) into a number, N, of discrete shells with known radius \(k_{n} = k_{0} 2^{n}\) (n = 0, 1, …, N) [14]. Each shell is then assigned complex dynamical Elsässer-like fields \(u_{n} \left( t \right)\) and \(b_{n} \left( t \right)\), which represent longitudinal velocity increments and magnetic field increments. The magnetic energy dissipation rate is defined by

$$\in_{b} \left( t \right) = \eta \mathop \sum \limits_{n = 1}^{N} k_{n}^{2} \left| {b_{n}^{2} } \right|$$

where η is the kinematic resistivity. To find the solutions to the above equations, we solve the equations

$$\begin{aligned} \frac{{db_{n} }}{dt} = & - \eta k_{n}^{2} b_{n} + \frac{1}{6}ik_{n} \left( {u_{n + 1} b_{n + 2} - b_{n + 1} u_{n + 2} } \right) \\ & - \frac{1}{6}ik_{n} \left[ {\left( {u_{n - 1} b_{n + 1} - b_{n - 1} u_{n + 1} } \right) + \left( {u_{n - 2} b_{n - 1} - b_{n - 2} u_{n - 1} } \right)} \right]^{*} + f_{n} \\ \end{aligned}$$

and

$$\begin{aligned} \frac{{du_{n} }}{dt} = & - \nu k_{n}^{2} u_{n} + ik_{n} \left( {u_{n + 1} u_{n + 2} - b_{n + 1} b_{n + 2} } \right) \\ & - \frac{1}{4}ik_{n} \left[ {\left( {u_{n - 1} u_{n + 1} - b_{n - 1} b_{n + 1} } \right) + \frac{{\left( {u_{n - 2} b_{n - 1} - b_{n - 2} u_{n - 1} } \right)}}{2}} \right]^{*} + g_{n} \\ \end{aligned}$$

where ν is the kinematic viscosity, and \(\left( {f_{n} ,g_{n} } \right)\) are forcing terms operating on the magnetic and velocity increments. The symbol * represents a complex conjugate. The forcing terms are calculated from the Langevin equation driven by a Gaussian white noise.

Fig. 7
figure 7

a Magnetic energy dissipation rate for the GOY shell model. b Average curve length versus scale size, k. c The relationship between HFD and kmax. d The relationship between HFD and time series length.

These data in Fig. 7a display clear intermittent bursts of dissipated energy. Figure 7b shows average curve length versus scale size, k, for the time series. Figure 7c shows the relationship between HFD and kmax. There is no asymptote which may indicate an appropriate value of kmax. We now use Eq. (*) to select the appropriate tuning parameter kmax determined from our prior analysis for data featuring a single fractal scaling, for varying lengths, N, of the time series. Figure 7d shows the computed HFD selected. There is a variation in the fractal dimension with values being estimated as smaller from shorter lengths of the time series, and overall HFD ~ 1.04–1.13.

The second data example is that of the severe acute respiratory syndrome coronavirus 2 isolate Wuhan-Hu-1. Wu et al. [46] reported on the identification of the novel RNA virus strain from the family Coronaviridae, which is designated here ‘WH-Human-1’ coronavirus. We obtained these data from the National Center for Biotechnology Information (NCBI), which is part of the United States National Library of Medicine (NLM), a branch of the National Institutes of Health (NIH).

To analyze the fractal patterns in the genome one must convert the nucleotide sequence from a symbolic sequence, meaning A,G,C,T into a time series. We follow the Peng [32] method in which DNA is represented as a “random walk” with two parameters ruling the direction of the “walk” and the resulting dynamics. We start with the first nucleotide. If it is a pyrimidine base, we move up one position. Every subsequent pyrimidine base moves up one position. When a purine base is encountered in the series the walk steps down one position. The nucleotide distance from the first nucleotide is then plotted versus the displacement, as in Fig. 8a. Figure 8b shows average curve length versus scale size, k, for the time series. Figure 8c shows the relationship shows the computed HFD against tuning parameter kmax from the whole time series of length N = 29,903. In this case, there is a distinct asymptote at kmax = 20, which yields HFD = 1.497. To test our method, we again use Eq. (*) to select the appropriate tuning parameter kmax, for varying lengths, N, of the time series. Figure 8d shows the computed HFD selected. There is no statistically significant variation in the fractal dimension with values being estimated at HFD ~ 1.5.

Fig. 8
figure 8

a WH-Human-1 complete genome represented by the Peng [32] method. b Average curve length versus scale size, k

Our analysis shows that the fractal dimension of WH-Human-1 coronavirus genome is different from its fractal dimension computed from electron microscopic and atomic force microscopic images of 40 coronaviruses (CoV), as reported by Swapna et al. [37] who found a scale-invariant dimension of 1.820. This indicated that the images of the virus feature higher complexity and greater roughness than the pattern we have detected in the genome.

5 Conclusions

Higuchi’s method to compute the fractal dimension of physical signals is widely used in research. However, a major difficulty in applying the method is the correct choice of tuning parameter (kmax) to compute the most accurate results. Poor selection of kmax can result in values of the fractal dimension that are spurious, and this can result in potentially invalid interpretations of data. In the past researchers have used various ad hoc methods to determine the appropriate tuning parameter for their particular data. We have shown that a method such as seeking a convergence of the computed HFD to a plateau is not in general a valid procedure as not every data instance shows the HFD estimate reaches a plateau.

In this paper, we have sought to find a more general method of determining, a priori, the optimum tuning parameter kmax for a time series of length N. To study this problem, we generated synthetic time series of known HFD and applied the Higuchi method to each, averaging results over the different fbm within HFD = [1.9, 1.7, 1.5, 1.3, 1.1] categories. These data allow the calculation of curves showing where in (N, kmax)-space the most appropriate tuning parameter should be selected. We found that fractal dimension calculation via the Higuchi method is sensitive to both the tuning parameter kmax and also the length of the time series. We derive a best-fit curve fitting the location of the average minimum HFD error to provide researchers with an efficient method of estimating and appropriate kmax, given their particular dataset.

We applied the modified method to two physical cases, one from physics and one from bioinformatics. In the latter case, we considered the Coronaviridae genome of the severe acute respiratory syndrome coronavirus 2 isolate Wuhan-Hu-1, first reported by Wu et al. [46]. Our analysis of this data showed strong evidence of monofractality (Fig. 8b) with HFD ~ 1.5 (Fig. 8d).

In the former case, we computed the magnetic energy dissipation rate from a shell model of the nonlinear dynamics of MHD turbulence. We used simplified approximations of the Navier–Stokes fluid equations [15, 30, 47], in particular the MHD Gledzer–Ohkitani–Yamada (GOY) shell model, and found HFD ~ 1.10 (Fig. 7d). These data have been reported to feature a multifractal scaling [33], and this is consistent with our results in Fig. 8b which show evidence of nonlinear behaviour, which is possibly a reason why there is about a 10 per cent variation in the HFD calculation (Fig. 7d).

It is clear that accurate calculation of fractal dimension can be a delicate process and is influenced not only by the method used, but also by the nature of the data. Studies must therefore concern themselves not only with the type of data, but also with the adequacy of the data-generating algorithms, and fractal estimation algorithms. We considered only synthetic time series realizations of processes with perfect and controlled scale invariance, viz. signals that have only a single type of scaling. However, many other theoretical data types exist. For instance, numerous geophysical signals do not have local scaling regularity, but rather have a regularity which varies in time or space [10, 24]. Data that are multifractal require a variety of scaling exponents to fully describe the dynamics, and methods to generalize the Higuchi method to these more complex data types are going forward at present [6].