Kurtosis provides a good omnibus test for outliers in small samples

doi:10.1016/j.clinbiochem.2007.04.003

Clinical Biochemistry

Volume 40, Issues 13–14, September 2007, Pages 1032-1036

https://doi.org/10.1016/j.clinbiochem.2007.04.003

Abstract

Objectives:

To compare the power of the Healy, single-outlier Grubbs, skewness and kurtosis tests for outliers and their applicability to quality control (QC).

Design and methods:

Power to detect outliers was calculated in simulated samples of twenty normal variates, one or two of which were shifted in mean or increased in variance.

Results:

All tests showed similar power against a single outlier. For two outliers shifted in the same direction, skewness and then kurtosis were the most powerful; for outliers shifted in opposite directions, Healy's method was marginally superior to kurtosis. For two outliers of increased variance, Healy's method and kurtosis were equally the most powerful.

Conclusions:

Where the number of outliers is unknown the kurtosis test for outliers is more versatile than the Grubbs test as it has greater power where there is more than one outlier, a similar power when there is just one outlier, and a similar performance in QC.

Introduction

A recent review [1] of outlier detection in small samples noted that the use of fixed multiples of the standard deviation is unsatisfactory and discussed the use of several more appropriate procedures, principally those of Grubbs [2], Dixon [3], Tukey [4] and Healy [5]. The review did not explicitly recommend any one of these procedures for general use, but the authors appear to imply that Healy's procedure is to be preferred.

Outlier detection is of importance in clinical chemistry not only in the analysis of external quality assurance schemes, which was the impetus for Hayes, Kinsella and Coffey's review [1], but also for internal quality control [6] where screening for outliers should probably be part of the procedure for setting target values.

The common procedures for outlier detection are all based on the assumption that the data points, apart from the outlier(s), belong to a single Gaussian (“normal”) distribution. This suggests a two-step approach to outlier detection: firstly use a normality-based outlier test to determine if the sample as a whole shows significant evidence for the presence of outliers, and secondly, if it does so, successively remove the most extreme value(s) until the remaining data points no longer give a significant test for outliers [7]. The values so removed are then deemed to be outliers and deleted from the data set.
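The two-step approach described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the paper's implementation: the kurtosis statistic is used as the normality-based test, it is applied one-sided (large values signal outliers), the critical value is estimated by Monte Carlo simulation rather than taken from tables, and the names `mc_critical_value`, `remove_outliers`, the 5% level and the `min_n` floor are all hypothetical choices.

```python
import numpy as np

def kurtosis_b2(x):
    """Sample kurtosis b2 = m4 / m2**2 from moments about the mean."""
    d = x - x.mean()
    m2 = np.mean(d**2)
    return np.mean(d**4) / m2**2

def mc_critical_value(n, alpha=0.05, n_sim=20000, seed=0):
    """Upper-tail critical value of b2 for normal samples of size n.

    Assumption: estimated by simulation, not from published tables.
    """
    rng = np.random.default_rng(seed)
    sims = rng.standard_normal((n_sim, n))
    d = sims - sims.mean(axis=1, keepdims=True)
    m2 = np.mean(d**2, axis=1)
    b2 = np.mean(d**4, axis=1) / m2**2
    return np.quantile(b2, 1 - alpha)

def remove_outliers(x, alpha=0.05, min_n=5):
    """Step 1: test the sample. Step 2: while the test is significant,
    drop the value farthest from the mean and test again."""
    kept = np.asarray(x, dtype=float).copy()
    removed = []
    while kept.size > min_n and kurtosis_b2(kept) > mc_critical_value(kept.size, alpha):
        idx = np.argmax(np.abs(kept - kept.mean()))
        removed.append(kept[idx])
        kept = np.delete(kept, idx)
    return kept, removed

# Hypothetical data: 19 evenly spaced values plus one gross outlier.
data = np.append(np.linspace(-1.5, 1.5, 19), 8.0)
kept, removed = remove_outliers(data)
print("removed:", removed, "remaining n:", kept.size)
```

Re-estimating the critical value for each reduced sample size keeps the nominal level of every individual test, but the overall error rate of the sequential procedure is not controlled by this sketch.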

Regarding the first step, testing for the presence of outliers in an otherwise normal sample, theory suggests that the skewness test and the kurtosis test are among the most powerful available, especially when the number of outliers is unknown [7], as it usually is. Hence this paper reports a comparison of the small-sample power of Healy's test, the skewness test, the kurtosis test and the Grubbs test, this last probably being the most popular of those reviewed [1].

In view of the kurtosis test's favourable power, a preliminary comparison has been made of the effect on the type I error (false alarm rate) of using the kurtosis test or the Grubbs test to detect outliers in the training sample for a simple internal quality control scheme.

Since critical values for the kurtosis test for normality are only tabulated for a relatively small number of different sample sizes [8], interpolation functions have been derived in this paper to assist in the use of the kurtosis test. These formulae have been used to apply the kurtosis test to Healy's outlier example [5].

Section snippets

Comparison of outlier detection procedures

Ten thousand samples of twenty random variates, $x_{i}$, were generated from the standard normal distribution (mean = 0, standard deviation = 1) using the gasdev routine [9]. For each sample the standard deviation (s), skewness ($\sqrt{b_{1}}$) and kurtosis (b₂) were estimated from the sample moments about the mean:

$\bar{x} = \sum_{i = 1}^{n} x_{i} / n$

$m_{2} = \sum (x_{i} - \bar{x})^{2} / n$

$m_{3} = \sum (x_{i} - \bar{x})^{3} / n$

$m_{4} = \sum (x_{i} - \bar{x})^{4} / n$

$s = (n m_{2} / (n - 1))^{0.5}$

$\sqrt{b_{1}} = m_{3} / m_{2}^{1.5}$

$b_{2} = m_{4} / m_{2}^{2}$

where n = 20. The notation follows that of Thode [8].
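As a minimal sketch, the moment formulas above translate directly into NumPy. The function name `sample_moment_statistics` is hypothetical, and NumPy's default generator stands in for the gasdev routine used in the paper.

```python
import numpy as np

def sample_moment_statistics(x):
    """Return (s, sqrt_b1, b2) computed from sample moments about the mean."""
    x = np.asarray(x, dtype=float)
    n = x.size
    d = x - x.mean()                 # deviations from the sample mean
    m2 = np.mean(d**2)               # second central moment (divisor n)
    m3 = np.mean(d**3)               # third central moment
    m4 = np.mean(d**4)               # fourth central moment
    s = np.sqrt(n * m2 / (n - 1))    # standard deviation with divisor n - 1
    sqrt_b1 = m3 / m2**1.5           # skewness
    b2 = m4 / m2**2                  # kurtosis (about 3 for large normal samples)
    return s, sqrt_b1, b2

# One simulated sample of n = 20 standard normal variates.
# Assumption: NumPy's generator replaces gasdev; the seed is arbitrary.
rng = np.random.default_rng(0)
sample = rng.standard_normal(20)
s, sqrt_b1, b2 = sample_moment_statistics(sample)
print(f"s = {s:.3f}, sqrt(b1) = {sqrt_b1:.3f}, b2 = {b2:.3f}")
```

Note that b₂ here is the raw kurtosis, not the excess kurtosis (b₂ minus 3) that some statistical libraries return by default.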

Also, for each sample the Grubbs statistic G was calculated

Comparison of procedures

When a single potential outlier per sample of 20 normal variates was generated by model A, a shift in the mean, it was detected with very similar power by all four outlier detection procedures (Fig. 1a). With two potential outliers per sample generated by the same method, the power depended on whether the two outliers were both shifted in the same direction or in opposite directions. When both shifts were positive, $\sqrt{b_{1}}$ performed the best with b₂ the next best (Fig. 1b). However when the two
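The single-outlier comparison under model A can be approximated with a short Monte Carlo sketch. Several things here are assumptions rather than details from the paper: the Grubbs statistic is taken in its common two-sided form (largest absolute deviation from the mean divided by s), critical values for both tests are estimated by simulation at a 5% level, the shift sizes are arbitrary, and the function names are hypothetical.

```python
import numpy as np

def b2_rows(samples):
    """Kurtosis b2 = m4 / m2**2 for each row of a 2-D array."""
    d = samples - samples.mean(axis=1, keepdims=True)
    m2 = np.mean(d**2, axis=1)
    return np.mean(d**4, axis=1) / m2**2

def grubbs_rows(samples):
    """Assumed two-sided Grubbs statistic max|x_i - mean| / s for each row."""
    d = np.abs(samples - samples.mean(axis=1, keepdims=True))
    s = samples.std(axis=1, ddof=1)
    return d.max(axis=1) / s

def estimate_power(statistic, shift, n=20, n_sim=10000, alpha=0.05, seed=1):
    """Fraction of samples with one mean-shifted value that exceed the
    simulated upper-tail critical value of the statistic."""
    rng = np.random.default_rng(seed)
    null = rng.standard_normal((n_sim, n))
    critical = np.quantile(statistic(null), 1 - alpha)
    alt = rng.standard_normal((n_sim, n))
    alt[:, 0] += shift               # model A: shift the mean of one variate
    return np.mean(statistic(alt) > critical)

for shift in (0.0, 2.0, 4.0, 6.0):
    p_kurtosis = estimate_power(b2_rows, shift)
    p_grubbs = estimate_power(grubbs_rows, shift)
    print(f"shift = {shift}: kurtosis {p_kurtosis:.3f}, Grubbs {p_grubbs:.3f}")
```

With a shift of zero the rejection rate should sit near the nominal level, which is a quick check that the simulated critical values are behaving sensibly.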

Discussion

In clinical chemistry quality control, both external and internal, the QC values are conventionally assumed to be from a normal distribution and sample sizes are frequently small. Consequently it was of interest to compare two popular outlier detection methods with two classical statistics, $\sqrt{b_{1}}$ and b₂, that can be proved to be optimal for outlier detection under certain conditions. For normal samples Fergusson (quoted in [8]) showed that, where the proportion of outliers is small and the shifts

Acknowledgment

This study was supported by the Canterbury District Health Board.

References (9)

[1] K. Hayes et al. A note on the use of outlier criteria in Ontario laboratory quality control schemes. Clin. Biochem. (2007)
[2] F.E. Grubbs. Sample criteria for testing outlying observations. Ann. Math. Stat. (1950)
[3] W.J. Dixon. Analysis of extreme values. Ann. Math. Stat. (1950)
[4] J.W. Tukey. Exploratory data analysis. (1977)

There are more references available in the full text version of this article.

