Elsevier

Clinical Biochemistry

Volume 40, Issues 13–14, September 2007, Pages 1032-1036
Kurtosis provides a good omnibus test for outliers in small samples

https://doi.org/10.1016/j.clinbiochem.2007.04.003

Abstract

Objectives:

To compare the power of the Healy, single-outlier Grubbs, skewness and kurtosis tests for outliers and their applicability to quality control (QC).

Design and methods:

Power to detect outliers was calculated in simulated samples of twenty normal variates, one or two of which were shifted in mean or increased in variance.

Results:

All tests showed similar power against a single outlier. For two outliers shifted in the same direction, skewness was the most powerful, followed by kurtosis; for outliers shifted in opposite directions, Healy's method was just superior to kurtosis. For two outliers of increased variance, Healy's method and kurtosis were jointly the most powerful.

Conclusions:

Where the number of outliers is unknown, the kurtosis test for outliers is more versatile than the Grubbs test: it has greater power when there is more than one outlier, similar power when there is just one outlier, and similar performance in QC.

Introduction

A recent review [1] of outlier detection in small samples noted that the use of fixed multiples of the standard deviation is unsatisfactory and discussed the use of several more appropriate procedures, principally those of Grubbs [2], Dixon [3], Tukey [4] and Healy [5]. The review did not explicitly recommend any one of these procedures for general use, but the authors appear to imply that Healy's procedure is to be preferred.

Outlier detection is of importance in clinical chemistry not only in the analysis of external quality assurance schemes, which was the impetus for Hayes, Kinsella and Coffey's review [1], but also for internal quality control [6] where screening for outliers should probably be part of the procedure for setting target values.

The common procedures for outlier detection are all based on the assumption that the data points, apart from the outlier(s), belong to a single Gaussian (“normal”) distribution. This suggests a two-step approach to outlier detection: firstly use a normality-based outlier test to determine if the sample as a whole shows significant evidence for the presence of outliers, and secondly, if it does so, successively remove the most extreme value(s) until the remaining data points no longer give a significant test for outliers [7]. The values so removed are then deemed to be outliers and deleted from the data set.
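The two-step approach described above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure as implemented in the paper: the function names `sample_kurtosis`, `strip_outliers` and `demo_test` are hypothetical, the moment-based kurtosis is used as the omnibus test purely for illustration, and the threshold of 4.0 is a placeholder rather than a tabulated critical value (a real critical value depends on the sample size and significance level).

```python
from statistics import fmean


def sample_kurtosis(xs):
    """Moment-based kurtosis m4 / m2**2, with moments taken about the mean (divisor n)."""
    n = len(xs)
    mean = fmean(xs)
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return m4 / m2 ** 2


def strip_outliers(data, has_outliers, min_size=3):
    """Step 1: test the whole sample. Step 2: while the test remains significant,
    remove the value furthest from the current mean and test again."""
    kept = list(data)
    removed = []
    while len(kept) > min_size and has_outliers(kept):
        mean = fmean(kept)
        extreme = max(kept, key=lambda x: abs(x - mean))
        kept.remove(extreme)
        removed.append(extreme)
    return kept, removed


def demo_test(xs):
    # Assumption: 4.0 is an illustrative placeholder, NOT a tabulated critical value.
    return sample_kurtosis(xs) > 4.0


sample = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3, 10.1, 9.9, 14.0]
kept, removed = strip_outliers(sample, demo_test)
print("removed:", removed)
```

Passing the test in as a callable keeps the removal loop independent of the test chosen, so the same loop could be driven by any normality-based outlier test.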

Regarding the first step, testing for the presence of outliers in an otherwise normal sample, theory suggests that the skewness test and the kurtosis test are among the most powerful available, especially when the number of outliers is unknown [7], as it usually is. Hence this paper reports a comparison of the small-sample power of Healy's test, the skewness test, the kurtosis test and the Grubbs test, this last probably being the most popular of those reviewed [1].

In view of the kurtosis test's favourable power, a preliminary comparison has been made of the effect on the type I error (false alarm rate) of using the kurtosis test or Grubbs test to detect outliers in the training sample for a simple internal quality control scheme.

Since critical values for the kurtosis test for normality are only tabulated for a relatively small number of different sample sizes [8], interpolation functions have been derived in this paper to assist in the use of the kurtosis test. These formulae have been used to apply the kurtosis test to Healy's outlier example [5].
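The interpolation functions themselves are not reproduced in this excerpt. As a stand-in, and explicitly not the paper's method, an upper critical value of the kurtosis statistic for an arbitrary sample size can be approximated by Monte Carlo simulation under the normal null hypothesis. The sketch below assumes a one-sided (upper-tail) test; the function names, the seed and the number of simulated samples are all illustrative choices.

```python
import random


def sample_kurtosis(xs):
    """Moment-based kurtosis m4 / m2**2, with moments taken about the mean (divisor n)."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return m4 / m2 ** 2


def simulated_critical_value(n, alpha=0.05, n_samples=10000, seed=1):
    """Approximate the upper-alpha point of the kurtosis statistic for normal samples of size n."""
    rng = random.Random(seed)  # fixed seed so that repeated runs agree
    stats = sorted(
        sample_kurtosis([rng.gauss(0.0, 1.0) for _ in range(n)])
        for _ in range(n_samples)
    )
    # Empirical (1 - alpha) quantile of the simulated null distribution.
    index = min(n_samples - 1, round((1 - alpha) * n_samples))
    return stats[index]


print(simulated_critical_value(20))
```

A simulated value carries Monte Carlo error, so it should be checked against tabulated critical values [8] wherever those exist for the sample size in question.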

Comparison of outlier detection procedures

Ten thousand samples of twenty random variates, $x_i$, were generated from the standard normal distribution (mean = 0, standard deviation = 1) using the gasdev routine [9]. For each sample the standard deviation ($s$), skewness ($b_1$) and kurtosis ($b_2$) were estimated from the sample moments about the mean:

$$
\begin{aligned}
\bar{x} &= \sum_{i=1}^{n} x_i / n \\
m_2 &= \sum (x_i - \bar{x})^2 / n \\
m_3 &= \sum (x_i - \bar{x})^3 / n \\
m_4 &= \sum (x_i - \bar{x})^4 / n \\
s &= \left( n m_2 / (n - 1) \right)^{0.5} \\
b_1 &= m_3 / m_2^{1.5} \\
b_2 &= m_4 / m_2^{2}
\end{aligned}
$$

where $n = 20$. The notation follows that of Thode [8].
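The moment formulas above translate directly into code. The following Python sketch is illustrative only: `moment_statistics` is a hypothetical name, and Python's `random.gauss` is used as a stand-in for the gasdev routine [9].

```python
import math
import random


def moment_statistics(xs):
    """Return (s, b1, b2) computed from the sample moments about the mean."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    s = math.sqrt(n * m2 / (n - 1))  # standard deviation with divisor n - 1
    b1 = m3 / m2 ** 1.5              # skewness
    b2 = m4 / m2 ** 2                # kurtosis
    return s, b1, b2


# One simulated sample of twenty standard normal variates.
# Assumption: random.gauss replaces gasdev; the seed is arbitrary.
rng = random.Random(0)
sample = [rng.gauss(0.0, 1.0) for _ in range(20)]
s, b1, b2 = moment_statistics(sample)
print(s, b1, b2)
```

Repeating the last four lines ten thousand times, each with a fresh sample, would give simulated null distributions of the three statistics.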

Also, for each sample the Grubbs statistic G was calculated

Comparison of procedures

When a single potential outlier per sample of 20 normal variates was generated by model A, a shift in the mean, it was detected with very similar power by all four outlier detection procedures (Fig. 1a). With two potential outliers per sample generated by the same method, the power depended on whether the two outliers were both shifted in the same direction or in opposite directions. When both shifts were positive, b1 performed the best with b2 the next best (Fig. 1b). However when the two

Discussion

In clinical chemistry quality control, both external and internal, the QC values are conventionally assumed to be from a normal distribution and sample sizes are frequently small. Consequently, it was of interest to compare two popular outlier detection methods with two classical statistics, b1 and b2, that can be proved to be optimal for outlier detection under certain conditions. For normal samples Fergusson (quoted in [8]) showed that, where the proportion of outliers is small and the shifts

Acknowledgment

This study was supported by the Canterbury District Health Board.
