Kurtosis provides a good omnibus test for outliers in small samples
Introduction
A recent review [1] of outlier detection in small samples noted that the use of fixed multiples of the standard deviation is unsatisfactory and discussed the use of several more appropriate procedures, principally those of Grubbs [2], Dixon [3], Tukey [4] and Healy [5]. The review did not explicitly recommend any one of these procedures for general use, but the authors appear to imply that Healy's procedure is to be preferred.
Outlier detection is of importance in clinical chemistry not only in the analysis of external quality assurance schemes, which was the impetus for Hayes, Kinsella and Coffey's review [1], but also for internal quality control [6] where screening for outliers should probably be part of the procedure for setting target values.
The common procedures for outlier detection are all based on the assumption that the data points, apart from the outlier(s), belong to a single Gaussian (“normal”) distribution. This suggests a two-step approach to outlier detection: firstly use a normality-based outlier test to determine if the sample as a whole shows significant evidence for the presence of outliers, and secondly, if it does so, successively remove the most extreme value(s) until the remaining data points no longer give a significant test for outliers [7]. The values so removed are then deemed to be outliers and deleted from the data set.
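The two-step approach described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure used in the paper: the function names (`kurtosis_b2`, `simulated_critical_value`, `remove_outliers`) and the `min_n` guard are hypothetical, the kurtosis test is treated as one-sided (upper tail), and the critical value is estimated by Monte Carlo simulation rather than taken from tables or from interpolation formulae.

```python
import random


def kurtosis_b2(xs):
    """Sample kurtosis b2 = m4 / m2**2, from moments about the mean."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return m4 / m2 ** 2


def simulated_critical_value(n, alpha=0.05, reps=5000, seed=1):
    """Approximate upper-alpha point of b2 for normal samples of size n.

    Assumption: Monte Carlo estimate, standing in for tabulated values.
    """
    rng = random.Random(seed)
    values = sorted(
        kurtosis_b2([rng.gauss(0.0, 1.0) for _ in range(n)])
        for _ in range(reps)
    )
    return values[int((1.0 - alpha) * reps)]


def remove_outliers(xs, alpha=0.05, min_n=5):
    """Step 1: test the sample; step 2: strip extreme values while significant."""
    kept = list(xs)
    removed = []
    while (len(kept) > min_n
           and kurtosis_b2(kept) > simulated_critical_value(len(kept), alpha)):
        mean = sum(kept) / len(kept)
        # The most extreme value is the one farthest from the current mean.
        extreme = max(kept, key=lambda x: abs(x - mean))
        kept.remove(extreme)
        removed.append(extreme)
    return kept, removed


# Usage: 19 evenly spaced values plus one gross outlier.
data = [i / 10 for i in range(-9, 10)] + [8.0]
kept, removed = remove_outliers(data)
```

Re-estimating the critical value at each reduced sample size keeps the test level roughly constant as points are removed, at the cost of extra simulation time.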
Regarding the first step, testing for the presence of outliers in an otherwise normal sample, theory suggests that the skewness test and the kurtosis test are among the most powerful available, especially when the number of outliers is unknown [7], as it usually is. Hence this paper reports a comparison of the small-sample power of Healy's test, the skewness test, the kurtosis test and Grubbs' test, the last probably being the most popular of those reviewed [1].
In view of the kurtosis test's favourable power, a preliminary comparison has been made of the effect on the type I error (false alarm rate) of using the kurtosis test or Grubbs' test to detect outliers in the training sample for a simple internal quality control scheme.
Since critical values for the kurtosis test for normality are only tabulated for a relatively small number of different sample sizes [8], interpolation functions have been derived in this paper to assist in the use of the kurtosis test. These formulae have been used to apply the kurtosis test to Healy's outlier example [5].
Section snippets
Comparison of outlier detection procedures
Ten thousand samples of twenty random variates, x_i, were generated from the standard normal distribution (mean = 0, standard deviation = 1) using the gasdev routine [9]. For each sample the standard deviation (s), skewness (√b1) and kurtosis (b2) were estimated from the sample moments about the mean, with n = 20. The notation follows that of Thode [8].
Also, for each sample the Grubbs statistic G was calculated
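For reference, the usual moment-based definitions of these statistics are sketched below. They are the standard textbook forms, given here as an assumption about what was computed; in particular, the n − 1 divisor shown for s is assumed rather than stated in the text.

```latex
% Sample mean and k-th sample moment about the mean
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad
m_k = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^k \quad (k = 2, 3, 4)

% Standard deviation (assumed n - 1 divisor), skewness and kurtosis
s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2}, \qquad
\sqrt{b_1} = \frac{m_3}{m_2^{3/2}}, \qquad
b_2 = \frac{m_4}{m_2^{2}}
```

For a normal population the expected skewness is 0 and the expected kurtosis is close to 3, so a large b2 or a large |√b1| points towards outliers.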
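The simulation step can be illustrated with a short Python sketch. Several details are assumptions: Python's `random.gauss` stands in for the gasdev routine, the function names are hypothetical, s uses an n − 1 divisor, and G is taken in its common two-sided form, the largest absolute deviation from the mean divided by s.

```python
import math
import random


def sample_statistics(xs):
    """Return (s, sqrt_b1, b2, G) for one sample."""
    n = len(xs)
    mean = sum(xs) / n
    dev = [x - mean for x in xs]
    # Sample moments about the mean.
    m2 = sum(d ** 2 for d in dev) / n
    m3 = sum(d ** 3 for d in dev) / n
    m4 = sum(d ** 4 for d in dev) / n
    s = math.sqrt(n * m2 / (n - 1))    # assumption: n - 1 divisor
    sqrt_b1 = m3 / m2 ** 1.5           # skewness
    b2 = m4 / m2 ** 2                  # kurtosis
    g = max(abs(d) for d in dev) / s   # assumption: two-sided Grubbs statistic
    return s, sqrt_b1, b2, g


def simulate(n_samples=10_000, n=20, seed=0):
    """Statistics for n_samples standard normal samples of size n."""
    rng = random.Random(seed)
    return [
        sample_statistics([rng.gauss(0.0, 1.0) for _ in range(n)])
        for _ in range(n_samples)
    ]


# Usage: empirical distribution of the four statistics under normality.
results = simulate()
mean_b2 = sum(r[2] for r in results) / len(results)
```

Sorting any one column of `results` gives empirical percentage points for that statistic, which is one way to check critical values at a given sample size.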
Comparison of procedures
When a single potential outlier per sample of 20 normal variates was generated by model A, a shift in the mean, it was detected with very similar power by all four outlier detection procedures (Fig. 1a). With two potential outliers per sample generated by the same method, the power depended on whether the two outliers were both shifted in the same direction or in opposite directions. When both shifts were positive, √b1 performed the best with b2 the next best (Fig. 1b). However, when the two
Discussion
In clinical chemistry quality control, both external and internal, the QC values are conventionally assumed to be from a normal distribution and sample sizes are frequently small. Consequently it was of interest to compare two popular outlier detection methods with two classical statistics, √b1 and b2, that can be proved to be optimal for outlier detection under certain conditions. For normal samples Ferguson (quoted in [8]) showed that, where the proportion of outliers is small and the shifts
Acknowledgment
This study was supported by the Canterbury District Health Board.
References (9)
- et al. A note on the use of outlier criteria in Ontario laboratory quality control schemes. Clin. Biochem. (2007)
- Sample criteria for testing outlying observations. Ann. Math. Stat. (1950)
- Analysis of extreme values. Ann. Math. Stat. (1950)
- Exploratory data analysis. (1977)
Cited by (30)
Outliers in financial time series data: Outliers, margin debt, and economic recession
2022, Machine Learning with Applications
Edgeworth expansions for multivariate random sums
2021, Econometrics and Statistics
Citation Excerpt: It is a curse when unnoticed outliers hamper the usage of Edgeworth expansions based on fourth-order moments. It is a blessing when using kurtosis measures based on fourth-order moments to detect outliers (see, for example, Livesey (2007)). The theoretical results in this paper pave the way in this direction, using an approach which might be informally described as follows.
Identifying the asymmetry of finite support probability distributions on the basis of the first two moments
2020, Measurement: Journal of the International Measurement Confederation
High kurtosis of intracranial electroencephalogram as a marker of ictogenicity in pediatric epilepsy surgery
2012, Clinical Neurophysiology
Citation Excerpt: A distribution with high kurtosis generally has a distinct peak around the mean, rapid declines but heavy tails. Outliers make the tails heavy and increase kurtosis; therefore, kurtosis is used as a measure of presence of outliers (Livesey, 2007). In epilepsy patients, the occurrence rate, amplitude and duration of spikes on electroencephalogram (EEG) probably correlate with the number of outliers in EEG voltage data, because spikes are in higher amplitude than the background activity.
Validation of a method for composition measurement of a non-standard liquid fuel for Emission Factor evaluation
2011, Measurement: Journal of the International Measurement Confederation
Citation Excerpt: For the uncertainty evaluation a repeatability session (30 samples for C, H, N, S, 40 samples for O) was carried out with oil emulsion samples in order to estimate the repeatability contribution, urep, for each substance. Both anomaly (Huber [12], Dixon [13], Grubbs test [14]) and normality tests (Shapiro-Walk) were carried out on measured data. A few anomalous data were eliminated; the normality of data resulted to be satisfied (Table 1).
Can hubs of the human connectome be identified consistently with diffusion MRI?
2023, Network Neuroscience