Elsevier

Clinical Neurophysiology

Volume 122, Issue 11, November 2011, Pages 2195-2202

The effect of measurement error on the test–retest reliability of repeated mismatch negativity measurements

https://doi.org/10.1016/j.clinph.2011.04.004

Abstract

Objective

The aim was to study how the measurement error affects the repeatability of mismatch negativity (MMN) measurements.

Methods

Event-related potentials (ERPs) to changes in sound frequency, location, intensity, duration, and composition were recorded five times over 1–3 weeks from 13 healthy adults using a multi-feature MMN paradigm. The accumulation of MMN was modeled empirically with respect to measurement error, and repeatability was estimated at error levels of 0.6–3.5 μV. The analysis was performed for each deviant condition separately and for the pattern formed by all conditions (the auditory discrimination profile).

Results

At the single-subject level, the measurement error significantly affected the repeatability until it fell below 9–17% of the MMN peak amplitude. At the group level, the threshold was higher. Peak amplitude was generally the most repeatable parameter, but latency was superior when the error was moderate or small (<2–3 μV).

Conclusions

The measurement error affects the repeatability of MMN. In single-subject studies, it should not be neglected if it exceeds 10% of the MMN amplitude. Using the auditory discrimination profile is recommended in future applications.
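Expressing the error as a percentage of the MMN peak amplitude is plain arithmetic. As a minimal sketch (the function name is hypothetical), dividing a 0.6 μV error by the group-mean peak amplitudes reported in the Results (−3.0 μV for Freq and Dur, −1.8 μV for Loc) gives the 20–33% range quoted in the Discussion, which is still well above the 10% single-subject criterion stated here.

```python
def error_percent_of_peak(error_uv: float, peak_uv: float) -> float:
    """Measurement error as a percentage of the MMN peak amplitude.

    MMN is a negativity, so the absolute value of the peak is used.
    """
    return 100.0 * error_uv / abs(peak_uv)


# Group-mean peak amplitudes (μV) taken from the Results section.
peaks_uv = {"Freq": -3.0, "Dur": -3.0, "Loc": -1.8}

for condition, peak in peaks_uv.items():
    pct = error_percent_of_peak(0.6, peak)
    # Compare against the 10% single-subject criterion from the Conclusions.
    print(f"{condition}: {pct:.0f}% of peak, exceeds 10%: {pct > 10.0}")
```

This prints 20% for Freq and Dur and 33% for Loc; the Discussion's 26% mean presumably also covers the intensity and composition conditions, whose amplitudes are not given in this excerpt.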

Significance

The study provides quantitative results to support the discussion on improving the repeatability of MMN measurements. The results are expected to apply, under certain conditions, to other ERP measurements as well.

Highlights

► We study the effect of the measurement error on the test–retest repeatability of MMN.
► 13 test subjects, five deviant conditions, five repeated sessions per subject.
► The effect was relevant in all conditions.

Introduction

During the last few decades, an increasing number of studies have demonstrated the application of auditory event-related potentials (AERPs) in the investigation of neurological and psychiatric disorders. One of the most popular measures in these studies is mismatch negativity (MMN, Näätänen et al., 1978). MMN indicates the discrimination of a change in the sensory input (Näätänen, 1990, Näätänen, 1992, Näätänen and Winkler, 1999, Näätänen et al., 2001, Garrido et al., 2009), and it is obtained as the difference between the AERPs elicited by repeated standard stimuli and by slightly different deviant stimuli (Schröger, 1998, Duncan et al., 2009). Applications of MMN include the investigation of, e.g., dyslexia, schizophrenia, memory disorders, and coma outcome (for reviews see Kujala et al., 2007, Näätänen et al., 2007, Duncan et al., 2009). So far, the focus has been on basic research, but MMN is also a tempting option for clinical use because of its versatility.
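The deviant-minus-standard difference wave, and the three MMN parameters used later in the paper (peak amplitude, mean amplitude, and peak latency), can be sketched as follows. The helper `mmn_parameters`, the 100–250 ms search window, and the ±20 ms mean-amplitude window are illustrative assumptions, not the study's settings.

```python
import numpy as np


def mmn_parameters(standard, deviant, times_ms, search=(100.0, 250.0), half_width=20.0):
    """Return (peak amplitude, mean amplitude, peak latency) of the MMN.

    standard, deviant: averaged AERPs in μV, sampled at times_ms (ms).
    search: assumed latency window in which the most negative point is sought.
    half_width: assumed half-width (ms) of the window around the peak that is
    averaged to obtain the mean amplitude.
    """
    times_ms = np.asarray(times_ms, dtype=float)
    # Deviant-minus-standard difference wave.
    diff = np.asarray(deviant, dtype=float) - np.asarray(standard, dtype=float)

    # MMN is a negativity, so the peak is the minimum inside the search window.
    idx = np.flatnonzero((times_ms >= search[0]) & (times_ms <= search[1]))
    peak_idx = idx[np.argmin(diff[idx])]
    latency = float(times_ms[peak_idx])

    # Mean amplitude over a fixed window centred on the peak latency.
    around = np.abs(times_ms - latency) <= half_width
    return float(diff[peak_idx]), float(diff[around].mean()), latency
```

Centring the mean-amplitude window on each individual peak is one design choice; a fixed group-level window is a common alternative that avoids biasing the mean toward the noise-driven peak.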

However, like many other AERP components that are modulated by cognitive functions of the brain, MMN is not yet suited to clinical applications because it cannot be measured reliably enough (Kujala et al., 2007). In the reported studies, the test–retest reliability of MMN varies between 0.37 and 0.87, depending on the experimental conditions (Chertoff et al., 1988, Pekkonen et al., 1995, Lang et al., 1995, Escera and Grau, 1996, Frodl-Bauch et al., 1997, Joutsiniemi et al., 1998, Kathmann et al., 1999, Tervaniemi et al., 1999, Escera et al., 2000, Dalebout and Fox, 2001, Kujala et al., 2001, Light and Braff, 2005, Hall et al., 2006, Lew et al., 2007). Repeatability of this magnitude is tolerable in group studies, but single-subject measurements show too much unwanted variation (Escera and Grau, 1996, Tervaniemi et al., 1999). To allow clinical use of MMN, single-subject measurements must be repeatable as well (Kujala et al., 2007, Näätänen et al., 2007).
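Reliability coefficients such as those quoted above are computed from a subjects × sessions matrix of an MMN parameter (here, 13 subjects × 5 sessions). The excerpt does not say which coefficient each cited study, or this study, used. As one common choice, the sketch below implements the intraclass correlation ICC(2,1) (two-way random effects, absolute agreement, single measurement) as defined by Shrout and Fleiss (1979); the function name is hypothetical.

```python
import numpy as np


def icc_2_1(scores) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single measurement.

    scores: array of shape (n_subjects, k_sessions), e.g. MMN peak amplitudes.
    """
    x = np.asarray(scores, dtype=float)
    n, k = x.shape
    grand = x.mean()
    subj_means = x.mean(axis=1)
    sess_means = x.mean(axis=0)

    # Between-subject and between-session mean squares.
    ms_subjects = k * np.sum((subj_means - grand) ** 2) / (n - 1)
    ms_sessions = n * np.sum((sess_means - grand) ** 2) / (k - 1)

    # Residual mean square after removing subject and session effects.
    resid = x - subj_means[:, None] - sess_means[None, :] + grand
    ms_error = np.sum(resid ** 2) / ((n - 1) * (k - 1))

    return float((ms_subjects - ms_error) /
                 (ms_subjects + (k - 1) * ms_error + k * (ms_sessions - ms_error) / n))


# Worked example from Shrout and Fleiss (1979): 6 targets rated by 4 judges.
example = [[9, 2, 5, 8], [6, 1, 3, 2], [8, 4, 6, 8],
           [7, 1, 2, 6], [10, 5, 6, 9], [6, 2, 4, 7]]
print(round(icc_2_1(example), 2))
```

The example prints 0.29, the value Shrout and Fleiss report for ICC(2,1). Unlike a Pearson correlation, this coefficient also penalizes systematic shifts between sessions, which matters when sessions are repeated over weeks.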

The first systematic studies aiming to improve the test–retest reliability of MMN were reported by Pekkonen et al. (1995) and Lang et al. (1995). They found that the repeatability depends on the presentation of the stimuli, the number of recorded trials (Pekkonen et al., 1995), the parameterization of MMN, and the characteristics (e.g., age, discrimination ability, and alertness) of the test subjects (Lang et al., 1995). This provided a good starting point for research and still forms the basis of the methods used today. Currently, the best way to secure reliable recordings is to ensure good signal quality (Pivik et al., 1993, Sinkkonen and Tervaniemi, 2000), to use appropriate stimuli (Duncan et al., 2009), and to apply efficient recording procedures, such as the multi-feature paradigm (Näätänen et al., 2004, Pakarinen et al., 2007, Pakarinen et al., 2009). Valid stimulus design and a high signal-to-noise ratio (SNR) reduce the variation in the results (Lang et al., 1995, Sinkkonen and Tervaniemi, 2000). An efficient recording procedure, in turn, permits effective denoising through averaging and robust rejection of contaminated data without causing unnecessary mental fatigue and stress for the test subject (e.g., Pakarinen et al., 2009). In addition, distracting the test subjects from attending to the stimuli improves the repeatability because it reduces the attentional modulation of the recorded responses (Näätänen, 1995).
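The denoising-through-averaging argument can be made quantitative under a simplifying assumption: if the background EEG noise is stationary and independent across trials, the residual noise of an N-trial average is σ/√N, where σ is the single-trial noise SD. The sketch below uses that relation to estimate how many trials are needed to reach a given error level. The function names and the 10 μV single-trial SD in the example are hypothetical, not figures from the study.

```python
import math


def residual_noise(single_trial_sd_uv: float, n_trials: int) -> float:
    """Residual noise (μV) of an n_trials average, assuming independent,
    stationary noise with the given single-trial SD."""
    return single_trial_sd_uv / math.sqrt(n_trials)


def trials_needed(single_trial_sd_uv: float, target_error_uv: float) -> int:
    """Smallest number of trials whose average reaches target_error_uv."""
    return math.ceil((single_trial_sd_uv / target_error_uv) ** 2)


# Hypothetical single-trial SD of 10 μV, at both ends of the 0.6–3.5 μV range.
for target in (3.5, 0.6):
    print(target, trials_needed(10.0, target))
```

Under these assumptions, reaching 3.5 μV takes 9 trials but 0.6 μV takes 278: halving the error quadruples the number of trials, which is why efficient paradigms and clean data matter so much for small errors.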

Regarding further development, many authors have suggested that measurement error contributes a major part of the uncertainty (e.g., Lang et al., 1995, Sinkkonen and Tervaniemi, 2000, Hall et al., 2006, Paukkunen et al., 2010a). Since a smaller error yields higher repeatability, and MMN is a weak response whose amplitude is only a few microvolts, even a small absolute error is large relative to the response and could therefore have a major effect on its repeatability. However, no quantitative studies have been published on the subject, and the extent to which the effect is relevant has not been properly evaluated. In this study, a series of repeated MMN recordings is made with multiple test subjects to analyze the effect of measurement error in vivo. The main objective is to determine how the test–retest reliability of the results changes as a function of the measurement error. In addition, we study how this affects the parameterization of MMN.
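The statement that a smaller error yields higher repeatability can be illustrated with classical test theory, which the excerpt does not explicitly invoke. If each session's measurement equals a subject's true MMN amplitude plus independent error, the expected test–retest correlation is ρ = σ_T² / (σ_T² + σ_E²), where σ_T is the between-subject SD of the true amplitude and σ_E is the error SD. The sketch compares that formula with a simulated two-session correlation; the function names, the −2.5 μV mean, and the 1 μV between-subject SD are hypothetical values chosen only for illustration.

```python
import numpy as np


def expected_reliability(true_sd_uv: float, error_sd_uv: float) -> float:
    """Classical-test-theory reliability: true variance / observed variance."""
    return true_sd_uv ** 2 / (true_sd_uv ** 2 + error_sd_uv ** 2)


def simulated_retest_correlation(true_sd_uv, error_sd_uv, n_subjects=20000, seed=0):
    """Pearson correlation between two simulated sessions of the same subjects."""
    rng = np.random.default_rng(seed)
    # Hypothetical true MMN amplitudes (μV), identical in both sessions.
    true_amp = rng.normal(-2.5, true_sd_uv, n_subjects)
    # Each session adds its own independent measurement error.
    session1 = true_amp + rng.normal(0.0, error_sd_uv, n_subjects)
    session2 = true_amp + rng.normal(0.0, error_sd_uv, n_subjects)
    return float(np.corrcoef(session1, session2)[0, 1])


for err in (3.5, 2.0, 0.6):
    print(err, round(expected_reliability(1.0, err), 2),
          round(simulated_retest_correlation(1.0, err), 2))
```

With these assumed values the expected coefficient rises from about 0.08 at a 3.5 μV error to about 0.74 at 0.6 μV, showing how steeply reliability depends on the error when true between-subject differences are small.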


Test procedure

Thirteen healthy volunteers (9 males, aged 20–28 years) participated in the study. Each attended a one-hour MMN recording session that was repeated five times over a period of 1–3 weeks. The recording time would be too long for practical applications, but it was feasible here and allowed a large amount of data to be collected for the analysis.
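With this much data per subject, the residual noise of the average can be tracked as trials accumulate, which is the kind of information the empirical modeling of MMN accumulation described in the Abstract requires. The sketch below uses one standard estimator, the alternating-sign ("plus-minus") average, in which every other trial is inverted so that the time-locked response cancels and only noise remains. The reference list includes noise-estimation work (Wong et al., 1980; Elberling et al., 1984), but the excerpt does not say which estimator the study used; the function names and simulated data are assumptions.

```python
import numpy as np


def plus_minus_noise(epochs: np.ndarray) -> float:
    """Estimate the residual noise (RMS, units of `epochs`) of the average.

    epochs: array of shape (n_trials, n_samples). Alternate trials are
    sign-inverted so the time-locked response cancels and only noise remains.
    """
    n = epochs.shape[0] - epochs.shape[0] % 2          # use an even number of trials
    signs = np.where(np.arange(n) % 2 == 0, 1.0, -1.0)
    pm_average = (signs[:, None] * epochs[:n]).mean(axis=0)
    return float(np.sqrt(np.mean(pm_average ** 2)))


def noise_vs_trials(epochs: np.ndarray, steps) -> list:
    """Residual-noise estimate after accumulating the first n trials, for n in steps."""
    return [plus_minus_noise(epochs[:n]) for n in steps]


# Simulated data: a -3 μV MMN-like deflection buried in 10 μV Gaussian noise.
rng = np.random.default_rng(0)
t = np.arange(500)
signal = -3.0 * np.exp(-((t - 170) / 30.0) ** 2)
epochs = signal + rng.normal(0.0, 10.0, size=(400, 500))
print(noise_vs_trials(epochs, [16, 64, 256]))
```

For independent noise the estimate should fall roughly as σ/√N (here about 2.5, 1.25, and 0.63 μV), which lets the error level of any partial average be read off directly.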

Each repeated session was scheduled at the same time of day as the first one to minimize changes in the level of alertness. In addition, the test

AERPs and MMN

The grand average of the responses recorded for each type of stimulus, the respective MMN parameters, and their variation across the recording sessions are presented in Fig. 4. The MMN peak amplitude was the highest for Freq (PEAK: −3.0 μV ± 1.1 μV, MEAN: −1.9 μV ± 0.9 μV) and Dur (PEAK: −3.0 μV ± 0.9 μV, MEAN: −2.0 μV ± 0.7 μV). The smallest response was produced by Loc (PEAK: −1.8 μV ± 0.8 μV, MEAN: −1.0 μV ± 0.7 μV). The peak latency of the response was the longest for Int (LAT: 177 ms ± 23 ms), while Loc had the

Discussion

First, the results of the present study show that the effect of the measurement error on the test–retest reliability of MMN is dominant at higher error levels and decreases as the error becomes smaller. At the single-subject level, the effect was clear when the error was on the same scale as the MMN, and it was still evident at least until the error went below 0.6 μV (20–33% of the MMN peak amplitudes, mean: 26%). The level where the measurement error would become irrelevant could not

Acknowledgements

The work was supported in part by the Graduate School of Electrical and Communications Engineering, Society of Electronics Engineers, and the Academy of Finland (National Centers of Excellence 2006–2011). The authors would like to thank Riitta Hari, Risto Näätänen, and Pekka Eskelinen for their assistance in planning the study and for discussions during the preparation of the manuscript.

References (42)

  • R. Näätänen et al. “Primitive intelligence” in the auditory cortex. Trends Neurosci (2001).
  • R. Näätänen et al. The mismatch negativity (MMN): towards the optimal paradigm. Clin Neurophysiol (2004).
  • R. Näätänen et al. The mismatch negativity (MMN) in basic research of central auditory processing: a review. Clin Neurophysiol (2007).
  • S. Pakarinen et al. Measurement of extensive auditory discrimination profiles using the mismatch negativity (MMN) of the auditory event-related potential (ERP). Clin Neurophysiol (2007).
  • S. Pakarinen et al. Fast multi-feature paradigm for recording several mismatch negativities (MMNs) to phonetic and acoustic changes in speech sounds. Biol Psychol (2009).
  • E. Pekkonen et al. Variability and replicability of the mismatch negativity. Electroencephalogr Clin Neurophysiol (1995).
  • M. Tervaniemi et al. Test–retest reliability of mismatch negativity for duration, frequency and intensity changes. Clin Neurophysiol (1999).
  • P.K.H. Wong et al. Brain stem auditory evoked potentials: the use of noise estimate. Electroencephalogr Clin Neurophysiol (1980).
  • M.E. Chertoff et al. Early event-related potentials with passive subject participation. J Speech Hear Res (1988).
  • S.D. Dalebout et al. Reliability of the mismatch negativity in the responses of individual listeners. J Am Acad Audiol (2001).
  • C. Elberling et al. Quality estimation of averaged auditory brainstem responses. Scand Audiol (1984).