Introduction

In recent years, research efforts in Alzheimer’s disease (AD) have focused upon the discovery of clinically meaningful and non-invasive biomarkers that can reliably monitor disease progression and predict future conversion to the disease. Several groups including our own have proposed the use of magnetic resonance imaging (MRI) based tools to aid in the diagnosis of AD (Desikan et al. 2009; Liu et al. 2009) and predict future conversion from the prodromal stage of disease often referred to as mild cognitive impairment (MCI) (McEvoy et al. 2009; Westman et al. 2011a).

Hippocampal atrophy has been frequently observed in AD (Jack et al. 1992; Fox et al. 1996) and has been demonstrated in MCI subjects (Devanand et al. 2007), with increased risk of future conversion to AD in subjects with smaller hippocampal volumes (Apostolova et al. 2006; Csernansky et al. 2005). Hippocampal volumetry has been a useful marker of AD pathology but seems to be insufficiently sensitive for distinguishing MCI subjects bearing a high risk of AD conversion from those who remain clinically stable (Mueller et al. 2010). Post-mortem studies have also demonstrated that hippocampal atrophy in AD is non-uniform, with Cornu Ammonis 1 (CA1) and subicular atrophy reported in early AD (Braak and Braak 1991; West et al. 2004).

So far, only a few studies have attempted to measure regional atrophic changes within the hippocampus using manual delineation and 3D surface mapping (Mueller and Weiner 2009; Apostolova et al. 2010; Costafreda et al. 2011). Manual delineation of the subfield boundaries is both a time consuming and labour intensive process which limits its widespread applicability in practice. However, recent developments in image acquisition have made it possible to segment the hippocampus into its subfields in a fully automated fashion and this method has now been validated using ultra high resolution MRI (Van Leemput et al. 2009). A recent small study applied this technique to 15 MCI subjects using conventional 3D T1 weighted volume imaging and demonstrated that segmenting subfields increased sensitivity in diagnosing MCI (Hanseeuw et al. 2011). The current study uses an extensive dataset created by combining two large cohorts from the AddNeuroMed and the Alzheimer Disease Neuroimaging Initiative (ADNI) studies to build on and extend this earlier work.

In this study we aimed to (1) investigate the differences in hippocampal subfields between subject groups at baseline in a cohort of 1,069 subjects, (2) determine patterns of subfield volume loss in relation to age, gender, education, APOE ε4 genotype, and neuropsychological test scores from the Mini Mental State Examination (MMSE) and the first subtest of the Alzheimer’s Disease Assessment Scale (ADAS-1), and (3) compare combined subfield volumes, analysed with orthogonal partial least squares (OPLS) multivariate analysis, to hippocampal volume alone for discriminating between AD and healthy control (HC) subjects and for predicting future conversion from MCI to AD at 12 months.

Materials and Methods

Study Data and Inclusion and Diagnostic Criteria

The data used in this study were derived from two large multicentre cohorts, the AddNeuroMed and ADNI cohorts. The AddNeuroMed study is an integrated project funded by the European Union Sixth Framework Program and aims to establish and validate novel biomarkers of disease and treatment based upon in vitro and in vivo human and animal models of AD. Data was collected from six participating sites across Europe: University of Kuopio, Finland; University of Perugia, Italy; Aristotle University of Thessaloniki, Greece; King’s College London, United Kingdom; University of Lodz, Poland; and University of Toulouse, France (Lovestone et al. 2009; Simmons et al. 2009, 2011).

Data from the ADNI study were downloaded from the ADNI database at the LONI website (www.loni.ucla.edu/ADNI, PI Michael M. Weiner). The initiative was launched in 2003 by the National Institute on Ageing, the National Institute of Biomedical Imaging and Bioengineering, the Food and Drug Administration, private pharmaceutical companies and non-profit organisations, as a 5-year public–private partnership. The primary goal of ADNI has been to test whether MRI, positron emission tomography (PET), and other biological markers are useful in clinical trials of MCI and early AD. Subjects aged 55–90 from over 50 sites across the U.S. and Canada participated in the research, and imaging, clinical, and biological samples were collected at multiple time points (Jack et al. 2008). A detailed description of the inclusion criteria for the study can be found on its webpage (http://www.adni-info.org/scientists/aboutADNI.aspx#).

A total of 1,069 subjects were included in this study (AD = 291, MCI = 447, HC = 331). The demographics of the cohorts are given in Table 1. Of the 447 MCI subjects in our whole cohort, 90 converted to an AD diagnosis (MCI converters) at 12 months.

Table 1 Demographic, clinical and neuropsychological data in AD, MCI converters, stable MCI, and control subjects

For the AddNeuroMed cohort, subjects were patients who attended local memory clinics and received a diagnosis of MCI, while HC subjects were recruited from non-related members of the patients’ families, caregiver relatives, and social centres for the elderly or General Practitioner (GP) surgeries. Informed consent was obtained for all subjects and the study was approved by the ethical review boards of each participating country. The general inclusion and exclusion criteria were as follows.

AD

(1) diagnosis established by National Institute of Neurological and Communicative Disorders and Stroke and the Alzheimer’s Disease and Related Disorders Association (NINCDS-ADRDA) and Diagnostic and Statistical Manual of Mental Disorders IV (DSM IV) criteria, (2) MMSE score ranged from 12 to 28. Subjects were excluded from the study if any psychiatric or neurological illness other than AD was present, and if subjects presented with a systemic illness or signs of organ failure.

MCI

(1) subjects had MMSE scores between 24 and 30, (2) subjective memory complaint with preserved activities of daily living, (3) Clinical Dementia Rating (CDR) score of 0.5, (4) Geriatric depression scale score less than or equal to 5, (5) absence of dementia in accordance with NINCDS-ADRDA criteria. A 12-month follow-up was used to determine whether MCI subjects converted to AD (MCI converters) or remained clinically stable (stable MCI).

HC

(1) MMSE scores between 24 and 30, (2) CDR score of 0, (3) no presence of neurological or psychiatric illness, and non-demented.

MMSE, CDR, and the Consortium to Establish a Registry for Alzheimer’s Disease (CERAD) cognitive battery were assessed for each subject. The CERAD cognitive battery was replaced with the Alzheimer’s disease assessment scale for AD subjects in AddNeuroMed. The CERAD battery employs the same 10 word recall task as the Alzheimer’s assessment scale, only the scoring is inverted. Therefore, the mean number of words not recalled in the CERAD word list task was calculated in order to obtain comparable measures of memory for all diagnostic groups. This revised cognitive parameter was named ADAS-1 corresponding to the first subtest of the Alzheimer’s disease assessment scale.
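The inversion described above can be illustrated with a minimal sketch. It assumes a 10-word list and averages over however many recall trials are supplied; the function name, the use of three trials, and the example values are illustrative assumptions and are not taken from the study protocol.

```python
def adas1_from_cerad(words_recalled_per_trial, list_length=10):
    """Mean number of words NOT recalled across the supplied trials.

    words_recalled_per_trial: number of words recalled on each trial
    list_length: size of the word list (10 in the task described above)
    """
    if not words_recalled_per_trial:
        raise ValueError("at least one recall trial is required")
    for recalled in words_recalled_per_trial:
        if not 0 <= recalled <= list_length:
            raise ValueError("recalled count must lie between 0 and list_length")
    # Invert the scoring: count the words that were missed on each trial
    missed = [list_length - recalled for recalled in words_recalled_per_trial]
    return sum(missed) / len(missed)


# Hypothetical subject recalling 4, 6 and 7 words on three trials
example_score = adas1_from_cerad([4, 6, 7])
```

A higher value therefore indicates poorer memory performance, in line with the direction of ADAS scoring.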

MRI Acquisition

Standardized MRI data acquisition techniques were in place for AddNeuroMed and ADNI to ensure homogeneity across data acquisition sites. A detailed description of the ADNI data acquisition protocol can be found at www.loni.ucla.edu/ADNI/research/Cores/index.shtml. The imaging protocol included 1.5 T high resolution T1 weighted sagittal 3D MP-RAGE volumes (voxel size 1.1 × 1.1 × 1.2 mm³), and axial proton density with T2 weighted fast spin echo images. A comprehensive quality control procedure was carried out on all MR images according to the AddNeuroMed quality control framework (Simmons et al. 2009, 2011).

Hippocampal Subfield Segmentation

Image analysis was carried out using the Freesurfer image analysis pipeline (version 5.1.0). These procedures have been described in detail in previous publications (Dale et al. 1999; Fischl et al. 2002; Ségonne et al. 2004; Fischl et al. 2004). Initially, volumetric segmentation involved the removal of non-brain tissue using a hybrid watershed/surface deformation procedure (Ségonne et al. 2004), automated Talairach transformation, and segmentation of the subcortical white matter and deep grey matter volumetric structures (Fischl et al. 2004).

Automated segmentation of the hippocampus was performed to define anatomical subfield labels using a Bayesian modelling approach and a computational model of the areas surrounding the hippocampus. An atlas mesh had previously been built and validated from manual delineations in ultra-high resolution MRI scans of 10 individuals (Van Leemput et al. 2009). These delineations include the fimbria, presubiculum, subiculum, CA1, CA2/3, and CA4-DG subfields as well as the hippocampal fissure. Figure 1 illustrates the delineations made to define the different subfields of the hippocampus. For more details about this technique and the borders used to define the different subfields, see Van Leemput et al. (2009).

Fig. 1 a Coronal and b sagittal views of the hippocampus

All subfield measures were normalised by the subject’s intracranial volume (ICV) derived from Freesurfer using the following formula: $\text{volume}_{\text{norm}} = \text{volume}_{\text{raw}} \times 1{,}000 / \text{ICV}$, with ICV in cm³ (Westman et al. 2013). This automated segmentation approach has been recently applied to a small group of MCI subjects (Hanseeuw et al. 2011).
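As a minimal sketch of this normalisation, the function below applies the formula directly. It assumes the raw subfield volume is given in mm³ and the ICV in cm³; the function name and the example values are illustrative and are not taken from the study data.

```python
def normalise_volume(volume_raw, icv_cm3):
    """Normalise a raw volume by intracranial volume.

    Implements volume_norm = volume_raw * 1000 / ICV, with ICV in cm^3.
    """
    if icv_cm3 <= 0:
        raise ValueError("ICV must be positive")
    return volume_raw * 1000.0 / icv_cm3


# Hypothetical subject: raw subfield volume of 2,300 and ICV of 1,500 cm^3
example_norm = normalise_volume(2300.0, 1500.0)
```

Dividing by ICV in this way removes head-size differences between subjects before the group comparisons.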

Statistical Analysis

Statistical analysis was conducted using PASW Statistics (Version 17.0; SPSS Inc., USA). Categorical variables were inspected using the Chi square test while continuous variables were tested using ANOVA with Bonferroni post hoc comparisons. Hippocampal subfield volumes were first analysed using MANCOVA with Bonferroni correction, adopting a general linear model procedure and adjusting for age, gender, education, and APOE ε4 genotype as covariates. Bonferroni pairwise comparisons were performed to inspect subfield volume differences between the groups.
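For readers unfamiliar with the Bonferroni correction, a minimal sketch follows. It multiplies each p value by the number of comparisons and caps the result at 1; the function name and the example p values are illustrative, and the number of comparisons actually used in the analyses is not restated here.

```python
def bonferroni_adjust(p_values):
    """Return Bonferroni-adjusted p values (each p times the number of tests, capped at 1)."""
    n_tests = len(p_values)
    return [min(1.0, p * n_tests) for p in p_values]


def significant_after_bonferroni(p_values, alpha=0.05):
    """Flag which tests remain significant at the family-wise level alpha."""
    return [p_adj < alpha for p_adj in bonferroni_adjust(p_values)]


# Hypothetical p values from three pairwise comparisons
example_flags = significant_after_bonferroni([0.001, 0.02, 0.04])
```

With three comparisons, only the first hypothetical p value stays below 0.05 after adjustment.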

Multiple regression analyses were conducted in R version 2.15.2 using the lm function from the R stats package and Bonferroni correction for multiple comparisons. Patterns of subfield volume loss were tested in relation to the effects of age, gender, education, APOE ε4 genotype, and neuropsychological test scores from MMSE and ADAS-1. In this step, all subfield measures were tested as dependent variables by disease group (AD, MCI converters, stable MCI, and HC) as a whole. Age, gender, years of education, APOE ε4 genotype, and neuropsychological scores from MMSE and ADAS-1 tests were treated as independent variables for identifying subfield specific effects. Ten-fold cross validation was performed by fitting linear regression models to the data, excluding 1/10th of the data in each fold and using the fitted model for prediction on the data that were excluded from the fold.
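The cross validation scheme can be sketched as follows. This is an illustration in Python rather than the R code used in the study, and it is simplified to a single predictor so that it needs no external libraries; the function names, the random seed, and the synthetic age and volume values are all assumptions made for the example.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def k_fold_mse(xs, ys, k=10, seed=0):
    """Mean squared prediction error over k held-out folds."""
    indices = list(range(len(xs)))
    random.Random(seed).shuffle(indices)
    # Each fold holds roughly 1/k of the subjects
    folds = [indices[i::k] for i in range(k)]
    squared_error = 0.0
    for held_out in folds:
        held = set(held_out)
        train = [i for i in indices if i not in held]
        intercept, slope = fit_line([xs[i] for i in train], [ys[i] for i in train])
        # Predict only on the subjects excluded from this fit
        squared_error += sum((ys[i] - (intercept + slope * xs[i])) ** 2 for i in held_out)
    return squared_error / len(xs)


# Synthetic, noise-free example: volume falls linearly with age
ages = list(range(55, 91))
volumes = [3000.0 - 10.0 * age for age in ages]
example_error = k_fold_mse(ages, volumes, k=10)
```

Because every subject is predicted by a model that never saw that subject, the resulting error estimates how well the regression generalises.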

Hippocampal subfields were subsequently analysed using Orthogonal Partial Least Squares (OPLS) (Wiklund et al. 2008; Trygg and Wold 2002), a supervised multivariate data analysis method included in the software package SIMCA (Umetrics AB, Umeå, Sweden). All 14 variables (left and right subfields) were used for OPLS analysis. Classification models were created for distinguishing between AD and HC subjects at baseline. The AD versus HC models were subsequently treated as classifiers to investigate how well the hippocampal subfields could predict future MCI conversion to AD at 12 months follow up. Seven-fold cross validation was used for all models. Using this approach we created 4 OPLS models; 2 for the total hippocampus and 2 for the combination of subfield volumes. The first model for each region comprised the AddNeuroMed cohort and the second model comprised the ADNI cohort. To further validate the models created, the AddNeuroMed cohort was used as the training set and the ADNI cohort as a test set (and vice versa) to see how well the models could predict new and unseen data. The combined ADNI and AddNeuroMed cohort from the AD versus HC comparison was used as a classifier to investigate the reliability of predicting MCI conversion to AD at 12 months. This OPLS classification approach has been extensively validated (Bylesjo et al. 2006; Wiklund et al. 2008; Westman et al. 2011c) and applied to several biomarker discovery studies in AD (Mangialasche et al. 2010; Westman et al. 2011a, 2012; Spulber et al. 2013).

Sensitivity and specificity were calculated from the cross-validated prediction values of the OPLS models. The positive and negative likelihood ratios (LR+ = sensitivity/(100 − specificity) and LR− = (100 − sensitivity)/specificity, with sensitivity and specificity expressed as percentages) were determined. A positive likelihood ratio between 5 and 10 or a negative likelihood ratio between 0.1 and 0.2 increases the diagnostic value in a moderate way, while a value above 10 or below 0.1 significantly increases the diagnostic value of the test.
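The likelihood ratio formulas above can be written as a short worked sketch. The function name is illustrative; the example uses the sensitivity and specificity reported in this paper for the combined subfield AD versus HC model (80.4 % and 82.8 %).

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) from sensitivity and specificity given in percent."""
    if not (0.0 <= sensitivity <= 100.0 and 0.0 < specificity < 100.0):
        raise ValueError("sensitivity must be in [0, 100] and specificity in (0, 100)")
    lr_positive = sensitivity / (100.0 - specificity)
    lr_negative = (100.0 - sensitivity) / specificity
    return lr_positive, lr_negative


# Combined subfield AD vs. HC model: sensitivity 80.4 %, specificity 82.8 %
lr_pos, lr_neg = likelihood_ratios(80.4, 82.8)
```

For these inputs LR+ is roughly 4.7 and LR− roughly 0.24, which places the model just outside the moderate range defined above.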

Receiver operating characteristic (ROC) curves were calculated for the individual subfield volume models using the ROCR library (version 2.1) in R. ROC curves provide a graphical means to interpret the quality of separation and are created by plotting the true positive rate (sensitivity) versus the false positive rate (1 − specificity) for various thresholds. The discriminant value of the corresponding ROC curve can be obtained by calculating the area under the curve (AUC). AUC values range from 0.5 (discrimination no better than chance) to 1.0 (perfect discrimination). The pROC package (version 1.5.4) (Robin et al. 2011) in R was used to perform statistical comparisons of AUC between the combined subfield and total hippocampal volume models in the AD vs. HC and MCI converter vs. stable MCI models.
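To make the construction of the ROC curve and AUC concrete, a minimal sketch follows. It is written in plain Python for illustration and is not the ROCR or pROC code used in the study; the function names, scores, and labels are hypothetical, with higher scores taken to indicate the positive (AD) class.

```python
def roc_points(scores, labels):
    """Return (false positive rate, true positive rate) pairs over all thresholds.

    labels: 1 for the positive class, 0 for the negative class.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present")
    points = [(0.0, 0.0)]
    # Lower the threshold step by step, from the highest score to the lowest
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def area_under_curve(points):
    """Trapezoidal area under a curve given as ordered (x, y) points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Hypothetical classifier scores for three positive and three negative subjects
example_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3]
example_labels = [1, 1, 0, 1, 0, 0]
example_auc = area_under_curve(roc_points(example_scores, example_labels))
```

In this toy example eight of the nine positive and negative pairs are ranked correctly, so the AUC is 8/9.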

Results

Demographics, Neuropsychological, and Global Clinical Measurements

1,069 subjects were included in the current study (AD = 291, MCI = 447, HC = 331) from the AddNeuroMed and ADNI cohorts. Results from global, clinical and cognitive assessments revealed that scores on MMSE, CDR, and ADAS-1 were poorest amongst AD and best amongst control subjects as expected (Table 1).

Hippocampal Subfields

Hippocampal subfield volumes from the left and the right hemisphere were used to determine the pattern of subfield atrophy in AD, MCI-converter, MCI stable and HC subjects. Comparisons of the bilateral CA1, CA2-3, CA4-DG, subiculum, and presubiculum were significant across all groups (p < 0.0001) after correction for multiple comparisons and demonstrated similar results in pairwise comparisons (Table 2 and Fig. 2). No significant volume differences were found for the left and right hippocampal fissure between these groups.

Table 2 Hippocampal subfield differences in AD, MCI converters, stable MCI, and healthy control subjects

Fig. 2 Bar plot of subfield volumes of AD (n = 291), MCI converters (n = 90), MCI stable (n = 357), and healthy control (HC) subjects (n = 331). Error bars represent SEM = SD/√n. Subfield volumes are represented in mm³. R = subfield volumes from the right hemisphere, L = subfield volumes from the left hemisphere

In the left hippocampus, presubiculum (F = 144.5, p < 0.0001) and subiculum (F = 144.3, p < 0.0001) volumes were most significantly reduced in AD, MCI converter and MCI stable subjects compared to healthy controls. The same pattern of subfield atrophy was also observed in the right hippocampus for these groups for both presubiculum (F = 122.1, p < 0.0001) and subiculum (F = 120.0, p < 0.0001) relative to healthy controls. MCI-converters displayed significant subfield volume losses in the bilateral subiculum, CA4-DG, and CA2-3 (p < 0.0001 for each subfield in both hemispheres) relative to stable MCI subjects. However, no significant differences in any of the subfield volume measures were observed between AD and MCI-converter subjects.

Relationship Between Neuropsychological Test Scores and Hippocampal Subfields

A significant positive effect for MMSE was found in relation to all hippocampal subfield volumes except bilateral hippocampal fissure, indicating that subjects with lower MMSE scores had reduced subfield volumes (Table 3). On the other hand, a significant negative effect was observed for ADAS-1 scores across all subfield volumes except the bilateral hippocampal fissure, indicating that subjects with higher ADAS-1 scores (mean number of words not recalled) had lower subfield volumes (Table 3).

Table 3 MMSE and ADAS-1 effect on hippocampal subfield volumes in the combined cohort

Relationship Between Age, Education, APOE ε4 Genotype, and Hippocampal Subfields

A significant negative effect of age was observed in relation to all subfield volumes indicating that older subjects had lower hippocampal subfield volumes. In particular, the strongest effects of age were found in the right presubiculum (β = −0.32, p < 0.001), and left presubiculum areas, the right fimbria (β = −0.31, p < 0.001), and the right subiculum and left subiculum areas (β = −0.28, p < 0.001).

Linear regression models were also created to test for the effect of gender on subfield volume differences in the male (n = 567) and female (n = 502) subjects. A significant positive effect of gender was found in the right fimbria and left fimbria areas, and in the right and left CA4-DG subfield volumes (Table 4).

Table 4 Age and Gender effect on hippocampal subfields in the combined cohort

A significant negative effect of education was only found in the right CA1 (β = −0.95, p = 0.024).

The analysis was repeated for subjects that were carriers and non-carriers of the APOE ε4 allele. APOE ε4 genotype was negatively related to all subfield volumes, suggesting that subjects with an APOE ε4 allele had smaller subfield volumes (Table 5).

Table 5 Years of education and APOE genotype effect on hippocampal subfields in the combined cohort

AD and HC Classification for the Combined AddNeuroMed and ADNI Cohort

For the joint AddNeuroMed and ADNI AD versus HC model, combining the subfield volumes resulted in an accuracy of 81.7 % (sensitivity = 80.4 %, specificity = 82.8 %, AUC = 0.895) compared to 80.7 % for total hippocampal volume (sensitivity = 79.2 %, specificity = 82.8 %, AUC = 0.887) (Table 6). The difference in AUC between the two models was statistically significant (AUC difference = 0.008, p = 0.001).

Table 6 Comparison of performance for the different cohort models in the AD vs. HC classification

Combining subfield volumes resulted in similar classification accuracy to the subiculum (accuracy = 81.2 %, sensitivity = 83.5 %, specificity = 79.2 %, AUC = 0.887) and presubiculum alone (accuracy = 80.6 %, sensitivity = 83.2 %, specificity = 78.3 %, AUC = 0.882), but higher accuracies than the other individual subfield volume measures (Table 7). Figure 3 illustrates ROC curves for the corresponding individual subfield volumes for distinguishing AD and HC subjects.

Table 7 Comparison of performance for OPLS AD vs. HC classification models

Fig. 3 a ROC curve for AD versus HC classification using individual subfield measures, b ROC curve for MCI-c and MCI-s classification using individual subfield measures. The curve is calculated with a 95 % probability assurance. ROC receiver operating characteristic, AD Alzheimer’s disease, HC healthy control, MCI-c MCI-converter, MCI-s stable MCI

Model Validation for AD and HC Classification

Seven-fold cross validation was used to determine the robustness of all the models. In this study models were also validated using an external test set. The ADNI cohort was used as a training set and predictions were made using the AddNeuroMed cohort as the external test set, and vice versa. The results are similar to those obtained by cross validation (Table 6). For the combination of hippocampal subfields, using the AddNeuroMed cohort as the test set and the ADNI cohort as the training set resulted in a similar classification accuracy, 82.1 % (sensitivity = 77.1 %, specificity = 86.9 %, AUC = 0.90) compared to 81.1 % for total hippocampal volume (sensitivity = 75.2 %, specificity = 86.9 %, AUC = 0.897). Similar results for the combination of hippocampal subfields and total hippocampal volume were obtained when using the ADNI cohort as the test set and the AddNeuroMed cohort as the training set (Table 6). For further validation, we examined whether subjects were classified differently between the different models for the combination of hippocampal subfields (for example classified as AD in one model and HC in another model). We compared the single cohort cross validated models with the combined cohort model and the single cohort models using the train/test approach. The results demonstrate that classification agreement for the different comparisons was high, lying between 89.5 and 98.8 % (Table 8).

Table 8 Comparison of subject classification between cohort models
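The classification agreement between two models can be computed as the percentage of subjects assigned the same label by both. The sketch below illustrates this under that assumption; the function name and the example labels are hypothetical and do not reproduce the study data.

```python
def agreement_percent(labels_model_a, labels_model_b):
    """Percentage of subjects given the same class label by two models."""
    if len(labels_model_a) != len(labels_model_b) or not labels_model_a:
        raise ValueError("both models must label the same non-empty set of subjects")
    same = sum(1 for a, b in zip(labels_model_a, labels_model_b) if a == b)
    return 100.0 * same / len(labels_model_a)


# Hypothetical predictions for five subjects from two different models
model_a = ["AD", "AD", "HC", "HC", "AD"]
model_b = ["AD", "HC", "HC", "HC", "AD"]
example_agreement = agreement_percent(model_a, model_b)
```

In this example the two models disagree on one of five subjects, giving 80 % agreement.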

Predicting MCI Conversion

Models previously constructed using AD and HC subjects from the combined cohort were applied to our large external test set of MCI subjects (n = 447) to predict future conversion to AD. These classifiers subsequently identified MCI subjects with an AD like brain structure (percentage classified as AD-like) or a healthy control-like brain structure (percentage classified as HC-like). During the 12 month follow up interval, 90 MCI subjects from the AddNeuroMed and ADNI cohorts met the clinical criteria for AD, and 357 MCI subjects remained clinically stable.

The combined subfield volumes classifier correctly identified 81.1 % of MCI converters from baseline images, with the presubiculum also correctly identifying 81.1 % of MCI-c with an AD-like pattern of subfield atrophy. In comparison, total hippocampal volume identified 76.7 % of MCI-c correctly. The predictive accuracies from the classifiers ranged from 56.7 to 81.1 % for MCI-c predictions (Table 9).

Table 9 MCI predictions using the baseline OPLS AD versus HC classifiers

However, a considerable number of MCI-s subjects were also predicted with an AD-like pattern of atrophy despite their clinically stable condition at 12 months follow up. For instance, the combined subfield volumes classifier, which was the most robust for MCI-c prediction, identified only 48.7 % of MCI-s with a HC-like subfield structure. A similar result was observed for the total hippocampal volume classifier which only identified 50.1 % of MCI-s correctly. Mean OPLS scores from the combined subfield volumes classifier and total hippocampal volume classifier were 0.50 ± 0.28 and 0.49 ± 0.27 (mean ± standard deviation) respectively (Fig. 4). Differences in OPLS scores between the two classifiers were not statistically significant (Wilcoxon signed rank sum test, Z = −0.725, p = 0.469) despite the difference in AD-like MCI-c predictions. HC-like predictive accuracies in MCI-s predictions only ranged from 46.8 to 51.8 %, possibly because many of these MCI-s subjects will convert to AD at a future stage and already demonstrate an Alzheimer-like pattern of hippocampal subfield atrophy.

Fig. 4 a OPLS scores from the total hippocampal volume classifier for MCI-s predictions, b OPLS scores from the combined subfields volume classifier for MCI-s predictions

Discussion

Using an automated image analysis pipeline to explore the subfields of the hippocampus, we found that AD and MCI converters displayed a widespread pattern of subfield atrophy, including the bilateral subiculum, presubiculum and CA1, which have been reported in previous studies. Using the same image analysis approach, Hanseeuw et al. (2011) previously reported significant volume losses in the subiculum and CA2-3 region of the hippocampus in a small group of 15 amnestic MCI subjects and 15 healthy controls. We have extended this preliminary work to data from two large studies which together contain a more heterogeneous group of AD and MCI subjects that more accurately reflects the population of MCI and AD. The pattern of hippocampal volume loss that was found was wider than in previous reports, which have used either manual delineation techniques for hippocampal subfield segmentation (Mueller and Weiner 2009; Mueller et al. 2010), 3D surface mapping (Apostolova et al. 2010) or shape analysis techniques (Csernansky et al. 2005; Costafreda et al. 2011). A similar pattern of subfield atrophy was observed for AD subjects and MCI converters, suggesting that MCI converters may represent an imaging profile more similar to AD subjects than stable MCI. The pattern of hippocampal subfield loss, though wider than previously reported, is in agreement with previous neuropathological studies reporting early neuronal loss in the subiculum, CA4-DG, and CA1 (West et al. 2004). Larger datasets are more likely to contain subjects with different types of atrophy, which could explain the widespread pattern of subfield volume losses reported in the present study.

Relationship Between Neuropsychological Test Scores and Hippocampal Subfields

A significant positive effect of MMSE and a significant negative effect of ADAS-1 were found in relation to all hippocampal subfield volumes except the bilateral hippocampal fissure, indicating that subjects with lower MMSE scores and higher ADAS-1 scores had lower subfield volumes. This confirms the relationship between diffuse hippocampal volume loss and poorer neuropsychological test scores (Scheltens et al. 1992; Liu et al. 2009).

Relationship Between Age, Gender, Education, APOE ε4 Genotype, and Hippocampal Subfields

Previous studies investigating the influence of age on hippocampal subfields have found significant negative effects associated with CA1 and CA2-3 subfield volumes (Mueller and Weiner 2009). Consistent with this previous work, using a larger dataset we also found a significant negative effect of age, but in relation to all subfield volumes. However, years of education was only significantly associated with the right CA1 and right fimbria. Gender specific differences in the pattern of subfield volume loss were found, with female subjects demonstrating lower bilateral CA2-3, CA4-DG, fimbria, presubiculum and subiculum volumes. Previous work with AD patients suggests that gender specific differences in the rate of hippocampal volume loss are not entirely clear. For example, a previous study has reported a higher prevalence and incidence of AD in females (Barnes et al. 2005), and sex hormone differences have been suggested as an explanation of any gender divergence (Gouras et al. 2000). Our findings also suggest that carriers of the ε4 allele had smaller subfield volumes, which is in agreement with previous studies that have demonstrated a strong neuroanatomic effect of APOE ε4 genotype on the entire hippocampal region (Jack et al. 1998; Reiman et al. 1998).

AD and HC Classification

In this study we used the multivariate OPLS technique to distinguish between AD and control subjects. This method has previously been used for distinguishing between AD and control subjects (Westman et al. 2011a, b, c, d), as well as predicting conversion from MCI to AD using MRI regional measures and a combination of MRI regional measures and magnetic resonance spectroscopy (MRS) measures (Westman et al. 2010, 2011a, b, c, d). Hippocampal subfields have not been studied using this approach, but several other studies have used alternative multivariate techniques including support vector machines, principal components analysis, partial least squares to latent structures, and linear discriminant analysis to analyse multiple MRI regional measures (Fan et al. 2008; McEvoy et al. 2009; Klöppel et al. 2008; Vemuri et al. 2008; Plant et al. 2010).

Studies that have attempted to distinguish between AD and control subjects have often done so using medial temporal structures such as the hippocampus and entorhinal cortex and reported accuracies of 80–90 % (Fox et al. 1996; Jack et al. 1992). Although prior studies have reported accuracies of up to 100 % in discriminating between AD and control subjects (Fan et al. 2008; Lerch et al. 2008), some studies used smaller samples, included more severely impaired AD patients or failed to cross-validate their findings. Here, using two large multicentre studies we segmented the hippocampus into its different subfields to examine whether subfield volumes could improve the sensitivity of MRI in detecting AD. The results suggest that the OPLS technique with fully automated hippocampal subfield volumes performs as accurately as total hippocampal volume, presubiculum volume and subiculum volume in distinguishing between AD and control subjects. The OPLS method which combined hippocampal subfield measures produced a classification accuracy of 81.7 % (sensitivity = 80.4 %, specificity = 82.8 %, AUC = 0.895), while total automated hippocampal volume produced an accuracy of 80.7 % (sensitivity = 79.2 %, specificity = 82.8 %, AUC = 0.887). Although the AUC values are significantly different, the magnitude of the difference is small and does not offer a particular advantage over hippocampal volume. Recent work has also found that the visual rating assessment of the medial temporal lobe produces accuracies that are comparable to that of manual hippocampal volume in distinguishing between AD and controls (Westman et al. 2011d).

Although our study is the first to use multivariate analysis of automated hippocampal subfields, previous research has examined the combination of automated regional cortical thicknesses and regional volumes in distinguishing between AD and control subjects using support vector machines and linear discriminant analysis (Vemuri et al. 2008; McEvoy et al. 2009).

Predicting MCI Conversion

Building robust classification models on new and unseen data is of great importance for accurately predicting future MCI conversion to AD. MCI predictions were performed using AD vs. HC models in the combined ADNI and AddNeuroMed cohorts as classifiers and MCI subjects as our external validation test set. This approach has been applied previously and demonstrates how larger training sets can be used to assess MCI predictions that are more balanced in terms of sensitivity and specificity (Westman et al. 2011a). Previous studies in the neuroimaging literature utilising advanced methods of high dimensional pattern classification (Fan et al. 2008; Misra et al. 2009), and whole brain structural MRI (Karas et al. 2008; Davatzikos et al. 2010) have demonstrated the complexity of differential atrophy patterns observed in MCI-c and MCI-s subjects. Moreover, studies including our own have also shown heterogeneous patterns of brain atrophy exist in MCI subjects that convert to AD and those who remain clinically stable (Westman et al. 2011c; McEvoy et al. 2009). Consequently, hippocampal subfields were of interest following a small pilot study in MCI subjects (Hanseeuw et al. 2011).

Using a large external validation test set (n = 447) we sought to identify MCI subjects based on the similarity of their hippocampal subfield pattern to AD patients (% AD-like) or healthy control subjects (% HC-like). Unlike some previous studies, the large number of MCI subjects in our study served to more accurately represent the heterogeneity of MCI subjects and included both amnestic and non-amnestic subtypes.

The results demonstrated that the combination of subfield volumes and the presubiculum were the most robust classifiers, identifying 81.1 % of MCI-c correctly, and were better than using total hippocampal volume alone. However, a considerable number of MCI-s subjects were also predicted with AD-like patterns of atrophy despite having a clinically stable MCI condition at 12 months follow up. Although beyond the scope of the current study, we intend in future to study longitudinal change in hippocampal subfield measures over longer follow up times in the ADNI cohort. Structural MRI plays a key role in this domain and represents one of the 3 main biomarkers for AD diagnosis (Dubois et al. 2007; Frisoni et al. 2010). However, more attention needs to be directed towards the standardisation of acquisition and analysis methods in order to facilitate the integration of findings across studies. Recently there has been much interest in exploring the combination of different MRI imaging techniques (e.g. tensor based morphometry, cortical thicknesses and volumes) with cerebrospinal fluid (CSF) biomarkers, 18F-fluorodeoxyglucose PET, and clinical examination for classifying AD and predicting MCI conversion to AD (Wolz et al. 2011; Vemuri et al. 2009; Zhang et al. 2011; Furney et al. 2011). With regard to our study, a longer follow up time would be helpful to refine our estimates of model specificity for MCI-s predictions. A more robust algorithm that could potentially predict the time to conversion from MCI to AD would be of future interest to validate the findings of the present study.

Conclusion

Hippocampal subfield volume loss in AD is widespread, affecting regions such as the CA1, subiculum, and presubiculum. Using an automated hippocampal subfield measurement technique we found prominent subfield volume losses in MCI converters and AD. Each of the subfield measures was related to both clinical predictors of AD (age, gender, years of education, APOE ε4 genotype) and cognitive scores (MMSE and ADAS-1 tests). Combined subfield volumes using the OPLS technique produced a similar classification accuracy to total hippocampal volume, presubiculum volume and subiculum volume in distinguishing between AD and HC subjects, but were more accurate than total hippocampal volume measurements at predicting MCI conversion to AD at 12 months.