Keywords
fMRI, MVPA, feature selection, individual variability, semantics
This article is included in the INCF gateway.
This article is included in the Preclinical Reproducibility and Robustness gateway.
It is widely acknowledged that, among the challenges in multi-voxel pattern analysis (MVPA), individual variability, which penalizes classification accuracy in cross-subject modelling, is particularly difficult to overcome when targeting concepts and meanings conveyed by language1. Admittedly, precision rates in MVPA can be largely uniform across experiments successfully performed at the individual level, as in the classically recognized Science article authored by T. Mitchell and his group (Predicting Human Brain Activity Associated with the Meanings of Nouns)2. In their study, distributed brain activation patterns were recorded from nine subjects conceptualizing 60 concrete nouns in an fMRI scanner, and these neural patterns were regressed on the co-occurrence probabilities in a text corpus between each of the fMRI nouns and a set of semantic features (25 basic verbs). The highly significant prediction accuracy that they obtained at the subject level was attributed to a pattern separability built on a set of informative voxels used as features, which are distributed across numerous brain areas and differ across subjects. The Science study and other computational neurolinguistic reports3–7 could generate classifiers with high precision (owing to the L2-regularization technique that attenuates individual variability) and draw plausible semantic maps in the individual brain.
However, the functionality at work in the individual brain remains to be specified beyond these broadly accepted modelling results. In this study, an alternative view is put forward to uncover systematicity and provide a typology of individual variability through a reanalysis of the open data used in Mitchell et al.’s Science study. Several previous studies have reported that, despite the almost invariably accurate performance of the subjects in conceptualization tasks, a fundamental difference is seen in how dispersed the information of the feature voxels is across their anatomical locations1,5. Unquestionably, the experimental paradigm developed by Mitchell et al. hinged on modality-specific factors, which promoted the distributiveness of semantic processing systems. Their stimulus set, which used captioned drawings with considerable visual effects and concrete nouns implying motion or perception, rendered the experiment sensitive to embodied cognition8. The attribute generation task, which consisted of thinking about the properties of the object, could allow free association and perceptual simulation9. Yet, even setting aside these modality-specific factors, debate persists over the supramodal semantic centers in general: whether the left temporal pole and anterior temporal gyrus (hub-and-spoke model)10–11 or the left middle temporal gyrus and the left angular gyrus (high-level convergence model)12–13 are the nodal loci of a genuinely semantic process. The primary goal of the present study was to determine whether such a topology of semantic processing can be elucidated through a typology of fMRI subjects, and to examine how that typology is determined by the subjects’ hidden neural responses characteristic of the selection of the most informative voxels.
The datasets were nine .mat files corresponding to the nine subjects (P1–P9), downloaded from the website of Carnegie Mellon University. From each subject’s data, a ‘.mat’ file named 'runByVoxByNouns-P<number>.mat' was created (using MATLAB R2015a) as a three-dimensional array of x: runs (six repeated presentations) by y: voxels by z: words (60 fMRI nouns). This procedure facilitated the computation of the informative voxel locations identified by two feature-selection methods: the F values of ANOVA, which evaluate how the mean activation differs across the items repeatedly presented in a learning set, and the Stability scores, which distinguish voxels that exhibited consistently similar activation patterns across presentations of the items for machine learning14. Mitchell et al. adopted the latter method to select the top 500 voxels; it involves computing Pearson's correlation coefficient of the activation vectors for the stimulus nouns over the 15 (= 6C2) pairs of the six repeated fMRI presentation runs. They computed the average pairwise correlation for each voxel over all pairs of rows in the matrix composed of six presentation runs by 58 nouns (reserving two nouns for testing). In this study, cross-validation was not performed except for computing the modelling accuracy for each subject using the ordinary least squares method (OLS) without L2-regularization. The top 500 voxels were selected from the overall runs by the ANOVA and by the Stability scores, and the number of voxels not shared by the two selection results (subtraction operation on the two sets, i.e., type 1: ANOVA set – Stability set and type 2: Stability set – ANOVA set) was counted as “divergence” for each subject, as shown in Table 1. The voxel-wise ranks in the two selection results were compared with Spearman’s rank correlation coefficient, and the subject-wise rho and the corresponding p values were computed.
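As a rough illustration (not the original MATLAB code used in this study), the two feature-selection scores, the “divergence” count, and the rank comparison could be sketched in Python as follows, with synthetic random data standing in for a subject’s runs × voxels × nouns array:

```python
import numpy as np
from itertools import combinations

def stability_scores(data):
    # data: (runs, voxels, nouns). For each voxel, average the Pearson
    # correlation of its noun-activation vector over the 15 (= 6C2)
    # pairs of presentation runs.
    runs, n_vox, _ = data.shape
    pairs = list(combinations(range(runs), 2))
    out = np.empty(n_vox)
    for v in range(n_vox):
        out[v] = np.mean([np.corrcoef(data[i, v], data[j, v])[0, 1]
                          for i, j in pairs])
    return out

def anova_f(data):
    # One-way ANOVA per voxel: nouns are groups, runs are replicates.
    runs, n_vox, n_nouns = data.shape
    gm = data.mean(axis=0)                      # (voxels, nouns) group means
    grand = gm.mean(axis=1, keepdims=True)      # (voxels, 1) grand means
    ssb = runs * ((gm - grand) ** 2).sum(axis=1)
    ssw = ((data - gm[None]) ** 2).sum(axis=(0, 2))
    return (ssb / (n_nouns - 1)) / (ssw / (n_nouns * (runs - 1)))

def divergence(scores_a, scores_b, k=500):
    # Count of top-k voxels by scores_a absent from the top-k by
    # scores_b (e.g. type 1: ANOVA set minus Stability set).
    top_a = set(np.argsort(scores_a)[-k:])
    top_b = set(np.argsort(scores_b)[-k:])
    return len(top_a - top_b)

def spearman_rho(a, b):
    # Rank correlation (no tie correction; fine for continuous scores).
    rank = lambda x: np.argsort(np.argsort(x))
    return np.corrcoef(rank(a), rank(b))[0, 1]

# Synthetic stand-in for one subject's 'runByVoxByNouns' array
rng = np.random.default_rng(0)
data = rng.standard_normal((6, 2000, 60))
s, f = stability_scores(data), anova_f(data)
d = divergence(f, s)          # type 1 divergence
rho = spearman_rho(s, f)
print(d, rho)
```

With real data, the p value accompanying rho would also be computed (e.g. via `scipy.stats.spearmanr`); the sketch reports only the coefficient.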
The ANOVA and Stability feature-ranking data were all mapped to anatomical regions according to the automated anatomical labeling (AAL) atlas15.
The top 500 voxels were selected by the ANOVA and by the Stability score, and the modelling accuracy was computed with the ordinary least squares method (OLS) without L2-regularization. The subject-wise ‘rho’ and the corresponding p values represent Spearman’s rank correlation coefficient between the voxel-wise ranks in the two feature-selection results. ‘Divergence’ denotes the number of selected voxels extracted by the subtraction operation ANOVA set – Stability set or Stability set – ANOVA set.
The raw modelling accuracy, based on the ordinary least squares method (OLS) without L2-regularization, decayed with the magnitude of "divergence" between the ANOVA and the Stability score and with decreasing rank similarity between the voxels selected by each method (Table 1). Admittedly, good modelling accuracy is associated with high F values and Stability scores for the selected voxels, as seen in P1, which elicited the best precision by a wide margin, and in the next-best group of P2–P4. However, there was no mean difference in feature scores among the subjects of the middle group, P5, P6, and P7, although P5 and P7 exhibited significant Spearman’s rank correlation coefficients, divergence less than 100, and OLS precision higher than 70%, while none of these conditions was true for P6. When the score distributions of the top 500 voxels are visualized with a notched box plot, it is apparent that the poor-performance group (P6, P8, and P9) was characterized by narrow boxes and low upper whiskers (Figure 1) and by the dispersiveness of the voxel-wise ranks in the two selection results (Figure 2).
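The raw OLS accuracy follows the leave-two-out evaluation scheme of Mitchell et al.: a linear model is fitted on 58 nouns and must match the two held-out nouns to their predicted activation patterns. A minimal sketch of that scheme, using synthetic stand-ins for the 25 corpus features and the selected voxel activations (not the actual data), might look like:

```python
import numpy as np

rng = np.random.default_rng(1)
n_nouns, n_feat, n_vox = 60, 25, 500
X = rng.standard_normal((n_nouns, n_feat))        # corpus co-occurrence features
B_true = rng.standard_normal((n_feat, n_vox))
Y = X @ B_true + 0.5 * rng.standard_normal((n_nouns, n_vox))  # voxel activations

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

correct = 0
pairs = [(i, i + 1) for i in range(0, n_nouns, 2)]  # 30 disjoint test pairs
for i, j in pairs:
    train = [k for k in range(n_nouns) if k not in (i, j)]
    # plain least squares, i.e. no L2 penalty (cf. the regularized
    # regression of the Science study)
    B, *_ = np.linalg.lstsq(X[train], Y[train], rcond=None)
    pi, pj = X[i] @ B, X[j] @ B
    # correct if the matched predicted-to-true assignment beats the swap
    if cosine(pi, Y[i]) + cosine(pj, Y[j]) > cosine(pi, Y[j]) + cosine(pj, Y[i]):
        correct += 1
accuracy = correct / len(pairs)
print(accuracy)
```

On the clean synthetic data the linear structure is easily recovered; with real fMRI activations and no regularization, accuracy degrades with subject-level noise and divergence, as Table 1 shows.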
Our results support the notion that particular types of individuals differ markedly in how they recruit voxels under different feature-selection methods, i.e., the Stability scores and the F values of ANOVA. The Stability score examines the extent to which each voxel reacted to the same stimulus across runs in a constant manner; it is therefore the “identity” of an object, invariable through repetition, that this index emphasizes. Conversely, the F values of ANOVA pertain to the magnitude of the between-group variance across the responses to the 60 nouns relative to the within-group variance across the six presentations of each individual noun; the “difference” is therefore likely to be captured by that index, although it is inextricably linked with the “identity” side. Consequently, regardless of the feature-selection method, mostly the same voxels with a similar top-500 ranking order could be selected from the brains of P1–P5 and P7, but the remaining subjects (P6, P8, and P9) showed substantial divergence between the lists of 500 informative voxels selected by the two methods. The mean index values for the top voxels were significantly larger in the former subjects than in the latter for each method, and the difference in raw classification accuracy without the regularization effect was conspicuous between these subject groups. However, an in-depth analysis raised further questions, since P5 and P7 may be treated as members of an interesting subgroup in that, despite the highly significant rank correlation for the selected voxels, their mean values were relatively low and not significantly different from those of P6 in the poor-performance group.
When tapping into the anatomical regions from which the feature voxels were selected, the most intriguing property was that high precision in modelling was underpinned by extra-linguistic regions. It is noteworthy that in P1 (recording 82% classification accuracy by the OLS with no regularization), the majority of the top 487 informative voxels shared by the Stability score and the ANOVA were found in the visual areas of the temporal and occipital lobes, with several in the frontal and parietal lobes. Unlike in all the other subjects, the left inferior frontal gyrus, pars triangularis ("Frontal_Inf_Tri_L"), and the left precuneus ("Precuneus_L"), frequently considered to be involved in executive functions of language activity, were not recorded among the overlapping selected voxel areas of P1. When focusing on the poor-performance subject group (P6, P8, and P9), which exhibited a large divergence (more than 1 standard deviation from the mean) between the voxel selections by the two methods, it appeared that the modality-specific areas were likely to be monitored by the Stability score (indexing “identity”), whereas the ANOVA (emphasizing “difference”) tended to detect supramodal semantic areas. Type 1 voxels (selected by the ANOVA but not by the Stability score among the top 500) and type 2 voxels (selected by the Stability score but not by the ANOVA under the same criterion) were mostly extracted from different anatomical regions. The frequency distribution tables (Figure 3) represent the numbers of type 1 and type 2 voxels selected from P6, P8, and P9 and attributed to each anatomical area in the AAL brain atlas. The ANOVA highlighted, as locations of type 1 voxels, areas implicated in amodal or supramodal semantic processing, especially the left middle temporal gyrus (“Temporal_Mid_L”), by far the most populous region in this category.
In contrast, the Stability score tended to be biased toward the vision-related areas, notably the left middle occipital gyrus (“Occipital_Mid_L”), which may reflect stimulus modality or perceptual symbols in embodied cognition.
This divergence allows us to shed new light on a traditionally controversial subject in neural semantics: where is the border that separates the brain regions selective for purely conceptual functions from those that are sensory-driven and modality-dominant, and thus extrinsic to meaning processing? The compatibility of the Stability score with the perceptual modalities may suggest, in line with the embodiment view (to which Mitchell et al. also referred for the neural signature of the verb “eat”), that the “identification” of a concept is materially founded upon a sensory-perceptual system and real-life experience with instances of its referent (or stimuli) to shape its cognitively grounded symbols. However, the voxel information brought by the ANOVA enables us to propose an alternative view of the discriminative power of language, one having an affinity with the amodal (not to say disembodied) symbol theory. Descended from the school of Saussure, this theory (often relying on lexical co-occurrence information from language corpora, as in the case of the Science study) postulates that the value of a symbol (or a linguistic sign) is derived not from its intrinsic sense but from language itself as a computational system of “difference.” It is quite intriguing that the reanalysis of the Science data revealed, through the variability of subjects performing a language task, a salient discrepancy between the brain regions informative of distinct facets of semantic processing. Indeed, we are not yet in a position to adjudicate between these philosophically opposite views through a succinct review such as this report. However, we may at least conclude that this fundamental issue was, quite interestingly, readdressed by reanalyzing the data from a subject group that responded inconsistently to the feature-selection methods and elicited relatively low precision rates in fMRI machine-learning classifiers.
The dataset of Mitchell et al. was downloaded from
http://www.cs.cmu.edu/afs/cs/project/theo-73/www/science2008/data.html
Dataset 1: Reanalysis of Mitchell et al. data. 10.5256/f1000research.14584.d20176716
The author would like to thank Editage (www.editage.jp) for English language editing.
Is the work clearly and accurately presented and does it cite the current literature?
Yes
Is the study design appropriate and is the work technically sound?
Yes
Are sufficient details of methods and analysis provided to allow replication by others?
Yes
If applicable, is the statistical analysis and its interpretation appropriate?
Yes
Are all the source data underlying the results available to ensure full reproducibility?
Yes
Are the conclusions drawn adequately supported by the results?
Yes
Competing Interests: No competing interests were disclosed.
Version 1 (24 Apr 2018); 1 invited reviewer.