Comparing Assessment Methods of Attribute Importance in Teachers’ Decisions: The Importance of Different Criteria for Tracking Recommendations after Primary School

Lintorf, Katrin; van Ophuysen, Stefanie; Osipov, Igor

doi:10.3390/educsci11100566

by Katrin Lintorf ¹,*, Stefanie van Ophuysen ² and Igor Osipov ³

¹ Department Erziehungs- und Sozialwissenschaften, Universität zu Köln, Albertus-Magnus-Platz, 50923 Köln, Germany

² Institut für Erziehungswissenschaft, Westfälische Wilhelms-Universität Münster, Georgskommende 33, 48143 Münster, Germany

³ Institut für Bildungsforschung in der School of Education, Bergische Universität Wuppertal, Gaußstr. 20, 42119 Wuppertal, Germany

* Author to whom correspondence should be addressed.

Educ. Sci. 2021, 11(10), 566; https://doi.org/10.3390/educsci11100566

Submission received: 30 July 2021 / Revised: 11 September 2021 / Accepted: 16 September 2021 / Published: 23 September 2021

(This article belongs to the Special Issue Teachers’ Decisions regarding Students’ Transition from Primary to Secondary School: New Insights from International Research)

Abstract:

The importance of different criteria for tracking recommendations is usually inferred using regression weights as a cross-student measure. The few studies that have applied alternative approaches or differentiated between student groups sometimes reach different conclusions. According to research on judgment and decision making (JDM), different methods operationalize different facets of importance. Given this, we investigate whether the importance of criteria for tracking recommendations depends on a direct vs. indirect operationalization (regression weights vs. ratings). A total of 181 teachers selected four students from their most recent fourth-grade class using a 2 × 2 design (certain vs. uncertain qualification for the Realschule (vocational track) vs. the Gymnasium (academic track)). Then, they reported on the level and the importance of predetermined criteria for each student. Contrary to JDM research, we found few method-related differences, but striking differences between cases with a certain vs. an uncertain qualification. For the latter, the importance of the criteria is more homogeneous, the regression prediction is less successful and the importance varies with the dependent variable in the regression (actual recommendation vs. perceived qualification). We conclude that further research should focus on uncertain cases rather than method-related differences and suspect that, in uncertain cases, the formation of the recommendation is a multistage decision process.

Keywords:

attribute importance; judgment and decision making; dominance analysis; tracking recommendation; school track

1. Introduction

Between-school tracking is one measure used worldwide to deal with heterogeneity among students. While tracked and comprehensive school systems both have their proponents, evidence disfavors tracking for several reasons [1]. In particular, tracking maintains or increases social inequality, more so as the age at the time of transition decreases [2]. This can be attributed to two factors [3]: First, access to the different tracks depends on social background. Second, school tracks act as differential learning environments. The choice of a secondary school track is therefore a significant decision for the child’s future educational path. This is particularly true in Germany, which, in international comparisons, is characterized by strong social inequality [4] as well as by an early transition to secondary school [5]. Although the German school system explicitly allows for changes of school track in lower secondary education [6], this rarely happens and downward mobility dominates when it does [7].

Although teachers in all German federal states are obliged to give school track recommendations, in most states, the final choice is the parents’ decision [8]. Nevertheless, the teachers’ tracking recommendation is an important guide for parents. The recommendation, as well as the counseling that precedes it, informs parental considerations. Thus, it determines the expectations for success that parents have regarding the future educational path of their child [9]. Furthermore, the recommendation is a central determinant of the parents’ final choice. This holds regardless of whether the recommendation is binding [10]. This fact has motivated extensive research on the formation of the tracking recommendation and the criteria used by the teachers [11]. Characteristics that teachers should (or should not) take into account in their recommendations are only very roughly defined [8]. Therefore, non-performance-related characteristics (e.g., social background) make an additional contribution to the prediction of the recommendation when controlling for legally permissible criteria (e.g., school performance and work behavior) [11].

However, the predictive power of different characteristics and their relation to each other could also be a question of methodological approach. This is suggested by the results for attribute importance found in the research on judgment and decision making (JDM). This branch of research uses diverse methods to investigate attribute importance. In contrast, research on the importance of attributes for tracking recommendations is dominated by one method. In regression analyses, the teachers’ recommendations are predicted by different criteria generalized to all students. Few studies have used different methods and/or differentiated between groups of students. Some of these studies come to different conclusions about the importance of attributes for tracking recommendations than the studies with the dominating method. Therefore, this paper examines whether the importance of the criteria of the tracking recommendation depends on the choice of method and the type of student. To this end, we bring together sociologically or educationally oriented research on criteria of the tracking recommendation (Section 3) with more economically oriented research on JDM (Section 2).

2. Judgment and Decision Making (JDM)

2.1. Theoretical Foundations of Attribute Importance in Research on JDM

Cognitive psychology considers decision-making against the background of normative, descriptive and prescriptive theories. Normative and prescriptive theories relate to ideal or optimal decisions. In contrast, descriptive decision analysis aims to reconstruct and understand actual human decisions [12]. Decisions are the result of choosing between at least two options with (different) values for specific attributes [13]. JDM research determines the importance of the attributes for the choice using a variety of methods. They are broadly distinguished into explicit vs. implicit weighting methods ([14], also decomposed vs. holistic [15] or direct vs. indirect [16]). In the first case, the decision-maker directly provides information on the importance of the attribute. In the second case, attribute importance is (statistically) inferred from an overall judgment of the alternatives.

Consumer research has long been concerned that different methods of measuring importance lead to different conclusions about the importance of product attributes [17,18]. In their review, van Ittersum et al. argue that these different measures represent different facets of the construct [19]. Drawing on the work of Myers and Alpert [20] they distinguish between three facets: “The salience of an attribute represents the importance of the attribute in memory. The relevance of an attribute represents the importance of the attribute to the individual based on personal values and desires. Finally, the determinance of an attribute represents the importance of the attribute in judgment and choice.” [19] (p. 1180, original emphasis). As evidence for this distinction, the authors demonstrate that methods of the same facet correlate higher than methods of different facets.

Consequently, attribute-elicitation methods serve as measures of the salience facet. In these methods, researchers do not specify the criteria to be evaluated (e.g., free-elicitation techniques) and use the order of their nomination as the measure for importance. In contrast, methods measuring the relevance use a preselected set of attributes. They determine attribute importance by directly asking (e.g., direct-rating method) or inferring it from the information search behavior of the subjects (e.g., information-display-board method). Methods measuring determinance infer the importance of given attributes from the statistical relation between the attributes’ values and the overall evaluation of the respective objects (e.g., conjoint method) [19]. Therefore, even criteria that are used unconsciously can be identified using a determinance detection methodology.

Since determinance looks at the overall assessment or actual choice of an object, determinance may at first glance seem the most appropriate way to determine attribute importance. However, a multi-method approach to assessing the importance is advocated [13] since each facet/each group of methods has its specific (dis)advantages. Methods of determinance assessment that work with criteria provided by the researchers run the risk of overlooking important criteria. In turn, this can influence the estimation of the statistical relationship between the decision and the attributes under investigation (e.g., beta weights in regression analysis). Second, determinance is dependent on the variance of attributes. Small variances of an attribute (e.g., due to real-life situations) might lead to statistical underestimation of its importance. Third, a strong determinance of a criterion implies that it is statistically predictive, but this does not necessarily reflect how the information was actually processed by the decision-makers [21].

Like methods of determinance assessment, methods for measuring relevance also use criteria provided by the researchers. Again, there is a risk of overlooking important criteria. Furthermore, some relevance measures are direct or explicit measures and, as such, they are more susceptible to bias than indirect or implicit measures. It is conceivable that commonly known biases are caused by efforts to resolve cognitive dissonance [15] or to meet social desirability. In particular, relevance measures are susceptible to specific biases in weight assessment (e.g., proxy overweighting bias [16]).

As measures of salience are also usually direct measures, they are also susceptible to bias. Furthermore, these measures may be subject to memory distortions. A specific memory distortion might result from a small variance, as with determinance measures. In decision making, cognitive effort and time are mostly devoted to those attributes in which objects differ greatly. This should result in a higher quality and quantity of information processing. In turn, more intensively processed information is remembered as more important [22].

2.2. Application of Research on Judgment and Decision Making (JDM) to Tracking Recommendations

Research on JDM is strongly rooted in the economic sciences [23] and has a focus on consumer decisions. Comparably few studies investigate health, societal or educational decisions. Yet tracking recommendations (or, broadly speaking, the professional decisions of educators) differ from consumer decisions in at least three ways. First, while consumers make decisions for themselves, teachers make surrogate decisions for their students. Second, consumers make a selection decision, while teachers make an allocation decision [24]. Third, as a result, the attributes considered belong to different entities. Consumers focus on the attributes of the decision options, whereas teachers focus on the attributes of the students for whom the decision is to be made.

Nevertheless, it seems reasonable that the results of research on JDM are also transferable to the professional decisions made by teachers. Linking research on school track recommendations to research on JDM has already proved successful with the MouseLab paradigm of Payne et al. [25,26]. Furthermore, at least one study based on JDM research compares different methods used with regard to tracking recommendations [27]. In this study, 16 first-year psychology students and 10 teachers provided recommendations for 60 fictitious pupils. The decisions were based on vignettes with information on the same characteristics (e.g., report marks, learning skills). The scores for the attributes were randomly assigned so that attribute values were uncorrelated. The authors determined the attribute importance for each of the participating subjects in terms of (1) regression weights, (2) importance ratings and (3) frequency of nomination in verbal protocols. For the participating teachers, they found only a medium correlation between the results from the three methods (r_Mdn = 0.54). The correlation differed between individuals.

However, two factors could limit the generalizability of the results. First, the results may have a low ecological validity because they are based on fictitious students (but see also [28]). The effort to create randomly arranged vignettes, in particular, could have led to atypical student cases and, thus, atypical decision-making behavior. Furthermore, research on the tracking recommendation suggests that a distinction between different types of students might be fruitful because teachers seem to adapt their decision behavior to the respective student case (Section 3). Second, the authors were working with ad hoc generated student attributes. However, current research on the tracking recommendation allows for deriving characteristics theoretically and/or empirically (Section 3). This reduces the risk of neglecting important attributes (see the disadvantages of determinance and relevance in Section 2.1). Therefore, the following section reviews the research on the criteria for the tracking recommendation.

3. Criteria of the Tracking Recommendation at Primary School Transition

Research on the criteria for tracking recommendations can be divided into two groups: those with a classical approach and those with alternative approaches. We will compare their results after introducing the core features of both approaches.

In the classical approach, the teacher’s tracking recommendation is predicted through a regression analysis based on various criteria that are predetermined by the researcher. The information is typically collected using self-reports from large samples of students or parents [29]. Teacher ratings of the characteristics are not used in these kinds of studies. The relative importance of the criteria is indirectly determined and inferred from the size of the regression weights. Thus, the classical approach relates teachers’ judgments to the levels of the student characteristics. In terms of research on judgment and decision making (JDM), this reflects the determinance of the criteria.

The alternative approaches include interview-based and experimental studies. In the interview studies, researchers either explicitly ask about the criteria considered by the teachers [30] or implicitly deduce them from the teachers’ case descriptions [31]. Thus, the data is based on retrospection. The importance of the criteria is determined on the basis of the order and frequency of nomination. This corresponds to the salience measure in research on JDM. The experimental studies follow a social cognition approach [32,33] and use dual process models as a theoretical basis [34]. According to these models, a judgment is the result of an automatic or a controlled strategy of information processing. The more inconsistent an information situation is, the more likely judgers are to use a controlled strategy and consider more information. From a methodological point of view, the experimental studies draw on MouseLab, a process-tracing method from decision research [25] and ask teachers to give a tracking recommendation for fictional children. To this end, they are provided with covered information on grades, work and social behavior, as well as family background. Two properties of the information search behavior are used as indicators for the importance of the criteria: the frequency and order of information retrieval. Böhmer et al. combine this with a direct rating method in order to capture the subjectively assessed importance. Both may be classified as measures of relevance [19].

Although the alternative approaches make use of more than one facet of importance, they have common features that distinguish them from the determinance measure within the classical approach. They directly determine the importance of various criteria based on teachers’ reports or criteria-related actions. Thereby, the data collection is sometimes based on specific student cases [31,35] and sometimes on global appraisals [30,36].

Regardless of the approach and operationalization, performances in the main subjects proved to be the strongest predictor of the recommendation [10,33,35,36,37]. Results for work behavior, which teachers should consider according to the official guidelines [8], are more differentiated. Studies taking an alternative approach emphasize the importance of work behavior [30,31,33], while, in contrast, studies using the classical approach come to inconclusive results. This might be due to the use of varying operationalizations. Work behavior is most likely to prove a significant predictor when operationalized by work virtues or affective aspects [29,38,39].

The discrepancy between the approaches in relation to non-school criteria is particularly striking. Studies using the classical approach have repeatedly shown that the social background of families is significantly connected to the recommendation, e.g., [10,40,41,42]. Among the various operationalizations, the educational background of the parents tends to be the most important factor. In combination with the ISEI (International Socio-Economic Index of Occupational Status), it had at least an incremental predictive power [43,44] and sometimes it was the only predictor associated with the social situation [45,46]. In contrast, the socio-economic background was not mentioned at all in the interview studies. Information of this kind was also accessed last and less frequently in the experimental studies [33].

Discrepancies in the results on migration background are less pronounced. Like social background, the migration background was not mentioned or retrieved in studies that took an alternative approach [33]. This corresponds largely to the findings from the classical approach, where no effects, e.g., [10,47] or reduced effects [39] were found when controlling for grades (but see also [42]). Again, the effects seem to depend on the operationalization. Insignificant findings occurred for family language [10,48,49], significant findings resulted for parental country of birth [45,50] and for a combined predictor [40], but see also [51].

Even if teachers do not directly name social background and migration background as relevant criteria in the studies taking an alternative approach, these criteria indirectly contribute to the recommendation. In these studies, teachers considered process characteristics associated with family background. This is especially true for parental support. Studies using the classical approach hardly considered this characteristic. In the rare cases, it could not compensate for the predictive power of structural characteristics of the family background [42,46]. In contrast, parental support is of great importance according to the studies taking an alternative approach, especially when a child has an unclear achievement profile [33,35]. In contrast, the classical approach does not even differentiate between different student cases.

A comparison of the results from both approaches leads to distinctly different conclusions about the importance of the criteria considered, especially with regard to social background. This may be an effect of the research method. So far, only Böhmer et al. [33] have tested the hypothesis of method dependence within one sample. They found a medium correlation between a direct rating of importance and two measures of information processing (retrieval frequency: r = 0.53; retrieval order: r = −0.49), indicating an influence of the research methods on the results. As the study was limited to an alternative approach, a comparison of the results between the two approaches is still pending.

4. Research Aims

The criteria for tracking recommendations have been well studied. Most of the findings stem from the classical research method, but studies from alternative approaches come to partially different results. To our knowledge, there are only two studies that have applied several research methods to the same sample in order to estimate methodological influences on the results [27,33]. Both worked with vignettes; therefore, we question their transferability to real-life contexts (but see [28]). This led us to pursue the following research question using real student cases: Does the importance of given criteria for the tracking recommendation depend on the operationalization of importance (determinance vs. relevance)?

In answering this question, we also address several shortcomings of the current research on tracking recommendations. (1) In the classical approach, research is based on self-reports from students and parents. However, some of the characteristics considered are not known to the teachers (e.g., parental educational background, socio-economic status [52]). Thus, significant correlations do not necessarily imply that teachers take the social background of students into account when making their recommendations. Consequently, we will use teachers’ estimations of the characteristics instead of students’ or parents’ self-reports. (2) In the studies of the alternative research approach, the decision/judgment processes in uncertain student cases proved to be more complex than in consistent student cases [33]. The classical approach does not consider student-specific decision processes. Consequently, we will provide differential analyses for the determinance and relevance operationalization of importance. (3) The classical approach measures importance in terms of determinance. As explained above, this is associated with a risk of overlooking important criteria. This seems to be especially true for parental support. Consequently, we will examine the importance of this criterion as a supplement to the classical criteria.

5. Materials and Methods

5.1. Sample and Design

We questioned 181 teachers (91.2% female, 0.6% missing) from 68 elementary schools in the German federal state North Rhine-Westphalia (NRW). Participation in the study was voluntary. As an incentive, we raffled eight cash prizes of 50 euros to be used for funding for the teachers’ classes. The respondents had M = 16.59 years of professional experience (min = 0.5; max = 43) and had accompanied a transition M = 4.08 times (min = 1; max = 15).

We asked each teacher to provide information about children in their most recent fourth-grade class. To limit their effort, we asked them to select only four children. To nevertheless ensure a certain range, we specified the following two factors as selection criteria: (a) perceived qualification for a school track (Gymnasium (academic track) vs. Realschule (vocational track)) and (b) certainty of judgment (certain vs. uncertain). Thus, each teacher was to select one child who was a certain case for Realschule, one who was a certain case for Gymnasium, one who was an uncertain case for Realschule and one who was an uncertain case for Gymnasium (Table 1).

The sequence of the four cases was balanced over four versions of the questionnaire. Their shares in the responded questionnaires were between 20.4% and 30.4%. With 181 teacher participants, this design resulted in 724 potential students. However, 33 students had to be excluded from the analyses due to implausible data. Thus, data remained for 691 children (certain RS: n = 181, certain GY: n = 181, uncertain RS: n = 155, uncertain GY: n = 174).
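The case counts above can be checked with a short sketch. It assumes, as the design description suggests, that every teacher contributed exactly one child to each of the four cells; the variable names are illustrative and not taken from the study materials.

```python
from itertools import product

# Design factors as described in Section 5.1
certainty = ["certain", "uncertain"]
tracks = ["Realschule", "Gymnasium"]
cells = list(product(certainty, tracks))  # 2 x 2 = 4 student cases per teacher

n_teachers = 181
potential_students = n_teachers * len(cells)

# Cell sizes remaining after exclusion, as reported in the text
remaining = {
    ("certain", "Realschule"): 181,
    ("certain", "Gymnasium"): 181,
    ("uncertain", "Realschule"): 155,
    ("uncertain", "Gymnasium"): 174,
}
n_remaining = sum(remaining.values())
n_excluded = potential_students - n_remaining

# Assumption: one child per teacher and cell, so exclusions per cell
# follow from the difference to the number of teachers.
excluded_per_cell = {cell: n_teachers - n for cell, n in remaining.items()}

print(potential_students, n_excluded, n_remaining)  # 724 33 691
print(excluded_per_cell)
```

Under this assumption, all 33 exclusions fall into the two uncertain cells (26 for Realschule, 7 for Gymnasium), while the certain cells remain complete.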

5.2. Treatment of Missing Values

The rate of item nonresponse across the analysis variables varied from 0.1% to 4.8%. Missing values were imputed using R package mice [53] with two-level predictive mean matching for interval scales and two-level logistic regression for binary variables. The results of all subsequent analyses were pooled across five data sets with imputed missing values.
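To illustrate what pooling across imputed data sets involves, the following minimal sketch applies Rubin's rules, the standard way of combining a parameter estimate across multiply imputed data sets. That the pooling in this study followed exactly these formulas is an assumption; the function name and the five coefficient values are hypothetical and serve only as an example.

```python
from statistics import mean, variance

def pool_rubin(estimates, std_errors):
    """Pool one parameter across m imputed data sets using Rubin's rules."""
    m = len(estimates)
    q_bar = mean(estimates)                         # pooled point estimate
    within = mean([se ** 2 for se in std_errors])   # mean within-imputation variance
    between = variance(estimates)                   # between-imputation variance (m - 1 denominator)
    total = within + (1 + 1 / m) * between          # total variance of the pooled estimate
    return q_bar, total ** 0.5

# Hypothetical regression weights and standard errors from five imputed data sets
coefs = [1.0, 1.2, 0.8, 1.1, 0.9]
ses = [0.2, 0.2, 0.2, 0.2, 0.2]
pooled_coef, pooled_se = pool_rubin(coefs, ses)
print(round(pooled_coef, 3), round(pooled_se, 3))
```

The pooled standard error is larger than any single-data-set standard error because the between-imputation variance carries the uncertainty introduced by the missing values.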

5.3. Research Instruments and Operationalization

Our analyses are based on the student characteristics typically used to predict school track recommendations (Section 3). We asked teachers to indicate the level of each characteristic for each student. This was used to operationalize the facet determinance. The following characteristics were included in the analyses (descriptives and internal consistency see Table 2):

School performance was measured in the form of report card grades in German (=average grade of language use, reading and orthography) and mathematics in the first semester of the fourth grade. For better interpretability, we reversed the polarity (1 = poor, 5 = very good; insufficient grades did not occur).
Work behavior was measured using four items (example: “diligence in work behavior [e.g., neatness, orderliness, handwriting]”, 1 = low, 5 = high).
We captured social background via two items on the child’s home-family environment (proximity to education, financial security; 1 = weak, 5 = high). To operationalize parental support, we asked about three items: to what extent parents could provide professional, organizational and financial support if needed (1 = not at all, 5 = very good). The two variables were highly correlated (r = 0.81) and correlated almost identically with the other analysis variables (Table A1, Table A2 and Table A3). To avoid collinearity, we therefore combined the two highly redundant characteristics into the variable family background [54]. All analyses are based on this combined variable.
Migration background was operationalized by the family language (0 = only German, 1 = no German/German and other languages). Despite previous insignificant findings (Section 3), we preferred using language over the parents’ countries of birth because instead of students or parents we interviewed the teachers who have easier access to the former information.
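The recoding described in this list can be sketched as follows. The sketch assumes the usual German grading scale (1 = very good to 6 = insufficient), so that reversing the polarity amounts to 6 minus the grade; the function names are illustrative and not part of the study's instruments.

```python
def reverse_grade(grade):
    """Map a report grade (1 = very good ... 5 = poor) to the reversed scale
    (1 = poor ... 5 = very good). Assumes grade 6 (insufficient) does not occur."""
    if not 1 <= grade <= 5:
        raise ValueError("expected a grade between 1 and 5")
    return 6 - grade

def german_grade(language_use, reading, orthography):
    """Average the three German sub-grades, then reverse the polarity."""
    return reverse_grade((language_use + reading + orthography) / 3)

def migration_background(family_languages):
    """0 = only German, 1 = no German or German and other languages."""
    return 0 if set(family_languages) == {"German"} else 1

print(reverse_grade(1))                           # 5
print(german_grade(2, 2, 2))                      # 4.0
print(migration_background(["German", "Other"]))  # 1
```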

In line with research on judgment and decision making, we used the estimates from logistic regressions (Section 2.1), in order to depict two aspects of determinance. We predicted the actual recommendation—a public judgment—as well as the perceived qualification—a rather personal judgment (each 1 = Gymnasium). In North Rhine-Westphalia, teachers can also give a limited recommendation for the Gymnasium. These cases were combined with those who had an unqualified recommendation for the Gymnasium.

Perceived qualification and actual recommendation coincided highly, especially in the certain cases (94.53%). Teachers only recommended the Gymnasium in 5.47% of the certain cases where their questionnaire response classified the child as a Realschule case. In the uncertain cases, the agreement was lower (68.63%). The teacher’s recommendation for the Gymnasium differed from the perceived qualification for Realschule in 31.31% of uncertain cases. The opposite was rarely the case (0.06%).

The measure of relevance (Section 2.1) is based on the teachers’ subjective assessments of importance. The teachers indicated how important each of the criteria described above was in forming their recommendation for the respective child (1 = unimportant, 4 = very important). For reasons of comparability, social background and parental support were again combined into one family background criterion. Descriptives and internal consistencies are presented in Table 3. Correlations can be found in Table A4, Table A5 and Table A6.

5.4. Analysis Strategy

All analyses were performed in R [55] on multiply imputed data sets and accounted for the hierarchical structure of the data. All analyses were carried out for the total sample and separately for the two sub-samples (certain vs. uncertain student cases).

The multilevel logistic regressions were performed using R packages lme4 [56] and mitml [57]. Only one significant slope variance could be identified (Table A10 and Table A11). Consequently, for nearly all predictors and case groups, correlations between the methods could not be calculated. Instead, we relied on a descriptive comparison of the predictor ranks.

To determine the ranks of the ratings, we first averaged the ratings across the four students for each predictor. We then ranked the predictors across all teachers using post hoc tests in a mixed-effects ANOVA using R packages lme4 [56], mitml [57] and MKmisc [58].

For the regression weights, we determined importance based on a dominance analysis [59], using R package dominanceanalysis [60]. This analyzes the importance of each predictor compared to all other predictors across all possible subsets of the predictors. A predictor is dominant if its additional contribution to R² is higher than that of the predictor it is being compared to. The dominance analysis distinguishes three forms. The strictest form, complete dominance, exists if a predictor makes a higher contribution in all models. A weaker form, conditional dominance, exists if this is not true for all models, but is true on average for all models with the same number of predictors. The weakest form, general dominance, exists if a predictor makes a higher contribution on average. In our analyses, we only relied on the strictest form. Additionally, we used bootstrap samples to further validate our conclusions. We interpreted that one predictor dominated another if this was true for 70% of the bootstrap samples [59] and then assigned this predictor the higher rank.
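The three forms of dominance can be made concrete with a minimal sketch. It is not the implementation of the dominanceanalysis package; it only follows the definitions given above. The function names are illustrative, and the R² values for the three predictors are hypothetical numbers chosen so that the three forms diverge.

```python
from itertools import combinations
from statistics import mean

def subsets(items):
    """Yield every subset of `items` as a frozenset, including the empty set."""
    for k in range(len(items) + 1):
        for combo in combinations(items, k):
            yield frozenset(combo)

def completely_dominates(a, b, predictors, fit):
    """a adds more than b to every model that contains neither of them."""
    rest = [p for p in predictors if p not in (a, b)]
    return all(fit(s | {a}) > fit(s | {b}) for s in subsets(rest))

def conditional_contributions(target, predictors, fit):
    """Average gain from adding `target`, one value per size of the base model."""
    others = [p for p in predictors if p != target]
    return [
        mean([fit(frozenset(c) | {target}) - fit(frozenset(c))
              for c in combinations(others, k)])
        for k in range(len(others) + 1)
    ]

def conditionally_dominates(a, b, predictors, fit):
    """a has the higher average gain at every model size."""
    gains_a = conditional_contributions(a, predictors, fit)
    gains_b = conditional_contributions(b, predictors, fit)
    return all(x > y for x, y in zip(gains_a, gains_b))

def generally_dominates(a, b, predictors, fit):
    """a has the higher gain when averaged over all model sizes."""
    return (mean(conditional_contributions(a, predictors, fit))
            > mean(conditional_contributions(b, predictors, fit)))

# Hypothetical R² values for every subset of three predictors (illustration only)
predictors = ["german", "math", "work"]
r2_table = {
    frozenset(): 0.00,
    frozenset({"german"}): 0.50,
    frozenset({"math"}): 0.40,
    frozenset({"work"}): 0.20,
    frozenset({"german", "math"}): 0.62,
    frozenset({"german", "work"}): 0.63,
    frozenset({"math", "work"}): 0.48,
    frozenset({"german", "math", "work"}): 0.65,
}
fit = r2_table.__getitem__

print(completely_dominates("german", "math", predictors, fit))
print(completely_dominates("math", "work", predictors, fit))
print(conditionally_dominates("math", "work", predictors, fit))
print(generally_dominates("math", "work", predictors, fit))
```

In this toy table, the German grade completely dominates the math grade, whereas the math grade dominates work behavior only in the general sense. Relying on complete dominance alone, as in the analyses reported here, would therefore leave the second pair unranked.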

6. Results

6.1. Direct Measures: Relevance

Information on importance in terms of relevance can be found in Table 3. Mixed-effects ANOVA revealed significant differences in the importance of the different characteristics (all cases: F(4, 77,660.80) = 359.18, p < 0.001; certain cases: F(4, 425,100.00) = 366.92, p < 0.001; uncertain cases: F(4, 5434.94) = 231.98, p < 0.001).

Based on the post hoc tests, it is evident that grades and work behavior are reported as the most important criteria. In contrast, similar to interview and experimental studies (Section 3), family background and migration background are rated as “rather unimportant” (M ≈ 2.00). This evaluation differs significantly from that of the other characteristics. While these findings apply to both certain and uncertain cases, the results on the legally permissible criteria differ between cases: math grade and work behavior come first in the certain cases, while work behavior and both grades are equally important in the uncertain cases.

6.2. Indirect Measures: Determinance

Information on importance in terms of determinance can be found in Table 4 (actual recommendation) and Table 5 (perceived qualification). Migration background is an insignificant predictor of the actual recommendation in all models. However, the significance of the other predictors is case-specific. In the certain cases, the German grade is, by far, the most dominant predictor, followed by the math grade which dominates the family background. While work behavior is an insignificant predictor in the certain cases, it is by far the most dominant predictor in the uncertain cases. No further dominances could be identified among the other predictors. Overall, the prediction in the certain cases (R² = 0.72) is better than in the uncertain cases (R² = 0.21).

The results for perceived qualification (Table 5) largely agree with the results for the actual recommendation in the certain cases. This was to be expected given the high degree of agreement between the two dependent variables (Section 5.3). The only differences are that work behavior proves to be an additional significant predictor and that the German grade is no longer dominant over the math grade. For the uncertain cases, the picture is partly different from the actual recommendation. Grades and work behavior rank equally high here. The German grade dominates the family background. Migration background is dominated by all other variables. As with the actual recommendation, the characteristics appear to be more equal in importance in the uncertain cases than in the certain cases. Again, the prediction is better in the certain cases (R² = 0.90) than in the uncertain cases (R² = 0.17). Overall, compared with the prediction of the actual recommendation, the prediction of the perceived qualification is slightly less successful for the uncertain cases and clearly better for the certain cases.

6.3. Comparison of Relevance and Determinance

Taken together, the results for both measures of importance are similar. First, the order of the criteria hardly changes. In both forms of measurement, the legally relevant criteria receive the highest importance, while family background and migration background are at most of secondary importance. Method-related differences are found only within the group of legally permissible criteria. Among the relevance measures, the math grade and work behavior are more important than the German grade. In the determinance measures, the two grades are generally of the highest importance, and sometimes more important than work behavior. The only exception is the dominance of work behavior in the prediction of the actual recommendation for the uncertain cases. Second, regardless of the method, the analyses differentiate between the importance of the criteria more strongly for the certain cases than for the uncertain cases. In the uncertain cases, all criteria seem to be almost equally important.

However, two striking differences appear. First, no criterion is rated as unimportant in terms of relevance. This is true even for the migration background (all cases: t(174.58) = 37.75, p < 0.001; certain cases: t(177.90) = 35.75, p < 0.001; uncertain cases: t(140.63) = 33.56, p < 0.001), which does not make a significant predictive contribution in the logistic regressions. Second, the criteria can be divided into distinct groups according to their relevance. This is not true for their determinance: here, the groups overlap.

7. Discussion

The starting point of this study was the question of whether the importance of given criteria for the tracking recommendation depends on the type of operationalization. To answer the question, we compared ratings on the importance of predetermined criteria (direct measure: relevance) with weights determined in logistic regressions to predict actual school track recommendation vs. personally perceived qualification for a particular school track (indirect measure: determinance). Furthermore, we distinguished between cases with certain vs. uncertain school track recommendations.

Overall, the results from all analyses were quite similar. We found hardly any method-related differences. Thus, our results, obtained using real student cases, are not consistent with the results from studies based on vignettes [27,33]. Consistent with previous research findings (Section 3), grades and work behavior received the highest weight, while family background and migration background were less important.

However, our analyses also revealed differences between the measures of importance. Although the migration background turned out to be an insignificant predictor in the logistic regressions (i.e., low determinance), it was attributed some level of importance by the teachers in the ratings (i.e., medium relevance). This was unexpected as biases in the sense of social desirability or effects of cognitive dissonance were assumed to cause a devaluation of the relevance of non-performance-related characteristics, especially in direct assessment methods (Section 2.1). Obviously, these biases hardly apply here. However, the lack of effect in the logistic regressions could still be due to peculiarities of the methods used to assess determinance. Migration background has a comparatively low variance (Table 2), so its importance may be underestimated (Section 2.1). In addition, in a regression, weights are estimated by taking into account joint predictive contributions with other characteristics. Ratings, on the other hand, consider the sole importance of the queried characteristics.
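The variance argument can be made concrete with a small worked example. The sketch below is an illustration under simplifying assumptions, not a reanalysis of our data: it assumes a linear model with a single binary predictor that is independent of the error term, so that the explained share of variance is β²·p(1 − p) / (β²·p(1 − p) + σ²), where p is the proportion of cases showing the characteristic. The function name explained_share and all numeric values are hypothetical.

```python
def explained_share(beta, prevalence, noise_var):
    """Share of outcome variance explained by one binary predictor.

    Assumptions (illustrative only): linear model y = beta * x + e,
    x is Bernoulli(prevalence) and independent of e, Var(e) = noise_var.
    """
    predictor_var = prevalence * (1.0 - prevalence)  # variance of a 0/1 variable
    signal = beta ** 2 * predictor_var
    return signal / (signal + noise_var)


# Same effect size, different spread of the characteristic in the sample.
balanced = explained_share(beta=1.0, prevalence=0.5, noise_var=1.0)
rare = explained_share(beta=1.0, prevalence=0.1, noise_var=1.0)
print(f"balanced predictor: {balanced:.3f}, rare predictor: {rare:.3f}")
```

With an identical coefficient, the predictor with the lower variance explains a smaller share of the outcome variance. Measures of determinance that build on explained variance can therefore rank a low-variance characteristic lower even when its effect per case is the same.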

According to our results, it is hardly necessary to capture importance (relevance vs. determinance) using multiple methods as is required in consumer research [19,61]. In the case of transition research, it seems to be much more important to consider different student cases (certain vs. uncertain cases) and to carefully choose the operationalization of the recommendation (perceived qualification vs. actual recommendation). Both variations led to different results in the logistic regressions.

First, while the specified characteristics led to very high variance explanations for the certain cases, the opposite was true for the uncertain student cases. Although we attempted to supplement the criteria typically studied, it remains largely unclear on which characteristics the recommendations are based. However, in contrast to the few studies of the classical approach that took parental background into account [42,46], we had to combine parental support and family background because of collinearity (Section 5.3). This deviation from the previous state of research presumably occurred because of the change in the methodological approach. Previous studies relied on student and parent responses, whereas we used teacher assessments exclusively. Either teachers do not differentiate between the two constructs or they might infer one from the other due to a lack of information [52]. Overall, the low predictive performance in the models for the uncertain cases illustrates the disadvantage of relying on predetermined characteristics (Section 2.1).

Second, the results for the uncertain cases differ depending on whether the actual recommendation or perceived qualification is predicted. In predicting the actual recommendation, work behavior dominates all other predictors. However, in the prediction of the perceived qualification, the legally relevant criteria dominate as a group. This data pattern could indicate that the formation of a school track recommendation is a multistage decision process (for sequential strategies in diagnostics see [24]). The formation of a recommendation is a complex process of decision-making. In comparison to consumer research, there are only a few alternatives to choose from when making a school track recommendation, but there is a wide range of information that must be taken into account. Moreover, information is often vague and fraught with uncertainty [31]. Teachers may try to reduce the complexity of this decision-making situation by breaking down the decision. First, similar to the strategy “elimination by aspects” [62], they may consider which school track they perceive a child is fundamentally qualified for based on the most stable and predictive characteristics (i.e., achievement). In a further step, more variable and less predictive characteristics (e.g., work behavior) would then be considered for the actual recommendation. Their influence should be visible, above all, in the uncertain cases because in these cases no certain qualification could be derived on the basis of performance.

Even if the largely method-independent results seem unexpected against the background of previous research, they are, nevertheless, desirable with regard to practice. Teachers do indeed judge as they say: their recommendation seems to be the result of a process in which pieces of information are weighed against each other in a deliberate and reasoned manner. This corresponds to the requirements of professional practice [63] and the controlled system of dual-process models [34,64]. The results may be due to the high-stakes nature of school track recommendations. Therefore, results may not be transferable to other situations, because teachers do indeed act heuristically and less analytically when decisions are less important and have fewer consequences [65].

Accordingly, diagnostics-related teacher education could and should sensitize future teachers to the properties of human decision making. In general, this includes the distinction between two systems of thought, the controlled system and the automatic system. In particular, teachers should be informed that the easily retrievable salient characteristics or aspects that are personally perceived as important are not necessarily those that guide the ultimate educational decision. While this is acceptable for easily revisable, short-term decisions, it is not acceptable for long-term, hard-to-revise decisions such as the school track choice made after the teachers’ recommendation.

In addition, limitations must be taken into account when interpreting the findings. First, our conclusions are based on descriptive analyses alone. An inferential statistical analysis, as found in Harte and Koele [27], was not possible in the absence of significant slope variances (Section 5.4). Second, our results, based on a survey at only one point in time and in only one federal state, might not be generalizable. It is well known from consumer research that decisions are highly context dependent [66]. Transition research also addresses such effects. For example, the importance of social background for the recommendation varies depending on its binding nature [51] and the advantage of certain groups of students changes over time [67] (but see also [10]). This leads to different levels of importance for the same criterion. However, it is not clear whether and to what extent this context dependency influences the answer to the research question pursued here, because context or time dependence is not the same as method dependence.

Third, we have concluded that the results on the importance of attributes in the case of school track recommendation are robust to the use of different operationalizations (and, thus, different facets of importance). This conclusion might be somewhat hasty, as we did not consider the facet of salience. However, results similar to those reported above can be expected for salience. Characteristics that make a student a “certain case” are those relating to consistently good or poor achievement. In free-elicitation methods, these characteristics should, therefore, be salient in memory and should come up first and more frequently than other characteristics. In the uncertain cases, achievement alone is not the determining characteristic. In terms of the multistage decision process outlined above, we expect that a great amount of attention would first be paid to the either inconsistent or consistently medium achievements. Given their low decision support, other features would then be given a similar amount of attention. Therefore, equal naming frequencies but distinct naming orders would be expected.

Future research could well continue to address methodological issues in capturing the importance of teachers’ decision criteria for tracking recommendations. So far, it remains open whether the finding of method independence can be generalized to the multitude of other pedagogical decisions (e.g., assignment of grades, determination of special educational needs, repetition vs. continuation in the subject matter). As expressed above, method independence could depend on the degree of bindingness of the pedagogical decision in question. This needs to be examined further. However, with regard to the school track recommendations, other research goals seem more urgent, especially the identification of recommendation criteria in uncertain cases and the search for causes of the differences in private vs. public expressions of opinion. In particular, further studies could explore the conjecture formulated above that our dependent variables mark different stages in a decision process.

Author Contributions

Conceptualization, S.v.O. and K.L.; methodology, K.L., S.v.O. and I.O.; formal analysis, I.O.; investigation, K.L.; resources, K.L. and S.v.O.; data curation, K.L. and I.O.; writing—original draft preparation, K.L.; writing—review and editing, K.L., S.v.O. and I.O.; visualization, K.L.; project administration, K.L.; funding acquisition, S.v.O. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

Data are available upon request.

Acknowledgments

The authors would like to thank the participating teachers and the students of a course taught by Katrin Lintorf who assisted with the data collection.

Conflicts of Interest

The authors declare no conflict of interest.