1 The Achievement Emotions Questionnaire—Mathematics (AEQ-M)

Emotions play a pivotal role in academic settings and therefore constitute a major research topic in educational psychology (Pekrun & Linnenbrink-Garcia, 2014). Mathematics is an important domain of investigation in this regard (Schukajlow et al., 2017) for several reasons. Firstly, it is paramount to investigate students’ emotions regarding mathematics because it is a core subject and taught around the world. Secondly, mathematics is a domain to which students commonly attach rather high levels of perceived value (Goetz et al., 2014), which provides the basis for experiencing high levels of both negative (e.g., anxiety before a difficult test) and positive emotions (e.g., pride about receiving a good grade). Thirdly, the domain of mathematics is characterized by gender differences in various psychosocial variables, including the levels, antecedents, and outcomes of emotions (Frenzel et al., 2007a, 2007b; Goetz et al., 2013). Lastly, mathematics anxiety is an often researched topic (see, e.g., Ashcraft, 2002; Hembree, 1990). This established research tradition of examining mathematics anxiety might be fruitfully expanded by assessing other discrete emotions within the same learning context of mathematics.

However, there is still a lack of instruments assessing the core discrete emotions in the domain of mathematics. One exception is the Achievement Emotions Questionnaire—Mathematics (AEQ-M; Pekrun et al., 2005), a frequently used instrument that allows researchers to assess several emotions with a single instrument. The AEQ-M comprises a set of 60 self-report items, each presenting a statement about one of seven mathematics-related emotions (two positive emotions, namely, enjoyment and pride; five negative emotions, namely, anger, anxiety, shame, hopelessness, and boredom) and asking students to indicate the degree to which that statement applies to them personally. The items are organized in scales that cover emotions experienced during mathematics classes (e.g., “I enjoy my math class” is a sample item for class-related enjoyment), while learning for mathematics by oneself (e.g., “My math homework bores me to death” is a sample item for learning-related boredom), or while taking tests in mathematics (e.g., “When I have an upcoming math test, I get sick to my stomach” is a sample item for test-related anxiety). Students’ answers to the items pertaining to each emotion (e.g., ten items measuring enjoyment across class, learning, and test contexts) can be aggregated into composite scores and linked to various constructs of interest in research on mathematics education. For instance, Frenzel et al. (2007a, 2007b) showed that students’ AEQ-M scores for anxiety, anger, and shame were more strongly associated with academic achievement and parental expectations in China than in Germany, shedding light on important cultural differences.

Unlike instruments for assessing achievement emotions across school domains (e.g., the Achievement Emotions Questionnaire (AEQ); Pekrun et al., 2011), the AEQ-M and the data underlying its development have yet to be published. Instead, researchers have relied on a manual of the instrument that is available from its authors upon request (Pekrun et al., 2005). This is not a satisfactory state of affairs because it creates uncertainty about the validity of various assumptions made in research using the AEQ-M. The first assumption underlying the AEQ-M is that emotions can be organized into three different contexts, thus reflecting the internal structure of the AEQ-M. This organization pertains to the idea that emotions are context-dependent, that is, the experience of an emotion depends on whether students attend mathematics classes (class context), learn mathematics by themselves (learning context), or take tests in mathematics (test context). For instance, students might enjoy learning mathematics by themselves (i.e., high levels of learning-related enjoyment) more than attending mathematics classes and taking tests (i.e., low levels of class- and test-related enjoyment; Pekrun et al., 2002). Providing tentative support for this assumption about the internal structure of the AEQ-M, achievement emotions have been empirically shown to be organized within these contexts in research using the domain-general AEQ (Pekrun et al., 2011). However, the extent to which these findings can be transferred to the AEQ-M is an open question, casting doubt on whether mathematics-related emotions should be measured in a context-dependent way.

The second assumption is that achievement emotions are best understood as a set of interrelated affective, cognitive, motivational, and physiological/expressive processes that represent distinct components of the overall emotional experience (e.g., Scherer, 2009). Consequently, this assumption again pertains to the internal structure of the AEQ-M as it affects the content-domain of the items. For example, a comprehensive approach to measuring anxiety during a mathematics test might require items that ask students whether they feel anxious (affect), worry about their performance (cognition), want to escape the situation (motivation), and get queasy (physiological/expressive). The AEQ-M accounts for the assumed component structure of emotions in a non-systematic manner, tapping into different components of each emotion but failing to cover all components of all emotions. Thus, it has not been possible to date to investigate whether the component structure established for achievement emotions in general (Pekrun et al., 2011) may also pertain to the mathematics-related emotions measured with the AEQ-M.

The third assumption is grounded in control-value theory (CVT; Pekrun, 2006, 2018, 2021), which proposes that achievement emotions are linked to specific antecedents and outcomes. According to CVT, control and value appraisals are important antecedents of achievement emotions. Control appraisals pertain to students’ expectations of being able to initiate and perform achievement-related activities (e.g., studying for a mathematics test), expectations about whether these activities will produce desired outcomes (e.g., a good grade), and attributions regarding the controllability of the cause of outcomes that were attained (Pekrun, 2006, 2018). Appraisals of control are reflected in students’ academic self-concept (Shavelson et al., 1976) and self-efficacy (Bandura, 1977), two common measures of perceived control in empirical research (e.g., Goetz et al., 2012; Luo et al., 2016; for the relations between self-concept and self-efficacy in mathematics see Arens et al., 2022). In turn, value appraisals refer to the perceived value of academic activities and outcomes. Perceived value can relate to both extrinsic (e.g., importance of studying for attaining good grades) and intrinsic aspects of academic activities (e.g., interest in an activity). In addition, perceived value can also pertain to positive versus negative valence (e.g., importance of success vs. failure).

Regarding the outcomes associated with achievement emotions, these emotions are assumed to affect students’ learning and academic performance (Pekrun, 2006; Pekrun et al., 2011). Emotions can affect intrinsic and extrinsic motivation (e.g., learning out of curiosity versus learning to obtain good grades) and facilitate the use of flexible (e.g., elaboration of learning materials) and rigid learning strategies (e.g., rehearsal of materials). Moreover, emotions can affect the balance between students’ self-regulation (e.g., setting one’s own goals) and external regulation (e.g., seeking help from others). Importantly, these cognitive and motivational processes are assumed to mediate the effects of emotions on academic performance. Unlike its structural validity (i.e., context-dependency and component structure), the external relations of the AEQ-M along the lines of CVT have already been investigated in scattered studies using the instrument. For instance, there is evidence for control and value appraisals as interactive determinants of achievement emotions in mathematics (e.g., Putwain et al., 2018) and for the impact of mathematics-related achievement emotions on students’ learning and performance (e.g., Camacho-Morles et al., 2021). However, these relations have yet to be demonstrated with the data on which the development of the AEQ-M was based.

Therefore, the AEQ-M is based on several assumptions about the internal structure and external relations of mathematics-related emotions. It is difficult for readers to evaluate these assumptions, as neither the AEQ-M itself nor the data used for its development have been published thus far. Moreover, data permitting a systematic investigation of the proposed component structure of emotions are currently not available. This lack creates uncertainty about the psychometric properties and validity of the AEQ-M, and might impede the progress of research on the role of emotions in mathematics education. We aim to close these gaps by demonstrating the validity of the assumptions behind the AEQ-M in two studies, based on the data used for developing the AEQ-M (Study 1) and novel data with extended AEQ-M scales for enjoyment, anger, anxiety, boredom, and hopelessness (Study 2), through the following investigations. (1) We examine the assumed context-dependency of the emotions in Studies 1 and 2—that is, the assumption that discrete emotions (i.e., enjoyment, pride, anger, anxiety, shame, hopelessness, and boredom) differ between academic contexts (i.e., attending class, studying, and taking tests). (2) We introduce extended AEQ-M scales to examine the assumed component structure of emotions in Study 2—that is, the assumption that emotions represent sets of interrelated affective, cognitive, motivational, and physiological/expressive processes. These extended AEQ-M scales comprise 127 items measuring all four components of enjoyment, anger, and anxiety in class, learning, and test contexts, boredom in class and learning contexts, and hopelessness in test contexts. (3) We establish the external validity of the AEQ-M in Study 1 by investigating the relationship between emotions and their proposed core antecedents (control, value) and outcomes (motivation, learning strategies, achievement).

2 Study 1

Study 1 is based on the data used for developing the AEQ-M. While scattered results from analyses of these data have been reported elsewhere (Goetz, 2004; Pekrun et al., 2005), a comprehensive and systematic analysis of the psychometric properties of the AEQ-M and its internal and external validity had not been conducted previously. To investigate the external validity of the AEQ-M, we followed the approach taken in the development of the domain-general AEQ (Pekrun et al., 2011) and assessed various measures of control and value appraisals (i.e., academic self-concept, self-efficacy, value of achievement, and interest), motivation (i.e., intrinsic motivation, achievement motivation, and effort), learning strategies (i.e., elaboration, rehearsal, self-regulation, and external regulation), and academic performance (i.e., grades).

2.1 Methods

2.1.1 Sample

This study draws upon a sample of 781 German secondary school students (53.5% female, 46.5% male) from Grades 5 to 10 (Grade 5, n = 177; Grade 6, n = 103; Grade 7, n = 140; Grade 8, n = 149; Grade 9, n = 110; Grade 10, n = 102) with a mean age of M = 14.1 years (SD = 1.92). Students attended three different tracks referred to as Hauptschule (lowest track; n = 205 from 10 classrooms), Realschule (middle track; n = 270 from 10 classrooms), and Gymnasium (highest track; n = 306 from 12 classrooms).

2.1.2 Missing data

A total of 0.93% of data were missing, stemming from 279 incomplete records. The percentage of missing values across all variables ranged from 0.00 to 2.69%. Full information maximum likelihood (FIML) was used to deal with missing data (see Enders, 2010).

2.1.3 Measures

We used paper-and-pencil questionnaires with 5-point Likert Scales (1 = not true at all, 2 = hardly true, 3 = somewhat true, 4 = largely true, 5 = exactly true).

2.1.3.1 Achievement emotions

Achievement emotions were assessed with the Achievement Emotions Questionnaire—Mathematics (AEQ-M; Pekrun et al., 2005). It comprises 60 items (see Appendices 1 and 2) that measure seven achievement emotions in the domain of mathematics, namely, enjoyment, pride, anger, anxiety, shame, hopelessness, and boredom. Emotions are measured in terms of three contexts (class, learning, test), four components (affective, cognitive, motivational, physiological/expressive), and three points in time (before, during, after). However, not all contexts (e.g., test-related boredom) and components (e.g., affective test-related anger) are covered.

2.1.3.2 Antecedents of achievement emotions

Students’ academic self-concept, self-efficacy, performance-related valence, and interest were measured as antecedents of achievement emotions.

2.1.3.3 Academic self-concept

Three items measured students’ academic self-concept (e.g., “Mathematics is one of my best subjects”; Goetz, 2004; Marsh, 1990; α = 0.87).

2.1.3.4 Self-efficacy

Self-efficacy was measured with four items (e.g., “I am confident that I can master the skills taught in mathematics”; adapted from Kunter et al., 2002; α = 0.86).

2.1.3.5 Positive value of achievement

The positive value of achievement was measured with five items capturing the value of success (e.g., “It is very important for me to get a good grade in mathematics”; Goetz, 2004; α = 0.85).

2.1.3.6 Interest

Interest was assessed with eight items capturing the intrinsic value of activities (e.g., “Engaging in mathematics is one of my favorite activities”; Goetz, 2004; α = 0.90).

2.1.3.7 Outcomes of achievement emotions

We measured students’ motivation, learning strategies, and self-regulation and external regulation of learning as outcomes of achievement emotions.

2.1.3.8 Intrinsic and achievement motivation and effort regulation

Intrinsic and achievement motivation were assessed with three items (e.g., “In mathematics I do my homework because I like this subject”; α = 0.89) and two items (e.g., “I study for mathematics because I don't want to get bad grades”; Goetz, 2004; α = 0.75), respectively. Effort regulation was assessed with nine items (e.g., “I work hard to do well in mathematics classes even if I do not like what we are doing”; Wild & Schiefele, 1994; α = 0.79).

2.1.3.9 Learning strategies

Elaboration and rehearsal were measured with nine items (e.g., “When I study for mathematics, I try to connect the material to things I've already learned in other subjects”; α = 0.86) and four items (e.g., “When I study for mathematics, I practice by reciting formulas over and over”; adapted from Baumert et al., 1997; Kunter et al., 2002; α = 0.75), respectively.

2.1.3.10 Self- and other-regulated learning

Self-regulated and externally regulated learning was assessed with nine items (e.g., “When studying for mathematics, I set my own goals that I want to achieve”; modified from Goetz, 2004; α = 0.83) and six items (e.g., “In the way I solve my mathematics problems, I follow my teacher's recommendations exactly”; modified from Goetz, 2004; α = 0.74), respectively.
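The internal consistencies (Cronbach's α) reported for the scales above follow the standard formula α = k/(k − 1) × (1 − sum of item variances / variance of the sum score), where k is the number of items. A minimal Python sketch of this computation is given below; the function name `cronbach_alpha` and the example responses are illustrative assumptions and not study data.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(responses[0])
    if k < 2:
        raise ValueError("alpha requires at least two items")
    # Sample variance of each item across respondents
    item_vars = [variance([row[i] for row in responses]) for i in range(k)]
    # Sample variance of the sum score across respondents
    total_var = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical 5-point Likert responses (5 students x 3 items), not study data
example = [
    [4, 5, 4],
    [2, 2, 3],
    [5, 5, 5],
    [3, 3, 2],
    [1, 2, 1],
]
alpha = cronbach_alpha(example)
print(f"alpha = {alpha:.2f}")
```

Complete responses are assumed here; records with missing item scores would need to be handled before applying the function.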

2.1.3.11 Academic achievement

Students reported their last midterm mathematics grade. Grades ranged from 1 (very good) to 6 (insufficient) and were inverted for ease of interpretation, so that higher values corresponded to better achievement.
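One straightforward way to invert a 1 to 6 grade scale is to subtract each grade from the sum of the scale endpoints. The exact transformation is not stated above, so the following Python sketch (including the name `invert_grade`) is an assumption.

```python
def invert_grade(grade, lowest=1, highest=6):
    """Reverse-score a grade so that higher values mean better achievement.

    Assumes grades run from `lowest` (best) to `highest` (worst) on the
    original scale; the transformation itself is an assumption.
    """
    if not lowest <= grade <= highest:
        raise ValueError(f"grade must lie between {lowest} and {highest}")
    return lowest + highest - grade


# A grade of 1 (very good) becomes 6, a grade of 6 (insufficient) becomes 1
inverted = [invert_grade(g) for g in (1, 2, 3, 4, 5, 6)]
```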

2.1.4 Analytic strategy

A series of confirmatory factor analyses (Brown, 2015) was conducted to investigate the structural relationships between emotions. First, a total of four CFA models representing different hypotheses about these relationships were estimated (analogous to Pekrun et al., 2011), as follows: one general bipolar factor across all contexts and emotions (M1); seven factors representing each emotion (M2); three factors representing each context (M3); and seven factors representing each emotion and correlated uniqueness within settings (M4; see Fig. 1). Second, we computed latent correlations of emotions with control and value appraisals, motivation, strategies, and performance based on single indicator models with model-based corrections for unreliability (see Cole & Preacher, 2014). Measurement models were evaluated using the fit indices CFI, TLI, RMSEA, and SRMR based on common cut-off criteria (CFI and TLI ≥ 0.95, SRMR ≤ 0.08, RMSEA ≤ 0.05; see Kline, 2016). In addition, the Bayesian information criterion (BIC) was used to select among competing models, where a lower BIC value indicates a better trade-off between model fit and model complexity.
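The model evaluation logic described above can be sketched in a few lines of Python. The cut-off values are those stated in the text, and the BIC is computed with its standard definition, −2 × log-likelihood + number of free parameters × ln(N). The function names, model labels, log-likelihoods, and parameter counts are hypothetical and serve illustration only.

```python
import math


def check_fit(cfi, tli, rmsea, srmr):
    """Compare fit indices with the cut-offs stated in the text."""
    return {
        "CFI": cfi >= 0.95,
        "TLI": tli >= 0.95,
        "RMSEA": rmsea <= 0.05,
        "SRMR": srmr <= 0.08,
    }


def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion; lower values are preferred."""
    return -2.0 * log_likelihood + n_params * math.log(n_obs)


def best_by_bic(models, n_obs):
    """Return the name of the model with the smallest BIC.

    `models` maps a model name to (log-likelihood, number of free parameters).
    """
    return min(models, key=lambda name: bic(*models[name], n_obs))


# Hypothetical values, not taken from the study
candidates = {"simple": (-10500.0, 40), "complex": (-10200.0, 90)}
selected = best_by_bic(candidates, n_obs=500)
fit_flags = check_fit(cfi=0.96, tli=0.95, rmsea=0.04, srmr=0.05)
```

In this example the more complex model is selected because its gain in log-likelihood outweighs the penalty for the additional parameters.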

Fig. 1 Model 4 with seven factors (A; Study 1) and five factors (B; Study 2)

Models were estimated with Mplus 8.4 (Muthén & Muthén, 1998–2017) using the robust maximum likelihood estimation method (MLR) with chi-square test statistic and standard errors taking into account non-independence of observations due to students nested in classrooms.
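The correction for unreliability in single-indicator models mentioned in the analytic strategy can be illustrated as follows. In such a model, the residual variance of the observed indicator is fixed to (1 − reliability) × observed variance; for two variables, the resulting latent correlation equals the classical correction for attenuation. The Python sketch below assumes that a reliability estimate such as Cronbach's α is used for this purpose, which is not specified above; the function names and example values are hypothetical.

```python
import math


def single_indicator_error_variance(observed_variance, reliability):
    """Residual variance to fix for a single indicator with known reliability."""
    return (1.0 - reliability) * observed_variance


def disattenuated_correlation(r_observed, rel_x, rel_y):
    """Latent correlation implied by correcting both variables for unreliability."""
    return r_observed / math.sqrt(rel_x * rel_y)


# Hypothetical values: observed r = .40, both scales with reliability .80
error_var = single_indicator_error_variance(observed_variance=2.0, reliability=0.80)
r_latent = disattenuated_correlation(0.40, 0.80, 0.80)
```

With these values the observed correlation of 0.40 corresponds to a latent correlation of 0.50, showing that corrected correlations are larger in absolute size than their observed counterparts.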

2.2 Results and discussion

We observed higher means for positive than for negative emotions, sufficient variation in item scores, and low levels of skewness and kurtosis (Table 1). All scales displayed good or very good reliability, 0.84 ≤ α ≤ 0.91. The positive emotions of enjoyment and pride were positively correlated, r = 0.78, and the negative emotions of anger, anxiety, shame, hopelessness, and boredom were also positively correlated, 0.25 ≤ r ≤ 0.86. Positive and negative emotions were negatively correlated, −0.62 ≤ r ≤ −0.14.

Table 1 Descriptive statistics and zero-order correlations of the AEQ-M scales

2.2.1 Structural relationships

To examine the structural relationships between emotions, four CFA models were estimated (see Table 2). The model representing the two-facet structure of the instrument (i.e., seven emotions nested within three contexts; M4) showed an acceptable model fit (χ²(70) = 244.78, CFI = 0.978, TLI = 0.951, RMSEA = 0.057, SRMR = 0.041) according to all fit indices, as well as the smallest BIC. This result indicates the best trade-off between model fit and model complexity among the four competing models. This finding is in line with CVT, showing that several discrete achievement emotions can be distinguished and that they are context-dependent.

Table 2 Confirmatory factor analysis: model comparison

2.2.2 Correlations with external criteria

As expected and in line with CVT (Pekrun, 2006), enjoyment and pride were positively associated with all external criteria (Table 3), indicating that higher levels of positive emotions are related to higher levels of control and value appraisals, higher levels of motivation, more frequent use of learning strategies, and better academic performance. Anger, anxiety, shame, hopelessness, and boredom, on the other hand, were in general negatively associated with control and value appraisals, motivation, and performance. These results mirror research with the domain-general AEQ (Pekrun et al., 2011) and the AEQ-M (e.g., Frenzel et al., 2007a, 2007b; Putwain et al., 2018).

Table 3 Latent correlations of emotions with appraisals, motivation, strategies, and performance

However, there are noteworthy exceptions from this general pattern. Anxiety, shame, and hopelessness correlated positively with declarative repetition, and anxiety and shame correlated positively with external regulation of learning. These negative emotions might prompt students to use more rigid study strategies and to seek help in order to prevent failure. Anxiety may stimulate learning and performance by promoting extrinsic motivation (e.g., Bieleke et al., 2022), but hamper self-regulation and performance by overtaxing cognitive resources (e.g., through processing worry cognitions; Roos et al., 2021a, 2021b), resulting in variable associations with performance (Pekrun, 2018). Interestingly, anger and boredom were negatively associated with perceived positive value of achievement, whereas anxiety, shame, and hopelessness did not relate to positive achievement value. The negative link between value and boredom is in line with CVT propositions (i.e., boredom is generally linked to low levels of value; Pekrun, 2006).

3 Study 2

In Study 2, we developed extended scales for enjoyment, anger, anxiety, hopelessness, and boredom as essential emotions in mathematics, systematically covering all four components (i.e., affective, cognitive, motivational, and physiological/expressive). This allowed us to establish the overall structural validity of the AEQ-M by examining the robustness of the confirmatory factor analyses conducted in Study 1. More importantly, we could investigate the structural validity of each scale. In line with research on the domain-general AEQ (Pekrun et al., 2011), we expected that models with four correlated components (i.e., four-component models) and models with four second-order components governed by a higher-order factor representing the emotion (i.e., hierarchical models) fit the data better than models with a single factor representing the emotion (i.e., single-factor models). The former two models represent the idea that emotions are sets of interrelated affective, cognitive, motivational, and physiological/expressive components, one of the assumptions underlying the AEQ-M (e.g., Lange & Zickfeld, 2021; Scherer, 2009). The latter model represents the idea that emotions are unitary constructs with no distinguishable components.

3.1 Methods

3.1.1 Sample

This study draws upon a sample of 699 German secondary school students (56.9% female, 41.1% male) from Grade 7 (n = 83) and Grade 9 (n = 616) with a mean age of M = 14.0 years (SD = 0.9). Students attended three different tracks referred to as Hauptschule (lowest track; n = 205 in Grade 9), Realschule (middle track; n = 83 in Grade 7, n = 203 in Grade 9), and Gymnasium (highest track; n = 208 in Grade 9).

3.1.2 Measures

The construction of items for the extended AEQ-M scales was based on the same qualitative interviews and pilot studies that were used to construct the AEQ-M (Goetz, 2004; Molfenter, 1999; Titz, 2001). We used paper-and-pencil questionnaires with 5-point Likert Scales (1 = not true at all, 2 = hardly true, 3 = somewhat true, 4 = largely true, 5 = exactly true).

3.1.3 Missing data

A total of 0.27% of data were missing, stemming from 141 incomplete records. The percentage of missing values across all variables ranged from 0.00% to 1.14%. Full information maximum likelihood (FIML) was used to deal with missing data (see Enders, 2010).

3.1.3.1 Achievement emotions

Enjoyment, anger, anxiety, boredom, and hopelessness were assessed with 125 items (Appendix 2). These extended AEQ-M scales supplemented the existing scales by additional items to represent all emotion components (e.g., the affective, cognitive, motivational, and physiological/expressive components of class-related boredom). As in the AEQ-M, boredom was measured only in class and learning contexts, and hopelessness was measured only in test contexts.

3.1.4 Analytic strategy

Firstly, we analyzed the same set of four CFA models as in Study 1. Secondly, the component structure was investigated for each of the five emotions and three contexts by estimating three CFA models for each combination of emotion and context: A model with one general factor across all components (M1), a model with four factors representing each component (M2), and a second-order factor model based on the four factors representing each component (M3). This two-step approach facilitates comparisons between Studies 1 and 2 as well as with previous research on the validation of the domain-general AEQ (e.g., Bieleke et al., 2021; Pekrun et al., 2011), which involved an analogous approach to examine the structural validity of the AEQ.
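The second step implies a grid of models that can be enumerated explicitly. The Python sketch below derives this grid from the coverage of the extended scales described in the text (boredom only in class and learning contexts, hopelessness only in test contexts); the variable and function names are illustrative assumptions, and the resulting counts follow from that description rather than from a reported figure.

```python
# Contexts covered by each emotion in the extended AEQ-M scales, as described in the text
CONTEXTS_BY_EMOTION = {
    "enjoyment": ["class", "learning", "test"],
    "anger": ["class", "learning", "test"],
    "anxiety": ["class", "learning", "test"],
    "boredom": ["class", "learning"],
    "hopelessness": ["test"],
}

# The three competing models estimated per emotion-context combination
MODELS = [
    "M1: single general factor",
    "M2: four correlated component factors",
    "M3: second-order factor over four components",
]


def build_model_grid():
    """List every (emotion, context, model) triple to be estimated."""
    return [
        (emotion, context, model)
        for emotion, contexts in CONTEXTS_BY_EMOTION.items()
        for context in contexts
        for model in MODELS
    ]


grid = build_model_grid()
n_scales = sum(len(contexts) for contexts in CONTEXTS_BY_EMOTION.values())
print(n_scales, len(grid))
```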

3.2 Results and discussion

As in Study 1, we observed higher levels of positive than negative emotions, sufficient variation in item scores, and low levels of skewness and kurtosis (Table 4). All scales displayed good to very good reliability, 0.91 ≤ α ≤ 0.96. The negative emotions of anger, anxiety, hopelessness, and boredom were positively correlated, 0.31 ≤ r ≤ 0.84, and negatively correlated with enjoyment, −0.61 ≤ r ≤ −0.26.

Table 4 Descriptive statistics and zero-order correlations of the AEQ-M scales

3.2.1 Structural relationships

Among the four CFA models we compared (Table 5), the model representing the two-facet structure of the instrument (i.e., five emotions nested within three contexts, M4) again provided the best fit to our data, and the best trade-off between model fit and complexity among the four competing models. The model fit according to CFI, TLI, and SRMR was acceptable, whereas the RMSEA exceeded the threshold for acceptable model fit (χ²(27) = 213.04, CFI = 0.969, TLI = 0.925, RMSEA = 0.099, SRMR = 0.046).
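The reported RMSEA values can be related to the chi-square statistics, degrees of freedom, and sample sizes given in the text. The Python sketch below uses a common textbook definition, RMSEA = sqrt(max(χ² − df, 0) / (df × (N − 1))). This formula is an assumption: under robust estimation, software may apply scaling corrections and a slightly different denominator, so the result should be read as an approximation rather than as the exact computation used in the analyses.

```python
import math


def rmsea(chi_square, df, n_obs):
    """Root mean square error of approximation (textbook definition, N - 1 in the denominator)."""
    return math.sqrt(max(chi_square - df, 0.0) / (df * (n_obs - 1)))


# Values reported in the text for Model 4
study1 = rmsea(chi_square=244.78, df=70, n_obs=781)  # reported RMSEA = 0.057
study2 = rmsea(chi_square=213.04, df=27, n_obs=699)  # reported RMSEA = 0.099

print(round(study1, 3), round(study2, 3))
```

Under this definition, the larger RMSEA in Study 2 reflects a chi-square value of similar magnitude being spread over far fewer degrees of freedom than in Study 1.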

Table 5 Confirmatory factor analysis: model comparison

3.2.2 Component structure

As expected, the component factor and the hierarchical models fit our data well (Table 6) and were superior to single-factor models in all cases except learning-related anger and boredom. In these latter two cases, however, the fit of the component factor and the hierarchical models was also very good. In general, the best fitting models provided acceptable fit to the data in absolute terms with only a few exceptions (e.g., enjoyment). These findings suggest that emotions measured with the extended AEQ-M scales capture the component structure predicted by the control-value theory.

Table 6 Emotion component structure of AEQ scales: confirmatory factor analysis

4 Discussion

The Achievement Emotions Questionnaire—Mathematics (AEQ-M) is an important instrument for assessing a broad range of emotions in mathematics. Despite its popularity and frequent use in research on mathematics education, however, the instrument has yet to be published and evidence for several underlying assumptions is either missing or scattered across the literature. Specifically, there is a dearth of evidence for the context-dependence of mathematics-related emotions (i.e., emotions differ between class, learning, and test contexts), their component structure (i.e., emotions reflect a set of interrelated affective, cognitive, motivational, and physiological/expressive processes), and their associations with antecedents (control, value) and outcomes (motivation, learning strategies, achievement) assumed by the control-value theory of achievement emotions (CVT; Pekrun, 2006). In the present research, we capitalized on both the data originally used to develop the AEQ-M (Study 1) and additional data (Study 2) to scrutinize the validity of these assumptions. Regarding the structural validity of the AEQ-M (i.e., context-dependency, component structure), both studies provided evidence that mathematics-related emotions assessed with the AEQ-M are indeed context-specific. As such, emotions measured in one context might differ from emotions measured in another context (e.g., students might experience more anxiety in tests than in classes). Moreover, Study 2 suggests that mathematics-related emotions are best understood as reflecting a set of interrelated processes. For instance, experiencing anxiety means that students feel anxious (affective), are worried (cognitive), want to leave (motivational), and get queasy (physiological/expressive). 
These findings corroborate and extend previous research that investigated some of these assumptions about the internal structure of mathematics-related emotions (e.g., context-dependency in a Portuguese version of the AEQ-M; Moreira et al., 2019). Moreover, they provide a novel set of extended AEQ-M scales for assessing enjoyment, anger, anxiety, hopelessness, and boredom in mathematics education research.

Regarding the external validity of the AEQ-M, the results of Study 1 showed the theoretically predicted associations between mathematics-related emotions and their core antecedents and outcomes. For instance, students who reported higher levels of control (e.g., higher self-efficacy) and positive value (e.g., higher interest) also reported higher levels of positive emotions and lower levels of negative emotions. In turn, higher levels of positive emotions and lower levels of negative emotions were linked to higher motivation (e.g., effort), different learning styles (e.g., more self-regulation), and higher performance (i.e., better grades). This aligns well with previous research demonstrating the influence of control and value appraisals on mathematics-related emotions (e.g., Frenzel et al., 2007a, 2007b) and the effect of these emotions on performance (e.g., Pekrun et al., 2017). Besides these main findings, it is noteworthy that the relationships between achievement emotions and performance were substantial (e.g., the latent correlation between enjoyment and grades was r = 0.46). Across both studies, the AEQ-M scales demonstrated good reliability (i.e., they allow researchers to measure emotions with sufficient precision; Cronbach’s α ranging from 0.84 to 0.96) and were correlated with each other in meaningful ways (e.g., higher levels of one negative emotion were associated with higher levels of other negative emotions; 0.14 ≤ |r| ≤ 0.86).

Our findings are of great relevance for mathematics education for several reasons. Mathematics is a core subject in school curricula around the world and commonly accompanies students through their entire school life and beyond, especially in STEM-related occupational careers, but also more generally in understanding science and the world (e.g., statistics about diseases and health behavior related to the Covid-19 pandemic). Understanding the emotions students experience in mathematics is therefore of paramount importance (Schukajlow et al., 2017), not only because the emotional experiences of students in mathematics class should be studied as an outcome variable in itself, but also because emotions are an important predictor of mathematics achievement (Kim et al., 2014). A psychometrically sound, comprehensive, and valid instrument for assessing emotions in mathematics is therefore indispensable, with the need for such an instrument already demonstrated by existing research capitalizing on the AEQ-M. For instance, the AEQ-M has been used to examine the sources of gender differences in mathematics anxiety (Frenzel et al., 2007a, 2007b) and to investigate the effects of different special education support measures in mathematics on students’ emotions (Holm et al., 2020).

The extended AEQ-M scales developed in Study 2 will allow researchers to measure systematically the different components of achievement emotions (i.e., the affective, cognitive, motivational, and physiological/expressive processes of which emotions are composed). In future studies, it could thus be investigated whether these components are differentially affected by control and value appraisals and whether there are differences in the associations with performance. For instance, the cognitive component of mathematics anxiety (e.g., worries) might be more strongly affected by low levels of perceived control and more substantially associated with higher levels of performance than other components of anxiety (Roos et al., 2021a, 2021b; see also Barroso et al., 2021). This would have important practical implications for mathematics education, as it might guide the design of interventions (e.g., strengthening self-efficacy beliefs to reduce anxiety).

The extended AEQ-M scale may also inform psychometric research, as the results from studies may allow researchers to compare the components of the AEQ-M with the use of single-item measures to capture achievement emotions (Gogol et al., 2014). Moreover, the systematic coverage of emotion components in the extended AEQ-M scales permits an examination of the interplay of these components across different emotions (e.g., anxiety and boredom might share similar motivational processes such as the urge to leave a situation; Lange & Zickfeld, 2021). This would again greatly benefit mathematics education by identifying possible synergy effects among interventions.

In the present research, we developed extended AEQ-M scales to systematically cover all theoretically assumed components of emotions. However, expanding a scale can be a double-edged sword: adding items can increase reliability and help ensure that all relevant aspects of a construct are captured, but it can also render the scale less convenient to administer (e.g., increasing time, decreasing compliance). This is particularly relevant for repeated assessments, for instance, in the context of experience sampling studies (Goetz et al., 2016). Experience sampling, which assesses emotions at the moment they are experienced, is increasingly used to study students’ and teachers’ emotions in the domain of mathematics (e.g., Bieg et al., 2017). A complementary approach would be to develop scales with fewer items that still cover each emotion component, thereby balancing brevity and comprehensiveness. For the domain-general AEQ, such a short form has already been developed (the AEQ-S uses four items to measure all components of an emotion; Bieleke et al., 2021). It should therefore be possible to develop similar short-form versions of the AEQ-M, as adapting an existing short form to a specific domain may be less complex than shortening a domain-general questionnaire like the AEQ.
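The trade-off between scale length and reliability can be illustrated with the Spearman-Brown prophecy formula, a standard psychometric result that was not part of the present analyses. The following minimal sketch assumes parallel items; the function name and the reliability values are hypothetical and serve only as an illustration.

```python
def spearman_brown(reliability: float, length_factor: float) -> float:
    """Predict the reliability of a scale whose length is multiplied by length_factor.

    Assumes parallel items (equal true-score variances and error variances).
    length_factor > 1 lengthens the scale; length_factor < 1 shortens it.
    """
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    if length_factor <= 0.0:
        raise ValueError("length_factor must be positive")
    return (length_factor * reliability) / (1.0 + (length_factor - 1.0) * reliability)


# Hypothetical example: a scale with reliability .70 (illustrative value, not from the studies).
doubled = spearman_brown(0.70, 2.0)  # scale with twice as many items
halved = spearman_brown(0.70, 0.5)   # scale with half as many items
print(f"doubled: {doubled:.3f}, halved: {halved:.3f}")
```

Under these assumptions, doubling a scale with a reliability of .70 would yield a predicted reliability of about .82, whereas halving it would yield about .54, which illustrates why short forms need to be constructed and evaluated with care.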

The present research also has some limitations that should be considered when interpreting our findings. Firstly, both studies are based on samples from German secondary schools. While there is already evidence on the invariance of the AEQ-M across cultures (Frenzel et al., 2007a, 2007b), it would be desirable to investigate whether our results generalize to other age groups and educational settings (e.g., university students). This seems particularly important for the newly developed extended AEQ-M scales. Relatedly, we did not focus on gender differences because CVT assumes structural equivalence across gender; for instance, the associations of control and value with achievement emotions should be similar for female and male students (e.g., Pekrun et al., 2007; for empirical evidence, see Frenzel et al., 2007a, 2007b). Moreover, measurement equivalence of the AEQ-M across genders has already been demonstrated elsewhere (e.g., Moreira et al., 2019).

Secondly, our validation of the AEQ-M in terms of the core antecedents and outcomes of achievement emotions relied on self-reports. This mirrors the approach commonly taken in related research on achievement emotions (Bieleke et al., 2021; Pekrun et al., 2011), but it should still be complemented by more objective measures in future research. For example, physiological measures provide information beyond self-report and could be used to further validate the AEQ-M (Roos et al., 2021a, 2021b).

Thirdly, the fit of models representing different component structures of emotions did not always meet the thresholds recommended in the literature (e.g., for the learning- and test-related enjoyment scales). This may indicate a need for further refinement of these scales in future research, especially when the focus is on distinguishing the different components of emotions. However, it should be noted that these recommended cut-off criteria were derived from simulated datasets and are often not met in data from more complex studies, suggesting that they should be used with caution (Heene et al., 2011; Marsh et al., 2004). Relatedly, future research might use different analytic approaches to investigate the assumptions behind the AEQ-M. For instance, the context-dependency of mathematics-related emotions and their component structure could be jointly examined in one comprehensive model rather than in two separate steps. This added complexity might allow the examination of more fine-grained hypotheses (e.g., whether context-dependency holds across the different components of emotions).
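To make the role of such cut-off criteria concrete, the following sketch computes two widely used fit indices, RMSEA and CFI, from chi-square statistics and compares them with conventional cut-offs. It is an illustration only: all input values are hypothetical, the cut-offs (CFI of at least .95, RMSEA of at most .06) are commonly cited conventions assumed here rather than the thresholds applied in the present studies, and some software uses N instead of N - 1 in the RMSEA denominator.

```python
import math


def rmsea(chi2: float, df: int, n: int) -> float:
    """Root mean square error of approximation (N - 1 variant)."""
    if df <= 0 or n <= 1:
        raise ValueError("df must be positive and n must exceed 1")
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def cfi(chi2_model: float, df_model: int, chi2_base: float, df_base: int) -> float:
    """Comparative fit index relative to a baseline (independence) model."""
    d_model = max(chi2_model - df_model, 0.0)
    d_base = max(chi2_base - df_base, d_model)
    if d_base == 0.0:
        return 1.0  # neither model shows misfit beyond its degrees of freedom
    return 1.0 - d_model / d_base


# Hypothetical values chosen for illustration; they are not results from Study 1 or 2.
chi2_m, df_m, n_obs = 250.0, 100, 500
chi2_b, df_b = 2000.0, 120

rmsea_value = rmsea(chi2_m, df_m, n_obs)
cfi_value = cfi(chi2_m, df_m, chi2_b, df_b)

# Assumed conventional cut-offs (not taken from the present studies).
meets_rmsea = rmsea_value <= 0.06
meets_cfi = cfi_value >= 0.95
print(f"RMSEA = {rmsea_value:.3f} (meets cut-off: {meets_rmsea})")
print(f"CFI = {cfi_value:.3f} (meets cut-off: {meets_cfi})")
```

With these made-up values, the model would meet the assumed RMSEA convention but miss the assumed CFI convention, which illustrates how a rigid application of fixed cut-offs can lead to ambiguous conclusions about the same model.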

Fourthly, the research design in Study 1 is correlational and does not allow us to draw causal inferences about the relations between achievement emotions and their antecedents and outcomes, or to capture mediated relations between these constructs. While the observed correlations are in line with CVT propositions, experimental or longitudinal data are necessary to examine the causal effects generating these correlations (see, e.g., Forsblom et al., 2022; Pekrun et al., 2017).

5 Conclusion

Across two independent studies, we examined the internal structure and external relations of mathematics-related emotions measured with the AEQ-M, a widely used instrument that has not been published yet and that lacks dedicated evaluations of the validity of its assumptions. Our results indicate that the structural properties of the AEQ-M correspond closely to predictions that can be derived from the control-value theory of achievement emotions, and these results are similar to those observed for achievement emotions in other school domains. Specifically, mathematics-related emotions depend on the academic context in which they occur (i.e., class, learning, and test), represent a set of interrelated psychological processes (i.e., affective, cognitive, motivational, and physiological/expressive components), and are linked to their assumed antecedents (control, value) and outcomes (motivation, learning strategy, achievement). We introduced a set of extended AEQ-M scales that researchers in mathematics education can use to conduct a valid, reliable, and systematic examination of the component structure of several mathematics-related emotions in future studies.