Introduction

The COVID-19 pandemic has had an important impact on people’s mental health worldwide, with implications for public health, the economy, and social dynamics in general (Zolotov et al., 2020). In this context, university students are considered a population vulnerable to mental health problems (Husky et al., 2020), since they are generally in a transition stage in their academic, professional, and personal lives (Acharya et al., 2018). Numerous studies examining college students during the COVID-19 pandemic found that it has damaged their mental health, with increased stress, psychological distress, symptoms of anxiety and depression, and difficulties concentrating on academic work (Cao et al., 2020; Kecojevic et al., 2020; Jungmin Lee et al., 2021; Rogowska et al., 2020a, b; Savitsky et al., 2020). Likewise, increases in anxiety and depression symptoms in college students are associated with a higher perceived risk of the disease (Feng et al., 2020). In addition, the closure of universities, the change to virtual learning platforms, difficulties in technological access, economic problems, social isolation, fear of infection, and the death of family members due to COVID-19 have significantly exacerbated the risk of mental health problems in this group (Kecojevic et al., 2020; Sahu, 2020).

Given this situation, adequate measurement of mental health is necessary to support prevention and to implement intervention programs for university students (Visser & Law-van Wyk, 2021). According to the dual-factor model of mental health (Greenspoon & Saklofske, 2001), it is essential to use mental health instruments that measure both positive and negative emotions. An instrument developed under this dual perspective is the Mental Health Inventory (MHI; Veit & Ware, 1983), made up of 38 items that assess psychological well-being and distress in the general population. Subsequently, a short, five-item version of the MHI was developed, called the MHI-5 (Berwick et al., 1991). The MHI-5 is as effective as the extended version (Rivera-Riquelme et al., 2019) and as other instruments such as the General Health Questionnaire (Marques et al., 2011), and is even a better measure than the Hopkins Symptom Checklist (Strand et al., 2003). Also, due to its brevity, ease of response, and evidence of validity and reliability, the MHI-5 has been used in different groups and cultures, such as Portuguese adolescents (Marques et al., 2011), the general population of Brazil (Damásio et al., 2014), Australia (Milner et al., 2020), and Finland (Elovanio et al., 2020), American women with and without risk of atrial fibrillation (Whang et al., 2012), patients with chronic heart failure (Mo et al., 2020), university students (Almeneessier et al., 2015), people with spinal cord injury (Verwer et al., 2016), and Spanish children and adolescents (Rivera-Riquelme et al., 2019), among others.

The MHI-5 has been used as a general measure of mental health problems and in surveys of general health and quality of life in the non-psychiatric population (Rivera-Riquelme et al., 2019), since it assesses both psychological well-being and psychological distress. Furthermore, the MHI-5 has shown high sensitivity for detecting depressive, anxiety, or panic disorders in the general population and primary care patients (Means-Christensen et al., 2005; Rumpf et al., 2001; Thorsen et al., 2013).

Few studies have aimed to evaluate the psychometric properties of the MHI-5. A study with Portuguese adolescents reported that the MHI-5 presents a single factor that explained 59.88% of the total variance, adequate reliability (α = 0.82), item-test correlations ranging from 0.78 to 0.81, and evidence of convergent validity with measures of hopeful thinking, life satisfaction, and self-esteem (Marques et al., 2011). Another study, carried out in the general population of Brazil, supported the presence of a single factor, adequate reliability (Cronbach’s alpha = 0.86; composite reliability = 0.82), and evidence of convergent and discriminant validity with subjective happiness, satisfaction with life, and general health (Damásio et al., 2014). A more recent study with Finland’s general population also indicated that the MHI-5 has good psychometric properties, with good reliability (α = 0.89) and a unidimensional factorial structure. Furthermore, all the items showed adequate discrimination indices, with difficulty increasing as the symptoms became more severe (Elovanio et al., 2020).

In Spain, the MHI-5 has been validated in children and adolescents between 10 and 15 years old (Rivera-Riquelme et al., 2019). Unlike previous studies, the Spanish validation obtained a two-factor structure that explains 69.2% of the total variance (factor 1 = psychological distress; factor 2 = psychological well-being), with adequate reliability for the total scale (α = 0.71) and for the psychological distress (α = 0.71) and psychological well-being (α = 0.70) subscales. In addition, the scale showed a significant relationship with symptoms of anxiety and depression. Furthermore, this Spanish version of the MHI-5 (R-MHI-5) presents a simplified response format of four alternatives (never, sometimes, several times, and always) rather than the six originally proposed. The reduction of response alternatives is justified by the lower cognitive demand needed to complete the inventory and by evidence that the psychometric properties do not differ between measures that use 4, 5, or 6 alternatives (Jihyun Lee & Paek, 2014). In Peru, the MHI-5 was recently studied in a small sample of 75 students from a private secondary educational institution (Merino-Soto et al., 2019); the results indicated that a two-factor model presents a better fit and has adequate evidence of internal consistency reliability (α = 0.70). However, the small number of participants makes it difficult to generalize the results. In addition, that study used the version with six response options rather than the one with four.

As seen in the literature review, only the study by Rivera-Riquelme et al. (2019) evaluates the scale’s psychometric properties in university students. Therefore, there is little evidence of the internal structure and other psychometric properties of the scale in university students that allow an adequate evaluation of mental health in the context of the COVID-19 pandemic.

Most of the previous psychometric studies of the MHI-5 have used classical test theory (CTT) models. Only one previous study, by Elovanio et al. (2020), evaluated the psychometric properties of the MHI-5 based on structural equation modeling (SEM) and item response theory (IRT) models. However, no research has been conducted with the revised version with four response alternatives (R-MHI-5). Using both procedures will allow more robust results, since the participants’ characteristics do not influence the psychometric findings derived from IRT models, whereas they do influence the evidence produced by CTT models (Lin et al., 2020). The IRT approach will also allow estimating the difficulties of the items, the reliability of the people and items, and the standard errors, providing more stable results (Magno, 2009). Thus, using both approaches (CTT and IRT) will allow us to corroborate the previous findings and provide a better perspective of the psychometric properties of the R-MHI-5. In addition, evaluating measurement invariance (MI) will show whether the R-MHI-5 measures the same construct in the same way for different groups (Vandenberg & Lance, 2000). In the present study, evidence of MI would indicate that men and women have the same conceptualization of the latent variable and the same expected score on the R-MHI-5. It would also indicate that the relationships between the observable variables (items of the R-MHI-5) and the latent variable are independent of belonging to one group or another (Lubke et al., 2003). Finally, having evidence of MI is a prerequisite for comparing the measured variable between different groups (Caycho, 2017).

In this sense, this study aimed to evaluate the psychometric properties of the revised version of the MHI-5 (Rivera-Riquelme et al., 2019) in Peruvian university students, using Classical Test Theory (CTT) and Item Response Theory (IRT). Specifically, the study evaluated validity evidence based on the construct, validity evidence based on the relationship with other variables, reliability, the discrimination and difficulty parameters, and measurement invariance (MI) according to sex and age.

Method

Participants

Non-probabilistic sampling was used to collect the data, with the following inclusion criteria: (a) informed consent of the participants, (b) age not older than 40 years, (c) ability to read and write in Spanish, and (d) being enrolled in a university program. A sample of 1002 university students of both sexes (41.4% men and 58.6% women) between 17 and 35 years old (M = 21.4; SD = 3.4) was collected. The students were undergraduates from Peru; the largest group came from the highlands (47.5%), 26.2% came from the coast, and an equal percentage came from the jungle. The minimum sample size was calculated with Soper’s (2020) online calculator, taking into account the following criteria: five observed variables, two latent variables, an anticipated effect size of 0.30 (minimum lambda value for factorial models), a probability level of 0.05, and a statistical power of 0.95. The minimum size required was 288 cases. Therefore, the sample collected in the present study far exceeds the minimum required.

Instruments

Mental Health Inventory-5 (R-MHI-5)

Developed by Berwick et al. (1991) and adapted into Spanish by Rivera-Riquelme et al. (2019), the R-MHI-5 is made up of five items that assess the presence of psychological well-being (items 2 and 4) and psychological distress (reverse-scored items 1, 3, and 5). The Spanish version has four response categories ranging from “never” (0) to “always” (3), where a higher score indicates a better state of mental health.
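The scoring rule described above can be sketched in Python as follows. This is a minimal illustration rather than the authors' scoring procedure: the function name `score_r_mhi5` and the dictionary input format are hypothetical, and the sketch assumes that the distress items are reversed (3 minus the response) before summing, so that a higher total reflects better mental health.

```python
# Hypothetical scoring helper for a five-item inventory answered on a 0-3 scale.
WELLBEING_ITEMS = (2, 4)    # psychological well-being items, per the text
DISTRESS_ITEMS = (1, 3, 5)  # psychological distress items (reverse-scored)
MAX_RESPONSE = 3            # responses run from "never" (0) to "always" (3)


def score_r_mhi5(responses):
    """Return subscale and total scores from a dict {item number: response}."""
    if set(responses) != set(WELLBEING_ITEMS + DISTRESS_ITEMS):
        raise ValueError("responses must cover items 1 to 5 exactly")
    for item, value in responses.items():
        if value not in range(MAX_RESPONSE + 1):
            raise ValueError(f"item {item}: response must be an integer from 0 to 3")

    wellbeing = sum(responses[i] for i in WELLBEING_ITEMS)
    distress = sum(responses[i] for i in DISTRESS_ITEMS)
    # Assumption: distress items are reversed so that higher totals
    # mean better mental health.
    total = wellbeing + sum(MAX_RESPONSE - responses[i] for i in DISTRESS_ITEMS)
    return {"wellbeing": wellbeing, "distress": distress, "total": total}
```

For example, a respondent who answers "always" to both well-being items and "never" to the three distress items obtains the maximum total under this assumed rule (6 + 9 = 15).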

Generalized Anxiety Disorder Scale (GAD-7)

Developed by Spitzer et al. (2006) and adapted to Spanish by García-Campayo et al. (2010), this scale is made up of seven items that have four response categories ranging from “never” (0) to “almost every day” (3), where a higher score indicates a greater presence of the disorder. In the present study, the one-dimensional model presented adequate indices of reliability (α = 0.93; ω = 0.89) and validity based on internal structure (χ2 = 174.73; df = 14; p < 0.001; CFI = 0.99; TLI = 0.99; RMSEA = 0.071; SRMR = 0.033).

Patient Health Questionnaire (PHQ-9)

Developed by Spitzer et al. (1999) and adapted into Spanish by Zhong et al. (2014), this questionnaire consists of nine items that have four categories ranging from “not at all” (0) to “almost every day” (3). In the present study, the one-dimensional model presented adequate indices of reliability (α = 0.92; ω = 0.90) and validity based on internal structure (χ2 = 132.39; df = 27; p < 0.001; CFI = 0.99; TLI = 0.98; RMSEA = 0.062; SRMR = 0.032).

Procedure

The study obtained the approval of the ethics committee of the Center for Research and Innovation in Health of the Universidad Peruana Unión (N° 00,131–2020), and the standards of the Helsinki Declaration were met (World Medical Association, 2013). The data were collected through a virtual form on the Google Forms platform. The first part of the form explained the study’s objectives and the time required to complete the form, and presented the informed consent. Participants were assured that the information was confidential and that they could withdraw at any time. Only participants who gave their informed consent could complete the following sections of the form.

Data Analysis

A confirmatory factor analysis (CFA) was carried out using the weighted least squares with mean and variance adjusted (WLSMV) estimator, since the items are at the ordinal level (Brown, 2015). Model fit was evaluated with the chi-square test (χ2), the RMSEA, and the SRMR; for the RMSEA and SRMR, values less than 0.05 indicate a good fit and values between 0.05 and 0.08 are considered acceptable (Kline, 2015). The CFI and TLI were also used; for these indices, values greater than 0.95 indicate a good fit and values greater than 0.90 an acceptable fit (Schumacker & Lomax, 2015). Cronbach’s alpha coefficient (Cronbach, 1951) and the omega coefficient (McDonald, 1999) were used to evaluate the internal consistency of the scale, where a value > 0.80 is adequate (Raykov & Hancock, 2005). Internal consistency was also evaluated with the composite reliability index, for which values greater than 0.70 are generally considered acceptable (Viladrich et al., 2017).
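As a rough illustration of the criteria above, the following Python sketch computes Cronbach's alpha from raw item scores and labels fit indices according to the cutoffs stated in the text. It is not the code used in the study (the analyses were run in R); the function names and the label "poor" for values outside the stated ranges are assumptions made for the example.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    n_items = len(rows[0])
    if n_items < 2 or len(rows) < 2:
        raise ValueError("need at least two items and two respondents")
    # Sample variance of each item and of the summed score.
    item_vars = [variance([row[j] for row in rows]) for j in range(n_items)]
    total_var = variance([sum(row) for row in rows])
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    return n_items / (n_items - 1) * (1 - sum(item_vars) / total_var)


def classify_error_index(value):
    """Label an RMSEA or SRMR value using the cutoffs quoted in the text."""
    if value < 0.05:
        return "good"
    if value <= 0.08:
        return "acceptable"
    return "poor"  # label assumed for illustration


def classify_incremental_index(value):
    """Label a CFI or TLI value using the cutoffs quoted in the text."""
    if value > 0.95:
        return "good"
    if value > 0.90:
        return "acceptable"
    return "poor"  # label assumed for illustration
```

Under these rules, an RMSEA of 0.071 would be labelled acceptable and a CFI of 0.99 good.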

A sequence of hierarchical, increasingly restrictive invariance models was proposed to evaluate the scale’s invariance according to sex and age. First, configural invariance (reference model) was evaluated, followed by metric invariance (equality of factor loadings), scalar invariance (equality of factor loadings and intercepts), and finally strict invariance (equality of factor loadings, intercepts, and residuals). Two strategies were used to compare the sequence of models. First, a formal statistical test, the chi-square difference (Δχ2), was used, where non-significant values (p > 0.05) suggest invariance between the groups. Second, a modeling strategy was employed, using differences in the CFI (ΔCFI), where values less than 0.010 provide evidence of model invariance between the groups (Chen, 2007). Additionally, differences in the RMSEA (ΔRMSEA) were used, where values less than 0.015 indicate model invariance between the groups (Chen, 2007).
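The decision rules above can be expressed as a small Python helper. This is a sketch under stated assumptions, not the study's procedure: the function name `invariance_checks` is hypothetical, and the sketch assumes that the ΔCFI and ΔRMSEA cutoffs are applied to the absolute size of the change between nested models.

```python
def invariance_checks(delta_chi2_p, delta_cfi, delta_rmsea=None,
                      alpha=0.05, cfi_cutoff=0.010, rmsea_cutoff=0.015):
    """Return which criteria support invariance for one model comparison."""
    checks = {
        # A non-significant chi-square difference suggests invariance.
        "chi2_difference": delta_chi2_p > alpha,
        # Assumption: the cutoff applies to the magnitude of the change.
        "delta_cfi": abs(delta_cfi) < cfi_cutoff,
    }
    if delta_rmsea is not None:
        checks["delta_rmsea"] = abs(delta_rmsea) < rmsea_cutoff
    return checks
```

Applied to values reported in the Results, the metric comparison by age (p = 0.148; ΔCFI = −0.005) satisfies both criteria, whereas the metric comparison by sex (p = 0.022; ΔCFI = −0.016) satisfies neither.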

For the Item Response Theory (IRT) analysis, a graded response model (GRM; Samejima, 1997) was used, specifically an extension of the 2-parameter logistic model (2-PLM) for ordered polytomous items (Hambleton et al., 2010). For each item, two types of parameters were estimated: discrimination (a) and difficulty (b). The discrimination parameter determines the slope at which the responses to the item change as a function of the latent trait level, whereas the difficulty parameters determine how much of the latent trait the item requires to be answered. Since the scale has four response categories, there are three difficulty estimates, one per threshold. The estimates for these three thresholds indicate the level of the latent variable at which an individual has a 50% chance of obtaining a score equal to or greater than a particular response category. The item information curves (IIC) and test information curves (TIC) were also calculated.
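The role of the discrimination and threshold parameters can be illustrated with a short Python sketch of the logistic graded response model. The parameter values used in the example are hypothetical and are not the estimates reported in Table 4; the function names are likewise chosen only for illustration.

```python
import math


def cumulative_probabilities(theta, a, thresholds):
    """P(X >= k | theta) for k = 1..m under the logistic graded response model."""
    return [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]


def category_probabilities(theta, a, thresholds):
    """P(X = k | theta) for k = 0..m; thresholds must be in increasing order."""
    if list(thresholds) != sorted(thresholds):
        raise ValueError("thresholds must be in increasing order")
    # Pad with 1 (everyone scores >= 0) and 0 (no one scores above the top category).
    cumulative = [1.0] + cumulative_probabilities(theta, a, thresholds) + [0.0]
    return [cumulative[k] - cumulative[k + 1] for k in range(len(cumulative) - 1)]


# Hypothetical item: discrimination 1.5 and three thresholds (four categories).
example_a = 1.5
example_b = [-1.0, 0.5, 2.0]
probs_at_second_threshold = cumulative_probabilities(0.5, example_a, example_b)
```

With these hypothetical values, a respondent whose latent trait equals the second threshold (0.5) has a 50% chance of scoring in the third category or higher, which is the interpretation of the thresholds given in the text.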

To examine the validity of the MHI-5 in relation to other variables, a structural equation model was proposed in which psychological well-being and psychological distress are related to anxiety and depression. The WLSMV estimator was used to estimate the model, and the same fit indices used in the confirmatory factor analysis were taken into account.

All statistical analyses were performed using the “lavaan” package (Rosseel, 2012) for the CFA, the “semTools” package (Jorgensen et al., 2018) for factorial invariance, and the “ltm” package (Rizopoulos, 2006) for the GRM. In all cases, the RStudio environment (RStudio Team, 2018) was used for R (R Core Team, 2019).

Results

Descriptive Analysis

Table 1 shows that item 4 (During the last month, how often have you felt happy?) presents the highest average score in the total sample (M = 1.61) and in the groups of men (M = 1.63), women (M = 1.59), adolescents (M = 1.59), and adults (M = 1.62). Item 5 (During the last month, how often have you felt so sad that nothing could cheer you up?) presents the lowest average score in the total sample (M = 0.72) and in the groups of men (M = 0.68), women (M = 0.75), adolescents (M = 0.75), and adults (M = 0.70). Furthermore, the items present adequate skewness and kurtosis indices (within ± 1.5) in the total sample and in all the specific groups.

Table 1 Descriptive analysis of the items in the entire sample and in specific groups

Validity Based on the Internal Structure

Table 2 shows that, in the total sample, a one-dimensional model does not show adequate fit indices (χ2 = 894.34; df = 5; CFI = 0.74; TLI = 0.48; RMSEA = 0.422 [90% CI 0.398–0.445]). Similarly, a unidimensional model with the negative items reversed does not present adequate fit indices (χ2 = 894.34; df = 5; CFI = 0.74; TLI = 0.48; RMSEA = 0.422 [90% CI 0.398–0.445]). In contrast, a model with two related factors presents better fit indices (χ2 = 24.03; df = 4; p < 0.001; CFI = 0.99; TLI = 0.99; RMSEA = 0.071 [90% CI 0.045–0.099]), where the correlation between both factors is acceptable (−0.30).

Table 2 Fit indices of the related two-dimensional model and invariance models according to sex and age

The model of two related factors also shows adequate fit indices in the specific groups: men (CFI = 0.99; TLI = 0.99; RMSEA = 0.051), women (CFI = 0.99; TLI = 0.98; RMSEA = 0.086), adolescents (CFI = 0.99; TLI = 0.98; RMSEA = 0.082), and adults (CFI = 0.99; TLI = 0.99; RMSEA = 0.055). Furthermore, in the total sample and in the specific groups, the factor loading of each item on its latent variable is high and significant (see Table 3).

Table 3 Standardized factorial loads for the items and validity of the scale based on sex, age and for the total sample

Factorial Invariance According to Sex and Age

Table 2 shows that the factorial structure of the MHI-5 did not show evidence of invariance between men and women in the proposed sequence of invariance models: metric (Δχ2 = 9.56, p = 0.022; ΔCFI = −0.016), scalar (Δχ2 = 18.58, p < 0.001; ΔCFI = −0.036), and strict (Δχ2 = 11.16, p = 0.048; ΔCFI = −0.008). For the groups of adolescents and adults, the MHI-5 showed evidence of metric invariance (Δχ2 = 5.34, p = 0.148; ΔCFI = −0.005) and scalar invariance (Δχ2 = 3.92, p = 0.269; ΔCFI = −0.002). However, it did not show evidence of strict invariance (Δχ2 = 15.86, p = 0.007; ΔCFI = −0.024).

Scale Reliability

The lower part of Table 3 shows that, in the total sample, the psychological well-being (ω = 0.75) and psychological distress (ω = 0.79) dimensions present adequate reliability indices. The same holds in the specific groups: men (ω = 0.82 and ω = 0.79), women (ω = 0.71 and ω = 0.78), adolescents (ω = 0.70 and ω = 0.78), and adults (ω = 0.78 and ω = 0.81). An adequate level of composite reliability is also observed in the total sample and the specific groups for the dimensions of psychological well-being (CR ≥ 0.79) and psychological distress (CR ≥ 0.89).

Item Response Theory Model: Graded Response Model (GRM)

Two graded response models (GRM), each an extension of the 2-PLM, were fitted, one for each dimension of the scale. Table 4 shows that all the discrimination parameters of the psychological well-being and psychological distress dimensions are above the value of 1, which is generally considered good discrimination (Hambleton et al., 2010). Regarding the difficulty parameters, in both dimensions, all the threshold estimates increased monotonically, as expected.

Table 4 Discrimination and difficulty parameters for the items of each dimension

Figure 1 shows the information curves for the items and dimensions (IIC and TIC, respectively). Regarding the psychological well-being dimension, the IIC shows that item 4 is the most accurate for evaluating the latent trait. The TIC also shows that the factor is more reliable (accurate) in the scale range between −2 and 1.5. Regarding the psychological distress dimension, the IIC shows that items 3 and 5 are the most accurate for evaluating the latent trait. Furthermore, the TIC shows that the factor is more reliable (accurate) in the scale range between −1.5 and 3.
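To make the information curves concrete, the following Python sketch computes item and scale information for the logistic graded response model using the standard category-based formula (the sum over categories of the squared slope of each category probability divided by that probability). The function names and all parameter values are hypothetical; they are not the estimates behind Fig. 1.

```python
import math


def _cumulative(theta, a, thresholds):
    """Cumulative curves P(X >= k), padded with the boundary values 1 and 0."""
    inner = [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
    return [1.0] + inner + [0.0]


def item_information(theta, a, thresholds):
    """Item information at theta for one graded item."""
    cum = _cumulative(theta, a, thresholds)
    info = 0.0
    for k in range(len(cum) - 1):
        prob = cum[k] - cum[k + 1]
        # Derivative of the category probability with respect to theta.
        slope = a * (cum[k] * (1 - cum[k]) - cum[k + 1] * (1 - cum[k + 1]))
        if prob > 0:
            info += slope ** 2 / prob
    return info


def scale_information(theta, items):
    """Test information: sum of item information over (a, thresholds) pairs."""
    return sum(item_information(theta, a, b) for a, b in items)


def standard_error(theta, items):
    """Standard error of the trait estimate implied by the test information."""
    return 1.0 / math.sqrt(scale_information(theta, items))
```

A higher information value at a given trait level corresponds to a smaller standard error there, which is why the ranges where the TIC is high are described as the ranges where the factor is measured most reliably.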

Fig. 1 Item and test information curves for the scale

Validity Based on the Relationship to Other Constructs

Considering the literature review, we proposed a model to evaluate the relationship between the two dimensions of the construct (psychological well-being and psychological distress) and the level of anxiety and depression. Figure 2 shows that the structural model presents adequate fit indices (RMSEA = 0.064; CFI = 0.97; TLI = 0.97) and that the measurement models are adequately represented by their items.

Fig. 2 Relationship model with other constructs

Discussion

University students are a population vulnerable to mental health problems resulting from the COVID-19 pandemic (Son et al., 2020). In this sense, a quantitative measure of mental health is needed that is useful for the development of prevention programs. Therefore, this study’s objective was to evaluate the psychometric properties of the R-MHI-5 in a relatively large sample of university students, based on traditional methods, such as classical test theory (CTT), and modern methods, such as item response theory (IRT) analysis.

The results confirmed a two-factor model, which assesses psychological well-being and psychological distress. This finding coincides with the study carried out in Spain (Rivera-Riquelme et al., 2019) and with the study of the version with six response options carried out in Peru (Merino-Soto et al., 2019); however, it differs from the one-dimensional structure found in Portugal (Marques et al., 2011), Brazil (Damásio et al., 2014), and Finland (Elovanio et al., 2020). As mentioned, the scores of items 1, 3, and 5 must be inverted to calculate the total score. These results suggest that the combination of direct and inverse items in the same measure leads Spanish-speaking people to understand some mental health indicators differently from people with another mother tongue (Suárez-Alvarez et al., 2018).

Furthermore, this result is consistent with the theoretical model under which the scale was developed (Veit & Ware, 1983). The dual-factor model of mental health explains that well-being and the absence of psychopathological symptoms are not opposites within a single dimension but rather constitute two different factors of mental health that are negatively related (Antaramian et al., 2010). Under this theoretical model, the subjective well-being dimension is the positive indicator of mental health, and the psychological distress dimension is the negative indicator of mental health. Therefore, evaluating these two factors is essential to have a comprehensive understanding of mental health (Wang et al., 2011).

Although all the items have relatively high factor loadings, item 4 (During the last month, how often have you felt happy?) is the one with the highest value and, therefore, is the indicator that best represents psychological well-being.

Furthermore, from the IRT perspective, item 4 is the most accurate indicator to assess psychological well-being. This result is not surprising since happiness is considered one of the most influential factors in psychological well-being due to positive feelings and the absence of negative feelings (Lyubomirsky et al., 2005). On the other hand, items 3 and 5 are the most accurate for evaluating psychological distress. This result is essential since indicators of discouragement and deep sadness are fundamental to measuring the presence or absence of psychological distress (Wang et al., 2011), especially in the context of the pandemic, since they are among the most prevalent problems reported in university students (Generali et al., 2021; Martínez Arriaga et al., 2021). In addition, the IRT results indicate that university students require a higher level of the latent trait (greater well-being or psychological distress) to endorse the higher response categories of the R-MHI-5.

Regarding the test information curves (TIC), in the dimension of psychological well-being, most of the information is in the range of −2 to 1.5, which indicates that the scale is useful and reliable, especially for identifying people with low levels of tranquility, peace, and happiness. In the psychological distress dimension, most of the information is in the range of −1.5 to 3, which indicates that the scale is useful for identifying people with low and high levels of discouragement, sadness, and anxiety.

Finally, the reliability was also adequate and similar to that reported by Rivera-Riquelme et al. (2019) and Merino-Soto et al. (2019).

The latent relationship model showed that psychological well-being was negatively related to anxiety and depression, as suggested in the previous literature (Contreras et al., 2017; Lew et al., 2019; Yüksel & Bahadir-Yilmaz, 2019). In this sense, psychological well-being decreased as symptoms of anxiety and depression increased. On the other hand, psychological distress was positively related to symptoms of anxiety and depression. Previous studies reported similar results (Dyrbye et al., 2006; Sharp & Theiler, 2018). Therefore, anxiety, depression, and psychological distress are considered important predictors of university students’ psychological well-being (Yüksel & Bahadir-Yilmaz, 2019). All these results suggest the importance of universities’ psychological counseling centers working to increase psychological well-being and reduce anxiety, depression, and psychological distress in these students. Finally, the observed relationships give evidence of validity based on the relationship with other variables.

In contrast, no evidence of measurement invariance according to sex was found, which would indicate that men and women understand psychological well-being and psychological distress differently. This result could represent a difficulty in using the R-MHI-5 in studies that aim to compare these mental health indicators between both genders. Non-invariance is not an expected result and is considered a statistical problem that must be solved before carrying out other studies (Vandenberg & Lance, 2000). However, there is evidence that studies with large samples and good psychometric instruments are more likely to detect non-invariance (Meade & Bauer, 2007). On the other hand, the R-MHI-5 has the same factorial structure in adolescents and early adults and shows evidence of partial scalar invariance. This suggests that, although the R-MHI-5 shows differences in some individual items, it can be used in studies that compare well-being and psychological distress between groups of different ages.

The study has limitations. First, convenience sampling was used, and the resulting sample over-represents women and residents of the highlands of Peru, limiting the generalizability of the results. It is important to note that recruiting participants was difficult due to social distancing measures and general movement restrictions. Therefore, further studies with more representative samples of Peruvian university students are recommended. Second, self-report measures were used, which could lead to under- or over-reporting of current symptoms due to social desirability.

Despite the limitations, the findings support using the Spanish version of the R-MHI-5 in clinical and research settings. Furthermore, there are important practical implications. First, studies conducted during the COVID-19 pandemic could include a brief mental health assessment. In these studies, mental health could be considered as an outcome measure or as an explanatory factor. Second, the scale would make it possible to identify psychological well-being and distress levels in the university context and examine their relationships with demographic variables. These results would help health professionals and decision-makers in the university context to identify the students most likely to have mental health problems during the COVID-19 pandemic, or during others that could appear in the future, and to promote the development of psychoeducational interventions aimed at groups at potential risk.

Conclusion

In conclusion, the results show that the R-MHI-5 is an instrument with good psychometric evidence, based on classical and modern techniques. It was also shown that the R-MHI-5 is not invariant between men and women, but it can be used to meaningfully compare scores between groups of different ages without compromising the inventory’s psychometric properties. Without checking for invariance, it cannot be assumed that the results of comparisons between different groups are valid (Chen, 2008). Therefore, the results are expected to motivate other researchers to assess measurement invariance before comparing well-being and psychological distress, measured by the R-MHI-5, between different age and sex groups. Finally, the study aims to fill a gap in the measurement, identification, and investigation of well-being and psychological distress in the context of the COVID-19 pandemic.