The Satisfaction with Life Scale (SWLS; Diener et al., 1985) is one of the most extensively used instruments for measuring the cognitive dimension of subjective well-being (Pavot & Diener, 2008). The SWLS includes five items: The first three items capture present life satisfaction and positive evaluations of one’s life (“In most ways my life is close to my ideal”, “The conditions of my life are excellent”, and “I am satisfied with my life”), whereas items 4 (“So far I have gotten the important things I want in life”) and 5 (“If I could live my life over, I would change almost nothing”) tap into past life satisfaction (Kjell & Diener, 2021). Since its introduction in 1985, the SWLS has been translated into more than 30 languages and used in thousands of studies across different cultures and populations (Chinni & Hubley, 2014; Pavot, 2014). The scale was originally developed and validated on undergraduate student and elderly samples (Diener et al., 1985), and subsequently used primarily in adult samples (e.g., Pavot & Diener, 2008). However, a growing interest in adolescent well-being in recent years has been accompanied by an increased use of the SWLS among adolescent samples from different cultural settings. For example, psychometric properties of the SWLS have been extensively investigated in adolescent samples from various countries (for a review, see Proctor et al., 2009), including France (Bacro et al., 2020), India (Areepattamannil & Bano, 2020), Italy (Di Fabio & Gori, 2016), Norway (Moksnes et al., 2014), Peru (Arias-Gallegos et al., 2018), Portugal (Neto, 1993; Silva et al., 2015), Serbia (Jovanović, 2016), South Korea (Lim, 2012), and Spain (Bendayan et al., 2013; Ortuño-Sierra et al., 2019).
Furthermore, the SWLS has been widely used in validation studies of well-being measures in adolescent samples across different cultures, such as Argentina (Góngora & Castro Solano, 2015), China (Tian et al., 2018), New Zealand (Sotardi & Watson, 2019), and in numerous studies on adolescent life satisfaction across various countries (e.g., Al-Attiyah & Nasser, 2016; Arslan, 2019; Frison & Eggermont, 2016; Geraee et al., 2019; Ma et al., 2019; Marcionetti & Rossier, 2016; Zhu & Shek, 2021). The SWLS has been also used in cross-cultural studies on adolescent well-being, but the majority of the existing work compared levels and predictors of life satisfaction across cultures without testing cross-cultural measurement invariance (e.g., Garcia et al., 2017; Kjell et al., 2013).

Testing measurement invariance means evaluating whether the measurement of a latent construct is equivalent across groups (Xu & Tracey, 2017). The literature discusses various levels of measurement invariance (Vandenberg & Lance, 2000). Configural invariance assumes general similarity of associations between indicators and the measured constructs. When confirmed, it implies similarity of the content of the measured constructs. Metric invariance assumes equal measurement units of the latent constructs across groups, and, when confirmed, it implies comparability of unstandardized regression coefficients involving the latent constructs. Scalar invariance assumes both equal measurement units and the same scale origin of latent constructs. This level of invariance allows reliable comparison of both regression coefficients and means of the latent constructs across groups. Higher, but rarely used, levels of invariance include invariance of residuals, which assumes equality of the unique variances of indicators across groups, as well as invariance of latent construct variances, covariances between constructs, and latent means. However, when dealing with a large number of different groups, it is highly unrealistic to expect any invariance level above the scalar, hence the tests are usually conducted for three levels only: configural, metric, and scalar (Wells, 2021).

Measurement invariance is typically tested using a multi-group confirmatory factor analysis (MGCFA), with tests conducted in an increasingly restrictive manner (Greiff & Scherer, 2018). Configural invariance is supported when factor loadings have the same structure and signs across multiple groups. Metric invariance can be confirmed when the factor loadings are equivalent across groups, whereas scalar invariance can be confirmed when both factor loadings and item intercepts are similar across groups. If a given level of invariance does not hold for all parameters (e.g., only for some factor loadings), partial invariance can be tested (Byrne et al., 1989).
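
The three levels can be stated compactly in terms of a single-factor measurement model. The following is a sketch in conventional notation that is assumed here and does not appear in the original text: item index i, group index g, intercept \tau, loading \lambda, latent factor \eta with group mean \kappa_g, and residual \varepsilon.

```latex
% Single-factor measurement model for item i in group g
x_{ig} = \tau_{ig} + \lambda_{ig}\,\eta_{g} + \varepsilon_{ig}

% Configural: the same one-factor pattern holds in every group;
%             \lambda_{ig} and \tau_{ig} are free across groups.
% Metric:     \lambda_{ig} = \lambda_{i}  \quad \text{for all } g
% Scalar:     \lambda_{ig} = \lambda_{i}  \ \text{and}\  \tau_{ig} = \tau_{i}  \quad \text{for all } g

% Expected item score, which shows why intercepts matter for mean comparisons
E(x_{ig}) = \tau_{ig} + \lambda_{ig}\,\kappa_{g}
```

Under this notation, differences in observed item means reflect differences in the latent means \kappa_g only when both \tau and \lambda are equal across groups, which is why latent mean comparisons require scalar invariance.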

Evaluating measurement invariance is a necessary requirement for valid comparisons across cultural groups as it tests whether items function similarly across groups (Boer et al., 2018; Han et al., 2019). Providing evidence to support measurement invariance is vital in cross-cultural research on life satisfaction because items aimed at measuring this component of well-being might have different meaning across cultures (e.g., Oishi, 2006; Vittersø et al., 2002). However, cultural influences on understanding of life satisfaction have hardly been addressed in linguistic and qualitative studies, which is in stark contrast to numerous studies on lay conceptions (e.g., Joshanloo, 2019), cultural construal (e.g., Uchida & Ogihara, 2012), and the semantics of happiness and good life (e.g., Goddard & Ye, 2014; Wierzbicka, 2009). These studies have unambiguously shown that the concept of subjective well-being is embedded in a sociocultural context, and that the understanding and meaning of happiness, satisfaction, and related terms capturing the idea of a good life can vary greatly across cultures. For example, in many European and North American countries, well-being, happiness, and good life are understood predominantly as individual, intrapsychic, hedonic states, whereas in most East Asian countries they are construed as relational, interpersonal, dialectical states with both positive and negative features (e.g., Jovanović, 2021). Similarly, life satisfaction judgments may be based on internal processes (such as emotions and beliefs) in some cultures, whereas their source may be more external (e.g., objective living conditions) and social (such as norms) in others (Suh et al., 1998). The SWLS includes items that capture global cognitive judgments, which may be grounded on different sets of information across cultures and may reflect culture-specific ideas of a good life.
Therefore, it is theoretically reasonable to expect cultural differences in evaluations of the quality of life, resulting in different functioning of SWLS items across cultures, i.e., measurement noninvariance of this scale. Measurement noninvariance may jeopardize the validity of cross-cultural findings and may lead to incorrect and biased conclusions in multiple group comparisons (Kim et al., 2017). Given the increased interest in cross-cultural research on adolescent well-being and the lack of cross-culturally validated measures to assess adolescent life satisfaction, the main goal of the present study was to investigate measurement invariance of the SWLS in adolescent samples from 24 countries and regions. We aimed to evaluate whether this scale can be used as a valid tool in studies focusing on cross-cultural comparison of life satisfaction among adolescents. In addition, measurement invariance of the SWLS across gender and age was tested to examine whether the SWLS items function similarly across gender and age groups.

Previous Measurement Invariance Studies on the SWLS

Cross-cultural measurement invariance of the SWLS has been tested in a few studies. Most of them used samples from two (Schnettler et al., 2017) or three (Bieda et al., 2017; Whisman & Judd, 2016) countries, but some studies were based on five (Jovanović & Brdar, 2018) and even 26 countries (Jang et al., 2017). For example, using samples of Chinese, German, and Russian undergraduate students, Bieda et al. (2017) found support for partial metric (the loading of item 2 was noninvariant) and partial scalar invariance (intercepts of items 1 and 3 were noninvariant). Evidence of partial scalar invariance was also found in samples of adults aged 50–79 from the United States, England, and Japan (Whisman & Judd, 2016), but in this study the intercept for item 4 varied across countries. The largest study on cross-cultural measurement invariance of the SWLS to date was conducted on a sample of managers in 26 countries spanning five continents, using three different measurement invariance procedures (Jang et al., 2017). The findings of this study supported configural and metric invariance of the SWLS. However, full scalar invariance did not hold, as intercepts of three items (2, 4, and 5) were found to be noninvariant. Emerson et al. (2017) reviewed three decades of research on the measurement invariance of the SWLS and identified a total of 27 articles with 40 unique invariance analyses (using only MGCFA) across 23 nations. Of the 11 studies which tested invariance across cultural groups, only one supported full scalar invariance across American and Russian university students (Tucker et al., 2006), and the majority provided support only for configural or metric invariance. Most importantly, Emerson et al. (2017) showed that different studies pointed to noninvariance of measurement parameters for each of the five SWLS items across cultures, suggesting that none of the SWLS items appears to be culturally invariant.

Cross-cultural measurement invariance of the original 5-item SWLS with a 7-point response scale has rarely been examined using adolescent samples. For the period 1985–2016, Emerson et al. (2017) identified only one cross-cultural invariance study among adolescents. This study, conducted by Atienza González et al. (2016), found evidence for full metric and partial scalar invariance of the SWLS across Portuguese and Spanish adolescents, with the intercept for item 5 being noninvariant. However, the authors departed from the original scale by reducing it to five response options. More recently, Esnaola et al. (2017) used the original 7-point response scale in their study among Mexican and Spanish adolescents, and provided support for strict (which includes scalar) invariance of the SWLS across the two countries and across genders, whereas scalar invariance was supported across age, i.e., three adolescent groups. Gender invariance of the SWLS among adolescents has rarely been examined, and the available studies have yielded inconsistent findings. Contrary to Esnaola et al. (2017), who provided support for strict invariance across gender, Moksnes et al. (2014) found support only for metric invariance across gender in a Norwegian adolescent sample, whereas scalar invariance across gender was supported in studies among Spanish (Ortuño-Sierra et al., 2019), French (Bacro et al., 2020), and Serbian adolescents (Jovanović, 2016).

The Present Study

The cross-cultural invariance of the SWLS among adolescents is still unknown despite extensive use of the SWLS across different cultural contexts. The present study evaluated measurement invariance of the SWLS across culture, age, and gender using adolescent samples from 24 countries and regions (Argentina, Bulgaria, China, Finland, Hong Kong, Hungary, India, Indonesia, Italy, Japan, Lithuania, Malaysia, Poland, Portugal, Romania, Russia, Serbia, South Africa, South Korea, Spain, Switzerland, Taiwan, Turkey, and United Kingdom) across four continents (Asia, Africa, Europe, and South America). The inclusion of adolescents from 24 countries enables a solid test of cross-national invariance of the SWLS, as countries included vary greatly in terms of their economic development and prosperity (Fritz & Koch, 2016), cultural values (Hofstede et al., 2010; Schwartz, 2008), and the mean country-levels of life satisfaction (Helliwell et al., 2020).

Although cross-cultural invariance of the SWLS has rarely been examined in adolescents and using a large set of countries, we had three main expectations derived from previous studies. First, we expected that the SWLS would show evidence of configural invariance across cultures, whereas metric and scalar invariance would not be achieved. This expectation was based on the evidence from a review of measurement invariance studies of the SWLS (Emerson et al., 2017), which showed that cross-cultural studies of the SWLS rarely found invariance beyond the configural level, and that no clear pattern of noninvariance with regard to specific items could be detected across different studies. Furthermore, when many countries are included in the analyses, even partial metric invariance is rarely obtained (Davidov et al., 2018). Second, as most previous studies supported invariance of the SWLS across gender (Emerson et al., 2017), we expected that scalar invariance across gender would be achieved in the present study. Third, as our study included a restricted age range, i.e., adolescents 13 to 19 years old, we hypothesized that the SWLS would also show evidence of scalar invariance across age.

Method

Sample and Procedure

The sample included 22,710 adolescents (mean age = 15.89 years, SD = 1.52, age range = 13–19 years; 53% female) from 24 countries and regions (Argentina, Bulgaria, China, Finland, Hong Kong, Hungary, India, Indonesia, Italy, Japan, Lithuania, Malaysia, Poland, Portugal, Romania, Russia, Serbia, South Africa, South Korea, Spain, Switzerland, Taiwan, Turkey, and United Kingdom). Sample sizes ranged from 392 in Poland to 3483 in Hong Kong. Demographic characteristics for each country and region, along with language used, mode of administration, and year when data were collected are reported in Table 1. This study relied on secondary data in 22 countries collected between 2008 and 2019, whereas in India and Poland, data were collected in 2020 for the purpose of the present study. The vast majority of countries used convenience sampling for recruitment of participants. Some of the data were already published in studies not related to the present research (see Supplementary Materials for the list of studies that previously used the SWLS data in particular countries).

Table 1 Demographic Characteristics of the Sample by Country and Region

The first author of the study inspected multiple databases and conducted a thorough literature search for relevant studies that used the SWLS in adolescent samples across various countries and regions. More than 50 studies that used the SWLS data were identified, and corresponding authors were contacted to check whether they were interested in collaborating on the cross-national measurement invariance study of the SWLS among adolescents. Several requirements needed to be met in order to be included in the study: (1) an official or validated version of the SWLS had been used to collect the data; (2) an original 5-item SWLS with a 7-point response scale had been used; (3) informed consent had been obtained from participants and data were collected in accordance with protocols from institutional or other relevant ethics committee; and (4) the sample comprised high-school (i.e., secondary school) students, aged 13 to 19 years. Researchers from a total of 22 countries agreed to participate in the study and to share their data, whereas original data were collected in India and Poland.

Instrument

The original 5-item SWLS (Diener et al., 1985) with a 7-point response scale (1 = strongly disagree, 7 = strongly agree) was used in each country. Translations of the SWLS were obtained using a back-translation procedure or a committee approach (van de Vijver, 2019) in each country, except the UK and South Africa in which an original English version of the SWLS was administered (please see Table 1 for details, and Table A1 in the Supplementary Materials for SWLS items in each language).

Data Analyses

Cases with missing values on the age variable and on all SWLS items were removed from the analyses. After removing these cases (3.5%), the final sample included a total of 21,915 adolescents. The remaining missing values (0.31%) were treated with the full information maximum likelihood (FIML) approach.

Following the existing literature, we tested both the original single-factor model of the SWLS with five indicators and no residual covariances, and the modified single-factor model allowing for residual covariance between two items tapping past life satisfaction (items 4 and 5). Previous CFA studies on the SWLS among adolescents produced mixed results. Although the original single-factor model of the SWLS was supported in some adolescent samples (e.g., Gouveia et al., 2009; Jovanović, 2016), a poor fit of this model has also been frequently obtained, such as in studies on Chinese (Wang et al., 2017), Italian (Di Fabio & Gori, 2016), and Portuguese (Silva et al., 2015) adolescents. A modified single-factor model allowing for correlated residuals of the items tapping into evaluating one’s past (items 4 and 5, see Fig. 1) provided a superior fit to the original single-factor model in samples of Norwegian (Moksnes et al., 2014) and Spanish adolescents (Ortuño-Sierra et al., 2019). Allowing the residual variances of this pair of items to covary also improved model fit in numerous SWLS studies in adult samples (e.g., Bai et al., 2011; Clench-Aas et al., 2011; Jovanović, 2019). Therefore, the covariance of items’ 4 and 5 residuals was substantively reasoned both empirically and theoretically because both items reflect past experiences. Measurement invariance of the SWLS models was tested across countries, gender, and age groups.

Fig. 1 Measurement Model of Life Satisfaction

First, we fitted the confirmatory factor analysis (CFA) model on the pooled sample, then within each country. The models were evaluated using the following criteria for acceptable model fit: Comparative Fit Index (CFI) and Tucker-Lewis Index (TLI) > 0.90; Root Mean Square Error of Approximation (RMSEA) < 0.08; Standardized Root Mean Squared Residual (SRMR) < 0.08 (Asparouhov & Muthen, 2018; Browne & Cudeck, 1992; Hu & Bentler, 1999). Chi-square’s p-values were ignored since they tend to be low (pointing to rejection of a model) in large samples (Brown, 2015). The models were fitted using robust maximum likelihood estimation, and the corresponding (scaled) versions of fit indices were used throughout the analysis.
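
The cut-offs listed above can be expressed as a simple screening rule. The following Python sketch is illustrative only and is not the authors' code (their analyses were run in R and Mplus); the function name `acceptable_fit` and the optional treatment of TLI are assumptions made for this example.

```python
from typing import Optional

# Cut-offs as stated in the text above.
CFI_TLI_MIN = 0.90
RMSEA_MAX = 0.08
SRMR_MAX = 0.08


def acceptable_fit(cfi: float, rmsea: float, srmr: float,
                   tli: Optional[float] = None) -> bool:
    """Return True when every supplied fit index meets its cut-off."""
    checks = [cfi > CFI_TLI_MIN, rmsea < RMSEA_MAX, srmr < SRMR_MAX]
    if tli is not None:  # TLI is optional here because it is not always reported
        checks.append(tli > CFI_TLI_MIN)
    return all(checks)


# Pooled original single-factor model, using the indices reported in the Results.
pooled_original = acceptable_fit(cfi=0.982, rmsea=0.065, srmr=0.023)

# Hypothetical model whose RMSEA falls outside the acceptable range.
poor_rmsea = acceptable_fit(cfi=0.95, rmsea=0.11, srmr=0.03)

print(pooled_original, poor_rmsea)
```

A rule of this kind only screens the indices; it does not replace inspection of modification indices or parameter estimates.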

Second, we ran an MGCFA, increasingly constraining the parameters to be equal across groups. The configural model constrained only a marker indicator’s factor loading to 1. The marker item was “I am satisfied with my life” (item 3) because in a preliminary analysis it showed the largest factor loading and was the most invariant across countries. In the metric model, factor loadings were constrained to be equal across groups, whereas the scalar model additionally constrained item intercepts. The inference regarding invariance relied on the conventional criteria (Chen, 2007; Cheung & Rensvold, 2002) set for the differences in the model fit indices: a change in CFI and TLI < 0.01 and a change in RMSEA < 0.015 were considered small, and therefore a more constrained model could be accepted. Additionally, we considered a change in SRMR of 0.03 when comparing configural and metric models, and a change of 0.01 when comparing metric and scalar models (Chen, 2007).
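
The decision rule for accepting a more constrained model can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the authors' implementation: the function name `accept_constrained` is hypothetical, changes are compared by absolute magnitude, and the SRMR values are treated as upper bounds on the change.

```python
from typing import Optional

# Thresholds as stated in the text (Chen, 2007; Cheung & Rensvold, 2002).
MAX_D_CFI_TLI = 0.01
MAX_D_RMSEA = 0.015
# Assumption: the SRMR values are upper bounds, one per comparison step.
MAX_D_SRMR = {"metric": 0.03, "scalar": 0.01}


def accept_constrained(d_cfi: float, d_rmsea: float,
                       d_tli: Optional[float] = None,
                       d_srmr: Optional[float] = None,
                       step: str = "metric") -> bool:
    """Return True when all supplied fit changes are below their thresholds.

    Changes are compared by absolute magnitude (an assumption of this sketch).
    `step` names the more constrained model: "metric" or "scalar".
    """
    checks = [abs(d_cfi) < MAX_D_CFI_TLI, abs(d_rmsea) < MAX_D_RMSEA]
    if d_tli is not None:
        checks.append(abs(d_tli) < MAX_D_CFI_TLI)
    if d_srmr is not None:
        checks.append(abs(d_srmr) < MAX_D_SRMR[step])
    return all(checks)


# Changes reported in the Results for the subset of 19 countries and regions.
full_metric = accept_constrained(d_cfi=-0.020, d_rmsea=0.019)
partial_metric = accept_constrained(d_cfi=-0.008, d_rmsea=0.008)

print(full_metric, partial_metric)
```

With the reported changes, the rule rejects full metric invariance and accepts partial metric invariance, matching the conclusions drawn in the Results.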

Measurement invariance across gender groups followed the same logic and criteria. Regarding age, we first followed the same logic with the rounded age groups, and then applied a moderated factor analysis (Bauer, 2017) in order to test for continuous invariance across all the age values available. We tested whether each measurement parameter was moderated by age by regressing indicators in the factor model on age to test moderation of intercepts, and by introducing new interaction terms to test the moderation of factor loadings. In addition, the standard errors were corrected for cluster effects of countries. We used a likelihood ratio test (LRT) to decide whether adding the moderation by age improved the model.
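
The moderated factor model described above can be written out as follows. This is a sketch in assumed notation that does not appear in the original text: a_p is the age of person p, and the subscripts 0 and 1 denote the baseline value and the age effect of each parameter.

```latex
% Item i for person p, with intercept and loading as linear functions of age
x_{ip} = (\tau_{i0} + \tau_{i1}\,a_{p}) + (\lambda_{i0} + \lambda_{i1}\,a_{p})\,\eta_{p} + \varepsilon_{ip}

% Expanding shows the two added terms: a direct effect of age on the item
% (intercept moderation) and an age-by-factor interaction (loading moderation)
x_{ip} = \tau_{i0} + \tau_{i1}\,a_{p} + \lambda_{i0}\,\eta_{p} + \lambda_{i1}\,(a_{p}\,\eta_{p}) + \varepsilon_{ip}

% Item i is invariant across age when
\tau_{i1} = 0 \quad \text{and} \quad \lambda_{i1} = 0
```

In this form, each LRT compares a model in which one of \tau_{i1} or \lambda_{i1} is freely estimated against a model in which it is fixed to zero.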

The analyses were run with R, package “lavaan” (Rosseel, 2012) and Mplus 8.4 software (Muthén & Muthén, 1998–2017). The code for reproducing our results is available at the Open Science Framework, at https://osf.io/xkvjy. Inter-item correlations, means and standard deviations provided in the Supplementary Materials (Tables A2 and A3) can be used to replicate the results.

Results

Measurement Invariance across Countries and Regions

The pooled CFA of the original single-factor model demonstrated a good fit to the data (χ2(5) = 474.3, CFI = 0.982, RMSEA = 0.065, SRMR = 0.023). The initial model fitted well in 14 countries. Most modification indices indicated that adding a covariance between the residuals of items 4 and 5 would improve the model. Indeed, comparing the modified model with the initial congeneric model (without the covariance) showed improved fit to the data in the majority of countries: the modified model fitted better in 16 countries according to the chi-square difference test, in 21 countries according to CFI, and in 18 countries according to RMSEA (see Table A4 in the Supplementary Materials). Thus, we opted for the modified model in the rest of the analyses.

The pooled CFA of the modified model demonstrated a good fit to the data (χ2(4) = 141.6, CFI = 0.995, RMSEA = 0.040, SRMR = 0.010). The country-wise CFAs showed that the model fitted well in 19 countries and regions, whereas in Hungary, Lithuania, Poland, Serbia, and the United Kingdom, the RMSEA was out of the acceptable range (Supplementary Materials, Table A5). Next, we proceeded with a subset of 19 countries and regions in which the initial measurement model fitted well. The fit indices of a multiple group CFA are listed in Table 2.

Table 2 Measurement Invariance Tests of SWLS across Countries and Regions

The configural model provided an acceptable fit to the data of the subset of 19 countries and regions. The model confirmed that the overall structure of the SWLS is similar across 19 countries and regions. However, when the factor loadings were set to be equal across countries (metric model), the fit statistics strongly decreased, exceeding the acceptable cut-off values (∆CFI = -0.020, ∆RMSEA = 0.019). Therefore, full metric invariance was rejected.

At this point, we continued our analyses in a more explorative way, following three different strategies. First, using a score test (Bentler & Chou, 1992), we identified and released the most noninvariant factor loadings. The score tests represented an improved fit (chi-square) of the metric model if factor loadings constraints were relaxed (see Table A6 in the Supplementary Materials). Among the factor loadings, item 2 showed the highest score test, and at the next step, when the equality of this loading was released, the loading of item 1 showed the highest impact on the misfit of the model. Then, we estimated a model of partial metric invariance (Byrne et al., 1989) relaxing constraints of loadings of items 1 and 2. The fit statistics substantially improved, as shown in Table 2.

In the subset of 19 countries and regions, compared to the configural model, the change in CFI was acceptable (0.008) as was the change in RMSEA (0.008). Therefore, partial metric invariance was supported. Scalar invariance was rejected due to low absolute fit and large changes in CFI and RMSEA values.

Second, we aimed to reduce the number of countries in order to find higher levels of invariance. We identified nine countries and regions that showed the most similar measurement parameters and ran invariance tests on them. Table 2 shows that the change in CFI and RMSEA between the configural and metric invariance models was within the recommended range; however, the change in fit between the metric and scalar invariance models was unacceptable. Therefore, full metric invariance in these nine countries and regions was supported, but not scalar invariance. Next, we applied the score test again, this time to the scalar invariance models (Table A7 in the Supplementary Materials). These models showed that releasing the constraint on the item 1 intercept could substantially improve model fit. After releasing this constraint, we ran the score test again and found that releasing the item 4 intercept could also substantially improve the model. The resulting partial scalar invariance model, including freely estimated intercepts of items 1 and 4, showed a good fit to the data (Table 2). The change in fit indices between this partial scalar model and the metric invariance model was small and within recommended cut-off values, supporting partial scalar invariance across the nine countries and regions.

Our third approach intended to maximize the number of countries and regions in the test and level of invariance, so we updated the initial model with country-specific model adjustments following modification indices. These added item residual covariances in five countries (see footnote of Table 2). The resulting model was tested for invariance across countries and regions. The configural invariance of this model was fully supported by the data. The full metric invariance, though, was rejected because both CFI and RMSEA changed substantially compared to the configural model (0.024 and 0.026 respectively). Relaxing equality of factor loadings for items 1 and 2 improved the model fit and reduced the fit change compared to the configural model (CFI decreased 0.007 and RMSEA increased 0.009). This supported a partial metric invariance of the model in 24 countries and regions. Scalar invariance was rejected due to a very large drop in fit statistics.

To summarize, the results of the measurement invariance across countries and regions are complex. We found conclusive support for partial metric invariance across 19 countries and regions, partial scalar invariance across nine countries and regions, as well as partial metric invariance of a model with country-specific ad hoc adjustments across 24 countries and regions.

Measurement Invariance across Gender

Table 3 lists measurement invariance tests across gender on the pooled sample. All models fitted the data well, and the fit change between configural, metric, and scalar invariance models was small enough to conclude the highest level of invariance. Therefore, the global tests supported full scalar invariance of the SWLS across gender.

Table 3 Measurement Invariance Tests of SWLS across Gender

Since the model did not fit well in some countries, we extended the analysis by testing invariance across gender within each country. The results evaluated with the criteria listed in the Data Analysis section showed the following (see Table A8 in the Supplementary Materials): a total of ten countries and regions (Argentina, Bulgaria, Finland, Hong Kong, Japan, Malaysia, Russia, South Africa, Taiwan, and Turkey) demonstrated a full scalar invariance across gender. China, Italy, India, Romania, Serbia, South Korea, Spain, and the United Kingdom showed metric but not scalar invariance; Indonesia, Portugal, and Switzerland showed only configural invariance, but not metric; and three countries (Hungary, Lithuania, and Poland), showed an unacceptable fit of the configural model, suggesting it needed further adjustments.

Measurement Invariance across Age

Measurement invariance across age was tested in two ways. First, we treated age as a discrete grouping which allowed following the conventional strategy of measurement invariance testing across seven age groups. The results listed in Table 4 fully supported both metric and scalar invariance.

Table 4 Measurement Invariance Tests of SWLS across Age Groups Defined by Discrete Years

Next, we considered age to be a continuous variable and used it as a moderator for each of the intercepts and loadings. We chose to test each parameter’s moderation separately. At first, we estimated a baseline model in which no parameters were moderated by age. This baseline model represented full invariance. Then we estimated models for each intercept including moderation by age and compared these models with the baseline model using LRT (see Table 5). For example, the p-value of the LRT comparing the baseline model and a model with the intercept of item 1 moderated by age was 0.863; therefore, these models had similar fit to the data and the simpler (i.e., baseline) model should be selected. Thus, the model confirmed that the intercept of item 1 was not moderated by age. This was also confirmed by the nonsignificant interaction terms (listed in Table A9 in the Supplementary Materials). By the same token, we tested each intercept, and only the intercept of item 2 had a relatively low p-value on the LRT. The estimates of the interaction effect showed that age was positively associated with the intercept of item 2, which concerns conditions of life. This implies that, given the same level of life satisfaction, older adolescents evaluated the conditions of their lives higher than did the younger ones. Although the LRT was significant (p < 0.05), given the high number of comparisons and the large sample size, this moderation might be weak. This was confirmed by the marginal significance of the respective moderation effects.
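
The LRT comparison described above can be sketched in Python. This is an illustrative sketch under stated assumptions, not the authors' procedure: it assumes that each moderated model adds exactly one parameter (1 degree of freedom) and uses an unscaled chi-square difference, whereas models estimated with robust maximum likelihood would normally require a scaled difference test. The function names and the log-likelihood values are hypothetical.

```python
import math


def chi2_sf_1df(stat: float) -> float:
    """Upper-tail probability of a chi-square variable with 1 df.

    For 1 df, P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(max(stat, 0.0) / 2.0))


def lrt_1df(loglik_baseline: float, loglik_moderated: float) -> float:
    """p-value of an unscaled LRT for one added parameter (assumed 1 df)."""
    stat = 2.0 * (loglik_moderated - loglik_baseline)
    return chi2_sf_1df(stat)


# Hypothetical log-likelihoods; they are NOT taken from the study.
p_small_gain = lrt_1df(loglik_baseline=-1000.0, loglik_moderated=-999.9)
p_large_gain = lrt_1df(loglik_baseline=-1000.0, loglik_moderated=-995.0)

# A large p-value favours the simpler baseline model (parameter invariant);
# a small p-value suggests the parameter is moderated by age.
print(round(p_small_gain, 3), round(p_large_gain, 4))
```

A small improvement in log-likelihood yields a large p-value, in which case the baseline model is retained, as in the item 1 example above.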

Table 5 Moderated CFA Model Fit

Next, we tested whether the factor loadings were moderated by age. The model with the moderated intercept of item 2 was used as a baseline, and each of the factor loadings was tested separately and compared to the baseline model using LRT. The LRT of the moderated loading of item 2 was highly significant, indicating that it was moderated by age. The moderation effect of age on the factor loading estimated in this model was negative (b = -0.026, p = 0.047). This finding implied that item 2, tapping conditions of life, had a stronger association with overall life satisfaction among younger adolescents. With this exception, the factor loadings were invariant across age. The information criteria BIC (Bayesian Information Criterion) and AIC (Akaike Information Criterion) supported these conclusions, showing that across all the models, the best fitting one included the intercept and factor loading of item 2 moderated by age. All other intercepts and loadings were invariant. Interestingly, life satisfaction itself was independent of age in all estimated models.

Discussion

The present study aimed to investigate measurement invariance of the SWLS across culture, age, and gender in adolescent samples from 24 countries and regions. The results provided clear support for the configural invariance of the SWLS across 19 countries and regions, suggesting that the latent factor structure of the SWLS is similar across groups. In contrast, testing for metric and scalar invariance yielded inconclusive results. Neither metric nor scalar invariance of the single-factor model with correlated residuals between items 4 and 5 tapping past life satisfaction (i.e., a modified single-factor model) was supported across all countries and regions using the traditional cut-off criteria for comparing models (Chen, 2007; Cheung & Rensvold, 2002). It is important to note that by using more lenient criteria for comparing configural and metric models in cross-cultural studies with many groups (e.g., Rutkowski & Svetina, 2014), our results could be interpreted as showing evidence of full metric invariance.

However, using more rigorous criteria, we found evidence of partial metric invariance for the modified single-factor model across a subset of 19 countries and regions (Argentina, Bulgaria, China, Finland, Hong Kong, India, Indonesia, Italy, Japan, Malaysia, Portugal, Romania, Russia, South Africa, South Korea, Spain, Switzerland, Taiwan, and Turkey), as well as partial scalar invariance across nine countries and regions (Bulgaria, China, Finland, Hong Kong, Italy, Malaysia, Romania, South Africa, and Switzerland). Several ad hoc adjustments in five countries (i.e., allowing different pairs of item residuals to covary) supported partial metric invariance across all 24 countries and regions included in the study. These results imply that the scale can be used to compare correlations and unstandardized regression coefficients across 19 countries and regions (e.g., to explain cross-country differences in predictors and correlates of life satisfaction), and with some caution, across 24 countries and regions as well. Moreover, across the nine countries and regions, the establishment of partial scalar invariance allows for a meaningful comparison of mean levels of life satisfaction.

The inspection of parameters across 24 countries and regions indicated that the most noninvariant parameters were the loadings of item 1 (“In most ways my life is close to my ideal”) and item 2 (“The conditions of my life are excellent”). In the subset of nine countries and regions, the intercepts of item 1 and item 4 (“So far I have gotten the important things I want in life”) were found to vary the most across cultures. Previous cross-cultural studies of the SWLS also consistently detected a number of noninvariant items. For example, in a sample of adult managers from 26 countries, Jang et al. (2017) found that the intercepts of items 2, 4, and 5 were noninvariant, whereas in a study conducted on undergraduate samples, the loading of item 2 as well as the intercepts of items 1 and 3 were noninvariant across China, Germany, and Russia (Bieda et al., 2017). The intercept of item 4 was also found to be noninvariant across the United States, England, and Japan among adults aged 50–79 years (Whisman & Judd, 2016). Although the present study did not investigate the possible sources of noninvariance of the three SWLS items (items 1, 2, and 4) found to operate differently among adolescents from different countries and regions, it can be hypothesized that the meaning of “ideal life” (item 1), “life conditions” (item 2), and “important things” (item 4) is culturally embedded. Previous studies have clearly demonstrated cultural differences in the construal of a good life, i.e., that people across cultures define a good life and well-being differently (Uchida et al., 2015). For example, in a European-American cultural context, the definition of well-being relies heavily on high-arousal positive emotions and individual achievement, whereas in an East Asian cultural context, well-being is defined in terms of both positive and negative emotions and close interpersonal relationships (Uchida & Ogihara, 2012). 
The understanding of well-being in many African countries is rooted in relational and material, rather than psychological, aspects of life (e.g., White & Jha, 2018), whereas health has a key role in life satisfaction judgments in many Indigenous societies with limited access to health care services (Reyes-García et al., 2021). Research on lay conceptualizations of well-being and the good life across different countries and continents clearly shows that these concepts are complex and encompass a wide variety of ideas and beliefs, which might lead to different understandings of life satisfaction across cultures and, consequently, to different functioning of SWLS items among samples from different cultures. The role of the cultural construal of a good life in the interpretation of life satisfaction items among adolescents from different cultural contexts is a promising avenue for future research, as there is a lack of empirical studies addressing this issue. In sum, the results of invariance studies suggest that there are culture-specific variations in the interpretation of the SWLS item content, which should be carefully examined in future studies. It is also important to note that the comparison of our findings with those from previous SWLS invariance studies should be done with caution, because we used adolescent samples, whereas past work has relied mostly on adult samples and recruited a smaller number of participants and/or countries.

Measurement invariance testing across gender on the pooled sample provided evidence of full scalar invariance, indicating that SWLS items operated similarly among girls and boys in the full sample. This finding is in line with Emerson et al.’s (2017) conclusion that “significant gender-based systematic biases on the SWLS likely do not exist” (p. 2260). However, the investigation of gender invariance within each group produced complex findings, suggesting that invariance of the SWLS across gender depends on the culture. More specifically, full scalar invariance was supported in ten (Argentina, Bulgaria, Finland, Hong Kong, Japan, Malaysia, Russia, South Africa, Taiwan, and Turkey) out of 24 countries and regions, full metric invariance in eight countries (China, Italy, India, Romania, Serbia, South Korea, Spain, and the United Kingdom), and only configural invariance in three countries (Indonesia, Portugal, and Switzerland). This suggests that gender invariance of the SWLS among adolescents should not be taken for granted, and that testing for gender invariance is a necessary step prior to examining gender differences in mean levels and correlates of life satisfaction measured with the SWLS.

Evaluation of measurement invariance across age largely supported invariance of the SWLS during adolescence. Standard MGCFA across seven age groups (from 13 to 19 years) showed that the SWLS is invariant at both the metric and scalar levels, whereas moderated CFA demonstrated that the parameters of all items but one were invariant across age; the exception was item 2 (“The conditions of my life are excellent”), whose intercept and loading were weakly moderated by age. These findings are largely consistent with previous studies on age invariance of the SWLS in adolescent samples (e.g., Bacro et al., 2020; Esnaola et al., 2017; Ortuño-Sierra et al., 2019), suggesting that SWLS items function similarly among adolescents in different age groups.

The results of the present study have important implications for the use of the SWLS in cross-national research on adolescent life satisfaction. First, the evidence of partial metric invariance suggests that the associations between life satisfaction and other constructs can be meaningfully compared across cultural groups. Second, the evidence of partial scalar invariance in a subset of countries and regions, together with the lack of scalar invariance in the full set, implies that caution is needed when latent means of life satisfaction scores are compared across cultures. In line with the previous studies mentioned above, a meaningful comparison of latent means is unlikely to be achieved when the SWLS is used in a large number of countries. Finally, the relatively weak age and gender effects on the functioning of SWLS items suggest that researchers can meaningfully compare both correlates of life satisfaction and latent mean life satisfaction scores across age and gender groups. This suggests that the SWLS is a promising tool for the assessment of life satisfaction in studies focusing on gender and age differences in well-being during adolescence.

Strengths, Limitations, and Further Research

The main strength of the present study is the use of a large sample from a diverse set of countries and regions across four continents. Furthermore, we used different approaches for testing measurement invariance, which enabled a fine-grained analysis of the SWLS across different cultural settings. However, the study has some notable limitations. First, convenience sampling was used in most countries and regions, and sample sizes varied greatly across countries. Previous studies have shown that the results of measurement invariance testing differ when unequal and equal sample sizes are used (e.g., Chen, 2007); that is, large imbalances in sample sizes across groups might affect invariance testing results (Yoon & Lai, 2018). More accurate results regarding the cross-cultural invariance of the SWLS could be obtained by using more balanced, nationally representative samples recruited from all world regions. Second, the present study used only the SWLS, so we were not able to investigate the cross-national validity of the SWLS in relation to well-established personality, family, school, environmental, and socio-cultural correlates of adolescent life satisfaction (e.g., Proctor et al., 2018). Future studies should go beyond a single self-report measure and investigate different types of SWLS validity from a cross-cultural perspective. Furthermore, a direct comparison of the cross-cultural performance of the SWLS and other widely used measures of adolescent life satisfaction, such as the Brief Multidimensional Students’ Life Satisfaction Scale (BMSLSS; Huebner et al., 2006) or the Students’ Life Satisfaction Scale (SLSS; Huebner, 1991), would enable a better understanding of adolescent life satisfaction measurement from a cross-cultural perspective. Third, we used only cross-sectional data, so future studies should use longitudinal SWLS data across cultures to examine the temporal stability of the scale across different cultural contexts. 
Fourth, our analyses did not focus on identifying the sources of noninvariance, which is an important avenue for future cross-cultural studies on the SWLS. Fifth, the data in two countries (India and Poland) were collected during the COVID-19 pandemic, which could have led to a different interpretation of life satisfaction items among adolescents in these two countries. Finally, we restricted our analyses to samples of 13–19-year-olds, so future studies should use emerging and young adulthood samples to investigate invariance of the SWLS across meaningful developmental periods.

Conclusions

The results of the present study showed that gender and age did not substantially bias SWLS scores. The main source of noninvariance was country, with countries and regions showing a complex pattern of noninvariance. Only nine countries and regions demonstrated partial scalar invariance, which allows comparison of country means of adolescent life satisfaction. A total of 19 countries and regions demonstrated partial metric invariance, which allows comparison of correlations and unstandardized regression coefficients across countries. To summarize, our findings suggest that caution is needed when using the SWLS in cross-national research on adolescent well-being.