Introduction

Mental health problems (MHPs) are a worldwide public health challenge and one of the leading causes of disability among children and youth [1, 2]. With a worldwide prevalence of around 14%, MHPs account for a high global burden of disease (13% of disability-adjusted life-years) [3]. MHPs often persist into adulthood, impacting not only the health and well-being of children and youth but also their socioeconomic trajectories and family life [4]. Depression, anxiety and behavioural disorders are among the leading causes of disability, while suicide is one of the leading causes of death among youths [3].

Economic evaluation is increasingly used to inform decision-making in priority setting worldwide, and is often recommended by government agencies [5]. Economic evaluations using cost-utility analyses (CUAs) commonly require the estimation of quality-adjusted life-years (QALYs) to capture the benefits resulting from health interventions. QALYs are calculated by combining length of life and health-related quality of life (HRQoL), where the time spent in a particular health state is weighted by corresponding health state utility values (HSUVs)—used to denote the “quality” in QALYs [6]. HSUVs represent the strength of preference for a particular health state and are anchored on a cardinal scale between 0 (dead) and 1 (full health), where full health is usually defined by the domains of HRQoL measured on each instrument [7].
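The QALY calculation described above can be illustrated with a minimal sketch. It assumes a simple profile of (time in state, utility) pairs and applies no discounting. The function name `qalys` and the example profile are hypothetical and are not taken from any included study.

```python
from typing import Sequence, Tuple


def qalys(states: Sequence[Tuple[float, float]]) -> float:
    """Sum QALYs over (years_in_state, utility) pairs.

    Each utility is an HSUV anchored at 0 (dead) and 1 (full health).
    No discounting is applied in this sketch.
    """
    total = 0.0
    for years, utility in states:
        if years < 0:
            raise ValueError("time in a health state cannot be negative")
        # Time spent in the state is weighted by its utility value.
        total += years * utility
    return total


# Hypothetical profile: 2 years at utility 0.5, then 3 years at utility 0.75.
example_profile = [(2.0, 0.5), (3.0, 0.75)]
print(qalys(example_profile))  # 2*0.5 + 3*0.75 = 3.25
```

In an applied CUA, future QALYs would typically also be discounted; that step is omitted here to keep the weighting itself visible.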

HSUVs can be derived using either direct or indirect valuation methods [6]. Direct methods include choice-based valuation methods, such as standard gamble (SG), time trade-off (TTO), discrete choice experiments, or best–worst scaling, which are used to value health state descriptions (vignettes) or an individual’s own health state. Indirect valuation methods involve the use of generic multi-attribute utility instruments (MAUIs), which usually comprise a descriptive system (largely health-related quality of life questionnaires) describing various dimensions of HRQoL, and a preference-based scoring algorithm (tariff) used to convert responses from the descriptive system to a single, numeric HSUV [7]. MAUIs are available as self-report (i.e. one assesses one’s own health state) or proxy-report (i.e. one assesses the health state experienced by someone else) [7].
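As a rough illustration of how a tariff converts descriptive-system responses into a single HSUV, the sketch below assumes a purely additive scoring rule: start at 1 (full health) and subtract a decrement for the level reported on each dimension. The instrument, its two dimensions, and every decrement are invented placeholders, not a published value set, and real scoring algorithms may include further terms.

```python
from typing import Dict, Mapping

# Hypothetical decrements for an invented two-dimension instrument.
# These numbers are placeholders for illustration, NOT a published value set.
HYPOTHETICAL_TARIFF: Dict[str, Dict[int, float]] = {
    "mobility": {1: 0.0, 2: 0.125, 3: 0.25},
    "mood": {1: 0.0, 2: 0.25, 3: 0.5},
}


def score_health_state(
    responses: Mapping[str, int],
    tariff: Mapping[str, Mapping[int, float]] = HYPOTHETICAL_TARIFF,
) -> float:
    """Convert one respondent's dimension levels into a single utility value."""
    if set(responses) != set(tariff):
        raise ValueError("responses must cover exactly the tariff dimensions")
    utility = 1.0  # full health
    for dimension, level in responses.items():
        # Subtract the decrement attached to the reported level.
        utility -= tariff[dimension][level]
    return utility


print(score_health_state({"mobility": 1, "mood": 1}))  # 1.0 (full health)
print(score_health_state({"mobility": 2, "mood": 3}))  # 1 - 0.125 - 0.5 = 0.375
```

The same responses scored with a different country-specific tariff would give a different HSUV, which is why the algorithm applied was extracted for each study.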

In modelling-based economic evaluations, analysts may lack the resources and time to collect original HSUVs for the health states of interest, and therefore often rely on systematic reviews of HSUVs as a robust and transparent source of evidence for various health conditions. HSUVs can also be used to explore the burden of disease, especially when compared with population norms or with people without the disorder in question. To the best of our knowledge, there has been no previous systematic review of HSUVs for children and youth with MHPs. Furthermore, mental health services are being reoriented towards young people, widely defined as those aged between 15 and 25 years [8], an age group that often experiences a high rate of MHPs [9]. Exploring HSUVs for the age range up to 25 years therefore supports studies aimed at this age group, which would otherwise be unintentionally excluded by the artificial cut-off of 18 years between adolescents and adults [10].

Over the last decade, while there has been an increased focus on valuation methods using MAUIs to produce HSUVs for children and youth, the guidelines from international agencies for assessing health outcomes in this population remain unclear [10, 11]. This is concerning, as an adequate measure of the benefits of healthcare interventions plays a crucial role in the robustness, transparency and rigour of economic evaluations. In adult populations, it is well reported that some commonly used MAUIs lack sensitivity to measure the impact of MHPs on quality of life [12]. Specifically, there is evidence that the most commonly used MAUIs, such as the EQ-5D and SF-36, are not sufficiently sensitive to reflect the impact of severe MHPs such as schizophrenia [13, 14] or bipolar disorder [15]. Meanwhile, the psychometric performance of MAUIs used in children and youth with MHPs is not well explored.

The aims of this paper are three-fold: (i) to identify reported utility values associated with MHPs in children and youth aged up to 25 years and conduct a meta-analysis of reported HSUVs if appropriate; (ii) to summarise the elicitation techniques used to derive utility values among children and youth with MHPs; and (iii) to provide a summary of the evidence on the psychometric performance of the identified MAUIs used in this space.

Methods

Search strategy

A systematic literature review was conducted following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) checklist [16]. Systematic searches without time limit were conducted in MEDLINE (via Ovid), CINAHL (via EBSCOhost), PsycINFO (via EBSCOhost), Embase (1947 onwards), EconLit (1886 onwards), and the Cochrane Library (including the Health Technology Assessment Database and NHS Economic Evaluation Database) in October 2021. Search terms (Supplementary Document 1) were developed based on the main concepts of (a) HSUVs, (b) direct preference elicitation methods, (c) indirect preference elicitation methods, (d) population of interest, and (e) MHPs. A search strategy that included both keywords and Medical Subject Headings (MeSH) terms was developed for MEDLINE (Ovid) and subsequently translated for the other databases (Supplementary Document 1). The Covidence program was used for duplicate removal and for the screening process. Title and abstract screening was completed by three independent reviewers (TT, JP and AT). An article that received two positive votes moved to the next stage of full-text screening. In case of conflicts, a third reviewer made the final assessment. The same process was followed for full-text screening.

Inclusion and exclusion criteria

  (i) Type of study: (a) observational and experimental design studies estimating HSUVs in children and youths with MHPs using either direct or indirect elicitation methods; (b) trial-based or model-based economic evaluations describing the use of HSUVs in CUAs of interventions addressing MHPs for children and youths.

  (ii) Study population: Child and young populations (aged up to 25 years) with any MHP identified by diagnostic assessment or screened by a self-reported measure such as the Kessler-10. Autism-related studies were excluded following national classifications, which do not classify autism as an MHP [17, 18]. As the review focused on studies reporting HSUVs for MHPs, we excluded studies reporting HSUVs for MHPs comorbid with other conditions (e.g., heart disease and depression).

  (iii) Type of respondents: proxy- or self-reported HSUVs were included.

  (iv) Type of publication: peer-reviewed studies written in English. Grey literature (e.g., conference abstracts and proceedings, discussion papers, reports, and unpublished theses) was excluded. Articles that reported HSUVs derived from non-generic MAUIs (if using indirect preference elicitation methods) were excluded. Previous relevant systematic reviews [19, 20] were used for manual article searching.

Data extraction

Data extraction was conducted by TT and checked by two other authors (AT and JP). The following data were extracted:

  1. Study description: lead author, publication year, country of publication, sample size, study type.

  2. Study population characteristics: age range, mental health condition. We classified the age groups following the recommended practice from the Australian government as: early childhood (< 5 years), primary school children (5–12 years), adolescence (12–19 years) and young adults (20–25 years) [21].

  3. Direct valuation method: method used, proxy- or self-report, reported mean and standard deviation of HSUVs.

  4. Indirect valuation method: generic MAUIs used, proxy- or self-report, algorithm applied, reported mean and standard deviation of HSUVs.

For studies reporting HSUVs at different time points, we only extracted baseline scores. For studies reporting HSUVs for more than one type of MHP, HSUVs for each MHP were extracted with the corresponding sub-samples. Meta-analysis of reported HSUVs was only conducted if reported HSUVs for the same health condition were derived from the same population using the same valuation method with the same country-specific algorithms [22].
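The pooling condition above can be read as a grouping rule. The sketch below is one way to express it: records are grouped by condition, population, valuation method and algorithm, and only groups with at least two studies are candidates for pooling. The record structure, the function name `poolable_groups`, and the two-study minimum are assumptions made for illustration. The two example records use the self-harm values reported in this review [24, 28].

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Sequence, Tuple


class UtilityRecord(NamedTuple):
    study: str        # citation label
    condition: str
    population: str
    method: str       # instrument or direct technique
    algorithm: str    # country-specific value set
    mean_hsuv: float


GroupKey = Tuple[str, str, str, str]


def poolable_groups(
    records: Sequence[UtilityRecord], min_studies: int = 2
) -> Dict[GroupKey, List[UtilityRecord]]:
    """Group records that share condition, population, method and algorithm."""
    groups: Dict[GroupKey, List[UtilityRecord]] = defaultdict(list)
    for record in records:
        key = (record.condition, record.population, record.method, record.algorithm)
        groups[key].append(record)
    # Keep only groups with enough studies to pool (assumed threshold).
    return {k: v for k, v in groups.items() if len(v) >= min_studies}


records = [
    UtilityRecord("[28]", "self-harm", "adolescents 11-17", "EQ-5D-3L", "UK adult", 0.68),
    UtilityRecord("[24]", "self-harm", "adolescents 11-17", "CHU9D", "Australian adolescent", 0.57),
]
print(poolable_groups(records))  # {} since instrument and algorithm differ
```

Both records share a condition and an age range, yet they fall into different groups because the instrument and tariff differ, so neither meets the pooling condition.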

Assessment of psychometric performance

Assessment of the psychometric performance of MAUIs was based on their reliability, validity, and responsiveness as outlined in Rowen et al. [23]. The following information was extracted where available:

  1. Name of MAUIs used, algorithms used, and whether the MAUI was self-reported and/or proxy-reported by parents/caregivers or health professionals.

  2. Study sample: type of mental health problems studied, sample size, and age range.

  3. Known-group or discriminant validity refers to the ability of an instrument to reflect known-group differences (e.g. with and without the condition).

  4. Convergent validity refers to the extent to which an instrument converges with other measures of the same concept (i.e., the strength of association with other measures of the same concept).

  5. Responsiveness is the capacity of a MAUI to reliably detect changes in HRQoL resulting from a change in health status.

  6. Reliability reflects the ability of an instrument to reproduce unchanged utility values where no change in health is detected when measured (a) at two different time points (test–retest reliability), (b) using different administration modes (intermodal reliability), (c) by different respondents (interrater reliability) [23].

  7. Acceptability and feasibility are assessed by the proportion of missing answers and respondents’ understanding of the measure.

  8. Internal consistency reliability explores whether items within a MAUI are measuring the same construct.
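To make one of the properties listed above concrete, the sketch below computes Cronbach's alpha, a statistic commonly used to summarise internal consistency. This review does not prescribe a particular statistic, so the choice of alpha here is an assumption. The function name and the toy response matrix are also hypothetical.

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(item_scores: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha, where item_scores[i][j] is respondent i's score on item j."""
    n_respondents = len(item_scores)
    n_items = len(item_scores[0]) if n_respondents else 0
    if n_respondents < 2 or n_items < 2:
        raise ValueError("need at least two respondents and two items")
    # Sample variance of each item across respondents.
    item_variances = [variance(column) for column in zip(*item_scores)]
    # Sample variance of each respondent's total score.
    total_variance = variance([sum(row) for row in item_scores])
    if total_variance == 0:
        raise ValueError("total scores do not vary; alpha is undefined")
    return n_items / (n_items - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical responses: three respondents, two items that move together.
toy_responses = [[1, 1], [2, 2], [3, 3]]
print(cronbach_alpha(toy_responses))  # 1.0 for perfectly consistent items
```

Values closer to 1 suggest that the items vary together. Thresholds for acceptable consistency differ between sources and are not set here.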

Results

Search results

After deduplication, 4225 articles were included for the title and abstract screening. Based on the selection criteria, we screened 189 full texts and of these, 38 were included for data extraction. A PRISMA flow chart of the study selection process is provided in Fig. 1.

Fig. 1 PRISMA diagram

Study characteristics

Table 1 summarises the characteristics of the 38 included articles, which cover 12 types of MHPs and were published from 2005 to 2021 (see Supplementary Document 2 for more information). Most studies (89%) were conducted between 2010 and 2021, which indicates a growing research interest in the burden associated with youth mental health. Of the 38 included studies, 20 were cross-sectional studies and 18 were randomised controlled trials.

Table 1 Study characteristics and reported utility values by type of mental health conditions

Study population and conditions

Table 2 presents an overview of the 12 types of MHPs explored. Overall, 35 out of 38 studies assessed only one MHP, while three studies assessed more than one MHP, such as 8 MHPs [24] or 3 MHPs [25, 26]. The most studied MHP was ADHD (n = 10), followed by depression (n = 9), anxiety disorders (n = 4), substance use (n = 4), undefined psychiatric disorders (n = 4), psychosis (n = 3), personality disorders (n = 2), behaviour disorders (n = 2), PTSD (n = 2), self-harm (n = 2), internalising problems (n = 1), and general MHPs (n = 3). The sample size in included studies varied from 43 [27] to 754 [28]. The included studies were from the UK (n = 14) [28,29,30,31,32,33,34,35,36,37,38,39], the Netherlands (n = 6) [40,41,42,43,44,45], Finland (n = 3) [46,47,48], Sweden (n = 2) [26, 49, 50], Germany (n = 1) [51, 52], Australia (n = 4) [24, 53,54,55], the US (n = 3) [27, 56, 57], and one study each from Brazil [58], Denmark [59], and Norway [60], with one study in Ireland and the UK [25] and one in the UK and US [61].

Table 2 Elicitation techniques used by MHPs

Age range

Figure 2 presents the age range of the children and young people in included studies. The age range explored was between 4 and 25 years. EQ-5D-3L was used for a wide age range of 4–19 years. EQ-5D-5L was used mainly for youth aged 14–21 years. EQ-5D-Y was used for participants aged 4–18 years. CHU9D was used for children aged between 4 and 17 years. HUI2 and HUI3 were used for participants aged between 5 and 18 years. QWB and SF-6D were used only for adolescents aged 13–18 years. In terms of studies employing direct elicitation techniques, the SG was used for participants aged 5–18 years and the TTO for those aged 9–16 years. For mixed samples of adolescents and adults, we found that EQ-5D-3L was used for samples aged 14–19 years [63], EQ-5D-5L in samples aged 14–21 years [52, 64] and 15–19 years [65], and 16D in samples aged 11–22 years [46,47,48].

Fig. 2 Elicitation methods by age range and perspectives. Notes: P* = Proxy-report (parents, professionals and children); P(p) = Proxy-report (parents); P(c) = Proxy-report (caregivers); P(g) = Proxy-report (general public); S = Self-report; B = Both self-report and proxy-report

Valuation methods

Table 2 summarises the valuation methods and instruments used in the included studies. Most studies (n = 36) adopted the indirect valuation method, while two used a direct valuation method, namely SG [27] and TTO [66], both of which were used only for ADHD. Among the 10 MAUIs used as the indirect valuation method, EQ-5D-3L was the most frequently used (n = 16) in 9 types of MHPs, followed by CHU9D (n = 13), HUI2 (n = 7), HUI3 (n = 6), EQ-5D-Y (n = 5), EQ-5D-5L (n = 3), 16D (n = 4), QWB (n = 2), SF-6D (n = 2), and AQoL4D (n = 1). Several studies used more than one MAUI, including two studies that used both EQ-5D-Y and CHU9D [29, 31], two using HUI2 and HUI3 [25, 58], one using EQ-5D-Y and HUI2 [36], and two studies using four MAUIs (HUI2 and HUI3, EQ-5D-3L, QWB, and SF-6D) [56, 57]. One study used both direct (SG) and indirect (EQ-5D-3L) valuation methods [30].

Algorithms

EQ-5D-3L was used in five studies with the Dutch (adult) algorithm [40, 44, 45, 63], two with the German (adult) algorithm [52, 64], and four with the UK adult algorithm [30, 35, 38, 41]. EQ-5D-5L was used with the UK adult algorithm [33]. CHU9D was used with the UK adult algorithm in four studies [29, 31, 32, 59], and with the Australian adolescent algorithm [53, 59]. HUI2 and HUI3 were used with the Canadian adult algorithm [56,57,58], while HUI2 was used with the UK adult algorithm in one study [25]. SF-6D was used with the UK adult algorithm [56, 57]. Only one study [31] reported HSUVs for EQ-5D-Y using the UK adult algorithm, while another study using EQ-5D-Y did not report HSUVs [26], possibly due to the lack of a corresponding value set at the time [23].

Perspectives

Figure 2 summarises the elicitation methods and MAUIs used by perspective. Twenty-eight studies administered the MAUI to children and youth using self-report, while 16 studies used proxy-report (with a direct elicitation method and/or MAUIs), including parents (n = 13), caregivers (n = 1), teachers (n = 1), professionals (n = 1), general public (n = 1), and children and youth (i.e. evaluating bespoke health state descriptions) (n = 1). Proxy-report was mainly adopted in studies exploring ADHD (8 out of 10 studies), while all studies exploring depression, post-traumatic stress disorder, cannabis use disorder, psychosis and self-harm adopted self-report. Studies using direct valuation methods such as SG [27, 30] and TTO [66] all adopted proxy-report. Indirect valuation methods were used with proxy-report for children as young as 4 years old [31] and with self-report in children as young as 5 years old [32].

Reported utility values by MHPs

Figure 3 shows the HSUVs derived by different elicitation methods and MAUIs across MHPs that were explored in more than one study. Across all reported HSUVs by MHP, disruptive behaviour disorder had the lowest HSUV of 0.06 [43], while cannabis use disorder was associated with the highest HSUV of 0.88 [44].

Fig. 3 Reported HSUVs by type of elicitation tool across MHPs. Notes: MHPs having HSUVs derived from only one study were excluded (internalising problems, medicine use, delinquency, tobacco use, other drug use disorders, avoidant personality disorder, personality disorder, depressive personality disorder, and obsessive-compulsive personality disorder)

HSUVs reported for ADHD ranged from 0.444 [66] to 0.897 [31]. Only two studies reported HSUVs associated with severity levels, in which HSUVs for mild ADHD were 0.71 and 0.787, and for severe ADHD 0.48 and 0.444 [27, 66]. HSUVs reported for anxiety disorders ranged from 0.62 [53] to 0.88 [32], and for depression from 0.495 [34] to 0.81 [57]. HSUVs for post-traumatic stress disorder ranged from 0.70 [52, 64] to 0.755 [60], and for borderline personality disorder from 0.236 [38] to 0.49 [63]. HSUVs associated with undefined psychiatric disorders ranged from 0.5 [39] to 0.782 [25], for psychosis from 0.7992 [47] to 0.80 [46], for self-harming from 0.57 to 0.68 [28], for alcohol use from 0.73 [24] to 0.82 [45], for conduct disorders from 0.11 [43] to 0.802 [25], and for general MHPs from 0.56 [54] to 0.804 [59].

HSUVs for the following MHPs were derived from only one study each. The HSUV associated with internalising problems was 0.71 [50], and those associated with medicine use and delinquency in adolescents with MHPs were 0.81 and 0.82, respectively [45]. Furthermore, HSUVs of 0.49 were reported for avoidant personality disorder, 0.70 for personality disorder not otherwise specified, 0.34 for depressive personality disorder, and 0.50 for obsessive–compulsive personality disorder [63], along with 0.52 for oppositional defiant disorder [43] and 0.54 for suicide ideation [24].

Across MAUIs, Fig. 3 shows that the generated HSUVs vary widely among the most frequently explored types of MHPs, while such variation could not be clearly observed in MHPs explored by fewer studies (n < 3). Specifically, CHU9D yielded generally higher mean HSUV scores, while HUI2 and HUI3 yielded lower scores than other MAUIs in ADHD. Interestingly, EQ-5D-3L generated the widest range of HSUVs among all included scores in depression, from the lowest HSUV of 0.495 [34] to the highest of 0.81 [57]. Similarly, CHU9D generated both the lowest HSUV of 0.62 [53] and the highest HSUV of 0.88 [32] in anxiety disorders.

A wide variation in reported HSUVs between self- and proxy-report was also observed (Supplementary Document 3). While self-reported HSUVs for anxiety disorders vary widely, most of them (ranging from 0.62 [53] to 0.87 [32]) are generally lower than proxy-reported HSUVs (0.85 [32]). In contrast, proxy-reported HSUVs for undefined psychiatric disorders are generally higher than self-reported ones. In conduct disorders, proxy-reported HSUVs were generally higher than self-reported ones, except for one proxy-reported HSUV that was the lowest score [43]. Among all included MHPs, depression has the widest range of HSUVs, all of which were derived from self-report.

A meta-analysis was considered inappropriate given the heterogeneity of reported HSUVs caused by different elicitation methods and MAUIs used, scoring algorithms, age range, country of research, and study designs.

Psychometric performance

Only nine studies assessed the psychometric performance of MAUIs used in youth mental health populations: five studies assessed EQ-5D-3L [35, 40, 56, 57, 61], two studies assessed EQ-5D-5L [26, 64], two studies assessed CHU9D [55, 59], and two studies [56, 57] assessed four MAUIs (EQ-5D-3L, QWB, SF-6D, HUI2 and HUI3). Table 3 reports the positive evidence, mixed evidence and no evidence for each psychometric property per MAUI, followed by the clinical population assessed.

Table 3 Psychometric performance

No study assessed all psychometric properties investigated in this review. Overall, seven studies assessed discriminant validity, six studies assessed convergent validity, five studies assessed responsiveness, and one study each assessed test–retest reliability and feasibility. Three studies assessed the psychometric performance of MAUIs using proxy-report, while six studies used self-report.

For EQ-5D-3L, the review found evidence of discriminant validity, convergent validity and responsiveness from both self-report [35, 56, 57] and proxy (parent) report [40, 61]. Whilst there is evidence of the discriminant validity, convergent validity, test–retest reliability and feasibility of EQ-5D-5L in youth populations with MHPs [64], the review did not find evidence of responsiveness for EQ-5D-5L [64]. For EQ-5D-Y-5L, one study using self-report showed evidence to support its feasibility, discriminant validity, and test–retest reliability; however, the evidence of its convergent validity is mixed [26]. Regarding CHU9D, two included studies using proxy-report (parent) found evidence of the discriminant validity, convergent validity, internal consistency, and responsiveness of the dimensions of CHU9D [55, 59]. One study also provided evidence of the discriminant validity, convergent validity, and responsiveness of CHU9D using both the UK (adult) and Australian (adolescent) algorithms [59]. For QWB, SF-6D, and HUI2 and HUI3, the review found evidence of discriminant validity, convergent validity, and responsiveness [56, 57].

Discussion

This paper provides, for the first time, a systematic review of evidence on (1) the HSUVs of various MHPs, (2) the current practice to generate HSUVs, and (3) the psychometric performance of MAUIs used in children and youths with MHPs. The review found 38 studies reporting HSUVs for 12 types of MHPs in children and youths aged between 4 and 25 years across 12 countries between 2005 and October 2021. Among the 12 types of MHPs reported in the review, depression and ADHD are the most explored MHPs, with 10 studies each. Overall, the review found that disruptive behaviour disorder has the lowest HSUV of 0.06, from children in primary education valuing vignettes in a cross-sectional study [43], while cannabis use disorder was associated with the highest HSUV of 0.88, from an intervention study [44]. Moreover, the indirect valuation method using MAUIs (used in 95% of included studies) is the most frequently used approach to generate HSUVs for all 12 MHPs, while the only two studies using the direct valuation method alone focused on ADHD. Of the ten MAUIs found in this review, the most frequently used was EQ-5D-3L, used in 15 out of 38 studies across eight out of 12 types of MHPs. CHU9D, the only MAUI developed for use in childhood-specific populations, was used in 13 out of 38 studies across seven out of 12 MHPs. However, the review found no measure that has sufficient evidence of psychometric performance in children and youths with MHPs.

Heterogeneity was noticeable across the included studies. The variations, mainly caused by differences in elicitation methods and MAUIs used, scoring algorithms, age range, country of research, and study designs, made it difficult to derive an estimate of the overall effect. For example, using samples of adolescents aged 11–17 years at risk of self-harm, one study reported an HSUV of 0.68 using EQ-5D-3L with the UK adult tariff [28], while another study reported an HSUV of 0.57 using CHU9D with the Australian adolescent tariff [24]. Even in the same clinical sample, different MAUIs produced different HSUVs: for example, an HSUV of 0.56 was derived from HUI3 and an HSUV of 0.81 from EQ-5D-3L in the same sample of adolescents with depression [57]. These findings are consistent with results from the adult literature, where different MAUIs produced different scores even in the same person [67]. Furthermore, only one study reported population norms for several MHPs [24], and none of the studies compared HSUVs with their corresponding population norms. Future research may benefit from such comparisons.

We found that some MAUIs were used in samples whose age range differed from the recommended age range. Specifically, EQ-5D-3L, an adult-specific MAUI, was used in children as young as 4 years through proxy-report [31] or as young as 8 years using self-report [41]. While the recommended ages for 16D are 12–15 years [10], the review found that the 16D was used in adolescents as young as 10 years old [60] or as old as 22 years [48]. We noted that studies may use measures for samples outside of the recommended age range as a way to save time and resources, or to facilitate comparisons across different age groups. However, transitioning between child and adolescent/youth populations can be challenging as there are significant developmental differences between these age groups [68]. Therefore, we urge future research to consider the potential limitations and inaccuracies that may result from this approach.

This review found limited evidence of the psychometric performance of MAUIs used in children and youths with MHPs, with only nine studies covering eight MAUIs in five types of MHPs. Of these MAUIs, more studies assessed the psychometric performance of EQ-5D-3L than other measures. Only CHU9D was assessed using a sample with general MHPs, while other MAUIs were assessed using specific MHP samples. Specifically, EQ-5D-3L was assessed in ADHD and depression, EQ-5D-5L was assessed in PTSD, EQ-5D-Y-5L was assessed in psychiatric disorders, and QWB, SF-6D, HUI2 and HUI3 were assessed in depression. Furthermore, we found that MAUIs were used in MHPs where evidence on their psychometric performance is lacking, such as the use of EQ-5D-3L in personality disorder [38, 63], behaviour disorders [43], substance use [44, 45, 65] and psychiatric disorders [39].

Evidence on the psychometric performance of MAUIs used in children and youths with MHPs is not conclusive, which makes it difficult to recommend one MAUI over another. Although the best psychometric evidence exists for EQ-5D-3L and CHU9D in terms of discriminant validity, convergent validity and responsiveness, other psychometric properties remain unclear. For example, neither CHU9D nor EQ-5D-3L has been assessed regarding feasibility, test–retest reliability, or interrater reliability. The evidence for other MAUIs was either mixed or limited in the number of studies and types of MHPs explored. Specifically, EQ-5D-5L was found to lack responsiveness, while EQ-5D-Y-5L showed inconsistencies in its ability to converge with other measures. Although the psychometric performance of EQ-5D-5L and EQ-5D-Y-5L may look alarming, it is worth noting that the evidence was derived from one study per measure with small samples (fewer than 100 participants), which may limit the statistical power of the results. The findings on the convergent and discriminant validity of EQ-5D-3L, SF-6D and HUI3 in adolescents with depression align with findings in the adult depression literature [14, 15]. The limited evidence, in terms of the types of MHPs explored, the range of psychometric properties assessed, and the number of studies with sufficient samples, calls for further research to provide a comprehensive picture of the psychometric properties of all measures used in children and youths with MHPs.

Indeed, various aspects of MAUI psychometric assessment need further attention. Specifically, we found limited or no evidence on the test–retest reliability, internal consistency, and interrater reliability of MAUIs. Other validity tests, such as face validity and content validity, would also contribute to more conclusive evidence of the psychometric performance of MAUIs in children and youths with MHPs. Lastly, although MAUIs were used across different countries and languages, the review noted that the cultural validity of MAUIs has not been investigated.

Limitations of our study include the inability to conduct a meta-analysis of reported HSUVs due to the heterogeneity of the included studies and the small number of studies per MHP; such an analysis would otherwise provide an overview of the impact of MHPs on HSUVs and the associated disease burden [22]. Non-English studies and grey literature were excluded, which may limit the comprehensiveness of our review. Furthermore, following previous systematic reviews [10, 11, 19, 23], we did not assess the quality of the included studies or the appropriateness of the statistical analyses they undertook; however, the results of included studies were discussed with their sample sizes in mind.

Conclusion

This review provides an overview of the HSUVs of various MHPs, the current practice to generate HSUVs, and the psychometric performance of MAUIs used in children and youths with MHPs. We report limited evidence on the psychometric performance of MAUIs in terms of the psychometric properties assessed, the types of MHPs explored, the number of studies per measure, and the adequacy of sample sizes. This highlights the need for more rigorous and extensive psychometric assessments to produce conclusive evidence on the suitability of MAUIs used in this area.