1 Introduction

Different types of virtual reality technology (e.g. non-immersive, semi-immersive, or fully immersive) have emerged as a useful tool in neurorehabilitation, with promising results for physical and cognitive rehabilitation (Voinescu et al. 2021). Virtual reality-based interventions were also promoted as a technological solution for telerehabilitation during the COVID-19 pandemic (Matamala-Gomez et al. 2021). Furthermore, previous literature has proposed that virtual reality strategies achieve higher adherence in patients with neurological disorders (Asadzadeh et al. 2021; Dalmazane et al. 2021). Multitask training, patient motivation, safety, and the low cost of commercial devices are some of the benefits of using virtual reality for neurological rehabilitation (Forsberg et al. 2015; Gustavsson et al. 2021; Moan et al. 2021). Nonetheless, undesired effects (e.g. headache, sickness, or nausea) have been reported (Massetti et al. 2018), as well as difficulty transferring complex skills trained in virtual environments to the real world and a lack of ecological validity in neurologically impaired populations (Levac et al. 2019). Specifically for balance training, latency, the underestimation of perceived distances, and dependence on specific systems (e.g. balance boards) and virtual contexts have been proposed as potential weaknesses of virtual reality environments (Morel et al. 2015).

Multiple sclerosis is a global neurodegenerative disease affecting approximately three million people worldwide (Tafti et al. 2022). Balance disorders, gait impairments, and fatigue are the main symptoms in patients with multiple sclerosis, and they respond positively to physical therapy interventions (Amedoro et al. 2020; Abou et al. 2022). In particular, virtual reality-based physical rehabilitation has shown benefits for balance and gait training (Casuso-Holgado et al. 2018; Nascimento et al. 2021); however, fatigue is a significant barrier to participation in physical activity and influences participants’ adherence (Moore et al. 2022). A recent systematic review summarised dropout data from randomised controlled trials of exercise interventions in people with multiple sclerosis and concluded that mean age, the proportion of females, and intervention duration were moderators inversely associated with adherence (Dennett et al. 2020). Such findings matter for sample size calculation, because an inaccurate expected dropout rate can lead to an under- or overestimation of the required sample. These moderators could also influence the differential dropout rate, that is, how the degree of dropout differs between the intervention and comparator conditions after randomisation (Crutzen et al. 2015). Differential dropout may reduce statistical power and introduce a risk of bias in randomised controlled trials (Cooper et al. 2018). Against this background, setting accurate expected dropout rates in virtual reality studies for rehabilitation in multiple sclerosis could help future trials avoid problems with internal or external validity. In addition, identifying factors specifically associated with dropout in virtual reality trials could help clinicians translate research into practice.

To the best of our knowledge, no previous systematic review has reported dropout in virtual reality interventions for balance and gait rehabilitation in this population. Thus, the present systematic review and meta-analysis aimed to: (1) systematically assess and meta-analyse the overall pooled dropout rate of randomised controlled trials using virtual reality as an intervention for balance or gait training in people with multiple sclerosis in both absolute and comparative terms; (2) analyse whether any participant or intervention factors are related to dropout; and (3) identify adverse events that could be the reason for dropouts.

2 Methods

2.1 Data sources and search strategy

This systematic review was carried out following the 2020 Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines (Moher et al. 2009). The review protocol was registered in the PROSPERO database (Registration number: CRD42021284989).

Two independent reviewers (M.J.C.-H., C.G.-M.) conducted an electronic search in MEDLINE (PubMed), Scopus, Web of Science (WOS), the Physiotherapy Evidence Database (PEDro), the Cochrane Database of Systematic Reviews (CDSR), CINAHL, LILACS, ScienceDirect, and ProQuest. The search was performed between July and November 2021. No language or date filters were applied in any database. Key terms concerning intervention (‘virtual reality’, ‘game’, ‘gaming’, ‘exergaming’, and ‘interactive’), balance (‘balance’ or ‘postural control’), gait (‘gait’, ‘walking’, and ‘ambulation’), and ‘multiple sclerosis’ were combined as search terms in the strategies. The search strategy is shown in detail in Supplemental Material 1.

2.2 Research question and study selection

The participants, interventions, comparisons, outcomes, and study design (PICOS) model was used to formulate the following research question: what dropout data are reported during the intervention and follow-up periods of randomised controlled trials using virtual reality interventions to improve balance or gait in multiple sclerosis, and which factors may moderate dropout in these studies?

Participants included in the review were female or male, aged between 18 and 65 years, with any multiple sclerosis phenotype diagnosed according to the revised McDonald criteria (Thompson et al. 2018), and with preserved walking ability, defined as an Expanded Disability Status Scale (EDSS) score ≤ 6. Included interventions involved any type of virtual reality system aimed at improving balance or gait, compared with other physical activity-based interventions, with or without the use of external aids. In addition, included studies had to report information on dropout events.

2.3 Data extraction and quality assessment

First, two independent reviewers (C.G.-M. and M.J.C.-H.) screened the titles and abstracts of the records retrieved from the databases to identify potentially eligible articles. Next, duplicates were removed, and the remaining articles were analysed exhaustively through full-text reading, focusing on the selection criteria to ensure that the inclusion criteria were met before selecting suitable studies. In case of disagreement, a third reviewer (M.-D.C.-V.) was consulted to decide on the inclusion of the documents.

Once articles were selected, quality was assessed using the PEDro scale (Maher et al. 2003) and the revised Cochrane risk-of-bias tool for randomised trials (RoB-2) (Higgins et al. 2019). PEDro is a reliable 11-item tool that evaluates the internal validity of a clinical trial. Studies scoring 6 points or more are classified as level I evidence (6–8: good; 9–10: excellent), and studies scoring 5 points or less as level II (4–5: deficient; < 4: poor). RoB-2 evaluates bias in randomised controlled trials across five domains (bias arising from the randomisation process, due to deviations from the intended interventions, due to missing outcome data, in the measurement of the outcome, and in the selection of the reported result), each rated as low risk of bias, some concerns, or high risk of bias (Sterne et al. 2019).

Next, reviewers recorded the data for qualitative and quantitative synthesis. The extracted data were country, multiple sclerosis phenotype and disability status, female and male percentages, age, experimental and comparator group intervention characteristics, number of participants recruited and analysed, retention rate, dropout rates (for the experimental and control groups), reasons for dropout (in each group), and adverse events. Disagreements in data extraction were resolved by consensus with a third reviewer. The information provided by the included studies allowed dropout rates to be calculated in all cases, so no corresponding authors were contacted.

2.4 Data analysis

The dropout rate was calculated as the number of participants who did not complete the intervention and follow-up period divided by the total number of randomised participants. The retention rate, taken as an indicator of adherence to treatment, was the proportion of randomised participants who completed the intervention (i.e. 1 minus the dropout rate). For studies with more than two intervention groups, each pair of groups was compared separately.
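As a minimal sketch of these rate definitions, the following Python helpers compute dropout and retention rates from randomised and completer counts, and enumerate the two-by-two comparisons of a multi-arm trial. The function names and the three-arm counts are hypothetical and serve only as an illustration.

```python
from itertools import combinations


def dropout_rate(randomised: int, completed: int) -> float:
    """Share of randomised participants who did not complete the study."""
    if randomised <= 0 or not 0 <= completed <= randomised:
        raise ValueError("expected randomised > 0 and 0 <= completed <= randomised")
    return (randomised - completed) / randomised


def retention_rate(randomised: int, completed: int) -> float:
    """Share of randomised participants who completed: 1 - dropout rate."""
    return 1.0 - dropout_rate(randomised, completed)


def pairwise_comparisons(arms: dict) -> list:
    """Every two-group comparison in a (possibly multi-arm) trial."""
    return list(combinations(sorted(arms), 2))


# Hypothetical three-arm trial: arm -> (randomised, completed)
trial = {"VR": (20, 19), "conventional": (20, 17), "control": (20, 18)}

for arm, (n, done) in trial.items():
    print(f"{arm}: dropout {dropout_rate(n, done):.2f}, retention {retention_rate(n, done):.2f}")
print(pairwise_comparisons(trial))
```

A three-arm trial yields three pairwise comparisons, each of which would then enter the analysis as a separate two-group comparison.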

The meta-analysis was conducted in R (version 4.0.0) using RStudio and the packages meta, metafor, and dmetar (Viechtbauer 2010; Balduzzi et al. 2019; Harrer et al. 2021). A meta-analysis of proportions was performed with the metaprop function to estimate the dropout rate in the virtual reality interventions, in the control comparators, and across all arms. Proportions were logit-transformed before pooling (Schwarzer et al. 2019).
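The logit transformation can be illustrated with a simplified inverse-variance sketch in Python. This is not a reimplementation of metaprop, whose default pooling method depends on the package version. The sketch makes three assumptions: it uses a DerSimonian–Laird estimate of τ² instead of the estimator used in the review, it applies a 0.5 continuity correction to arms with zero or all events, and it uses hypothetical arm counts.

```python
import math


def logit_proportion(events: int, n: int, cc: float = 0.5):
    """Logit of an arm's dropout proportion and its approximate variance.

    Assumption: cc is added to events and non-events when either is zero."""
    if events == 0 or events == n:
        events, n = events + cc, n + 2 * cc
    p = events / n
    return math.log(p / (1 - p)), 1 / events + 1 / (n - events)


def pool_dl(ys, vs):
    """Random-effects inverse-variance pooling with DerSimonian-Laird tau^2."""
    w = [1 / v for v in vs]
    y_fixed = sum(wi * yi for wi, yi in zip(w, ys)) / sum(w)
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, ys))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(ys) - 1)) / c)
    w_re = [1 / (v + tau2) for v in vs]
    est = sum(wi * yi for wi, yi in zip(w_re, ys)) / sum(w_re)
    return est, math.sqrt(1 / sum(w_re)), tau2


def inv_logit(x: float) -> float:
    """Back-transform a logit to the proportion scale."""
    return 1 / (1 + math.exp(-x))


# Hypothetical arms: (dropouts, randomised)
arms = [(1, 20), (0, 15), (3, 30), (2, 25)]
ys, vs = zip(*(logit_proportion(d, n) for d, n in arms))
est, se, tau2 = pool_dl(ys, vs)
low, high = est - 1.96 * se, est + 1.96 * se
print(f"pooled dropout {inv_logit(est):.3f} "
      f"(95% CI {inv_logit(low):.3f}-{inv_logit(high):.3f}), tau2 = {tau2:.3f}")
```

Pooling on the logit scale keeps the back-transformed estimate and its confidence limits inside the interval (0, 1), which is the main reason to transform before pooling.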

A binary meta-analysis based on odds ratios (ORs) was conducted to examine whether dropout was more likely in the virtual reality or in the comparator interventions. The OR with its 95% confidence interval (95%CI) was calculated as the effect measure for this binary outcome, and the inverse variance method was used to pool estimates, taking into account that dropouts are rare events (sparse data). The Hartung–Knapp adjustment was applied to the random-effects model. An OR of 1 indicates no difference in dropout between the experimental and comparator groups, whereas an OR greater than 1 indicates higher odds of dropout in the experimental group and an OR below 1 indicates lower odds. The restricted maximum-likelihood (REML) estimator of τ² was used to estimate the between-study variance (Viechtbauer 2005). As some studies had zero events in the experimental and/or comparator arm, a continuity correction of 0.5 was applied in all meta-analyses, as suggested by Gart and Zweifel (1967).
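A minimal Python sketch of the odds ratio computation with a 0.5 continuity correction and a Hartung–Knapp confidence interval follows. It rests on several assumptions. The correction is added to all four cells only when a study has a zero cell. τ² is taken as a given input, because the REML estimation used in the review is not reimplemented here. The t quantile with k − 1 degrees of freedom is passed in rather than computed, and the dropout counts are hypothetical.

```python
import math


def log_odds_ratio(d_exp, n_exp, d_ctl, n_ctl, cc=0.5):
    """Log OR of dropout (experimental vs comparator) and its Woolf variance.

    Assumption: cc is added to all four cells only when any cell is zero."""
    a, b = d_exp, n_exp - d_exp   # experimental: dropouts, completers
    c, d = d_ctl, n_ctl - d_ctl   # comparator: dropouts, completers
    if 0 in (a, b, c, d):
        a, b, c, d = a + cc, b + cc, c + cc, d + cc
    return math.log((a * d) / (b * c)), 1 / a + 1 / b + 1 / c + 1 / d


def hartung_knapp(ys, vs, tau2, t_crit):
    """Random-effects estimate with a Hartung-Knapp CI for a given tau^2.

    t_crit is the t quantile with k - 1 degrees of freedom."""
    w = [1 / (v + tau2) for v in vs]
    est = sum(wi * yi for wi, yi in zip(w, ys)) / sum(w)
    q = sum(wi * (yi - est) ** 2 for wi, yi in zip(w, ys)) / (len(ys) - 1)
    se = math.sqrt(q / sum(w))
    return est, est - t_crit * se, est + t_crit * se


# Hypothetical studies: (dropouts VR, n VR, dropouts comparator, n comparator)
studies = [(1, 20, 2, 20), (0, 15, 1, 15), (2, 30, 3, 30)]
ys, vs = zip(*(log_odds_ratio(*s) for s in studies))
est, low, high = hartung_knapp(ys, vs, tau2=0.0, t_crit=4.303)  # t(0.975, df = 2)
print(f"OR = {math.exp(est):.2f} (95% CI {math.exp(low):.2f}-{math.exp(high):.2f})")
```

Pooling is done on the log scale and exponentiated at the end, so an OR below 1 in the output corresponds to lower odds of dropout in the virtual reality arms.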

Heterogeneity between studies was assessed using I², τ², and Cochran’s Q (p < 0.05 indicating heterogeneity). An I² value above 50% was interpreted as substantial heterogeneity across studies (Higgins et al. 2021). A random-effects model was used to account for possible between-study heterogeneity.
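The relationship between Cochran’s Q and I² can be checked with a short Python sketch. Higgins’ I² is (Q − df)/Q truncated at zero, so any Q below its degrees of freedom yields I² = 0%. The two example calls reuse the Q and df values reported in the Results, and the function names are illustrative.

```python
def cochran_q(ys, vs):
    """Cochran's Q: weighted squared deviations from the fixed-effect mean."""
    w = [1 / v for v in vs]
    y_fixed = sum(wi * yi for wi, yi in zip(w, ys)) / sum(w)
    return sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, ys))


def i_squared(q: float, df: int) -> float:
    """Higgins' I^2 in percent, truncated at zero when Q < df."""
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100


# Q and df as reported in the Results section
print(i_squared(10.07, 17))  # proportions meta-analysis -> 0.0
print(i_squared(5.6, 17))    # odds ratio meta-analysis -> 0.0
```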

Forest plots were used to display the results of the proportion and binary meta-analyses. The prediction interval was added to the forest plot as a red line to indicate the range in which the effect of a new study would be expected to fall (Nagashima et al. 2019). For subgroup analysis, virtual reality was classified as non-immersive, semi-immersive, or fully immersive according to the level of immersion of the subject within the virtual environment.
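A prediction interval of this kind is commonly computed as θ̂ ± t(k − 2) × sqrt(τ² + SE²). The Python sketch below applies that formula on the log odds ratio scale. The pooled estimate, standard error, τ², and the t quantile for k − 2 = 16 degrees of freedom are hypothetical inputs, not values from the included studies.

```python
import math


def prediction_interval(est: float, se: float, tau2: float, t_crit: float):
    """Approximate 95% prediction interval for the effect in a new study.

    t_crit is the t quantile with k - 2 degrees of freedom."""
    half_width = t_crit * math.sqrt(tau2 + se ** 2)
    return est - half_width, est + half_width


# Hypothetical pooled log OR, its SE and tau^2 for k = 18 comparisons
low, high = prediction_interval(est=-0.12, se=0.17, tau2=0.05, t_crit=2.120)
print(f"prediction interval (OR scale): {math.exp(low):.2f}-{math.exp(high):.2f}")
```

Because τ² enters the width directly, the prediction interval is always at least as wide as the confidence interval of the pooled estimate. This explains why it can include OR = 1 even when the pooled OR points in one direction.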

A sensitivity analysis was carried out to assess the influence of individual studies on the overall binary meta-analysis results, that is, to detect outliers and studies that contributed to heterogeneity or biased the pooled results. A Baujat plot, a L’Abbé plot, and influence graphs were created to represent influential cases in the meta-analysis; in the influence graphs, studies that significantly influenced the pooled effect size were shown in red. In addition, an exploratory graphical analysis was performed to examine whether there was a clear trend in effect size related to the independent variables.
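The coordinates of a Baujat plot can be sketched in Python with a fixed-effect inverse-variance model. The horizontal axis is each study’s contribution to Cochran’s Q. The vertical axis is the squared shift in the pooled estimate when that study is omitted, standardised by the variance of the leave-one-out estimate. The effect sizes below are hypothetical, and the last study is deliberately set as an outlier.

```python
def fixed_effect(ys, vs):
    """Inverse-variance fixed-effect estimate and its variance."""
    w = [1 / v for v in vs]
    return sum(wi * yi for wi, yi in zip(w, ys)) / sum(w), 1 / sum(w)


def baujat_points(ys, vs):
    """(contribution to Q, influence on the pooled estimate) for each study."""
    est, _ = fixed_effect(ys, vs)
    points = []
    for i in range(len(ys)):
        # Leave-one-out estimate computed without study i
        est_i, var_i = fixed_effect(ys[:i] + ys[i + 1:], vs[:i] + vs[i + 1:])
        x = (ys[i] - est) ** 2 / vs[i]
        y = (est - est_i) ** 2 / var_i
        points.append((x, y))
    return points


# Hypothetical log ORs and variances; the last study is a deliberate outlier
ys = [0.1, -0.2, 0.0, 1.5]
vs = [0.2, 0.3, 0.25, 0.2]
for i, (x, y) in enumerate(baujat_points(ys, vs), start=1):
    print(f"study {i}: contribution to Q = {x:.2f}, influence = {y:.2f}")
```

A study that sits in the upper right of the plot both inflates heterogeneity and shifts the pooled estimate, which is the pattern the sensitivity analysis screens for.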

Meta-regression was conducted to evaluate whether participant or study characteristics were associated with dropout events. Studies with no available data were excluded from the meta-regression analysis, and at least three studies reporting a given predictor were required to run it. The analysed moderators were type of intervention; number, duration, and frequency of sessions; weeks of intervention; EDSS score; multiple sclerosis phenotype; and sex.
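A single-moderator meta-regression reduces to weighted least squares with weights 1/(v_i + τ²). The Python sketch below assumes that τ² is supplied. It drops studies with a missing moderator value and enforces the three-study minimum described above. The moderator values (sessions per week) and effect sizes are hypothetical.

```python
import math


def meta_regression(ys, vs, xs, tau2=0.0):
    """Weighted least squares of effect sizes on one moderator.

    Studies whose moderator value is None are dropped. Returns
    (intercept, slope, standard error of the slope)."""
    data = [(y, v, x) for y, v, x in zip(ys, vs, xs) if x is not None]
    if len(data) < 3:
        raise ValueError("at least three studies reporting the moderator are required")
    w = [1 / (v + tau2) for _, v, _ in data]
    sw = sum(w)
    x_bar = sum(wi * x for wi, (_, _, x) in zip(w, data)) / sw
    y_bar = sum(wi * y for wi, (y, _, _) in zip(w, data)) / sw
    sxx = sum(wi * (x - x_bar) ** 2 for wi, (_, _, x) in zip(w, data))
    if sxx == 0:
        raise ValueError("moderator does not vary across studies")
    sxy = sum(wi * (x - x_bar) * (y - y_bar) for wi, (y, _, x) in zip(w, data))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope, math.sqrt(1 / sxx)


# Hypothetical log ORs, variances and sessions per week (None = not reported)
ys = [-0.3, 0.1, -0.5, 0.2, 0.0]
vs = [0.4, 0.5, 0.6, 0.3, 0.45]
sessions = [2, 3, None, 3, 2]
b0, b1, se_b1 = meta_regression(ys, vs, sessions)
print(f"slope = {b1:.3f} (SE {se_b1:.3f}), z = {b1 / se_b1:.2f}")
```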

Publication bias and small-study effects were evaluated with a contour-enhanced funnel plot adjusted by the Duval and Tweedie trim-and-fill method (Shi and Lin 2020). Asymmetry in the funnel plot would indicate an influence of small studies on the pooled results. Funnel plot asymmetry was considered absent when both Harbord’s test (Harbord et al. 2006) and the Egger bias test (Egger et al. 1997) yielded p > 0.05.
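Egger’s test regresses each study’s standardised effect (effect/SE) on its precision (1/SE) and tests whether the intercept differs from zero. The Python sketch below returns the intercept, its standard error, and the t statistic with k − 2 degrees of freedom. It does not reproduce Harbord’s modified test for binary outcomes or the trim-and-fill adjustment, and the data are hypothetical.

```python
import math


def egger_test(effects, ses):
    """Egger's regression test for funnel-plot asymmetry.

    Regresses effect/SE on 1/SE by ordinary least squares and returns
    (intercept, SE of intercept, t statistic with k - 2 df)."""
    k = len(effects)
    if k < 3:
        raise ValueError("Egger's test needs at least three studies")
    z = [e / s for e, s in zip(effects, ses)]   # standardised effects
    p = [1 / s for s in ses]                     # precisions
    p_bar, z_bar = sum(p) / k, sum(z) / k
    sxx = sum((pi - p_bar) ** 2 for pi in p)
    slope = sum((pi - p_bar) * (zi - z_bar) for pi, zi in zip(p, z)) / sxx
    intercept = z_bar - slope * p_bar
    residuals = [zi - (intercept + slope * pi) for pi, zi in zip(p, z)]
    s2 = sum(r ** 2 for r in residuals) / (k - 2)
    se_int = math.sqrt(s2 * (1 / k + p_bar ** 2 / sxx))
    return intercept, se_int, intercept / se_int


# Hypothetical log ORs and standard errors
effects = [-0.4, 0.1, -0.2, 0.3, -0.1, 0.05]
ses = [0.9, 0.7, 0.5, 0.6, 0.35, 0.3]
b0, se_b0, t_stat = egger_test(effects, ses)
print(f"intercept = {b0:.3f}, SE = {se_b0:.3f}, t(df = {len(effects) - 2}) = {t_stat:.2f}")
```

An intercept near zero is consistent with a symmetric funnel. A p value would be obtained from the t distribution with k − 2 degrees of freedom, which is omitted here to keep the sketch dependency-free.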

3 Results

3.1 Study selection and methodological quality assessment

In total, 7024 articles were identified through the initial database search based on titles and abstracts. After duplicates were removed, 5995 articles remained. After the screening and eligibility steps, 16 randomised controlled trials were included in the qualitative synthesis and quantitative analysis. There was no disagreement between reviewers in the study selection process. Figure 1 shows the PRISMA flowchart detailing the selection procedure. Excluded studies and the reasons for their exclusion are detailed in Supplemental Material 2.

Fig. 1

Flow diagram of trials selection based on PRISMA 2020 guidelines. *Consider, if feasible to do so, reporting the number of records identified from each database or register searched (rather than the total number across all databases/registers). **If automation tools were used, indicate how many records were excluded by a human and how many were excluded by automation tools. From: Page et al. (2021). For more information, visit: http://www.prisma-statement.org/

The PEDro scale results are shown in Supplemental Material 3. Thirteen of the included studies were classified as level I evidence (Lozano-Quilis et al. 2014; Hoang et al. 2016; Kalron et al. 2016; Calabrò et al. 2017; Peruzzi et al. 2017; Russo et al. 2018; Khalil et al. 2019; Munari et al. 2020; Ozkul et al. 2020; Tollar et al. 2020; Molhemi et al. 2021, 2022; Pagliari et al. 2021) and three as level II (Brichetto et al. 2015; Robinson et al. 2015; Yazgan et al. 2020). Most studies were single-blinded, with the assessor blinded to participant allocation. In addition, the RoB-2 overall judgement indicated some concerns for most studies, and only three studies (Robinson et al. 2015; Ozkul et al. 2020; Yazgan et al. 2020) had a ‘high risk’ of bias (Fig. 2). Disagreements between reviewers occasionally arose for domain 2, but consensus was always reached without involving the third reviewer.

Fig. 2

Cochrane risk of bias tool-2 summary

3.2 Study design and population characteristics

The main characteristics of the participants and the interventions are shown in Table 1. The randomised pooled population of the reviewed studies comprised 656 participants, with a mean EDSS score of 4.22 (95%CI 4.15–4.30) and a mean age of 45.12 years (95%CI 44.66–45.59); 65.57% of the population were female. All studies that specified the phenotype included patients with relapsing–remitting multiple sclerosis, whereas three studies did not specify the phenotype (Robinson et al. 2015; Kalron et al. 2016; Pagliari et al. 2021). Furthermore, eight studies (Lozano-Quilis et al. 2014; Brichetto et al. 2015; Hoang et al. 2016; Munari et al. 2020; Tollar et al. 2020; Yazgan et al. 2020; Molhemi et al. 2021, 2022) included participants with any type of multiple sclerosis (relapsing–remitting, secondary progressive, and primary progressive) without subgroup analysis.

Table 1 Characteristics of studies included in the systematic review

Concerning the immersion level of the virtual reality systems, 14 studies employed non-immersive virtual reality as the main experimental intervention, four of which used the Wii Fit system (Brichetto et al. 2015; Robinson et al. 2015; Khalil et al. 2019; Yazgan et al. 2020). Only two trials used fully immersive virtual reality (Kalron et al. 2016; Ozkul et al. 2020).

Most studies compared virtual reality-based balance or gait training with conventional balance training (n = 13, 81.25%) (Lozano-Quilis et al. 2014; Brichetto et al. 2015; Robinson et al. 2015; Hoang et al. 2016; Kalron et al. 2016; Peruzzi et al. 2016; Calabrò et al. 2017; Russo et al. 2018; Khalil et al. 2019; Ozkul et al. 2020; Molhemi et al. 2021, 2022; Pagliari et al. 2021), followed by robotic-assisted gait training (n = 3, 18.75%) (Calabrò et al. 2017; Peruzzi et al. 2017; Munari et al. 2020). The number of sessions ranged from 8 (Robinson et al. 2015) to 54 (Russo et al. 2018). The most common frequency was twice per week, and session duration ranged from 30 min (Hoang et al. 2016; Kalron et al. 2016) to 85 min (Calabrò et al. 2017).

The mean number of dropouts was 1.61 in the experimental groups and 1.88 in the comparator groups. The highest numbers of dropouts in the virtual reality groups were reported by Hoang et al. (2016) and Pagliari et al. (2021). The reasons for dropout reported in both groups were: difficulties reaching the research centre, transportation problems, scheduling problems, moving to another city, refusal to participate, personal or familial issues, lack of motivation or time, loss of data due to administrative problems, exacerbation of symptoms, disease relapse, work intensity, and illness/medical reasons/hospitalisation not related to multiple sclerosis. Three studies did not report any dropout events during the intervention or follow-up period (Brichetto et al. 2015; Calabrò et al. 2017; Russo et al. 2018).

3.3 Meta-analysis of proportions

A total of 18 arms (k) from 16 studies were included in the proportion and binary meta-analyses, since one of the randomised controlled trials had three study groups (Tollar et al. 2020). Of a total of 638 participants, 63 dropouts were reported. The forest plot showed an overall pooled dropout rate of 6.6% (95%CI 3.2–12.9%) without heterogeneity between studies (τ² = 1.18, Q = 10.07, df = 17, I² = 0%, 95%CI 0–50%, p = 0.90) (Fig. 3). The dropout rate for the virtual reality-based interventions was 5.7% (95%CI 2.3–13.6%), compared with 9.7% (95%CI 5.7–16.02%) in the comparator groups (Supplemental Material 4). Correspondingly, the retention rate was 94.3% for the virtual reality groups and 90.3% for the comparator groups. None of the prediction intervals calculated in the meta-analyses indicated that future studies would necessarily observe the same effects.

Fig. 3

Forest plot of dropout rate for all groups of studies

3.4 Binary meta-analysis (OR)

Overall, the odds of dropout were slightly lower in the virtual reality-based interventions than in the comparator groups, but the difference was not significant (OR = 0.89, 95%CI 0.64–1.24, p = 0.46). No significant heterogeneity between studies was found (τ² = 0, Q = 5.6, df = 17, I² = 0%, 95%CI 0–50%, p = 0.99) (Fig. 4). The prediction interval indicated that this result might not be replicated in future studies. A subgroup meta-analysis according to the immersion level of virtual reality was not carried out because the number of studies using immersive systems did not reach the required minimum of three studies.

Fig. 4

Forest plot of odds ratio comparing attrition from virtual reality intervention and other comparator interventions in people with multiple sclerosis to improve balance or gait

A post hoc sensitivity analysis using the L’Abbé and Baujat plots and influence graphs (Supplemental Material 5) showed that no included study markedly influenced heterogeneity or biased the pooled effect size, and no outliers were found. Additionally, no small-study effects or publication bias were detected in the contour-enhanced funnel plot (Fig. 5), the Harbord test (p = 0.37), or the Egger bias test (p = 0.34).

Fig. 5

Contour-enhanced funnel plot

3.5 Meta-regression

The meta-regression showed that type of intervention, number, frequency, and duration of sessions, weeks of intervention, EDSS score, multiple sclerosis phenotype, sex, and methodological quality were not significantly associated with dropout events. A detailed description of the analysis is shown in Table 2.

Table 2 Meta-regression analysis

4 Discussion

A total of 16 randomised controlled trials reporting dropouts were meta-analysed to calculate the overall pooled dropout rate of virtual reality-based interventions for improving balance and gait in patients with multiple sclerosis. The main clinical implication of our results is that virtual reality-based balance and gait training in people with multiple sclerosis was well accepted, with a low dropout rate and high adherence during the study period. Torous et al. (2020) suggested that retention observed in research contexts could change when experimental approaches are translated into clinical settings. This could be especially important for long rehabilitation programmes in chronic conditions. A recent study (Hortobágyi et al. 2022) reported a high adherence rate to a two-year maintenance programme including exergaming in people with multiple sclerosis; however, the sample size was very small, and more research on long-term adherence to virtual reality rehabilitation in this population is needed.

Adherence is one of the main challenges in rehabilitation, and the therapeutic approach to multiple sclerosis is no exception. Identifying rehabilitation therapies that achieve higher participant compliance is therefore vital (Arafah et al. 2017). If adequate adherence is not achieved, the effectiveness of rehabilitation may be limited and additional healthcare costs may be incurred (Jack et al. 2010; Room et al. 2021). Previous literature has proposed that virtual reality strategies achieve higher adherence in patients with neurological disorders (Asadzadeh et al. 2021; Dalmazane et al. 2021). Consistent with this, our results suggested a lower, although not statistically significant, dropout rate in virtual reality-based interventions, which would need to be confirmed with larger sample sizes. The prediction intervals support this caution, since they indicate that our findings could change with future trials. The recent systematic review of Bevens et al. (2021) analysed the dropout rate in people with multiple sclerosis who received digital health interventions and showed no significant differences between experimental and control comparators. Therefore, adherence to virtual reality or other technological approaches can be considered at least similar to that of other interventions.

During the screening process, several studies were discarded because dropouts were not mentioned. Although the CONSORT guidelines require complete reporting of data, many authors are unsure how to handle dropouts (Bell et al. 2013). To address this issue, the reporting of the number of dropouts and their reasons should be standardised, for example, by using the CONSORT flowchart for the study period. Further details on dropouts could also help decide which interventions to offer to whom (Wright et al. 2021).

Our meta-regression showed that type of intervention, number, duration, and frequency of sessions, weeks of intervention, disability score, phenotype, sex, and methodological quality were not predictors of dropout. Although a higher frequency of sessions appeared to favour dropout, this association was not significant. Similar results were obtained by Dennett et al. (2020), who found no relationship between the frequency of exercise-based sessions and dropout, although duration modified the likelihood of dropout. Although our protocol included an analysis according to the level of immersion, fully immersive and semi-immersive virtual reality were excluded from the moderator analysis because of the limited number of studies. We therefore suggest providing a specific dropout rate analysis once more studies use immersive virtual reality, since higher immersion and presence levels are expected to achieve higher treatment adherence (Rose et al. 2018; Dębska et al. 2019). Additionally, future studies should evaluate enjoyment and motivation with specific measurement scales, allowing researchers to determine whether motivation or enjoyment during the intervention predicts dropout or adherence in the target population.

According to the literature (Grover et al. 2021), adverse events due to treatment are one of the main causes of dropout. Nonetheless, we were unable to analyse them as a moderator of the dropout rate, since none of the included studies reported undesired effects of the virtual reality intervention. We considered two possible explanations for this absence: either participants did not experience adverse effects from the virtual reality-based intervention, or the authors chose not to report them. The latter explanation is supported by Phillips et al. (2019) and Pitrou et al. (2009), who described methodological weaknesses in the reporting of adverse events in randomised controlled trials that can lead to misinterpretation of intervention safety.

4.1 Strengths and limitations

This is the first meta-analysis to calculate the overall pooled dropout rate for innovative virtual reality-based interventions in patients with multiple sclerosis. The findings of this review could help future randomised controlled trials calculate their sample size to avoid dropout bias. Furthermore, no heterogeneity between the included studies was found. The sensitivity analysis did not identify any randomised controlled trial as an outlier that could strongly influence the overall effect size, and the funnel plot showed no evidence of publication bias.

The main limitation of this review was the small sample sizes of the included randomised controlled trials; a larger overall sample would make our results more reliable. Another issue was that many studies did not report detailed reasons for dropout. Furthermore, adverse events were not reported, so it was not possible to determine whether they moderate the dropout rate.

5 Conclusion

The overall pooled dropout rate of randomised controlled trials on virtual reality for balance or gait training in people with multiple sclerosis was 6.6%. Our analysis showed no significant difference in dropout rate between participants who received virtual reality-based interventions and those who received other comparators; however, the lower dropout rate in the virtual reality groups suggests that larger samples might reveal a significant difference in favour of virtual reality. The number, duration, frequency, and weeks of sessions, sex, age, phenotype, disability, and methodological quality were not identified as moderators of dropout. Adverse events were not reported by the included studies, making it impossible to analyse their influence as moderators.

Future randomised controlled trials should standardise the description of dropout causes and adverse effects of the rehabilitation treatments. Furthermore, the advantages of virtual reality, such as motivation and enjoyment, should be systematically assessed in clinical trials to determine whether these outcomes are indeed moderators of dropout and adherence.