1 Introduction

Sutherland described “The Ultimate Display” [146] as “a room within which the computer can control the existence of matter”, underlining the immense potential of technological innovation to enhance learning in almost any form of professional skills training. Teaching must therefore adapt to this new technology, unlike traditional oral-based education, which focuses mainly on abstract rather than practical learning and results in a weaker and less robust understanding of the topic [12]. However, Virtual Reality (VR) environments have long been excluded from educational settings because of the high cost of VR equipment, and over the past 50 years their use has been restricted to military applications and research institutes [162]. Throughout that time, research has focused on technological issues: the development of VR-environments and of the associated hardware and software [13, 162]. In parallel, educational researchers have described any educational experience that introduces the user to visual and auditory experiences as a “virtual world”. Reviews of these topics have examined learning methods that employ conventional computer graphics on a monitor or other 2D displays [72, 120]. This concept of virtual worlds is nowadays categorized as low-immersive VR.

Some 15 years ago, high-immersive VR emerged with the development of devices that surround the user with large 3D viewing areas, such as the Head-Mounted Display (HMD) and the Cave Automatic Virtual Environment (CAVE) [15]. These devices were accompanied by the first VR-environments applied to educational tasks in specific knowledge areas: mathematics, language, business, health, computer science, and project management [9, 37, 62]. The main reviews of these initial educational VR experiences outlined two guiding principles: 1) young people's fascination with new technologies, such as VR, suggests greater interest in learning in those environments [75]; and, 2) VR could help students to understand complex concepts visually [12] and reduce misconceptions [98].

This first generation of immersive VR devices was also applied to training. The high cost of VR equipment was no obstacle for the military, which exploited the effectiveness of simulation exercises: VR-based simulations offered a safe space to conduct exercises that would otherwise be risky and costly in real life [79, 109]. These devices were also tested in sports training [69, 99] and especially in industry, where new employees receive ‘risk-free’ training in a virtual manufacturing scenario [84]. Finally, medicine, and surgery in particular, is also considered a promising field for VR training [130].

At this stage in the incorporation of VR learning environments into traditional learning methods, a debate emerged over which procedures could best achieve the user's perception of presence in the VR-environment. This feeling of immersion and presence has been identified as a key factor for enhancing learning rates [98]. Presence might be defined as the user's immediate perception of “being there”, a feeling of existing inside the virtual environment [143]; it is therefore a very subjective experience. Immersion, in contrast, can be defined as the technological fidelity that the VR hardware and software can evoke [15], and it can be evaluated objectively. Immersion is therefore considered in this review to be a better key objective for VR experiences than presence.

However, immersion and presence have only recently become realistic objectives for VR experiences, thanks to the improvements in the quality of HMDs over the last five years and their significant reduction in cost (e.g. the launch in 2013 of the Oculus Rift™ DK1). Moreover, the second bottleneck for the large-scale development of VR-environments, the software tools, was eased with the launch of free versions of two powerful game engines: Unreal Engine™ and Unity™. These software tools have enabled the rapid development of user interactions with the VR-environment, opening the way towards the design of serious games in immersive VR-environments.

However, although the VR-environment produces the effect of immersion, a second element is required to achieve high learning rates: user interactivity with the VR-environment. Games are the natural way to achieve high levels of interactivity. Serious Games (SGs) are activities designed to entertain users in an environment in which they can also learn and be educated and trained in well-defined areas and tasks. Unlike traditional teaching environments, where the teacher controls the learning (i.e., teacher-centered), SGs present a learner-centered approach to education. The trainee feels in control of an interactive learning process in an SG, which facilitates active and critical learning [140]. Several reviews have described the use of SGs in education and training. Malegiannaki [90] analyzed the use of spatial games in formal education related to Cultural Heritage, concluding that many challenges remained in relation to effective storytelling and to evaluating the effect on student learning performance. Ibrahim [62] reviewed serious games in programming education, summarizing findings on initial user perceptions of games in terms of motivation and learning. In the case of training, some researchers [48] have pointed out that these experiences are most effective when they recreate situations that could not otherwise be experienced in real life, including ethical dilemmas and dangerous or even impossible situations in terms of time and space. However, all those reviews analyzed serious games that do not use immersive VR-environments, mainly because immersive VR devices have only very recently been launched.

While Virtual Reality Serious Games (VR-SGs) should improve user experiences and, therefore, knowledge acquisition, it is also clear that immersive VR-environments pose new questions on the best way to design efficient serious games for such environments. The main questions that present and future research will have to answer can be directly linked with the different stages of the definition of immersive VR-SGs shown in Fig. 1.

Fig. 1 Flow chart for the design and implementation of immersive VR-SGs for learning tasks

In the first stage, two key items should be clearly defined before creating immersive VR-SGs: the target audience and the application domain. There are four key objectives for a VR-SG: interaction, immersion, user involvement and, to a lesser extent, photorealism [127]. Each objective will play a different role depending on the target public and the application domain.

In the second stage of VR-SG design, the materials necessary for the immersive VR-SG are created and included in the VR-environment. Several questions arise at this stage: Which technologies are best suited to building the VR-environment? Which game design is best for a given application? If a game is to be a meaningful experience for players, it needs certain basic elements. Interactivity should therefore be designed with clarity, covering the required inputs and outputs; the short- and long-term goals that shape the player’s experience; a well-designed ramp for beginners to learn the ropes; and a game structure that offers genuine play, rather than quiz-style questions and answers.

Finally, the third stage consists of evaluating VR-SG performance. The evaluation should take four elements into account: 1) the key factors to be evaluated; 2) the way they are evaluated; 3) the number of individuals testing the serious game; and, 4) the existence or otherwise of a reference group. There is no clear consensus on how to evaluate serious games for educational and training tasks. In the case of computing education, for example, this has been stated explicitly [115]: “As a result, we can confirm that most evaluations use a simple research design in which, typically, the game is used and afterwards subjective feedback is collected via questionnaires from the learners”. The findings of Egenfeldt-Nielsen also showed that most educational games are evaluated in an ad-hoc manner, by administering the game to very small validation groups of end users and then collecting data, typically through a questionnaire [24].

Two final remarks should be made before closing this Introduction. First, this review addresses immersive Virtual Reality serious games, so immersion is a key factor in the research under analysis. Consequently, many of the articles identified in the first stage of the survey were excluded from subsequent analysis, because they referred to 2D virtual reality, far removed from the concept of immersion associated with the development of 3D HMDs.

Second, this review considers two different approaches to the learning process: the acquisition of new knowledge and the development of new skills. While the first has traditionally been seen as a combination of theory and problem-solving capability, the second has been directly related to practical skills and decision-making ability. However, there is no clear difference in the nature of the end result: learning. This review therefore considers both educational and training approaches to the learning process, although they are analyzed separately, because the VR-SGs described in the literature are conceived, designed and evaluated from different perspectives.

2 Survey

2.1 Methodology

The methodology followed in the literature review was composed of four stages, as shown in Fig. 2 (educational results in bold and training results in italics). First, the databases were searched with the keywords (“virtual reality” OR “head-mounted display”) AND (education OR learning) for educational papers, and (“virtual reality” OR “head-mounted display”) AND (training) for training papers. Two interdisciplinary research databases were used to ensure an exhaustive search: SCOPUS and Web of Science, both identified as suitable databases for serious games searches [24]. The search was conducted in July 2019. Some additional references cited in the selected literature were also included, following a snowball approach, when their titles clearly reflected their suitability for the survey. In addition, the survey was extended to industrial magazines, VR/AR associations and industry-oriented technical congresses (e.g. the IEEE International Symposium on Mixed and Augmented Reality), to identify industrial efforts to create VR simulators for training tasks. However, most of the research from those sources contained no quantitative evaluation and was therefore not considered in this survey: only 3 of the 52 articles identified from these sources could be added to the final survey.
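The two Boolean queries above can be made concrete as a simple predicate. The following is a minimal sketch, assuming that each record is reduced to a single string of title, abstract and keywords and that terms are matched as whole words, case-insensitively; the helper names (`classify_record`, `VR_TERMS`, etc.) are hypothetical, and SCOPUS and Web of Science each apply their own query syntax, field codes and stemming rules, which are not reproduced here.

```python
import re

# Hypothetical term lists mirroring the two queries described in the text:
#   ("virtual reality" OR "head-mounted display") AND (education OR learning)
#   ("virtual reality" OR "head-mounted display") AND (training)
VR_TERMS = ("virtual reality", "head-mounted display")
EDU_TERMS = ("education", "learning")
TRAIN_TERMS = ("training",)


def _contains_any(text: str, terms: tuple) -> bool:
    """True if any term occurs as a whole word or phrase (case-insensitive)."""
    return any(
        re.search(r"\b" + re.escape(term) + r"\b", text, re.IGNORECASE)
        for term in terms
    )


def classify_record(text: str) -> set:
    """Return the search categories ('education', 'training') a record matches.

    Assumption: `text` is the concatenated title, abstract and keywords.
    """
    categories = set()
    if _contains_any(text, VR_TERMS):  # the shared VR clause must match first
        if _contains_any(text, EDU_TERMS):
            categories.add("education")
        if _contains_any(text, TRAIN_TERMS):
            categories.add("training")
    return categories


print(classify_record("Virtual reality in education and training"))
```

A record can match both categories, which is consistent with the overlap between the education and training lists observed in the intermediate stages of the survey.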

Fig. 2 Scheme of the references survey process

After all duplicated papers had been filtered out, 6751 and 4432 articles were considered for the educational and training categories, respectively. Their abstracts were then read, and papers irrelevant to the objective of this review were removed. Most of the articles were excluded at this point, because the core of their work referred to 2D virtual reality, far removed from the concept of 3D immersion associated with the development of 3D HMDs. However, the search was not restricted to new 3D HMDs, so some articles on CAVEs and first-generation 3D HMDs were considered. Next, articles on VR solutions designed to enhance the recovery of patients from different illnesses and post-operative complications were filtered out, because their evaluation focused on health indicators rather than on learning and skills improvement. After this filtering process, 171 and 235 relevant articles remained in the education and training categories, respectively.

In the fourth and final stage, the full text of each remaining article was analyzed and the articles with no final-user performance evaluation of the virtual environment were filtered out. In all, 68 [1, 3,4,5,6,7,8, 10, 11, 18, 26, 28,29,30, 33,34,35,36, 40, 45, 46, 49, 51, 60, 63,64,65,66,67,68, 76, 80, 81, 83, 86,86,87,89, 94, 97, 101, 102, 104,104,106, 110,110,111,113, 117, 118, 121,120,123, 127,126,129, 132,131,134, 138, 142, 144, 149, 150, 153, 156, 160, 164] and 67 [2, 10, 14, 16, 17, 19, 21,22,23, 25, 27, 31, 38, 39, 41,42,43, 47, 50, 52,53,54,55,56,57,58,59, 61, 70,71,72,74, 77, 78, 82, 85, 91,91,93, 95, 96, 100, 103, 107, 108, 114, 116, 119, 125, 126, 131, 135,134,137, 139, 141, 145, 147, 148, 151, 152, 154, 155, 157,156,159, 161, 163] articles were retained for the education and training surveys, respectively, a good balance between the two. This balance was unexpected, because training is only one part of education as a whole, and no immediate explanation was found. Interestingly, other authors have found a similar balance between training and learning, for instance in serious games for software project management [24]. Although there was a substantial overlap between the articles of both categories in the earlier stages of the survey process, no manuscript was assigned to both categories at this final stage. The complete list of these manuscripts with their different classifications is provided in the supplementary material. The sample size in this review is comparable to those of reviews on similar topics, such as the 102-paper review of serious games for software project management [24] and the 129-paper review of empirical evidence on computer games [37]. It is also larger than other studies that analyzed virtual educational environments (53 papers) [98] and the effect of spatial games for cultural heritage (34 papers) [90].
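The screening funnel described in this subsection can be summarized as retention rates. The sketch below is a minimal calculation, assuming the reported counts form a simple funnel per category (de-duplicated records, articles kept after abstract screening, articles kept after full-text screening); the snowball and industry additions are not separated out in the text, so the ratios are approximate. The names `FUNNEL` and `retention` are illustrative only.

```python
# Counts reported in the text for each category:
# (after de-duplication, after abstract screening, after full-text screening)
FUNNEL = {
    "education": (6751, 171, 68),
    "training": (4432, 235, 67),
}


def retention(counts):
    """Return (abstract-stage %, full-text-stage %, overall %) retention rates."""
    initial, after_abstracts, final = counts
    return (
        100 * after_abstracts / initial,  # share kept by abstract screening
        100 * final / after_abstracts,    # share of those kept by full-text screening
        100 * final / initial,            # share of all records that reached the survey
    )


for category, counts in FUNNEL.items():
    abstract_pct, fulltext_pct, overall_pct = retention(counts)
    print(f"{category}: {abstract_pct:.1f}% kept after abstracts, "
          f"{fulltext_pct:.1f}% of those after full text, "
          f"{overall_pct:.1f}% overall")
```

Under these assumptions, about 1.0% (education) and 1.5% (training) of the de-duplicated records reached the final survey, with abstract screening removing by far the largest share in both categories.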

2.2 Data distribution

Some general ideas on VR-SGs can be directly extracted from the data on year of publication and the main congresses and journals in which the work was published.

Figure 3 shows the temporal evolution of the selected references. As expected, the launches of VR hardware and software have boosted the number of publications on these topics since 2015, and a progressive increase in such publications can still be expected in the short term, although 2018 was an exception to this trend. The low number of articles in 2019 is directly related to the date of the survey: it was conducted before the annual conferences on these topics and when only the first 2019 issues of the relevant journals had been published. Although the growth trend is more stable in the training field, this result could change in the short term, and further analysis of its evolution over the coming years will be needed to reach a firm conclusion.

Fig. 3 Temporal evolution of the publications on VR-SGs

Finally, Fig. 4 shows the distribution of the articles between journals and scientific conferences. The data suggest a preference for publishing training applications in journals, while educational applications are mainly presented at conferences. A deeper analysis of the preferred journals and conferences reveals the absence of any established publication forums for VR-SGs. The main congresses detected in the survey for educational applications were AHFE (Conference on Applied Human Factors and Ergonomics, 3 articles), CHI PLAY (Play, Games and Human-Computer Interaction, 2 articles), AVR (Conference on Augmented Reality, Virtual Reality and Computer Graphics, 2 articles) and EDUCON (IEEE Global Engineering Education Conference, Engineering Education Through Student Engagement, 2 articles). The main congresses for training applications were VAMR (International Conference on Virtual, Augmented and Mixed Reality, 3 articles) and MELECON (Mediterranean Electrotechnical Conference, 2 articles). Likewise, the preferred journals for educational applications in the survey were Behaviour & Information Technology (3 articles) and Virtual Reality (2 articles), while those for training applications were IEEE Transactions on Visualization and Computer Graphics (3 articles) and Mathematics, Science and Technology Education (2 articles). These major conferences and journals therefore accounted for only 29% and 26% of the articles in the survey, respectively. The main reason for this result is the novelty of the topics, which fall outside the scope of established journals with high impact factors in the Journal Citation Reports; moreover, the conferences on these topics are very recent.

Fig. 4 Distribution of articles on VR-SGs between journals and conferences

3 Analysis of the articles

This section arranges the results of both surveys by application domain and target public, technological implementation, game design, performance evaluation procedures and results. The aim of this analysis is to identify de facto standards, or differences, between the solutions proposed in both fields.

3.1 Application domain and target public

The target audience of the studies was classified into three classes: general public, students and professionals. Figure 5 presents the respective percentages of the articles in the survey that belong to those three classes. For a deeper analysis, the professionals were classified into four subclasses in the training case: teachers, health services, industry, and sports professionals.

Fig. 5 Target public of the VR-SGs

Three conclusions may be drawn at first sight from this figure. Firstly, around one fourth of the studies (22% for educational games and 25% for training applications) belong to the class “general public”. Most papers on VR-SGs for museums and other types of exhibitions, where the final user is unrestricted, belong to that class, as do the papers that study the technological issues of VR and SGs. Secondly, more than two thirds of the educational applications are focused on students at different levels, given the natural correlation between students and education. There are studies for all learning stages, from kindergarten to university, with a higher proportion focused on undergraduate students. A clearly lower proportion of students is found in the training survey; most of these studies refer to medical applications and focus on training students in different hospital operations, see Fig. 6. Thirdly, almost half of the VR-SGs for training are specifically designed for professionals, mainly in industry and medicine, and less so in educational institutions and sports. It is interesting to note the small niche of VR-SGs for training teachers (e.g. developing skills to detect bullying and to improve presentation skills).

Fig. 6 Main 3D displays used in VR-SGs

Surprisingly, only medicine presents a significant number of articles in both categories (training and education). Medicine therefore appears to be a more mature domain for VR-SGs, as a broader range of final applications has been studied in that area. Unlike medicine, sports and industry only present training applications. In education, the target is mainly either students or the general public, with undergraduate students playing a central role. Much remains to be done to find the best orientation of VR-SGs for the various final applications, as the immediate pairings of ‘education-learning’ and ‘skills-training’ have only recently been applied extensively.

3.2 Technological implementation and game design

Different technical solutions can be selected for the same application, all the more so given the diversity of VR-SG applications and their very different target publics, as observed in the previous subsection. The technical solution is usually based on three choices: the visualization display, the game engine, and the game typology. Figures 6, 7 and 8 show, respectively, the selected HMDs, game engines and serious game typologies found in the survey for both training and educational applications.

Fig. 7 Most popular game engines for VR-SGs

Fig. 8 Typology of serious games

Figure 6 shows the selected HMDs for training and educational applications. The two branded HMDs found in the survey, Oculus Rift (in its three versions) and HTC Vive, are the most widely used, together with cardboards connected to smartphones. The oldest articles in the survey used the Sony HMZ-T1, NVIS nVisor SX111 and eMagin Z800 HMDs; these are clustered in Fig. 6 under the class “First generation of HMDs”.

Figure 6 shows that Oculus Rift is the most common HMD (>40% of the cases), while HTC Vive is used in around 25% of the applications. The remaining applications (less than 35%) use: 1) low-immersion solutions such as cardboards or Gear VR; 2) very expensive solutions (e.g. CAVEs); or, 3) self-designed devices, or devices not stated in the article.

Figure 7 shows the selected game engines for both training and educational applications. The game engines found in the survey over the last 3 years, Unity 3D and Unreal Engine, were the most widely used in the gaming industry at the time of this research. XVRtechnology, WorldViz and Ogre3D were mainly used in older works and are clustered in the class “Old game engines”. Figure 7 shows that Unity 3D is the preferred solution, while no other game engine exceeds 15% of mentions in the references. The most likely reasons for the widespread use of Unity 3D are its low cost and the ease with which it can be integrated with HMDs. In addition, a quarter of all the studies (25%) do not state which game engine was used. These studies usually omit any reference to the development of the VR-SG and limit themselves to its application. These VR-SGs were developed by external providers, so it may be assumed that the researchers were only interested in applying the VR-SG to certain well-defined tasks and in its effects. Finally, although the difference between educational and training solutions was not significant, the educational applications made greater use of Unity 3D than the training applications. All the articles describing the use of Unreal Engine were published over the past three years, a period that coincides with the engine becoming free to use, which may point to stronger future growth for this engine, which stands out for its photorealistic capabilities, a key factor for certain training SGs [30].

Figure 8 shows the game typologies for both training and educational applications, divided into four classes: explorative interaction, explorative experience, interactive experience and passive experience. Explorative interactions are games that allow the user to explore and to interact freely with the virtual environment. A more restricted solution is the explorative experience, which allows free exploration of the virtual environment but no direct interaction. Interactive experiences permit user interaction with the environment, but no free movement through it. Finally, the most restricted solution is the passive experience, in which user interactivity and movement are very limited.
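Read this way, the four classes are the combinations of two design properties: whether the user can move freely through the environment and whether the user can interact with it. The sketch below encodes that reading; it is an illustrative simplification, assuming each property can be reduced to yes/no (the text describes passive experiences as having "very limited", rather than no, movement and interaction), and the names `Typology` and `classify` are hypothetical.

```python
from enum import Enum


class Typology(Enum):
    """The four VR-SG classes used in the survey."""
    EXPLORATIVE_INTERACTION = "explorative interaction"
    EXPLORATIVE_EXPERIENCE = "explorative experience"
    INTERACTIVE_EXPERIENCE = "interactive experience"
    PASSIVE_EXPERIENCE = "passive experience"


def classify(free_movement: bool, interaction: bool) -> Typology:
    """Map the two (simplified, yes/no) design properties onto a class."""
    if free_movement and interaction:
        return Typology.EXPLORATIVE_INTERACTION  # explore and interact freely
    if free_movement:
        return Typology.EXPLORATIVE_EXPERIENCE   # explore, but no direct interaction
    if interaction:
        return Typology.INTERACTIVE_EXPERIENCE   # interact, but no free movement
    return Typology.PASSIVE_EXPERIENCE           # movement and interaction limited


print(classify(free_movement=False, interaction=True).value)
```

For example, `classify(free_movement=False, interaction=True)` returns `Typology.INTERACTIVE_EXPERIENCE`, the class the survey found to be the most common, especially in training.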

The most common solution, especially for training, is the interactive experience, as shown in Fig. 8. This solution is more affordable than explorative experiences, which require the complete development of the VR-environment. In interactive experiences, only the areas the user can access have to be developed in high resolution, while any secondary area can be roughly modelled, saving costly human and computational resources [29]. Along the same lines, the number of explorative experiences is very limited, due to their high cost. Moreover, explorative experiences have no clear use for either learning or training, because the user has no clear objective in the VR-environment; they are therefore mainly used as complements to, rather than core resources of, the educational process. There are very few passive experiences, and they are clearly associated with the use of cardboards (see Fig. 9), which can still offer useful experiences despite their technical limitations. Although these solutions are not very common, they are presented here because of their very low economic cost, both to create and to implement in the classroom.

Fig. 9 3D display type distribution for each type of VR experience

The analysis of Fig. 8 leads to the conclusion that the interactive experience is the preferred VR-SG typology for training and education, due to its balance between cost, technological development, immersive feeling, and potential to stimulate learning and skills improvement. Explorative experiences might be more suitable for research tasks and, although still too expensive for mass use, show promising potential for future growth.

Figure 9 presents a detailed analysis of the correlation between the different HMDs and the VR-SG typologies, comparing the use of each kind of 3D display in the different typologies of VR experience. It shows that explorative and explorative-interaction VR experiences are only developed for CAVEs and high-quality HMDs such as Oculus Rift and HTC Vive, because of the higher computational capabilities of the workstations that drive these devices. In contrast, passive experiences, as mentioned, are clearly associated with cardboards, which still provide useful experiences despite their technological limitations.

3.3 Performance evaluation

As previously outlined in the Introduction, one of the most contentious issues in the use of serious games and VR-environments for education and training is the evaluation of the learning experience. Four different elements should be considered for this evaluation: 1) the key factors that should be evaluated; 2) the way they are evaluated; 3) the number of subjects that test the serious game; and, 4) the existence or otherwise of a reference group.

Regarding the first point, five different key factors were identified in the surveys: user satisfaction, learning rate, skills improvement, immersion and usability. Figure 10 shows the proportion of studies that evaluate each of these key factors. User satisfaction is not included in this figure, because all the selected articles in the survey evaluated it alongside other key factors. As with the target audience, a significant difference between training and educational applications was noted: the educational applications were mainly focused on knowledge acquisition, while the training applications were designed for skills improvement. Despite this clear trend, some educational applications also focused on skills improvement and some training applications on knowledge acquisition. In any case, the evaluation of skills improvement and knowledge acquisition is balanced in the survey, which raises a new question: are VR-SGs equally good for both tasks, or is this just a consequence of a survey balanced between training and educational applications? Finally, studies focused on immersion and usability were very rare, although both factors could play a major role in the learning rate, as previous studies have stated [32]. It may therefore be concluded that the researchers mostly considered two key factors: user satisfaction and a key factor directly related to the objective of the experience (either learning rate or skills improvement). Other key factors such as immersion and usability, which are directly correlated with a successful experience, were seldom considered.

Fig. 10 Key factors evaluated in VR-SG performance

In addition, the type of evaluation can generate different results if it is not performed in a standard way. Figure 11 shows the different methods used to measure the key factors: questionnaires, user interviews, data recordings, and direct user observation. The questionnaire is the most common way of evaluating knowledge acquisition in educational applications, whereas the training applications showed a balance between questionnaires and metrics on user experience extracted directly from recorded data. The other two types of evaluation (user interviews and direct observation of the user) were very rarely used, as was the simultaneous use of more than one type of evaluation. In the case of recorded data, the most common indicators were: 1) physiological data directly correlated with the proposed task, mainly in medical applications; and, 2) the game score in educational applications. This group of metrics appears to be a more objective source of information than questionnaires.

Fig. 11 Type of evaluation in VR-SG experiences

Finally, the number of subjects that test the serious game determines the statistical weight of the conclusions of each study. Figure 12 shows the size of the target groups that tested the VR-SGs. The educational studies tend to use larger target groups than the training studies, perhaps because more students than professionals were available during the evaluation stage (e.g. a degree module can have more than a hundred students in a small-to-medium-sized university, while a medium-sized hospital may have fewer than 20 cardiovascular surgeons). In any case, the target groups were very small compared with other educational applications, such as SGs for teaching computing [115], where the mean group size was around 50 students. One reason might be the high cost of VR hardware compared with more traditional learning methodologies.
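To see why small groups limit the weight of the conclusions, a standard a priori power calculation can be used. The following is a minimal sketch using the normal approximation for comparing the means of two independent groups (for example, a VR-SG group and a reference group); the significance level (0.05), power (0.80) and standardized effect sizes (Cohen's d) are conventional assumptions, not values taken from the surveyed studies. It relies on `statistics.NormalDist`, available from Python 3.8.

```python
import math
from statistics import NormalDist


def n_per_group(effect_size: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate subjects per group for a two-sided, two-sample comparison of means.

    Normal approximation: n = 2 * (z_{1 - alpha/2} + z_{power})**2 / d**2,
    where d is the standardized effect size (Cohen's d).
    """
    z = NormalDist()                      # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)    # critical value for a two-sided test
    z_beta = z.inv_cdf(power)             # quantile matching the desired power
    return math.ceil(2 * (z_alpha + z_beta) ** 2 / effect_size ** 2)


for d in (0.2, 0.5, 0.8):  # conventional small, medium and large effects
    print(d, n_per_group(d))
```

Under these assumptions, a medium effect (d = 0.5) requires about 63 subjects per group (an exact t-test calculation gives slightly more), and even a large effect (d = 0.8) needs about 25 per group, so a study with fewer than 20 participants per group can only reliably detect quite large effects.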

Fig. 12 VR-SG evaluation group sizes

3.4 Results of the performance evaluation

All the articles under analysis share one conclusion: user satisfaction is higher with the VR-SG experience than with other learning methodologies. This conclusion supports the guiding principle that higher learning rates and skills improvement can be expected from VR-SGs (implying greater engagement, interest and motivation) than from traditional learning and training methods. However, this line of reasoning may only hold in some cases, and every scenario should be scientifically validated.

Following this first general conclusion, each article discusses the pros and cons of the selected technology and methodology for the corresponding final application, and it is from this discussion that the real value of each article can be understood. Table 1 shows the main conclusions of the articles (excluding the common conclusion of increased overall satisfaction with the VR experience). The first three rows refer to positive results: VR-SGs increased the learning rate or improved certain skills compared with other learning or practice techniques. Studies with positive results were classified at three levels: item 1, studies that provided well-justified conclusions; item 2, studies that showed preliminary results; and item 3, studies that showed potential results without sufficient justification. This three-level classification took into account the size of the target audience and the existence of a reference group taught or trained with a different methodology. These three rows (items 1 to 3) account for 75% and 86% of the studies on education and training, respectively. Therefore, most of the studies concluded that VR-SGs are a suitable tool for both educational and training objectives, regardless of the technical solution.

Table 1 Results of performance evaluation

Support for the use of VR-SGs in education and training was not forthcoming in all cases: no clear advantage over traditional methodologies was observed in 6% and 5% of the studies on education and training, respectively. Item numbers 4 to 6 of Table 1 show the percentage of studies in which the reference and VR-SG groups achieved the same performance level (item number 4), those in which the VR-SG group achieved worse results (item number 5) and those that reached no conclusion (item number 6), mainly because of weaknesses in the experimental design. The tasks proposed in these VR-SGs should be analyzed in detail to understand those negative conclusions. In the educational field, two kinds of VR-SGs showed lower learning rates: those that conveyed supplementary medical knowledge to undergraduate students and those designed to impart abstract scientific concepts from Bachelor's degree curricula. Even though these studies found lower learning rates than with traditional teaching methodologies, they also identified higher levels of motivation, engagement and interest among the students. In training, smaller skills improvements were observed with VR-SGs than with 2D-screen simulators in the case of simulators for driving, navigation, and pedestrian behavior. These smaller improvements might be due to users' limited experience with HMD setups. Therefore, the use of VR-SGs still has to be optimized for very abstract concepts and for skills that require extensive movement within a 3D environment. Finally, around 10% of the studies (shown in Table 1 and Fig. 11) focused on the evaluation of usability and immersion without measuring any learning or training goals.

Building on this analysis, some conclusions can be outlined on VR-SG experiences and their impact on training and education. Nevertheless, the marked differences between the target audiences and the fields of application of the surveyed papers make statistical conclusions on these issues difficult. Regarding educational impact, most research works pointed (in order of importance) to: 1) the main advantage of these solutions for communicating knowledge that is acquired visually; 2) greater student motivation when working in a VR-environment rather than in a traditional one; and, 3) the synergies with traditional teaching methodologies, with each methodology focused on different learning topics (e.g. traditional teaching can reinforce the relationships between the different concepts presented in VR-environments through extensive student discussions moderated by the teacher).

Regarding the impact on training, most studies pointed out (in order of importance): 1) the very attractive cost-effectiveness of VR-SG solutions (highly accurate learning, short learning times, high visualization and understanding…); 2) the immediate transfer of behavioral skills acquired in VR-environments to the real world; and, 3) the potential to enhance learning skills in a risk-free environment. Finally, research from both fields has noted that the impact on training is often measured among final users whose experience of VR-environments and interfaces is very limited. These researchers expect the impact of VR-SGs to become much greater in the short term, as these devices permeate daily life and final users become familiar with them before any learning or training experience. The same argument (low user familiarity with VR devices and interfaces) was also put forward in the studies with negative results for VR-SG solutions as a possible explanation for their poor performance.

4 Future research lines

Several future research lines have been proposed in the articles included in the two surveys: some are presented directly in this section and others were identified in the discussion of the "Results" section. In addition, the analysis of the surveys, presented in Sections 3 and 4, raises some open questions.

One of the most frequently requested improvements in the survey is the use of robust evaluation methods that will increase confidence in the results. This recommendation already appeared in the first reviews of Virtual Reality applied to teaching, ten years ago [98]. In many cases, the studies used no reference group at all, because they drew no comparison between the performance of their VR-SGs and other learning methodologies. Moreover, most of the studies that did include a reference group tested the VR-SGs on target and reference groups of very limited size. Enlarging both groups would therefore be advisable in the future, to reach conclusions with some degree of statistical significance. This lack of comparison and the limited size of the test groups are also mentioned in similar reviews on the educational use of video games [44], SGs for learning software project management [24], and spatial games for Cultural Heritage topics [90]. In addition, most studies used only one of the following evaluation procedures: questionnaires, user interviews, data recording, and direct user observation. Combining two of these procedures, especially questionnaires and indicators extracted from recorded data, would also increase confidence in the results, particularly if standardized questionnaires were developed. This strategy would increase the validity and reliability of the conclusions, as other authors have pointed out [115]. As for the indicators taken from recorded data, new indicators directly connected to learning rates need to be defined. Up to now, the proposed indicators have only shown a solid relation with task performance in medical applications, while the SG score is the only indicator considered in educational applications.
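To illustrate why small target and reference groups limit statistical significance, the following Python sketch estimates how many participants each group would need for a two-sided comparison of mean scores between a VR-SG group and a reference group. It uses the standard normal-approximation formula n = 2(z_{1-α/2} + z_{1-β})² / d², where d is the standardized effect size (Cohen's d). The significance level α = 0.05, the 80% power and the effect sizes tried are illustrative assumptions, not values reported by the surveyed studies, and the function name is hypothetical.

```python
import math
from statistics import NormalDist


def required_n_per_group(effect_size: float, alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate participants needed in EACH of two groups (VR-SG vs. reference).

    Normal approximation for a two-sided, two-sample comparison of means:
        n = 2 * (z_{1 - alpha/2} + z_{power})**2 / d**2
    where d is the standardized effect size (Cohen's d). Illustrative only.
    """
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value of the two-sided test
    z_power = z.inv_cdf(power)          # standard normal quantile for the desired power
    n = 2 * (z_alpha + z_power) ** 2 / effect_size ** 2
    return math.ceil(n)  # round up to whole participants


if __name__ == "__main__":
    # Small, medium and large effects in Cohen's conventional terms.
    for d in (0.2, 0.5, 0.8):
        print(f"d = {d}: about {required_n_per_group(d)} participants per group")
```

Under these assumptions, detecting even a medium effect (d = 0.5) would require roughly 63 participants in each group, which is larger than the mean group size of around 50 students reported for SGs for teaching computing [115] and therefore well above the smaller groups typical of the VR-SG studies. A t-test-based power analysis would give slightly larger figures; the normal approximation is used here only because it needs nothing beyond the standard library.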

Moreover, although four different key factors (learning rate, skills improvement, immersion and usability) were identified in this review, usually only one of them was measured in each of the studies under analysis. Study cases that evaluate up to three of them, combining learning improvements, immersion and usability, would be of great interest. In this way, new conclusions could be reached on the correlation between the design parameters of VR-SGs and the learning goals, as other authors have outlined for similar tasks, such as spatial games for cultural heritage [90] and ball-based sports improvement [99]. Design strategies for VR-SGs may also be identified in this way. For instance, VR-SGs still have some way to go before they are optimal for teaching very abstract concepts and for training skills that require complex movements in a 3D environment. Along those lines, studies are needed that compare VR-SG efficiency between final users with extensive video-gaming experience and users whose interests are unrelated to such games.

The two surveys raised some open questions on the best design strategies for VR-SGs for different learning objectives and final applications. First, are VR-SGs equally efficient for knowledge acquisition and for skills improvement? In those reviews, the VR-SG applications are balanced between skills improvement and knowledge acquisition, although there was no clear evidence that VR-SGs were equally effective at both tasks; this conclusion arises from the balanced structure of both surveys. Second, has the best design of VR-SGs already been identified for each type of final application? Very few VR-SGs have been designed for skills improvement in education or for knowledge acquisition in professional tasks (in industry, sports or medicine). In other words, there are very few applications in some fields where VR-SGs might be very effective, but where such applications are not so immediate or expected. Imagination and open-mindedness will therefore be required to find the best design of VR-SGs for many final applications. Third, should VR-SGs be embedded in a much lengthier learning process? Nowadays, VR-SGs are presented as isolated learning experiences, in which previously acquired knowledge can be applied to new problems and exercised in new contexts, thereby motivating students to seek further information. However, they are neither linked to other learning methodologies nor embedded in a broader learning process in which their main roles are defined.

The VR-SGs analyzed in this study were also subject to strong budget limitations. Up to now, user satisfaction with these experiences has been high, certainly due in part to their novelty. In the near future, a broad offer of commercial VR games will make end-users more demanding about VR-SG quality. Low-cost methodologies that deliver high visual quality in the design of VR-environments will therefore be a clear requirement. Along the same lines, VR-SGs based on explorative interaction experiences have so far been very rare, due to their higher costs. Nevertheless, these experiences might provide higher learning rates than other VR-SG typologies, and their strong growth potential should be studied.

Budget limitations have another consequence for the development of VR-SGs: VR-experiences tend to be very short, and short exposure to knowledge clearly limits the learning rate [124]. Short viewing times were expected in the past, due in part to the immaturity of HMD technology, which caused VR sickness syndrome [20]. But those problems now appear to have been resolved with the new generation of HMDs and new strategies for user interaction with the VR-environment [29]. Moreover, if longer VR-experiences are developed, learning time can be considered a key factor, and effective time ranges can be determined for different learning tasks. However, lengthier VR-SG experiences will depend on two new requirements: 1) a multidisciplinary team with specific skill sets, which most of the academic research groups working on these issues lack; and, 2) the development of VR-SGs with rich storytelling clearly oriented towards the final objective of the learning experience. The absence of oriented storytelling is especially clear in the 10% of studies that concluded that VR provided no improvement, where no clear learning objective could be identified in the VR-SGs. The same weakness was also mentioned in the context of spatial games for teaching Cultural Heritage [90].

Finally, Fig. 13 presents a visual summary of the main characteristics of the immersive VR-SGs and their applications collected in the survey, for both education and training tasks. Each of the largest circles is split into four quarters, one for each characteristic of the VR-SGs: target audience, type of game, type of evaluation, and key factors considered. The area of each smaller circle is proportional to the number of papers in each category. The color coding is as follows: red refers to the most common solution nowadays, grey to secondary solutions, and yellow to the solutions that appear most promising in the near future.

Fig. 13 Present and future of immersive VR-SGs

In the field of education, most of the target audience are students, especially university students, perhaps because VR-SGs are easily accessible through university research groups. Interactive experiences evaluated by means of questionnaires, through which knowledge acquisition can be ranked, are perhaps the most balanced means of assessment. However, immersive VR-SGs are expected to develop very differently in the near future, once they enter mass production and become affordable products: significant growth is expected in applications for primary schools and for the general public. VR-SGs will be explorative-interactive experiences, due to their greater effectiveness with different audiences, and their evaluation will include additional key factors, especially immersion, using various evaluation procedures, from questionnaires to recorded data on personal performance throughout the experience.

With regard to training courses, most target audiences are industrial workers, perhaps due to the high budgets in this sector for training new employees and the pressing need for risk prevention in the workplace. In this field, the interactive experience evaluated by means of recorded data, through which skills improvement can be measured, appears to be the most balanced solution. However, significant growth of applications for both students and teachers is likely in the near future: VR-SGs will become explorative-interactive experiences, and their evaluation will include more key factors, especially complex skills performance and immersion, using different evaluation procedures, from questionnaires to recorded data.

5 Conclusions

Immersive Virtual Reality Serious Games, if they are not already, will soon be capable of changing the way we perform many learning and training tasks. The technology, and therefore the potential of both presence and immersion to boost VR learning processes, is advancing at a rapid pace. Nevertheless, a lot of research work remains to be done before these changes can be introduced at all stages of a learning procedure, from design strategies to the evaluation of key factors. In this review, 86 articles on VR-SGs for education or training have been analyzed. Thousands of papers that might appear to be related to immersive VR-SGs are stored in the main scientific databases. However, the sample is limited because most of those papers either refer to non-immersive solutions, such as 2D virtual reality worlds, or do not include a performance evaluation of the VR-environment with final users. Evaluation therefore remains a critical issue for reaching sound conclusions on learning rates. The survey analysis has resulted in the following conclusions:

  • The launch of new high-quality affordable hardware and software media for VR has, since 2015, boosted the number of publications on these topics. A progressive short-term increase in such publications can still be expected. Although there is a lack of well-established publication forums for VR-SGs, there is a preference for training applications to be published in journals, while educational applications are mainly presented in conferences.

  • VR-SG applications that involve learning and knowledge dissemination have, up until now, been developed for educational purposes, while the applications for industry and sports are still restricted to skills training. Some niches have been identified for VR-SGs used for training at educational institutions, such as raising awareness of bullying and motivating presentations for teachers. Medicine seems to be a very mature sector, in which both kinds of applications (skills improvement and knowledge acquisition) have been developed for hospital staff. Finally, important work remains to be done in the sports and industry sectors to develop educational VR-SGs that will help professionals acquire the knowledge they need.

  • Oculus Rift was preferred over HTC Vive as an HMD, especially in education, perhaps because of its lower price and easier configuration. On the other hand, HTC Vive was slightly preferred for training, probably because of its better capabilities for video games of the explorative interaction type.

  • Unity 3D was the preferred game engine, perhaps due to its reliable documentation and easy integration with HMDs. Unreal Engine, although used by a minority, was slightly more common in training applications. One reason might be that Unreal Engine renders more realistic virtual environments than Unity 3D, a key factor for certain VR-SGs applied to training.

  • The interactive experience is the preferred VR-SG type for training and education, due to its balance between cost, current technological development, sense of immersion and the opportunities it gives users to learn and improve their skills. Explorative experiences might be more suitable for research tasks. Finally, passive experiences, although very inexpensive, are very limited and rarely achieve significant learning and skills improvements.

  • Two key factors were usually considered: user satisfaction and an indicator related to the objective of the experience (whether learning rate or skills improvement). Only rarely were other key factors such as immersion and usability considered. Key factors directly related to the user experience should be considered, to assure the success of the VR-experience, and their correlation with the learning rates should be measured.

  • Explorative and explorative interaction VR-experiences were only developed for CAVEs and high-quality HMDs, because of the higher computational capabilities of the workstations that control these devices. In contrast, passive experiences were clearly linked to the use of cardboard viewers, because of their technological limitations.

  • Four different types of evaluation systems were found in the survey, although only two played a main role: questionnaires and recorded data. Questionnaires were the most common solution to evaluate knowledge acquisition in educational applications. In training applications, the use of questionnaires was balanced by metrics from the recorded data that were directly related to the user experience. Only very rarely were two types of evaluation procedures used in the same evaluation process.

  • The target audience was usually very small, probably due to the high cost of the hardware compared with more conventional teaching solutions. The reference group, if one existed at all, had the same limitation, which hindered rigorous conclusions from those studies.

  • A common conclusion in all the articles that were surveyed was the higher user satisfaction with the VR-SG experience than with other learning methodologies. This conclusion was used to justify higher learning rates or skills improvement with VR-SGs rather than with traditional learning and training methodologies.

  • Only 30% of the studies really demonstrated that VR-SGs enhanced learning and training in their respective domains, while no clear advantage was observed in 10% of the studies with regard to the use of VR-SGs compared with conventional methodologies. This result shows that VR-SGs are still a very open research topic for learning and training.

  • Nowadays, most of the final users enjoy the experience, but are not sufficiently familiar with the interfaces to benefit from the full potential for learning and training. The design of VR-SGs should therefore include an extensive pre-training stage, in which students gain sufficient skills through their interaction with the VR-environment.

The proposed lines of future research lead us to suggest that studies of immersive VR-SGs will measure many key factors of different kinds within large user groups, compared against a significant reference group. These experiences will belong to the explorative interaction category and will be systematically integrated into standard learning programs. Finally, some of the most promising VR-SGs will belong to fields of application where their potential effectiveness is high, even though they are rarely employed there nowadays.