Virtual-reality (VR) simulators are widely implemented in laparoscopic surgical training programmes to train the psychomotor skills associated with this kind of surgery [1]. By training basic skills on a virtual-reality simulator, the first part of the learning curve of laparoscopic surgery is moved out of the operating room and into the skills lab. Training basic laparoscopic skills in a skills-lab setting has been shown to improve performance in the operating room [2, 3].

Different simulators have been produced and validated for training of basic laparoscopic skills [4, 5]. However, the optimal implementation of simulators in training programs remains a topic of discussion and investigation.

Because European legislation has reduced trainee working hours and the rising use of healthcare facilities has increased workload, training time needs to be used as efficiently as possible [6]. It is therefore important to identify the most beneficial skills-lab training time, training schedule and training program. Research has shown, for instance, that training is more effective when it is distributed over multiple days [7, 8]. Recent literature also suggests that the optimal endpoint for simulator training is the attainment of a predefined level (criterion-based training), rather than the completion of an arbitrary number of procedures, task repetitions or hours using the simulator (time-based training) [9, 10]. Criterion-based training is also thought to boost resident motivation [11].

While the performance and motivational benefits of criterion-based VR simulator training have been demonstrated in previous studies, the training time benefits associated with such training are unclear [9–11].

The purpose of this study is to compare criterion-based training with time-based training to investigate whether criterion-based training is better than time-based training with respect to training outcome, transferability of skills, skills retention and training time.

Methods and materials

Protocol

In this study, 34 medical interns completed a simulator training program of four training sessions within 1 week (one session per day). In the introduction to the study, the participants were told that the researchers were not affiliated with the manufacturer of the simulator and that all data would be analysed anonymously. All participants (N = 34) gave informed consent, after which they started the study by filling out a questionnaire about demographics and prior experience with laparoscopy or laparoscopic simulation (Fig. 1). Subsequently, the participants watched a demonstration video about laparoscopic simulation and the use of the tools. They all started on the ProMIS I or III augmented-reality (AR) simulator (Haptica, Dublin, Ireland) to determine a baseline performance level. The simulator displayed a demonstration video before each task and gave step-by-step verbal explanation during the training. All participants performed a translocation task and a sharp dissection task twice each. The first repetition served to familiarise the participants with the simulator; the second repetition was used to determine the baseline performance level.

Thereafter, all participants received the same introduction to the LAP Mentor II (Simbionix Corp., Cleveland, USA) simulator by means of three informative posters. The participants performed the clipping and grasping task (task 5) and the cutting task (task 7) on the LAP Mentor. The second repetition on day 1 on the LAP Mentor was used to determine a baseline performance level, and the last repetition of each task on day 4 was used for the post-test. After training on day 4, a post-training performance level was established on the ProMIS by performing the same translocation and sharp dissection tasks as on day 1, twice each (Fig. 1). The level of retention was established 1 week after training by performing the two tasks on the LAP Mentor and the two tasks on the ProMIS.

The participants, 34 medical interns in total (21 from Eindhoven, The Netherlands, and 13 from Athens, Greece), were randomly allocated to one of two groups. In the first group (group T, N = 17) the training was time based. Participants in group T performed the clipping and grasping task and the cutting task on the LAP Mentor for a fixed time period (Fig. 1). They completed four training sessions within 7 days on the LAP Mentor (180 min in total). A 4-day training program with sessions of at most 1 h was chosen to ensure that the participants would overcome the initial learning curve [12] and that some overtraining would take place. We divided training over multiple days to improve training performance [7] and to avoid exceeding one training hour a day, the estimated maximum time available alongside an intern’s mandatory clinical attendance.

Fig. 1 The study protocol

The second group (group C, N = 17) trained on the LAP Mentor until their performance matched specific predefined performance criteria (Table 1). The criteria used in this study were derived from the performances of experienced surgeons [13]. When the participants in group C had achieved the criteria on a task twice, they could stop training on that task for that day. On each subsequent training day, they again trained until they achieved the criteria.
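The stopping rule for group C can be sketched as follows. This is a minimal illustration, not the study's software: the metric names, the threshold values and the attempt data are hypothetical (the actual criteria are those of Table 1), and it assumes that the two successful repetitions do not have to be consecutive, which the protocol does not specify.

```python
from dataclasses import dataclass


@dataclass
class Criterion:
    """Hypothetical pass thresholds for one task (not the Table 1 values)."""
    max_time_s: float
    max_path_length_cm: float


def meets(criterion, time_s, path_length_cm):
    """A repetition passes when every metric is at or below its threshold."""
    return (time_s <= criterion.max_time_s
            and path_length_cm <= criterion.max_path_length_cm)


def repetitions_until_done(criterion, attempts, required_passes=2):
    """Return the repetition at which the trainee may stop for the day.

    `attempts` is a sequence of (time_s, path_length_cm) tuples in the
    order they were performed. Returns None if the criteria were not
    met `required_passes` times within the given attempts.
    """
    passes = 0
    for n, (time_s, path_length_cm) in enumerate(attempts, start=1):
        if meets(criterion, time_s, path_length_cm):
            passes += 1
            if passes == required_passes:
                return n
    return None


# Made-up example: repetitions 2 and 4 pass, so training stops after 4.
example_criterion = Criterion(max_time_s=75, max_path_length_cm=320)
example_attempts = [(95, 410), (70, 300), (82, 350), (58, 280)]
print(repetitions_until_done(example_criterion, example_attempts))
```

In a time-based schedule the same loop would simply run until the session time had elapsed, regardless of how many repetitions passed.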

Table 1 Description of tasks and criteria

Equipment

The LAP Mentor is a VR-based laparoscopic training system. The software of the LAP Mentor II offers a variety of basic and procedural tasks in a VR environment to train different laparoscopy skills. After the performance of each task, the software provides numerical scores. In this study two basic tasks were used: ‘clipping and grasping’ and ‘cutting’.

The ProMIS augmented-reality (AR) simulator was used in this study to assess the transferability of the skills learned on the LAP Mentor. The ProMIS AR simulator consists of a torso-shaped mannequin with a neoprene cover, containing an instrument tracking system. Different trays may be placed in the mannequin for each task, such as for the ‘translocation’ and the ‘dissection’ tasks we used in this study. The tasks are performed with AutoSuture disposable 5-mm Endo Clinch and Endo Shears (Covidien, Dublin, Ireland).

Statistics

The Dutch participants (N = 21) used a ProMIS I system, while the Greek participants (N = 13) trained on a ProMIS III system. Because the two systems have different data output settings, we analysed their ProMIS data separately. All data were processed and analysed using the Statistical Package for the Social Sciences 18.0 (SPSS Inc., Chicago, USA). Differences in performance were analysed with the Mann–Whitney U test (between the groups) and the Wilcoxon signed-rank test (within the groups). A P value < 0.05 was considered statistically significant.
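For readers unfamiliar with the between-group test, the U statistic itself can be sketched in a few lines. This is an illustration only, not the SPSS procedure used in the study: the function name is hypothetical, the task times are made-up values, and the sketch stops at the statistic without deriving a P value.

```python
def mann_whitney_u(sample_a, sample_b):
    """Return (U_a, U_b) from the pairwise-comparison definition.

    U_a counts the pairs in which the value from sample_a exceeds the
    value from sample_b; ties contribute 0.5. U_a + U_b always equals
    len(sample_a) * len(sample_b).
    """
    u_a = 0.0
    for a in sample_a:
        for b in sample_b:
            if a > b:
                u_a += 1.0
            elif a == b:
                u_a += 0.5
    u_b = len(sample_a) * len(sample_b) - u_a
    return u_a, u_b


# Made-up task times in seconds, NOT data from this study.
times_group_c = [70, 82, 64, 91]
times_group_t = [75, 88, 95, 60, 79]
print(mann_whitney_u(times_group_c, times_group_t))
```

When the two U values are close to each other, as in this made-up example, neither group tends to rank above the other; a statistics package converts the smaller U into a P value.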

Results

Participants in both groups (N = 34) significantly improved their performance on the LAP Mentor tasks over the course of the training sessions, based on the parameters of time, economy of movement and path length. Figure 2 presents box plots of two parameters tested on the LAP Mentor, time and path length, for both tasks. The performance parameters of group C and group T did not differ significantly in the first, the last or the retention training session.

Fig. 2 Box plots of LAP Mentor parameters: time for the (A) clipping and grasping task and (B) cutting task, and path length for the (C) clipping and grasping task and (D) cutting task

In both groups the skills acquired on the LAP Mentor transferred equally to performance on the ProMIS: performance on the ProMIS simulator improved between the pre-test and the post-test, although the improvement was not significant for every tested parameter (Tables 2, 3). Participants in both groups showed skill retention in task performance on the LAP Mentor and the ProMIS simulator. Performance on the retention test did not differ significantly from the post-test (Tables 2, 3).

Table 2 Simulator scores ProMIS I (n = 21)
Table 3 Simulator scores ProMIS III (n = 13)

Besides the performance metrics of both groups, we analysed the number of repetitions of tasks and the total time spent on the simulator. Group C performed significantly fewer repetitions of each task, both overall and in sessions 2, 3 and 4 (Table 4). Altogether, group C spent significantly less time training on the simulator than group T (74:48 and 120:10 min, respectively; P = 0.001) (Table 5). Retrospectively, the average number of repetitions needed to meet the criteria did not differ significantly between the groups (Table 4), although group T was unaware of the criteria.
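The size of the time saving can be worked out from the two reported training times. The sketch below assumes that the values are written as minutes:seconds; the helper name is hypothetical and the calculation is not part of the original analysis.

```python
def mmss_to_minutes(text):
    """Convert a 'minutes:seconds' string to decimal minutes."""
    minutes, seconds = text.split(":")
    return int(minutes) + int(seconds) / 60


group_c_min = mmss_to_minutes("74:48")   # criterion-based group
group_t_min = mmss_to_minutes("120:10")  # time-based group

absolute_saving = group_t_min - group_c_min
relative_saving = absolute_saving / group_t_min

print(f"Group C trained {absolute_saving:.1f} min less "
      f"({relative_saving:.0%} of group T's training time)")
```

Under that reading of the time format, group C needed a little under two-thirds of the training time of group T.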

Table 4 Number of repetitions required by the participants to achieve the criteria for the LAP Mentor tasks
Table 5 Number of task repetitions performed by the participants per day

Discussion

In this study, we show that training time can be significantly reduced by using criterion-based training instead of time-based training. Training results for the two training methods did not differ significantly. We confirm that novices can substantially improve their skills in basic laparoscopy by training on the LAP Mentor. Both groups showed equal retention of skills. The skills learned on the LAP Mentor do transfer to a different laparoscopy simulator, the ProMIS.

Previous studies have shown advantages of criterion-based training in training outcome and in operating performance [9, 10]. These studies did not describe training time benefits, because training time was fixed and equal in both groups. The absolute performance benefits of criterion-based training shown by Gauger et al. [10] were not found in our study. The fact that the criterion-based group (group C) in our study did not significantly outperform the time-based group (group T) can partly be explained by the significant differences in the total number of repetitions and the associated total training time in favour of group T. This was a direct consequence of the training protocol, which forced the participants in group T to continue training regardless of their performance.

Nevertheless, it would be expected that group T, which trained with significantly more repetitions and for a longer duration, would outperform the other group in the post-test or at the retention level. This was not the case. The equal post-test performance in both groups, despite the significantly fewer repetitions in group C, can presumably be related to the smaller amount of overtraining in the criterion-based training. Some overtraining can be beneficial, although too much extra practice can lead to poor test performance [14]. In our study, group T had extensive extra practice: while the criteria were reached after an average of 15 repetitions for the clipping and grasping task and 13 repetitions for the cutting task, the average total repetitions were 46 and 45, respectively. The performances on the simulator did not improve significantly during that extra training. When the criteria are used as the optimal endpoint, there was approximately 200% overtraining in repetitions. Group C had some overtraining as well, because of the requirement to reach the criteria on every training day, but this was far less than for group T. It seems that identifying training criteria or benchmarks and a related training endpoint can reduce excessive overpractice and the corresponding unnecessary training time.
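The overtraining figure follows from the reported averages if overtraining is taken to be the number of repetitions performed beyond the criterion, expressed as a percentage of the repetitions needed to reach it. That definition is an assumption of this sketch, and the function name is hypothetical.

```python
def overtraining_pct(total_reps, reps_to_criterion):
    """Extra repetitions beyond the criterion, as a % of reps to criterion."""
    return 100 * (total_reps - reps_to_criterion) / reps_to_criterion


# Average values for group T as reported in the text above.
clipping_and_grasping = overtraining_pct(total_reps=46, reps_to_criterion=15)
cutting = overtraining_pct(total_reps=45, reps_to_criterion=13)

print(f"clipping and grasping: {clipping_and_grasping:.0f}%")
print(f"cutting: {cutting:.0f}%")
```

Because the inputs are rounded averages, the per-task results are only rough; they are of the same order as the approximately 200% quoted in the text.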

Another contradiction with previous research is that we did not find a significant difference between the groups in the number of repetitions needed to achieve the criteria, even though the criteria were shown to the participants in group C and not to those in group T. In other studies [9, 10], the number of repetitions needed to meet the criteria did decrease when the criteria were made known to the participants.

There are two possible explanations for this contradiction. The first is that the criteria may have been set too low, so that they were easily reachable whether or not they were known. The second is that the time-based group had knowledge of their results: they were equally aware of their performance and improved because of the feedback from the simulator after each exercise. This may have led the participants in this group to train on improving their own scores, effectively converting the training into criterion-based training in which they set their own criteria.

Limitations

Because testing took place at two different locations, we used two different ProMIS simulators. Because their output settings differed, we could not perform a combined analysis.

Conclusions

The outcome of this study allows us to conclude that criterion-based training is more time efficient than time-based training in the training of basic laparoscopic skills. We recommend that future curricula be criterion-based. One of the first steps in implementing new curricula, or revising existing ones, should therefore be the introduction of criterion-based training with predefined criteria.