Development of performance and learning rate evaluation models in robot-assisted surgery using electroencephalography and eye-tracking

Shafiei, Somayeh B.; Shadpour, Saeed; Sasangohar, Farzan; Mohler, James L.; Attwood, Kristopher; Jing, Zhe

doi:10.1038/s41539-024-00216-y


Article
Open access
Published: 20 January 2024

npj Science of Learning volume 9, Article number: 3 (2024)

Abstract

The existing performance evaluation methods in robot-assisted surgery (RAS) are mainly subjective, costly, and affected by shortcomings such as the inconsistency of results and dependency on the raters’ opinions. The aim of this study was to develop models for an objective evaluation of performance and rate of learning RAS skills while practicing surgical simulator tasks. The electroencephalogram (EEG) and eye-tracking data were recorded from 26 subjects while performing Tubes, Suture Sponge, and Dots and Needles tasks. Performance scores were generated by the simulator program. The functional brain networks were extracted using EEG data and coherence analysis. Then these networks, along with community detection analysis, facilitated the extraction of average search information and average temporal flexibility features at 21 Brodmann areas (BA) and four band frequencies. Twelve eye-tracking features were extracted and used to develop linear random intercept models for performance evaluation and multivariate linear regression models for the evaluation of the learning rate. Results showed that subject-wise standardization of features improved the R² of the models. Average pupil diameter and rate of saccade were associated with performance in the Tubes task (multivariate analysis; p-value = 0.01 and p-value = 0.04, respectively). Entropy of pupil diameter was associated with performance in Dots and Needles task (multivariate analysis; p-value = 0.01). Average temporal flexibility and search information in several BAs and band frequencies were associated with performance and rate of learning. The models may be used to objectify performance and learning rate evaluation in RAS once validated with a broader sample size and tasks.

Introduction

The benefits of robot-assisted surgery (RAS), and more specifically, the da Vinci Surgical System (Intuitive Surgical, Sunnyvale, CA), have increased its popularity in surgical fields, especially surgical oncology, urology, and gynecology¹. These benefits include, but are not limited to, smaller incisions, less pain, lower infection risk, and a shorter hospital stay^1,2. Compared to conventional surgery, RAS presents more challenges for trainees, which include adjusting to a video view of anatomical structures rather than a direct view³, a lack of haptic feedback⁴, complex hand-eye coordination, the need for bimanual tool dexterity, and active foot coordination⁵. The establishment of a validated and standardized training protocol for RAS surgical trainees is crucial to ensure efficient and consistent training, patient safety, and high-quality outcomes.

The objective of this study is to develop linear models for evaluating performance and rate of learning RAS skills using features extracted from electroencephalogram (EEG) and eye-tracking data. These data were recorded from 26 subjects engaged in repeated RAS simulator tasks until successful completion (defined as a score of at least 70 out of 100). The analysis of the RAS skill acquisition did not adhere to a fixed timeframe, as the number of attempts varied among subjects.

Available skill evaluation methods in RAS

Operative time (OT) is one of the measures for evaluating surgical learning progress⁶. While OT can indicate a surgeon’s proficiency and familiarity with an operation, utilizing this variable as a standalone criterion for performance evaluation may be misleading since this evaluation metric does not account for surgical outcomes⁷. Additional factors have been suggested to evaluate surgical performance, including intraoperative blood loss, length of hospital stay, functional outcomes^8,9, and procedure-specific outcomes such as urinary incontinence and positive surgical margins following radical prostatectomy¹⁰. A more robust approach to assessing learning progress uses multidimensional analysis, which considers a variety of surgical performance markers¹¹. Global Evaluative Assessment of Robotic Skills (GEARS) has been proposed as a tool to assess the RAS skills of trainees¹². Robotic-Objective Structured Assessment of Technical Skills (R-OSATS) is an additional rating scale, evaluating key aspects such as respect for tissues, dexterity, fluency, knowledge, and accuracy¹³. Both GEARS and R-OSATS represent holistic assessment methods that provide a non-procedure-specific evaluation of trainees’ competencies, retrospectively covering all aspects of a task.

Lovegrove et al. have developed a modular training and assessment method, utilizing Healthcare Failure Mode and Effect Analysis¹⁴. In this approach, radical prostatectomy is segmented into seventeen distinct stages and sub-processes. Each sub-phase is then individually scored by experts. Competency in each stage is defined as acquiring a score of at least 4 out of 5 in all sub-processes consistently. However, modular assessment methods, while detailed, tend to be costly and less practical in live surgical settings. In addition, their results can be inconsistent and heavily dependent on raters’ subjective opinions, which may introduce bias. Despite the existence of some surgical performance tools like the Robotic Anastomosis Competency Evaluation for ureterovesical anastomosis (RACE)¹⁵, such methods are often task-specific and fail to encompass the entire surgical procedure¹⁶.

Computerized virtual reality simulations offer surgical trainees a safe environment to familiarize themselves with the robotic console and enhance their psychomotor skills without compromising the safety of patients^17,18. These simulators have been shown to reduce the learning curve for surgical trainees¹⁹, leading to their widespread adoption in most training programs²⁰. Yet, the development of objective and generalizable methods for evaluating performance and learning rates, essential for monitoring surgeons’ progress during training, continues to be a significant research gap. An ‘objective’ assessment technique not only evaluates performance but also aims to eliminate inconsistencies in evaluation. Currently, such a technique has not been fully developed within existing surgical practice protocols. In contrast, fields like aviation have significantly benefited from standardized, quality-assured training benchmarks. Pilots must demonstrate proficiency in numerous performance areas before being licensed to operate passenger planes²¹. However, this level of standardized, objective method has yet to be implemented in RAS surgical training.

Proposed objective skill evaluation methods in RAS

Several studies have proposed objective surgical performance evaluation methods utilizing physiological data such as electroencephalogram (EEG)^22,23, functional near-infrared spectroscopy (fNIRS)^24,25, eye movement^26,27, hand kinematics, and analysis of surgical videos^28,29,30. While existing literature has utilized EEG for skill assessment, its focus has predominantly been on classifying experts and novices through EEG spectral analysis³¹, without considering the dynamic changes in EEG over time and across different brain areas. However, EEG has found application in performance evaluation in other fields, such as piloting and driving^32,33. Eye-tracking, on the other hand, has been primarily used for workload evaluation²⁷ and investigating the allocation of attentional resources^34,35. Despite these uses, there remains a noticeable gap in the use of both EEG and eye-tracking for performance evaluation specifically in RAS training.

The potential advantages of utilizing EEG and eye-tracking in RAS performance evaluation

The EEG’s high temporal resolution offers a dynamic perspective on cognitive processes during surgical tasks, going beyond what is possible with video processing of external movements. EEG directly measures neural mechanisms that are fundamental in skill learning and task execution, including attention levels, cognitive load, and decision-making processes. These aspects are vital for understanding surgical training and performance. Furthermore, EEG is capable of recording cortical activity, which is closely linked to learning processes. This cortical activity can change through practice and learning, reflecting neuroplasticity—the brain’s ability to reorganize itself by forming new neural connections in response to learning and experience³⁶. EEG and eye-tracking can provide a multifaceted view of the surgical learning curve, capturing dimensions not visible in video data. EEG, for instance, can identify specific moments where a surgeon may experience a peak in cognitive load, which can be pivotal for modifying individual training programs.

The potential limitations of utilizing EEG and eye-tracking in RAS performance evaluation

Collecting high-density EEG data, involving numerous channels (116 in this study), poses greater challenges than other methods like video analysis or hand movement tracking. The complexity arises from the technical demands of setting up many electrodes, potential signal losses due to electrode dropout, and the extensive pre-processing needed to ensure signal integrity. In contrast, video or motion tracking systems are generally more user-friendly, with fewer issues related to data loss. Furthermore, the practical application of EEG and other sensor-based methods is significantly limited by the difficulty in usage and potential disruptions caused by the equipment, a challenge not typically encountered with video-based methods. While video and motion tracking excel in providing spatial and temporal information about a surgeon’s techniques, high-density EEG offers unique insights into the cognitive processes behind surgical performance. Thus, despite its challenges, EEG remains an invaluable tool for a comprehensive performance assessment, encompassing both cognitive and physical aspects of surgery. Eye-tracking and EEG, with their distinct advantages, do not replace but rather complement video processing techniques. Together, they offer a more holistic understanding of the surgical learning curve.

Potential use of machine learning approaches for surgical skill assessment

Information retrieved from hand movement kinematics, videos, EEG, and eye-tracking data has been used to develop deep convolutional neural networks, gradient boosting, and random forest models for surgical performance and skill evaluation^37,38,39,40. The results from these approaches were promising. Machine learning algorithms trained on physiological data to identify predictors of performance have the potential to enable personalized learning and, eventually, automated performance feedback⁴¹.

This paper provides an exploratory analysis on the role of multi-source spatiotemporal signal processing in advancing automated surgical performance and learning rate evaluation.

Results

Twenty-six subjects, having differing amounts of RAS practice, completed the Tubes (61 attempts), Suture Sponge (66 attempts), and Dots and Needles (66 attempts) tasks, achieving average performance scores of 71.47, 73.04, and 71.72, respectively. Linear random intercept models were developed for performance evaluation, while multivariate linear models were developed for learning rate evaluation. Age was not a significant predictor in these final models.
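
For readers unfamiliar with this model class, a linear random intercept model is conventionally written as shown below. This is the generic textbook form rather than an equation taken from the paper, and all symbols are notation assumed here for illustration.

```latex
y_{ij} = \beta_0 + u_i + \sum_{k=1}^{K} \beta_k \, x_{kij} + \varepsilon_{ij},
\qquad u_i \sim \mathcal{N}(0, \sigma_u^2),
\quad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma_\varepsilon^2)
```

Here y_{ij} would be the performance score of subject i at attempt j, x_{kij} the k-th EEG or eye-tracking feature for that attempt, and u_i a subject-specific shift of the intercept that absorbs the correlation among repeated attempts by the same subject.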

Tubes task

Table 1 presents the results of the linear random intercept regression model analysis for evaluating the performance of the Tubes task. A one-standard-deviation increase in the average pupil diameter of the subject’s nondominant eye (standardized for each subject) was associated with an 8.13-point decrease in their performance score. This suggests that larger pupil sizes in the nondominant eye are linked to worse performance. In contrast, a one-standard-deviation increase in the average temporal flexibility in Brodmann area 18 (BA 18) at beta-band frequencies was associated with a 0.52-point performance improvement, suggesting that higher neural flexibility in this brain region enhances performance. In addition, a one-standard-deviation increase in the rate of saccade was associated with a 5.87-point decrease in performance, indicating that more frequent saccades, compared to the individual’s average, are linked to lower performance scores.
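
The coefficients quoted above can be read additively, which the following minimal sketch illustrates. The three coefficient values come from the text; the dictionary keys and the function name are hypothetical labels chosen for this example, and the sketch only computes the change in predicted score implied by a linear model, not the score itself.

```python
# Coefficients reported in the text for the Tubes performance model:
# points of performance score per one-standard-deviation increase.
# The key names are hypothetical labels, not identifiers from the study.
TUBES_COEFFICIENTS = {
    "avg_pupil_diameter_nondominant": -8.13,
    "avg_temporal_flexibility_ba18_beta": 0.52,
    "rate_of_saccade": -5.87,
}


def predicted_score_change(feature_shifts_sd, coefficients=TUBES_COEFFICIENTS):
    """Return the change in predicted score for the given feature shifts.

    feature_shifts_sd maps a feature name to its shift in standard
    deviations. Features that are not listed are assumed unchanged.
    """
    unknown = set(feature_shifts_sd) - set(coefficients)
    if unknown:
        raise KeyError(f"unknown features: {sorted(unknown)}")
    return sum(coefficients[name] * shift
               for name, shift in feature_shifts_sd.items())


# One SD more saccades and half an SD larger pupil diameter.
change = predicted_score_change({
    "rate_of_saccade": 1.0,
    "avg_pupil_diameter_nondominant": 0.5,
})
print(round(change, 3))
```

Because the model is linear, the contributions simply add up; any interaction between features would fall outside this sketch.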

Table 1 Results of a linear random intercept regression model for performance evaluation at the Tubes task with subject-wise standardized eye-tracking features.

Table 2 presents the outcomes of the linear regression model analysis for the learning rate in the Tubes task. A one-standard-deviation increase in the average temporal flexibility in BA 18 at theta-band frequencies was associated with a 0.59-point decrease in the learning rate, suggesting that larger temporal flexibility in BA 18 at theta-band frequencies is linked to poorer learning rates. Similarly, a one-standard-deviation increase in the temporal flexibility in BA 46 at alpha-band frequencies corresponded to a 0.87-point decrease in learning rate, indicating that greater neural flexibility in this area of the brain is associated with a lower learning rate. Furthermore, each one-unit increase in initial performance score was associated with a 0.35-point decrease in learning rate, implying that subjects with higher initial scores tend to exhibit lower learning rates.

Table 2 Results of a multivariate linear regression model for learning rate evaluation at the Tubes task with subject-wise standardized eye-tracking features.

Suture Sponge task

Table 3 presents the results from the linear random intercept regression model for performance evaluation in the Suture Sponge task. A one-standard-deviation increase in the average temporal flexibility in BA 10 at beta-band frequencies was associated with a 0.6-point improvement in the performance score for the Suture Sponge task, suggesting that enhanced neural flexibility in this area of the brain is associated with better performance. Likewise, a one-standard-deviation increase in the average search information in BA 45 at theta-band frequencies corresponded to a 0.6-point increase in performance score.

Table 3 Results of a linear random intercept model for performance evaluation at the Suture Sponge task with subject-wise standardized eye-tracking features.

Table 4 displays the findings from the linear regression model analysis for the learning rate in the Suture Sponge task. A one-standard-deviation increase in the average search information in BA 45 at theta-band frequencies was associated with a 1.08-point decrease in the learning rate, suggesting that higher search information in this area and frequency band correlates with a lower learning rate. Similarly, a one-standard-deviation increase in the average temporal flexibility in BA 45 at theta-band frequencies corresponded to a 0.31-point decrease in learning rate, indicating that increased neural flexibility in this area is associated with a reduced learning rate. Conversely, a one-standard-deviation increase in the average search information in BA 19 at gamma-band frequencies was associated with a 1.19-point increase in the learning rate.

Table 4 Results of a linear regression model for learning rate evaluation at the Suture Sponge task with subject-wise standardized eye-tracking features.

Dots and Needles task

Table 5 presents the outcomes from the linear random intercept regression model analysis for performance in the Dots and Needles task. A one-standard-deviation increase in the average search information in BA 37 at gamma-band frequencies was associated with a 1.35-point decrease in the performance score for this task. In addition, a one-standard-deviation increase in the entropy of the nondominant eye’s pupil diameter was associated with a 4.68-point decrease in performance score.

Table 5 Results of a linear random intercept regression model for performance evaluation at the Dots and Needles task with subject-wise standardized eye-tracking features.

Table 6 presents the results from the linear regression model analysis for the learning rate in the Dots and Needles task. A one-standard-deviation increase in the average search information in BA 45 at beta-band frequencies was associated with a 1.92-point increase in the learning rate value for this task. Similarly, a one-standard-deviation increase in the average search information in BA 40 at alpha-band frequencies corresponded to a 1.45-point increase in learning rate. In addition, a one-standard-deviation increase in the average temporal flexibility in BA 41 at theta-band frequencies was associated with a 0.41-point increase in learning rate.

Table 6 Results of a linear regression model for learning rate evaluation at Dots and Needles with subject-wise standardized eye-tracking features.

We created boxplots to illustrate the differences between predicted and actual performance scores (Fig. 1). The analysis reveals that both the mean and median differences are close to zero. Moreover, for most samples, the absolute difference between actual and predicted performance scores was less than 10. These findings indicate that our performance evaluation models for the three tasks are reasonably accurate.
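
The three summary statistics mentioned above (mean difference, median difference, and the share of samples with an absolute difference below 10) can be computed as in the following sketch. The scores in the usage lines are invented placeholders, not data from the study, and the function name and the sign convention (predicted minus actual) are assumptions made for this example.

```python
from statistics import mean, median


def summarize_differences(actual, predicted, tolerance=10.0):
    """Summarize predicted-minus-actual differences between two score lists.

    Returns the mean and median difference and the share of samples whose
    absolute difference is strictly below `tolerance` points.
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty lists of equal length")
    diffs = [p - a for a, p in zip(actual, predicted)]
    share_within = sum(abs(d) < tolerance for d in diffs) / len(diffs)
    return {
        "mean": mean(diffs),
        "median": median(diffs),
        "share_within_tolerance": share_within,
    }


# Placeholder scores for illustration only.
actual_scores = [70, 80, 60, 90]
predicted_scores = [72, 78, 75, 88]
summary = summarize_differences(actual_scores, predicted_scores)
print(summary)
```

A mean and median near zero indicate little systematic over- or under-prediction, while the share within the tolerance describes how often individual predictions land close to the actual score.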

**Fig. 1: Representation of differences between predicted and actual performance scores in Tubes, Suture Sponge, and Dots and Needles tasks.**

Effect of subject-wise standardization of eye-tracking features

The Supplementary Information details the outcomes of the linear random intercept regression models for performance evaluation and the linear regression analysis for learning rate evaluation, conducted without subject-wise standardization of features. The results indicate that subject-wise standardization of eye-tracking features marginally enhanced the R² values for both performance (0.17 increase for the Tubes task, 0.04 increase for the Dots and Needles task) and learning rate evaluations (0.09 increase for the Dots and Needles task).

Relationship between hours of experience with RAS and performance

Pearson correlation analysis was conducted to examine the relationship between subjects’ hours of RAS practice and their performance. The results revealed no significant correlation between RAS practice hours and performance in the Tubes task (Pearson correlation; p-value = 0.20), the Suture Sponge task (Pearson correlation; p-value = 0.07), or the Dots and Needles task (Pearson correlation; p-value = 0.85).
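
As a reminder of what this test computes, the sketch below derives the Pearson coefficient r and the associated t statistic using only the standard library. The function names and the sample values are assumptions for illustration. The p-value itself would come from a t distribution with n − 2 degrees of freedom, which this sketch does not evaluate.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length samples."""
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("need two equal-length samples with at least 3 points")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic for testing r against zero, with n - 2 degrees of freedom."""
    if abs(r) >= 1.0:
        return float("inf") if r > 0 else float("-inf")
    return r * sqrt((n - 2) / (1.0 - r * r))


# Placeholder values, not data from the study.
practice_hours = [1, 2, 3, 4, 5]
scores = [2, 1, 4, 3, 5]
r = pearson_r(practice_hours, scores)
t = t_statistic(r, len(scores))
print(round(r, 3), round(t, 3))
```

With small samples, even a fairly large r can fail to reach significance, which is one reason a non-significant result should not be read as evidence of no relationship.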

Relationship between performance and mental workload

The Pearson correlation between performance and mental workload was not significant for the Tubes task (Pearson correlation; p-value = 0.37), the Suture Sponge task (Pearson correlation; p-value = 0.79), or the Dots and Needles task (Pearson correlation; p-value = 0.97).

Discussion

Tubes

Our findings indicate a negative association between the average pupil diameter of the nondominant eye and performance in the Tubes task, as shown in Table 1. This result aligns with the literature^42,43, supporting the notion that pupillometry, the measurement of pupil diameter, is a reliable marker of mental workload and performance^42,43,44. Pupil dilation has been shown to be associated with higher workloads and lower performance scores⁴².

To successfully complete the Tubes task, subjects must consciously track targets, drive needles through them, visually anticipate upcoming targets, enhance hand-eye coordination, and drive the needle through the yellow side of the target. The significant correlation between performance and rate of saccade identified in this study (Table 1) is consistent with these required skills. Saccades are known to be essential for attention^45,46, and both consciousness (perceptual awareness required for engaging with the Tubes task) and attention are critical for making timely and accurate decisions in this task.

Average temporal network flexibility in Brodmann area 18 (BA 18) at beta-band frequencies showed a positive association with performance in the Tubes task (Table 1). Functional MRI studies have indicated that BA 18 plays a role in basic visual functions, such as attention and pattern detection, and in processing visuo-spatial information^47,48. In addition, brain oscillations in the beta-band frequencies are associated with logical and conscious thinking⁴⁹. The selection of this feature as a performance predictor in our study may show the importance of attention and visuo-spatial information processing in the Tubes task. Therefore, greater flexibility in BA 18 at beta-band frequencies may enhance attention and adaptation to new visual stimuli, leading to quicker decision-making and ultimately improved performance in the Tubes task.

Performance at the first attempt was identified as a predictor of learning rate in the Tubes task, possibly due to the high standard deviation (SD) of performance scores in this task (SD = 16.3).

Suture Sponge task

To successfully complete the Suture Sponge task, subjects need to skillfully control needles and navigate them through a deformable object. Since the object is deformable and its interior is invisible, subjects often need to correct their hand motions for accurate needle insertion and extraction, while also choosing appropriate movements based on the needle and target positions. The association between selected EEG features and performance in this task (Table 3) aligns with these requirements. Functional MRI studies have shown that Brodmann area 10 (BA 10) is involved in various memory functions, executive control, error processing, and decision-making^50,51,52,53,54, while BA 45 is associated with reasoning processes and working memory^51,55. As a result, increased flexibility in BA 10 at beta-band frequencies and enhanced search information in BA 45 at theta-band frequencies may be associated with more efficient memory retrieval, error processing, and decision-making, thereby leading to better performance in the Suture Sponge task.

Our findings showed that BA 45 functioning plays a key role not only in performance evaluation but also in the learning rate evaluation of the Suture Sponge task (Tables 3 and 4). Its search information and flexibility at theta-band frequencies were associated with the learning rate (Table 4), aligning with literature that underscores BA 45’s involvement in reasoning processes and working memory^51,55. In addition, gamma-band frequencies are associated with perception, cognitive processes, attention, working memory, and information integration^56,57. BA 19, known for its role in spatial working memory, visual memory recognition, and visuo-spatial information processing^48,58,59, also showed a connection with learning rate through its search information in gamma-band frequencies (Table 4), representing the skills necessary for the successful completion of the Suture Sponge task.

Dots and Needles task

Our results showed that entropy of the nondominant eye’s pupil diameter is negatively associated with performance (Table 5). Since entropy of pupil diameter has been proposed in prior studies as a measure of visual scanning efficiency⁶⁰, this association may indicate that fewer resources are available to perform the task when the entropy is higher. Hence, this finding may suggest that a lower cost of retrieving information from the visual system is associated with better performance⁶¹.

The EEG features selected for performance evaluation in the ‘Dots and Needles’ task (Table 5) align well with the task’s requirements. This task requires subjects to (1) develop hand-eye coordination skills for precise needle placement and manipulation through soft objects, and (2) precisely detect target positions and execute needle insertion and extraction. Functional MRI studies have shown that BA 37 plays a crucial role in complex visual motion processing⁶², structural judgment of familiar objects⁶³, and visual memory processes⁵⁹. The observed association between EEG features and performance in ‘Dots and Needles’ suggests that higher search information levels may reflect an increased need for visual and cognitive information processing in BA 37, which could potentially reduce performance.

The observed associations in Table 6 (between learning rate and search information in BA 45 and BA 40, and between learning rate and temporal flexibility in BA 41) align with the required skills for the ‘Dots and Needles’ task. Functional MRI studies indicate that BA 40 plays a role in various activities, including visually guided grasping, visuomotor transformation/motor planning, response to visual motion, and working memory^64,65,66,67,68, and BA 41 is linked to working memory⁶⁹.

Effect of subject-wise standardization of eye-tracking features

Comparing the performance and learning rate evaluation models with subject-wise standardization (Tables 1 to 6) against those without such standardization (Supplementary Information) reveals that subject-wise standardization reduces the impact of individual variances among subjects. As a result, the standardized features more accurately reflect skill differences as opposed to variations in subjects’ individual characteristics.
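
One common way to implement subject-wise standardization is a per-subject z-score, sketched below. The exact procedure used in the study is not spelled out in this section, so several details are assumptions: the use of the sample standard deviation, the fallback to 0.0 for a subject with a single attempt or constant values, and all field and function names.

```python
from collections import defaultdict
from statistics import mean, stdev


def standardize_by_subject(rows, feature):
    """Add a per-subject z-score of `feature` to every row.

    rows is a list of dicts that each hold a 'subject' key and the feature.
    Assumption: a subject with one attempt or zero spread gets z = 0.0.
    """
    values_by_subject = defaultdict(list)
    for row in rows:
        values_by_subject[row["subject"]].append(row[feature])

    stats = {}
    for subject, values in values_by_subject.items():
        spread = stdev(values) if len(values) > 1 else 0.0
        stats[subject] = (mean(values), spread)

    standardized = []
    for row in rows:
        center, spread = stats[row["subject"]]
        z = (row[feature] - center) / spread if spread > 0 else 0.0
        standardized.append({**row, feature + "_z": z})
    return standardized


# Two hypothetical subjects with different baseline pupil diameters (mm).
rows = [
    {"subject": "A", "pupil_mm": 3.0},
    {"subject": "A", "pupil_mm": 3.2},
    {"subject": "A", "pupil_mm": 3.4},
    {"subject": "B", "pupil_mm": 5.0},
    {"subject": "B", "pupil_mm": 5.2},
    {"subject": "B", "pupil_mm": 5.4},
]
for row in standardize_by_subject(rows, "pupil_mm"):
    print(row["subject"], round(row["pupil_mm_z"], 3))
```

Both hypothetical subjects end up with the same standardized values even though their raw pupil diameters differ by about 2 mm, which is the sense in which standardization removes between-subject baseline differences.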

Relationship between practice hours and performance

This study found no significant correlation between subjects’ hours of RAS experience and task performance, which could be attributed to the quality of practice rather than its quantity. Effective performance improvement likely depends on proper execution of RAS tasks. Moreover, it has been shown that extended breaks between practice sessions might disrupt functional brain networks, affecting performance⁷⁰. It’s also worth noting that inefficient practice, despite increasing the total practice hours, may not necessarily lead to performance enhancement.

Relationship between performance and mental workload

Our study revealed no significant correlation between performance and mental workload. Mental workload represents the balance between a person’s cognitive capacity and the demands a task imposes on them^71,72. Acquiring new skills typically involves enhancing both performance and mental workload management^22,73,74. Previous research indicates that during skill acquisition, mental workload may continue to decrease even after achieving a passing performance score⁷⁵. Therefore, the absence of a significant correlation in our study might imply that some subjects were still refining their RAS skills beyond achieving passing scores, indicative of ongoing improvements in their mental workload management.

Practical implications of the findings

The findings establish a basis for an objective evaluation of the performance and learning rate of RAS trainees. The developed models, once validated for a broader population and surgical tasks, could be used in surgical residency programs to improve the RAS skill acquisition process in three possible ways: (1) They provide objective, unbiased assessments of RAS trainees’ performance without needing an expert RAS surgeon’s presence during practice sessions. This approach reduces training costs and offers immediate performance feedback, allowing trainees to correct mistakes more efficiently and shorten the learning process. Consequently, training programs can admit more RAS trainees and expedite their graduation, streamlining the overall training procedure. In addition, this model enables training of more RAS surgeons annually, increasing the number of patients who can benefit from RAS technology. Hospitals will also benefit, as RAS typically involves shorter hospital stays and fewer surgical complications compared to traditional surgery methods^76,77; (2) The learning rate evaluation models, based on data from the first attempt, enable RAS training programs to predict specific trainees’ learning rates. This information allows programs to either select better RAS learners or plan effective strategies to enhance learning for those who progress more slowly; (3) Such performance and learning rate evaluation methodologies could be used for a broader range of surgical tasks, particularly those that are similar to actual surgical operations.

Limitations of the study

Several limitations may impact the generalizability of the findings of this study. First, the moderate R² values of the learning rate evaluation models (0.64, 0.71, and 0.69 for the Tubes, Suture Sponge, and Dots and Needles, respectively) might be attributed to limited sample sizes. Second, as the study was conducted within a single U.S. health system, its findings may not be applicable to other institutions, specialties, or countries. Further validation of the models is needed, incorporating data from a more diverse group of subjects across various hospitals and specialties, and involving different surgical tasks. Third, exploring potential nonlinear relationships between learning rate and the features requires more attempts per subject and analysis using nonlinear regression models. Lastly, the inherent challenges associated with the use of EEG and other sensor-based techniques, coupled with the potential disruptions caused by the equipment, limit their practical application.

Methods

This study was conducted in accordance with relevant guidelines and regulations and was approved by the Roswell Park Comprehensive Cancer Center (RPCCC)’s Institutional Review Board (IRB; I-241913). The IRB issued a waiver of documentation of written consent, and the subjects were given a research study information sheet and provided verbal consent.

Subjects

The experiments involved a group of 26 subjects, and the demographics and relevant experiences of all subjects are detailed in Table 7. The ‘Hours of RAS Experience’ column reports each subject’s self-reported hours of experience. Each subject was required to perform every task at least twice, aiming for a minimum score of 70 out of 100 to qualify as a successful attempt. If the required score was not achieved within the first two attempts, they continued to try until meeting the benchmark.
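
The attempt protocol described above can be expressed as a simple stopping rule, sketched below. The sketch rests on one interpretation that the text does not confirm: a subject stops as soon as at least two attempts have been made and at least one of them reached the passing score. The constant and function names are hypothetical.

```python
PASSING_SCORE = 70   # minimum score for a successful attempt (from the text)
MIN_ATTEMPTS = 2     # every task was performed at least twice (from the text)


def recorded_attempts(scores):
    """Return the attempts kept under the assumed stopping rule.

    scores is the sequence of scores a subject would obtain in order.
    Assumption: stop once MIN_ATTEMPTS attempts exist and any of them
    reached PASSING_SCORE.
    """
    recorded = []
    for score in scores:
        recorded.append(score)
        passed = any(s >= PASSING_SCORE for s in recorded)
        if len(recorded) >= MIN_ATTEMPTS and passed:
            break
    return recorded


# Hypothetical score sequences.
print(recorded_attempts([55, 62, 68, 74, 90]))  # passes on the fourth attempt
print(recorded_attempts([80, 65, 90]))          # passes within the first two
```

Under this rule the number of attempts differs between subjects, which matches the statement that the analysis did not follow a fixed timeframe.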

Table 7 Demographics of subjects and number of task attempts.

Skill level of subjects

This manuscript does not aim to classify skill levels based solely on hours of experience, recognizing that proficiency can vary significantly across different tasks. Such categorization would require specific assessments beyond the scope of this study. For general categorization purposes, RAS surgeons in Table 7 are considered RAS experts and typically act as primary surgeons. Surgical fellows are typically estimated to be competent, whereas residents are often viewed as beginners. In our categorization, oncologists, researchers, students, and scientists are generally labeled as novices. It’s important to note that thoracic surgeons and head and neck surgeons, despite their expertise in other surgical areas, are classified as novices or beginners in RAS for this study. These categories are broad and should not be taken as a substitute for detailed skill assessment.

Recruitment method

Subjects were invited to the study via email or verbal invitation. Subjects included surgeons, fellows, residents, pre-medical students, and/or scientists at Roswell Park Cancer Institute.

Data recording set up

The da Vinci® Skills Simulator™ (developed in collaboration with Mimic® Technologies, Inc., Seattle, WA, USA) has two instruments attached to mechanical arms and a camera arm. The subject operates the arms while sitting at a computer console (Fig. 2). The Tubes, Suture Sponge, and Dots and Needles tasks were completed by subjects using the da Vinci® Skills Simulator™. A 124-channel EEG headset from AntNeuro® was used to record EEG data at a frequency of 500 Hz, using Cz as the reference channel. Simultaneously, Tobii® eyeglasses were utilized to record eye-tracking data at a frequency of 50 Hz, as illustrated in Fig. 2. Due to the poor quality of signals recorded from the F8, POz, AF4, AF8, F6, FC3, M1, and M2 channels, data from these channels were excluded from the study. The analysis was conducted on the signals from the remaining 116 channels.

**Fig. 2: Representation of a subject completing three tasks on the da Vinci simulator while wearing an EEG headset and eye-tracking glasses.**

Tasks and the purpose of each task

Subjects were instructed to watch a training video before performing the task. The Tubes, Suture Sponge, and Dots and Needles tasks with their highest level of complexity were included (Fig. 2). Subjects always conducted the tasks in the same order.

Tubes task

Subjects practiced tissue manipulation and needle driving skills that will be encountered as part of a urethral anastomosis (i.e., a challenging portion of a radical prostatectomy operation). Both simulator instruments were used to manipulate two vessels to facilitate needle driving. Subjects were instructed to insert the needle through the yellow side of the target and then guide it out from the black side. The task was to continue driving the needle tip through the yellow target until it changed to green.

Suture Sponge task

Subjects were trained to enhance their dexterity and precision in manipulating a needle through a deformable object. This involved controlling the needle during its transfer between instruments, as well as during insertion and extraction through various pairs of targets. These targets were placed on the edge of a sponge, with random variations in their positions and sizes.

Dots and Needles task

Subjects were taught to perform challenging needle throws through a soft, flexible object. The task required them to insert and accurately guide a needle through several pairs of targets, each varying in spatial distance and position. Upon the first target changing to green, the subjects had to skillfully rotate their wrist to drive the needle tip through the second yellow target, continuing until it too turned green.

Attempts

Each subject performed every task a minimum of two times. If they did not attain a passing score of 70 out of 100 on at least one of these two attempts, they continued repeating the task until the passing score was achieved.

Mental workload

At the end of each attempt, subjects completed the Surgery Task Load Index (SURG-TLX) questionnaire to assess their mental workload. The SURG-TLX is a tool comprising six domains that measure perceived workload⁷⁸. These domains are mental demands: the level of mental effort required during task completion; physical demands: the level of physical effort required during task completion; temporal demands: the level of time pressure felt in completing the task; task complexity: the degree of difficulty of the task; situational stress: the level of stress or anxiety experienced while completing the task; and distractions: the degree of distraction from the surrounding environment. Each domain is scored on a scale from 1 to 20, where 1 indicates the lowest and 20 indicates the highest level. The overall mental workload score was calculated by summing the scores from all six domains.
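The scoring rule described above can be sketched in a few lines of Python. This is only an illustration: the function name, the dictionary keys, and the example ratings are hypothetical and are not taken from the study data.

```python
# Hypothetical keys for the six SURG-TLX domains named in the text.
SURG_TLX_DOMAINS = (
    "mental_demands",
    "physical_demands",
    "temporal_demands",
    "task_complexity",
    "situational_stress",
    "distractions",
)


def surg_tlx_total(ratings):
    """Sum the six domain ratings (each on a 1-20 scale) into one workload score."""
    missing = [d for d in SURG_TLX_DOMAINS if d not in ratings]
    if missing:
        raise ValueError(f"missing domains: {missing}")
    for domain in SURG_TLX_DOMAINS:
        if not 1 <= ratings[domain] <= 20:
            raise ValueError(f"{domain} must be between 1 and 20")
    return sum(ratings[d] for d in SURG_TLX_DOMAINS)


# Made-up ratings for one attempt.
example_ratings = {
    "mental_demands": 14,
    "physical_demands": 8,
    "temporal_demands": 10,
    "task_complexity": 15,
    "situational_stress": 9,
    "distractions": 4,
}
total_workload = surg_tlx_total(example_ratings)  # 60
```

With six domains scored from 1 to 20, the overall score under this rule ranges from 6 to 120.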

Performance scores

After the subject completed each attempt of the tasks, the simulator generated a single score between 0 and 100 based on their performance, where 0 indicated no acceptable performance and 100 represented performance that satisfied all necessary standards. To determine the performance score, the simulator program uses the following metrics: the time required to complete the exercise (measured in seconds); economy of motion: the total distance traveled by all instruments (measured in centimeters); instrument collisions: the total number of instrument-on-instrument collisions; excessive instrument force: the total time an excessive force was applied to the instrument (measured in seconds); instruments out of view: the total distance traveled by instruments outside of the user’s field of view (measured in centimeters); master workspace range: the radius of the user’s working volume on master grips (measured in centimeters); drops; and missed targets.

Learning rate

The learning rate was defined as the change in performance score per additional attempt. The learning rate was calculated for subjects performing each task as the slope of a linear regression fitted on the performance scores across attempts.
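As a minimal sketch of this definition, the slope can be obtained from an ordinary least-squares fit of score against attempt number. The function name and the example scores below are hypothetical, and attempts are assumed to be numbered 1, 2, 3, and so on.

```python
def learning_rate(scores):
    """Least-squares slope of performance score versus attempt number (1, 2, ...)."""
    n = len(scores)
    if n < 2:
        raise ValueError("at least two attempts are needed to fit a slope")
    attempts = range(1, n + 1)
    mean_x = sum(attempts) / n
    mean_y = sum(scores) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(attempts, scores))
    sxx = sum((x - mean_x) ** 2 for x in attempts)
    return sxy / sxx


# Made-up scores for three attempts at one task.
example_rate = learning_rate([62, 71, 78])  # about 8 points per additional attempt
```

With only two attempts the fitted slope is simply the difference between the second and first scores.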

EEG Pre-processing

Signals from 116 EEG channels underwent artifact decontamination through blind source separation and topographical principal component analysis within the Advanced Source Analysis (ASA) framework. The framework has been developed by ANT Neuro Inspiring Technology Inc., Netherlands. In this study, the EEG artifact decontamination was carried out in five distinct steps: (1) The EEG data were re-referenced to the ‘common average reference,’ which involves averaging the signals from all channels used in the study⁷⁹. (2) A 60 Hz notch filter was applied to eliminate line noise. (3) The data were then processed with a band-pass filter, ranging from 0.2 to 250 Hz, with a steepness of 24 dB/octave. (4) Facial and muscle activity-related artifacts were detected and removed using ASA, followed by a visual inspection of individual EEG data segments for any remaining artifacts⁷⁹. (5) Finally, the Spatial Laplacian technique, known for emphasizing sources at small spatial scales, was utilized to reduce the effects of volume conduction on coherence calculations⁸⁰.
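Steps (1) and (2) can be illustrated with NumPy and SciPy. This sketch is not the ASA pipeline used in the study: the function names, the notch quality factor, and the synthetic test signal are assumptions made for illustration, and the band-pass, artifact-removal, and Laplacian steps are not reproduced.

```python
import numpy as np
from scipy.signal import filtfilt, iirnotch


def common_average_reference(eeg):
    """Subtract the across-channel mean from every sample; eeg is (channels, samples)."""
    return eeg - eeg.mean(axis=0, keepdims=True)


def notch_60hz(eeg, fs=500.0, quality=30.0):
    """Zero-phase 60 Hz notch filter; the quality factor of 30 is an assumed value."""
    b, a = iirnotch(60.0, quality, fs=fs)
    return filtfilt(b, a, eeg, axis=1)


# Synthetic data: 8 hypothetical channels, 4 s at 500 Hz, 10 Hz activity plus 60 Hz line noise.
fs = 500.0
t = np.arange(2000) / fs
rng = np.random.default_rng(0)
gains = np.linspace(0.5, 2.0, 8)[:, None]
raw = gains * (np.sin(2 * np.pi * 10.0 * t) + np.sin(2 * np.pi * 60.0 * t))
raw = raw + 0.1 * rng.standard_normal(raw.shape)

referenced = common_average_reference(raw)
clean = notch_60hz(referenced, fs=fs)
```

After re-referencing, the mean across channels is zero at every sample, and the notch filter attenuates the 60 Hz component while leaving the 10 Hz component essentially unchanged.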

After decontamination, the EEG data were used to extract search information and temporal network flexibility features in the theta (4–8 Hz), alpha (8–12 Hz), beta (13–35 Hz), and gamma (35–65 Hz) frequency bands across 21 Brodmann Areas (BA).

Distribution of EEG channels across Brodmann Areas

Each EEG channel was assigned to a specific BA based on its approximate position over the area. The correspondence between EEG channels and BAs was determined using Brodmann’s Interactive Atlas (http://www.fmriconsulting.com/brodmann/Interact.html) and the Brain Master software (http://www.brainm.com/software/pubs/dg/BA_10-20_ROI_Talairach/). This assignment process categorized the 116 EEG channels into the 21 BAs, as detailed in Table 8.

Table 8 List of EEG channels roughly above each Brodmann Area.


Traditional names for numbered Brodmann’s areas (BAs)

BAs 1 and 2 represent the primary somatosensory cortex; BA 5 is known as the pre-parietal (somatosensory association) cortex; BA 6 encompasses the premotor and supplementary motor cortices; BA 7 is identified as the superior parietal (somatosensory association) cortex; BA 8 is intermediate frontal; BAs 9 and 10 correspond to the dorsolateral prefrontal cortex; BA 18 is the secondary visual cortex; BA 19 is the associative visual cortex; BA 20 is the inferior temporal cortex; BA 21 is the middle temporal cortex; BA 37 is known as occipitotemporal; BA 39 is angular (i.e., an area in the parietal lobe); BA 40 is supramarginal (i.e., a portion of the parietal lobe); BAs 41 and 42 are the anterior and posterior transverse temporal areas, respectively; BA 44 is known as opercular (i.e., referring to the frontal, temporal, or parietal operculum, which together cover the insula); BA 45, the triangular area, is a part of Broca’s area in the left hemisphere; BA 46 is the middle frontal area; and BA 47 is referred to as orbital (i.e., an area of the prefrontal cortex).

Extraction of search information feature using EEG data

Search information is the amount of information (measured in bits) required to follow the shortest, and presumably most efficient, path between two nodes of a network^81,82,83. The search information feature was extracted from the adjacency matrix, commonly known as the functional brain network, of each EEG recording^81,82 using the Brain Connectivity Toolbox (https://sites.google.com/site/bctnet/). The adjacency matrix mathematically represents the functional connections between the brain areas involved in information processing⁸⁴. Its entries are the average magnitude coherence (MC) over a given frequency band. Magnitude coherence is a measure of the statistical similarity between two time series, in this case the EEG signals from two channels. The MC values were calculated by coherence analysis⁸⁵ for each pair of EEG channels i and j, giving $\Gamma = (\Gamma_{ij}) \in \mathbb{R}^{N \times N}$ with i and j ranging from 1 to N, where N is the number of EEG channels. Finally, 84 search information features were generated by averaging the extracted feature over the channels within each of the 21 BAs, for each of the four frequency bands (Fig. 3).
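To make the definition concrete, the following pure-Python sketch computes pairwise search information for a small weighted, undirected network. It is not the Brain Connectivity Toolbox implementation. It assumes that connection length is the inverse of the weight, that a random walker leaves a node with probability proportional to edge weight, that the diagonal is zero, and that no node is isolated. The function name and the toy network are hypothetical.

```python
import math


def search_information(weights):
    """Bits needed to follow the shortest path between each node pair.

    weights: symmetric list-of-lists adjacency matrix, zero diagonal, no isolated nodes.
    Returns a matrix with NaN on the diagonal and for unreachable pairs.
    """
    n = len(weights)
    strength = [sum(row) for row in weights]
    # Assumed weight-to-length transform: length = 1 / weight (inf where no edge).
    dist = [[0.0 if i == j else (1.0 / weights[i][j] if weights[i][j] > 0 else math.inf)
             for j in range(n)] for i in range(n)]
    nxt = [[j if dist[i][j] < math.inf else None for j in range(n)] for i in range(n)]
    # Floyd-Warshall shortest paths with next-hop bookkeeping.
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
                    nxt[i][j] = nxt[i][k]
    info = [[math.nan] * n for _ in range(n)]
    for s in range(n):
        for t in range(n):
            if s == t or nxt[s][t] is None:
                continue
            bits, node = 0.0, s
            while node != t:
                step = nxt[node][t]
                # Probability that a random walker at `node` takes the shortest-path edge.
                bits -= math.log2(weights[node][step] / strength[node])
                node = step
            info[s][t] = bits
    return info


# Toy 3-node chain 0 - 1 - 2 with unit weights.
toy = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
toy_info = search_information(toy)
```

In the toy chain, going from node 0 to node 2 costs 1 bit, because the walker must pick one of two edges at node 1, whereas going from node 0 to node 1 costs 0 bits.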

**Fig. 3: Feature extraction from EEG data across brain areas and frequency bands.**

Extraction of temporal network flexibility feature using EEG data

The temporal network flexibility (f) of each network node is proportional to the number of times the node changed its network community assignment over time⁸⁶. A network community is described as a subset of network nodes with denser connections between themselves compared to connections with other nodes in the network⁸⁷. Temporal network flexibility has been proposed as a functional brain network feature that changes with learning⁸⁸, surprise, and fatigue⁸⁶. This feature has also been proposed for evaluating the mental workload of surgeons conducting surgical tasks⁸⁹.

To calculate the temporal network flexibility feature, an adjacency matrix (i.e., functional brain network) was extracted for every one-second window of the EEG recording. Then, the modularity metric associated with each adjacency matrix was extracted using the “community_louvain” function of the Brain Connectivity Toolbox. This metric measures how well nodes are assigned to communities. To detect network communities, modularity was maximized using a Louvain-like locally “greedy” algorithm^90,91. This process was repeated 100 times, and a consensus iterative algorithm was used to identify a single consistent representative partition from all partition sets based on statistical testing in comparison to the ‘Newman-Girvan (NG)’ null network^91,92. The output of modularity maximization is the community assignment of each EEG channel for each 1-second window of EEG, i.e., the community to which that channel was assigned (e.g., if three communities were detected for an adjacency matrix, the community assignment of each node is an integer from one to three). The community assignments of the EEG channels across 1-second windows were used as the elements of the partition matrix $A \in \mathbb{R}^{N \times T}$. Each element $A_{i,t} \in \{1, \ldots, g\}$ gives the community, out of g detected communities, to which brain area i (EEG channels 1 to N, where N = 116) was assigned at time t (in seconds; t = 1 to T, where T denotes the recording duration).
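The per-window adjacency matrices that feed the community detection can be sketched with SciPy as follows. This is an illustration under stated assumptions, not the study's code: `scipy.signal.coherence` returns magnitude-squared coherence, so its square root is taken here as the magnitude coherence, and the Welch segment length (128 samples), the function names, and the synthetic data are all assumed. Community detection itself is not reproduced.

```python
import numpy as np
from scipy.signal import coherence


def band_coherence_matrix(segment, fs, band, nperseg=128):
    """Band-averaged magnitude coherence for every channel pair; segment is (channels, samples)."""
    n_channels = segment.shape[0]
    adjacency = np.zeros((n_channels, n_channels))
    for i in range(n_channels):
        for j in range(i + 1, n_channels):
            freqs, msc = coherence(segment[i], segment[j], fs=fs, nperseg=nperseg)
            in_band = (freqs >= band[0]) & (freqs <= band[1])
            # Square root converts magnitude-squared coherence to magnitude coherence.
            adjacency[i, j] = adjacency[j, i] = np.sqrt(msc[in_band]).mean()
    return adjacency


def windowed_adjacency(eeg, fs, band, window_s=1.0):
    """One adjacency matrix per non-overlapping window; returns (windows, channels, channels)."""
    step = int(round(fs * window_s))
    n_windows = eeg.shape[1] // step
    return np.stack([
        band_coherence_matrix(eeg[:, w * step:(w + 1) * step], fs, band)
        for w in range(n_windows)
    ])


# Synthetic example: 4 hypothetical channels, 3 s at 500 Hz, alpha band.
rng = np.random.default_rng(0)
demo = rng.standard_normal((4, 1500))
demo[1] = demo[0]  # two identical channels should have coherence 1
networks = windowed_adjacency(demo, fs=500.0, band=(8.0, 12.0))
```

Each slice `networks[w]` is a symmetric matrix with a zero diagonal that could be passed to a community detection routine for window `w`.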

Finally, the partition matrix was used in the flexibility function of the Network Community Toolbox (http://commdetect.weebly.com/)⁹³ to calculate the temporal network flexibility of each channel as Eq. 1.

$$f_{i} = 1 - \frac{1}{T-1}\sum_{t=1}^{T-1}\delta\left(A_{i,t}, A_{i,t+1}\right) \tag{1}$$

where ${f}_{i}$ is the temporal network flexibility of channel i, defined as the fraction of transitions between successive 1-s time windows in which brain area i changed its community assignment. High values of ${f}_{i}$ indicate frequent changes in community assignments (high temporal flexibility), while low values suggest stable assignments (low temporal flexibility)^86,93. In Eq. 1, ‘A’ is the partition matrix, and ‘T’ is the recording duration. The function $\delta ({A}_{i,t},{A}_{i,t+1})$ is the Kronecker delta, which indicates whether the community assignment of brain area i is the same in two successive time windows t and t + 1: δ takes the value 1 if the assignment is unchanged and 0 if it changes, so the sum counts the unchanged transitions and ${f}_{i}$ is one minus their fraction. Finally, the average of the extracted temporal network flexibility for channels within each BA was calculated at four band frequencies, resulting in a total of 84 temporal network flexibility features, corresponding to 21 BAs and four frequency bands (Fig. 3).
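Eq. 1 can be illustrated with a short Python sketch. It reads δ as the Kronecker delta (1 when the two assignments match), which is the reading under which high values of f correspond to frequent community changes. The function name and the toy partition matrix are hypothetical, and this is not the Network Community Toolbox implementation.

```python
def temporal_flexibility(partition):
    """Flexibility per channel from a partition matrix.

    partition[i][t] is the community label of channel i in window t.
    Returns, for each channel, the fraction of successive-window
    transitions in which the label changed (Eq. 1).
    """
    flexibility = []
    for labels in partition:
        n_windows = len(labels)
        if n_windows < 2:
            raise ValueError("at least two windows are needed")
        unchanged = sum(1 for a, b in zip(labels, labels[1:]) if a == b)
        flexibility.append(1 - unchanged / (n_windows - 1))
    return flexibility


# Toy partition matrix: 3 channels, 4 one-second windows.
toy_partition = [
    [1, 1, 1, 1],  # never changes community
    [1, 2, 1, 2],  # changes at every transition
    [1, 1, 2, 2],  # changes once in three transitions
]
toy_flexibility = temporal_flexibility(toy_partition)
```

For the toy matrix the three channels have flexibility 0, 1, and one third, respectively.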

Extraction of eye-tracking features

Tobii Pro Lab © was used to process the eye-tracking data. A moving average filter with a window size of three points was applied to reduce noise. A velocity-threshold identification fixation filter with a threshold of 30 degrees per second was used to identify fixation and saccadic points. The features extracted from the eye-tracking data are defined in Table 9 and Fig. 4. The extracted features were then standardized for each subject (i.e., subject-wise standardization): for each subject, the mean (µ) and standard deviation (σ) of each eye-tracking feature (X) within the task were calculated, the mean was subtracted from each feature value, and the result was divided by the standard deviation ((X − µ)/σ)⁹⁴.
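The three operations described above can be sketched in plain Python. This is a simplified illustration and not the Tobii Pro Lab processing chain: gaze is reduced to a one-dimensional angle in degrees, the edge handling of the moving average, the strict "greater than" comparison at the threshold, and the use of the sample standard deviation are assumptions, and all names and values are hypothetical.

```python
import statistics


def moving_average(values, window=3):
    """Centered moving average; the window is truncated at the edges (assumed)."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed


def label_by_velocity(angles_deg, fs=50.0, threshold=30.0):
    """Label each inter-sample interval as fixation or saccade by angular velocity."""
    labels = []
    for previous, current in zip(angles_deg, angles_deg[1:]):
        velocity = abs(current - previous) * fs  # degrees per second
        labels.append("saccade" if velocity > threshold else "fixation")
    return labels


def standardize_within_subject(values):
    """(X - mean) / standard deviation over one subject's values of one feature."""
    mu = statistics.mean(values)
    sigma = statistics.stdev(values)  # sample standard deviation (assumed)
    return [(v - mu) / sigma for v in values]


smoothed = moving_average([1, 2, 3, 4, 5])
labels = label_by_velocity([0.0, 0.1, 2.0], fs=50.0)
z_scores = standardize_within_subject([2, 4, 6])
```

At 50 Hz, a 0.1 degree step corresponds to 5 degrees per second (fixation), while a 1.9 degree step corresponds to 95 degrees per second (saccade).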

Table 9 Definition of eye-tracking features.


**Fig. 4: Feature extraction from eye-tracking data.**

Statistical analysis for performance evaluation

The extracted features, comprising 84 search information features, 84 temporal network flexibility features, and 12 eye-tracking features, were used as independent variables to develop random intercept models for evaluating performance. The random intercept model accounts for the within-subject variability. The goal was to find the features that are associated with performance among different subjects. Seven-fold cross-validation was used to reduce individual effects in detecting important features (i.e., predictors). Forward feature selection was used to identify the possible predictors. Variables selected at least twice during cross-validation were considered as possible predictors. These potential predictors were then used to develop the final linear random intercept models for performance evaluation. To quantify the variation in the output variable explained by the independent variables in the model, Efron’s pseudo-R-square was computed. The Mean Absolute Error (MAE) and Root Mean Square Error (RMSE) were computed to assess the accuracy of the performance evaluation models.
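The three goodness-of-fit metrics named above can be written out as a short Python sketch. Efron’s pseudo-R² is computed as one minus the ratio of the residual sum of squares to the total sum of squares around the mean of the observed scores. The function names and the example scores are hypothetical; the model fitting itself was done in SAS and is not reproduced here.

```python
import math


def efron_pseudo_r2(observed, predicted):
    """1 - sum((y - y_hat)^2) / sum((y - mean(y))^2)."""
    mean_y = sum(observed) / len(observed)
    sse = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    sst = sum((y - mean_y) ** 2 for y in observed)
    return 1 - sse / sst


def mean_absolute_error(observed, predicted):
    return sum(abs(y - p) for y, p in zip(observed, predicted)) / len(observed)


def root_mean_square_error(observed, predicted):
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(observed, predicted)) / len(observed))


# Made-up observed and model-predicted performance scores.
observed_scores = [70, 80, 90]
predicted_scores = [72, 78, 91]
r2 = efron_pseudo_r2(observed_scores, predicted_scores)
mae = mean_absolute_error(observed_scores, predicted_scores)
rmse = root_mean_square_error(observed_scores, predicted_scores)
```

For these made-up values the residuals are -2, 2, and -1, giving a pseudo-R² of 0.955, an MAE of about 1.67, and an RMSE of about 1.73.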

Statistical analysis for learning rate evaluation

In our analysis, all features extracted from EEG and eye-tracking data were considered continuous variables. Linear regression was used to analyze the learning rate, a suitable method given that each subject has a single learning rate for each task. Eye-tracking and EEG features from the first attempt, along with baseline performance scores and age, were used as candidate factors in a multivariate linear regression analysis to identify the most significant factors (i.e., features). Subjects with high initial performance scores typically exhibit lower learning rates. For example, a subject scoring 95 out of 100 has less room to improve, and is therefore less likely to show a steep learning rate, than one who scores 60. Therefore, the first-attempt performance score was included as a baseline to adjust for these individual differences among subjects. Forward feature selection was used to identify the predictors of learning rate. The identified features were used to develop the learning rate evaluation model. To assess how well the independent variables explain the variance in the dependent variable (learning rate), the R² metric was calculated. The MAE and RMSE were calculated to assess the accuracy of the learning rate evaluation models.
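A forward selection procedure of this general kind can be sketched with NumPy. The entry criterion used in the study is not stated, so this sketch assumes one for illustration: at each step the candidate that raises R² the most is added, and selection stops when the gain falls below a threshold. The function names, the threshold, and the synthetic data are hypothetical.

```python
import numpy as np


def r_squared(features, target):
    """R-squared of an ordinary least-squares fit with an intercept."""
    design = np.column_stack([np.ones(len(target)), features])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    residual = target - design @ coef
    return 1.0 - (residual @ residual) / np.sum((target - target.mean()) ** 2)


def forward_select(features, target, min_gain=0.01):
    """Greedy forward selection on R-squared gain (assumed criterion)."""
    selected, best_r2 = [], 0.0
    remaining = list(range(features.shape[1]))
    while remaining:
        scores = {j: r_squared(features[:, selected + [j]], target) for j in remaining}
        candidate = max(scores, key=scores.get)
        if scores[candidate] - best_r2 < min_gain:
            break
        selected.append(candidate)
        remaining.remove(candidate)
        best_r2 = scores[candidate]
    return selected, best_r2


# Synthetic data: the target depends on columns 1 and 2 only.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 3))
y = 2.0 * X[:, 1] + 1.0 * X[:, 2]
chosen, final_r2 = forward_select(X, y)
```

On the synthetic data the procedure picks column 1 first, then column 2, and stops because column 0 adds no explanatory power.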

Regression models’ terms

In the regression models, the term ‘estimate’ reflects the variation in the outcome variable (e.g., performance score) for a one-standard deviation shift in the predictor variable. The standard error of an estimate indicates the standard deviation of its sampling distribution.

Relationship between hours of experience with RAS and performance

We employed Pearson correlation analysis to investigate the relationship between hours of experience with RAS and performance.
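For reference, the Pearson correlation coefficient can be computed as in the following Python sketch. The function name and the example values are hypothetical, and the significance test reported by the statistical software is not reproduced.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length of at least 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Made-up values: a perfectly linear increasing and decreasing relationship.
r_positive = pearson_r([1, 2, 3], [2, 4, 6])
r_negative = pearson_r([1, 2, 3], [3, 2, 1])
```

The coefficient ranges from -1 (perfect negative linear relationship) to 1 (perfect positive linear relationship).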

Relationship between performance and mental workload

It has been frequently reported that performance and mental workload mutually influence each other⁹⁵. We employed Pearson correlation analysis to investigate the relationship between the two factors in this study.

All tests were two-sided with a level of significance set at 0.05. Statistical analyses were conducted using SAS® (version 9.4, SAS Institute Inc., Cary, NC, USA).

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Data availability

The data analyzed in the current study are available at Shafiei et al.⁹⁶. https://doi.org/10.13026/9m3f-ac20.

Code availability

No custom code or mathematical algorithm was developed for this study. Details regarding the specific codes used can be found in the references cited. Any access restrictions or licensing information associated with the codes used can be obtained from the respective sources as indicated in the references.

References

DiMaio, S., Hanuschik, M. & Kreaden, U. The da Vinci surgical system. Surgical robotics: systems applications and visions. 199–217 (Springer, 2011).
Lanfranco, A. R. et al. Robotic surgery: a current perspective. Ann. Surg. 239, 14 (2004).
Rassweiler, J. et al. Heilbronn Laparoscopic radical prostatectomy. Eur. Urol. 40, 54–64 (2001).
Van der Meijden, O. A. & Schijven, M. P. The value of haptic feedback in conventional and robot-assisted minimal invasive surgery and virtual reality training: a current review. Surg. Endosc. 23, 1180–1190 (2009).
Morris, B. Robotic surgery: applications, limitations, and impact on surgical education. Medscape Gen. Med. 7, 72 (2005).
Soomro, N. et al. Systematic review of learning curves in robot-assisted surgery. BJS Open 4, 27–44 (2020).
Shafiei, S. B. et al. Developing surgical skill level classification model using visual metrics and a gradient boosting algorithm. Ann. Surg. Open 4, e292 (2023).
Meyer, M. et al. The learning curve of robotic lobectomy. Int. J. Med. Robot. Comput. Assist. Surg. 8, 448–452 (2012).
Frede, T. et al. Comparison of training modalities for performing laparoscopic radical prostatectomy: experience with 1000 patients. J. Urol. 174, 673–678 (2005).
Good, D. W. et al. A critical analysis of the learning curve and postlearning curve outcomes of two experience-and volume-matched surgeons for laparoscopic and robot-assisted radical prostatectomy. J. Endourol. 29, 939–947 (2015).
Wong, S. W. & Crowe, P. Factors affecting the learning curve in robotic colorectal surgery. J. Robot. Surg. 16, 1–8 (2022).
Goh, A. C. et al. Global evaluative assessment of robotic skills: validation of a clinical assessment tool to measure robotic surgical skills. J. Urol. 187, 247–252 (2012).
Siddiqui, N. Y. et al. Validity and reliability of the robotic objective structured assessment of technical skills. Obstet. Gynecol. 123, 1193 (2014).
Lovegrove, C. et al. Structured and modular training pathway for robot-assisted radical prostatectomy (RARP): validation of the RARP assessment score and learning curve assessment. Eur. Urol. 69, 526–535 (2016).
Khan, H. et al. Use of Robotic Anastomosis Competency Evaluation (RACE) tool for assessment of surgical competency during urethrovesical anastomosis. Can. Urol. Assoc. J. 13, E10 (2019).
Younes, M. M. et al. What are clinically relevant performance metrics in robotic surgery? A systematic review of the literature. J. Robot. Surg 17, 335–350 (2023).
Perrenot, C. et al. The virtual reality simulator dV-Trainer® is a valid assessment tool for robotic surgical skills. Surg. Endosc. 26, 2587–2593 (2012).
Martin, J. R. et al. Demonstrating the effectiveness of the fundamentals of robotic surgery (FRS) curriculum on the RobotiX Mentor Virtual Reality Simulation Platform. J. Robot. Surg. 15, 187–193 (2021).
Lerner, M. A. et al. Does training on a virtual reality robotic simulator improve performance on the da Vinci® surgical system? J. Endourol. 24, 467–472 (2010).
Bric, J. D. et al. Current state of virtual reality simulation in robotic surgery training: a review. Surg. Endosc. 30, 2169–2178 (2016).
Collins, J. W. & Wisz, P. Training in robotic surgery, replicating the airline industry. How far have we come? World J. Urol. 38, 1645–1651 (2020).
Shafiei, S. B., Hussein, A. A. & Guru, K. A. Cognitive learning and its future in urology: surgical skills teaching and assessment. Curr. Opin. Urol. 27, 342–347 (2017).
Shafiei, S. B. et al. Association between functional brain network metrics and surgeon performance and distraction in the operating room. Brain Sci. 11, 468 (2021).
Nemani, A. et al. Assessing bimanual motor skills with optical neuroimaging. Sci. Adv. 4, eaat3807 (2018).
Keles, H. O. et al. High density optical neuroimaging predicts surgeons’s subjective experience and skill levels. PLoS ONE 16, e0247117 (2021).
Menekse Dalveren, G. G. & Cagiltay, N. E. Distinguishing intermediate and novice surgeons by eye movements. Front. Psychol. 11, 542752 (2020).
Wu, C. et al. Eye-tracking metrics predict perceived workload in robotic surgical skills training. Hum. Factors 62, 1365–1386 (2020).
Oğul, B. B., Gilgien, M. F. & Şahin, P. D. Ranking robot-assisted surgery skills using kinematic sensors. In European Conference on Ambient Intelligence (Springer, 2019).
Funke, I. et al. Video-based surgical skill assessment using 3D convolutional neural networks. Int. J. Comput. Assist. Radiol. Surg. 14, 1217–1225 (2019).
Yanik, E. et al. Deep neural networks for the assessment of surgical skills: A systematic review. J. Def. Model. Simul. 19, 159–171 (2022).
Natheir, S. et al. Utilizing artificial intelligence and electroencephalography to assess expertise on a simulated neurosurgical task. Comput. Biol. Med. 152, 106286 (2023).
Mohanavelu, K. et al. Dynamic cognitive workload assessment for fighter pilots in simulated fighter aircraft environment using EEG. Biomed. Signal Process. Control 61, 102018 (2020).
Gao, Z. et al. EEG-based spatio–temporal convolutional neural network for driver fatigue evaluation. IEEE Trans. neural Netw. Learn. Syst. 30, 2755–2763 (2019).
Chetwood, A. S. et al. Collaborative eye tracking: a potential training tool in laparoscopic surgery. Surg. Endosc. 26, 2003–2009 (2012).
Zumwalt, A. C. et al. Gaze patterns of gross anatomy students change with classroom learning. Anat. Sci. Educ. 8, 230–241 (2015).
Leff, D. R. et al. Could variations in technical skills acquisition in surgery be explained by differences in cortical plasticity? Ann. Surg. 247, 540–543 (2008).
Lavanchy, J. L. et al. Automation of surgical skill assessment using a three-stage machine learning algorithm. Sci. Rep. 11, 1–9 (2021).
Wang, Z. & Majewicz Fey, A. Deep learning with convolutional neural network for objective skill evaluation in robot-assisted surgery. Int. J. Comput. Assist. Radiol. Surg. 13, 1959–1970 (2018).
Shafiei, S. B. et al. Surgical skill level classification model development using EEG and eye-gaze data and machine learning algorithms. J. Robot. Surg. 17, 1–9 (2023).
Shadpour, S. et al. Developing cognitive workload and performance evaluation models using functional brain network analysis. npj Aging 9, 22 (2023).
Chen, I. et al. Evolving robotic surgery training and improving patient safety, with the integration of novel technologies. World J. Urol. 39, 2883–2893 (2021).
Marinescu, A. C. et al. Physiological parameter response to variation of mental workload. Hum. Factors 60, 31–56 (2018).
Othman, N. & Romli, F. I. Mental workload evaluation of pilots using pupil dilation. Int. Rev. Aerosp. Eng. 9, 80–84 (2016).
Hess, E. H. & Polt, J. M. Pupil size in relation to mental activity during simple problem-solving. Science 143, 1190–1192 (1964).
Guidetti, G. et al. Saccades and driving. Acta Otorhinolaryngol. Italica 39, 186 (2019).
Marquart, G., Cabrall, C. & de Winter, J. Review of eye-related measures of drivers’ mental workload. Procedia Manuf. 3, 2854–2861 (2015).
Larsson, J., Landy, M. S. & Heeger, D. J. Orientation-selective adaptation to first-and second-order patterns in human visual cortex. J. Neurophysiol. 95, 862–881 (2006).
Waberski, T. D. et al. Timing of visuo-spatial information processing: electrical source imaging related to line bisection judgements. Neuropsychologia 46, 1201–1210 (2008).
Chauhan, P. & Preetam, M. Brain waves and sleep science. Int. J. Eng. Sci. Adv. Res. 2, 33–36 (2016).
Zhang, J. X., Leung, H.-C. & Johnson, M. K. Frontal activations associated with accessing and evaluating information in working memory: an fMRI study. Neuroimage 20, 1531–1539 (2003).
Ranganath, C., Johnson, M. K. & D’Esposito, M. Prefrontal activity associated with working memory and episodic long-term memory. Neuropsychologia 41, 378–389 (2003).
Kübler, A., Dixon, V. & Garavan, H. Automaticity and reestablishment of executive control—an fMRI study. J. Cogn. Neurosci. 18, 1331–1342 (2006).
Chevrier, A. D., Noseworthy, M. D. & Schachar, R. Dissociation of response inhibition and performance monitoring in the stop signal task using event‐related fMRI. Hum. Brain Mapp. 28, 1347–1358 (2007).
Rogers, R. D. et al. Choosing between small, likely rewards and large, unlikely rewards activates inferior and orbital prefrontal cortex. J. Neurosci. 19, 9029–9038 (1999).
Goel, V. et al. Neuroanatomical correlates of human reasoning. J. Cogn. Neurosci. 10, 293–302 (1998).
Roux, F. et al. Gamma-band activity in human prefrontal cortex codes for the number of relevant items maintained in working memory. J. Neurosci. 32, 12411–12420 (2012).
Pockett, S., Bold, G. E. & Freeman, W. J. EEG synchrony during a perceptual-cognitive task: widespread phase synchrony at all frequencies. Clin. Neurophysiol. 120, 695–708 (2009).
Postle, B. R. & D’esposito, M. “What”—then—“where” in visual working memory: an event-related fMRI study. J. Cogn. Neurosci. 11, 585–597 (1999).
Slotnick, S. D. & Schacter, D. L. A sensory signature that distinguishes true from false memories. Nat. Neurosci. 7, 664–672 (2004).
Shiferaw, B., Downey, L. & Crewther, D. A review of gaze entropy as a measure of visual scanning efficiency. Neurosci. Biobehav. Rev. 96, 353–366 (2019).
Collell, G. & Fauquet, J. Brain activity and cognition: a connection from thermodynamics and information theory. Front. Psychol. 6, 818 (2015).
Beer, J. et al. Areas of the human brain activated by ambient visual motion, indicating three kinds of self-movement. Exp. Brain Res. 143, 78–88 (2002).
Kellenbach, M. L., Hovius, M. & Patterson, K. A pet study of visual and semantic knowledge about objects. Cortex 41, 121–132 (2005).
Frey, S. H. et al. Cortical topography of human anterior intraparietal cortex active during visually guided grasping. Cogn. Brain Res. 23, 397–405 (2005).
Meister, I. G. et al. Playing piano in the mind—an fMRI study on music imagery and performance in pianists. Cogn. Brain Res. 19, 219–228 (2004).
Akatsuka, K. et al. Neural codes for somatosensory two-point discrimination in inferior parietal lobule: an fMRI study. Neuroimage 40, 852–858 (2008).
Dupont, P. et al. Many areas in the human brain respond to visual motion. J. Neurophysiol. 72, 1420–1424 (1994).
Rämä, P. et al. Working memory of identification of emotional vocal expressions: an fMRI study. Neuroimage 13, 1090–1101 (2001).
Li, Z. H. et al. Functional comparison of primacy, middle and recency retrieval in human auditory short-term memory: an event-related fMRI study. Cogn. Brain Res. 16, 91–98 (2003).
Shafiei, S. B., Hussein, A. A. & Guru, K. A. Dynamic changes of brain functional states during surgical skill acquisition. PLoS ONE 13, e0204836 (2018).
Wickens, C. D. Multiple resources and performance prediction. Theor. Issues Ergono. Sci. 3, 159–177 (2002).
Carswell, C. M., Clarke, D. & Seales, W. B. Assessing mental workload during laparoscopic surgery. Surg. Innov. 12, 80–90 (2005).
Mohamed, R. et al. Validation of the National Aeronautics and Space Administration Task Load Index as a tool to evaluate-the learning curve for endoscopy training. Can. J. Gastroenterol. Hepatol. 28, 155–160 (2014).
Reznick, R. K. & MacRae, H. Teaching surgical skills—changes in the wind. N. Engl. J. Med. 355, 2664–2669 (2006).
Ruiz-Rabelo, J. F. et al. Validation of the NASA-TLX score in ongoing assessment of mental workload during a laparoscopic learning curve in bariatric surgery. Obes. Surg. 25, 2451–2456 (2015).
Khorgami, Z. et al. The cost of robotics: an analysis of the added costs of robotic-assisted versus laparoscopic surgery using the National Inpatient Sample. Surg. Endosc. 33, 2217–2221 (2019).
Bhama, A. R. et al. A comparison of laparoscopic and robotic colorectal surgery outcomes using the American College of Surgeons National Surgical Quality Improvement Program (ACS NSQIP) database. Surg. Endosc. 30, 1576–1584 (2016).
Wilson, M. R. et al. Development and validation of a surgical workload measure: the surgery task load index (SURG-TLX). World J. Surg. 35, 1961–1969 (2011).
Luck, S. J. An Introduction to the Event-related Potential Technique (MIT Press, 2014).
Kayser, J. & Tenke, C. E. On the benefits of using surface Laplacian (current source density) methodology in electrophysiology. Int. J. Psychophysiol. 97, 171 (2015).
Rosvall, M. et al. Searchability of networks. Phys. Rev. E 72, 046117 (2005).
Trusina, A., Rosvall, M. & Sneppen, K. Communication boundaries in networks. Phys. Rev. Lett. 94, 238701 (2005).
Goñi, J. et al. Resting-brain functional connectivity predicted by analytic measures of network communication. Proc. Natl Acad. Sci. USA 111, 833–838 (2014).
Lynn, C. W. & Bassett, D. S. The physics of brain network structure, function and control. Nat. Rev. Phys. 1, 318–332 (2019).
Meijer, E. et al. Functional connectivity in preterm infants derived from EEG coherence analysis. Eur. J. Paediatr. Neurol. 18, 780–789 (2014).
Betzel, R. F. et al. Positive affect, surprise, and fatigue are correlates of network flexibility. Sci. Rep. 7, 520 (2017).
Radicchi, F. et al. Defining and identifying communities in networks. Proc. Natl Acad. Sci. USA 101, 2658–2663 (2004).
Reddy, P. G. et al. Brain state flexibility accompanies motor-skill acquisition. Neuroimage 171, 135–147 (2018).
Shafiei, S. B. et al. Evaluating the mental workload during robot-assisted surgery utilizing network flexibility of human brain. IEEE Access 8, 204012–204019 (2020).
Blondel, V. D. et al. Fast unfolding of communities in large networks. J. Stat. Mech.: Theory Exp. 2008, P10008 (2008).
Jeub, L. et al. A generalized Louvain Method for Community Detection Implemented in MATLAB. https://github.com/GenLouvain/GenLouvain (2011).
Bassett, D. S. et al. Task-based core-periphery organization of human brain dynamics. PLoS Comput. Biol. 9, e1003171 (2013).
Bassett, D. S. et al. Dynamic reconfiguration of human brain networks during learning. Proc. Natl Acad. Sci. USA 108, 7641–7646 (2011).
Rizzo, A. et al. A machine learning approach for detecting cognitive interference based on eye-tracking data. Front. Hum. Neurosci. 16, 806330 (2022).
Dias, R. D. et al. Systematic review of measurement tools to assess surgeons’ intraoperative cognitive workload. J. Br. Surg. 105, 491–501 (2018).
Shafiei, S. B. et al. Electroencephalogram and eye-gaze datasets for robot-assisted surgery performance evaluation (version 1.0.0). PhysioNet. https://doi.org/10.13026/qj5m-n649 (2023).


Acknowledgements

The research reported in this publication was supported by the National Institute of Biomedical Imaging and Bioengineering of the National Institutes of Health under grant number R01EB029398. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health. This work was supported by the National Cancer Institute (NCI) grant P30CA016056, involving the use of Roswell Park Comprehensive Cancer Center’s shared resources (Comparative Oncology and the Applied Technology Laboratory for Advanced Surgery Shared Resources). The authors would like to thank all the study subjects.

Author information

Authors and Affiliations

Intelligent Cancer Care Laboratory, Department of Urology, Roswell Park Comprehensive Cancer Center, Buffalo, NY, 14263, USA
Somayeh B. Shafiei
Department of Animal Biosciences, University of Guelph, Guelph, Ontario, N1G 2W1, Canada
Saeed Shadpour
Department of Industrial and Systems Engineering, Texas A&M University, College Station, TX, 77843, USA
Farzan Sasangohar
Department of Urology, Roswell Park Comprehensive Cancer Center, Buffalo, NY, 14263, USA
James L. Mohler
Department of Biostatistics and Bioinformatics, Roswell Park Comprehensive Cancer Center, Buffalo, NY, 14263, USA
Kristopher Attwood & Zhe Jing


Contributions

S.B.S. drafted the work and made substantial contributions to the conceptualization and design of the study, data acquisition and analysis, interpretation of the results, and funding acquisition. S.S. substantially revised the draft and made substantial contributions to the data analysis and interpretation of the results. F.S. substantially revised the draft and made substantial contributions to the interpretation of the results. J.L.M. substantially revised the draft. Z.J. and K.A. made significant contributions to statistical analysis. All authors reviewed the manuscript.

Corresponding author

Correspondence to Somayeh B. Shafiei.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Reporting summary

Supplementary Tables

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.


About this article

Cite this article

Shafiei, S.B., Shadpour, S., Sasangohar, F. et al. Development of performance and learning rate evaluation models in robot-assisted surgery using electroencephalography and eye-tracking. npj Sci. Learn. 9, 3 (2024). https://doi.org/10.1038/s41539-024-00216-y


Received: 24 March 2023
Accepted: 08 January 2024
Published: 20 January 2024
DOI: https://doi.org/10.1038/s41539-024-00216-y
