The EyeSi (Haag-Streit, Mannheim, Germany) cataract surgery simulator is the most commonly used virtual reality simulator internationally for training ophthalmology registrars. It includes a Cataract Challenge Course (CCC), a virtual reality simulation (VRS) of cataract surgery. In this study, we aimed to determine whether the parameters measured on the EyeSi virtual reality cataract surgery simulator correlate with one another, and whether they can predict the progression of microsurgical skill acquisition and development amongst ophthalmology trainees.
Data on the performance of 56 ophthalmology trainees (training ophthalmic surgeons) at the Royal Victorian Eye and Ear Hospital were analysed from 2018 to 2022; the trainees ranged from their first to fourth year of training. Analysed parameters included Initial Task Performance, Time-to-Gate (the time to reach a threshold score, 50% in this case) and Peak Performance. Relationships between the parameters were analysed with Pearson's r, and the significance of the difference between correlations was analysed with the psych package in R.
The strongest correlation was found between initial and peak performance (r = 0.810), which was significantly greater than the correlation between Initial Task Performance and Time-to-Gate (r = 0.553, p = 0.03). Time-to-Gate was weakly correlated with Peak Performance (r = 0.475). The average total training time was 1123 minutes (range 252–2039 minutes), and the mean peak CCC score was 442 (range 166–496).
Time-to-Gate, Initial Task Performance and Peak Performance are interlinked, indicating that trainees with the highest initial performance remain ahead in ability and progress through VRS training more rapidly. The data also indicated that the EyeSi platform ultimately prioritizes a wide range of skills over mastery of a few: participants who spend longer 'perfecting' each stage of the simulator are not rewarded points-wise in comparison with those who move quickly through stages, as rapid progression grants a far higher overall score for each section, even though a score should ideally be a numeric representation of ability. Consequently, the authors believe that virtual reality systems play a crucial role in training surgical registrars; however, their scoring systems should focus on skill mastery to facilitate maximal acquisition of skills.
The EyeSi cataract surgery virtual reality surgical simulator (VRmagic, Haag-Streit, Mannheim, Germany) is the predominant virtual reality simulator utilized internationally in training cataract surgery [1]. Within the Australian and New Zealand Ophthalmology Vocational Training Program, it has become a required component of education prior to the commencement of live surgery. It has been deemed an effective [2] and efficient [3] system on which trainees can practise and improve their surgical technique before they operate on real patients. Other similarly effective virtual reality surgical simulators are used in ophthalmology, such as those aiming to improve trainees' port delivery system implantation [4], and virtual reality software remains an area of continued growth in surgical education due to its high efficacy, ease of use and low risk to patients.
Before simulation-based training, ophthalmic surgery training was based on Halsted's methodology [5], a process in which the trainee is presented with frequent, repetitive and intense opportunities to care for patients under the supervision of qualified surgeons. Part of this process involves gradually building up the trainee's skills, presenting them with progressively more tasks and increasing the complexity of the tasks they are expected to complete [6].
Virtual reality simulation (VRS) allows the trainee to engage in risk-free practice prior to the commencement of live surgery. It also allows for deliberate practice: a focused effort that is not inherently enjoyable, with the end goal of personal improvement [7]. Deliberate practice contains several important components: setting specific and realistic goals, breaking the skill down into smaller components, challenging oneself and obtaining feedback.
VRS can facilitate reliable, reproducible performance of the components of a procedure, hence improving outcomes within the operating theatre [8]. Previously published studies have shown that virtual reality simulation throughout training can reduce real-life complications; for example, a virtual reality training system for phacoemulsification surgery [9] led to significantly decreased complication rates [10].
The multi-step nature of cataract surgery makes it an ideal target for deliberate practice, as each stage can be practised specifically and incrementally, and each stage contributes incrementally to the subsequent steps of the procedure.
The EyeSi platform aims to facilitate this type of education through its structure, taking trainees through four different 'cataract challenge courses' of progressively increasing complexity: CAT-A, where participants are trained in basic microsurgical skills; CAT-B, where participants are trained in the individual steps of the cataract surgical process; CAT-C, where trainees practise advanced surgical techniques; and CAT-D, where participants are confronted with complex cataract surgery cases under demanding conditions, with potentially randomized tasks and complications [11]. After completing one of these elements satisfactorily three times consecutively, participants are encouraged to move on to the next, more challenging course. Cataract surgery specialists have verified that these stages truly do increase in difficulty as participants progress through the course. For every 60 minutes of training time, trainees must perform a complete cataract procedure in sequential order: they have only one attempt at each cataract step and 15 minutes to complete the entire procedure. This style of learning is akin to the learning models discussed in classical learning theory [12].
If one aspect of a trainee's performance is notably lacking, the simulator will adjust its tasks to home in specifically on that skill so that the trainee reaches a general level of competence. This reflects the values and principles of deliberate practice, in that the simulator requires specific, focused repetition of skills and provides specific, direct feedback to trainees after each iteration of the course.
We are aware of what the parameters of EyeSi are designed to assess on a biomechanical level. VRS is still a new technique in Australia, and we are unable to correlate the data collected in this study with in vivo cataract surgery results; this work is still in progress and may be the topic of future research. However, other studies have already shown that VRS is a safe, efficient and effective learning method for trainees, and that it has markedly improved cataract surgical training as a whole [13].
These studies help to justify the software's effectiveness as a training tool. However, they do little to show how exactly trainees learn, and whether different learning curves can be modelled solely via the parameters the simulator measures. Whilst the raw numbers the parameters measure are well understood, a link has yet to be established between these metrics and trainees' approach to skill acquisition.
As such, the aim of our study was to determine whether there were any correlations between simulation parameters, as knowing this may help to predict and identify trainee learning rates and styles, as well as flag trainees who may require additional targeted learning (through a method such as deliberate practice).
This project was approved by the Royal Australian and New Zealand College of Ophthalmologists (RANZCO) Human Research Ethics Committee (approval number 160.23).
Participants provided informed consent for their training data to be used for this analysis. The performance of 56 ophthalmology trainees from 2018 to 2022 at the Royal Victorian Eye and Ear Hospital was analysed, with the trainees ranging in experience from first to fourth year. Data were extracted from PDF reports generated by the VRmagic software platform.
Throughout their training, trainees' progress is modelled through multiple parameters, namely Initial Task Performance (ITP), Peak Performance and Time-to-Gate. Feedback is provided to trainees not only on the time they take to complete each stage, but also on any specific segments with which they demonstrated difficulty. These 'challenging' components are then targeted in subsequent cases to ensure trainees are truly competent in all aspects of the course before progressing to the next stage.
Data were collected retrospectively, solely from the EyeSi platform, after trainees completed these stages. The number of sessions needed to complete each stage varied between trainees. Each participant is given a percentile score for their overall ITP and Peak Performance after completion of the course. Their Time-to-Gate (time taken to reach a threshold score), total training time (the number of minutes spent on the simulator) and overall raw score (Cataract Challenge Course [CCC] score out of 500, before percentile conversion) were measured.
Data were collected on each participant's last five attempts on the course (each scored out of 500) to derive the highest and mean CCC scores. These data remain linked to the participant's profile, allowing calculation of Initial Performance (the mean of the participant's first three scores), Peak Performance (the mean of the participant's best three consecutive scores) and Time-to-Gate (how long it takes a trainee to reach the 'reliability gate', that is, to attain three consecutive passing scores [50%] for a specific objective). The participant can see every raw score for every attempt, but these data are not available to educators [14].
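To make these definitions concrete, the sketch below shows how the three derived parameters could be computed in R from one trainee's sequence of raw CCC scores. This is a minimal illustration only: the EyeSi software performs these calculations internally and then converts the results to percentiles against the wider cohort, so the score vector, variable names and attempt-based gate measure here are assumptions.

```r
# Illustrative sketch: EyeSi computes these internally and converts them
# to percentiles, so all values and names here are assumed for illustration.
scores    <- c(180, 240, 310, 290, 355, 400, 365, 410, 430, 445)  # raw CCC scores out of 500
pass_mark <- 250                                                  # 50% threshold

# Initial Performance: mean of the first three scores
initial_performance <- mean(head(scores, 3))

# Peak Performance: best mean of any three consecutive scores
window_means     <- sapply(seq_len(length(scores) - 2),
                           function(i) mean(scores[i:(i + 2)]))
peak_performance <- max(window_means)

# Time-to-Gate: when the 'reliability gate' (three consecutive passes) is
# reached; the simulator measures elapsed time, but attempt number is used
# here as a stand-in
passes       <- scores >= pass_mark
gate_attempt <- which(sapply(seq_len(length(passes) - 2),
                             function(i) all(passes[i:(i + 2)])))[1] + 2
```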
The data were analysed using R [15]. The caret [16] and psych [17] packages were utilized to facilitate this analysis.
Outlier CCC results were excluded, as the EyeSi software records every task as a data point, even tasks scored at 0 because they were closed immediately. For this research, any score more than 200 below the participant's highest was deemed an outlier. As the other parameters rely on participants' highest and initial scores, the removal of these low-scoring outliers has close to no effect on them.
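The exclusion rule maps directly onto a per-participant filter. A minimal sketch, assuming the attempts sit in a data frame ccc_raw with trainee_id and score columns (both names illustrative):

```r
library(dplyr)

# Keep only attempts within 200 points of each trainee's highest score,
# dropping immediately-closed tasks that EyeSi records as scores of 0.
ccc_clean <- ccc_raw %>%
  group_by(trainee_id) %>%
  filter(score >= max(score) - 200) %>%
  ungroup()
```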
Descriptive statistics were calculated for the data: the mean, variance and standard deviation (SD) of each parameter were determined. Each parameter was graphed to reveal any noticeable visual trends. The kurtosis and skewness of every parameter were also determined.
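These descriptive statistics can be obtained in a single call to psych::describe. A sketch, assuming the cleaned per-trainee parameters are held in a data frame params with one column per parameter (column names are illustrative):

```r
library(psych)

# params: columns such as itp, time_to_gate, peak, total_time
describe(params)     # mean, SD, skew and kurtosis for each parameter
sapply(params, var)  # variances, which describe() does not report
```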
Correlation coefficients were determined between the different parameters measured in the data. Additionally, p-values and t-values were determined using the caret package in R.
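A sketch of this analysis, reusing the assumed params data frame: cor() and cor.test() are base R equivalents of the pairwise statistics described here, and psych::paired.r tests the significance of the difference between two dependent correlations, as reported in the abstract (both correlations share ITP, so the correlation between the remaining pair is supplied):

```r
library(psych)

r_matrix <- cor(params, use = "pairwise.complete.obs")  # Pearson r between all parameters
cor.test(params$itp, params$peak)                       # t- and p-value for a single pair

# Difference between r(ITP, Peak) = 0.810 and r(ITP, Time-to-Gate) = 0.553,
# supplying r(Peak, Time-to-Gate) = 0.475 for the dependent-correlations test
paired.r(xy = 0.810, xz = 0.553, yz = 0.475, n = 56)
```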
The distribution of data was modelled, and Shapiro–Wilk tests were run on every parameter to ascertain the normality of the data.
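Applied across the parameter columns, the normality check is a one-liner with base R's shapiro.test(); for example:

```r
# Shapiro-Wilk p-value for each parameter; p < 0.05 suggests non-normality
sapply(params, function(x) shapiro.test(x)$p.value)
```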
Table 1 contains descriptive statistics for the study's parameters. ITP, Time-to-Gate and Peak Performance all had means close to 50, as they had already been converted to percentile values within the EyeSi cohort (trainees in North America, South America and Australia). The mean highest CCC score was 441.7 (SD 63.7), close to the maximum possible score of 500.
| Parameter | Mean | Standard deviation |
|---|---|---|
| Initial Task Performance (percentile) | 51.1 | 9.5 |
| Time-to-Gate (percentile) | 49.8 | 9.7 |
| Peak Performance (percentile) | 51.2 | 8.9 |
| Total Training Time (minutes) | 1139.1 | 464.2 |
| Highest CCC score (out of 500) | 441.7 | 63.7 |
Figure 1 collates the relationships between the different parameters. The highest and mean CCC scores and Peak Performance were found to be strongly correlated with all the other measured parameters. Total Training Time was negatively correlated with Time-to-Gate.
Table 2 presents the normality, skew and kurtosis of the data collected through the EyeSi software. From the Shapiro–Wilk testing, the authors determined that ITP and Peak Performance had normal distributions (with p-values far above 0.05). Surprisingly, Time-to-Gate was found to be non-normal on the Shapiro–Wilk test, whereas the raw Total Training Time data were found to be normal on both metrics. The highest and mean CCC scores also had very high kurtosis (leptokurtic) and negative skewness values.
| Parameter | Skew | Kurtosis | Shapiro–Wilk (p) |
|---|---|---|---|
| Initial Task Performance | 0.044 | −0.423 | 0.305 |
| Time-to-Gate | 0.543 | 0.195 | 0.044 |
| Peak Performance | 0.290 | −0.353 | 0.615 |
| Total Training Time | 0.240 | −0.844 | 0.120 |
| Highest CCC score | −3.047 | 10.533 | <0.001 |
| Mean CCC score | −1.188 | 3.170 | <0.001 |
Whilst the EyeSi system has been used internationally for over a decade, this is, to the authors' knowledge, the first study to analyse the EyeSi simulator's data to identify correlations between the parameters themselves.
Our results indicate that most of the data (specifically the ITP, Time-to-Gate, Peak Performance and Total Training Time categories) were relatively normally distributed, as shown in Tables 1 and 2, and that most of these parameters were correlated, as shown in Figure 1. Ostensibly, this is logical: those who train for longer should theoretically develop more skills than those who do less training, and those who initially perform to a higher standard can likely maintain this advantage over those who initially struggle, although the magnitude of this skill difference does narrow over time.
The conversion of these metrics to percentiles poses some challenges to data analysis. Whilst Total Training Time is a raw number of minutes, and the highest and mean CCC scores are scores out of 500, Time-to-Gate, Peak Performance and ITP are presented as percentiles. This may have caused discrepancies between the values of these parameters and the 'true' performance of trainees on the simulator: many trainees' results may have been clustered together, so a large percentile difference may not signify a large difference in raw scores.

Participants' Peak Performance is also not directly related to their highest CCC score (r = 0.45). Peak Performance is a percentile conversion of the average of a participant's three best consecutive attempts in the current task; because it is an average rather than a single peak, and because attaining a high Peak Performance score also requires a degree of consistency, it is only weakly correlated with the highest CCC score. As all participants finish the program competent (receiving three consecutive pass marks for each section in order to progress to the next stage), their final marks in a specific course are often also the marks that contribute to their Peak Performance reading. This further complicates the task of ascertaining how participants' final performance is affected by the parameters measured in their initial stages of cataract surgery training, as many of the highest CCC marks are clustered together (hence the negatively skewed distribution).
Total Training Time is a highly useful metric for educators, as it gives a rough indication of the amount of time that should be scheduled and set aside for trainees to practise with the simulator [18].
However, Total Training Time unintuitively demonstrated a weak correlation with Peak Performance (r = 0.19). Whilst its correlation with the highest CCC score (r = 0.49) was far stronger, the fact that Peak Performance is based on consecutive scores, rather than an average over a wider range, means that one poor score can drop a trainee's Peak Performance far more severely than their highest CCC mark. Likely as a consequence of the gated progression model that underlies the EyeSi program, those who spend longer on the software are unlikely to attain higher marks than those who progress more rapidly, as in both cases the primary goal is 'completion' rather than mastery. This may specifically encourage participants to take fewer risks: their goal is to attain a minimum passing score three times consecutively, so taking a risk and failing the stage would cause a large increase in the time taken to progress to the next stage. Those who take longer to complete stages are held back by an inability to complete the course, rather than a desire to truly master each section of the CCC; this may also be affected by other commitments, such as family or other training opportunities. Repeating a stage also does not provide a large points boost, and whilst most trainees' main motivation is likely still the completion of the program, amassing very few points after substantial work serves to dissuade trainees from continued effort. All of these factors mean that Total Training Time does not effectively predict Peak Performance, as the 'reliability gates' that participants strive to reach remain the same regardless of one's progression speed.
Additionally, Total Training Time gives no indication of trainees' quality of practice: amongst the cohort there is likely a variety of learning styles and speeds, leading to marked inter-trainee variation in Total Training Time. Another issue with this metric is that some trainees may have been practising cataract surgery prior to commencing this course, meaning that many of the skills covered in CAT-A may simply have been revision for certain trainees, artificially lowering their Total Training Time measurement.
The strongest correlation between parameters (r = 0.79) was found between ITP and Peak Performance. As Peak Performance is not directly derived from participants' highest score, the correlation between ITP and the highest CCC score was weaker (r = 0.33).
However, trainees' final performance strongly depended on their initial performance, a trend not limited to our study. Observations in medical schools [19], engineering schools [20] and chiropractic training [21] have shown that students' grades in their initial period of study have a significant effect on their final and peak levels of performance. To a certain extent, our data highlight that trainees with a higher degree of preliminary or baseline ability are bound to achieve threshold milestones earlier, and are especially rewarded by the 'gated progression' model of the EyeSi simulator.
Some researchers have theorized that this is due to the effect of marks on one's psyche: the beneficial impacts of positive results on one's self-efficacy and belief in one's own ability [4], and the corresponding adverse effects of substandard results. Other authors have stated that initial performance reflects not solely 'natural talent' but also one's 'resourcefulness, vigour and hardiness' [22]; as these qualities are required to perform at the highest level, initial and peak performance are profoundly intertwined.
However, this does not mean that all fields follow this relationship: in many, those deemed more 'naturally talented', that is, those who initially demonstrate a higher level of ability, can stagnate and never attain the levels of achievement possible for them [23].
Another interesting finding is that Time-to-Gate and Total Training Time were negatively correlated (r = −0.24).
The 'gate' in Time-to-Gate refers to a 'reliability gate' at which participants have to complete a task successfully three times in succession. The time to reach this gate is compared with other users, and if a gate is not reached, the remaining time is estimated and converted to a percentile value. A higher Time-to-Gate value implies that a participant is faster. As a result, a longer Total Training Time converts to a lower Time-to-Gate value, as the participant is completing tasks at a slower speed. Whilst such participants might simply be slower, they may also be intending to master their skills by repeating certain challenges. Because one's Time-to-Gate value begins being measured immediately after the completion of the previous stage, these two parameters are inherently negatively correlated.
Those with a higher Total Training Time may have spent more time on specific tasks, or they may simply have completed more tasks at a faster rate.
As a consequence, Time-to-Gate correlates more strongly with Peak Performance than Total Training Time does (r = 0.43 vs 0.19), and this relationship is reversed for trainees' highest CCC score.
Trainees are provided with a large amount of data concerning their performance after each training task. Each trainee receives a detailed evaluation report on parameters such as instrument handling, surgical efficiency and tissue handling, as well as live feedback during the procedure to point out surgical mistakes. At the initial stage of trainees' learning, they receive substantial visual guidance from the system, with pop-ups indicating distances and ideal speeds; these start to disappear as the trainee's skill level improves.
Despite this, it is impossible to truly ascertain the extent to which deliberate practice is being conducted in trainees' usage of the VRS, largely because there was no longitudinal follow-up beyond the completion of the simulator's mandatory modules. Without a doubt, each task of cataract surgery in a simulated context can be repeated, refined and improved with feedback, in the framework of deliberate practice; however, this study could not verify that specifically. Whilst the aim of this training technique is to simulate the role of a teacher in providing specific goals and focused feedback [24], this does not mean that trainees follow these recommendations and suggestions. It is possible to repeat the same task many times without improving or focusing on one's flaws, more so than with an actual supervisor, despite the pop-ups and 'advice messages' the software provides [25].
It may be assumed that most trainees are using deliberate practice methods as set out by previous research [26], especially as this has been demonstrated in other specialities such as general surgery [27, 28]. However, it is difficult to prove this conclusively through the limited data available in each parameter. Whether trainees' time on the simulator is purposeful and systematic cannot be determined by analysing these parameters; it can only be ascertained circumstantially.
Additionally, a core concept underlying deliberate practice is 'mastery learning', which entails achieving a threshold score before progressing to the next task. The EyeSi software does this to an extent, but the threshold is set at the arbitrarily defined 50% mark. Whilst this is a common passing mark in academia, task-specific target scores with a stronger scientific basis would help to further reinforce the use of deliberate practice on the EyeSi simulator. This may be an intriguing area for future research.
Unfortunately, the tasks of the EyeSi simulator are highly specific to the elements of cataract surgery covered by the CCC. This means that other sections of cataract surgery are unlikely to benefit to the same extent [29]. Especially given the simulator's gated progression method, certain elements of cataract surgery are likely being neglected and need to be covered via other avenues of surgical education, as vitreous loss unassociated with an errant capsulorhexis has proven to be unaffected by CCC training [28].
Ultimately, it is well established that deliberate practice can and will improve peak performance. However, its effect is diluted in this study by trainees being forced into a dilemma: repeat a level for little reward beyond their own intrinsic satisfaction, or progress to the next stage for a far more significant points boost. This is especially relevant for those who are time-poor, as many trainees must balance competing factors such as working hours, clinical responsibilities and family throughout their training. Consequently, training for the purpose of mastery may simply be infeasible for these trainees, pushing them to merely attempt to amass as many points as possible.
This is a relatively small study, with data from only 56 participants utilized. The sample size could be increased if ophthalmic trainees across Australia and New Zealand were included; however, the EyeSi virtual reality software is not optimized for large-scale data collection.
In a study like this, the impact of outliers on the final outcome is substantial. Whilst results more than 200 points below a participant's highest CCC score were deleted, a large number of 'unreflective' results remained in the data, skewing many of the metrics negatively; this is part of the reason why the highest and mean CCC scores displayed such high skewness. In the EyeSi, incomplete runs are still recorded as complete, and because participants' runs before their most recent five are automatically hidden from researchers, it is often difficult to get an accurate gauge of a participant's peak ability, and modelling a trainee's training trajectory is significantly more challenging without access to these data. The EyeSi data also lack information about which year group different trainees belong to, making it difficult to determine whether a trainee's level of experience plays a role in their skill acquisition.
Another limitation is the lack of qualitative data. This paper relies exclusively on quantitative data to analyse surgical simulation, and by omitting qualitative data the study may overlook the nuanced psychological aspects of surgical training. Qualitative insights could provide a deeper understanding of motivation, self-efficacy and confidence development. The authors recommend conducting semi-structured interviews in the next iteration of this research, especially as we look towards modelling trainee learning rates and styles in greater detail.
Our results indicate that whilst virtual reality surgery training has clear benefits, such as reducing early operative error and providing trainees with a less stressful environment in which to hone their skills, it may not fully facilitate deliberate practice or mastery: trainees who initially perform best still typically reach the highest levels of simulator performance at the conclusion of their training, owing to the VRS's emphasis on reaching certain milestones (gates). This also means that a degree of additional targeted learning may be lost on trainees who exceed the standard required of them, as whilst the software may provide 'recommendations', there is no real incentive for trainees to follow them.
Ultimately, it appears that more needs to be done to encourage participants to repeat 'levels' multiple times. The current virtual reality training software is excellent at allowing trainees to learn multiple skills; however, it prioritizes possessing a wide range of competencies above mastery of fundamental techniques. Deliberate practice can only be encouraged in a system where mastery of fundamentals and basic skills remains a core tenet, although logistical requirements complicate this task significantly [30].
The EyeSi virtual reality cataract surgery simulator provides valuable insights into how trainees acquire microsurgical skills. The correlations between the parameters measured on the platform indicate that a 'gated progression' model may not be the best way for participants to hone and develop their skills. Ideally, all participants would attain an equally high skill level regardless of their baseline performance; this is not achievable, given the different amounts of time trainees spend on the simulator and their different baseline abilities. Our data showed that by far the best predictor of a trainee's final performance on the simulator was their initial ability, likely a consequence of the platform promoting a 'satisfactory' level of completion for each reliability gate instead of encouraging excellence and mastery of specific skills. Consequently, those who perform better initially can progress through stages at a far faster rate (needing less time to learn), ultimately amassing far more points than those who learn at a slower pace. Our exploration of the relationships between the parameters on the EyeSi cataract surgery simulator has shown that trainees possess different learning rates and styles, some of which fare far better in the 'gated progression' model than others, but that the software does well in facilitating the deliberate and specific practice of certain important cataract surgical skills.