The Association for Surgical Education
Assessment of medical student clinical reasoning by “lay” vs physician raters: inter-rater reliability using a scoring guide in a multidisciplinary objective structured clinical examination

https://doi.org/10.1016/j.amjsurg.2011.08.003

Abstract

Background

To determine whether a “lay” rater could assess clinical reasoning, interrater reliability was measured between physician and lay raters of patient notes written by medical students as part of an 8-station objective structured clinical examination (OSCE).

Methods

Seventy-five notes were rated independently by physician and lay raters on core elements of clinical reasoning, using a scoring guide developed by physician consensus. Twenty-five notes were rerated by a 2nd physician rater as an expert control. Kappa statistics and simple percentage agreement were calculated in 3 areas: evidence supporting each diagnosis, evidence against each diagnosis, and the diagnostic workup.
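
The article does not include its analysis code. As a minimal illustrative sketch (not the study's method or data), simple percentage agreement and Cohen's κ for two raters scoring a dichotomous element on each note could be computed as follows; all ratings shown are hypothetical.

# Illustrative sketch only: percentage agreement and Cohen's kappa for two
# raters scoring the same notes on one dichotomous element (e.g., credit
# given for supporting evidence). The ratings below are hypothetical and
# are not data from this study.
from collections import Counter

def percent_agreement(rater_a, rater_b):
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)

def cohens_kappa(rater_a, rater_b):
    n = len(rater_a)
    p_o = percent_agreement(rater_a, rater_b)        # observed agreement
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    categories = set(freq_a) | set(freq_b)
    # chance agreement from each rater's marginal category frequencies
    p_e = sum((freq_a[c] / n) * (freq_b[c] / n) for c in categories)
    return (p_o - p_e) / (1 - p_e)

physician = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]  # hypothetical physician scores
lay       = [1, 1, 0, 1, 1, 1, 1, 0, 1, 0]  # hypothetical lay-rater scores
print(percent_agreement(physician, lay))    # 0.8
print(cohens_kappa(physician, lay))         # ~0.52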

Results

Agreement between physician and lay raters for the top diagnosis was as follows: supporting evidence, 89% (κ = .72); evidence against, 89% (κ = .81); and diagnostic workup, 79% (κ = .58). Agreement between the 2 physician raters was 83% (κ = .59), 92% (κ = .87), and 96% (κ = .87), respectively.
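
As a back-of-envelope check using these reported, rounded figures (not a calculation from the article), Cohen's κ corrects observed agreement p_o for chance agreement p_e; for the diagnostic workup, the implied chance agreement is roughly .50, which is why 79% raw agreement still yields only a moderate κ:

\kappa = \frac{p_o - p_e}{1 - p_e}
\qquad\Longrightarrow\qquad
p_e = \frac{p_o - \kappa}{1 - \kappa} = \frac{.79 - .58}{1 - .58} \approx .50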

Conclusions

Using a comprehensive scoring guide, interrater reliability for physician and lay raters was comparable with reliability between 2 expert physician raters.

Section snippets

Methods

A comprehensive scoring guide was developed to assess medical student patient notes for a multidisciplinary abdominal pain case. The case, briefly presented in Figure 1, involves a 42-year-old woman with acute left lower quadrant abdominal pain. It was specifically designed to support several plausible differential diagnoses spanning multiple clinical specialties and was used as 1 of 8 cases in a high-stakes (passing grade required for graduation) OSCE administered to

Results

Cronbach's α coefficient was low for both the physician rater (.54) and the lay rater (.58), suggesting that the individual domains of clinical reasoning may be relatively independent. Agreement between the physician and lay rater in the initial 25-note sample was as follows: supporting evidence, 84% (κ = .69); evidence against, 71% (κ = .62); and diagnostic workup, 73% (κ = .69). After additional training and consensus development, agreement improved substantially for evidence against (87%; κ
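
For readers unfamiliar with the internal-consistency statistic above, the following is a minimal sketch of Cronbach's α computed across scoring domains, assuming one rater's scores on 3 domains per note; the values and domain layout are hypothetical placeholders, not the study's data.

# Illustrative sketch only: Cronbach's alpha across scoring domains for one
# rater. Rows = patient notes, columns = domains (e.g., supporting evidence,
# evidence against, diagnostic workup); the scores are hypothetical.
import numpy as np

def cronbach_alpha(scores):
    scores = np.asarray(scores, dtype=float)     # shape: (notes, domains)
    k = scores.shape[1]                          # number of domains
    item_vars = scores.var(axis=0, ddof=1)       # variance of each domain
    total_var = scores.sum(axis=1).var(ddof=1)   # variance of the total score
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

ratings = [
    [2, 1, 2],
    [1, 1, 1],
    [2, 2, 1],
    [0, 1, 1],
    [2, 2, 2],
    [1, 0, 1],
]
print(round(cronbach_alpha(ratings), 2))  # 0.74 for this toy data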

Comments

Clinical reasoning is a complex entity that is not easily operationalized or assessed. Performance on the USMLE Step 2 Clinical Knowledge section has been shown to have minimal redundancy with performance on the Step 2 Clinical Skills section; therefore, it is essential that both components be adequately assessed.13 The OSCE-based USMLE Step 2 Clinical Skills section evaluates competency in the following categories: integrated clinical encounter (including data gathering from history and

Conclusions

The findings of this study suggest that with adequate training, lay raters may act as examiners in the assessment of the patient note clinical reasoning score in a multidisciplinary, high-stakes OSCE.

References (18)

  • S.R. Simon et al. The relationship between second-year medical students' OSCE scores and USMLE Step 2 scores. J Eval Clin Pract (2007)
  • A. Cuschieri et al. A new approach to a final examination in surgery: use of the objective structured clinical examination. Ann R Coll Surg Engl (1979)
  • R.M. Harden et al. Assessment of clinical competence using an objective structured clinical examination (OSCE). Med Educ (1979)
  • R.M. Harden et al. Assessment of clinical competence using objective structured examination. Br Med J (1975)
  • J. Wallenstein et al. A core competency-based objective structured clinical examination (OSCE) can predict future resident performance. Acad Emerg Med (2010)
  • E. Friedman et al. Taking note of the perceived value and impact of medical student chart documentation on education and patient care. Acad Med (2010)
  • G. Regehr et al. Comparing the psychometric properties of checklists and global rating scales for assessing performance on an OSCE-format examination. Acad Med (1998)
  • C. Mertler. Designing scoring rubrics for your classroom. Pract Assess Res Eval (2001)
  • M.F. Ben-David et al. Issues of validity and reliability concerning who scores the post-encounter patient-progress note. Acad Med (1997)
There are more references available in the full text version of this article.

Cited by (11)

  • (En)trust me: Validating an assessment rubric for documenting clinical encounters during a surgery clerkship clinical skills exam

    2020, American Journal of Surgery
    Citation excerpt:

    There are various types of assessment tools that can take on holistic or analytic scoring forms and there has been some literature to support validity and reliability differences based on the type of scoring system utilized.18 Analytic scoring systems have been found to produce more reliable results.18 In the development of our rubric, we sought to include both analytic and holistic elements.

  • Does objective structured clinical examinations score reflect the clinical reasoning ability of medical students?

    2015, American Journal of the Medical Sciences
    Citation excerpt:

    Although global rating by experts is regarded as the “gold standard” for clinical reasoning assessment,23 here the authors used analytic scoring to evaluate clinical reasoning ability. Compared with global rating, analytic scoring is known to be an effective method of giving feedback and to have increased reliability over global ratings.4 Furthermore, because the analytic score was rated by physicians, the scoring system to evaluate clinical reasoning might be more reliable than other analytic scoring systems.
