A comparison of 3- and 4-option multiple-choice items for medical subspecialty in-training examinations

Our findings are consistent with the literature reporting minimal changes in physician performance and psychometric properties when 4-option MCIs are converted to 3-option MCIs [1, 4, 9, 10, 15]. Because no physicians chose one of their distractors, 9.3% of 4-option ITE-CCM items and 29.3% of 4-option ITE-PA items were de facto 3-option MCIs; the proportion of such items was likely higher among high-performing examinees, who tended to eliminate the most obvious distractors and narrow their choices to one or two options. It is therefore not surprising that the changes in physician percent-correct score and item difficulty were statistically significant for the ITE-CCM only, with a small-to-medium effect size. In agreement with the literature [5, 7, 9], the distractor analyses for both subspecialty ITEs showed that, across all methods and criteria, the percentage of NFDs decreased and the percentage of items without any NFDs increased when the number of MCI options was reduced. This provides supporting evidence that 3-option MCIs are sufficient to distract examinees with lower overall ability and that 4-option MCIs may be too “fat” to provide additional meaningful information [15]. These findings therefore point to the feasibility of transitioning to 3-option MCIs for medical subspecialty certification exams.
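To make the distractor-analysis logic above concrete, the sketch below flags non-functioning distractors (NFDs) and de facto 3-option items from option-endorsement counts. It is an illustration only, not the study's analysis code; the data layout is hypothetical, and the 5% endorsement cutoff is one commonly cited NFD criterion rather than necessarily the criterion applied to these ITEs.

# Illustrative sketch only (not the study's analysis code): flag
# non-functioning distractors (NFDs) and de facto 3-option items from
# option-endorsement counts. The data layout and the 5% cutoff are
# assumptions made for this example.

def distractor_report(items, nfd_threshold=0.05):
    """items: list of dicts with 'id', 'key' (correct option), and
    'counts' (option label -> number of examinees choosing it)."""
    report = []
    for item in items:
        total = sum(item["counts"].values())
        distractors = [opt for opt in item["counts"] if opt != item["key"]]
        nfds = [opt for opt in distractors
                if item["counts"][opt] / total < nfd_threshold]
        unchosen = [opt for opt in distractors if item["counts"][opt] == 0]
        report.append({"item": item["id"],
                       "nfds": nfds,                          # below the cutoff
                       "de_facto_3_option": bool(unchosen)})  # no one chose it
    return report

# Example: distractor B was chosen by nobody, so Q1 is a de facto 3-option MCI.
sample = [{"id": "Q1", "key": "A",
           "counts": {"A": 120, "B": 0, "C": 15, "D": 9}}]
print(distractor_report(sample))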

Consistent with previous studies [4], physicians took less time to answer 3-option than 4-option MCIs on both subspecialty ITEs, although the response speed for 3-option ITE-PA items was actually slower than for 4-option ITE-PA items. On average, physicians' response time to ITE-CCM items was longer than to ITE-PA items. The average word count per item was 68.4, 65.7, 42.4, and 39.2 for 4-option ITE-CCM, 3-option ITE-CCM, 4-option ITE-PA, and 3-option ITE-PA items, respectively (Supplementary Material 1 – sample questions for Critical Care Medicine and Pediatric Anesthesiology). Physicians appeared to need more information from the clinical vignette of a CCM item than from a PA item to reach a diagnosis or clinical judgment, which is reflected in the longer CCM items and in the roughly 10 additional seconds physicians spent per item on the ITE-CCM compared with the ITE-PA. In addition, the ITE-CCM included 50 images or tables and the ITE-PA included 9, which are not reflected in the word count. Medical educators and test developers should be aware that reducing the number of options per MCI may not necessarily save substantial testing time.

The change in score reliability associated with a change in the number of options varies, depending on the option-deletion method and other factors [1, 9, 10]. In general, if the least effective distractor is properly identified and removed from MCIs, score reliability is expected to increase [16]. In this study, the reliability coefficient of the ITE-PA increased by 0.05, whereas the reliability coefficient of the ITE-CCM decreased by 0.01, after the 4-option questions were converted to 3-option by removing the least effective distractor. Although the magnitude of these changes was minimal, it is worth noting that reducing the number of distractors does not necessarily improve the reliability of test scores. Reliability reflects how consistently the items on a test distinguish examinees across a range of abilities. The minimal change in reliability coefficients may be explained by the restricted range of physicians' performance [17]. Only physicians enrolled in subspecialty fellowship training after completing a residency were eligible to take the exams, and physicians who are more motivated to specialize in a clinical practice area, or more competent in medical knowledge and clinical skills, are more likely to be admitted to fellowship programs. The nature of the subspecialty and the exam content may also have played a role: the ITE-CCM was, overall, a more difficult and more discriminating exam than the ITE-PA.
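As a point of reference for readers less familiar with classical test theory, internal-consistency reliability for dichotomously scored items is commonly estimated with KR-20 (equivalent to Cronbach's alpha for 0/1 scoring). The sketch below is a minimal illustration of that estimator, assuming a simple examinee-by-item score matrix; it is not the computation used for the ITEs, whose exact estimator may differ.

# Minimal KR-20 sketch for a 0/1 (incorrect/correct) score matrix, shown only
# to illustrate what an internal-consistency reliability coefficient
# summarizes; the estimator actually used for the ITEs may differ.

def kr20(scores):
    """scores: list of examinee rows, each a list of 0/1 item scores."""
    n_items = len(scores[0])
    n_examinees = len(scores)
    p = [sum(row[i] for row in scores) / n_examinees for i in range(n_items)]
    sum_pq = sum(pi * (1 - pi) for pi in p)          # sum of item variances
    totals = [sum(row) for row in scores]
    mean_total = sum(totals) / n_examinees
    var_total = sum((t - mean_total) ** 2 for t in totals) / n_examinees
    return (n_items / (n_items - 1)) * (1 - sum_pq / var_total)

# Toy example with 4 examinees and 3 items.
print(round(kr20([[1, 1, 0], [1, 0, 0], [1, 1, 1], [0, 0, 0]]), 3))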

MCI distractors do not function equally well. Previous studies have shown that more than 90% of MCIs on medical exams have at least one distractor that attracts fewer than 5% of examinees [8, 9]. The distractor analyses in this study further support the conclusion that the quality of distractors matters more than their quantity [15]. Anecdotally, our question authors are pleased not to have to devise a third distractor when constructing 3-option MCIs, a task that typically takes a disproportionate amount of time compared with the first two distractors. We expect the financial cost of developing 3-option MCIs to decrease as the efficiency of developing such items increases. Although not having to create a third distractor is desirable for question authors and test developers, it becomes even more critical for 3-option MCIs that each distractor functions effectively (e.g., by representing common misconceptions or errors in thinking and reasoning among lower-ability examinees). Training question authors in a systematic way, such as through concept mapping and the crafting of realistic clinical scenarios, is essential for producing effective distractors and ensuring the successful implementation of 3-option MCIs [7]. In addition to the time and financial efficiency expected from 3-option MCIs, these approaches could be particularly useful in formative assessments to gauge learners' understanding of important concepts and to design instruction that addresses common misconceptions or reasoning errors. Conversely, more advanced learners may appreciate fewer but higher-quality distractors, as such distractors would promote more reflective thinking about why they are plausible but not correct.

This study was subject to limitations. First, the 4-option ITEs were administered in spring 2019 (prior to the Covid-19 pandemic), whereas the 3-option ITEs were administered in spring 2020 during the pandemic, which may have affected the level of stress experienced by the examinees. Second, our approach to eliminating distractors was conservative in that distractors not chosen by any examinees were deleted first, and subject matter experts then used their best judgment to decide which distractors should be deleted from the remaining items. Using the sliding scale method to identify NFDs may accelerate the transformation of 4-option into 3-option MCIs. Finally, no pass/fail decisions were made based on the ITEs reported in this study. Future studies should investigate how to set fair and defensible standard(s) for exams with 3-option MCIs if definitive decisions have to be made about the examinees. The expected random-guess rate for 3-option MCIs is 33%, higher than the 25% rate for 4-option MCIs. In contrast to Classical Test Theory, Item Response Theory estimates item parameters independently of the examinee sample [18]. When the sample size is large enough to yield accurate parameter estimates (e.g., at least a thousand examinees for a 150-item exam [19]), the three-parameter Item Response Theory model can account for the guess rate at the item level, in addition to item difficulty and item discrimination, which would help maintain the standard(s) more objectively.
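For readers unfamiliar with the three-parameter model, its item response function adds a lower asymptote (a pseudo-guessing parameter) to item difficulty and discrimination. The sketch below shows the generic 3PL function with hypothetical parameter values; it is not an implementation used in this study.

import math

# Generic three-parameter logistic (3PL) item response function: probability
# of a correct response given ability theta, discrimination a, difficulty b,
# and pseudo-guessing parameter c. Parameter values below are hypothetical.

def p_correct_3pl(theta, a, b, c):
    return c + (1 - c) / (1 + math.exp(-a * (theta - b)))

# For a 3-option MCI the lower asymptote is often near the chance rate (~0.33);
# for a 4-option MCI, near 0.25.
print(p_correct_3pl(theta=0.0, a=1.2, b=0.5, c=0.33))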

In conclusion, this study extends previous evidence that 3-option MCIs function as robustly as their 4-option counterparts in item difficulty and item discrimination [4, 9, 10, 20] to the subspecialty ITEs offered by a medical specialty certifying board. Furthermore, exam content and the distribution of examinee abilities may play an important role in physician performance, response time and speed, and the psychometric properties of items and exams. Both quantitative indices and qualitative judgment from subject matter experts can contribute to identifying and revising ineffective distractors.
