Year: 2022 | Volume: 47 | Issue: 1 | Page: 92-95
Evaluation of multiple-choice questions by item analysis, from an online internal assessment of 6th semester medical students in a rural medical college, West Bengal
Sharmistha Bhattacherjee, Abhijit Mukherjee, Kallol Bhandari, Arup Jyoti Rout
Department of Community Medicine, North Bengal Medical College, Darjeeling, West Bengal, India
Date of Submission: 26-Aug-2021
Date of Acceptance: 12-Feb-2022
Date of Web Publication: 16-Mar-2022
Dr. Arup Jyoti Rout
Department of Community Medicine, North Bengal Medical College, Darjeeling - 734 012, West Bengal
Source of Support: None, Conflict of Interest: None
Abstract
Background: Properly constructed single best-answer multiple-choice questions (MCQs), or items, assess higher-order cognitive processing of Bloom's taxonomy and accurately discriminate between high and low achievers. However, guidelines for writing good test items are rarely followed, leading to the generation and use of faulty MCQs. Materials and Methods: During the lockdown period in 2020, an internal assessment was conducted online using Google Forms. There were 60 “single response type” MCQs, each consisting of a single stem and four options: one correct answer and three distractors. Each item was analyzed for difficulty index (Dif I), discrimination index (DI), and distractor efficiency (DE). Results: The mean mark achieved was 42.92 (standard deviation [SD] 5.07). The mean Dif I, DI, and DE were 47.95% (SD 16.39), 0.12 (SD 0.10), and 18.42 (SD 15.35), respectively. Of the items, 46.67% were easy and 21.66% had acceptable discrimination. A very weak negative correlation was found between Dif I and DI. Of the 180 distractors, 51.66% were nonfunctional. Conclusion: Item analysis and storage of MCQs with their indices allow an examiner to select MCQs of appropriate difficulty for the needs of an assessment and to decide their placement in the question paper.
Keywords: Bloom's taxonomy, difficulty index, discrimination index, distractor efficiency, item analysis, multiple-choice questions
How to cite this article:
Bhattacherjee S, Mukherjee A, Bhandari K, Rout AJ. Evaluation of multiple-choice questions by item analysis, from an online internal assessment of 6th semester medical students in a rural medical college, West Bengal. Indian J Community Med 2022;47:92-5
How to cite this URL:
Bhattacherjee S, Mukherjee A, Bhandari K, Rout AJ. Evaluation of multiple-choice questions by item analysis, from an online internal assessment of 6th semester medical students in a rural medical college, West Bengal. Indian J Community Med [serial online] 2022 [cited 2022 Jul 1];47:92-5. Available from: https://www.ijcm.org.in/text.asp?2022/47/1/92/339746
Introduction
Assessment of students by multiple-choice questions (MCQs or items) is a well-accepted method because of its (1) objectivity, (2) comparability, and (3) minimal assessor bias.
In India, single best-answer MCQs are commonly used for medical entrance and university examinations. They are a popular assessment tool because such tests can be administered to a large number of students, are easily scored, help in controlling cheating, and enable teachers to cover a wider range of the syllabus. These types of questions were found to be twice as reliable as short-answer questions in evaluating students' knowledge. Properly constructed MCQs assess higher-order cognitive processing of Bloom's taxonomy (interpretation, synthesis, and application of knowledge) instead of just testing recall of isolated facts and are thus able to accurately discriminate between high and low achievers.
One-best-response MCQs consist of a stem, one correct or best response (key), and a few incorrect choices (distractors). The main challenge in preparing MCQs is to construct good test items, which requires in-depth knowledge of the subject, an understanding of the objectives of assessment, and good item-writing skills. There are many guidelines for writing good test items, but they are rarely followed, leading to the generation and use of faulty MCQs.
Item analysis is a process that examines student responses to individual test items (questions) in order to assess the quality of those items and of the test as a whole. It is especially valuable for improving items that will be used again in later tests.
Because of the countrywide lockdown in 2020 owing to the COVID pandemic, the conventional internal assessment (offline answering of long and short answer type questions) could not be conducted in a rural medical college in West Bengal. Hence, the Department of Community Medicine decided to conduct the test online using MCQs, followed by item analysis.
Materials and Methods
Ninety-eight MBBS students of the 6th semester appeared for an internal assessment on August 14, 2020, conducted online using Google Forms. There were 60 “single response type” MCQs carrying 1 mark each, with no negative marking for wrong answers. The time allotted was 80 min. The MCQs were constructed by all teachers in the department. Each MCQ had a single stem, one correct answer (key), and three incorrect alternatives (distractors). Each item was analyzed for difficulty index (Dif I), discrimination index (DI), and distractor efficiency (DE). The data were entered in MS Excel 2019 and analyzed. The scores of the 98 students were arranged in descending order and divided into three groups. The third of the students with the highest marks (top third) were labeled high achievers, and the third with the lowest marks (bottom third) were labeled low achievers. The middle third was discarded.
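The grouping step can be sketched in Python as follows. This is a minimal illustration, not the authors' Excel workflow: the function name `split_into_thirds`, the student IDs, and the scores are hypothetical, and rounding the group size to the nearest whole number (33 for 98 students) is an assumption chosen to match the group sizes of 33 reported with the formulae.

```python
def split_into_thirds(total_scores):
    """Split students into high, middle, and low groups by total score.

    total_scores maps a student ID to that student's total mark.
    """
    # Rank student IDs from highest to lowest total mark.
    ranked = sorted(total_scores, key=total_scores.get, reverse=True)
    # Assumption: group size is one third of the class, rounded to nearest.
    third = round(len(ranked) / 3)
    high = ranked[:third]
    low = ranked[len(ranked) - third:]
    middle = ranked[third:len(ranked) - third]  # discarded in the analysis
    return high, middle, low


# Hypothetical marks (out of 60) for 98 students; not the study's data.
total_scores = {f"S{i:02d}": 20 + (i * 7) % 35 for i in range(98)}
high, middle, low = split_into_thirds(total_scores)
print(len(high), len(middle), len(low))  # 33 32 33
```

Tied scores at a group boundary are kept in their input order here; a real analysis may need an explicit tie-breaking rule.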
Calculations were made using the following formulae:
- Dif I = [(h + l)/n] × 100
- DI = 2 × (h − l)/n
where
h = number of students answering the item correctly in the high achievers' group (33 students)
l = number of students answering the item correctly in the low achievers' group (33 students)
n = total number of students in both groups, including nonresponders (66 students)
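As a worked illustration of the two formulae, the sketch below computes both indices for a single item. The function names and the response counts (28 and 20 correct answers) are hypothetical and serve only to show the arithmetic; only n = 66 comes from the study.

```python
def difficulty_index(h, l, n):
    """Dif I: percentage of students in both groups who answered correctly."""
    return (h + l) / n * 100


def discrimination_index(h, l, n):
    """DI: difference in correct answers between groups, scaled by group size."""
    return 2 * (h - l) / n


n = 66  # 33 high achievers + 33 low achievers, as in the study
# Hypothetical item: 28 high achievers and 20 low achievers answered correctly.
h, l = 28, 20
print(round(difficulty_index(h, l, n), 2))      # 72.73
print(round(discrimination_index(h, l, n), 2))  # 0.24
```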
- Difficulty index (Dif I)
The difficulty index describes the percentage of students who answered the item correctly and ranges from 0% to 100%. The higher the Dif I, the easier the item; the lower the Dif I, the more difficult the item. Items with Dif I >70% are considered easy, those with Dif I <30% difficult, and those in between acceptable.
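The cut-offs above translate directly into a small classifier. The function name `classify_difficulty` is hypothetical, and treating exactly 30% and exactly 70% as acceptable is an assumption that follows from the strict inequalities quoted in this paragraph.

```python
def classify_difficulty(dif_i):
    """Label an item from its difficulty index (a percentage, 0-100)."""
    if dif_i > 70:
        return "easy"
    if dif_i < 30:
        return "difficult"
    # Assumption: boundary values of exactly 30 and 70 count as acceptable.
    return "acceptable"


# Hypothetical Dif I values for four items; not the study's data.
for value in (85.0, 70.0, 47.95, 12.5):
    print(value, classify_difficulty(value))
```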
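- Discrimination index (DI)
DI is the ability of an item to distinguish between high and low achievers. For a properly functioning item it ranges from 0 to ≥0.4, although the formula above allows values from −1 to +1. The higher the DI, the better the item discriminates between high and low achievers. A negative DI means that students of lower ability answered the item correctly more often than those of higher ability, which points to a defective item or a wrong key.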
- Distractor efficiency (DE)
Students who have not mastered the subject should choose the distractors more often, whereas well-prepared students should discard them and choose the correct option. Any distractor selected by <5% of the students is considered a nonfunctional distractor (NFD). Items with no NFDs have a DE of 100%, while items with three NFDs have a DE of 0%.
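The NFD rule and the resulting DE can be sketched as below. The function names and option counts are hypothetical. Two assumptions go beyond the text: the 5% threshold is applied to all students who took the test, and DE falls in equal steps between the two stated endpoints (so 1 NFD gives about 66.7% and 2 NFDs about 33.3%).

```python
def nonfunctional_distractors(option_counts, key, total_students, threshold=0.05):
    """Return the distractors chosen by fewer than `threshold` of students."""
    return [
        option
        for option, count in option_counts.items()
        if option != key and count / total_students < threshold
    ]


def distractor_efficiency(nfd_count, n_distractors=3):
    """DE in percent; assumes equal steps between 100% (no NFD) and 0% (all NFD)."""
    return (n_distractors - nfd_count) / n_distractors * 100


# Hypothetical responses of 98 students to one item whose key is option A.
counts = {"A": 60, "B": 30, "C": 4, "D": 4}
nfds = nonfunctional_distractors(counts, key="A", total_students=98)
print(nfds)                                       # ['C', 'D']
print(round(distractor_efficiency(len(nfds)), 1))  # 33.3
```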
Results
Sixty MCQs with their 240 options (60 correct options and 180 distractors) were analyzed. The mean mark achieved was 42.92 (standard deviation [SD] 5.07). The mean Dif I, DI, and DE were 47.95% (SD 16.39), 0.12 (SD 0.10), and 18.42 (SD 15.35), respectively [Table 1]. Of the items, 15% were difficult, whereas 46.67% were easy [Table 2]. Discrimination was poor in 70% of the items and acceptable in 21.66%; 6.67% of the items showed negative discrimination [Table 3]. A very weak negative correlation was found between Dif I and DI [Figure 1]. Of the 180 distractors, 51.66% were nonfunctional. One NFD and two NFDs were each found in 35% of the items; 16.67% of the items had all three distractors as NFDs, whereas only 13.33% had no NFD [Table 4].
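The share of nonfunctional distractors can be cross-checked from the per-item NFD distribution reported above. The sketch below uses only the percentages given in this paragraph; converting them back to whole item counts by rounding is an assumption.

```python
n_items = 60
n_distractors = 180  # 3 distractors per item

# Percentage of items with 0, 1, 2, or 3 NFDs, as reported in the text.
share_by_nfd = {0: 13.33, 1: 35.0, 2: 35.0, 3: 16.67}

# Assumption: recover whole item counts by rounding (8, 21, 21, and 10 items).
items_by_nfd = {k: round(n_items * pct / 100) for k, pct in share_by_nfd.items()}

total_nfd = sum(k * count for k, count in items_by_nfd.items())
print(items_by_nfd)                               # {0: 8, 1: 21, 2: 21, 3: 10}
print(total_nfd)                                  # 93
print(round(total_nfd / n_distractors * 100, 2))  # 51.67
```

The 93 NFDs out of 180 (51.67% when rounded) agree with the reported 51.66%, which appears to be the same value truncated to two decimals.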
Table 1: Distribution of items according to mean±standard deviation of outcome variables (n=60)
Table 2: Distribution of items according to their difficulty index (n=60)
Table 3: Distribution of items according to their discrimination index (n=60)
Figure 1: Distribution of items according to correlation between difficulty index and discrimination index
Table 4: Distribution of items according to their distractor efficiency (n=60)
Discussion
One-best multiple-choice questions
One-best MCQs allow a large portion of the curriculum to be assessed in a short period of time with relatively little effort on the part of the student, although the examiner must spend considerable time and effort to write high-quality one-best MCQs compared with descriptive questions. The one-best MCQ is an efficient tool for identifying strengths and weaknesses in students and for giving teachers guidance on their educational protocols.
Dif I, also called the ease index, describes the percentage of students who answered the item correctly; it measures how difficult or easy a question was. Items that are too difficult (Dif I ≤30%) lead to deflated scores, while easy items (Dif I >70%) result in inflated scores and a decline in motivation.
Two studies reported mean Dif I values of 39.4 ± 21.4 and 52.53 ± 20.59, respectively. The mean Dif I of the present study lay between those two findings. Most of the items may have been easy because most of the questions came from the “must know” part of the syllabus, so the proportion of students marking the correct option was high among both high and low achievers.
Items that are too easy should either be placed at the start of the test as “warm-up” questions or removed altogether. Similarly, items that are too difficult should be reviewed for confusing language, areas of controversy, or even an incorrect key.
The difficulty and discrimination indices are often reciprocally related. Questions with a high Dif I (easier questions) are considered poor discriminators, whereas questions with a low Dif I (harder questions) are considered good discriminators. In the present study, most of the items discriminated poorly. Because a large share of the items were easy and were presumably answered correctly by almost every student, discrimination was poor.
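Why very easy items cannot discriminate follows from the two formulae in the Methods. With equal-sized groups, h and l can each be at most n/2, so for a given Dif I there is a ceiling on DI. The sketch below is derived from those formulae rather than taken from the article; the function name is hypothetical, and the bound ignores the fact that h and l must be whole numbers.

```python
def max_discrimination_index(dif_i_percent):
    """Largest DI the formulae allow for a given Dif I (equal-sized groups)."""
    p = dif_i_percent / 100  # proportion answering correctly in both groups
    # h is capped at n/2 and l cannot drop below 0, which gives 2 * min(p, 1 - p).
    # The bound is symmetric, so it also applies at the difficult end.
    return 2 * min(p, 1 - p)


for dif_i in (50, 70, 90, 100):
    print(dif_i, round(max_discrimination_index(dif_i), 2))
# 50 1.0
# 70 0.6
# 90 0.2
# 100 0.0
```

An item answered correctly by every student (Dif I = 100%) therefore has DI = 0 regardless of how well it is written. This is an upper bound only; the DI actually achieved by an item can be much lower.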
With a negative DI, students of lower ability answer the question correctly more often than those of higher ability. Reasons for a negative DI can be a wrong key, ambiguous framing of the question, or generally poor preparation of the students. The present study was not free from such items either, but their proportion remained below 7%. Another reason may be that a student of lower ability selects the correct response by guessing, while a good student, suspicious of an easy question, takes a harder path to the answer and ends up being less successful.
Distractor efficiency reflects the relationship between the total test score and the distractors chosen by the students.
More nonfunctional distractors (NFDs) in an item increase Dif I (make the item easier) and reduce DE; conversely, an item with more functioning distractors has a lower Dif I (is more difficult) and a higher DE. The present study showed that more than half of the distractors were NFDs (reduced DE) and that most of the test items were easy to answer (increased Dif I). A possible explanation is the teachers' inability to choose good distractors. Similar results were reported by Namdeo and Sahoo, with 53.4% NFDs. In contrast, Gajjar et al. reported only 11.4% NFDs, while Hingorjo et al. reported a mean DE of 81.4%, which is much higher than the mean DE in the present study.
Conclusion
MCQs cover a wide area of the subject in a short period of time and are a preferred method of objective assessment; well-chosen MCQs can judge the knowledge of students. Item analysis is a simple procedure for evaluating the validity and reliability of MCQs. Item analysis and storage of MCQs with their indices allow an examiner to select MCQs of appropriate difficulty for the needs of an assessment and to decide their placement in the question paper.
Financial support and sponsorship
None.
Conflicts of interest
There are no conflicts of interest.
References
Gajjar S, Sharma R, Kumar P, Rana M. Item and test analysis to identify quality multiple choice questions (MCQs) from an assessment of medical students of Ahmedabad, Gujarat. Indian J Community Med 2014;39:17-20.
Kartha CC. Anguish of a medico. Curr Sci 2005;89:725-6.
Kuechler WL, Simkin MG. How well do multiple choice tests evaluate student understanding in computer programming classes? J Inf Syst Educ 2003;14:389-400.
Haladyna TM. Guidelines for developing MC items. In: Haladyna TM, editor. Developing and Validating Multiple-Choice Test Items. 3rd ed. Mahwah, New Jersey: Lawrence Erlbaum Associates; 2004. p. 97-126.
Huang YM, Trevisan M, Storfer A. The impact of the 'all-of-the-above' option and student ability on multiple choice tests. Int J Scholarsh Teach Learn 2007;1:1-13.
Downing SM. The effects of violating standard item writing principles on tests and students: The consequences of using flawed test items on achievement examinations in medical education. Adv Health Sci Educ Theory Pract 2005;10:133-43.
Bai X, Ola A. A tool for performing item analysis to enhance teaching and learning experiences. Issues Inf Syst 2017;18:128-36.
Ananthkrishnan N. The item analysis. In: Medical Education Principles and Practice. 2nd ed. Pondicherry: JIPMER; 2000. p. 131-7.
Tejinder S, Piyush G, Daljit S. Principles of Medical Education. 3rd ed. India: Jaypee Brothers, Medical Publishers Pvt. Limited; 2009. p. 70-7.
Mitra NK, Nagaraja HS, Ponnudurai G, Judson JP. The levels of difficulty and discrimination indices in Type A multiple choice questions of preclinical semester 1 multidisciplinary summative tests. Jpn Soc Mech Eng 2009;3:2-7.
Tarrant M, Ware J, Mohammed AM. An assessment of functioning and non-functioning distractors in multiple-choice questions: A descriptive analysis. BMC Med Educ 2009;9:40.
Tan LT, McAleer JJ; Final FRCR Examination Board. The introduction of single best answer questions as a test of knowledge in the final examination for the fellowship of the Royal College of Radiologists in Clinical Oncology. Clin Oncol (R Coll Radiol) 2008;20:571-6.
Pande SS, Pande SR, Parate VR, Nikam AP, Agrekar SH. Correlation between difficulty and discrimination indices of MCQs in formative exam in physiology. South East Asian J Med Educ 2013;7:45-50.
Hingorjo MR, Jaleel F. Analysis of one-best MCQs: The difficulty index, discrimination index and distractor efficiency. J Pak Med Assoc 2012;62:142-7.
Carroll RG. Evaluation of vignette-type examination items for testing medical physiology. Am J Physiol 1993;264:S11-5.
Namdeo SK, Sahoo B. Item analysis of multiple-choice questions from an assessment of medical students in Bhubaneswar, India. Int J Res Med Sci 2016;4:1716-9.