Tailored cloze: an improved cloze procedure using acceptable word scoring and classical item analysis techniques
Embargo Lift Date: 2100-11-30
Güven Kurtay, Suna
Lim, Phyllis L.
This study investigated whether the reliability and validity, as well as the mean and standard deviation, of a cloze procedure could be improved by redesigning it on the basis of classical item analysis when the acceptable word scoring method was applied. The study was conducted at the graduate English Language Preparatory School of Istanbul Technical University. The students' English language proficiency at the time of the study ranged from intermediate to upper-intermediate. The study consisted of two phases, in each of which students were asked to complete a 30-item cloze test. In the first administration 329 students participated, and in the second administration 262 students participated. The first part of the data analysis used the data gathered from the first administration. The second part included only the subjects who took part in both administrations, namely 231 subjects. A 389-word passage was chosen from an intermediate ESL (English as a Second Language) reader, and five every-tenth-word deletion cloze passages, each with a different starting point in the text, were prepared. One of these five versions was randomly assigned to each student in the first administration. The items in all versions, 150 in total, were then analyzed for item facility and item discrimination indices, and, based on certain criteria, 30 items were selected for the tailored cloze version used in the second administration. The outcome of the tailored cloze test was compared to that of the original cloze tests for five sample groups. The validity coefficient improved significantly for the tailored cloze test in four of the samples. The reliability estimates for these same four samples were also larger (although not statistically tested for significance). The means improved significantly for three sample groups. However, the standard deviation improved significantly for only one sample. Because the context surrounding individual items differed between the two versions (the original and the tailored cloze tests), it was expected that item facility and item discriminability might also change. However, the differences between the two versions in mean item facility and in mean item discriminability, whether computed by sample separation or by point-biserial correlation, were found to be nonsignificant. To sum up, the results indicated that there was some improvement in test quality for most samples. However, improvement was not found for all statistical properties in all samples. Improvement seemed most noticeable for validity and least noticeable for the standard deviation.
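The item analysis step described above can be illustrated with a minimal sketch. It assumes each response has already been scored 0 or 1 under acceptable word scoring, and it computes item facility (the proportion of examinees answering acceptably) and the point-biserial correlation between each item and the total score. The function names and the selection thresholds (`if_min`, `if_max`, `min_rpb`) are hypothetical; the abstract says only that items were selected "based on certain criteria" and does not give the values used in the study.

```python
from math import sqrt
from statistics import mean, pstdev


def item_facility(responses):
    """Proportion of examinees who answered the item acceptably (0 to 1)."""
    return mean(responses)


def point_biserial(responses, totals):
    """Point-biserial correlation between a 0/1 item and total test scores.

    Uses r_pb = (M1 - M0) / s * sqrt(p * q), where s is the population
    standard deviation of the total scores.
    """
    p = mean(responses)
    sd = pstdev(totals)
    if p in (0, 1) or sd == 0:
        return 0.0  # undefined when there is no variance; treated as 0 here
    mean_correct = mean(t for r, t in zip(responses, totals) if r == 1)
    mean_wrong = mean(t for r, t in zip(responses, totals) if r == 0)
    return (mean_correct - mean_wrong) / sd * sqrt(p * (1 - p))


def select_items(matrix, if_min=0.3, if_max=0.7, min_rpb=0.2):
    """Return indices of items meeting hypothetical selection criteria.

    matrix: one row per examinee, one 0/1 column per item.
    The thresholds are illustrative assumptions, not the study's criteria.
    Total scores include the item itself (uncorrected item-total correlation).
    """
    totals = [sum(row) for row in matrix]
    selected = []
    for j in range(len(matrix[0])):
        responses = [row[j] for row in matrix]
        facility = item_facility(responses)
        rpb = point_biserial(responses, totals)
        if if_min <= facility <= if_max and rpb >= min_rpb:
            selected.append(j)
    return selected


if __name__ == "__main__":
    # Tiny made-up score matrix: 4 examinees, 3 items.
    scores = [[1, 1, 0], [1, 1, 0], [1, 0, 1], [1, 0, 0]]
    print(select_items(scores))
```

In a design like the one described, such a routine would be run over the pooled 150 items from the five versions, with the retained items forming the tailored test. The discrimination index by sample separation mentioned in the abstract is a different statistic and is not shown here.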