The reliability of holistic and analytic evaluations of EFL essays by Turkish university preparatory students
Abstract
This study attempted to investigate a reliable method of scoring essays. Two hypotheses were tested. Observations were made pertaining to the scoring system used at the preparatory school of Çukurova University. A total of 150 EFL preparatory students participated in the study. These students wrote two essays: one for the first hypothesis and one for the second. The first essays were rated analytically by the teachers at Çukurova University. The second essays were rated holistically and analytically by four raters with at least five years of EFL teaching experience. Correlations were computed to find the relationships between the scores given by the raters for the scoring methods. The first hypothesis was that the scoring system used at Çukurova University did not have a high level of reliability. The correlational analysis of the data rejected this hypothesis (r = .97). However, descriptive analysis showed that the correlation of the scores alone would not be sufficient to claim that this system was reliable. In fact, observations indicate that the raters who scored essays for the second time saw the first scores, thus creating a self-fulfilling bias. The second hypothesis was that holistically scored essays have significantly greater reliability than analytically scored ones in this educational context. The analysis of the data was twofold: interrater reliability and intrarater reliability. The correlations for interrater reliability indicated that both scoring systems had high reliabilities.
The interrater reliability of the holistic scoring method was .85, and that of the analytic scoring method was .84. The difference is negligible. Since the analytic scoring method has five categories, the study investigated the reliability of each category individually as well as the total. The analysis of categories revealed that the reliability of the categories was not as high as that of the total scores for analytic rating. The interrater reliability was .75 for content, .69 for organization, .80 for vocabulary, .82 for language use, and .71 for mechanics. The correlations for intrarater reliability showed that there was not a significant difference between the two scoring methods (p < .01 for both scoring methods). The intrarater reliability of holistic scoring ranged from .70 to .85 and that of analytic scoring from .65 to .86. However, the categories scored on the analytic rubric had low intrarater reliabilities. The intrarater reliability ranged from .34 to .83 for content, from .23 to .81 for organization, from .46 to .80 for vocabulary, from .63 to .77 for language use, and from .55 to .80 for mechanics. We may conclude that holistic scoring is more reliable than analytic scoring. Although the total scores of analytic scoring might have high reliability, the categories of this scoring method might have very low reliability, which may raise a question about the reliability of analytic scoring.
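
The abstract reports reliabilities as correlations but does not say which coefficient was used or how the four raters' scores were combined. As a rough illustration only, the sketch below assumes Pearson's r and averages it over every pair of raters; the function names and the scores are hypothetical and are not taken from the study.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / sqrt(var_x * var_y)


def mean_pairwise_correlation(scores_by_rater):
    """Average Pearson r over every pair of raters (an assumed summary)."""
    pairs = list(combinations(scores_by_rater.values(), 2))
    if not pairs:
        raise ValueError("need at least two raters")
    return sum(pearson(a, b) for a, b in pairs) / len(pairs)


# Hypothetical holistic scores: four raters, the same six essays.
# These numbers are invented for illustration, not data from the study.
holistic = {
    "rater_a": [70, 55, 82, 64, 90, 48],
    "rater_b": [68, 60, 85, 60, 88, 52],
    "rater_c": [75, 50, 80, 70, 92, 45],
    "rater_d": [72, 58, 78, 66, 85, 50],
}

print(round(mean_pairwise_correlation(holistic), 2))
```

The same function could be applied to each analytic category separately, which is how a total score can look reliable while individual categories do not. Intrarater reliability would instead correlate one rater's first and second scoring of the same essays with `pearson`.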