Comparing the effects of ChatGPT and automated writing evaluation on students’ writing and ideal L2 writing self

by Matt Bury -
Number of replies: 2

Small #OpenAccess 11-week study reports that students who received detailed, specific ChatGPT-generated feedback on their writing improved more than students using an established automated feedback system, but they felt worse about their own writing. https://www.tandfonline.com/doi/full/10.1080/09588221.2025.2454541?mi=5fx7dw#abstract

Abstract

The affordances of ChatGPT in language learning and teaching have gained increasing traction. While studies began to investigate the potential of ChatGPT as a feedback provider, little attention was given to ChatGPT’s potential impact on students’ writing performance and the ideal L2 writing self vis-à-vis the established automated writing evaluation systems (AWE). To address these gaps, a sequential explanatory mixed methods design was adopted. One hundred and fifty second-year university students from three writing classes in a Chinese public university were recruited and randomly divided into a ChatGPT group, an AWE group, and a control group. After an eleven-week intervention, the ANCOVA results showed that while the ChatGPT group scored significantly higher than the AWE group and the control group in post-writing performance as measured by their writing score, in terms of students’ ideal L2 writing self, the ChatGPT group performed significantly lower than the AWE group with a medium effect size. Qualitative analysis of students’ reflection papers revealed students’ (over)reliance on the tool and the accompanying loss of creativity and agency. Pedagogical implications as well as directions for future research are also discussed.

In reply to Matt Bury

Re: Comparing the effects of ChatGPT and automated writing evaluation on students’ writing and ideal L2 writing self

by john kuti -
I get the feeling that this article isn't telling us some of the key details.
 
In the screenshot example of students using ChatGPT, a student asks it to correct the mistakes in a story about making fried eggs with tomatoes. The prompt is simply "correct the mistakes in this paragraph", which I wouldn't consider feedback at all.

From my point of view as a teacher of EAP, the comparison system (which also appears to be a kind of more specialised LLM) looks better, because it's trained on the typical mistakes of Chinese speakers. On the other hand, with better prompts, I would imagine you could get helpful feedback from ChatGPT too; a rough sketch of what that might look like follows.
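Just as an illustration (my own sketch, not something from the article), a more feedback-oriented prompt sent through the OpenAI Python library might look like this. The model name, the rubric wording, and the example paragraph are all my assumptions, not details from the study:

# Hypothetical sketch: asking for formative feedback instead of a corrected rewrite.
# Model name, prompt wording and student text are invented for illustration.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

student_text = "Yesterday I am cooking fried eggs with tomatoes and it taste delicious."

prompt = (
    "You are an EAP writing tutor. Do not rewrite the paragraph. "
    "Give feedback in three parts: (1) two strengths, "
    "(2) the three most important language errors, each with a short explanation, "
    "(3) one question that pushes the writer to revise the paragraph themselves.\n\n"
    f"Student paragraph:\n{student_text}"
)

response = client.chat.completions.create(
    model="gpt-4o-mini",  # any chat-capable model would do here
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)

The point is simply that asking for explanations and a revision question, rather than "correct the mistakes", gets you something much closer to what we would normally call feedback.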
In reply to john kuti

Re: Comparing the effects of ChatGPT and automated writing evaluation on students’ writing and ideal L2 writing self

by Matt Bury -

AFAIK, the whole issue of the efficacy of explicit form-focused directive & epistemic feedback is still contentious, even though nobody has yet provided good evidence that it improves writing accuracy in subsequent writing, i.e. not just a redraft. See the many articles by John Truscott on this, e.g. Truscott, J. (2019). The effectiveness of error correction: Why do meta-analytic reviews produce such different answers? In Epoch making in English teaching and learning: A special monograph for celebration of ETA-ROC's 25th anniversary (pp. 129–141). https://www.researchgate.net/publication/335106040_The_Effectiveness_of_Error_Correction_Why_Do_Meta-analytic_Reviews_Produce_Such_Different_Answers

I very much doubt that explicit form-focused directive & epistemic feedback alone could account for an effect size of d = .7, so I suspect something else is happening when students receive what are essentially GPT-generated "recasts" of their compositions. Is it that they get to read & cast an analytical eye over high-quality versions of their previously expressed ideas & arguments, and that this encourages linguistic uptake in some way? I'm very curious about how they got to those large effect sizes!
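For anyone who doesn't work with effect sizes every day: Cohen's d is the difference between two group means divided by their pooled standard deviation, so d = .7 means the groups ended up roughly 0.7 of a standard deviation apart on the writing scores. A quick sketch with invented numbers (nothing here is taken from the paper):

# Rough illustration of Cohen's d; all scores below are invented, not from the study.
import statistics as st

chatgpt_scores = [78, 82, 85, 80, 84]  # hypothetical post-test writing scores
awe_scores = [78, 81, 83, 78, 80]

n1, n2 = len(chatgpt_scores), len(awe_scores)
s1, s2 = st.stdev(chatgpt_scores), st.stdev(awe_scores)
pooled_sd = (((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)) ** 0.5

d = (st.mean(chatgpt_scores) - st.mean(awe_scores)) / pooled_sd
print(f"Cohen's d = {d:.2f}")  # about 0.7 with these made-up numbers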