Assessment and GenAI

Assessment often focuses on the product and not the process; the submitted essay gets graded, not the process of writing it. With easily available GenAI tools, students can generate the material results of an assessed activity – a short essay, presentation, poster, or audio or video recording – more quickly and with a higher degree of linguistic accuracy than they could create them unaided (Fawzi, 2023). This is even more likely for the translation of sentences in quizzes, for example. Entering into a whole dialog (as described above), however, puts more emphasis on the writing process.

Given this high level of formal accuracy in generated and translated texts, it is difficult to make the linguistic accuracy of the product, the text, the sole or even the main criterion for assessment, as language teachers often do. A lower level of accuracy could simply mean that the student did not use GenAI or other (inappropriate) tools (Bowen & Watson, 2024, p. 148ff.). Assessment therefore requires a changed focus and different strategies as early as the course or lesson planning stage. When assessing students’ proficiency development, test tasks need to be designed such that the process – of writing, for example – can be assessed rather than the product. Multiple sketches and drafts need to be submitted and assessed to provide different windows on the (writing) process. Students can also be enabled and encouraged to produce written or spoken language spontaneously to ensure equity in assessment, so that they learn and retain the Kulturtechniken (cultural techniques) necessary for producing meaningful and creative texts.

This blog post is an excerpt from the manuscript for Schulze, Mathias (2025). The impact of artificial intelligence (AI) on CALL pedagogies. In Lee McCallum & Dara Tafazoli (eds) The Palgrave Encyclopedia of Computer-Assisted Language Learning. Palgrave Macmillan, Cham. https://doi.org/10.1007/978-3-031-51447-0_7-1. 
In 2024, I wrote this encyclopedia entry as my first attempt at gaining a better understanding of what was going on after GenAI burst into Language Education.

When assessing students’ written or spoken language proficiency, the three components of proficiency – complexity, accuracy, fluency – need to be assessed in a balanced way. For example, an increased textual complexity – a diverse range of vocabulary and grammatical constructions, including some or many sophisticated items, such as less frequently used words and constructions and longer sentences – is often an indicator of the student’s successful second-language development. In this context, appropriate task or activity design – instructing the learner to use the newly learned vocabulary and grammatical constructions – will make the task clearer for the student and will make it less likely that inappropriate tools can or will be used.

In general, more room should be given to approaches in assessment that do not focus on one final product but on accompanying the learning process. This way, GenAI tools can be used during an appropriate phase of the activity, e.g., brainstorming ideas or looking up alternative constructions and synonyms. Here, approaches such as Dynamic Assessment (Lantolf, 2009) and Integrated Performance Assessment (Adair-Hauck et al., 2006) help reduce the negative impact that GenAI tools can have on the viability and fairness of assessment procedures. This is particularly important in CALL because the students are already working in a digital environment, and access to GenAI tools is easy and thus tempting for many. The focus on the (learning) quality of the process rather than the product has the important advantage that it encourages creativity and risk-taking, because the incremental assessment procedure has productive, non-threatening feedback and repair loops as its integral parts. This way, creativity in writing can be rewarded, and spelling, lexical, and grammatical mistakes do not have to be penalized heavily; some of the pressures of assessment are thus mitigated if not eliminated.

References

Adair-Hauck, B., Glisan, E. W., Koda, K., Swender, E. B., & Sandrock, P. (2006). The integrated performance assessment (IPA): Connecting assessment to instruction and learning. Foreign Language Annals, 39, 359–382.

Bowen, J. A., & Watson, C. E. (2024). Teaching with AI: A practical guide to a new era of human learning. Johns Hopkins University Press.

Fawzi, H. (2023). A Bleeding Edge or a Cutting Edge? A Systematic Review of ChatGPT and English as a Second and/or Foreign Language Learners’ Writing Abilities. In Conference Proceedings. WorldCALL 2023. CALL in Critical Times. (Chiang Mai, Thailand) (pp. 7–15). The International Academic Forum.

Lantolf, J. P. (2009). Dynamic assessment: The dialectic integration of instruction and assessment. Language Teaching, 42(3), 355–368. https://doi.org/10.1017/S0261444808005569