Conversation and AI: Language in the wild?

From late 2022 to early 2025, several LLM-based chatbots were released, such as ChatGPT, Claude, Gemini, Qwen, Llama, and DeepSeek. All of them can generate conversational responses with remarkable linguistic accuracy and contextual appropriateness, and at a speed close to that of human turn-taking. Such chatbots can be prompted to adjust the proficiency level of the output to the learner (but see Uchida (2025) for the challenges with adjusting for CEFR levels), to stick to the desired topic, and to use only appropriate vocabulary. In these respects, they are far superior to the chatbots of early applications of AI, and especially NLP, in CALL (Schulze, 2025).

German chatbot. Generated by ChatGPT 5 as an illustration

Is a conversation with a chatbot the same as a human conversation? LLM-based chatbots have been compared to “stochastic parrots” (Bender et al., 2021), parrots that produce plausible utterances by chance. The computational linguists Bender et al. (2021) stated that

coherence is in fact in the eye of the beholder. Our human understanding of coherence derives from our ability to recognize interlocutors’ beliefs … and intentions … within context … That is, human language use takes place between individuals who share common ground and are mutually aware of that sharing (and its extent), who have communicative intents which they use language to convey, and who model each others’ mental states as they communicate (p. 616).

For language learners, this means that it is up to them to control and steer the conversation with the machine. What they say – their prompt – determines what the machine says much more than it would with a human interlocutor, who will reason about the student’s intention, which is ‘underneath’ the utterance. The machine, in contrast, will calculate the probability of subsequent word forms. The problem is compounded by the chatbot’s attempt to keep the conversation engaging with immediate questions, which makes it more difficult for the learner to steer the conversation. Similar challenges need to be considered in the other direction: the learner’s understanding of the generated text. Chatbots generate plausible texts that correspond to the student’s input because of highly sophisticated pattern-matching processes; chatbots do not generate meaning. We, however, are used to decoding meaning from text. So, it is the learner who imbues the text with meaning when reading the machine output. Students use their world knowledge, their linguistic capital, and their contextual awareness to interpret the machine’s output. This interpretation normally rests on the human assumption that the chatbot “understood” the prompt. However, machines do not understand text in the way humans do. Harari (2024) prefers the term alien intelligence for AI because its way of processing is radically different: machines conduct a fast, sophisticated mathematical analysis and then generate a matching piece of text based on their pre-training. This interaction is akin to that with a calculator, which also cannot understand mathematics and yet can calculate accurately at great speed. Calculators are faster and more consistent than humans and do not make mistakes; LLM-based chatbots are similarly fast and consistent in both processing and generating text. So, chatbots have the advantage of speed, accuracy, consistency, and task focus – all without fatigue or stress. Yet, they are lacking in emotional intelligence, awareness of the situational context, and personal memory, for example, of similar conversations with the same person in the past.

Individual conversational practice with a GenAI chatbot is feasible and practical within the context of second-language development. Students are exposed to rich authentic language in the process (Schulze, 2025). However, a genuine negotiation of meaning does not take place, because both text understanding and meaning generation are done only by the learner. Clarification requests, confirmation checks, comprehension checks, paraphrasing, and repair – all part of a negotiation of meaning – can only come from the learner and, unless prompted specifically, do not come from the machine. Feedback on communicatively successful utterances, as we find it in the negotiation of meaning in human conversations, is as yet a challenge for GenAI. Its strength lies in being able to process learner input that contains one or more errors. Such errors often get corrected seamlessly in GenAI’s response or in a prompted correction.

This is part of a draft of an article I wrote with Phil Hubbard. In this paper, we are proposing a way in which teachers can organize their own professional development (PD) in the context of the rapid expansion of Generative AI. We call this PD sustained integrated PD (GenAI-SIPD). It is sustained because it is continuous and respectful of the other responsibilities and commitments teachers have; it is integrated because the PD activities are an integral part of what teachers do anyway; and the teacher retains control of the PD process.

The full article is available as open access:
Hubbard, Philip and Mathias Schulze (2025) AI and the future of language teaching – Motivating sustained integrated professional development (SIPD). International Journal of Computer Assisted Language Learning and Teaching 15(1), 1–17. DOI:10.4018/IJCALLT.378304 https://www.igi-global.com/gateway/article/full-text-html/378304

In many languages – the ones that are well represented on the internet such that they are a good basis for deep learning – GenAI chatbots can be “patient” and “focused” conversation partners for language learners. They can also go beyond simple turns in a written or spoken conversation and generate various text types for learners and teachers alike. Reading texts, quizzes, instructional sequences, lesson objectives, essays, letters, emails, and others can all be generated. This can create the illusion that a chatbot can be a language tutor.

References

Bender, E. M., Gebru, T., McMillan-Major, A., & Shmitchell, S. (2021). On the dangers of stochastic parrots: Can language models be too big? Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, 610–623. https://doi.org/10.1145/3442188.3445922

Harari, Y. N. (2024). Nexus: A brief history of information networks from the Stone Age to AI. Random House.

Schulze, M. (2025). ICALL and AI: Seven lessons from seventy years. In Y. Wang, A. Alm, & G. Dizon (Eds.), Insights into AI and language teaching and learning (pp. 11–31). Castledown Publishers. https://doi.org/10.29140/9781763711600-02

Uchida, S. (2025). Generative AI and CEFR levels: Evaluating the accuracy of text generation with ChatGPT-4o through textual features. Vocabulary Learning and Instruction, 14(1), 2078. https://doi.org/10.29140/vli.v14n1.2078