Education and AI: Tool versus tutor

Of course, a language teacher is more than a benevolent conversation partner. In AI, an intelligent tutoring system (ITS) would be more akin to a language teacher than a chatbot would. An ITS consists of three interacting components (see Heift & Schulze, 2007; a toy sketch follows the list):

  1. The expert model, which captures the domain knowledge or the information that students should learn;
  2. The tutor model, which makes decisions about the instructional sequences and steps as well as appropriate feedback and guidance for the group as a whole and for individual students;
  3. The student model, which records and structures information about the learning progress and instruction received, domain beliefs and acquired information, as well as the learning preferences and styles of each student.
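To make the division of labor between these components concrete, here is a minimal, hypothetical sketch in Python. All class names, fields, and example items (the grammar rules, the learner data) are invented for illustration and are not taken from Heift and Schulze's work or from any existing ITS.

```python
from dataclasses import dataclass, field

# Toy sketch of the three classic ITS components; all names and fields
# are invented for illustration, not taken from an existing system.

@dataclass
class ExpertModel:
    """Domain knowledge: what the students should learn."""
    grammar_rules: dict = field(default_factory=dict)   # e.g. {"V2": "finite verb in second position"}
    vocabulary: set = field(default_factory=set)

@dataclass
class StudentModel:
    """One learner's progress, instruction received, and preferences."""
    mastered_items: set = field(default_factory=set)
    error_history: list = field(default_factory=list)
    preferences: dict = field(default_factory=dict)      # e.g. {"feedback": "metalinguistic"}

@dataclass
class TutorModel:
    """Instructional decisions: sequencing, feedback, guidance."""
    expert: ExpertModel

    def next_item(self, student: StudentModel):
        # Pick the first piece of domain knowledge the student has not yet mastered.
        for rule in self.expert.grammar_rules:
            if rule not in student.mastered_items:
                return rule
        return None

# Usage: the tutor consults the expert and the student models to sequence instruction.
expert = ExpertModel(grammar_rules={"V2": "finite verb in second position",
                                    "perfect": "haben/sein + past participle"})
student = StudentModel(mastered_items={"V2"})
tutor = TutorModel(expert)
print(tutor.next_item(student))   # -> "perfect"
```

The point of the sketch is the division of labor: the tutor model decides what comes next only by consulting both the expert model and the student model – which is precisely the information a GenAI chatbot does not maintain.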
This is part of a draft of an article I wrote with Phil Hubbard. In this paper, we are proposing a way in which teachers can organize their own professional development (PD) in the context of the rapid expansion of Generative AI. 
We call this PD sustained integrated PD (GenAI-SIPD): sustained because it is continuous and respectful of the other responsibilities and commitments teachers have; integrated because the PD activities are an integral part of what teachers do anyway. Throughout, the teacher retains control of the PD process.

The full article is available as open access:
Hubbard, P., & Schulze, M. (2025). AI and the future of language teaching: Motivating sustained integrated professional development (SIPD). International Journal of Computer-Assisted Language Learning and Teaching, 15(1), 1–17. https://doi.org/10.4018/IJCALLT.378304 https://www.igi-global.com/gateway/article/full-text-html/378304

Only if the sole learning objective is conversational ability can one assume that the LLM has elements of an expert model. The other two models, however, cannot be mimicked by a GenAI tool. Consequently, teachers still have to teach – determine instructional sequences, time appropriate feedback, remember and work with an individual student’s strengths and weaknesses – even when using GenAI tools in various phases of the learning process. GenAI tools can provide multiple ideas for engaging learning activities, texts for reading with a ready-made glossary, or drafts of an entire unit or lesson plan. However, it is the teacher who must understand, select, adapt, and implement them. The entire teaching process and its success remain the responsibility of the teacher.

Image: Grammar teaching in Ancient Rome (generated by ChatGPT 5.1)

In an educational institution, teachers can meet this responsibility because learners normally trust their expert knowledge: teachers have been trained, certified, and are frequently evaluated. The same is not (yet) true of GenAI tools. They have been trained through machine learning, but their semantic accuracy and pragmatic appropriateness have often been found lacking. The generated text is plausible, but not necessarily factually correct or complete. As such, GenAI output is an insufficient basis for successful learning. This becomes apparent not only when one tries out a GenAI tool in the area of one’s own expertise, but also when one recalls what teachers have said for the last thirty years about the varying trustworthiness of internet texts – the very texts that also formed the basis for the machine learning behind LLMs: sources have to be checked and validated. In machine learning for LLMs, the texts and sources are neither checked nor validated, and this can impact the content accuracy of LLM output. Of course, learners cannot be expected to check the accuracy of information they are only about to learn; accepting the truth value of the information is a prerequisite for learning. Critical analysis and questioning of the information learnt is always a second step. Moreover, first studies have emerged showing that GenAI can create the illusion of knowing and thus of learning (Mollick, 2024); consequently, chatbots are not always a tool for successful learning.

The main thing to remember is: these GenAI chatbots are a tool and not a tutor – more like a hammer than an artisan, more like a dictionary than an interpreter, and more like an answering machine (remember those?) than a teacher.

References

Heift, T., & Schulze, M. (2007). Errors and intelligence in CALL: Parsers and pedagogues. Routledge.

Mollick, E. (2024). Post-apocalyptic education: What comes after the homework apocalypse. https://www.oneusefulthing.org/p/post-apocalyptic-education

Language and AI: A mathematical equation

The 70 years of AI (see McCarthy et al., 1955) have seen an intertwining of language and computing. At first, computers, as the name says, were meant for computation – the fast calculation of a few complex equations or many simple ones. Only later were calculations done with texts as input. Famously, the first successful computations of and with letters were carried out at the Government Code and Cypher School at Bletchley Park to break the German Enigma cipher as part of the British effort in World War II. After the mathematician Alan Turing and his colleagues had successfully deciphered messages of the German Luftwaffe and navy, Turing proposed that these new machines could also be used for language (Turing (1948), quoted in Hutchins, 1986, pp. 26–27). The Turing test (Turing, 1950) stipulated that a calculating machine, a computer, could show intelligence if a human interlocutor on one side of a screen could not tell whether they were having a conversation with another human or with a machine on the other side. ChatGPT passed this test in 2024 (Jones & Bergen, 2024).

Image: Mathematical equations (generated by ChatGPT 5 as an illustration)

With the beginning of the Cold War, machine translation seemed to hold a lot of promise. Researchers’ predictions of success were based – at least in part – on the idea that translating from Russian into English is just like deciphering an encrypted message: letters have to be exchanged for other letters according to certain patterns in a deterministic mathematical process. Of course, this did not do justice to the complexities of language, communication, and translation. So the then nascent field of natural language processing (NLP) turned to the rules of formal (mathematical) grammars and to items, the words in electronic dictionaries. The computer would “understand” a text by parsing it phrase by phrase, using grammatical rules to build an information structure similar to a syntactic tree. Such rules and the list of items with their linguistic features had to be hand-crafted; therefore, the coverage of most NLP systems was limited. In the 1990s, researchers began to move away from symbolic NLP, which used linguistic symbols and rules and applied set theory, a form of mathematical logic, towards statistical NLP. Statistical NLP meant that language patterns were captured by calculating probabilities: the probability of one word (form) following some others is calculated for each word in a large, principled collection of texts, which is called a corpus. In the 1990s and 2000s, more and more corpora in more and more languages became available.
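To give a flavor of what such hand-crafted, symbolic NLP looked like, here is a toy sketch in Python: a few invented phrase-structure rules and a four-word lexicon are used to parse one sentence into a tree-like structure. It is a deliberately simplified illustration, not a reconstruction of any actual parser.

```python
# Toy sketch of symbolic NLP: hand-crafted phrase-structure rules and a small
# lexicon (the "items") are used to build a syntactic tree. Everything here is
# invented for illustration; real parsers were far more elaborate.

LEXICON = {"the": "Det", "student": "N", "text": "N", "reads": "V"}

RULES = {
    "S":  [["NP", "VP"]],
    "NP": [["Det", "N"]],
    "VP": [["V", "NP"], ["V"]],
}

def parse(symbol, words, i):
    """Try to expand `symbol` starting at position i; return (tree, next position) or None."""
    if symbol in LEXICON.values():                        # lexical category: match one word
        if i < len(words) and LEXICON.get(words[i]) == symbol:
            return (symbol, words[i]), i + 1
        return None
    for expansion in RULES.get(symbol, []):               # try each hand-crafted rule in turn
        children, j = [], i
        for child in expansion:
            result = parse(child, words, j)
            if result is None:
                break
            subtree, j = result
            children.append(subtree)
        else:                                             # all children parsed successfully
            return (symbol, children), j
    return None

tree, end = parse("S", "the student reads the text".split(), 0)
print(tree)   # nested tuples approximating a syntactic tree
```

Because every rule and every lexical entry had to be written by hand, coverage beyond such toy fragments quickly became the bottleneck – the limitation mentioned above.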


In the 1990s, progress in capturing such probabilities was made through the use of machine learning: corpora could be used for machines to “learn” the probability of certain word sequences. This machine learning is based on statistics and mathematical optimization. In NLP, the probability of the next word in a text is calculated and, in training, that prediction is compared to the word that actually occurs next in the text. In case of an error, the equation used gets tweaked and the calculation process starts anew. The word sequences whose probabilities are captured are called n-grams.
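As a minimal illustration of such word-sequence probabilities, the following Python sketch estimates bigram (two-word) probabilities by simple counting in an invented toy corpus; real statistical NLP worked with millions of words and used smoothing techniques that are omitted here.

```python
from collections import Counter, defaultdict

# Toy sketch of a statistical (bigram) model: the probability of the next word
# is estimated by counting word pairs in a tiny, invented corpus.
corpus = "the student reads the text . the teacher reads the essay .".split()

bigram_counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    bigram_counts[prev][nxt] += 1

def p_next(prev, nxt):
    """P(next word | previous word), estimated by relative frequency."""
    total = sum(bigram_counts[prev].values())
    return bigram_counts[prev][nxt] / total if total else 0.0

print(p_next("the", "student"))   # 0.25: one of four occurrences of "the" is followed by "student"
print(p_next("reads", "the"))     # 1.0: "reads" is always followed by "the" in this tiny corpus
```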

The resulting n-gram models were replaced in the mid-2010s with artificial neural networks, leading to the first generative pre-trained transformer (GPT) – GPT-1 – in 2018. This marks the beginning of GenAI as we know it today. GPTs are large language models (LLMs) from OpenAI. Today, an LLM is pre-trained using deep learning, a more complex subset of machine learning. In such a pre-trained network, when the model processes a text prompt, each artificial neuron receives input from multiple neurons in the previous layer, carries out calculations, and passes the result to neurons in the next layer. GPT-4, for example, reportedly processes text in 120 layers. The first layer converts the input words, or tokens, into vectors with 12,288 dimensions; the number in each of the 12,288 dimensions encodes syntactic, semantic, or contextual information. Through these calculations, the model produces a finer and finer linguistic analysis at each subsequent layer.
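The following Python (NumPy) sketch shows only this layer-by-layer passing of weighted inputs, with invented, much smaller numbers (8 dimensions and 4 layers instead of 12,288 and 120); attention, normalization, and the final next-word prediction that make a GPT a GPT are omitted, so this illustrates the principle rather than GPT-4 itself.

```python
import numpy as np

# Toy sketch of the layer-by-layer computation described above, with invented,
# much smaller numbers. Attention and next-word prediction are omitted.
rng = np.random.default_rng(0)

embedding_dim = 8
num_layers = 4
vocab = {"the": 0, "student": 1, "reads": 2}

# First layer: convert tokens into vectors ("embeddings").
embeddings = rng.normal(size=(len(vocab), embedding_dim))

# Each subsequent layer: every neuron sums weighted inputs from the previous
# layer, applies a non-linearity, and passes the result on.
weights = [rng.normal(size=(embedding_dim, embedding_dim)) for _ in range(num_layers)]

def forward(tokens):
    x = embeddings[[vocab[t] for t in tokens]]   # shape: (number of tokens, embedding_dim)
    for W in weights:
        x = np.tanh(x @ W)                       # weighted sums plus non-linearity
    return x

out = forward(["the", "student", "reads"])
print(out.shape)   # (3, 8): one progressively refined vector per token
```

Even this toy version hints at why the real thing is computationally expensive: every token is multiplied through every layer’s weight matrix.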

The enormous number of calculations – an estimated 7.5 million for a sentence of five words – results in plausible text output and consumes a lot of electric power. The latter is the main cause of the environmental impact of GenAI; the former is the main factor in the attractiveness of GenAI, not only in language education but also in industry and, increasingly, in society at large.

References

Hutchins, J. (1986). Machine translation: Past, present, and future. Ellis Horwood.

Jones, C. R., & Bergen, B. K. (2024). Does GPT-4 pass the Turing test? arXiv. https://doi.org/10.48550/arXiv.2310.20216

McCarthy, J., Minsky, M. L., Rochester, N., & Shannon, C. E. (1955). A proposal for the Dartmouth Summer Research Project on Artificial Intelligence. http://www-formal.stanford.edu/jmc/history/dartmouth/dartmouth.html

Turing, A. M. (1948). Intelligent Machinery (Report for the National Physical Laboratory). Reprinted in D. C. Ince (Ed.), Mechanical Intelligence: Collected Works of A. M. Turing (pp. 107–127). Amsterdam: North‐Holland. 

Turing, A. M. (1950). Computing machinery and intelligence. Mind, 59(236), 433–460. https://doi.org/10.1093/mind/LIX.236.433

Language Learning and AI: 7 lessons from 70 years (conclusion)

Seven Lessons

For the last 70 years, there has been some interaction between AI, language, and learning. In computer-assisted language learning (CALL), people have worked on applying AI – and they called it ICALL – for almost 50 years. What can we learn for GenAI from these long efforts of working with good old-fashioned AI?

My inspiration for this title came from the book  
Snyder, T. (2017). On tyranny: Twenty lessons from the twentieth century. Tim Duggan Books.

I am sharing these early drafts of a book chapter I published in Yijen Wang, Antonie Alm, & Gilbert Dizon (Eds.) (2025), Insights into AI and language teaching and learning. Castledown Publishers. https://doi.org/10.29140/9781763711600-02

In conclusion, we will recapitulate and condense the seven lessons that we can learn from ‘good old-fashioned AI’ and ICALL – with their declarative knowledge, engineered algorithms, and symbolic NLP – and see how these lessons can be applied to GenAI with its machine-learnt, complex artificial neural networks.

  1. Exposure to rich, authentic language
    GenAI is capable of providing ample exposure to rich language just in time, on the right topic, and at the right level. Generated texts consist of mostly accurate language forms and are plausible, so that they lend themselves to interpretation in context by the students. This gives such a text an authentic feel. Here GenAI compares very favorably with the limited linguistic scope of ICALL systems.
  2. Communication in context
    GenAI, also because of the comprehensive coverage of the LLMs, can sustain conversations with learners on different topics. Its natural language understanding is such that it can take prior textual context into consideration, making any conversation more natural. This was impossible with the ICALL systems and chatbots of the past. However, teachers and students need to be aware that they are communicating with a machine, a stochastic parrot (Bender et al., 2021). This requires informed reflection on a new form of communication and learning in order to avoid anthropomorphizing the machine and its output.
  3. Appropriate error correction and contingent feedback
    This is the area where we can learn the most from ICALL and tutorial CALL. GenAI still has too many shortcomings, especially in giving metalinguistic feedback. Researchers need to explore how automatic error correction, which happens frequently, impacts aspects of language learning such as noticing.
  4. Varied interaction in language learning tasks
    This is the area where we have many new opportunities to explore, although we can take inspiration particularly from earlier projects in ICALL and game-based language learning. GenAI is most suitable as a partner in conversation and learning.
  5. Recording learner behavior and student modeling
    Student modeling has a long tradition – not just in ICALL – in AI and education. GenAI tools by themselves are just that – tools, not tutors. They can be embedded in other learning systems, but they cannot be used as virtual tutors because their information about learners and the learning context is serendipitous at best.
  6. Dynamic individualization
    GenAI provides teachers and students with an individual experience through generated texts of high quality. The adaptive instruction that has been an ambition of ICALL research (Schulze et al., 2025, in press), however, has not yet been achieved. Broader research and development in AI, beyond GenAI, is still necessary to achieve dynamic individualization in what can truly be termed ICALL.
  7. Gradual release of responsibility
    Since the instructional sequences, pedagogical approaches, and teaching methods are not present in GenAI, teachers need to design the use of GenAI as one of the tools in the learning process carefully. Teachers must not surrender control of curricular and pedagogical decisions about activity design, learning goals, lesson contents, and learning materials to the machine.

GenAI, due to its powerful LLMs, has lifted AI in language education to a new level. Such a disruptive technology shows great promise, provides many additional opportunities, and poses some challenges for teachers, students, and researchers alike.