Language Learning and AI: 7 lessons from 70 years (#5)

5. Recording learner behavior and student modeling

The intelligent tutoring systems in ICALL stored their knowledge about the learner in a student model (Schulze, 2012). Student modeling (e.g., Bull, 1993, 1994, 2000; Mabbott & Bull, 2004; McCalla, 1992; Michaud & McCoy, 2000; Schulze, 2008; Self, 1974; Tsiriga & Virvou, 2003) is a challenging endeavor: student data needs to be recorded and structured into a student profile; inferences can then be drawn from this profile to construct a student model over time. The model holds structured information about prior learning, learner strategies and preferences, and beliefs about language and language learning. Basically, it models the information teachers have about their students, both through student records and through the teacher’s experience. Such information helps to tailor instructional sequences, guidance and help, and corrective feedback to the individual so that they become relevant and most effective. GenAI tools are built on LLMs, which contain enormous amounts of information about language and languages (Wolfram, 2023, February 14); their knowledge of the learner, however, is often non-existent or serendipitous at best. In the context of language education, and especially against the background of previous research in ICALL and student modeling in general, this lack of a student model means that a GenAI can be neither treated nor employed as an intelligent tutoring system (ITS): an ITS consists of a knowledge base, a student model, and a pedagogical module (Wikipedia contributors, 2024, December 20), which together allow it to imitate the behavior of a human tutor and provide individualized tutoring.

My inspiration for this title came from the book  
Snyder, T. (2017). On tyranny: Twenty lessons from the twentieth century. Tim Duggan Books.

I am sharing these early drafts of a book chapter I published in
Yijen Wang, Antonie Alm, & Gilbert Dizon (Eds.) (2025), 
Insights into AI and language teaching and learning.
Castledown Publishers.

https://doi.org/10.29140/9781763711600-02.
Thus far, I have given a historical introduction and talked about the necessary exposure to authentic language, communication in context, interaction in language learning with GenAI, and appropriate error correction and contingent feedback. The following describes the basis for lesson #5.

To be continued …

References

Bull, S. (1993). Towards User/System Collaboration in Developing a Student Model for Intelligent Computer-Assisted Language Learning. Computer Assisted Language Learning, 8, 3-8.

Bull, S. (1994). Student modeling for second language acquisition. Computers and Education, 23(1-2), 13-20.

Bull, S. (2000). ‘Do It Yourself’ Student Models for Collaborative Student Modelling and Peer Interaction. In B. P. Goettl, H. M. Halff, C. Redfield Luckhardt, & V. J. Shute (Eds.), Intelligent Tutoring Systems. 4th International Conference, ITS ’98, San Antonio, Texas, USA, August 16-19, 1998 Proceedings (pp. 176-185). Springer Verlag.

Mabbott, A., & Bull, S. (2004). Alternative Views on Knowledge: Presentation of Open Learner Models. In J. C. Lester, R. M. Vicari, & F. Paraguacu (Eds.), Intelligent Tutoring Systems: 7th International Conference (pp. 689-698). Springer-Verlag.

McCalla, G. I. (1992). The Centrality of Student Modelling to Intelligent Tutoring Systems. In E. Costa (Ed.), New Directions for Intelligent Tutoring Systems (pp. 107-131). Springer Verlag.

Michaud, L. N., & McCoy, K. F. (2000). Supporting Intelligent Tutoring in CALL by Modeling the User’s Grammar. In Proceedings of the Thirteenth Annual International Florida Artificial Intelligence Research Symposium, May 22-24, 2000, Orlando, Florida (pp. 50-54). AAAI Press.

Schulze, M. (2008). Modeling SLA Processes Using NLP. In C. Chapelle, Y.-R. Chung, & J. Xu (Eds.), Towards Adaptive CALL: Natural Language Processing for Diagnostic Assessment (pp. 149-166). Iowa State University. https://apling.engl.iastate.edu/wp-content/uploads/sites/221/2015/05/5thTSLL2007_proceedings.pdf

Schulze, M. (2012). Learner modeling. In C. A. Chapelle (Ed.), The Encyclopaedia of Applied Linguistics. Wiley-Blackwell.

Self, J. A. (1974). Student Models in Computer-Aided Instruction. International Journal of Man-Machine Studies, 6, 261-276.

Tsiriga, V., & Virvou, M. (2003). Modelling the Student to Individualise Tutoring in a Web-Based ICALL. International Journal of Continuing Engineering Education and Life-Long Learning, 13(3-4), 350-365.

Wikipedia contributors. (2024, December 20). Intelligent tutoring system. In Wikipedia, The Free Encyclopedia.

Wolfram, S. (2023, February 14). What is ChatGPT doing … and why does it work? Stephen Wolfram Writings. https://writings.stephenwolfram.com/2023/02/what-is-chatgpt-doing-and-why-does-it-work

Language Learning and AI: 7 lessons from 70 years (#4)

4. Appropriate error correction and contingent feedback

Rather than focusing on engaging the learner in communicative interaction, learning with ICALL systems was often based on the assumption that corrective feedback on learner language is of great importance. ICALL research, particularly in the 1990s and early 2000s, focused on corrective feedback and relied on the three steps of traditional error analysis: recognition, description, and explanation (Heift & Schulze, 2007, chapter 3: Error analysis and description). Error analysis (Corder, 1974) was the main approach in second-language acquisition research in the 1960s and 70s. Its contributions to language education, applied linguistics, and ICALL are manifold and have shaped language teaching to this day. Although error correction is still part of common teaching practice today, applied linguistics research has shifted the focus away from deficits in the learner’s language to operationalizing and encouraging their abilities (compare the Can-Do Statements of the National Council of State Supervisors for Languages (NCSSFL) and the American Council on the Teaching of Foreign Languages (ACTFL), which were introduced in 2013). This has changed the perspective on corrective feedback in language education. More nuance was introduced, and ICALL, too, began to look at providing help and guidance to learners through text augmentation, for example by enriching a reading text with linked online glossaries and information on morphological paradigms (e.g., Amaral & Meurers, 2011; Wood, 2011). Text augmentation appears to be as yet underexplored in GenAI and language education research since late 2022.

We are on to lesson 4. Part 0 gave a historical introduction. Lesson 1 focused on the necessary exposure to authentic language and whether this can be achieved with GenAI. Lesson 2 looked at communication in context, which is central to language learning. In lesson 3, we turned to the role of interaction in language learning with GenAI.

How does ICALL, with its symbolic NLP, compare to GenAI, with its LLMs and ANNs, when it comes to language feedback and guidance? The texts GenAI produces are mostly well-formed, especially if the text’s language is English or one of the other languages in which many texts on the internet are written (Schulze, 2025). So, how suitable would a GenAI be for appropriate error correction and contingent feedback? In an ICALL system, a fragment of the grammar of the learnt language would be described with rules and items, using a formal grammar, in the expert model and parser. This computational grammar could ‘understand’ the linguistically well-formed words, phrases, and sentences that were covered by the rules of the expert model. To be able to parse student errors, the expert model needed to be adapted. Errors were captured in an error grammar – buggy rules that ran parallel to the rules covering error-free linguistic units – or in relaxed constraints (Dini & Malnati, 1993). An example of a buggy rule and its error-free counterpart for German subject-verb agreement is (in pseudo-code for legibility):

default rule(subject-verb agreement) := 
        if subject(NUMBER) = verb(NUMBER)
        and
        subject(PERSON) = verb(PERSON)
        then 
        parse successfully and move on
        else 
        buggy rule(subject-verb agreement)

buggy rule(subject-verb agreement) := 
       if subject(NUMBER_S) <> verb(NUMBER_V) 
       then 
       give feedback("The subject is in ", [subject(NUMBER_S)],
          ". You need to choose a verb ending that indicates ",
          [subject(NUMBER_S)],
          ", too. The verb in your sentence is in ",
          [verb(NUMBER_V)]) 
       else next
       then 
       if subject(PERSON_S) <> verb(PERSON_V)
       then 
       give feedback("The subject is in ", [subject(PERSON_S)],
          ". You need to choose a verb ending that indicates ",
          [subject(PERSON_S)],
          ", too. The verb in your sentence is in ",
          [verb(PERSON_V)])

Buggy rules required a high level of error anticipation, because to cover an error, a particular buggy rule needed to be written. Since buggy rules are deterministic, when they sufficed to parse the student input, they were robust in the feedback they provided. Relaxed constraints achieved a slightly wider coverage and required less error anticipation, because the constraint that, for example, the subject and finite verb of a German sentence need to agree in number and person is relaxed. This means that, whether or not subject and verb agree, the sentence is parsed successfully, with one fewer constraint:

relaxed rule(subject-verb agreement) := 
         subject(NUMBER_S) and verb(NUMBER_V) 
         and subject(PERSON_S) and verb(PERSON_V)
         if NUMBER_S = NUMBER_V
         then next
         else 
         give feedback("The subject is in ", [subject(NUMBER_S)],
             ". You need to choose a verb ending that indicates ",
             [subject(NUMBER_S)],
             ", too. The verb in your sentence is in ",
             [verb(NUMBER_V)])
         then 
         if PERSON_S = PERSON_V
         then next
         else 
         give feedback("The subject is in ", [subject(PERSON_S)],
             ". You need to choose a verb ending that indicates ",
             [subject(PERSON_S)],
             ", too. The verb in your sentence is in ",
             [verb(PERSON_V)])
         then parse successfully and move on

This pseudo-code illustration shows how labor-intensive the coding of symbolic NLP for ICALL, with its focus on error correction and feedback, was. The limited coverage of the computational lexica and grammars, and the additional parsing challenges introduced by extending the parser to cover the errors learners make, meant that even the few ICALL systems that were used by students remained narrow in scope (e.g., Heift, 2010; Nagata, 2002).
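For readers more comfortable with a real programming language, the relaxed-constraint idea above can be mimicked in a few lines of Python; this is my own toy sketch, not code from any of the cited ICALL systems:

```python
def check_agreement(subject: dict, verb: dict) -> list[str]:
    """Relaxed subject-verb agreement constraint: the parse always
    succeeds, but mismatches in NUMBER or PERSON yield feedback."""
    feedback = []
    for feat in ("NUMBER", "PERSON"):
        if subject[feat] != verb[feat]:
            feedback.append(
                f"The subject is in {subject[feat]}. You need to choose "
                f"a verb ending that indicates {subject[feat]}, too. "
                f"The verb in your sentence is in {verb[feat]}.")
    return feedback  # an empty list means the sentence is well-formed

# e.g. *Der Student geben ...: singular subject, plural verb form
msgs = check_agreement({"NUMBER": "singular", "PERSON": "3rd"},
                       {"NUMBER": "plural", "PERSON": "3rd"})
print(msgs[0])
```

Even this toy version makes the anticipation problem visible: every feature to be checked, and every feedback message, still has to be written by hand.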

Coverage is not a problem for GenAI, as we saw above. However, LLMs and multidimensional ANNs were not intended to provide corrective feedback to language learners. The error correction of GenAI can be illustrated best with the automatic correction of spelling errors. The prompt “Tell me please what the capitel of germany is.” with its two spelling errors yields the following result: “The capital of Germany is **Berlin** …” (Microsoft Copilot, 2025, January 17, my emphasis). For languages with LLMs, the automatic error correction in the natural language understanding is accurate and comprehensive, as can be seen from the answer in the example. However, the feedback on such errors, which is only given when specifically requested, is all too often flawed in parts or incomplete. Stated in brief, GenAIs are good at error correction but limited in providing appropriate corrective feedback. Many teachers and language learners have at least anecdotal evidence that the metalinguistic explanations are not suitable for language learners and that errors are often underreported or over-flagged. This is understandable if one considers that GenAIs work with probabilistic patterns in the LLM for their error correction, diagnosis, and (metalinguistic) feedback. This often works for the correction, but is shaky at best for diagnosis and feedback. The computational linguist and ICALL researcher Detmar Meurers (2024) argued in this context that assuming a GenAI is a suitable language teacher is worse than asking a speaker of that language to start teaching systematic language classes. His argument was also based on the fact that a GenAI has no ‘knowledge’ of the prior learning history, the language abilities and beliefs, and the general profile of the learner.

To be continued …

References

Amaral, L., & Meurers, W. D. (2011). On using Intelligent Computer-Assisted Language Learning in real-life foreign language teaching and learning. ReCALL, 23(1), 4-24.

Corder, P. (1974). Error Analysis. In J. P. B. Allen & P. Corder (Eds.), The Edinburgh Course in Applied Linguistics. Volume 3 – Techniques in Applied Linguistics (pp. 122-131). Oxford University Press.

Dini, L., & Malnati, G. (1993). Weak Constraints and Preference Rules. In P. Bennett & P. Paggio (Eds.), Preference in Eurotra (pp. 75-90). Commission of the European Communities.

Heift, T. (2010). Developing an Intelligent Tutor. CALICO Journal, 27(3), 443-459.

Heift, T., & Schulze, M. (2007). Errors and Intelligence in CALL. Parsers and Pedagogues. Routledge.

Meurers, D. (2024). #3×07 – Intelligente Tutorielle Systeme (mit Prof. Dr. Detmar Meurers) In Auftrag:Aufbruch. Der Podcast des Forum Bildung Digitalisierung. https://auftrag-aufbruch.podigee.io/30-intelligente-tutorielle-systeme-mit-detmar-meurers

Microsoft Copilot. (2025, January 17). Tell me please what the capitel of germany is. Microsoft Copilot.

Nagata, N. (2002). BANZAI: An Application of Natural Language Processing to Web-Based Language Learning. CALICO Journal, 19(3), 583-599.

Schulze, M. (2025). The impact of artificial intelligence (AI) on CALL pedagogies. In L. McCallum & D. Tafazoli (Eds.), The Palgrave Encyclopedia of Computer-Assisted Language Learning. Palgrave Macmillan. https://doi.org/10.1007/978-3-031-51447-0_7-1

Wood, P. (2011). Computer assisted reading in German as a foreign language. Developing and testing an NLP-based application. CALICO Journal, 28(3), 662-676.

Language Learning and AI: 7 lessons from 70 years (#1)

1. Exposure to rich, authentic language

The texts – or the language – that a computer can understand or generate depend on its capacity for NLP. Computer scientists added the adjective ‘natural’ because the parsing of programming language(s) was possible, and necessary, before they turned to parsing texts produced by humans. In early NLP, computational linguists wrote grammatical rules and compatible dictionaries in programming languages such as Prolog and LISP. Rules and items were written by hand, relying on different (mathematical) grammar formalisms. This made the development process slow, error-prone, computationally expensive, and labor-intensive. This might be the main reason why the coverage and robustness of ICALL systems and applications remained limited over the years. Parsing a single sentence – the analysis of the grammatical constructions and the production of an equivalent information structure, something similar to a syntactic tree, which the computer could “understand” – took from a couple of seconds to a few minutes, depending on the computer hardware and the efficiency of the parsing algorithm. This approach to NLP is called symbolic, because it uses and processes symbols for syntactic phrases, such as NP for a noun phrase and VP for a verb phrase, and for lexical items, such as N for a noun and V for a verb, and their grammatical feature structures. Symbolic NLP in ICALL resulted in sentence-based language learning activities in a tutorial system. Since the dictionary was also hand-written and hence usually small, the language to which students using the ICALL system were exposed was limited to the vocabulary of a textbook at best.
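To give a flavor of this symbolic approach, here is a deliberately tiny sketch in Python; the three-rule grammar and three-word lexicon are my own invention and are, of course, orders of magnitude smaller than the hand-written grammars of actual ICALL systems:

```python
# Toy context-free grammar: S -> NP VP, NP -> Det N, VP -> V
LEXICON = {"the": "Det", "student": "N", "answers": "V"}

def parse(tokens: list[str]):
    """Hand-written parse for the toy grammar; returns a nested
    (symbol, ...) tree, or None if the input is outside coverage."""
    tags = [LEXICON.get(t) for t in tokens]
    if tags[:2] == ["Det", "N"] and tags[2:] == ["V"]:
        np = ("NP", ("Det", tokens[0]), ("N", tokens[1]))
        vp = ("VP", ("V", tokens[2]))
        return ("S", np, vp)
    return None  # anything not covered by the rules fails to parse

print(parse(["the", "student", "answers"]))
# -> ('S', ('NP', ('Det', 'the'), ('N', 'student')), ('VP', ('V', 'answers')))
```

Every additional construction and every additional word requires another hand-written rule or lexicon entry, which is exactly the coverage bottleneck described above.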

Part #0 gave a general introduction. Here we have the first section on what language learners should expect of generative-AI tools. Other parts will follow.

In the 1990s, more electronic corpora (large, principled collections of texts), also in languages other than English, became available. The approach to NLP that relies on the mathematical analysis of large corpora has been called statistical NLP. In this approach, language patterns are detected in corpus analyses. For these patterns, or contiguous sequences of words – called n-grams, with n being the number of words in each sequence – the probability of one word following the other(s) is calculated. In the simplest form, the probabilistic connections of linear word sequences are calculated. This results in a wider coverage of language because of the underlying use of large corpora. However, long-distance dependencies, as in the following sentence, still posed a problem, as they had in symbolic NLP.

The student who had finally given the right answer proceeded to ask the next question.

For any human reader, it is immediately clear that it is ‘the student’ who ‘proceeded’ and not ‘the right answer’. For the computer, this connection between grammatical subject and finite verb poses a challenge because the words are not in the same n-gram(s) and the pattern cannot be detected easily.
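The bigram (n = 2) calculation sketched above can be written out in a few lines of Python; the toy corpus is my own and serves only to illustrate the maximum-likelihood estimate:

```python
from collections import Counter

corpus = "the student asked the question the student answered".split()

# Count bigrams and the unigrams that can start a bigram,
# then estimate P(next word | previous word) from relative frequency
bigrams = Counter(zip(corpus, corpus[1:]))
unigrams = Counter(corpus[:-1])

def prob(prev: str, nxt: str) -> float:
    """Maximum-likelihood estimate of P(nxt | prev)."""
    return bigrams[(prev, nxt)] / unigrams[prev]

print(prob("the", "student"))  # 2 of the 3 "the" tokens are followed by "student"
```

In this tiny corpus, “student” follows “the” with probability 2/3 and “question” with probability 1/3; the subject-verb link across a long relative clause, however, never appears inside any single bigram, which is precisely the limitation noted above.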

This and other challenges were overcome by relying on artificial neural networks (ANNs), models that are inspired by the neural networks of human brains (for a comprehensive overview of how GPTs (generative pre-trained transformers) work, see Wolfram, 2023, February 14). ANNs are multidimensional and do not rely only on linear sequences of words. Their individual nodes, the neurons, receive input from and send output to other neurons. The processing of input to produce output is basically done through a mathematical equation: the output depends on the (probabilistically) weighted input. The network learns by adjusting the weights, which multiply the different input values, and the biases, which are added independently of the input, to improve the accuracy of the result. If this machine learning relies on neurons organized in multiple layers – the input layer, the output layer, and, in between, two or more hidden layers – then we talk about a deep network and deep learning (LeCun et al., 2015). “GPT-3 has 96 layers. GPT-4’s exact number of layers hasn’t been publicly disclosed, but it is expected to be significantly larger than GPT-3” (Microsoft Copilot, 2024, December 26). Deep learning and ANNs are the underpinnings of the large language models (LLMs) (for an accessible overview of ANNs and LLMs, see Naveed et al., 2024), which in turn are the backbone of GenAI chatbots such as ChatGPT (OpenAI), Claude (Anthropic), Copilot (Microsoft), and Gemini (Google). Thus, LLMs essentially rely on enormous corpora of texts scraped from the internet and on machine-learned neural networks. In these ANNs, individual tokens – which can be individual letters, words, or parts of a word – are represented by long lists of numbers called word vectors. The parameters of the network – its weights and biases – help to determine which word follows the previous ones. “GPT-3 has 175 billion parameters, which include the weights and biases of the neurons. GPT-4 is speculated to have trillions of parameters, though the exact number hasn’t been confirmed” (Microsoft Copilot, 2024, December 26).
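The weighted-input-plus-bias computation of a single neuron can be written out directly; the input values, weights, and bias below are arbitrary toy numbers of my own choosing:

```python
import math

def neuron(inputs: list[float], weights: list[float], bias: float) -> float:
    """One artificial neuron: a weighted sum of the inputs plus a bias,
    squashed through a sigmoid activation into the range (0, 1)."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 / (1 + math.exp(-z))  # sigmoid activation

# Two inputs with hand-picked weights; during training, the network
# would adjust the weights and the bias to reduce its error
out = neuron([1.0, 0.5], weights=[0.8, -0.4], bias=0.1)
print(round(out, 3))  # -> 0.668
```

A deep network is nothing more than many such neurons stacked in layers, with the outputs of one layer serving as the inputs of the next.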

This new computational approach is far removed from the reliance on linguistic rules and items in early NLP and ICALL, because it is steeped in the complex calculations in the hidden layers of the LLM and arrays upon arrays of numbers. That is why GenAI’s coverage, scope, and speed of NLP are vastly superior to previous systems in ICALL. Therefore, we can argue that students using GenAI are exposed to rich language at the paragraph level and not only the sentence level. But is this generated language authentic? In an early paper on authenticity in the language classroom, Breen (1985) proposes “that authentic texts for language learning are any sources of data which will serve as a means to help the learner to develop an authentic interpretation” (p. 68). The question then becomes: can a learner develop an authentic interpretation of a turn or text generated by a GenAI chatbot or of a translation rendered by a GenAI machine translation tool? Since the generated texts are certainly well-formed and plausible, they appear to provide a good basis for the learner’s interpretation and thus for language learning. Also, because they are based on actual language use as found in the texts on the internet that were used to train the LLM, we have another indication that generated texts in a chat with a GenAI, or a translation from a GenAI, potentially qualify as authentic. However, the real key to the authenticity of language is found in communication.

References

Breen, M. P. (1985). Authenticity in the Language Classroom. Applied Linguistics, 6(1), 60-70. https://doi.org/10.1093/applin/6.1.60

LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep Learning. Nature, 521, 436–444.

Microsoft Copilot. (2024, December 26). How many nodes and layers does the ANN of the GPT large language model have? Microsoft Copilot.

Naveed, H., Khan, A. U., Qiu, S., Saqib, M., Anwar, S., Usman, M., Akhtar, N., Barnes, N., & Mian, A. (2024). A Comprehensive Overview of Large Language Models. https://doi.org/10.48550/arXiv.2307.06435

Wolfram, S. (2023, February 14). What is ChatGPT doing … and why does it work? Stephen Wolfram Writings. https://writings.stephenwolfram.com/2023/02/what-is-chatgpt-doing-and-why-does-it-work


… to be continued …