Language Learning and AI: 7 lessons from 70 years (#4)

4. Appropriate error correction and contingent feedback

Rather than focusing on engaging the learner in communicative interaction, learning with ICALL systems was often based on the assumption that corrective feedback on learner language is of great importance. ICALL research, particularly in the 1990s and early 2000s, focused on corrective feedback and relied on the three steps of traditional error analysis: recognition, description, and explanation (Heift & Schulze, 2007, chapter 3: Error analysis and description). Error analysis (Corder, 1974) was the main approach in second-language acquisition research in the 1960s and 70s. Its contributions to language education, applied linguistics, and ICALL are manifold and have shaped language teaching to this day. Although error correction is still part of common teaching practice today, applied linguistics research has shifted the focus away from deficits in the learner’s language towards operationalizing and encouraging their abilities (compare the Can-Do Statements of the National Council of State Supervisors for Languages (NCSSFL) and the American Council on the Teaching of Foreign Languages (ACTFL), which were introduced in 2013). This has changed the perspective on corrective feedback in language education. More nuance was introduced, and ICALL began to look at providing help and guidance to learners through text augmentation, for example by enriching a reading text with linked online glossaries and information on morphological paradigms (e.g., Amaral & Meurers, 2011; Wood, 2011). Since late 2022, text augmentation appears to have remained underexplored in research on GenAI and language education.

My inspiration for this title came from the book  
Snyder, T. (2017). On tyranny: Twenty lessons from the twentieth century. Tim Duggan Books.

I am sharing these early drafts of a book chapter I published in
Yijen Wang, Antonie Alm, & Gilbert Dizon (Eds.) (2025), 
Insights into AI and language teaching and learning.
Castledown Publishers.

https://doi.org/10.29140/9781763711600-02.
We are onto lesson 4. Part 0 gives a historical introduction. Lesson 1 focuses on the necessary exposure to authentic language and whether this can be done with GenAI. Lesson 2 looked at communication in context, which is central in language learning. Lesson 3 turned to the role of interaction in language learning with GenAI.

How does ICALL, with its symbolic NLP, compare to GenAI, with its LLMs and ANNs, when it comes to language feedback and guidance? The texts GenAI produces are mostly well-formed, especially if the text’s language is English or one of the other languages in which many texts on the internet are written (Schulze, 2025). So, how suitable would a GenAI be for appropriate error correction and contingent feedback? In an ICALL system, a fragment of the grammar of the learnt language would be described with rules and items, using a formal grammar, in the expert model and parser. This computational grammar could ‘understand’ the linguistically well-formed words, phrases, and sentences that were covered by the rules of the expert model. To be able to parse student errors, the expert model needed to be adapted. Errors were captured in an error grammar – buggy rules that ran parallel to the rules covering error-free linguistic units – or in relaxed constraints (Dini & Malnati, 1993). An example of a buggy rule and its error-free counterpart in German is (in pseudo-code for legibility):

default rule(subject-verb agreement) := 
        if subject(NUMBER) = verb(NUMBER)
        and
        subject(PERSON) = verb(PERSON)
        then 
        parse successfully and move on
        else 
        buggy rule(subject-verb agreement)

buggy rule(subject-verb agreement) :=
       if subject(NUMBER_S) <> verb(NUMBER_V)
       then
       give feedback("The subject is in", [subject(NUMBER_S)],
           ". You need to choose a verb ending that indicates",
           [subject(NUMBER_S)], ", too. The verb in your
           sentence is in", [verb(NUMBER_V)])
       else next
       then
       if subject(PERSON_S) <> verb(PERSON_V)
       then
       give feedback("The subject is in", [subject(PERSON_S)],
           ". You need to choose a verb ending that indicates",
           [subject(PERSON_S)], ", too. The verb in your
           sentence is in", [verb(PERSON_V)])

Buggy rules required a high level of error anticipation because, to cover an error, a particular buggy rule needed to be written. Since buggy rules are deterministic, they were robust in the feedback they provided whenever they sufficed to parse the student input. Relaxed constraints achieved slightly wider coverage and required less error anticipation, because the constraint that, for example, the subject and finite verb of a German sentence need to agree in number and person was relaxed. This means that, whether or not subject and verb agree, the sentence is parsed successfully, with one constraint rule fewer:

relaxed rule(subject-verb agreement) :=
         subject(NUMBER_S) and verb(NUMBER_V)
         and subject(PERSON_S) and verb(PERSON_V)
         if NUMBER_S = NUMBER_V
         then next
         else
         give feedback("The subject is in", [subject(NUMBER_S)],
             ". You need to choose a verb ending that indicates",
             [subject(NUMBER_S)], ", too. The verb in your
             sentence is in", [verb(NUMBER_V)])
         then
         if PERSON_S = PERSON_V
         then next
         else
         give feedback("The subject is in", [subject(PERSON_S)],
             ". You need to choose a verb ending that indicates",
             [subject(PERSON_S)], ", too. The verb in your
             sentence is in", [verb(PERSON_V)])
         then parse successfully and move on

This pseudo-code illustration shows how labor-intensive the coding of symbolic NLP for ICALL, with its focus on error correction and feedback, was. The limited coverage of the computational lexica and grammars, and the additional parsing challenges introduced by extending the parser to the errors learners make, meant that even the few ICALL systems that were used by students had limited coverage (e.g., Heift, 2010; Nagata, 2002).
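
To make this concrete, here is a minimal, runnable sketch of the relaxed-constraint idea in Python. It is not taken from any actual ICALL system: the tiny lexicon, the feature values, and the feedback wording are my own illustrative assumptions, and a real system would need full morphological analysis and a far larger grammar.

# A minimal sketch of relaxed constraints for German subject-verb agreement.
# The lexicon and its feature values are illustrative assumptions only.
LEXICON = {
    "ich":   {"person": "1st", "number": "singular"},
    "du":    {"person": "2nd", "number": "singular"},
    "wir":   {"person": "1st", "number": "plural"},
    "gehe":  {"person": "1st", "number": "singular"},
    "gehst": {"person": "2nd", "number": "singular"},
    "gehen": {"person": "1st", "number": "plural"},
}

def check_agreement(subject: str, verb: str) -> list[str]:
    """Accept the input even when agreement fails; collect feedback instead."""
    s, v = LEXICON[subject], LEXICON[verb]
    feedback = []
    if s["number"] != v["number"]:   # relaxed: a mismatch does not block the parse
        feedback.append(f"The subject is in the {s['number']}. You need a verb "
                        f"ending that indicates the {s['number']}, too; the verb "
                        f"in your sentence is in the {v['number']}.")
    if s["person"] != v["person"]:
        feedback.append(f"The subject is in the {s['person']} person. You need a "
                        f"verb ending that indicates the {s['person']} person, too; "
                        f"the verb in your sentence is in the {v['person']} person.")
    return feedback

print(check_agreement("ich", "gehst"))   # person mismatch -> one feedback message
print(check_agreement("wir", "gehe"))    # number mismatch -> one feedback message
print(check_agreement("ich", "gehe"))    # agreement holds -> no feedback

Even in this toy form, every error type the system is meant to diagnose has to be anticipated in the feature checks and feedback templates, which is exactly the anticipation and coverage problem described above.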

Coverage is not a problem for GenAI, as we saw above. However, LLMs and multidimensional ANNs were not intended to provide corrective feedback to language learners. The error correction of GenAI can best be illustrated with the automatic correction of spelling errors. The prompt “Tell me please what the capitel of germany is.” with its two spelling errors yields the following result: “The capital of Germany is **Berlin**  …” (Microsoft Copilot, 2025, January 17, my emphasis). For languages with LLMs, the automatic error correction in the natural language understanding is accurate and comprehensive, as can be seen from the answer in the example. However, feedback on such errors, which is only given when specifically requested, is all too often flawed in parts or incomplete. In brief, GenAIs are good at error correction but limited in providing appropriate corrective feedback. Many teachers and language learners have at least anecdotal evidence that the metalinguistic explanations provided by GenAI are often unsuitable for language learners and that errors are often underreported or over-flagged. This is understandable if one considers that GenAIs work with probabilistic patterns in the LLM for their error correction, diagnosis, and (metalinguistic) feedback. This often works for the correction, but is shaky at best for diagnosis and feedback. The computational linguist and ICALL researcher Detmar Meurers (2024) argued in this context that assuming a GenAI is a suitable language teacher is worse than asking a speaker of that language to start teaching systematic language classes. His argument was also based on the fact that a GenAI has no ‘knowledge’ of the prior learning history, language abilities and beliefs, and the general profile of the learner.
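
The difference between silent correction and explicitly requested feedback can be made visible with two prompts. The following is a minimal sketch, assuming access to the OpenAI Python client and an OpenAI-compatible chat model; the model name and the prompts are my own illustrative choices, and neither the wording nor the quality of the responses is guaranteed.

# A minimal sketch: silent error correction vs. explicitly requested feedback.
# Assumes the openai package and an OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

def ask(prompt: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",                     # illustrative model choice
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

# 1) The spelling errors are 'corrected' silently during natural language
#    understanding; the answer typically does not mention them at all.
print(ask("Tell me please what the capitel of germany is."))

# 2) Metalinguistic feedback appears only when it is explicitly requested,
#    and it may be incomplete or partly flawed.
print(ask("I am learning English. Point out and explain every error in this "
          "sentence: 'Tell me please what the capitel of germany is.'"))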

To be continued …

References

Amaral, L., & Meurers, W. D. (2011). On using Intelligent Computer-Assisted Language Learning in real-life foreign language teaching and learning. ReCALL, 23(1), 4-24.

Corder, S. P. (1974). Error Analysis. In J. P. B. Allen & S. P. Corder (Eds.), The Edinburgh Course in Applied Linguistics. Volume 3 – Techniques in Applied Linguistics (pp. 122-131). Oxford University Press.

Dini, L., & Malnati, G. (1993). Weak Constraints and Preference Rules. In P. Bennett & P. Paggio (Eds.), Preference in Eurotra (pp. 75-90). Commission of the European Communities.

Heift, T. (2010). Developing an Intelligent Tutor. CALICO Journal, 27(3), 443-459.

Heift, T., & Schulze, M. (2007). Errors and Intelligence in CALL. Parsers and Pedagogues. Routledge.

Meurers, D. (2024). #3×07 – Intelligente Tutorielle Systeme (mit Prof. Dr. Detmar Meurers) [Audio podcast episode]. In Auftrag:Aufbruch. Der Podcast des Forum Bildung Digitalisierung. https://auftrag-aufbruch.podigee.io/30-intelligente-tutorielle-systeme-mit-detmar-meurers

Microsoft Copilot. (2025, January 17). Tell me please what the capitel of germany is. Microsoft Copilot.

Nagata, N. (2002). BANZAI: An Application of Natural Language Processing to Web-Based Language Learning. CALICO Journal, 19(3), 583-599.

Schulze, M. (2025). The impact of artificial intelligence (AI) on CALL pedagogies. In L. McCallum & D. Tafazoli (Eds.), The Palgrave Encyclopedia of Computer-Assisted Language Learning. Palgrave Macmillan. https://doi.org/10.1007/978-3-031-51447-0_7-1

Wood, P. (2011). Computer assisted reading in German as a foreign language. Developing and testing an NLP-based application. CALICO Journal, 28(3), 662-676.

Language Learning and AI: 7 lessons from 70 years (#3)

3. Varied interaction in language-learning tasks

The human-machine conversation often works because we are used to adhering – even if the machine cannot and does not – to Grice’s four maxims of conversation (Grice, 1975): quantity (be informative), quality (be truthful), relation (be relevant), and manner (be clear). Interaction in dialog works because readers look at the mathematically compiled output of the GenAI and assume that it is informative, truthful, and relevant. Because it generates linguistically accurate and plausible text, the GenAI also appears to be clear. Communicative interaction proceeds successfully as long as the human reader does not detect that the machine output is untruthful or factually inaccurate because of, for example, hallucinations (Nananukul & Kejriwal, 2024) or errors, or that it is irrelevant because an ambiguity has been misinterpreted in context (e.g., when asked about bats, giving information about the mammal rather than the intended piece of sports equipment).

And here is lesson #3 of a short series. Part 0 gives a historical introduction. Lesson #1 focuses on the necessary exposure to authentic language and whether this can be done with GenAI. Lesson #2 looked at communication in context, which is central in language learning. And now we are turning to the role of interaction in language learning with GenAI.

Despite these hurdles, GenAIs have become interesting verbal interactants in language education. ICALL systems, on the other hand, provided only limited interaction, mainly due to their limited language coverage (see above). Systems with or without AI worked with branching trees and canned text, for example Quandary, a program from the Hot Potatoes suite (Arneil & Holmes, n.d.) that does not have NLP built in. Other systems were more like chatbots whose conversation was limited to one topic or topic area (e.g., Underwood, 1982). Such CALL chatbots were inspired by Weizenbaum’s Eliza (for his own reflection, see Weizenbaum, 1976) and SHRDLU (Winograd, 1971) and often relied on regular expressions (Computer Science Field Guide, n.d.) and keyword searches. More sophisticated NLP was employed in the interactive games Spion (Sanders & Sanders, 1995) and Kommissar (DeSmedt, 1995). These early examples of learners interacting directly with a machine that had some AI capabilities, especially some level of NLP, show that GenAI has opened the door to far more complex and comprehensive verbal interactions and role plays in a variety of languages.
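
To give a sense of how limited this keyword-and-pattern approach was, here is a minimal sketch in Python. The patterns and canned responses are my own illustrative assumptions; they are not taken from Eliza, SHRDLU, or any of the CALL chatbots cited above.

import re

# A handful of (pattern, response template) pairs: only these patterns are
# 'understood'; everything else receives the same canned fallback line.
PATTERNS = [
    (re.compile(r"\bI feel (.+)", re.IGNORECASE), "Why do you feel {0}?"),
    (re.compile(r"\bmy (\w+)\b", re.IGNORECASE), "Tell me more about your {0}."),
    (re.compile(r"\b(hello|hi)\b", re.IGNORECASE), "Hello! What would you like to talk about?"),
]

def reply(user_input: str) -> str:
    for pattern, template in PATTERNS:
        match = pattern.search(user_input)
        if match:
            return template.format(*match.groups())
    return "That is interesting. Please go on."   # canned fallback text

print(reply("Hi there"))                  # -> Hello! What would you like to talk about?
print(reply("I feel tired today"))        # -> Why do you feel tired today?
print(reply("The weather was terrible"))  # -> That is interesting. Please go on.

Anything outside the anticipated patterns falls through to the same fallback line, which is one reason why such systems stayed within a single topic or topic area.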


Of course, language-learning tasks (see Willis (1996) for an early introduction to the now commonly applied Task-based Language Teaching) are not only rooted in conversations and role plays. GenAI can also generate model answers for different task components or be employed for brainstorming first ideas in the pre-task steps, for example. This was impossible with ICALL systems based on symbolic NLP and (limited) expert systems. A discussion of the affordances and challenges of this powerful generation of (partial) task outcomes and components by either the student or the teacher is beyond the confines of this chapter, but it is an area within the application of GenAI in language education that is in urgent need of discussion. This agentive collaboration in dialog, possible scaffolding, and student guidance can either support or hinder, and even prevent, learning.

To be continued …

References

Arneil, S., & Holmes, M. (n.d.). Quandary. Retrieved January 17 from https://hcmc.uvic.ca/project/quandary/

Computer Science Field Guide. (n.d.). Regular expressions – Formal Languages. Retrieved January 27 from https://www.csfieldguide.org.nz/en/chapters/formal-languages/regular-expressions/

DeSmedt, W. H. (1995). Herr Kommissar: An ICALL Conversation Simulator for Intermediate German. In V. M. Holland, J. D. Kaplan, & M. R. Sams (Eds.), Intelligent Language Tutors: Theory Shaping Technology (pp. 153-174). Lawrence Erlbaum Associates.

Grice, H. P. (1975). Logic and Conversation. In P. Cole & J. Morgan (Eds.), Syntax and Semantics: Speech Acts (pp. 41-58). Academic Press.

Nananukul, N., & Kejriwal, M. (2024). HALO: An ontology for representing and categorizing hallucinations in large language models. Proc. SPIE 13058, Disruptive Technologies in Information Sciences VIII, 130580B (6 June 2024).

Sanders, R. H., & Sanders, A. F. (1995). History of an AI Spy Game: Spion. In V. M. Holland, J. D. Kaplan, & M. R. Sams (Eds.), Intelligent Language Tutors: Theory Shaping Technology (pp. 141-151). Lawrence Erlbaum Associates.

Underwood, J. H. (1982). Simulated Conversation as CAI Strategy. Foreign Language Annals, 15, 209-212.

Weizenbaum, J. (1976). Computer Power and Human Reason: From Judgment to Calculation. W. H. Freeman.

Willis, J. R. (1996). A framework for task-based learning. Longman.

Winograd, T. (1971). Procedures as a representation for data in a computer program for understanding natural language. https://hci.stanford.edu/winograd/shrdlu/AITR-235.pdf

Language Learning and AI: 7 lessons from 70 years (#2)

2. Communication in context

Oxford (1993) expressed the desire that “communicative competence must be the cornerstone of ICALL” (p. 174), noting that many ICALL projects of her time did not meet that goal, although communication and, by extension, communicative language teaching had been central ideas in applied linguistics for decades. Canale and Swain (1980) transferred Dell Hymes’ concept of communicative competence – developed in opposition to Chomskyan linguistic competence – from sociolinguistics to language learning. Hymes (1974) had introduced communicative competence with the mnemonic SPEAKING: Setting and Scene (time and place), Participants (speaker and audience), Ends (purpose and outcome), Act Sequence (progression of speech acts), Key (tone, manner), Instrumentalities (language modalities), Norms (social rules), and Genre (kind of speech act or text) (pp. 53–62). These different facets and components of communication go well beyond the idea of a generative grammar (Chomsky, 1957) and that of linguistic competence. In ICALL, as in NLP generally, however, generative grammar and other formal grammars – sets of rules that rewrite strings using mathematical operations – are the backbone of a system. The partial disconnect between formal grammars, such as Head-driven Phrase Structure Grammar and Categorial Grammar, and communicative competence, with its focus on meaning, situation, and context, as clearly illustrated by Hymes’ mnemonic, meant that ICALL systems hardly played a role in communicative language teaching.
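
The kind of formal grammar meant here can be illustrated with a toy rewriting system in Python; the rules and the miniature lexicon below are my own illustrative assumptions, not a fragment of any grammar used in an actual ICALL system.

import random

# A toy context-free grammar: each rule rewrites a symbol into a sequence of symbols.
RULES = {
    "S":   [["NP", "VP"]],
    "NP":  [["Det", "N"]],
    "VP":  [["V", "NP"]],
    "Det": [["the"], ["a"]],
    "N":   [["student"], ["sentence"]],
    "V":   [["writes"], ["reads"]],
}

def generate(symbol: str) -> list[str]:
    """Rewrite a symbol until only words (symbols without rules) remain."""
    if symbol not in RULES:
        return [symbol]
    words = []
    for part in random.choice(RULES[symbol]):
        words.extend(generate(part))
    return words

print(" ".join(generate("S")))   # e.g. "the student writes a sentence"

Such rules say nothing about setting, participants, purpose, or any of the other components in Hymes’ SPEAKING mnemonic, which is precisely the disconnect described in the paragraph above.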

This is the third part of a short series. All parts are based on a manuscript that I wrote recently. Part 0 gives a historical introduction. Lesson 1 focuses on the necessary exposure to authentic language and whether this can be done with GenAI. And I mean exposure and not so-called comprehensible input.

GenAI chatbots, however, have been said to be suitable conversation partners (Baidoo-anu & Owusu Ansah, 2023) and learning buddies (https://www.khanmigo.ai/). The GenAI output in a number of languages is certainly well-formed and plausible; GenAI’s natural language understanding is fast and precise. But is a conversation with a chatbot the same as a human conversation? Is it a negotiation of meaning as understood in communicative language teaching? The NLP researcher Emily Bender and her colleagues compared GenAI chatbots to stochastic parrots, which aim skillfully but proceed by guesswork (see Merriam-Webster, n.d.), and argued that “coherence is in fact in the eye of the beholder. Our human understanding of coherence derives from our ability to recognize interlocutors’ beliefs … and intentions … within context … That is, human language use takes place between individuals who share common ground and are mutually aware of that sharing (and its extent), who have communicative intents which they use language to convey, and who model each others’ mental states as they communicate” (Bender et al., 2021, p. 616). In other words, the chatbot spits out forms that are plausible but do not mean anything; the (student or teacher) reader imbues these hollow forms with meaning and thus anthropomorphizes the GenAI tool, reasoning about its ‘intention’ and basing their response on the result of that reasoning, as humans do in conversation. Computers, however, do not have or formulate intentions. Something was clicked, data was input, and a condition was met. This triggered a digital operation, and forms that are numbers to the computer and look like words to the human user of the device became visible or audible. We add meaning after the form of the text has been generated.


References

Baidoo-anu, D., & Owusu Ansah, L. (2023). Education in the Era of Generative Artificial Intelligence (AI): Understanding the Potential Benefits of ChatGPT in Promoting Teaching and Learning. Journal of AI, 7(1), 52-62.

Bender, E. M., Gebru, T., McMillan-Major, A., & Shmitchell, S. (2021). On the Dangers of Stochastic Parrots. In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency (pp. 610-623). https://doi.org/10.1145/3442188.3445922

Canale, M., & Swain, M. (1980). Theoretical bases of communicative approaches to second language teaching and testing. Applied Linguistics, 1(1), 1-47.

Chomsky, N. (1957). Syntactic Structures. Mouton.

Hymes, D. H. (1974). Foundations in Sociolinguistics: An Ethnographic Approach. University of Pennsylvania Press.

Merriam-Webster. (n.d.). Stochastic. In Merriam-Webster’s unabridged dictionary. Retrieved December 30 from https://unabridged.merriam-webster.com/unabridged/stochastic

Oxford, R. L. (1993). Intelligent computers for learning languages: The view for Language Acquisition and Instructional Methodology. Computer Assisted Language Learning, 6(2), 173-188.


… to be continued …