Language Learning and AI: 7 lessons from 70 years (#4)

4. Appropriate error correction and contingent feedback

Rather than focusing on engaging the learner in communicative interaction, learning with ICALL systems was often based on the assumption that corrective feedback on learner language is of great importance. ICALL research, particularly in the 1990s and early 2000s, focused on corrective feedback and relied on the three steps of traditional error analysis: recognition, description, and explanation (Heift & Schulze, 2007, chapter 3: Error analysis and description). Error analysis (Corder, 1974) was the main approach in second-language acquisition research in the 1960s and 70s. Its contributions to language education, applied linguistics, and ICALL are manifold and have shaped language teaching to this day. Although error correction is still part of common teaching practices today, applied linguistics research has shifted the focus away from deficits in the learner’s language toward operationalizing and encouraging their abilities (compare the National Council of State Supervisors for Languages (NCSSFL) and the American Council on the Teaching of Foreign Languages (ACTFL) Can-Do Statements, which were introduced in 2013). This has changed the perspective on corrective feedback in language education. More nuance was introduced, and ICALL also began to look at providing help and guidance to learners through text augmentation, for example by enriching a reading text with linked online glossaries and information on morphological paradigms (e.g., Amaral & Meurers, 2011; Wood, 2011). Text augmentation appears to remain underexplored in the GenAI and language education research published since late 2022.

My inspiration for this title came from the book  
Snyder, T. (2017). On tyranny: Twenty lessons from the twentieth century. Tim Duggan Books.

I am sharing these early drafts of a book chapter I published in
Yijen Wang, Antonie Alm, & Gilbert Dizon (Eds.) (2025), 
Insights into AI and language teaching and learning.
Castledown Publishers.

https://doi.org/10.29140/9781763711600-02.
We are on to lesson 4. Part 0 gave a historical introduction. Lesson 1 focused on the necessary exposure to authentic language and whether this can be provided with GenAI. Lesson 2 looked at communication in context, which is central to language learning. Lesson 3 turned to the role of interaction in language learning with GenAI.

How does ICALL, with its symbolic NLP, compare to GenAI, with its LLMs and ANNs, when it comes to language feedback and guidance? The texts GenAI produces are mostly well-formed, especially if the text’s language is English or one of the other languages in which many texts on the internet are written (Schulze, 2025). So, how suitable would a GenAI be for appropriate error correction and contingent feedback? In an ICALL system, a fragment of the grammar of the learnt language would be described with rules and items, using a formal grammar, in the expert model and parser. This computational grammar could ‘understand’ the linguistically well-formed words, phrases, and sentences that were covered by the rules of the expert model. To be able to parse student errors, the expert model needed to be adapted. Errors were captured in an error grammar – buggy rules that ran parallel to the rules covering error-free linguistic units – or in relaxed constraints (Dini & Malnati, 1993). An example of a buggy rule and its error-free counterpart for German subject-verb agreement is (in pseudo-code for legibility):

default rule(subject-verb agreement) := 
        if subject(NUMBER) = verb(NUMBER)
        and
        subject(PERSON) = verb(PERSON)
        then 
        parse successfully and move on
        else 
        buggy rule(subject-verb agreement)

buggy rule(subject-verb agreement) :=
       if subject(NUMBER_S) <> verb(NUMBER_V)
       then
       give feedback("The subject is in ", [subject(NUMBER_S)],
           ". You need to choose a verb ending that indicates ",
           [subject(NUMBER_S)], ", too. The verb in your
           sentence is in ", [verb(NUMBER_V)])
       else next
       then
       if subject(PERSON_S) <> verb(PERSON_V)
       then
       give feedback("The subject is in ", [subject(PERSON_S)],
           ". You need to choose a verb ending that indicates ",
           [subject(PERSON_S)], ", too. The verb in your
           sentence is in ", [verb(PERSON_V)])
       else next
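Assuming a toy representation in which subject and verb are plain dictionaries of morphosyntactic features (all names here, such as `default_rule` and `buggy_rule`, are my own illustrative choices, not taken from any actual ICALL system), the buggy-rule strategy can be sketched as runnable Python:

```python
# A sketch of the buggy-rule strategy under an assumed toy dict-based
# feature representation; not code from an actual ICALL system.

def buggy_rule(subject, verb):
    """Anticipated agreement errors trigger metalinguistic feedback."""
    messages = []
    if subject["number"] != verb["number"]:
        messages.append(
            f"The subject is in the {subject['number']}. You need to choose "
            f"a verb ending that indicates the {subject['number']}, too. "
            f"The verb in your sentence is in the {verb['number']}."
        )
    if subject["person"] != verb["person"]:
        messages.append(
            f"The subject is in the {subject['person']} person. You need to "
            f"choose a verb ending that indicates the {subject['person']} "
            f"person, too. The verb in your sentence is in the "
            f"{verb['person']} person."
        )
    return messages

def default_rule(subject, verb):
    """Parse succeeds only if subject and verb agree; otherwise fall
    through to the anticipated buggy rule."""
    if (subject["number"] == verb["number"]
            and subject["person"] == verb["person"]):
        return "parsed"
    return buggy_rule(subject, verb)

# *Der Student schreiben: singular subject with a plural verb form.
feedback = default_rule(
    {"number": "singular", "person": "third"},
    {"number": "plural", "person": "third"},
)
```

The key limitation shows immediately: any error type that has no dedicated buggy rule simply fails to parse and yields no feedback at all.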

Buggy rules required a high level of error anticipation, because to cover an error, a particular buggy rule needed to be written. Since buggy rules are deterministic, the feedback they provided was robust whenever they sufficed to parse the student input. Relaxed constraints reached a slightly wider coverage and required less error anticipation, because the constraint that, for example, the subject and finite verb of a German sentence need to agree in number and person is relaxed. This means that, whether or not subject and verb agree, the sentence is parsed successfully, with one constraint rule fewer:

relaxed rule(subject-verb agreement) :=
         subject(NUMBER_S) and verb(NUMBER_V)
         and subject(PERSON_S) and verb(PERSON_V)
         if NUMBER_S = NUMBER_V
         then next
         else
         give feedback("The subject is in ",
             [subject(NUMBER_S)], ". You need to choose
             a verb ending that indicates ",
             [subject(NUMBER_S)], ", too. The verb in your
             sentence is in ", [verb(NUMBER_V)])
         then
         if PERSON_S = PERSON_V
         then next
         else
         give feedback("The subject is in ",
             [subject(PERSON_S)], ".
             You need to choose a verb ending that indicates ",
             [subject(PERSON_S)], ", too. The verb in
             your sentence is in ", [verb(PERSON_V)])
         then parse successfully and move on
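The relaxed-constraint strategy can be sketched in the same toy dict-based representation (again, all names are illustrative assumptions, not from any actual system): the parse always succeeds, and each relaxed constraint that turns out to be violated contributes one feedback message:

```python
# A sketch of the relaxed-constraint strategy: agreement is not enforced
# during parsing; violated constraints are collected as feedback instead.
# The dict-based feature representation is an illustrative assumption.

RELAXED_CONSTRAINTS = ("number", "person")

def parse_with_relaxed_constraints(subject, verb):
    """Always parse successfully; report each violated agreement feature."""
    feedback = []
    for feature in RELAXED_CONSTRAINTS:
        if subject[feature] != verb[feature]:
            feedback.append(
                f"The subject is in the {subject[feature]} ({feature}). "
                f"You need to choose a verb ending that indicates the "
                f"{subject[feature]}, too. The verb in your sentence is "
                f"in the {verb[feature]}."
            )
    return "parsed", feedback

# *Du schreibt: second-person subject with a third-person verb ending.
status, messages = parse_with_relaxed_constraints(
    {"number": "singular", "person": "second"},
    {"number": "singular", "person": "third"},
)
```

Because no error-specific rule has to be written, coverage is wider than with buggy rules, but the feedback is correspondingly more generic.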

This pseudo-code illustration shows how labor-intensive the coding of symbolic NLP for ICALL, with its focus on error correction and feedback, was. The limited coverage of the computational lexica and grammars, together with the additional parsing challenges introduced by extending the parser to cover the errors learners make, meant that even the few ICALL systems that were used by students had limited coverage (e.g., Heift, 2010; Nagata, 2002).

Coverage is not a problem for GenAI, as we saw above. However, LLMs and multidimensional ANNs were not intended to provide corrective feedback to language learners. The error correction of GenAI can be illustrated best with the automatic correction of spelling errors. The prompt “Tell me please what the capitel of germany is.” with its two spelling errors yields the following result: “The capital of Germany is **Berlin** …” (Microsoft Copilot, 2025, January 17, my emphasis). For languages well represented in LLMs, the automatic error correction in the natural language understanding is accurate and comprehensive, as can be seen from the answer in the example. However, the feedback on such errors, only given when specifically requested, is all too often flawed in parts or incomplete. In brief, GenAIs are good at error correction but limited in providing appropriate corrective feedback. Many teachers and language learners have at least anecdotal evidence that metalinguistic explanations are not suitable for language learners and that errors are often underreported or over-flagged. This is understandable if one considers that GenAIs work with probabilistic patterns in the LLM for their error correction, diagnosis, and (metalinguistic) feedback. This often works for the correction, but is shaky at best for diagnosis and feedback. The computational linguist and ICALL researcher Detmar Meurers (2024) argued in this context that assuming a GenAI is a suitable language teacher is worse than asking a speaker of that language to start teaching systematic language classes. His argument was also based on the fact that a GenAI has no ‘knowledge’ of the prior learning history, the language abilities and beliefs, and the general profile of the learner.
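The gap between correcting and diagnosing can be made concrete with a small sketch using Python’s standard difflib module (the corrected sentence is supplied by hand here, standing in for a GenAI’s silent correction): a surface diff locates what was changed, but says nothing about why, which is the diagnostic step that probabilistic correction leaves out.

```python
# Surface comparison of learner input and corrected output: the diff
# locates the corrections, but provides no diagnosis or explanation.
from difflib import SequenceMatcher

def list_corrections(learner_text, corrected_text):
    """Return (wrong, corrected) word pairs found by a surface diff."""
    learner_words = learner_text.split()
    corrected_words = corrected_text.split()
    matcher = SequenceMatcher(None, learner_words, corrected_words)
    corrections = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "replace":
            corrections.append(
                (" ".join(learner_words[i1:i2]),
                 " ".join(corrected_words[j1:j2]))
            )
    return corrections

pairs = list_corrections(
    "Tell me please what the capitel of germany is.",
    "Tell me please what the capital of Germany is.",
)
# pairs now holds the two word-level corrections, without any
# explanation of the underlying spelling or capitalization rules.
```

A symbolic ICALL parser would attach a diagnosis (e.g., a capitalization rule) to each correction; turning a mere correction into appropriate feedback is the part that remains hard.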

To be continued …

References

Amaral, L., & Meurers, W. D. (2011). On using Intelligent Computer-Assisted Language Learning in real-life foreign language teaching and learning. ReCALL, 23(1), 4-24.

Corder, P. (1974). Error Analysis. In J. P. B. Allen & P. Corder (Eds.), The Edinburgh Course in Applied Linguistics. Volume 3 – Techniques in Applied Linguistics (pp. 122-131). Oxford University Press.

Dini, L., & Malnati, G. (1993). Weak Constraints and Preference Rules. In P. Bennett & P. Paggio (Eds.), Preference in Eurotra (pp. 75-90). Commission of the European Communities.

Heift, T. (2010). Developing an Intelligent Tutor. CALICO Journal, 27(3), 443-459.

Heift, T., & Schulze, M. (2007). Errors and Intelligence in CALL. Parsers and Pedagogues. Routledge.

Meurers, D. (2024). #3×07 – Intelligente Tutorielle Systeme (mit Prof. Dr. Detmar Meurers) In Auftrag:Aufbruch. Der Podcast des Forum Bildung Digitalisierung. https://auftrag-aufbruch.podigee.io/30-intelligente-tutorielle-systeme-mit-detmar-meurers

Microsoft Copilot. (2025, January 17). Tell me please what the capitel of germany is. Microsoft Copilot.

Nagata, N. (2002). BANZAI: An Application of Natural Language Processing to Web-Based Language Learning. CALICO Journal, 19(3), 583-599.

Schulze, M. (2025). The impact of artificial intelligence (AI) on CALL pedagogies. In Lee McCallum & Dara Tafazoli (eds) The Palgrave Encyclopedia of Computer-Assisted Language Learning. Palgrave Macmillan, Cham. https://doi.org/10.1007/978-3-031-51447-0_7-1.

Wood, P. (2011). Computer assisted reading in German as a foreign language. Developing and testing an NLP-based application. CALICO Journal, 28(3), 662-676.

Language Learning and AI: 7 lessons from 70 years (#0)

What do we know about artificial intelligence (AI) in language teaching and learning already? What can we see if we look back further than the last two or so years? In the last two years, discourses on generative AI (GenAI) in the academic literature on (language) education, writing, publishing, (machine) translation, computer science, and many other areas, as well as in mainstream and specialized media, have resulted in a multitude of articles, books, chapters, columns, essays, guidelines, opinion pieces, and tip sheets. Here, the time window will be much wider, to provide a more balanced, quasi-historical lens on the rapidly evolving AI approaches and tools in the context of language education (see also Stockwell (2024) for another brief retrospective).


Three very early milestone years are important in this context: 1948, 1950, and 1955. In 1948, the first publication that connects AI and language learning came out. In it, Alan Turing, often called the father of AI, mentions a number of different ways in which computers would be able to demonstrate their intelligence in the future: “(i) Various games, for example, chess, noughts and crosses, bridge, poker; (ii) The learning of languages; (iii) Translation of languages; (iv) Cryptography; (v) Mathematics” (Turing (1948) quoted in Hutchins, 1986, pp. 26-27, my emphasis). Also in 1948, the then brand-new field of Applied Linguistics reached a noticeable breakthrough with the publication of the first issue of Language Learning. A Quarterly Journal of Applied Linguistics (Reed, 1948). In 1950, what we call today the Turing Test was published as the “Imitation Game” (Turing, 1950). Seventy-four years passed before newspapers and magazines announced that ChatGPT-4 had passed the Turing Test. Researchers at UC San Diego had published a preprint (under review) about their replication of the Turing test (Jones & Bergen, 2024). “Human participants had a 5 minute conversation with either a human or an AI, and judged whether or not they thought their interlocutor was human. GPT-4 was judged to be a human 54% of the time” (p. 1). It was five years after the proposal of the Turing Test, which is meant to test the intelligence of a machine, that research and development in the field of Artificial Intelligence started. McCarthy et al. (1955) proposed “that a 2 month, 10 man study of artificial intelligence be carried out during the summer of 1956 at Dartmouth College in Hanover, New Hampshire,” coining the name of the field — Artificial Intelligence.

The intersection of artificial intelligence and (computer-assisted) language learning has thus had a trajectory of about 70 years and had been termed Intelligent CALL (ICALL) up until the advent of GenAI, when AI became the buzzword and label. The documented development of ICALL software and systems occurred later; Bowerman notes that “Weischedel et al. (1978) produced the first ICALL system …” (1993, p. 31). The system was a prototype German tutor implemented as an Augmented Transition Network with a semantic and a syntactic component. Weischedel et al. (1978) reference earlier work in CALL, for example an article by Nelson et al. (1976). However, this and other early publications seem to rely on string comparison, often character-by-character replacement, or regular expressions rather than natural language processing. Only the latter is part of AI research and thus of ICALL. ICALL played a significant role in tutorial CALL (Heift & Schulze, 2015; Hubbard & Bradin-Siskin, 2004; Schulze, 2024) over many years, but it never became mainstream in CALL in terms of research and development. The label tutorial CALL captures the learning interaction of the student with the computer, rather than the interaction of the learner with other persons via the computer, as in computer-mediated communication. It is not only the utilization of AI that GenAI and ICALL have in common; GenAI has also brought a revival of the learner interacting with the machine and can thus be described as a form of tutorial CALL.

Heift and Schulze (2007) identified and discussed 119 ICALL projects over about thirty years, but with very rare exceptions these were research prototypes. Only a few ICALL projects had limited use in language classrooms (e.g., Heift, 2010; Nagata, 2002). In a review article, Schulze (2008) used a list of nine key desiderata for ICALL by the applied linguist Rebecca Oxford (1993) to discuss developmental trajectories in ICALL:

  1. Communicative competence must be the cornerstone of ICALL.
  2. ICALL must provide appropriate language assistance tailored to meet student needs.
  3. ICALL must offer rich, authentic language input.
  4. The ICALL student model must be based in part on a variety of learning styles.
  5. ICALL material is most easily learned through associations, which are facilitated by interesting and relevant themes and meaningful language tasks.
  6. ICALL tasks must involve interactions of many kinds and these interactions need not be just student-tutor interactions.
  7. ICALL must provide useful, appropriate error correction suited to the student’s changing needs.
  8. ICALL must involve all relevant language skills and must use each skill to support all other skills.
  9. ICALL must teach students to become increasingly self-directed and self-confident language learners through explicit training in the use of learning strategies. (p. 174)

Here, these desiderata will be adapted and used as a tertium comparationis when drawing lessons from the ‘history’ of ICALL for the emerging use of GenAI in language education, using them also as a structuring criterion for these blog posts as follows:

  1. Exposure to rich, authentic language
  2. Communication in context
  3. Varied interaction in language learning tasks
  4. Appropriate error correction and contingent feedback
  5. Recording learner behavior and student modeling
  6. Dynamic individualization  
  7. Gradual release of responsibility

A discussion of the work in ICALL as such over the decades is beyond the scope of this chapter. In addition to the review article mentioned above (Schulze, 2008), overviews of ICALL research can be found in the monograph by Heift and Schulze (2007), which also provides an introduction to the main concepts and research questions in the field as of about 20 years ago, in a chapter (Nerbonne, 2003) in The Oxford Handbook of Computational Linguistics, and in articles (Gamper & Knapp, 2002; Matthews, 1993) in CALL journals. Many publications on ICALL appeared in edited volumes and in refereed conference proceedings and journals on computational linguistics, broadly conceived, and thus outside of the literature on CALL. This might be one of the reasons why GenAI was such a surprising novelty in language education in general and in CALL in particular, and why a focused retrospective can further our understanding of the role and developmental trajectory of GenAI in language education today. We start with an excursion into a branch of AI that is relevant here – natural language processing (NLP).

… to be continued …

References

Bowerman, C. (1993). Intelligent Computer-Aided Language Learning. LICE: A System to Support Undergraduates Writing in German [PhD Thesis, UMIST]. Manchester.

Gamper, J., & Knapp, J. (2002). A Review of Intelligent CALL Systems. Computer Assisted Language Learning, 15(4), 329-342.

Heift, T. (2010). Developing an Intelligent Tutor. CALICO Journal, 27(3), 443-459.

Heift, T., & Schulze, M. (2007). Errors and Intelligence in CALL. Parsers and Pedagogues. Routledge.

Heift, T., & Schulze, M. (2015). Tutorial CALL. Language Teaching, 48(4), 471–490.

Hubbard, P., & Bradin-Siskin, C. (2004). Another Look at Tutorial CALL. ReCALL, 16(2), 448–461.

Hutchins, J. (1986). Machine Translation – Past, Present and Future. Ellis Horwood.

Jones, C. R., & Bergen, B. K. (2024). People cannot distinguish GPT-4 from a human in a Turing test. http://dx.doi.org/10.48550/arXiv.2310.20216

Matthews, C. (1993). Grammar Frameworks in Intelligent CALL. CALICO Journal, 11(1), 5-27.

McCarthy, J., Minsky, M. L., Rochester, N., & Shannon, C. E. (1955). A Proposal for the Dartmouth Summer Research Project on Artificial Intelligence. Retrieved Sep 30 from http://www-formal.stanford.edu/jmc/history/dartmouth/dartmouth.html

Nagata, N. (2002). BANZAI: An Application of Natural Language Processing to Web-Based Language Learning. CALICO Journal, 19(3), 583-599.

Nelson, G. E., Ward, J. R., Desch, S. H., & Kaplow, R. (1976). Two New Strategies for Computer-Assisted Language Instruction (CALI). Foreign Language Annals, 9(1), 28-37.

Nerbonne, J. A. (2003). Computer-Assisted Language Learning and Natural Language Processing. In R. Mitkov (Ed.), The Oxford Handbook of Computational Linguistics (pp. 670-698). Oxford University Press.

Oxford, R. L. (1993). Intelligent computers for learning languages: The view for Language Acquisition and Instructional Methodology. Computer Assisted Language Learning, 6(2), 173-188.

Reed, D. W. (1948). Editorial. Language Learning, 1(1), 1–2.

Schulze, M. (2008). AI in CALL – Artificially Inflated or Almost Imminent? CALICO Journal, 25(3), 510-527.

Schulze, M. (2024). Tutorial CALL — Language practice with the computer. In R. Hampel & U. Stickler (Eds.), Bloomsbury Handbook of Language Learning and Technologies (pp. 35–47). Bloomsbury Publishing.

Snyder, T. (2017). On tyranny: Twenty lessons from the twentieth century. Tim Duggan Books.

Stockwell, G. (2024). ChatGPT in language teaching and learning: Exploring the road we’re travelling. Technology in Language Teaching & Learning, 6(1), 1–9.

Turing, A. (1950). Computing machinery and intelligence. Mind, LIX(236), 433-460.

Weischedel, R. M., Voge, W. M., & James, M. (1978). An Artificial Intelligence Approach to Language Instruction. Artificial Intelligence, 10, 225-240.

Rupture


Rupture!

Writing gives me a chance to think. It does not happen very often. I need to make it happen. Often. As often as I can. The thinking. The writing helps. Helps me remember. Helps me to slow down. Slow down my thinking. When it’s slow, it gets deeper. Alright. That’s a cliché. No, it’s not. Who said that. I got interrupted. Now my stream of thoughts has been disrupted. How did that happen? 

Let me go back to the chance to think. About disruption. Disruptive. Is this good or bad? The question is too simple, too linear. Just one alternative. And there are so many. Alternatives. Alternatives after a disruption. It’s complex. Linear is just one of a zillion alternatives. Is zillion a number? Apparently not. I learnt that at trivia night three weeks ago. Interrupted again. I was thinking about disruption. In recent posts, I was talking about AI. Generative AI. Technology. And now disruption. Disruptive technologies.

No, I am not getting all businessy in this blog. Business folk like to talk about disruptive technologies. And so do I. It happens all the time. The disruption. Film disrupted theater. TV disrupted movie theaters. Video cassettes disrupted movie theaters too. VHS. VHS disrupted Betamax. Betamax was of better quality. It was discontinued. VHS prevailed. Until … Until the DVD disruption came. Netflix used to send out DVDs in the mail. Streaming services disrupted the DVD. During COVID lockdowns new films were streamed. The theaters were closed.

Is this all good or bad? You decide. All of you. Each time you decide. Again. And again. Each of you. Separately. And together. It’s complex. It can’t be linear. There are a zillion alternatives. And sometimes only one seems to prevail. For a short time. A disruption. And another one. In different areas. Not just film and theater and video. A disruption. Disruptive technologies. And we are taken by surprise. At times.

Disruptive technologies. And since 2022 we have been talking about AI. A disruption? AI disruption. Sure. What will it bring? What will we gain? What will we lose? In learning and for teachers, we read about new tools. The lesson plan that writes itself? The text the kids will read that was generated on the teacher’s computer. The feedback the machine gave, the errors corrected. With new errors?

Rupture.