Language Learning and AI: 7 lessons from 70 years (conclusion)

Seven Lessons

AI, language, and learning have interacted in some form for the last 70 years. In computer-assisted language learning (CALL), people have worked on applying AI – under the label ICALL – for almost 50 years. What can we learn for GenAI from these long-standing efforts with good old-fashioned AI?

My inspiration for this title came from the book
Snyder, T. (2017). On tyranny: Twenty lessons from the twentieth century. Tim Duggan Books.

I am sharing these early drafts of a book chapter I published in
Yijen Wang, Antonie Alm, & Gilbert Dizon (Eds.) (2025),
Insights into AI and language teaching and learning. Castledown Publishers.

https://doi.org/10.29140/9781763711600-02.

In conclusion, we will recapitulate and condense the seven lessons that we can learn from ‘good old-fashioned AI’ and ICALL with its declarative knowledge, engineered algorithms, and symbolic NLP and see how they can be applied to GenAI with its machine-learnt complex artificial neural networks.

Exposure to rich, authentic language
GenAI is capable of providing ample exposure to rich language just in time, on the right topic, and at the right level. Generated texts consist of mostly accurate language forms and are plausible, so that they lend themselves to an interpretation in context by the students. This gives such a text an authentic feel. Here GenAI compares very well to the limited linguistic scope of ICALL systems.
Communication in context
GenAI, not least because of the comprehensive coverage of the LLMs, can sustain conversations with learners on different topics. Its natural language understanding is such that it can take prior textual context into consideration, making any conversation more natural. This was impossible with ICALL systems and chatbots of the past. However, teachers and students need to be aware that they are communicating with a machine, a stochastic parrot (Bender et al., 2021). This requires informed reflection on a new form of communication and learning, to avoid the anthropomorphizing of the machine and its output.
Appropriate error correction and contingent feedback
This is the area where we can learn most from ICALL and tutorial CALL. Especially with giving metalinguistic feedback, GenAI has too many shortcomings. Researchers need to explore how the automatic error correction, which happens frequently, impacts aspects of language learning such as noticing.
Varied interaction in language learning tasks
This is the area where we have many new opportunities to explore, although we can take inspiration particularly from projects in ICALL and game-based language learning. GenAI is most suitable as a partner in conversation and learning.
Recording learner behavior and student modeling
Student modeling has a long tradition – not just in ICALL – in AI and education. GenAI tools by themselves are just that – tools and not tutors. They can be embedded in other learning systems, but they cannot be used as virtual tutors, because their information about learners and the learning context is serendipitous at best.
Dynamic individualization
GenAI provides teachers and students with an individual experience with generated texts of high quality. The adaptive instruction (Schulze et al., 2025 in press), however, which has been an ambition of ICALL research, has not yet been achieved. Broader research and development in AI, beyond GenAI, is still necessary to achieve dynamic individualization in what can truly be termed ICALL.
Gradual release of responsibility
Since instructional sequences, pedagogical approaches, and teaching methods are not present in GenAI, teachers need to carefully design the use of GenAI as one of the tools in the learning process. Teachers must not surrender control of curricular and pedagogical decisions about activity design, learning goals, lesson contents, and learning materials to the machine.

GenAI, due to its powerful LLMs, has lifted AI in language education to a new level of quality. Such a disruptive technology shows great promise, provides many additional opportunities, and poses some challenges for teachers, students, and researchers alike.

Language Learning and AI: 7 lessons from 70 years (#1)

1. Exposure to rich, authentic language

The texts – or the language – that a computer can understand or generate depend on its capacity for NLP. Computer scientists added the adjective ‘natural’ because the parsing of programming language(s) was possible, and necessary, before they turned to parsing texts produced by humans. In early NLP, computational linguists wrote grammatical rules and compatible dictionaries in programming languages such as Prolog and LISP. Rules and items were written by hand, relying on different (mathematical) grammar formalisms. This made the development process slow, error-prone, computationally expensive, and labor-intensive. This might be the main reason why the coverage and robustness of ICALL systems and applications remained limited over the years. Parsing a single sentence – the analysis of the grammatical constructions and the production of an equivalent information structure, something similar to a syntactic tree, which the computer could “understand” – took from a couple of seconds to a few minutes, depending on the computer hardware and the efficiency of the parsing algorithm. This approach to NLP is called symbolic, because it uses and processes symbols for syntactic phrases, such as NP for a noun phrase and VP for a verb phrase, and for lexical items, such as N for a noun and V for a verb, and their grammatical feature structures. Symbolic NLP in ICALL resulted in sentence-based language learning activities in a tutorial system. Since the dictionary was also hand-written and hence usually small, the language to which students using the ICALL system were exposed was limited to the vocabulary of a textbook at best.
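To make the symbolic approach more tangible, here is a minimal sketch of a hand-written grammar and dictionary with a simple top-down parser. The rules, the lexicon, and the function names are assumptions made up for this illustration; they are not taken from any ICALL system, and real systems used far richer grammar formalisms and feature structures.

```python
# Toy symbolic grammar: hand-written phrase-structure rules and a tiny lexicon.
# Every rule and word here is an illustrative assumption.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "N"]],
    "VP": [["V", "NP"], ["V"]],
}
LEXICON = {
    "the": "Det", "a": "Det",
    "student": "N", "question": "N",
    "asks": "V", "sleeps": "V",
}


def parse(symbol, words, pos):
    """Try to parse `symbol` starting at words[pos].

    Returns (tree, next_pos) on success and None if no rule applies.
    """
    # Lexical category (Det, N, V): look the word up in the dictionary.
    if symbol not in GRAMMAR:
        if pos < len(words) and LEXICON.get(words[pos]) == symbol:
            return (symbol, words[pos]), pos + 1
        return None
    # Phrasal category (S, NP, VP): try each hand-written rule in order.
    for rule in GRAMMAR[symbol]:
        children, cur = [], pos
        for child in rule:
            result = parse(child, words, cur)
            if result is None:
                break  # this rule failed, try the next one
            subtree, cur = result
            children.append(subtree)
        else:
            return (symbol, children), cur
    return None


def parse_sentence(sentence):
    """Return a syntactic tree for the sentence, or None if it cannot be parsed."""
    words = sentence.lower().split()
    result = parse("S", words, 0)
    if result is None or result[1] != len(words):
        return None
    return result[0]


tree = parse_sentence("The student asks a question")
unknown = parse_sentence("The teacher sleeps")  # 'teacher' is not in the lexicon
```

The second call returns None because the word 'teacher' is missing from the small dictionary. This is the coverage problem described above in miniature: every word and every construction has to be written by hand before the system can handle it.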

Part #0 gave a general introduction. This is the first section on what language learners should expect of generative-AI tools. Other parts will follow.

In the 1990s, more electronic corpora (large, principled collections of texts) also in languages other than English became available. The approach to NLP that relies on the mathematical analysis of large corpora has been called statistical NLP. In this approach, language patterns are detected in corpus analyses. For these patterns or contiguous sequences of words – called n-grams with n being the number of words in each and every sequence – the probability of one word following the other(s) is calculated. In their simplest form, the probabilistic connection of linear word sequences is calculated. This results in a wider coverage of language, because of the underlying use of large corpora. However, the limitation was that, for example, long-distance dependencies as in the following sentence still posed a problem as they had in symbolic NLP.

The student who had finally given the right answer proceeded to ask the next question.

For any human reader, it is immediately clear that it is ‘the student’ who ‘proceeded’ and not ‘the right answer’. For the computer, this connection between grammatical subject and finite verb poses a challenge because the words are not in the same n-gram(s) and the pattern cannot be detected easily.
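The n-gram idea and its limitation can be sketched in a few lines. The three-sentence corpus and the function name below are invented for illustration only; statistical NLP worked with corpora of millions of words and with smoothing techniques that are left out here.

```python
from collections import Counter

# A made-up miniature corpus; an assumption for illustration only.
corpus = [
    "the student asked the next question",
    "the student gave the right answer",
    "the right answer pleased the teacher",
]

unigrams = Counter()
bigrams = Counter()
for line in corpus:
    tokens = line.split()
    unigrams.update(tokens[:-1])             # words that have a successor
    bigrams.update(zip(tokens, tokens[1:]))  # pairs of adjacent words (n = 2)


def bigram_probability(previous, word):
    """Estimate P(word | previous) as a relative frequency in the corpus."""
    if unigrams[previous] == 0:
        return 0.0
    return bigrams[(previous, word)] / unigrams[previous]


# The example sentence from the text, split into its bigrams.
sentence = ("the student who had finally given the right answer "
            "proceeded to ask the next question")
words = sentence.split()
sentence_bigrams = list(zip(words, words[1:]))

adjacent_pair_seen = ("answer", "proceeded") in sentence_bigrams
subject_verb_seen = ("student", "proceeded") in sentence_bigrams
distance = words.index("proceeded") - words.index("student")
```

In this sketch, the bigram model only ever sees 'answer' directly before 'proceeded'. The subject 'student' is eight positions away from its verb, so only an n-gram of at least nine words would contain both.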

This and other challenges were overcome by relying on artificial neural networks (ANNs), models that are inspired by the neural networks of human brains (for a comprehensive overview of how GPTs (generative pre-trained transformers) work, see Wolfram, 2023, February 14). ANNs are multidimensional and do not only rely on linear sequences of words. Their individual nodes, the neurons, receive input and send output to other neurons. The processing of input to produce output is basically done through a mathematical equation. Thus, this output depends on the (probabilistically) weighted input. The network learns by adjusting the weights, which multiply the different input values, and the biases, which are added independently of the input, to improve the accuracy of the result. If this machine learning relies on neurons organized in multiple layers – the input layer, the output layer, and two or more hidden layers in between – then we talk about a deep network and deep learning (LeCun et al., 2015). “GPT-3 has 96 layers. GPT-4’s exact number of layers hasn’t been publicly disclosed, but it is expected to be significantly larger than GPT-3” (Microsoft Copilot, 2024, December 26). Deep learning and ANNs are the underpinnings of the large language models (LLMs) (for an accessible overview of ANNs and LLMs, see Naveed et al., 2024), which in turn are the backbone of GenAI chatbots such as ChatGPT (OpenAI), Claude (Anthropic), Copilot (Microsoft), and Gemini (Google). Thus, LLMs essentially rely on enormous corpora of texts scraped from the internet and on machine-learned neural networks. In these ANNs, individual tokens – which can be individual letters, words, or parts of a word – are represented by long lists of numbers, which are called word vectors. The parameters in the network, that is, its weights and biases, help to determine which token follows the previous ones. “GPT-3 has 175 billion parameters, which include the weights and biases of the neurons. GPT-4 is speculated to have trillions of parameters, though the exact number hasn’t been confirmed” (Microsoft Copilot, 2024, December 26).
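The "mathematical equation" inside a neuron can be written down in a few lines. The sketch below is a deliberately tiny illustration, not how an LLM is actually implemented: the sigmoid activation, the gradient-descent update, and all numbers are assumptions chosen for the example, since the text above does not specify them.

```python
import math


def sigmoid(z):
    """Squash any number into the range between 0 and 1 (one common activation)."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron(inputs, weights, bias):
    """Multiply each input by its weight, add the bias, apply the activation."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(z)


def layer(inputs, weight_rows, biases):
    """A layer is a set of neurons that all read the same inputs."""
    return [neuron(inputs, w, b) for w, b in zip(weight_rows, biases)]


def train_step(inputs, weights, bias, target, rate=0.5):
    """One learning step for a single neuron (gradient descent, an assumption).

    The weights and the bias are nudged so that the output moves toward target.
    """
    y = neuron(inputs, weights, bias)
    # Derivative of the squared error 0.5 * (y - target) ** 2 for a sigmoid neuron.
    delta = (y - target) * y * (1.0 - y)
    new_weights = [w - rate * delta * x for w, x in zip(weights, inputs)]
    new_bias = bias - rate * delta
    return new_weights, new_bias


# Hypothetical values: two inputs, one hidden layer with two neurons, one output.
x = [1.0, 0.0]
hidden = layer(x, [[0.5, -0.5], [-1.0, 1.0]], [0.0, 0.5])
output = neuron(hidden, [1.0, -1.0], 0.0)

# Learning: after one step the output has moved closer to the target of 1.0.
before = neuron(x, [0.0, 0.0], 0.0)
w, b = train_step(x, [0.0, 0.0], 0.0, target=1.0)
after = neuron(x, w, b)
```

A deep network stacks many such layers, and the parameter counts quoted above refer to exactly these weights and biases.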

This new computational approach is far removed from the reliance on linguistic rules and items in early NLP and ICALL, because it is steeped in the complex calculations in the hidden layers of the LLM and arrays upon arrays of numbers. That’s why GenAI’s coverage, scope, and speed of NLP are vastly superior to those of previous systems in ICALL. Therefore, we can argue that students using GenAI are exposed to rich language at the paragraph and not only the sentence level. But is this generated language authentic? In an early paper on authenticity in the language classroom, Breen (1985) proposes “that authentic texts for language learning are any sources of data which will serve as a means to help the learner to develop an authentic interpretation” (p. 68). The question then becomes: can a learner develop an authentic interpretation of a turn or text generated by a GenAI chatbot or a translation rendered by a GenAI machine translation tool? Since the generated texts are certainly well-formed and plausible, they appear to provide a good basis for the learner’s interpretation and thus for language learning. Also, because they are based on actual language use as found in the texts on the internet, which were used to train the LLM, we have another indication that generated texts in a chat with a GenAI tool or a translation from a GenAI tool potentially qualify as authentic. However, the real key to authenticity of language is found in communication.

References

Breen, M. P. (1985). Authenticity in the Language Classroom. Applied Linguistics, 6(1), 60-70. https://doi.org/10.1093/applin/6.1.60

LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep Learning. Nature, 521, 436–444.

Microsoft Copilot. (2024, December 26). How many nodes and layers does the ANN of the GPT large language model have? Microsoft Copilot.

Naveed, H., Khan, A. U., Qiu, S., Saqib, M., Anwar, S., Usman, M., Akhtar, N., Barnes, N., & Mian, A. (2024). A Comprehensive Overview of Large Language Models. https://dx.doi.org/10.48550/arxiv.2307.06435

Wolfram, S. (2023, February 14). What is ChatGPT doing … and why does it work? Stephen Wolfram Writings. https://writings.stephenwolfram.com/2023/02/what-is-chatgpt-doing-and-why-does-it-work

… to be continued …

Language Learning and AI: 7 lessons from 70 years (#0)

What do we know about artificial intelligence (AI) in language teaching and learning already? What can we see if we look back more than two or so years? In the last two years, discourses on generative AI (GenAI) in the academic literature on (language) education, writing, publishing, (machine) translation, computer science, and many other areas as well as in mainstream and specialized media have resulted in a multitude of articles, books, chapters, columns, essays, guidelines, opinion pieces, and tip sheets. Here, the time window will be much wider, to provide a more leveled, quasi-historical lens on the rapidly evolving AI approaches and tools in the context of language education (see also Stockwell (2024) for another brief retrospective).

Three very early milestone years are important in this context: 1948, 1950, and 1955. In 1948, the first publication that connects AI and language learning came out. In it, Alan Turing, often called the father of AI, mentions a number of different ways in which computers would be able to demonstrate their intelligence in the future: “(i) Various games, for example, chess, noughts and crosses, bridge, poker; (ii) The learning of languages; (iii) Translation of languages; (iv) Cryptography; (v) Mathematics” (Turing (1948) quoted in Hutchins, 1986, pp. 26-27, my emphasis). Also in 1948, the then brand-new field of Applied Linguistics reached a noticeable breakthrough with the publication of the first issue of Language Learning. A Quarterly Journal of Applied Linguistics (Reed, 1948). In 1950, what we call today the Turing Test was published as the “Imitation Game” (Turing, 1950). Seventy-four years passed before newspapers and magazines announced that ChatGPT-4 had passed the Turing Test. Researchers at UC San Diego had published a preprint (under review) about their replication of the Turing test (Jones & Bergen, 2024). “Human participants had a 5 minute conversation with either a human or an AI, and judged whether or not they thought their interlocutor was human. GPT-4 was judged to be a human 54% of the time” (p. 1). It was five years after the proposal of the Turing Test, which is meant to test the intelligence of a machine, that research and development in the field of Artificial Intelligence started. McCarthy et al. (1955) proposed “that a 2 month, 10 man study of artificial intelligence be carried out during the summer of 1956 at Dartmouth College in Hanover, New Hampshire,” coining the name of the field — Artificial Intelligence.

The intersection of artificial intelligence and (computer-assisted) language learning has thus had a trajectory of about 70 years and was termed Intelligent CALL (ICALL) up until the advent of GenAI, when AI became the buzzword and label. The documented development of ICALL software and systems occurred later; Bowerman notes that “Weischedel et al. (1978) produced the first ICALL system …” (1993, p. 31). The system was a prototype German tutor implemented as an Augmented Transition Network with a semantic and syntactic component. Weischedel et al. (1978) reference earlier work in CALL, for example an article by Nelson et al. (1976). However, this and other earlier publications elsewhere seem to rely on string comparison, often character-by-character replacement, or regular expressions rather than natural language processing. Only the latter is part of AI research and thus of ICALL. ICALL played a significant role in Tutorial CALL (Heift & Schulze, 2015; Hubbard & Bradin-Siskin, 2004; Schulze, 2024) over many years, but it never became mainstream in CALL in terms of research and development. The label tutorial CALL captures the learning interaction of the student with the computer rather than the interaction of the learner with other persons via the computer, as in computer-mediated communication. GenAI and ICALL do not only have the utilization of AI in common; GenAI has also brought a revival of the learner interacting with the machine and can thus be described as a form of tutorial CALL.

Heift and Schulze (2007) identified and discussed 119 ICALL projects over about thirty years, but with very rare exceptions these were research prototypes. Only a few ICALL projects had limited use in language classrooms (e.g., Heift, 2010; Nagata, 2002). In a review article, Schulze (2008) used a list of nine key desiderata for ICALL by the applied linguist Rebecca Oxford (1993) to discuss developmental trajectories in ICALL:

Communicative competence must be the cornerstone of ICALL.
ICALL must provide appropriate language assistance tailored to meet student needs.
ICALL must offer rich, authentic language input.
The ICALL student model must be based in part on a variety of learning styles.
ICALL material is most easily learned through associations, which are facilitated by interesting and relevant themes and meaningful language tasks.
ICALL tasks must involve interactions of many kinds and these interactions need not be just student-tutor interactions.
ICALL must provide useful, appropriate error correction suited to the student’s changing needs.
ICALL must involve all relevant language skills and must use each skill to support all other skills.
ICALL must teach students to become increasingly self-directed and self-confident language learners through explicit training in the use of learning strategies. (p. 174)

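Here, these desiderata will be adapted and used as a tertium comparationis when drawing lessons from the ‘history’ of ICALL for the emerging use of GenAI in language education. They also serve as the structuring criterion for these blog posts, as follows:

1. Exposure to rich, authentic language
2. Communication in context
3. Appropriate error correction and contingent feedback
4. Varied interaction in language learning tasks
5. Recording learner behavior and student modeling
6. Dynamic individualization
7. Gradual release of responsibility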

A discussion of the work in ICALL as such over the decades is beyond the scope of this chapter. In addition to the review article mentioned above (Schulze, 2008), overviews of ICALL research can be found in the monograph by Heift and Schulze (2007), which also provides an introduction to the main concepts and research questions in the field about 20 years ago, in a chapter (Nerbonne, 2003) in The Oxford Handbook of Computational Linguistics, and in articles (Gamper & Knapp, 2002; Matthews, 1993) in CALL journals. Many publications on ICALL appeared in edited volumes and in refereed conference proceedings and journals on computational linguistics, broadly conceived, and thus outside of the literature on CALL. This might be one of the reasons why GenAI was such a surprising novelty in language education in general and CALL in particular and why a focused retrospective can further our understanding of the role and developmental trajectory of GenAI in language education today. We start with an excursion into a branch of AI that is relevant here – natural language processing (NLP).

… to be continued …

References

Bowerman, C. (1993). Intelligent Computer-Aided Language Learning. LICE: A System to Support Undergraduates Writing in German [PhD Thesis, UMIST]. Manchester.

Gamper, J., & Knapp, J. (2002). A Review of Intelligent CALL Systems. Computer Assisted Language Learning, 15(4), 329-342.

Heift, T. (2010). Developing an Intelligent Tutor. CALICO Journal, 27(3), 443-459.

Heift, T., & Schulze, M. (2007). Errors and Intelligence in CALL. Parsers and Pedagogues. Routledge.

Heift, T., & Schulze, M. (2015). Tutorial CALL. Language Teaching, 48(4), 471–490.

Hubbard, P., & Bradin-Siskin, C. (2004). Another Look at Tutorial CALL. ReCALL, 16(2), 448–461.

Hutchins, J. (1986). Machine Translation – Past, Present and Future. Ellis Horwood.

Jones, C. R., & Bergen, B. K. (2024). People cannot distinguish GPT-4 from a human in a Turing test. http://dx.doi.org/10.48550/arXiv.2310.20216

Matthews, C. (1993). Grammar Frameworks in Intelligent CALL. CALICO Journal, 11(1), 5-27.

McCarthy, J., Minsky, M. L., Rochester, N., & Shannon, C. E. (1955). A Proposal for the Dartmouth Summer Research Project on Artificial Intelligence. Retrieved Sep 30 from http://www-formal.stanford.edu/jmc/history/dartmouth/dartmouth.html

Nagata, N. (2002). BANZAI: An Application of Natural Language Processing to Web-Based Language Learning. CALICO Journal, 19(3), 583-599.

Nelson, G. E., Ward, J. R., Desch, S. H., & Kaplow, R. (1976). Two New Strategies for Computer-Assisted Language Instruction (CALI). Foreign Language Annals, 9(1), 28-37.

Nerbonne, J. A. (2003). Computer-Assisted Language Learning and Natural Language Processing. In R. Mitkov (Ed.), The Oxford Handbook of Computational Linguistics (pp. 670-698). Oxford University Press.

Oxford, R. L. (1993). Intelligent computers for learning languages: The view for Language Acquisition and Instructional Methodology. Computer Assisted Language Learning, 6(2), 173-188.

Reed, D. W. (1948). Editorial. Language Learning, 1(1), 1–2.

Schulze, M. (2008). AI in CALL – Artificially Inflated or Almost Imminent? CALICO Journal, 25(3), 510-527.

Schulze, M. (2024). Tutorial CALL — Language practice with the computer. In R. Hampel & U. Stickler (Eds.), Bloomsbury Handbook of Language Learning and Technologies (pp. 35–47). Bloomsbury Publishing.

Snyder, T. (2017). On tyranny: Twenty lessons from the twentieth century. Tim Duggan Books.

Stockwell, G. (2024). ChatGPT in language teaching and learning: Exploring the road we’re travelling. Technology in Language Teaching & Learning, 6(1), 1–9.

Turing, A. (1950). Computing machinery and intelligence. Mind, LIX(236), 433-460.

Weischedel, R. M., Voge, W. M., & James, M. (1978). An Artificial Intelligence Approach to Language Instruction. Artificial Intelligence, 10, 225-240.