Language Learning and AI: 7 lessons from 70 years (#3)

3. Varied interaction in language-learning tasks

The human-machine conversation often works because we are used to adhering, even if the machine cannot and does not, to Grice’s four maxims of conversation (Grice, 1975): quantity (be informative), quality (be truthful), relation (be relevant), and manner (be clear). Interaction in dialog works because readers look at the mathematically compiled output of the GenAI and assume that it is informative, truthful, and relevant. Because it generates linguistically accurate and plausible text, the GenAI also appears to be clear. Communicative interaction proceeds successfully as long as the human reader does not detect that the machine output is untruthful or factually inaccurate, for example because of hallucinations (Nananukul & Kejriwal, 2024) or errors, or that it is irrelevant because the machine misinterpreted an ambiguity in context (e.g., when asked about bats, giving information about the mammal rather than the intended piece of sports equipment).

And here is lesson #3 of a short series. Part 0 gives a historical introduction. Lesson #1 focused on the necessary exposure to authentic language and whether this can be achieved with GenAI. Lesson #2 looked at communication in context, which is central in language learning. And now we turn to the role of interaction in language learning with GenAI.

Despite these hurdles, GenAIs have become interesting verbal interactants in language education. ICALL systems, by contrast, provided only limited interaction, mainly due to their limited language coverage (see above). Systems with or without AI worked with branching trees and canned text, for example Quandary, a program in the Hot Potatoes suite (Arneil & Holmes, n.d.), which has no NLP built in. Other systems were more like chatbots whose conversation was limited to one topic or topic area (e.g., Underwood, 1982). Such CALL chatbots were inspired by Weizenbaum’s Eliza (for his reflection, see Weizenbaum (1976)) and SHRDLU (Winograd, 1971) and often relied on regular expressions (Computer Science Field Guide, n.d.) and keyword searches. More sophisticated NLP was employed in the interactive games Spion (Sanders & Sanders, 1995) and Kommissar (DeSmedt, 1995). These early examples of direct interaction between a learner and a machine with some AI capabilities, especially a level of NLP, show that GenAI has opened the door to many more complex and comprehensive verbal interactions and role plays in a variety of languages.
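The pattern-and-keyword technique behind such early chatbots can be sketched in a few lines. The following is a minimal illustration of the Eliza-style approach, not code from any of the systems cited above; all patterns and canned responses are invented.

```python
import re

# Eliza-style rules: each regular expression maps a recognized phrase
# to a canned response template; text captured by the pattern is
# spliced into the reply. Patterns and responses are invented examples.
RULES = [
    (re.compile(r"\bI am (.+)", re.IGNORECASE), "Why do you say you are {0}?"),
    (re.compile(r"\bI feel (.+)", re.IGNORECASE), "How long have you felt {0}?"),
    (re.compile(r"\bmy (\w+)", re.IGNORECASE), "Tell me more about your {0}."),
]
FALLBACK = "Please go on."  # canned reply when no keyword matches

def respond(utterance: str) -> str:
    """Return the canned response for the first matching pattern."""
    for pattern, template in RULES:
        match = pattern.search(utterance)
        if match:
            return template.format(*match.groups())
    return FALLBACK

print(respond("I am tired of homework"))  # Why do you say you are tired of homework?
print(respond("The weather is nice"))     # Please go on.
```

The fallback response masks any input the patterns do not cover, which is one reason such chatbots had to stay within a single topic area.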

My inspiration for this title came from the book  
Snyder, T. (2017). On tyranny: Twenty lessons from the twentieth century. Tim Duggan Books.

I am sharing these early drafts of a book chapter I published in
Yijen Wang, Antonie Alm, & Gilbert Dizon (Eds.) (2025), 
Insights into AI and language teaching and learning.
Castledown Publishers.

https://doi.org/10.29140/9781763711600-02.

Of course, language learning tasks (see Willis (1996) for an early introduction to the now commonly applied Task-Based Language Teaching) are not rooted only in conversations and role plays. GenAI can also generate model answers for different task components or be employed for brainstorming first ideas in the pre-task steps, for example. This was impossible with the ICALL systems based on symbolic NLP and (limited) expert systems. A discussion of the affordances and challenges of this powerful generation of (partial) task outcomes and components by either the student or the teacher is beyond the confines of this chapter, but it is an area within the application of GenAI in language education that urgently needs attention. This agentive collaboration in dialog, possible scaffolding, and student guidance can either support learning or hinder and even prevent it.

To be continued …

References

Arneil, S., & Holmes, M. (n.d.). Quandary. Retrieved January 17 from https://hcmc.uvic.ca/project/quandary/

Computer Science Field Guide. (n.d.). Regular expressions – Formal Languages. Retrieved January 27 from https://www.csfieldguide.org.nz/en/chapters/formal-languages/regular-expressions/

DeSmedt, W. H. (1995). Herr Kommissar: An ICALL Conversation Simulator for Intermediate German. In V. M. Holland, J. D. Kaplan, & M. R. Sams (Eds.), Intelligent Language Tutors: Theory Shaping Technology (pp. 153-174). Lawrence Erlbaum Associates.

Grice, H. P. (1975). Logic and Conversation. In P. Cole & J. L. Morgan (Eds.), Syntax and Semantics: Speech Acts (pp. 41-58). Academic Press.

Nananukul, N., & Kejriwal, M. (2024). HALO: an ontology for representing and categorizing hallucinations in large language models. Proc. SPIE 13058, Disruptive Technologies in Information Sciences VIII, 130580B (6 June 2024).

Sanders, R. H., & Sanders, A. F. (1995). History of an AI Spy Game: Spion. In V. M. Holland, J. D. Kaplan, & M. R. Sams (Eds.), Intelligent Language Tutors: Theory Shaping Technology (pp. 141-151). Lawrence Erlbaum Associates.

Underwood, J. H. (1982). Simulated Conversation as CAI Strategy. Foreign Language Annals, 15, 209-212.

Weizenbaum, J. (1976). Computer Power and Human Reason: From Judgment To Calculation. W. H. Freeman.

Willis, J. R. (1996). A framework for task-based learning. Longman.

Winograd, T. (1971). Procedures as a representation for data in a computer program for understanding natural language. https://hci.stanford.edu/winograd/shrdlu/AITR-235.pdf

Language Learning and AI: 7 lessons from 70 years (#2)

2. Communication in context

Oxford (1993) demanded that “communicative competence must be the cornerstone of ICALL” (p. 174), noting that many ICALL projects of her time did not meet that goal, although communication, and by extension communicative language teaching, had been central ideas in applied linguistics for decades. Canale and Swain (1980) transferred the concept of communicative competence by Dell Hymes – developed in opposition to the Chomskyan linguistic competence – from sociolinguistics to language learning. Hymes (1974) had introduced communicative competence with the mnemonic SPEAKING: Setting and Scene (time and place), Participants (speaker and audience), Ends (purpose and outcome), Act Sequence (progression of speech acts), Key (tone, manner), Instrumentalities (language modalities), Norms (social rules), and Genre (kind of speech act or text) (pp. 53–62). These different facets and components of communication go well beyond the idea of a generative grammar (Chomsky, 1957) and that of linguistic competence. In ICALL, as in NLP generally, however, generative grammar and other formal grammars, which are sets of rules that rewrite strings using mathematical operations, are the backbone of a system. The partial disconnect between formal grammars, such as Head-driven Phrase Structure Grammar and Categorial Grammar, and communicative competence, with its focus on meaning, situation, and context, as clearly illustrated by Hymes’ mnemonic, meant that ICALL systems hardly played a role in communicative language teaching.
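The string-rewriting character of formal grammars can be made concrete with a toy example; the rules and vocabulary below are invented for illustration and stand in for the far richer formalisms named above.

```python
import random

# A toy context-free grammar: each symbol on the left rewrites to one
# of the alternative sequences on the right, until only words remain.
GRAMMAR = {
    "S":   [["NP", "VP"]],   # a sentence is a noun phrase plus a verb phrase
    "NP":  [["Det", "N"]],
    "VP":  [["V", "NP"]],
    "Det": [["the"]],
    "N":   [["student"], ["answer"]],
    "V":   [["gives"], ["checks"]],
}

def generate(symbol):
    """Recursively rewrite a symbol into a list of terminal words."""
    if symbol not in GRAMMAR:        # terminal word: stop rewriting
        return [symbol]
    words = []
    for sym in random.choice(GRAMMAR[symbol]):
        words.extend(generate(sym))
    return words

print(" ".join(generate("S")))  # e.g. "the student checks the answer"
```

Every sentence this grammar produces is well-formed, but nothing in the rules knows whether the sentence is relevant, truthful, or appropriate to a situation, which is exactly the gap between formal grammar and communicative competence described above.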

This is the third part of a short series. All parts are based on a manuscript that I wrote recently. Part 0 gives a historical introduction. Lesson 1 focuses on the necessary exposure to authentic language and whether this can be done with GenAI. And I mean exposure and not so-called comprehensible input.

GenAI chatbots, however, have been said to be suitable conversation partners (Baidoo-anu & Owusu Ansah, 2023) and learning buddies (https://www.khanmigo.ai/). The GenAI output in a number of languages is certainly well-formed and plausible; GenAI’s natural language understanding is fast and precise. But is a conversation with a chatbot the same as a human conversation? Is it a negotiation of meaning as understood in communicative language teaching? The NLP researcher Emily Bender and her colleagues compared GenAI chatbots to stochastic parrots – ‘stochastic’ derives from a Greek word for skillful aiming but has come to mean proceeding by guesswork (see Merriam-Webster, n.d.) – and argue that “coherence is in fact in the eye of the beholder. Our human understanding of coherence derives from our ability to recognize interlocutors’ beliefs … and intentions … within context … That is, human language use takes place between individuals who share common ground and are mutually aware of that sharing (and its extent), who have communicative intents which they use language to convey, and who model each others’ mental states as they communicate” (Bender et al., 2021, p. 616). In other words, the chatbot spits out forms that are plausible but that do not mean anything; the (student or teacher) reader imbues these hollow forms with meaning and thus anthropomorphizes the GenAI tool, reasoning about its ‘intention’ and basing their response on the result of that reasoning, as humans do in conversation. Computers, however, do not have or formulate intentions. Something was clicked, data was input, and a condition was met. This triggered a digital operation, and forms that are numbers to the computer and look like words to the human user of the device became visible or audible. We add meaning after the form of the text has been generated.


References

Baidoo-anu, D., & Owusu Ansah, L. (2023). Education in the Era of Generative Artificial Intelligence (AI): Understanding the Potential Benefits of ChatGPT in Promoting Teaching and Learning. Journal of AI, 7(1), 52-62.

Bender, E. M., Gebru, T., McMillan-Major, A., & Shmitchell, S. (2021). On the Dangers of Stochastic Parrots. In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency (pp. 610-623). https://doi.org/10.1145/3442188.3445922

Canale, M., & Swain, M. (1980). Theoretical bases of communicative approaches to second language teaching and testing. Applied Linguistics, 1(1), 1-47.

Chomsky, N. (1957). Syntactic Structures. Mouton.

Hymes, D. H. (1974). Foundations in Sociolinguistics: An Ethnographic Approach. University of Pennsylvania Press.

Merriam-Webster. (n.d.). Stochastic. In Merriam-Webster’s unabridged dictionary. Retrieved December 30 from https://unabridged.merriam-webster.com/unabridged/stochastic

Oxford, R. L. (1993). Intelligent computers for learning languages: The view for Language Acquisition and Instructional Methodology. Computer Assisted Language Learning, 6(2), 173-188.


… to be continued …

Language Learning and AI: 7 lessons from 70 years (#1)

1. Exposure to rich, authentic language

The texts – or the language – that a computer can understand or generate depend on its capacity for NLP. Computer scientists added the adjective ‘natural’ because the parsing of programming language(s) was possible, and necessary, before they turned to parsing texts produced by humans. In early NLP, computational linguists wrote grammatical rules and compatible dictionaries in programming languages such as Prolog and LISP. Rules and items were written by hand, relying on different (mathematical) grammar formalisms. This made the development process slow, error-prone, computationally expensive, and labor-intensive. This might be the main reason why the coverage and robustness of ICALL systems and applications remained limited over the years. Parsing a single sentence – the analysis of the grammatical constructions and the production of an equivalent information structure, something similar to a syntactic tree, which the computer could “understand” – took from a couple of seconds to a few minutes, depending on the computer hardware and the efficiency of the parsing algorithm. This approach to NLP is called symbolic, because it uses and processes symbols for syntactic phrases, such as NP for a noun phrase and VP for a verb phrase, and for lexical items, such as N for a noun and V for a verb, and their grammatical feature structures. Symbolic NLP in ICALL resulted in sentence-based language learning activities in a tutorial system. Since the dictionary was also hand-written and hence usually small, the language to which students using the ICALL system were exposed was limited to the vocabulary of a textbook at best.
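A toy version of such a hand-written symbolic parser might look like the following; the rules (S → NP V NP, NP → Det N) and the tiny dictionary are invented for illustration, not drawn from any actual ICALL system.

```python
# A minimal sketch of symbolic parsing: hand-written phrase rules plus a
# tiny hand-written lexicon mapping words to category symbols (Det, N, V).
LEXICON = {"the": "Det", "student": "N", "question": "N", "asked": "V"}

def parse_np(words, i):
    """NP -> Det N; return (subtree, next index) or None on failure."""
    if (i + 1 < len(words)
            and LEXICON.get(words[i]) == "Det"
            and LEXICON.get(words[i + 1]) == "N"):
        return ("NP", words[i], words[i + 1]), i + 2
    return None

def parse_s(words):
    """S -> NP V NP; return a full syntactic tree or None on failure."""
    np1 = parse_np(words, 0)
    if np1:
        tree1, i = np1
        if i < len(words) and LEXICON.get(words[i]) == "V":
            np2 = parse_np(words, i + 1)
            if np2 and np2[1] == len(words):
                return ("S", tree1, ("VP", words[i], np2[0]))
    return None

print(parse_s("the student asked the question".split()))
# ('S', ('NP', 'the', 'student'), ('VP', 'asked', ('NP', 'the', 'question')))
```

Real symbolic parsers also attached grammatical feature structures (number, gender, case) to these category symbols, which is where much of the hand-writing effort, and the coverage problem, lay.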

Part #0 gave a general introduction. Here we have the first section on what language learners should expect of generative-AI tools. Other parts will follow.

In the 1990s, more electronic corpora (large, principled collections of texts), also in languages other than English, became available. The approach to NLP that relies on the mathematical analysis of large corpora has been called statistical NLP. In this approach, language patterns are detected in corpus analyses. For these patterns, contiguous sequences of words – called n-grams, with n being the number of words in each sequence – the probability of one word following the other(s) is calculated. In the simplest form, the model conditions only on linear word sequences. This resulted in a wider coverage of language because of the underlying use of large corpora. However, long-distance dependencies, as in the following sentence, still posed a problem, as they had in symbolic NLP.

The student who had finally given the right answer proceeded to ask the next question.

For any human reader, it is immediately clear that it is ‘the student’ who ‘proceeded’ and not ‘the right answer’. For the computer, this connection between grammatical subject and finite verb poses a challenge because the words are not in the same n-gram(s) and the pattern cannot be detected easily.

This and other challenges were overcome by relying on artificial neural networks (ANNs), models inspired by the neural networks of human brains (for a comprehensive overview of how GPTs (generative pre-trained transformers) work, see Wolfram, 2023, February 14). ANNs are multidimensional and do not rely only on linear sequences of words. Their individual nodes, the neurons, receive input from and send output to other neurons. The processing of input to produce output is basically done through a mathematical equation. Thus, the output depends on the (probabilistically) weighted input. The network learns by adjusting the weights, which multiply the different input values, and the biases, which are added independently of the input, to improve the accuracy of the result. If this machine learning relies on neurons organized in multiple layers – the input layer, the output layer, and, in between, two or more hidden layers – then we talk about a deep network and deep learning (LeCun et al., 2015). “GPT-3 has 96 layers. GPT-4’s exact number of layers hasn’t been publicly disclosed, but it is expected to be significantly larger than GPT-3” (Microsoft Copilot, 2024, December 26). Deep learning and ANNs are the underpinnings of the large language models (LLMs) (for an accessible overview of ANNs and LLMs, see Naveed et al., 2024), which in turn are the backbone of GenAI chatbots such as ChatGPT (OpenAI), Claude (Anthropic), Copilot (Microsoft), and Gemini (Google). Thus, LLMs essentially rely on enormous corpora of texts scraped from the internet and on machine-learned neural networks. In these ANNs, individual tokens – which can be individual letters, words, or parts of a word – are represented by long lists of numbers, which are called word vectors. The parameters in the network, which function like tiny rules and steps, help to determine which word follows the previous one. “GPT-3 has 175 billion parameters, which include the weights and biases of the neurons. GPT-4 is speculated to have trillions of parameters, though the exact number hasn’t been confirmed” (Microsoft Copilot, 2024, December 26).
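The computation inside a single neuron, weighted inputs plus a bias passed through a nonlinear activation, can be written out directly; all numbers below are invented for illustration.

```python
import math

def neuron(inputs, weights, bias):
    """One neuron: weighted sum of inputs plus bias, through a sigmoid."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-z))  # sigmoid squashes z into (0, 1)

inputs  = [0.5, -1.0, 0.25]  # e.g. components of an incoming word vector
weights = [0.8, 0.2, -0.4]   # multiply the inputs; adjusted during training
bias    = 0.1                # added independently of the input

print(round(neuron(inputs, weights, bias), 3))  # 0.55
```

A deep network stacks many such neurons in layers; the parameters counted in the billions for GPT models are exactly these weights and biases.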

This new computational approach is far removed from the reliance on linguistic rules and items in early NLP and ICALL, because it is steeped in the complex calculations in the hidden layers of the LLM and arrays upon arrays of numbers. That is why GenAI’s coverage, scope, and speed of NLP are vastly superior to those of previous systems in ICALL. Therefore, we can argue that students using GenAI are exposed to rich language at the paragraph level and not only the sentence level. But is this generated language authentic? In an early paper on authenticity in the language classroom, Breen (1985) proposes “that authentic texts for language learning are any sources of data which will serve as a means to help the learner to develop an authentic interpretation” (p. 68). The question then becomes: can a learner develop an authentic interpretation of a turn or text generated by a GenAI chatbot or a translation rendered by a GenAI machine translation tool? Since the generated texts are certainly well-formed and plausible, they appear to provide a good basis for the learner’s interpretation and thus for language learning. And because they are based on actual language use as found in the internet texts used to train the LLM, there is another indication that texts generated in chat with a GenAI, or a translation from a GenAI, potentially qualify as authentic. However, the real key to the authenticity of language is found in communication.

References

Breen, M. P. (1985). Authenticity in the Language Classroom. Applied Linguistics, 6(1), 60-70. https://doi.org/10.1093/applin/6.1.60

LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep Learning. Nature, 521, 436–444.

Microsoft Copilot. (2024, December 26). How many nodes and layers does the ANN of the GPT large language model have? Microsoft Copilot.

Naveed, H., Khan, A. U., Qiub, S., Saqib, M., Anwar, S., Usman, M., Akhtar, N., Barnes, N., & Mian, A. (2024). A Comprehensive Overview of Large Language Models. https://dx.doi.org/10.48550/arxiv.2307.06435

Wolfram, S. (2023, February 14). What is ChatGPT doing … and why does it work? Stephen Wolfram Writings. https://writings.stephenwolfram.com/2023/02/what-is-chatgpt-doing-and-why-does-it-work


… to be continued …