Language and AI: A mathematical equation

The 70 years of AI (see McCarthy et al., 1955) have seen an intertwining of language and computing. At first, computers, as the name says, were meant for computation, for the fast calculation of a few complex equations or many simple ones. Only later were calculations done with texts as input. Famously, the first successful computations of and with letters were carried out at the Government Code and Cypher School at Bletchley Park to break the German Enigma cipher as part of the British effort in World War II. After the mathematician Alan Turing and his colleagues had successfully deciphered messages of the German Luftwaffe and navy, he proposed that these new machines could also be used for language (Turing, 1948, quoted in Hutchins, 1986, pp. 26–27). The Turing test (Turing, 1950) stipulated that a calculating machine, a computer, could show intelligence if a human interlocutor on one side of a screen could not tell whether they were having a conversation with another human or with a machine on the other side. ChatGPT passed this test in 2024 (Jones & Bergen, 2024).

Mathematical equations. Generated by ChatGPT 5 as an illustration.

With the beginning of the Cold War, machine translation seemed to hold a lot of promise. Researchers’ predictions of success were based – at least in part – on the idea that translating from Russian into English is just like deciphering an encrypted message: letters have to be exchanged for other letters according to certain patterns in a deterministic mathematical process. Of course, this did not do justice to the complexities of language, communication, and translation. So the then nascent field of natural language processing (NLP) turned to the grammatical rules of formal (mathematical) grammars and to items, the words in electronic dictionaries. The computer would “understand” a text by parsing it phrase by phrase, using grammatical rules to build an information structure similar to a syntactic tree. Such rules and the lists of items with their linguistic features had to be hand-crafted. Therefore, the coverage of most NLP systems was limited. In the 1990s, researchers began to move away from symbolic NLP, which used linguistic symbols and rules and applied set theory, a form of mathematical logic, toward statistical NLP. In statistical NLP, language patterns were captured by calculating probabilities: the probability of one word (form) following certain others is calculated for each word in a large, principled collection of texts, called a corpus. In the 1990s and 2000s, more and more corpora in more and more languages became available.
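To make the symbolic approach concrete, here is a minimal sketch in Python using the NLTK library. The toy grammar and lexicon are my own illustration, not taken from any historical system; real systems needed thousands of hand-crafted rules and still had gaps.

```python
# Symbolic NLP in miniature: hand-crafted rules, hand-crafted lexicon.
# Requires: pip install nltk
import nltk

# A deliberately tiny context-free grammar (an illustrative toy).
grammar = nltk.CFG.fromstring("""
    S   -> NP VP
    NP  -> Det N
    VP  -> V NP
    Det -> 'the' | 'a'
    N   -> 'computer' | 'text'
    V   -> 'parses'
""")

parser = nltk.ChartParser(grammar)
sentence = "the computer parses a text".split()

# The parser builds the syntactic tree phrase by phrase, rule by rule.
for tree in parser.parse(sentence):
    tree.pretty_print()
```

Any word outside the tiny lexicon makes the parse fail, which is the coverage problem in a nutshell.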

This is part of a draft of an article I wrote with Phil Hubbard. In the paper, we propose a way in which teachers can organize their own professional development (PD) in the context of the rapid expansion of generative AI.
We call this PD sustained integrated PD (GenAI-SIPD): sustained because it is continuous and respectful of the other responsibilities and commitments teachers have; integrated because the PD activities are an integral part of what teachers do anyway. Throughout, the teacher retains control of the PD process.

The full article is available as open access:
Hubbard, P., & Schulze, M. (2025). AI and the future of language teaching: Motivating sustained integrated professional development (SIPD). International Journal of Computer-Assisted Language Learning and Teaching, 15(1), 1–17. https://doi.org/10.4018/IJCALLT.378304 https://www.igi-global.com/gateway/article/full-text-html/378304

In the 1990s, progress in capturing such probabilities came from the use of machine learning. Corpora could be used for machines to “learn” the probability of certain word sequences. This machine learning is based on statistics and mathematical optimization. In NLP, the probability of the next word in a text is calculated, and in training, that prediction is compared to the word that actually occurred next in the text. In case of an error, the parameters of the equation are adjusted and the calculation starts anew. The word sequences whose probabilities are captured this way are called n-grams.
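As a simple illustration of this statistical turn, the following sketch estimates bigram (two-word n-gram) probabilities from a toy corpus by counting. The corpus is invented for the example; real models were trained on corpora of many millions of words and needed smoothing for sequences never seen in training.

```python
# A minimal bigram model: estimate P(next word | current word) by counting.
from collections import Counter, defaultdict

corpus = "the cat sat on the mat . the dog sat on the rug .".split()

# Count how often each word is followed by each other word.
counts = defaultdict(Counter)
for w1, w2 in zip(corpus, corpus[1:]):
    counts[w1][w2] += 1

def prob(w2: str, w1: str) -> float:
    """Relative-frequency estimate of P(w2 | w1)."""
    total = sum(counts[w1].values())
    return counts[w1][w2] / total if total else 0.0

print(prob("cat", "the"))  # 0.25 -- 'the' is followed by 'cat' in 1 of 4 cases
print(prob("sat", "cat"))  # 1.0  -- in this corpus, 'cat' is always followed by 'sat'
```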

The resulting n-gram models were replaced in the mid-2010s with artificial neural networks, a development that led to the first generative pre-trained transformer (GPT) – GPT-1 – in 2018. This marks the beginning of GenAI as we know it today. GPTs are large language models (LLMs) from OpenAI. Today, an LLM is pre-trained using deep learning, a subset of machine learning that relies on neural networks with many layers. When such a network processes a text prompt, each artificial neuron receives input from multiple neurons in the previous layer, carries out calculations, and passes the result to neurons in the next layer. GPT-3, for example, processes text in 96 layers. The first layer converts the input words, or tokens, into vectors with 12,288 dimensions. The number in each of the 12,288 dimensions encodes syntactic, semantic, or contextual information. Through these calculations, the model arrives at a finer and finer linguistic analysis at each subsequent layer.
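The layered calculation can be sketched in a few lines. The dimensions below are shrunk drastically (8 instead of 12,288; 3 layers instead of 96), the weights are random rather than trained, and plain fully connected layers stand in for the attention blocks of an actual transformer. The point is only the data flow: vector in, calculation per layer, new vector out.

```python
# Schematic forward pass: each layer turns a vector into a new vector.
# Toy dimensions and random weights -- shape and data flow only.
import numpy as np

rng = np.random.default_rng(0)
dim, n_layers = 8, 3  # a real LLM: ~10,000+ dimensions, ~100 layers

# One "embedding" vector standing in for a single input token.
x = rng.normal(size=dim)

for layer in range(1, n_layers + 1):
    W = rng.normal(size=(dim, dim))  # the layer's (normally learned) weights
    b = rng.normal(size=dim)         # the layer's bias terms
    x = np.maximum(0.0, W @ x + b)   # each neuron: weighted sum, then nonlinearity
    print(f"layer {layer}: vector of {x.size} numbers")
```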

The enormous number of calculations – an estimated 7.5 million for a sentence of five words – results in plausible text output and consumes a lot of electric power. The latter is the main cause of the environmental impact of GenAI. The former is the main factor in the attractiveness of GenAI, not only in language education but also in industry and, increasingly, in society at large.

References

Hutchins, J. (1986). Machine translation: Past, present, and future. Ellis Horwood.

Jones, C. R., & Bergen, B. K. (2024). Does GPT-4 pass the Turing test? arXiv. https://doi.org/10.48550/arXiv.2310.20216

McCarthy, J., Minsky, M. L., Rochester, N., & Shannon, C. E. (1955). A proposal for the Dartmouth Summer Research Project on Artificial Intelligence. http://www-formal.stanford.edu/jmc/history/dartmouth/dartmouth.html

Turing, A. M. (1948). Intelligent machinery (Report for the National Physical Laboratory). Reprinted in D. C. Ince (Ed.), Mechanical intelligence: Collected works of A. M. Turing (pp. 107–127). North-Holland.

Turing, A. M. (1950). Computing machinery and intelligence. Mind, 59(236), 433–460. https://doi.org/10.1093/mind/LIX.236.433

What’s AI got to do with it?

Photo by Pavel Danilyuk on Pexels.com

artificial intelligence – natural stupidity
artificial flowers – natural incense – sensible intel

Hey, Friend,

It’s been a while. A while ago – in January, precisely – I wrote this line. What’s AI got to do with it? Tina Turner was still alive and the writers of Hollywood were writing and not striking to protect themselves also from the likes of ChatGPT. I like ChatGPT, like I like gadgets. And I like to think about these things. So, what’s AI got to do with it? And what is ‘it’? Oh, and what is AI? ‘it’ is easy. For me. ‘it’ is language and learning a language. For language and learning, AI and I crossed paths. So what’s AI got to do with that? With language and learning?

In 1948, the English mathematician Alan Turing wrote an essay envisaging how computers could demonstrate their intelligence and listed five fields: “(i) Various games, for example, chess, noughts and crosses, bridge, poker; (ii) The learning of languages; (iii) Translation of languages; (iv) Cryptography; (v) Mathematics.” I am not sure whether Alan Turing, who is often called the father of artificial intelligence, meant by ‘the learning of languages’ that the computer would learn a language or that the computer would be intelligent enough to help all of us learn another language. The latter – using computers in different ways to help learn a language – has interested me for many years. It still does. Fascinating! A number-crunching machine deals with language. That’s intelligence. Artificial intelligence. AI. And AI is so much bigger than just language and learning. Different branches in research. Research on intelligent machines – think robots and self-driving cars – on perception – think face recognition and telling a ball from a person’s head – on problem solving – think chess playing and building a schedule for an entire university – on machine learning – think … Oh wait, this has to do with language – the large language models – which means it warrants its own post. And so does the one area that fronts language: natural language processing. Natural? To computer scientists, programming languages must have felt more familiar, so they saw no need for an adjective there; it was our languages that got one: ‘natural’. But that’s a whole other post. Later.

Back to Alan Turing. He mentions language in three of the five: learning of languages, translation of languages, and cryptography. Cryptography. Cracking the code. The code of the Enigma machine. Alan Turing was part of this British effort during World War II. The German army, navy, and air force used this complex electromechanical cipher device for their secret communication. Alan Turing and the cryptanalysts, math geeks, crossword puzzlers, and secretaries working at Bletchley Park near London cracked the code of the Enigma machine. Almost everyone kept quiet about it for 50 years. Didn’t even call a friend. After that war, and with the beginning of the Cold War, the Americans wanted machines to translate Russian texts quickly, and the Soviets pointed their machines at English. Machine translation. Artificial intelligence. And some computer scientists believed that translating a language was like cracking a secret code. Breaking the Russian code. But you can’t break a language. They realized that soon. You couldn’t use math to read Pushkin’s poetry, Tolstoi’s novels, or – in those days – the KGB reports and instructions. Now you can. With ChatGPT. It just uses math to write you a text. Translate a text. Fake a text. Many people find the name ChatGPT too long; they just call it AI. That’s not wrong, but it’s not right either. Yes, the intelligent machines, the perception, and the problem solving are AI. ChatGPT is one example of a small, small area of AI. Generative AI. It generates. It’s the AI that can produce texts, images, audio, and other synthetic data. Synthetic. Like a nylon shirt.

And now I am digressing. So, let’s stop here and come back to reading text, writing text, understanding text. Soon. I hope. I have not been the most regular. With writing. This blog. Maybe I should get tech help. From ChatGPT. But that would be synthetic.