Predicting language evolution: Analogy in morphological change

Project Overview


Project members:

Dr Helen Sims-Williams

Prof Matthew Baerman

Dr Oliver Bond

Period of award

October 2022 - September 2026


The Leverhulme Trust

The staggering diversity of the world’s languages is the cumulative effect of small-scale evolutionary processes. One such process is analogy. Analogical change happens when speakers notice and extend linguistic patterns, as when digged was replaced by dug in English, extending the pattern of verbs like stick (present) vs stuck (past).

Such changes may look like mere idiosyncrasies or errors at the level of the individual speaker, but over time they build up and radically change the structure of our languages. Understanding which patterns get replicated in this way is the key to revealing the relative likelihood of different pathways of language evolution, casting light not only on language change, but also on human prehistory and the inner workings of the human mind.

This project systematically investigates the statistical tendencies of a large body of analogical changes. This is crucial to understanding both how our languages came to be the way they are – i.e. the pathways of change underpinning individual language systems – and why, since analogical change reveals the assumptions and biases of human beings learning from incomplete and sometimes self-contradictory language data.

The objective is to produce a computationally implemented model of analogical change capable of defining a set of possible changes and ranking their probabilities.

This will be achieved through research activities in three methodological stages.

In the first stage of the research, we will develop a computational method for defining possible changes in a language.
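As a rough illustration of what "defining possible changes" could involve, the sketch below extends the present/past pattern of each verb in a toy lexicon to every other verb and records each outcome that differs from the attested form. It is not the project's method: the function names, the broad IPA-style transcriptions, and the simple segment-replacement rule are all assumptions made for this example, and the rule deliberately ignores phonological context, so it overgenerates.

```python
from typing import NamedTuple, Optional


class Change(NamedTuple):
    lexeme: str     # verb undergoing the hypothetical change
    old_form: str   # attested past form
    new_form: str   # past form predicted by analogy
    model: str      # verb whose pattern is being extended


def pattern(base: str, derived: str) -> tuple[str, str]:
    """Strip the shared prefix and suffix, returning the differing material."""
    shortest = min(len(base), len(derived))
    p = 0
    while p < shortest and base[p] == derived[p]:
        p += 1
    s = 0
    while s < shortest - p and base[len(base) - 1 - s] == derived[len(derived) - 1 - s]:
        s += 1
    return base[p:len(base) - s], derived[p:len(derived) - s]


def extend(src: str, tgt: str, base: str) -> Optional[str]:
    """Apply src -> tgt at the last occurrence of src in base (naive assumption)."""
    i = base.rfind(src)  # for an empty src this is len(base), i.e. suffixation
    if i < 0:
        return None
    return base[:i] + tgt + base[i + len(src):]


def candidate_changes(paradigms: dict[str, tuple[str, str]]) -> list[Change]:
    """List every analogical past form that differs from the attested one."""
    changes = []
    for model, (m_present, m_past) in paradigms.items():
        src, tgt = pattern(m_present, m_past)
        for lexeme, (present, past) in paradigms.items():
            if lexeme == model:
                continue
            new_past = extend(src, tgt, present)
            if new_past is not None and new_past != past:
                changes.append(Change(lexeme, past, new_past, model))
    return changes


# Toy lexicon in broad transcription (illustrative only): present, past.
paradigms = {
    "stick": ("stɪk", "stʌk"),
    "dig": ("dɪg", "dɪgd"),    # older past form "digged"
    "walk": ("wɔːk", "wɔːkt"),
}
changes = candidate_changes(paradigms)
```

On this toy lexicon the sketch proposes five hypothetical changes. One of them is the attested replacement of digged by dug on the model of stick; the others, such as a regularised past for stick, are possible changes that did not happen.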

In the second stage, we will investigate which of these changes actually happened, by comparing the program’s output to real historical data. The data will come from case studies of the history of six languages (Italian, Occitan, Greek, Aramaic, Tibetan and Estonian), drawn from four language families and chosen to balance rich historical documentation (which maximises our sample size and so increases the robustness of our statistical inferences) against typological diversity (which minimises statistical bias). These case studies will be conducted with the help of collaborators with expertise in the history of each language.

In the final stage we will analyse what statistical properties differentiate our sample of attested changes (identified in stage 2) from the hypothetical changes generated in stage 1 that did not occur. We expect to find three types of effect: frequency effects, similarity effects, and scope effects. Our measures of the corresponding properties will feed into statistical models, which will produce a concrete method for judging the relative likelihood that a particular change occurred in linguistic prehistory.
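To make the idea of ranking changes by likelihood concrete, the sketch below combines one measure each of frequency, similarity and scope into a logistic score and sorts candidate changes by it. This is only one conceivable form for such a model, not the project's: the logistic form, the feature names, the weights and the feature values are all placeholders invented for this example. In practice the weights would be estimated from the attested and unattested changes, not set by hand.

```python
import math


def occurrence_score(features: dict[str, float],
                     weights: dict[str, float],
                     bias: float = 0.0) -> float:
    """Logistic score in (0, 1) computed from a weighted sum of feature values."""
    z = bias + sum(weights[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


def rank_changes(candidates: dict[str, dict[str, float]],
                 weights: dict[str, float],
                 bias: float = 0.0) -> list[tuple[str, float]]:
    """Return (label, score) pairs sorted from most to least likely."""
    scored = [(label, occurrence_score(feats, weights, bias))
              for label, feats in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Placeholder weights and feature values, invented for illustration only.
weights = {"frequency": 0.8, "similarity": 1.5, "scope": 0.4}
candidates = {
    "change_A": {"frequency": 0.2, "similarity": 0.9, "scope": 1.0},
    "change_B": {"frequency": 0.7, "similarity": 0.1, "scope": 0.5},
}
ranking = rank_changes(candidates, weights)
```

With these placeholder numbers, change_A is ranked above change_B because its higher similarity value outweighs its lower frequency value.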