Projects

Surrey Morphology Group's research focuses on the consequences linguistic diversity has for developing theories of language and the role this plays in understanding the human mind. 

NILOMORPH: The evolution of suprasegmental morphology in West Nilotic

The NILOMORPH project sets out to trace the origin of one of the world's most amazing morphological systems, through the synergy of three teams: MORPHOLOGY (Surrey), DESCRIPTION (Edinburgh) and RECONSTRUCTION (CNRS). Joining fieldwork, experimental work, computational simulations and the comparative method, we will explain how a rich system of morphological affixes was absorbed into the word stem to create syllables with the most densely packed clustering of grammatical information that language has ever seen.

REVOLUPHON - Rational Evolutionary Phonology

REVOLUPHON aims to transform the theory of phonology by explaining the field's most significant observations through a new synthesis with resource rationality and cultural evolution. Resource rationality is a theory of the motivations and advantages of cognitive computational strategies. We hypothesise that resource rationality can illuminate the cognitive motivations for classic, descriptively successful phonological representations. Phonologies undergo cultural evolution, and evolution leads to optimisation. We hypothesise that phonological evolution will explain why phonologies appear (mostly) to optimise, how they do, and when they do not.

Analogical growth of grammatical systems

As languages change over time, the subsystems within change as well. This project focuses on co-headed verbs (e.g. auxiliaries) and the systematic change they go through in languages that have a strikingly large and complex co-headed verb system. We use synchronic and diachronic data from languages around the world to build a novel typology of co-headed verb systems and to describe their evolution. The key objective is to verify to what extent analogy explains the extensive growth of grammatical systems.

What Is Understood? Simulating Human-Scale Word Comprehension Using AI

Speakers constantly produce and understand sentences and words that they have never heard before. A central question for linguistic theory is how this unique ability, and the knowledge supporting it, are organized in their brain. In the past decades, linguists have been developing theories of how speakers participate in this cognitively demanding behaviour, yet in comparison the equally impressive role of listeners, who must process and comprehend this creativity in real time, has received little attention. In this Leverhulme Early Career Fellowship, I aim to elaborate a theory of how our comprehension of words might be represented in the mind by modelling this task using neural networks.

Predicting language evolution: Analogy in morphological change

Analogy is where speakers notice patterns in their language and extend them to new environments. Understanding which patterns get replicated like this is the key to revealing the relative likelihood of different pathways of language evolution, casting light not only on language change, but also human prehistory and the inner workings of the human mind. This multidisciplinary project uses innovative computational and statistical techniques to produce a predictive model of analogy, a missing puzzle piece in a complete theory of language change.

Nuer Literacy Initiative

The Nuer Literacy Initiative targets inequalities in access to mother-tongue education in the Nuer language, spoken in South Sudan and Ethiopia. The major outcome will be 58 new digital books, and 15,000 physical books to benefit Nuer speakers, heritage learners and second-language learners in East Africa and the global diaspora. These will be produced by Nuer authors, translators and illustrators in conjunction with the South African Institute for Distance Education. The books will have a transformative effect in providing teachers and parents with the materials they need to engender a love of reading Nuer in children.

Declining Case: Inflectional Loss in Progress

Many languages of Europe have changed dramatically over the centuries. One of the major events was the loss of rich case systems found in languages like Latin or Old English, leaving us just with remnants such as 'she' versus 'her'. This evolution is preserved only spottily in texts and is poorly understood. But a similar process is still underway in South Slavonic varieties spoken between East Serbia and West Bulgaria. Here the dialects reflect the different historical stages, allowing us to investigate language change directly, and to develop statistically-based models and test existing theories of grammatical evolution.

Solving the word puzzle: morphological analysis beyond stem and affixes

In the few milliseconds necessary for speakers to say a word and for listeners to understand it, they both make several elaborate deductions. The internal structure of words can be a crucial source of information for these deductions, particularly when words have multiple grammatical forms, a process known as inflection. Across languages, the nature and number of contrasts expressed through inflection can vary greatly. While a language such as English has only a handful of grammatical distinctions, some languages can have up to thousands.

Accelerating our discovery of the linguistic past

Even closely related languages can exhibit a stunning diversity of morphological complexity, which raises the question: how does this complexity evolve over time, and can we design computational models that would allow us to turn the clock backwards? This project elaborates an evolutionary theory of inflectional systems, such as verb conjugations and noun declensions. It emphasises interdisciplinary parallels between linguistics and fields such as cultural evolution and seeks to demystify the potential of mathematical modelling for inquiry into the linguistic past.

Morpho-syntax of mutual intelligibility in the Turkic languages of Central Asia

Mutual intelligibility (MI) between languages is observed when a speaker of one language can understand a speaker of another (related) language without any special preparation. Currently, there is no consensus amongst linguists on how MI should be tested and measured, and which linguistic factors are the primary determinants of the degree of MI between languages. This project scrutinises the relation of MI and morpho-syntactic features, using experimental methods to investigate asymmetries in MI in three Turkic languages: Kazakh, Karakalpak and Uzbek.

Optimal categorisation

The very existence of gender is a source of bafflement: why in Russian is 'elbow' masculine, while 'knee' is neuter and 'bone' is feminine? Why do some Dutch speakers distinguish three genders, and others only two? It challenges language learners and excites linguists and psychologists no less. The origin of grammatical gender is a major question in linguistics, and the related issue of how entities are categorised by speakers of different languages is a key question in psychology. How do such systems arise, and what is their impact on the speakers to whom they are native? And most pertinently, why do such different methods of categorisation exist?

External agreement

Whilst seemingly rare typologically, 'external agreement' appears with fascinating regularity in languages of the Nakh-Daghestanian family spoken in the Caucasus, where there are 17 languages with diachronically unrelated instances of external agreement. Such an abundance of examples appearing in languages with considerable variation in their syntactic systems makes external agreement in Nakh-Daghestanian an ideal opportunity for research into the morphosyntactic, semantic and pragmatic mechanisms which regulate not only agreement, a fundamental part of the grammar of many languages, but also the less obvious relationships between syntactic elements in a sentence.

Seri verbs: multiple complexities

Many languages show seemingly arbitrary elaboration of their inflection. For example, most English nouns can take a plural ending, but it is not the same for every noun: compare bird-s, ox-en, and phenomen-a. Exactly which one to use is an additional fact that must be learnt and remembered. Within a general theory of language, such morphological complexity is rather the elephant in the room: it is far from clear what it’s doing there, and why it is taking up so much space. One of the most extreme examples of inflectional variation – the most extreme, we would argue – comes from Seri, a language isolate spoken by approximately 900 people on the Sonoran coast in Mexico.

Lexical splits

Differences in meaning are often expressed through transparent changes in the forms of a word (e.g. the verbs ‘talk’, ‘jump’ and ‘shout’ all take the suffix ‘-ed’ to indicate past tense: ‘talked’, ‘jumped’ and ‘shouted’). But sometimes differences in meaning are expressed by unexpected changes, which we refer to as “splits”; for example, the verb ‘go’ exhibits a “split” between ‘go’ in the present tense and ‘went’ in the past tense. Investigating diverse splits across the world’s languages will reveal the surprising extent to which individual ‘forms’ may differ from each other while belonging to the same ‘word’.

Loss of inflection

Over the last 1200 years English has lost nearly all of its complex inflectional system, radically transforming its character, and similar developments have occurred in the histories of languages all across the world. At first glance this looks simply like decay, and this is often how it figures in the public imagination. But the loss of inflection is a complex and multidimensional process. The processes involved in the loss of inflection are a potential source of insight into the workings of grammar, seen from a unique perspective.

Prominent possessors

Syntacticians generally assume that the properties of the head of a phrase are more important for phrase-external syntactic processes than the properties of the non-head subconstituent. Yet possessive constructions pose a challenge for an adequate theoretical account of possible linguistic systems, since several languages exist in which the properties of a possessor, standardly assumed to act as a non-head daughter within a possessive phrase, figure more prominently in syntax than expected by triggering grammatical agreement on the clausal predicate or by participating as a controller in the switch reference system.

Morphological complexity in Nuer

One of the world's most extreme examples of a morphologically complex language comes from Nuer, a member of the West Nilotic branch of the Nilo-Saharan language family, mainly spoken in the Republic of South Sudan. This complexity is not due so much to a large number of forms, but rather to their unpredictability and internal structure. Alongside the structural complexities of the system, the prosodic system poses particular descriptive challenges of its own, in which the overlapping effects of tone, phonation type and the typologically unusual three-way vowel length distinction must be untangled through careful acoustic measurements.

A typology of distributed exponence

Inflection is often fairly transparent, involving an incremental mapping between meanings and form. One alternative is distributed exponence, in which the marking of grammatical meaning is distributed across a number of smaller pieces of the word, each of which contributes a subcomponent of that meaning.

Combining gender and classifiers in natural language

Typically, a language will have only gender or classifiers, but we sometimes find both systems together. How fundamentally different systems of categorization interact in a language can uncover important principles underlying the interaction between semantics, morphology, and cognitive categories in general.

Verb classification in Gújjolaay Eegimaa (Atlantic, Niger-Congo)

Mental categorisation is reflected in some languages by grouping names of entities into nominal classes, and more rarely events and states into verbal classes. Experimental data from Eegimaa speakers collected in Senegal provides insight into a complex cross-categorial categorisation system.

Optional ergative case marking: What can be expressed by its absence?

The factors underlying differential subject marking systems are conditional and probabilistic. In the Tibeto-Burman languages of Manang District, Nepal, ergative case marking is largely determined by information-structural properties of a clause rather than purely structural syntactic constraints.

Endangered complexity: Inflectional classes in Oto-Manguean languages

Inflectional classes appear to be functionally useless, but can be highly structured and remarkably resilient over time. The Oto-Manguean languages of Mexico provide important evidence of the limits of inflectional idiosyncrasy that a human language can tolerate.

From competing theories to fieldwork: The challenge of an extreme agreement system

The Archi language of Daghestan presents an unusually pervasive agreement system that poses challenges for the central tenets of different syntactic theories. This extreme morphosyntactic system provides a rich testing ground for comparing and evaluating the claims and predictions of HPSG, LFG and Minimalism.

Morphological complexity: Typology as a tool for delineating cognitive organization

Morphological systems introduce an extra layer of structure in between meaning and its expression. Such apparently arbitrary distinctions may exhibit an astonishing degree of complexity: a key resource for understanding mental processes that are unconscious, yet reflect a highly structured autonomous system.

SENĆOŦEN on the web: Access for linguists and community

SENĆOŦEN is the language of the Saanich First Nations community from the Saanich Peninsula of Vancouver Island and neighbouring Gulf and San Juan Islands on the west coast of Canada. Along with five closely related Northern Straits dialects, it is one of 32 indigenous languages of British Columbia.

Alor-Pantar languages: Origins and theoretical impact

The Alor-Pantar languages are a group of about 20 endangered non-Austronesian languages spoken on the islands Alor and Pantar in the eastern Indonesian province of Nusa Tenggara Timur. Two typologically interesting phenomena in these languages shed light on the semantic underpinnings of grammatical features.

The corpus of Russian regional dialects: Acoustic database with discourse annotation

Russian is a language of tremendous geographic breadth and of remarkable linguistic diversity. Audio data recorded in a wide range of locations, from Siberia and the Far East to Southern Russia, provides the basis for examining the relationship between phonology, morphology, syntax, the lexicon, discourse and socio-linguistic factors.

Brighter, cleverer, but more intelligent: Understanding periphrasis

Periphrasis is a widespread and significant phenomenon, and a valuable indicator of how a language functions. It reveals how the construction of meaning in language is apportioned between morphology ('bright' and 'brighter') and syntax ('intelligent' and 'more intelligent').

A typology of defectiveness

The fact that inflectional paradigms may have anomalous gaps in them has been known since at least the days of the classical grammarians. The term 'defectiveness' refers to gaps in inflectional paradigms, specifically those which do not appear to follow from natural restrictions imposed by meaning or function.

More projects

  • Turning owners into actors: Possessive morphology as subject-indexing in languages of the Bougainville region

    A fundamental communicative task for all languages is to show which participant in a sentence is the subject. Languages have various ways of achieving this, including word-order, agreement, and case-marking. In some North-West Solomonic languages, subject is indicated using word-forms normally indicating possessors of nouns.

  • Short term morphosyntactic change

    Languages change by gaining and losing word forms over time, but an equally significant role in their history is played by subtle shifts in the function of existing forms. While the system of forms in Russian has changed relatively little over a long period, the use of these forms has undergone a remarkable degree of change over the last 200 years.

  • A dictionary of Archi (Daghestanian) with sound files and cultural materials

    The Archi language is characterised by a remarkable morphological system, with extremely large paradigms, and irregularities on all levels. The online dictionary of Archi contains sound files for every word form of the lexeme, digital pictures of culturally significant objects, idioms and example sentences with interlinear glossing.

  • Grammatical features: A key to understanding language

    In attempting to understand language, a central notion is features. Examples of features (and their possible values) include Person (1st, 2nd, 3rd), Number (singular, plural, dual...) and Tense (present, past...). Features have proven invaluable for analysis and description, and have a major role in contemporary linguistics, across the discipline.

  • Essential documentation of seven highly endangered Oceanic languages

    Northwest Solomonic is a linkage of languages spoken on Bougainville, Papua New Guinea and on the islands of Santa Isabel and Choiseul and in the New Georgia group of islands, all belonging to the Solomon Islands. It comprises several highly endangered languages in need of language documentation, description and analysis.

  • Extended deponency: The right morphology in the wrong place

    Deponency arises when there is a mismatch between the apparent morphosyntactic value of a morphological form and its actual value in a given syntactic context. In the context of typology and morphological theory, an informed account of deponency must reveal which features may be affected, and what the characteristics of the resulting paradigm can be.

  • 'Possible words': The outer bounds of inflectional morphology

    Speakers 'know' what a word is, yet linguists have said little about possible words. Words often have different forms, and these are normally related in predictable ways. However, there are also cases where the relations involve more challenging properties such as suppletion, syncretism, defectiveness, deponency and displaced grammatical information.

  • Paradigms in use

    A paradigm is the complete set of related word forms associated with a given lexeme. Sometimes, the word forms in a paradigm are syncretic and result in grammatical ambiguity, where one form can have multiple functions. Investigation into the relationship between frequency of use and syncretism can shed light on the factors that constrain paradigms.

  • The notion 'possible word' and its limits: A typology of suppletion

    While linguists have investigated the notion 'possible sentence', less has been done to establish the notion 'possible word'. Suppletion, where different inflectional forms of a word are not related phonologically, is common, involves extremely frequent words and provides a ripe testing ground for examining the bounds of possibility for the word.

  • Agreement: An investigation into the distribution of information

    Agreement is the 'displaced' expression of grammatical information. Along with government, it is one of a pair of morphosyntactic phenomena that involve the morphological expression of a syntactic relation through the displacement of inflectional information associated with an agreement controller on an available target.

  • Where word forms collide: A typology of syncretism

    Syncretism is a surprising yet widespread and poorly understood phenomenon in natural language. A form is said to be an instance of syncretism if it fulfils two or more different functions within a paradigm. Syncretism is found even in English, whose inflectional morphology is simple in comparison with many languages.

  • Predicting the past: Reconstructing the Slavonic colour lexicon

    The notion of default inheritance can be used to relate different languages, as well as different stages of a single language’s development. Using a computational tool for modelling lexical knowledge, changes in the meaning of colour terms in Slavonic languages can be plotted through time to demonstrate its viability for historical linguistics.

  • Number use in language: A quantitative and typological investigation

    The relationship between the general availability of a grammatical category across languages (such as number), and the way it is used by speakers of a single language (such as Russian), can be investigated to reveal the extent to which a hierarchy modelling cross-linguistic tendencies accurately reflects the way a grammatical feature is used.

  • Feature-based approaches to exceptional cases in Russian

    Innovations can spread and eventually pervade a language, they can fail to take hold, or they can remain, without ever affecting a large number of lexical items. These exceptional cases are interesting because they provide us with insights into why a linguistic system does not favour such innovations.

  • Challenges for inflectional description

    Different languages such as Slave (an Athabaskan language of Canada), Pirahã (a Mura language of Brazil) and Breton (a Celtic language of France) present different types of challenges for the description of an inflectional system. Diverse data provide the best opportunity to examine what types of analyses of morphology are required.

  • The theory of Network Morphology

    An adequate morphological theory must be able to account for morphology at opposite ends of the spectrum of possibilities; fusional morphology (where disparate information is packed into small segments) is common in Slavonic, while polysynthetic languages, such as the Eskimo language Yup'ik, can build up long, complex but segmentable word-forms.

  • Russian verbal morphology: Alternative perspectives and implementations and their theoretical justification

  • Frontiers of research in morphology

    After a long period of relative neglect, morphology emerged in the 1980s as a key area of interest characterized by a good interaction between different schools and approaches. The frontiers of research at different British institutions are in part complementary and in part overlapping, facilitating a productive forum for collaboration.

  • A computer implementation of Russian derivational morphology represented in DATR

    Not all morphological processes are equally productive. The English suffix -ness productively derives nouns from adjectives, as in good > goodness, whereas the suffix -th is limited to warmth and a few others. In a computer-implementable theory of morphology this difference can be captured using the notion of defaults.

  • A DATR theory of Russian morphology

    The treatment of declension classes as nodes in an inheritance hierarchy contrasts strongly with the traditional notion of paradigms as discrete entities which do not share information. Using default inheritance hierarchies in DATR to model word structure we see evidence for a great deal of information sharing between classes.

  • Canonical Phonology
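
The notion of defaults mentioned in the DATR project summaries above can be illustrated with a minimal sketch. It is written in Python rather than DATR, and the names Node, lookup, ADJECTIVE and derived_noun are hypothetical, introduced here only for illustration; the sketch is not taken from any of the projects listed. It uses the English example given above: -ness is the default suffix, and warm overrides it with -th.

```python
class Node:
    """A node in a default inheritance hierarchy (a loose analogue of a DATR node)."""

    def __init__(self, name, parent=None, **facts):
        self.name = name
        self.parent = parent
        self.facts = facts  # facts stated locally at this node

    def lookup(self, attribute):
        # Use the locally stated value if there is one; otherwise
        # fall back to the nearest ancestor that states it (the default).
        node = self
        while node is not None:
            if attribute in node.facts:
                return node.facts[attribute]
            node = node.parent
        raise KeyError(attribute)


# Hypothetical hierarchy: the class node states the productive default.
ADJECTIVE = Node("ADJECTIVE", noun_suffix="ness")
good = Node("good", parent=ADJECTIVE, stem="good")                    # inherits -ness
warm = Node("warm", parent=ADJECTIVE, stem="warm", noun_suffix="th")  # overrides the default


def derived_noun(lexeme):
    """Build the derived noun from the stem and the applicable suffix."""
    return lexeme.lookup("stem") + lexeme.lookup("noun_suffix")


print(derived_noun(good))  # goodness
print(derived_noun(warm))  # warmth
```

Only the exceptional item needs an entry of its own for the suffix; every other adjective picks up the productive pattern from the class node, which is the kind of information sharing that the inheritance-based description refers to.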
