Solving the word puzzle: morphological analysis beyond stem and affixes

Project Overview


Project members:

Dr Sacha Beniamine

Prof Greville Corbett

Period of award

1 February 2021 - 31 January 2023


British Academy

In the few milliseconds it takes a speaker to say a word and a listener to understand it, both make several elaborate deductions. The internal structure of words can be a crucial source of information for these deductions, particularly when words have multiple grammatical forms, a phenomenon known as inflection. Across languages, the nature and number of contrasts expressed through inflection vary greatly. While a language such as English has only a handful of grammatical distinctions, some languages can have thousands. Moreover, these distinctions can be expressed through diverse and intricate sound contrasts. For example, the verbal system of English would be simple if all verbs conformed to the pattern of jump~jumped, which can be neatly segmented into a stem (jump) and an affix (-ed). But across languages, many words behave more like the pair think~thought, which resists segmentation. In many languages, layers of regularity and idiosyncrasy further complicate the matter. Understanding the puzzling complexity of inflection is essential to explaining the structure and evolution of the world's languages. Yet linguistics still lacks a consistent, predictable methodology for studying inflection.

To assess inflectional complexity, this project investigates word structure across typologically diverse languages using quantitative, computational tools.

Current studies in this area have two main – but related – shortcomings. First, they often start from pre-analysed paradigms, where forms have been segmented by hand into stems (removed from the data) and affixes. These affixal tables are not commensurate across languages. Second, they focus on assessing how difficult it is for speakers to predict forms for a given meaning, and ignore the parallel problem of deducing the grammatical meaning of a given form. This second question is key to automating word structure analysis.

The project remedies both shortcomings by providing data, developing computational tools to analyse inflected words, and studying the organisation of inflectional exponence. We gather, digitise, and standardise inflectional lexicons, coordinating with the international morphology community to spread the use of common standards and ensure interoperability. To solve the long-standing Segmentation Problem, we write computational tools which focus on characterising gradient information in words. Finally, our goal is to build a quantitative typology of inflected word structure.