A DATR theory of Russian morphology

Project Overview



Project members:

Prof Greville G. Corbett
Dr Norman M. Fraser
Dr Dunstan Brown

Period of award

September 1992 - August 1995


Economic and Social Research Council (ESRC) - R000233633

The aim of this project was to model a significant proportion of the inflectional morphology of a relatively complex language, Russian, expressing the analysis formally using the lexical knowledge representation language DATR (Evans and Gazdar 1989a, 1989b, 1995; Keller 1995).

The project combined earlier work by Corbett (1981; 1982) on syncretism, inflectional features and the status and number of paradigms in Russian, and on gender (Corbett 1991), with formal modelling of natural language structures using default inheritance, an area in which one of the researchers has considerable experience (Andry et al. 1992; Fraser and Hudson 1992).

The project had five stated objectives:

(1) Encoding in machine readable form a substantial fragment of Russian inflectional morphology

We concentrated on particularly difficult issues in the inflectional system: the organisation of Russian paradigms, gender assignment, the problematic genitive plural of nouns whose stems end in a 'soft' (palatalised or palatoalveolar) consonant, the question of related verbal stems, and whether aspectually paired verbs are separate lexemes (syntactic words).

Once we had developed theoretical solutions to the problematic cases, we built lexicons based on frequency information from Zasorina's (1977) frequency dictionary. Covering the most frequent items means covering the most irregular ones, since irregular items are nearly always among the most frequent.

This objective was met by creating lexicons of the 1500 most frequent nouns and the 700 most frequent verbs; these lexicons, combined with the theoretical fragment, have been checked computationally.

(2) Comparing inheritance hierarchies with the more traditional organisation in terms of morphological paradigms

This was a long-term goal of the project. Treating declension classes as nodes in an inheritance hierarchy contrasts strongly with the traditional notion of paradigms as discrete entities that do not share information. When default inheritance hierarchies are used to model word structure, it becomes clear that paradigms share a great deal of information.
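The contrast between discrete paradigms and shared information can be illustrated with a toy lookup procedure. The Python sketch below is not DATR and not the project's fragment; it mimics two DATR ideas in simplified form: a node may inherit from another node, and a query path not stated at a node is answered by the equation with the longest matching path prefix. The node names N, N_I and N_IV, the helper names and the reduced set of cells are hypothetical; the endings are the standard Russian ones for the declension classes of zakon 'law' and vino 'wine' (labelled I and IV here).

```python
from __future__ import annotations

from dataclasses import dataclass, field

# A query path such as ("pl", "dat"); the empty tuple () matches every path.
Path = tuple[str, ...]


@dataclass
class Node:
    """A simplified, DATR-like node (an illustrative sketch, not DATR)."""

    name: str
    # Left-hand path -> literal ending (str) or another Node to inherit from.
    equations: dict[Path, str | Node] = field(default_factory=dict)

    def query(self, path: Path) -> str:
        # Default inference: try the whole path, then ever shorter prefixes,
        # so the longest locally defined prefix wins. No cycle detection.
        for n in range(len(path), -1, -1):
            rhs = self.equations.get(path[:n])
            if rhs is None:
                continue
            if isinstance(rhs, Node):
                # Node inheritance: ask the parent node the same full path.
                return rhs.query(path)
            return rhs
        raise KeyError(f"{self.name}: no value for {path}")


# Hypothetical top node: plural oblique endings shared by all classes.
N = Node("N", {
    ("pl", "dat"): "am",
    ("pl", "inst"): "ami",
    ("pl", "loc"): "ax",
})

# Class I (zakon 'law'): anything not stated here is inherited from N.
N_I = Node("N_I", {
    (): N,
    ("sg", "nom"): "",
    ("sg", "gen"): "a",
    ("sg", "dat"): "u",
    ("pl", "nom"): "y",
    ("pl", "gen"): "ov",
})

# Class IV (vino 'wine'): defaults to class I, overriding only three cells.
N_IV = Node("N_IV", {
    (): N_I,
    ("sg", "nom"): "o",
    ("pl", "nom"): "a",
    ("pl", "gen"): "",
})

CELLS: list[Path] = [("sg", "nom"), ("sg", "gen"), ("sg", "dat"),
                     ("pl", "nom"), ("pl", "gen"), ("pl", "dat"),
                     ("pl", "inst"), ("pl", "loc")]


def paradigm(node: Node, stem: str, cells: list[Path] = CELLS) -> dict[Path, str]:
    """Build word forms by attaching each (possibly inherited) ending to the stem."""
    return {cell: stem + node.query(cell) for cell in cells}


if __name__ == "__main__":
    for cell, form in paradigm(N_IV, "vin").items():
        print(" ".join(cell), form)
```

In this sketch the plural dative, instrumental and locative endings are stated once, on N, and reach both classes by inheritance, while N_IV restates only the cells in which it differs from N_I. A fuller model would also need the remaining cases, animacy and stress.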

(3) Demonstrating new insights into specific, well-established problem areas of Russian morphology

Several difficult areas of Russian were modelled by the researchers using DATR: animacy (Corbett and Fraser 1993), gender and animacy assignment (Fraser and Corbett 1995), conflict in genitive plural assignment (Brown and Hippisley 1994), the nominal stress system (Brown et al., to appear) and the stem structure of the Russian verb (Brown, to appear).

(4) Evaluation of the usability and utility of DATR as a tool for linguists

The insights gained through the application of DATR to real data have been set out in a practical document entitled DATR for Linguists.

(5) The generation of design recommendations for improving the usability and utility of DATR for linguists

A document developed as part of the project, entitled Practical DATR, contains a number of recommendations for changes to DATR. It arose from close observation of the conceptual and practical problems encountered while applying DATR.

Using DATR to represent our analyses of Russian enabled us to take a fresh look at several problem areas of Russian morphology, including gender assignment, inflection classes, the genitive plural and verbal stems. It also allowed us to begin developing a theoretical framework, Network Morphology, which constrains possible DATR representations. A number of informal principles of this framework guided the computational modelling of Russian morphology in DATR. These include the assumption that morphology is a network of hierarchies, a distinction between a lexemic hierarchy and an inflectional hierarchy, and the treatment of inflection classes as nodes in the inflectional hierarchy.
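The division of labour between a lexemic hierarchy and an inflectional hierarchy can be sketched in the same hedged spirit, using gender as the example. The Python below assumes the familiar outline of Russian gender assignment (Corbett 1991): sex-differentiable nouns take masculine or feminine gender from their meaning, and other nouns take a default gender from their inflection class (I masculine, II and III feminine, IV neuter). The Lexeme record flattens the lexemic hierarchy to a single entry; its field names, the class labels and the gender function are hypothetical, and the logic is a plain default-with-override, not the DATR account reported in Fraser and Corbett (1995).

```python
from __future__ import annotations

from dataclasses import dataclass

# Inflectional side (simplified): default gender for each class label.
CLASS_GENDER = {"I": "masculine", "II": "feminine", "III": "feminine", "IV": "neuter"}

# Semantic rule for sex-differentiable nouns; it overrides the class default.
SEX_GENDER = {"male": "masculine", "female": "feminine"}


@dataclass(frozen=True)
class Lexeme:
    """Hypothetical lexemic entry (illustrative field names)."""

    gloss: str
    stem: str
    inflection_class: str    # link from the lexeme into the inflectional hierarchy
    sex: str | None = None   # semantic information held on the lexeme


def gender(lex: Lexeme) -> str:
    """Semantic assignment where it applies, otherwise the class default."""
    if lex.sex in SEX_GENDER:
        return SEX_GENDER[lex.sex]
    return CLASS_GENDER[lex.inflection_class]


if __name__ == "__main__":
    for lex in (Lexeme("law", "zakon", "I"),
                Lexeme("map", "kart", "II"),
                Lexeme("dad", "pap", "II", sex="male"),
                Lexeme("wine", "vin", "IV")):
        print(lex.gloss, lex.inflection_class, gender(lex))
```

Here papa 'dad' belongs to the same inflection class as karta 'map' but comes out masculine, because information stated on the lexeme overrides the default supplied through its inflection class.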

The project paved the way for the development of principles that constrain relations between the hierarchies. Because such principles were developed for a relatively complex language like Russian, it is reasonable to expect that they may carry over to the representation of typologically diverse languages.