Number use in language: A quantitative and typological investigation

Project Overview


Project members:

Prof Greville G. Corbett
Dr Dunstan Brown
Dr Andrew Hippisley
Dr Paul Marriot

Period of award

September 1997 - August 1998

Funding body

Economic and Social Research Council (ESRC)

This project investigated the relationship between how widely a grammatical category is available across languages and how speakers of a single language that has the category actually use it. The category examined was number (singular versus plural), and its use was analysed in Russian and, to a lesser extent, Slovene.

The research tested the applicability of Smith-Stark's hierarchy of number availability, an extended version of which is given in (1). In a given language, the nominals that mark number (formally distinguishing singular and plural) typically occupy a top segment of the hierarchy. Different languages make this 'split' at different points (e.g. in some languages only Speaker, Addressee, and Kin terms may mark number).

(1) Speaker > Addressee > Kin > Non-human rational > Human rational > Human non-rational > Animate > Concrete inanimate > Abstract inanimate
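Smith-Stark's claim can be read as a constraint on the hierarchy in (1): whatever categories mark number in a language should form an unbroken top segment, so a language "splits" the hierarchy at a single point. The following is a minimal Python sketch of that check, assuming the category labels in (1); the helper names `is_top_segment` and `split_after` are hypothetical.

```python
from __future__ import annotations

# Categories of the extended hierarchy in (1), highest first.
HIERARCHY = [
    "Speaker", "Addressee", "Kin", "Non-human rational", "Human rational",
    "Human non-rational", "Animate", "Concrete inanimate", "Abstract inanimate",
]

def is_top_segment(marked: set[str]) -> bool:
    """True if the number-marking categories form an unbroken top segment
    of the hierarchy, i.e. the language splits it at a single point."""
    unknown = marked - set(HIERARCHY)
    if unknown:
        raise ValueError(f"unknown categories: {sorted(unknown)}")
    # The first len(marked) categories must be exactly the marked ones.
    return set(HIERARCHY[: len(marked)]) == marked

def split_after(category: str) -> set[str]:
    """Number-marking categories for a language that splits just below `category`."""
    return set(HIERARCHY[: HIERARCHY.index(category) + 1])

# A split below Kin is permitted; marking Kin but not Addressee is not.
print(is_top_segment(split_after("Kin")))        # True
print(is_top_segment({"Speaker", "Kin"}))        # False
```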

The chief aim of the research was to establish to what extent Smith-Stark's hierarchy of number availability affects the way number is used. The general method was to analyse how each nominal in a one-million-word Russian corpus distributed its singular and plural forms, and to compare that distribution with the nominal's position on the Smith-Stark hierarchy.
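The per-nominal measure described above can be sketched as follows, under the assumption (suggested by the "plural/freq." heading of Table 1) that each nominal's plural proportion is its plural token count divided by its total frequency, and that each category is summarised by the median of those proportions. The class `Nominal`, the function `category_medians`, and the toy counts are hypothetical and invented for illustration; they are not drawn from the project's corpus.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass
from statistics import median

@dataclass
class Nominal:
    lemma: str
    category: str   # position on the hierarchy in (1)
    singular: int   # singular tokens in the corpus
    plural: int     # plural tokens in the corpus

    @property
    def plural_proportion(self) -> float:
        # plural / freq., where freq. = singular + plural tokens
        return self.plural / (self.singular + self.plural)

def category_medians(nominals: list[Nominal]) -> dict[str, float]:
    """Median plural proportion per category: one value per nominal, not per token."""
    by_cat: dict[str, list[float]] = defaultdict(list)
    for n in nominals:
        if n.singular + n.plural > 0:   # skip nominals with no attested tokens
            by_cat[n.category].append(n.plural_proportion)
    return {cat: median(props) for cat, props in by_cat.items()}

# Invented toy counts (transliterated lemmas), for illustration only.
toy = [
    Nominal("brat", "Kin", 90, 10),
    Nominal("mat'", "Kin", 95, 5),
    Nominal("sestra", "Kin", 97, 3),
    Nominal("stol", "Concrete inanimate", 70, 30),
    Nominal("kniga", "Concrete inanimate", 60, 40),
]
print(category_medians(toy))
```

Taking the median over nominals, rather than pooling tokens, keeps a handful of very frequent words from dominating a category's figure.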

The research was interdisciplinary, combining linguistics with statistics. Our analysis of the Russian corpus has demonstrated a clear relationship between a nominal's position on the Smith-Stark hierarchy and its use of singular and plural. The nature of this relationship is shown by the median proportion of plural occurrences among the nominals of each category, given in Table 1.

Animacy category    Singular forms  Plural forms  Median plural proportion (plural/freq.)  p value
Speaker             6197            3413          35.5%                                    0.53
Addressee           2600            205           8.7%                                     0.52
Kin                 3733            422           5%                                       0.05
Non-human rational  248             19            5.5%                                     0.52
Human rational      9427            7737          45.5%                                    < 0.001
Human non-rational  851             1181          61.8%                                    < 0.001
Animate             1588            1227          50%                                      < 0.001
Concrete inanimate  59830           26285         23%                                      < 0.001
Abstract inanimate  89875           28068         1.5%                                     < 0.001

Table 1: Plural proportions for the animacy categories
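The singular and plural columns of Table 1 are pooled token counts, whereas the proportion column is a median, so the two need not agree. The sketch below recomputes the pooled plural share (plural / (singular + plural)) from the table's own counts. For Abstract inanimate the pooled share comes to about 23.8%, against a median of 1.5%, which would be consistent with a minority of abstract nouns being used often in the plural while the typical abstract noun rarely is. The names `TABLE_1` and `pooled_plural_share` are hypothetical.

```python
# Counts and medians copied from Table 1: (singular forms, plural forms, median %).
TABLE_1 = {
    "Speaker": (6197, 3413, 35.5),
    "Addressee": (2600, 205, 8.7),
    "Kin": (3733, 422, 5.0),
    "Non-human rational": (248, 19, 5.5),
    "Human rational": (9427, 7737, 45.5),
    "Human non-rational": (851, 1181, 61.8),
    "Animate": (1588, 1227, 50.0),
    "Concrete inanimate": (59830, 26285, 23.0),
    "Abstract inanimate": (89875, 28068, 1.5),
}

def pooled_plural_share(singular: int, plural: int) -> float:
    """Share of all tokens in a category that are plural."""
    return plural / (singular + plural)

# Print pooled share next to the reported median for each category.
for category, (sg, pl, med) in TABLE_1.items():
    pooled = 100 * pooled_plural_share(sg, pl)
    print(f"{category:<20} pooled {pooled:5.1f}%   median {med:5.1f}%")
```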

The results were positive and strongly indicate a relationship between availability and use. The p value in Table 1 is the probability that chance variation alone would produce a median at least as far from the corpus value as the one observed. A value of 0.05 or less is conventionally taken as strong evidence that a category's median differs significantly from that of the corpus. By this criterion there is strong evidence of structure in most of the categories, while the evidence is weaker for Speaker, Addressee, and Non-human rational. However, a separate test for structure across all categories gave a p value of less than 0.001, so the medians differ significantly across the categories taken together, indicating structure in the data. Comparing Table 1 with the hierarchy in (1), the proportion of plurals falls from Speaker through Addressee to Kin, then rises through Non-human rational and Human rational to a peak at Human non-rational; it then falls through Animate and Concrete inanimate to Abstract inanimate, where the median proportion is below two plural forms in every hundred occurrences.
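The overview does not say which test produced the p values. One generic way to ask whether a category's median differs from the corpus is a resampling (permutation-style) test: draw many random sets of nominals of the same size from the whole corpus and count how often their median lies at least as far from the corpus median as the observed one. The sketch below implements that technique with the standard library only. It is not necessarily the project's method, and the function name `median_permutation_p` and the toy proportions are hypothetical.

```python
import random
from statistics import median

def median_permutation_p(category, corpus, n_resamples=10_000, seed=0):
    """Two-sided Monte Carlo p value for 'the category median differs from the
    corpus median', by resampling category-sized subsets of the corpus."""
    rng = random.Random(seed)
    centre = median(corpus)
    observed = abs(median(category) - centre)
    k = len(category)
    extreme = 0
    for _ in range(n_resamples):
        sample = rng.sample(corpus, k)   # k nominals drawn without replacement
        if abs(median(sample) - centre) >= observed:
            extreme += 1
    # Add-one correction keeps the estimated p value strictly above zero.
    return (extreme + 1) / (n_resamples + 1)

# Invented per-nominal plural proportions, for illustration only.
high = [0.95] * 10                              # a strongly plural-prone category
corpus = [i / 100 for i in range(90)] + high    # the category is part of the corpus
print(median_permutation_p(high, corpus))       # small: unlikely by chance
```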

The results of our analysis are statistically significant and represent a typology of number use. Outputs include a statistical model of this number use typology.