Number use in language: A quantitative and typological investigation
Project Overview
Project members:
Prof Greville G. Corbett
Dr Dunstan Brown
Dr Andrew Hippisley
Dr Paul Marriot
Period of award:
September 1997 - August 1998
Funder:
Economic and Social Research Council (ESRC)
This project investigated the relationship between the availability of a grammatical category across languages and the way that category is used by speakers of a single language in which it is generally available. The category examined was number (singular and plural), and the language whose usage was analysed was Russian (and, to a lesser extent, Slovene).
The research investigated the applicability of Smith-Stark's hierarchy of number availability, an extended version of which is given in (1). Nominals that mark number (formally distinguishing singular from plural) typically occupy the top segment of the hierarchy. Different languages make the 'split' at different points on the hierarchy (e.g. in a language that splits below Kin, only Speaker, Addressee, and Kin terms mark number).
(1) Speaker > Addressee > Kin > Non-human rational > Human rational > Human non-rational > Animate > Concrete inanimate > Abstract inanimate
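The idea of a split can be made concrete with a small sketch: the hierarchy in (1) is treated as an ordered list, and a language is modelled by the lowest category that still marks number. The function name `marks_number` and the single-split-point model are illustrative assumptions, not part of the project's materials.

```python
# Extended Smith-Stark hierarchy from (1), highest position first.
HIERARCHY = [
    "Speaker",
    "Addressee",
    "Kin",
    "Non-human rational",
    "Human rational",
    "Human non-rational",
    "Animate",
    "Concrete inanimate",
    "Abstract inanimate",
]


def marks_number(category: str, split_after: str) -> bool:
    """Return True if `category` is at or above the split point.

    `split_after` is the lowest category that still marks number in a
    (hypothetical) language; every category below it is treated as unmarked.
    """
    return HIERARCHY.index(category) <= HIERARCHY.index(split_after)


if __name__ == "__main__":
    # The example split from the text: only Speaker, Addressee and Kin mark number.
    marked = [c for c in HIERARCHY if marks_number(c, "Kin")]
    print(marked)  # ['Speaker', 'Addressee', 'Kin']
```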
The chief aim of the research was to establish to what extent Smith-Stark's hierarchy of number availability influences the way number is used. The general methodology was to analyse how the nominals of a one-million-word Russian corpus distribute their singular and plural forms, and to compare that distribution with each nominal's position on the Smith-Stark hierarchy.
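A minimal sketch of the per-nominal measure, assuming that a nominal's plural proportion is its plural tokens divided by its total tokens, and that each category is summarised by the median of these proportions across its nominals. The lemmas, counts, and helper names below are hypothetical and are not project data.

```python
from collections import defaultdict
from statistics import median

# Hypothetical token counts per nominal: (lemma, category, singular, plural).
# Invented for illustration only; these are not figures from the corpus.
COUNTS = [
    ("mat'", "Kin", 120, 6),
    ("brat", "Kin", 80, 20),
    ("sestra", "Kin", 60, 3),
    ("stol", "Concrete inanimate", 50, 25),
    ("kniga", "Concrete inanimate", 90, 60),
]


def plural_proportion(singular: int, plural: int) -> float:
    """Share of a nominal's tokens that are plural: plural / (singular + plural)."""
    return plural / (singular + plural)


def median_by_category(rows):
    """Median of the per-nominal plural proportions within each category."""
    groups = defaultdict(list)
    for _lemma, category, singular, plural in rows:
        if singular + plural > 0:  # a nominal with no tokens has no proportion
            groups[category].append(plural_proportion(singular, plural))
    return {category: median(props) for category, props in groups.items()}


if __name__ == "__main__":
    for category, med in median_by_category(COUNTS).items():
        print(f"{category}: median plural proportion {med:.1%}")
```

Taking the median weights each nominal equally, so a handful of very frequent nouns cannot dominate the figure for their category.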
The research was interdisciplinary, combining linguistics with statistics. Our analysis of the Russian corpus has demonstrated a clear relationship between a nominal's position on the Smith-Stark hierarchy and its use of singular and plural. The nature of this relationship is revealed by the proportion of plural occurrences found in the corpus for nominals in each category. The proportions are shown in Table 1.
Animacy category | Singular forms | Plural forms | Median plural proportion (plural / total frequency) | p value |
---|---|---|---|---|
Speaker | 6197 | 3413 | 35.5% | 0.53 |
Addressee | 2600 | 205 | 8.7% | 0.52 |
Kin | 3733 | 422 | 5% | 0.05 |
Non-human rational | 248 | 19 | 5.5% | 0.52 |
Human rational | 9427 | 7737 | 45.5% | < 0.001 |
Human non-rational | 851 | 1181 | 61.8% | < 0.001 |
Animate | 1588 | 1227 | 50% | < 0.001 |
Concrete inanimate | 59830 | 26285 | 23% | < 0.001 |
Abstract inanimate | 89875 | 28068 | 1.5% | < 0.001 |
Table 1: Plural proportions for the animacy categories
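The median column is not the same as the plural share obtained by pooling a category's counts. The sketch below computes the pooled share from the two count columns of Table 1, assuming those columns are token totals for each category; the function name is hypothetical.

```python
# Singular and plural counts per category, copied from Table 1.
TABLE_1 = {
    "Speaker": (6197, 3413),
    "Addressee": (2600, 205),
    "Kin": (3733, 422),
    "Non-human rational": (248, 19),
    "Human rational": (9427, 7737),
    "Human non-rational": (851, 1181),
    "Animate": (1588, 1227),
    "Concrete inanimate": (59830, 26285),
    "Abstract inanimate": (89875, 28068),
}


def pooled_plural_share(singular: int, plural: int) -> float:
    """Plural share of all tokens in a category, pooled across its nominals."""
    return plural / (singular + plural)


if __name__ == "__main__":
    for category, (singular, plural) in TABLE_1.items():
        print(f"{category:20} {pooled_plural_share(singular, plural):6.1%}")
```

For Speaker the pooled share (35.5%) coincides with the median, but for Abstract inanimate the pooled share is about 23.8% against a median of 1.5%. Pooling lets the most frequent nominals dominate, whereas the median counts each nominal once.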
The results were positive and strongly indicate a relationship between availability and use. The p value in Table 1 is the probability of obtaining a median at least as far from the corpus-wide value as the one observed, if the category's nominals in fact behaved like the corpus as a whole; small values therefore indicate structure. (A value of 0.05 or less is conventionally taken as evidence that the category's median differs significantly from that of the corpus.) There is very strong evidence of structure in most of the categories: p is below 0.001 for every category from Human rational to Abstract inanimate, and 0.05 for Kin. The evidence is weaker for Speaker, Addressee, and Non-human rational, whose p values lie just above 0.5; however, a separate test for structure across all categories together gave a value of less than 0.001. Comparing the results in Table 1 with the hierarchy in (1), the median proportion of plurals falls from Speaker (35.5%) through Addressee (8.7%) to Kin (5%), then rises steadily through Non-human rational and Human rational to a peak at Human non-rational (61.8%). It then falls steadily through Animate and Concrete inanimate to Abstract inanimate, where the median proportion is 1.5%, i.e. fewer than two plural occurrences in every hundred occurrences of a nominal.
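Table 1 does not state which test produced its p values. One standard way to ask whether a category's median differs from a reference value is the exact two-sided sign test, sketched below under two assumptions: the reference is the corpus-wide median plural proportion, and each nominal contributes one observation. The function name `sign_test_p` and the sample proportions are hypothetical.

```python
from math import comb


def sign_test_p(proportions, reference_median):
    """Exact two-sided sign test of H0: the sample median equals reference_median.

    Each nominal's plural proportion is scored as above or below the reference;
    ties are discarded, as is usual for the sign test.
    """
    above = sum(1 for p in proportions if p > reference_median)
    below = sum(1 for p in proportions if p < reference_median)
    n = above + below
    if n == 0:
        return 1.0  # no informative observations
    k = min(above, below)
    # P(X <= k) for X ~ Binomial(n, 0.5), doubled for two sides, capped at 1.
    lower_tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * lower_tail)


if __name__ == "__main__":
    # Hypothetical category: ten nominals, all below a reference median of 0.2.
    props = [0.01, 0.02, 0.0, 0.05, 0.03, 0.015, 0.04, 0.0, 0.02, 0.01]
    print(sign_test_p(props, 0.2))  # 0.001953125
```

Because it uses only the direction of each difference, the sign test needs no assumption about the shape of the distribution, but with only a handful of nominals in a category it cannot produce small p values.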
The results of our analysis are statistically significant across the categories as a whole and amount to a typology of number use. Outputs include a statistical model of this number-use typology.