Combining gender and classifiers in natural language

Project Overview



Project members:

Prof Greville G. Corbett
Dr Matthew Baerman
Dr Dunstan Brown (University of York)
Dr Sebastian Fedden
Dr Timothy Feist


Prof Maria Polinsky (University of Maryland)
Prof Gunter Senft (Max Planck Institute for Psycholinguistics, Nijmegen)

Period of award

April 2013 - May 2016


Arts and Humanities Research Council (AHRC)

In many languages, nouns are systematically categorized into groups. In a gender system, as in Italian, this is based on sex: all nouns are treated as either masculine or feminine — even those nouns whose meaning has nothing to do with biological sex. Quite a different approach is taken by languages with a classifier system. Here categorization is based on fine-grained meaning, involving shape, function, arrangement, place or time interval. Such a language is Kilivila (an Oceanic language spoken on the Trobriand Islands in Papua New Guinea), which has at least 177 distinct categories. For the most part a language will have only one system or the other, gender or classifiers, but in a few interesting cases we find both systems together. A key language for this project is Mian, a Papuan language spoken by 1,400 people in Papua New Guinea. Mian has both a gender system and a system of verbal classifiers in the form of prefixes on verbs of object handling (e.g. give, take, put, lift, throw) or object movement (fall). How such fundamentally different systems of categorization interact within a single language is a question which has not yet been seriously considered, but it potentially uncovers a great deal about the interaction of semantics, morphology, and cognitive categories in general.

Preliminary investigations suggest that dual categorization systems are indeed more than the sum of their parts. Canonical gender systems and canonical classifier systems occur in very different types of languages: gender is characteristic of inflecting languages, while classifiers are characteristic of languages that lack complex inflectional morphology. Where the two co-occur in a single language, these opposing tendencies conflict, and the resulting systems deviate from what we would expect of either type alone. Such languages are limited in their distribution and are found largely in Oceania and the Americas. For our typology of languages with dual categorization systems we concentrate on the following language groupings: Papuan (specifically Mian and Tidore), Witotoan (Miraña), Mayan (Akatek), Arawakan (Tariana), Tucanoan (Retuarã), and Australian (Ngan'gityemerri), complemented by a broader sample based on the World Atlas of Language Structures.

A further fascinating question we investigate is which principles determine how nouns are grouped into their respective categories in languages which combine gender and classifiers. Little is known about how they interact in languages where each noun has dual category affiliation. For example, in Mian nouns are categorized simultaneously by gender (distinguishing masculine, feminine, neuter type 1 and neuter type 2) and classifiers (distinguishing such categories as bundles, long things and covering things). These two systems partly overlap, but we still lack a precise way of characterizing the relative contribution of each system and the way in which they interact. This is where the rigorous computational model that we propose can bring clarity to this complex network of factors. Using a computational model further allows us to evaluate the relative degree of success of different candidate analyses based on particular rule systems.
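As a rough illustration of what comparing candidate analyses could look like, the sketch below scores two rule systems by how many nouns each assigns to the attested gender. It is a toy under stated assumptions: the lexicon entries, the `sex` feature, and both rule systems are invented for illustration. They are not Mian data and do not represent the project's Network Morphology model. Only the four gender labels are taken from the description above.

```python
from typing import Callable, Dict, List, Optional, Tuple

Noun = Dict[str, Optional[str]]

# Hypothetical toy lexicon: (noun features, attested gender).
# Invented for illustration only; not real data from any language.
LEXICON: List[Tuple[Noun, str]] = [
    ({"gloss": "man", "sex": "male"}, "masculine"),
    ({"gloss": "woman", "sex": "female"}, "feminine"),
    ({"gloss": "item_1", "sex": None}, "neuter 1"),
    ({"gloss": "item_2", "sex": None}, "neuter 2"),
]


def analysis_a(noun: Noun) -> str:
    """Candidate A (hypothetical): sex-based rules, neuter 1 as default."""
    if noun["sex"] == "male":
        return "masculine"
    if noun["sex"] == "female":
        return "feminine"
    return "neuter 1"


def analysis_b(noun: Noun) -> str:
    """Candidate B (hypothetical): one default gender, no semantic rules."""
    return "neuter 1"


def accuracy(assign: Callable[[Noun], str],
             lexicon: List[Tuple[Noun, str]]) -> float:
    """Share of nouns whose predicted gender matches the attested one."""
    correct = sum(1 for noun, gender in lexicon if assign(noun) == gender)
    return correct / len(lexicon)


for name, rules in [("A", analysis_a), ("B", analysis_b)]:
    print(f"analysis {name}: {accuracy(rules, LEXICON):.2f}")
```

On this toy lexicon, candidate A scores higher than candidate B because its semantic rules account for two nouns that the bare default misses. A real evaluation would also need to weigh the number and complexity of the rules, not only coverage.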

An important function of any nominal categorization system is reference tracking, namely the various ways in which speakers maintain a link to the elements introduced into discourse. A typical means is the use of anaphoric pronouns, whose form matches the category (gender or classifier type) of the noun they refer back to. In a language with dual categorization, both types potentially play a role simultaneously in reference tracking. Since so little is known about how this works, we study this issue in the context of actual discourse, through corpus studies, using four languages from four different language families. These are: a language with only gender (Italian, Indo-European), a language with only classifiers (Kilivila, Austronesian), a language with both gender and classifiers (Mian, Papuan), and a language with no clearly identifiable system of either kind (Yup'ik, Eskimo-Aleut). This yields an exhaustive typology with respect to the presence or absence of gender and classifiers.
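The four-language sample can be read as a cross-classification of two binary features, gender and classifiers. The minimal sketch below checks that every combination is covered. The languages and families are those named above; the dictionary layout and variable names are illustrative assumptions.

```python
from itertools import product

# Keys are (has_gender, has_classifiers); values are the sample
# languages named in the text above.
SAMPLE = {
    (True, False): "Italian (Indo-European)",
    (False, True): "Kilivila (Austronesian)",
    (True, True): "Mian (Papuan)",
    (False, False): "Yup'ik (Eskimo-Aleut)",
}

# Every logically possible combination of the two features.
all_types = list(product([True, False], repeat=2))

# Combinations with no language in the sample (expected to be empty).
missing = [t for t in all_types if t not in SAMPLE]

for has_gender, has_classifiers in all_types:
    language = SAMPLE.get((has_gender, has_classifiers), "(none)")
    print(f"gender={has_gender!s:5} classifiers={has_classifiers!s:5} {language}")

print("uncovered combinations:", missing)
```

Two binary features give four possible types, and the sample supplies one language for each, which is the sense in which the typology is exhaustive.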

The project uses Canonical Typology to gauge how canonical the individual categorization systems in the languages under investigation are, and Network Morphology to model category assignment. Both of these theoretical approaches were developed by the Surrey Morphology Group.