Databases

The Surrey Morphology Group have produced a large number of digital language resources and freely accessible typological databases that cover a wide range of linguistic phenomena across a broad spectrum of languages.

SMG Zenodo community

Find archived data from our databases on our Zenodo community. Zenodo is a multi-disciplinary open repository maintained by CERN. Datasets, documents and other research materials can be located via the Zenodo search engine. It is compliant with the data management requirements of Horizon Europe, the ERC, and the Plan S requirements for Open Access Repositories.

Paralex Standard

Paralex is a standard for morphological lexicons that document inflectional paradigms. It strives to provide data that is FAIR, so it can be used automatically; CARE, so it respects and empowers language communities; and DeAR (our own set of principles), so we can create a virtuous data ecosystem. A Paralex lexicon is a set of tables that follow shared conventions and are accompanied by Frictionless metadata.
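
As an illustration of "tables plus metadata", the following Python sketch (standard library only) writes a one-table toy lexicon and a Frictionless-style `datapackage.json` descriptor beside it. The column names, example rows, package name and file names are assumptions chosen for the example, not the full Paralex conventions; the Paralex documentation defines the actual tables and columns a lexicon must provide.

```python
import csv
import json
from pathlib import Path

# Hypothetical example rows (assumed column names): one row per inflected form.
FIELDS = ["form_id", "lexeme", "cell", "phon_form"]
FORMS = [
    {"form_id": "walk_prs3sg", "lexeme": "walk", "cell": "prs.3sg", "phon_form": "wɔːks"},
    {"form_id": "walk_pst", "lexeme": "walk", "cell": "pst", "phon_form": "wɔːkt"},
]


def write_lexicon(out_dir: Path) -> Path:
    """Write a forms table plus a minimal Data Package descriptor; return the descriptor path."""
    out_dir.mkdir(parents=True, exist_ok=True)

    # The table itself: plain UTF-8 CSV with a header row.
    with (out_dir / "forms.csv").open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(FORMS)

    # The metadata: one tabular resource whose schema declares each column.
    descriptor = {
        "name": "toy-lexicon",
        "resources": [
            {
                "name": "forms",
                "path": "forms.csv",
                "profile": "tabular-data-resource",
                "schema": {
                    "fields": [{"name": name, "type": "string"} for name in FIELDS],
                    "primaryKey": "form_id",
                },
            }
        ],
    }
    package_path = out_dir / "datapackage.json"
    package_path.write_text(
        json.dumps(descriptor, ensure_ascii=False, indent=2), encoding="utf-8"
    )
    return package_path
```

Calling `write_lexicon(Path("toy_lexicon"))` creates `forms.csv` and `datapackage.json` in that directory, so the table always travels with a machine-readable description of its columns.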

Chichimec Paradigm Visualisations

The inflectional morphology of Chichimec, an Oto-Manguean language of Mexico, involves complex patterns of stem alternation which may co-occur with tonal contrasts. These interactive visualisations allow users to rearrange the paradigm and view the effects of each process in isolation, illustrating how technology can aid us in viewing – and understanding – complex linguistic data.

Nuer Lexicon

Nuer is a Nilo-Saharan language spoken in South Sudan and Ethiopia. Nuer Lexicon is the first interactive online dictionary of this major language of East Africa. Users can search in Nuer or English, listen to examples from different dialects and explore noun and verb paradigms in Nuer orthography and IPA using tools that reveal different morphological patterns.

Database of Prominent Internal Possessors

A Prominent Internal Possessor is a possessor that behaves, fully or partially, as if it were a clause-level element and the head of its own phrase, even though there is no independent evidence that it is external to the nominal phrase to which the possessed item belongs. Such possessors can be said to exhibit a higher level of syntactic (and possibly functional) prominence than their regular counterparts.

Lexical Splits Database

A lexical split is an inconsistency in the paradigm of an inflected word. The splits in this cross-linguistic database cover a broad range of phenomena. In many cases, lexemes exhibit multiple splits which cross-cut each other; these ‘component splits’ are clearly illustrated in the database to allow users to see how the different factors contribute to the overall surface complexity of words.
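
One way to picture cross-cutting component splits, offered as an illustrative model rather than the database's own encoding, is to treat each split as a partition of the paradigm's cells: the overall surface pattern is then the common refinement of those partitions, i.e. the groups of cells that fall on the same side of every split. The cell labels and both splits below are hypothetical.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

# Hypothetical model: a split labels each cell with the side of the split it falls on.
Split = Dict[str, str]


def combine_splits(splits: List[Split]) -> Set[FrozenSet[str]]:
    """Group cells that fall on the same side of every split (the common refinement)."""
    cells = set(splits[0])
    if any(set(split) != cells for split in splits):
        raise ValueError("all splits must label the same cells")
    blocks: Dict[Tuple[str, ...], Set[str]] = {}
    for cell in cells:
        # A cell's signature records its side of each split in turn.
        signature = tuple(split[cell] for split in splits)
        blocks.setdefault(signature, set()).add(cell)
    return {frozenset(block) for block in blocks.values()}


# Two hypothetical component splits over a four-cell paradigm.
stem_split = {"sg.nom": "stem1", "sg.obl": "stem1", "pl.nom": "stem2", "pl.obl": "stem2"}
suffix_split = {"sg.nom": "set_a", "sg.obl": "set_a", "pl.nom": "set_a", "pl.obl": "set_b"}

print(sorted(sorted(block) for block in combine_splits([stem_split, suffix_split])))
# [['pl.nom'], ['pl.obl'], ['sg.nom', 'sg.obl']]
```

Each split on its own divides the paradigm in two, but together they yield three distinct groups, which is the kind of interaction the component-split view is meant to make visible.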

Endangered Languages and Cultures of Siberia

This multimedia collection of linguistic and cultural information documents a wide range of phenomena from Siberian languages and cultures. It consists of a corpus of original texts in several minority languages, in audio or video format with transcriptions, translations and analyses, together with general information about each language, references, dictionaries and images.

Skolt Saami Paradigm Visualisations

The inflectional morphology of Skolt Saami, a Finno-Ugric language of Finland, involves three morphophonological processes which cross-cut each other, giving rise to multiple stem allomorphs. These interactive visualisations allow users to rearrange the paradigm and view the effects of each process in isolation, illustrating how technology can aid us in viewing – and understanding – complex linguistic data.
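
Why three cross-cutting processes give rise to multiple stem allomorphs comes down to simple combinatorics. Under the simplifying assumption that each process either applies or does not apply in a given cell, independently of the others, and that the stem shape is fully determined by which processes apply, there are at most 2³ = 8 possible stem types. The process names below are placeholders, and the real Skolt Saami system need not realise every combination.

```python
from itertools import product

# Placeholder names standing in for three cross-cutting morphophonological processes.
PROCESSES = ("process_A", "process_B", "process_C")


def possible_stem_types(processes):
    """Enumerate every on/off combination of the given processes."""
    return [
        frozenset(p for p, applies in zip(processes, flags) if applies)
        for flags in product((False, True), repeat=len(processes))
    ]


stems = possible_stem_types(PROCESSES)
print(len(stems))  # 8
```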

The Mian & Kilivila Collection

This online exhibition of pictures and cultural artefacts relating to the Austronesian language Kilivila and the Papuan language Mian explores the threatened cultures and categorisation systems of these two endangered languages of Melanesia. While Kilivila has an extensive system of classifiers with a great number of distinctions, Mian has a dual system that combines four genders and six classifiers.

Oto-Manguean Inflectional Class Database

The inflectional morphology of Oto-Manguean languages can be realised by a rich array of morphological forms within a single word, resulting in some of the world's most complex morphological systems. The database contains over 13,000 verbal entries from twenty Oto-Manguean languages, along with information pertaining to each verb's inflectional class membership.
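
At its simplest, inflectional class membership groups together verbs that share the same exponent in every paradigm cell. The sketch below does exactly that on made-up data; the verb labels, cells and exponents are hypothetical, and because it assumes a single exponent per cell it cannot capture the multiple co-occurring exponents within a word described above.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical data: for each verb, the exponent marking each paradigm cell.
VERBS: Dict[str, Dict[str, str]] = {
    "verb1": {"1sg": "-a", "2sg": "-i", "3sg": "-e"},
    "verb2": {"1sg": "-a", "2sg": "-i", "3sg": "-e"},
    "verb3": {"1sg": "-o", "2sg": "-i", "3sg": "-u"},
}


def inflection_classes(verbs: Dict[str, Dict[str, str]]) -> List[List[str]]:
    """Group verbs whose cell-by-cell exponents are identical."""
    classes = defaultdict(list)
    for verb, paradigm in verbs.items():
        # Sorting the (cell, exponent) pairs gives an order-independent class key.
        pattern = tuple(sorted(paradigm.items()))
        classes[pattern].append(verb)
    return [sorted(members) for members in classes.values()]


print(inflection_classes(VERBS))  # [['verb1', 'verb2'], ['verb3']]
```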

Dictionary of Archi

Archi is a Lezgic language spoken by about 1,200 people in the highlands of Daghestan. The online version of the Archi-Russian-English Dictionary contains sound files, digital pictures of culturally significant objects, idioms and example sentences with interlinear glossing. It can be searched in English, Russian and Archi (using Cyrillic or IPA).

Surrey Morphological Complexity Database

Morphological complexity, the morphologically conditioned deviation between inflectional forms and the inflectional features they realise, is manifested both within the paradigm (e.g. as syncretism or patterns of stem alternation) and across sets of lexemes (e.g. as inflection classes and lexically conditioned allomorphy).

Saanich Verb Database

SENĆOŦEN is a Salish language spoken by the Saanich First Nations community of Vancouver Island, Canada. The Saanich Verb Database is a searchable resource of SENĆOŦEN data provided by four Saanich elders between 2005 and 2012.

Grammatical Features Inventory

Features are fundamental components of linguistic description that have proven invaluable for grammatical analysis and play a major role in contemporary linguistics. The Grammatical Features Inventory provides evidence for the diverse content of features in the world's languages, together with discussion of some of their formal properties.

Surrey Periphrasis Database

Periphrasis reveals how the construction of meaning in language is apportioned between morphology ('bright' and 'brighter') and syntax ('intelligent' and 'more intelligent'). The Surrey Periphrasis Database systematically catalogues data from a sample of 19 languages in a fully structured way to help explore the role of periphrasis in inflectional paradigms.

Surrey Deponency Databases

Deponency describes mismatches between morphology and morphosyntax. A mismatch occurs where a word form is used in a function incompatible with its normal function. The Typological Database on Deponency records the logical space of deponency: what features may be affected, and what are the characteristics of the resulting paradigm? The Cross-linguistic Database on Deponency looks at the presence of morphological mismatches in a controlled sample of genetically and geographically diverse languages.
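
A minimal way to record such a mismatch is to store, for each word form, the value its morphology signals and the value its function requires, and to flag forms where the two differ. The sketch uses the textbook Latin deponent verb loquor 'I speak', which has passive morphology but active meaning, alongside regular amo 'I love' and amor 'I am loved'. The class and field names are hypothetical, and the model looks at voice only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WordForm:
    form: str
    gloss: str
    morphology: str  # voice signalled by the form's inflection
    function: str  # voice required by the construction the form is used in

    @property
    def is_mismatch(self) -> bool:
        """True when the form's morphology does not match its function."""
        return self.morphology != self.function


EXAMPLES = [
    WordForm("amo", "I love", morphology="active", function="active"),
    WordForm("amor", "I am loved", morphology="passive", function="passive"),
    WordForm("loquor", "I speak", morphology="passive", function="active"),
]

for word in EXAMPLES:
    print(word.form, word.is_mismatch)
# amo False
# amor False
# loquor True
```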

Surrey Defectiveness Databases

The term 'defectiveness' refers to gaps in inflectional paradigms which do not follow from natural restrictions imposed by meaning or function. The Typological Database on Defectiveness illustrates different types of defective paradigm according to various morphological and morphosyntactic parameters. The Cross-linguistic Database on Defectiveness looks at inflectional defectiveness in a controlled sample of languages.
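
Finding candidate gaps is mechanical once an expected set of cells is fixed; deciding whether a gap follows from meaning or function, and so is not defectiveness in the sense above, is an analytical judgement that code cannot make. This sketch uses the well-known Russian example победить 'win, defeat', whose non-past paradigm lacks an accepted first person singular form. The function name and cell labels are hypothetical.

```python
PERSON_NUMBER_CELLS = ("1sg", "2sg", "3sg", "1pl", "2pl", "3pl")

# Non-past forms of Russian победить; the 1sg cell has no accepted form.
POBEDIT = {
    "2sg": "победишь",
    "3sg": "победит",
    "1pl": "победим",
    "2pl": "победите",
    "3pl": "победят",
}


def gaps(paradigm, cells=PERSON_NUMBER_CELLS):
    """Return the expected cells for which the paradigm supplies no form."""
    return [cell for cell in cells if not paradigm.get(cell)]


print(gaps(POBEDIT))  # ['1sg']
```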

Surrey Suppletion Database

Suppletion is a morphological phenomenon in which different inflectional forms of the same sign are maximally regular in their semantics, yet maximally irregular in form. For a sample of 34 languages, the Surrey Suppletion Database encodes phonologically distinct stems that belong to the same paradigm and defines the categories along which suppletion occurs.

Surrey Short Term Morphosyntactic Change Databases

The notion of 'short term morphosyntactic change' can be used to characterise changes in the use of forms over a short period of time, even when the forms themselves have changed relatively little. The Short Term Morphosyntactic Change (STMC) Databases explore change in six morphosyntactic phenomena in Russian over the 200-year period from 1801 to 2000.

Surrey Database of Agreement

Agreement is the expression of grammatical information in the ‘wrong place’: a relation that can be described in terms of controllers, targets, domains, categories and conditions. The Surrey Database of Agreement encodes information on agreement in fifteen genetically diverse languages and contains reports for the sample languages, providing pointers to examples illustrating different instances of the phenomenon.
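
The five-way description of agreement lends itself to a simple record type. This sketch, whose class and field names are hypothetical and simply mirror the terms above, encodes the English noun phrase these books, where the demonstrative target agrees in number with its noun controller; it is not the database's actual schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class AgreementRelation:
    controller: str  # element that determines the agreement
    target: str  # element whose form varies to agree
    domain: str  # syntactic configuration in which agreement holds
    categories: List[str]  # features whose values are matched, e.g. number
    conditions: List[str] = field(default_factory=list)  # factors affecting agreement, if any


example = AgreementRelation(
    controller="books",
    target="these",
    domain="noun phrase",
    categories=["number"],
)
print(example)
```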

Surrey Turning Owners into Actors Database

In the languages surrounding the Bougainville region of Papua New Guinea, possessive morphology marking owners or custodians may serve as a source for subject-indexing morphology marking actors or agents. The Turning Owners into Actors Database encodes data on nine linguistic phenomena in eight languages: Bannoni, Halia, Kokota, Nehan, Sisiqa, Solos, Torau and Vangunu.

Surrey Person Syncretism Database

Person syncretism occurs when two or more person values are represented by a single form in a verb's inflectional paradigm for agreement with an argument. The Surrey Person Syncretism Database records properties which might be conditioning factors for syncretism (such as TAM, inflectional class, gender of the subject and syntactic context) in a sample of 111 languages.
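
Detecting candidate syncretisms amounts to inverting a paradigm: map each form back to the cells it realises and keep the forms that realise more than one cell. This sketch uses the English present tense of walk. It treats any identity of form as syncretism and records none of the conditioning factors listed above, so it is a starting point rather than a model of the database.

```python
from collections import defaultdict

# English present-tense forms of "walk" by person and number.
WALK_PRESENT = {
    "1sg": "walk", "2sg": "walk", "3sg": "walks",
    "1pl": "walk", "2pl": "walk", "3pl": "walk",
}


def syncretisms(paradigm):
    """Map each form to the cells it realises, keeping only forms shared by two or more cells."""
    cells_by_form = defaultdict(list)
    for cell, form in paradigm.items():
        cells_by_form[form].append(cell)
    return {form: cells for form, cells in cells_by_form.items() if len(cells) > 1}


print(syncretisms(WALK_PRESENT))
# {'walk': ['1sg', '2sg', '1pl', '2pl', '3pl']}
```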

Surrey Syncretisms Database

The term 'syncretism' refers to the phenomenon whereby a single form fulfils two or more different functions within the inflectional morphology of a language. The Surrey Syncretism Database encodes information on inflectional syncretism in 30 genetically and geographically diverse languages, across morphosyntactic features such as case, person, number and gender.

Annotated Bibliographies

As part of the research conducted within the Surrey Morphology Group, a substantial number of annotated and working bibliographies have been produced, covering different methodological approaches, language families and morphological properties.

Paradigms in Use

Eight Excel datasets reporting the frequencies of Russian nominals in two corpora. These were created for the ESRC-funded project 'Paradigms in use' (RES-000-23-0082).
