Pemandangan, Minahasa Regency, Indonesia, by T. Brickell (2013), CC-BY-NC-SA 4.0

Tondano

Brickell, Timothy C.

The Toulour dialect of Tondano (ISO 639-3: tdn) is an Austronesian (Malayo-Polynesian > Philippine > Minahasa > North > Northeast) language spoken in and to the east of the town of Tondano, in the Minahasa regency of North Sulawesi, Indonesia. Speaker numbers are difficult to ascertain; however, earlier estimates of 70,000 (Sneddon 1975:1) and 91,000 (Wurm & Hattori 1981) are now almost certainly too high. All the Minahasan languages are endangered and have been shifting since the early 20th century to Manado Malay (ISO 639-3: xmm), the most commonly used language of wider communication (Wolff 2010:299). Anecdotal evidence and the researcher's personal experience suggest that an upper figure of 30,000 fluent speakers is more accurate.

Tondano is not dominant in any domain of use, and is rarely used in everyday settings such as workplaces, markets, or the home. The last domain in which Tondano use remained strong was traditional agricultural work. However, with almost all remaining fluent speakers now aged 50 and above, this is changing as speakers stop working in the fields. In contemporary society the language has little more than a token role in certain cultural settings such as church services, weddings, or occasional speech contests in which people read from pre-prepared texts.

This data corpus is the result of fieldwork undertaken by Dr. Timothy Brickell between 2011 and 2015 as part of his PhD candidature at La Trobe University, Melbourne, Australia. The speakers recorded were of both genders, of various ages, and from a number of professions, with many older speakers already retired. The texts in Multi-CAST are a subset of the 20 recordings made by Brickell. In some recordings speakers discuss a topic chosen just before recording; in others they talk while engaging in traditional activities; and in others they narrate an elicitation video depicting other community members carrying out traditional cultural activities.

Cite

Brickell, Timothy C. 2016. Tondano. In: Haig, Geoffrey & Schnell, Stefan (eds.), Multi-CAST (Multilingual Corpus of Annotated Spoken Texts), https://lac.uni-koeln.de/multicast-tondano/, date accessed.

Badab-e Surt, Mazandaran, Iran, by M. Samaee (2010), CC-BY 3.0
http://www.panoramio.com/user/3163512?with_photo_id=41442092

Persian

Adibifar, Shirin

Persian (ISO 639-3: pes) is an Iranian language with official variants spoken in Iran, Afghanistan, and parts of Tajikistan; the variety spoken in Iran is also referred to as Farsi. The texts in this corpus are narrative retellings of the Pear film (Chafe 1980), a short film roughly five minutes long about a boy who steals fruit that a man has been picking. The recordings were made by Shirin Adibifar in Tehran and at locations in Mazandaran province in 2015. Of the 29 speakers, 17 are female and 12 male; the median age is 25, with a range of 20 to 39. All speakers have received at least some university-level education.

Cite

Adibifar, Shirin. 2016. Persian. In: Haig, Geoffrey & Schnell, Stefan (eds.), Multi-CAST (Multilingual Corpus of Annotated Spoken Texts), https://lac.uni-koeln.de/multicast-persian/, date accessed.

Multi-CAST: Overview

Geoffrey Haig, Stefan Schnell

Multi-CAST (Multilingual Corpus of Annotated Spoken Texts) is a collection of non-elicited, spoken texts from different languages, most of them monologic narratives. The corpus was compiled and annotated under the supervision of Geoffrey Haig and Stefan Schnell, with technical implementation undertaken by LAC at the University of Cologne.

Multi-CAST Annotations: Background and resources

Along with transcription and word-for-word translation, the texts in Multi-CAST have also been annotated with the annotation system GRAID (Grammatical Relations and Animacy in Discourse), developed by Geoffrey Haig and Stefan Schnell. GRAID provides a uniform set of tags and a simple syntax for their combination; it is designed to be applicable to typologically diverse languages.

Teop (Oceanic, Bougainville, Papua New Guinea)

Ulrike Mosel, Stefan Schnell

Teop (ISO 639-3: tio) is a Western Oceanic language spoken on Bougainville Island, Papua New Guinea. The texts, all traditional narratives, were recorded by Ulrike Mosel and Enoch Horai Magum over the course of a language documentation project (principal investigator: Ulrike Mosel) funded by the Volkswagen Foundation (grant no. II 77 973). Details on the project can be found online at http://dobes.mpi.nl/projects/teop/. A sketch grammar of Teop (Mosel & Thiesen 2007) and additional materials are also available there.
The texts were annotated for Multi-CAST by Ulrike Mosel and Stefan Schnell.

Cite

Mosel, Ulrike and Schnell, Stefan. Teop. In: Haig, Geoffrey and Schnell, Stefan. Multi-CAST (Multilingual Corpus of Annotated Spoken Texts), https://lac.uni-koeln.de/multicast-teop/, date accessed.

Vera’a (Oceanic, Vanuatu)

Stefan Schnell

Vera’a (ISO 639-3: vra) is an Oceanic (Austronesian) language from the village of the same name on Vanua Lava (13.80° S 167.47° E), one of the Banks Islands in northern Vanuatu. The language has approximately 450 speakers and is the first language of most inhabitants of Vera’a and of the coastline to its north. Vera’a is closely related to the neighbouring language Vurës, which speakers of Vera’a also speak.

Northern Kurdish (also known as Kurmanji)

Geoffrey Haig, Hanna Thiele

Northern Kurdish (ISO 639-3: kmr), also known as Kurmanji, is a Northwest Iranian language spoken in eastern Turkey, Iraq, Syria, and parts of western Iran. The two texts recorded here are traditional narratives from a female and a male speaker who grew up near the towns of Erzurum and Muš, respectively. The texts were recorded in Germany in the late 1990s and early 2000s, and subsequently transcribed, translated, and annotated by Geoffrey Haig, Abdullah Incekan, and Hanna Thiele.

Cite

Haig, Geoffrey and Thiele, Hanna. Northern Kurdish. In: Haig, Geoffrey and Schnell, Stefan. Multi-CAST (Multilingual Corpus of Annotated Spoken Texts), https://lac.uni-koeln.de/multicast-northern-kurdish/, date accessed.

English

Nils Norman Schiborr

English (ISO 639-3: eng) is a West Germanic language; the recordings in this corpus feature varieties native to various parts of Great Britain. The Multi-CAST English texts are mainly autobiographical narratives in the form of oral history interviews. Special care has been taken to select recordings with long, uninterrupted stretches of interviewee monologue and minimal interjection from the interviewer.

The texts were made available as part of the Freiburg English Dialects corpus (FRED), compiled under the supervision of Bernd Kortmann and Lieselotte Anderwald at the Chair of English Language and Linguistics, Department of English, University of Freiburg. The texts used in Multi-CAST are licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International Public Licence. A detailed description of FRED is available in Hernández (2006).

The first set of texts (kent01, kent02) was recorded in 1975 in Faversham, Kent, by Michael Winstanley, and annotated for Multi-CAST by Nils Schiborr.

Cite

Schiborr, Nils Norman. English. In: Haig, Geoffrey and Schnell, Stefan. Multi-CAST (Multilingual Corpus of Annotated Spoken Texts), https://lac.uni-koeln.de/multicast-english/, date accessed.

Cypriot Greek

Harris Hadjidas, Maria Vollmer

Cypriot Greek (ISO 639-3: ell) is the variety of Greek spoken in Cyprus. The three texts in this sub-corpus, all of them traditional narratives, were originally recorded in the 1960s and later compiled and published by Konstantinos Giangoullis in a book of traditional Cypriot tales (Giangoullis 2009). Although no audio recordings are available for this sub-corpus, the texts appear to have been only minimally edited and reflect the spoken language of traditional narratives reasonably faithfully. The texts were first transliterated into the Roman alphabet and translated into English by a native speaker, Harris Hadjidas, who also conducted the first round of syntactic annotation. A second round of annotation was completed by Maria Vollmer under the supervision of Geoffrey Haig.

The author of the text collection, Konstantinos Giangoullis, has kindly given permission for the three texts in this sub-corpus to be made freely available as part of the archive. Further information on the author and related publications is available on the author’s website: http://www.cypriot-folk-poets.com/.

Giangoullis, Konstantinos G. 2009. Kypriaka paradosiaka paramytha. Ek stomatos Elenis Mich. Satsia. Apo to Geri-Pyroi (1887–1982) [Traditional Cypriot tales as told by Eleni Mich. Satsia from Yeri-Pyroi (1887–1982)]. Vivliothiki Kyprion Laikon Poiiton, Ar. 71. Theopress Publications: Leukosia.

Cite

Hadjidas, Harris and Vollmer, Maria. Cypriot Greek. In: Haig, Geoffrey and Schnell, Stefan. Multi-CAST (Multilingual Corpus of Annotated Spoken Texts), https://lac.uni-koeln.de/multicast-cypriot-greek/, date accessed.
