Research Interests

Languages

I have a long-standing interest in the description and analysis of Mbyá, a Tupí-Guaraní language spoken in Argentina, Brazil and Paraguay. I have been doing linguistic fieldwork on Mbyá in these three countries since 2007.

Corpus Linguistics and Digital Linguistics

My main research interest at the moment is usage-based research on the grammar of Mbyá Guaraní, using probabilistic models trained on richly annotated corpora. Representative papers and presentations are Thomas et al. 2021, Thomas 2024 and Thomas and Duarte, to appear.

This endeavour goes hand in hand with the development of digital resources for the study and processing of Mbyá. These include a dependency treebank, part of which is available in the Universal Dependencies project. I am currently writing a finite-state morphological analyzer based on Robert Dooley’s dictionary of Mbyá.

Semantics and Morphosyntax

Another research interest is the study of meaning (semantics and pragmatics) in relation to word and sentence structure (morphosyntax). Topics I have investigated in this area include the expression of tense and aspect (temporal marking on nouns, vacuous uses of the present tense, and resultative predicates); the expression of incremental additivity from a morpho-semantic and typological perspective; generalized bare singular nominals; and complex predicates (restructuring and converbs in Mbyá).

Language Documentation

Between October 2013 and May 2015, I served as the scientific coordinator of the National Inventory of the Cultural Heritage of the Guaranis of the states of Rio de Janeiro and Espírito Santo in Brazil. This project aimed to document aspects of the oral culture of the Mbyá people, such as songs, narratives, ceremonial discourses and oratory. The project was community-based: it was written with representatives of the participating communities, and the documentation team consisted entirely of Mbyá researchers. It was financed by the Brazilian National Institute for the Historical and Artistic Heritage and carried out by the Indigenous Museum of Rio de Janeiro. The project produced about eighty hours of audio and video recordings, which are still being edited and annotated. A portion of this corpus will soon be made available online using the LingView interface.

Guillaume Thomas
