Work package 3
The main purpose of this research is to measure the frequencies of the different patterns identified across languages within WP2, and to assess the parameters, both shared and language-specific, which play a role in spatial asymmetries across and within individual languages. The description of language use will take shape through a large crosslinguistic database, built on the basis of language elicitation with the help of new and adapted elicitation tools.
Scientific team leaders: Alice Vittrant & Anetta Kopecka
WP3.1: Designing a visual elicitation tool for crosslinguistic data collection
Rather than rely solely on generic language descriptions, we will compare language structures and language use. To achieve this goal, we will rely on a series of elicitation tools. These include existing tools which have been extensively tested and can be adapted and improved to meet our needs, for instance the Trajectoire videos, and new ones specifically devoted to eliciting spatial asymmetries. Previous research has shown that elicitation is invaluable, insofar as it facilitates cross-linguistic investigations and comparisons and makes it feasible to go well beyond standard typologies relying solely on existing language descriptions. However, it has also shown that elicited data are not perfectly natural. We therefore need ‘natural’ data with which our elicitations may be compared. Such data are already available for well-described languages, but they need to be collected for languages that have not yet been described in sufficient depth or detail.
We are aware of both the advantages and the limits of this methodology and, more importantly, we know how to optimize these tools.
Given that the elicitation material will be used in different field sites and different cultures, we will pay special attention to its ecological value (natural settings with limited European culture-specific cues) and ethical appropriateness (e.g. types of events, scenery, and clothing). The main goal is to collect data for systematic cross-linguistic comparisons, thus enabling us to analyze how speakers of different languages conceptualize and describe the same visual scenarios, and to examine the use and frequency of different patterns.
The visual stimuli designed to elicit language data on spatial asymmetries will be organized according to specific grouping principles, in order to test the different factors that might play a role in these asymmetries:
- different types of events – caused motion (e.g. putting vs taking, receiving vs giving), spontaneous motion (e.g. animate vs non-animate protagonists), change of posture (standing vs sitting vs lying);
- different parameters: orientation (horizontal vs vertical), boundary-crossing (with vs without), intentionality (intentional vs non-intentional), deixis (centrifugal vs centripetal vs transversal);
- exploratory parameters, such as animate vs non-animate, which have been shown to be crucial for the source-goal asymmetry.
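The grouping principles above can be pictured as a small factorial grid. The following Python sketch is illustrative only: it assumes that the four parameters listed above are fully crossed, which the project description does not state, and the names `PARAMETERS` and `build_conditions` are hypothetical.

```python
from itertools import product

# Parameter levels are taken from the list above.
# Assumption: the parameters are fully crossed; the actual design may differ.
PARAMETERS = {
    "orientation": ["horizontal", "vertical"],
    "boundary_crossing": ["with", "without"],
    "intentionality": ["intentional", "non-intentional"],
    "deixis": ["centrifugal", "centripetal", "transversal"],
}


def build_conditions(parameters):
    """Return one dict per combination of parameter levels."""
    names = list(parameters)
    return [dict(zip(names, levels)) for levels in product(*parameters.values())]


conditions = build_conditions(PARAMETERS)
print(len(conditions))  # number of cells in the fully crossed grid
print(conditions[0])    # first cell, combining the first level of each parameter
```

A full crossing may be neither feasible nor desirable for every event type; a grid of this kind would mainly help check which combinations a given stimulus set actually covers.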
Scientific team leaders: Benjamin Fagard & Anetta Kopecka
WP3.2: Data transcription, annotation and coding in view of comparative analyses
All data will be audio-recorded, transcribed, annotated and coded. Given our experience in language description and typological linguistics, we know how difficult it is to agree on a common semantic or even morphosyntactic coding for languages with different typological features. The data gathered will therefore be minimally enriched, i.e. with lemma, gloss and part-of-speech (POS) tags, as has been done for various languages in several ANR projects in which the Lattice took part (e.g. for Old French, which is notoriously problematic for POS tagging).
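As an illustration of what minimal enrichment could look like, the following Python sketch stores each token with its lemma, gloss and POS tag. It is not the project's actual format: the class names, the field names, the stimulus identifier and the sample utterance are all invented for the example.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Token:
    form: str   # word form as transcribed
    lemma: str  # dictionary form
    gloss: str  # short gloss
    pos: str    # part-of-speech tag (the tagset here is an assumption)


@dataclass
class Utterance:
    language: str
    stimulus_id: str  # hypothetical identifier of the elicitation clip
    tokens: List[Token]


def pos_frequencies(utterances):
    """Count POS tags over all tokens of all utterances."""
    return Counter(tok.pos for utt in utterances for tok in utt.tokens)


# Invented sample record, for illustration only.
sample = Utterance(
    language="English",
    stimulus_id="clip_001",
    tokens=[
        Token("the", "the", "DEF", "DET"),
        Token("cat", "cat", "cat", "NOUN"),
        Token("jumps", "jump", "jump-3SG", "VERB"),
    ],
)

freqs = pos_frequencies([sample])
print(freqs)
```

Keeping the record this small reflects the point made above: lemma, gloss and POS are the layers on which agreement across typologically different languages is most realistic.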
The data collected with the visual stimuli will be analyzed following a unified theoretical approach, bridging the traditionally separate analytical/qualitative and experimental/quantitative studies.
Scientific team leaders: Benjamin Fagard, Anetta Kopecka, Christine Lamarre & Alice Vittrant
WP3.3: Building a cross-linguistic database
All data gathered within WP1 and WP2 will be systematically included in a large crosslinguistic database. To reach this goal, which is by no means an easy task, we will rely on the expertise and experience of the consortium in Natural Language Processing tools and database set-up. Such an undertaking is not entirely new: there are publicly available databases covering linguistic features of many different languages, such as the WALS database or, in lexical typology, the DECOLAR or DatSemShifts databases.
This database will rely on existing solutions for data storage, such as CoCoON. This technical platform for researchers in the humanities and social sciences is dedicated to structuring oral data and depositing them in the TGIR Huma-Num archives (e.g. PANGLOSS).