Browsing by master's degree program "Kielellisen diversiteetin ja digitaalisten ihmistieteiden maisteriohjelma" (Master's Programme in Linguistic Diversity and Digital Humanities)

  • Koivusalo, Liisa (2022)
    Speaking fluently is an important goal for second language (L2) learners. In L2 research, fluency is often studied by measuring temporal features of speech. These features include speed (rate of speech), breakdown (use of silent and filled pauses), and repair (self-corrections and repetitions) phenomena. Fluent speakers generally have a higher rate of speech and fewer hesitations and interruptions than beginning language learners. In this thesis, the phonetic fluency of high school students’ L2 Finnish speech is studied in relation to human ratings of fluency and overall proficiency. The topic is essential for the development of automated assessment of L2 speech, as phonetic fluency measures can be used to predict a speaker’s fluency and proficiency level automatically. Although the effect of different fluency measures on perceived fluency has been widely studied in recent decades, research on phonetic fluency in Finnish as an L2 is still limited, and phonetic fluency in high school students’ L2 Finnish speech has not been studied before. The speech samples and ratings used in this thesis are part of a larger dataset collected in the DigiTala research project. The analyzed data contained spontaneous speech samples in L2 Finnish from 53 high school students with different language backgrounds. All samples were assessed by expert raters for fluency and overall proficiency. The speech samples were annotated by marking intervals containing silent pauses, filled pauses, corrections and repetitions, and individual words. Several phonetic fluency measures were calculated for each sample from the durations of the annotated intervals. The contribution of these measures to human ratings of fluency and proficiency was studied using simple and multiple linear regression models. In simple linear regression, speech rate was the strongest predictor of both fluency and proficiency ratings. Articulation rate, proportion of long silent pauses, mean duration of long silent pauses, mean duration of breaks between utterances, and rate of short silent pauses per minute were also statistically significant predictors of both ratings. Multiple linear regression models improved on the simple models for both fluency and proficiency: for fluency, the best model combined articulation rate and the proportion of long silent pauses; for proficiency, the best model combined speech rate and the mean duration of short silent pauses. Perceived fluency is often affected by a combination of phonetic fluency measures, and human raters appear to base their assessments on such a combination, although some measures may be more important on their own than others. The findings of this thesis expand previous knowledge on phonetic fluency in L2 Finnish and can benefit language learners and teachers, as well as developers of automatic assessment of L2 speech.
  • Keturi, Joonas (2022)
    The subject of the thesis is the comparison of lexical semantics and phonetics. The thesis uses computational methods to investigate whether words belonging to the same semantic domain show significantly more phonetic variance than phonetically similar words from other semantic domains. In other words, phonetically very similar words, and especially phonological minimal pairs, would tend to fall in separate semantic domains. The method clusters word embedding vectors and distinctive phonological feature vectors from multiple languages; the phonetic and semantic standard deviations are calculated for each cluster, and the mean standard deviations of the cluster sets are compared. In addition to the semantic and phonetic clusters, two test cluster sets are constructed with the same number and the same sizes of clusters as the semantic clusters. The first test set takes the words from the phonetic clusters in order, and the second is randomly permuted. These cluster sets are compared by their mean standard deviations and a cluster set similarity index. The results imply that words in the same semantic domain rarely include phonetically very similar words; such words usually belong to separate semantic domains.
  • Božović, Dušica (2023)
    The aim of this research was to investigate the teaching of pluricentric languages as heritage languages in Finland, examine how they are perceived, and explore the expectations related to their teaching. Moreover, the study aimed to identify successful approaches to teaching pluricentric heritage languages. The motivation for the study was my personal experience of teaching a pluricentric language as a heritage language and the limited coverage of this topic in academic literature. The literature has also noted a lack of attention to attitudes in heritage language studies. The study uses a direct measures approach: respondents answered a questionnaire consisting predominantly of Likert-scale statements. The findings indicate a desire to improve communication among the stakeholders in heritage language teaching. Respondents expressed positive attitudes towards groups with different language varieties and towards the active inclusion of different varieties in class. They believed that all varieties should be treated as equally valid and that teachers should not treat forms from other varieties as mistakes. Studying in a linguistically heterogeneous group was seen as an enriching experience that can help combat prejudice and build solidarity among speakers. The limitations of the study include the small number of respondents and material that was imbalanced across languages. The findings have practical implications for heritage language coordinators and educators in their planning and teaching, as well as for policymakers seeking to enhance heritage language education. Additionally, the study advances academic discourse on heritage language teaching and suggests areas for further research. Heritage language teaching in general requires significant improvement to achieve its aims. The study highlights the importance of addressing issues in pluricentric heritage language teaching and of implementing strategies that promote positive attitudes towards language varieties and effective communication between coordinators, teachers, and guardians.
  • Hynynen, Jussi-Veikka (2023)
    Using language that is easy to understand when presenting information in written form is critical for effective communication. Yet using language that is too complex or technical for its intended audience is a common pitfall in many domains, such as legal and medical texts. Automatic text simplification (ATS) aims to automate the conversion of complex text into a simpler, more easily comprehensible form. This study explores ATS models for English whose output can be controlled in terms of readability. Readability is measured with an automatically calculated readability level that corresponds to a school grade level. The readability-controlled models take a readability level as a parameter and simplify the input text to match the reading level of the intended audience that the parameter value corresponds to. In total, six readability-controlled sentence simplification models with different control attribute configurations are trained. The models use a pretrained sequence-to-sequence architecture that is fine-tuned on a dataset of sentence pairs in regular and simple English. The trained models are evaluated with automatic evaluation metrics and compared with each other and with ATS systems from previous research. Additionally, the simplified sentences produced by the best-performing model are evaluated manually to identify errors and the types of text transformations the model uses to simplify sentences. When the readability level input value is optimized to maximize model performance on validation data, the readability-controlled models surpass systems from previous work on automatic evaluation metrics, suggesting that adding readability level as a control attribute improves simplification quality. Manual evaluation shows that readability-controlled models are capable of splitting long sentences into multiple shorter sentences to reduce the syntactic complexity of the text. This finding suggests that readability level metrics can effectively control syntactic complexity in ATS models, as a lightweight alternative to previously applied, more computationally demanding methods that rely on dependency parsing. Finally, the study discusses the different types of errors produced by the models, their potential causes, and ways to reduce errors in future ATS systems.
  • Matulis, Haralds (2024)
    Guided by an overarching research question about the usability of computational methods for studying a digitized collection of self-writing, this master’s thesis examines the Latvian Diary Corpus (LDC), which was compiled in 2021 and contains 36 handwritten, digitized diaries spanning 1917 to 2012, with a total corpus size of 2,771,300 tokens. The theoretical and methodological framework of the thesis is situated in digital humanities, drawing on the corpus linguistics approach to corpus compilation and informed by the digital curation and archival practices of the cultural heritage domain. The diary, as a genre, is part of the field of self-writing, and various humanities disciplines, such as folkloristics, literary studies, and cultural anthropology, examine the diary from different viewpoints. The main body of the thesis is structured into an Introduction, four chapters, and Conclusions. The chapters build on each other to discuss, from different perspectives: (1) the representativeness and heterogeneity of digital collections in the humanities; (2) conceptualizations of the diary in self-writing research and how these theoretical concepts translate into practical decisions about operationalizing the diary domain, crowdsourcing diaries from the population, and the methodological borderline cases the curators encountered in composing the LDC; and (3) a statistical exploration of the LDC, probing the correlation between diary length and four variables: the time intervals between diary entries and three linguistic features (personal pronouns, past and present tense, and activity and non-activity verbs). The results of the computational analysis reveal significant variance across diaries, suggesting not only the diverse writing styles of individual diarists but also structural heterogeneity within the LDC. There is reason to believe that the texts in the LDC, merged under the umbrella term “diary”, contain several specific sub-genres of self-writing, each with its own distinct signature. Having started from the concept of representativeness, the thesis suggests that it may also be fruitful to study heterogeneous digital collections: their diversity is not a drawback but a source of richness, which can be leveraged by using computational methods to uncover and analyze these heterogeneities and then assessing them with critical reading methods. To apply computational methods to heterogeneous humanities collections with a sufficient degree of generalizability, the thesis proposes careful domain operationalization, source criticism, and cultural analysis, followed by subsetting a homogeneous sub-corpus from the larger heterogeneous collection for computational analysis, guided by the particular research question.
  • Pöyhönen, Teemu (2023)
    While natural language generation (NLG) and large language models (LLMs) seem to be transforming many industries, video games have yet to be affected. This study investigates the potential of using NLG systems to generate dialogue for non-playable characters (NPCs) in role-playing games (RPGs). To this end, dialogue data is extracted from six popular RPGs and used to fine-tune Microsoft’s GODEL, creating an “RPG chatbot” (RPG-GPT). Motivated by computational creativity frameworks, a survey and an interactive experiment were conducted to evaluate the creativity of RPG-GPT and its effectiveness in generating relevant and engaging responses to player input. Survey respondents rated dialogues on a 5-point agree-disagree Likert scale, with items concerning, for example, the relevance of the NPC answers. Results indicate that RPG-GPT can provide relevant responses: the mean game-relevance rating was 3.93 for the original game dialogue versus 3.85 for RPG-GPT (p = 0.0364). Participants in the interactive experiment also reported engagement when interacting with RPG-GPT. Overall, the results suggest that creative NLG has the potential to enhance gaming experiences through task-oriented game dialogue (TOGD) systems. Within this framework, creative TOGD systems could solve a common problem: pre-written NPCs are unable to provide the specific information players seek. Additionally, the study discusses how players, through their interaction with NLG models, can expand the lore of a game, which is a new consideration for game designers and developers implementing such systems. Future work could explore ways to incorporate external knowledge and context to improve the performance of TOGD systems.
  • Pöllänen, Roosa (2022)
    In earlier research, the sociative causative has been considered a subcategory of the prototypical causative rather than a category of its own. In the sociative causative, the causer both initiates the event and participates in it, unlike in the prototypical causative, in which the causer is only the initiator. It has been proposed that the causer can participate in the event by acting together with the causee, helping the causee, or supervising the causee. The sociative causative can be marked on the predicate with a specific sociative causative marker, or it can be a reading of a prototypical causative construction or of an applicative. The objective of the thesis is twofold: first, to find out, using a typological sampling method, whether there are languages with a specific sociative causative construction beyond those currently known, and second, to examine how these constructions behave. Special attention is paid to the exact semantics of sociative causation, to see whether it reflects the semantics proposed in the earlier literature. The contexts in which prototypical causatives and applicatives can receive the sociative reading are also studied, in order to determine where the sociative causative falls on the causative continuum. Previous literature has proposed that the sociative causative is an areal feature of indigenous South American languages, and 26 languages were previously known to have a sociative causative. In addition to these 26 languages, a genealogically balanced sampling method was applied, and four further languages with a sociative causative function were found. Since South America is one of the most linguistically diverse areas in the world, data gathering was limited to the western part of the continent. The 30 languages were analyzed formally and semantically. The analysis shows that the sociative causative usually describes causation in which the causer is a co-actor with the causee or helps the causee; the supervision type of sociative causation occurred rarely. The sociative causative tends to be used with intransitive verbs that express motion or physical activity. On the causative continuum, it appears to lie in the middle, as previous research proposes.
  • Myllylä, Ida-Lotta (2023)
    This thesis investigated a sound-space phenomenon: sound-symbolic associations between the vowel sounds [i] and [æ] and the spatial meanings up and down. This vowel-height congruency effect was investigated in two experiments using speeded choice reaction time (CRT) tasks. In Experiment 1, participants vocalized [i] or [æ] while being presented with visual stimuli moving either up or down. The task was indirect: the phenomenon under investigation was masked by instructing participants to produce the vocalizations according to the distance of movement rather than its location. Because of this masking, the thesis also investigated the sound-magnitude effect, which typically associates high (close) vowels with small distances and low (open) vowels with large distances. In Experiment 2, participants responded according to either the location of a visual stimulus (up/down) or an aurally presented vowel ([i] or [æ]) while being presented with both stimuli simultaneously. In both experiments, reaction time (RT) measures were analyzed. In Experiment 1, the acoustic characteristics of the vocalizations (fundamental frequency F0 and formants F1 and F2) were also analyzed. Based on the stimulus-response congruency observed in the reaction time measures, the results showed a sound-symbolic association between the vowel [i] and the spatial meaning up. The sound-magnitude effect was also robust in these experiments. The sound-space association between [æ] and the spatial meaning down was not significant. Moreover, the sound-space effect emerged only in the experiment requiring vocalizations, not in the experiment requiring manual responses, and it was present in the reaction time measures but not in the acoustic characteristics of the vocalizations. It was concluded that the vowel-height congruency effect can be robustly observed (i.e., in relation to both vocal responses) only when the experimental task requires intentional, task-relevant processing of the concepts up and down. It was also estimated that the sound-space effect related to the vowels [i] and [æ] and the spatial meanings up/down may not be as strong as, for instance, the sound-magnitude effect. Regarding the possible mechanisms underlying sound-symbolic associations, some evidence was found supporting embodiment-based articulatory views of sound symbolism. In addition, the thesis replicated the intrinsic vowel pitch (IVP) phenomenon and demonstrated that intrinsic pitch is an important core property of vowel sounds that also influences sound-symbolic associations.
  • Peura, Telma (2023)
    In my master’s thesis, I use quantitative methods to study how the diversity of novels published in Finnish has developed over the past 50 years. The study is based on metadata from the Kirjasampo database covering the collections of Finnish public libraries, and I focus on analyzing features external to the text. As indicators of diversity, I use the languages of writing, the nationality and gender of authors, and the genre classifications of the novels. In addition, I examine publishers and consider how they, as a group of actors, affect the diversity of literature. Alongside the quantitative analyses, and as is typical of digital humanities, I draw on extensive qualitative observations to contextualize the results. I approach literature as an international, dynamic whole in which different literary cultures interact with one another, forming local centers and peripheries in literary space. Using the concept of transnationality, I describe how the globalizing literary field cannot be fully divided into separate literatures, but instead develops across national, linguistic, and genre boundaries. The results show that, according to the indicators I defined, novel publishing has become more diverse since the 1990s. Nevertheless, the field is dominated by Finnish-language literature among domestic works and by Anglo-American and Nordic literature among translations. The examination of publishers suggests that the field contains many actors of different sizes. Especially since the turn of the millennium, the share of small publishers and self-published works has grown, challenging publishers’ traditional role as the gatekeepers of literature. The study shows how library metadata can be used in digital literary studies. Thanks to its richness, Kirjasampo proved to be a versatile source of information from which conclusions can be drawn about the broad developmental arcs of Finnish literature.
  • Bedretdin, Ümit (2022)
    This thesis presents the development process of a text classifier based on supervised machine learning from the perspective of media studies. The chosen approach makes it possible to harness a media researcher’s expert knowledge for large-scale computational analysis and the processing of large datasets. The thesis develops a neural-network-based text classifier, which is used to compare how well different classification features extracted from text can model the framing tactics and topics of journalistic texts. The datasets used in the development were annotated as part of two media research projects. The first studies the ways in which the countermedia outlet MV-lehti reframes articles from mainstream media. Its dataset consists of 37,185 MV-lehti articles in which three different framing tactics have been identified (Toivanen et al. 2021); the classifier is intended to recognize these tactics automatically from the text. The second project focuses on the discussion of alcohol policy in mainstream media, for which a dataset of 33,902 news articles from Yle, Iltalehti, and STT was collected (the ongoing Vallan virrat research project). Here the classifier’s task is to identify the articles that contain discussion of alcohol policy. The aim of the thesis is to determine which text features are best suited as classification features for each task and which of them lead to the best classification accuracy. The features used are sentence-level contextual information extracted from the BERT language model, properties related to article formatting, such as bold text and HTML code, and article-specific topic distributions produced with topic modeling. Preliminary experiments with a classifier using contextual information alone were promising, but their accuracy did not reach the required level. It was therefore necessary to find out whether the classifier’s performance improves when different features are combined. The hypothesis is plausible, since, for example, BERT-based embeddings encode linguistic and distributional information about a sequence a few sentences long, whereas a topic model contains broader structural information; these features should complement each other in an article-level classification task. Combining contextual information with topic modeling has recently yielded improvements in various text classification benchmarks and applications (Peinelt et al. 2020, Glazkova 2021). In this thesis, however, combining contextual features with topic model information yields only marginal improvements, and only in certain settings. Nevertheless, the developed classifier outperforms a classifier using contextual features alone on many classification tasks. In addition, potential areas for development are identified that could lead to further improvements in classification accuracy. The experiments indicate that automatic classification based on frame analysis is possible with neural networks, but the accuracy and interpretability of the classifiers still leave room for improvement, and they are not yet accurate enough for conclusions that require a high degree of certainty.
  • Kajala, Jukka (2023)
    According to Malchukov, Haspelmath and Comrie, a ditransitive construction consists of a ditransitive verb, an agent argument, a recipient-like argument, and a theme argument. Languages code the relations between these arguments by different methods: flagging (noun-based marking), indexing (verb-based marking), or word order. Typologically, ditransitive constructions can be divided into three alignment types: indirective, secundative, and neutral. In indirective alignment, the recipient argument is marked differently from the theme and monotransitive patient arguments; in secundative alignment, the theme argument is marked differently; and in neutral alignment, all three arguments are marked by the same method. Swahili, a Bantu language, is a prominent lingua franca spoken in East Africa by approximately 100 million people. It is an agglutinative language with rich verbal morphology, and its morphosyntax is based on a noun class system in which each noun belongs to a certain noun class. Briefly, the Swahili verb cluster is constructed by adding subject and object markers, which are determined by the nouns or persons they are associated with, to the verbal root. The verb cluster permits at most one object marker. Prior studies on Swahili object marking and ditransitive constructions show that the patient argument is marked by indexing. Swahili has no case marking, so no flagging is used. In ditransitive constructions, the recipient is marked on the verb with the object marker. Because recipient and patient arguments are marked by the same method, the alignment type of Swahili ditransitive clauses is secundative. Early grammars and textbooks suggest that the linear order of the two overt ditransitive objects is recipient first, theme second; later studies suggest that the order may vary. As part of this study, a corpus study using the Helsinki Corpus of Swahili was carried out. Its findings confirm the later studies: the linear order of the two objects varies, and syntactically heavier objects seem to prefer the second position.
  • Koho, Tiina (2022)
    Text normalization is a process in which non-standard written language is converted into a standardized form. Dialects are one example of non-standard language, which can differ considerably from the standardized common language. In addition, Finnish orthography is largely phonemic, which makes it possible to represent the characteristics of spoken language in writing as well. Especially on informal platforms and in colloquial contexts, such as social media, Finnish speakers may write words the way they would normally pronounce them. Data consisting of such non-standard language can also be found for natural language processing purposes, for example on Twitter. However, natural language processing tools designed for traditional standard-language text do not necessarily achieve the desired results on colloquial data, in which case text normalization can be used as an intermediate step. In the normalization process, colloquial or otherwise non-standard input text is converted into a standardized spelling that natural language processing tools handle better. This work builds on previous research on the normalization of Finnish dialects. Previous studies have found that character-based BRNN (Bidirectional Recurrent Neural Network) models achieve good results in normalizing Finnish dialects when the input consists of chunks of three words. This means that the system receives three words at a time as input, and each word is further split into characters separated by spaces. This work aimed to use the same methods and data as the earlier research so that the results would be comparable. The data is the Samples of Spoken Finnish corpus (Suomen kielen näytteitä), maintained by the Institute for the Languages of Finland (Kotimaisten kielten keskus), and normalization was performed with the open-source library OpenNMT. The results of the experiments appear to confirm the findings of previous research, but there are also indications that neural network models might benefit from inputs consisting of longer chunks. Besides the BRNN model, other neural network architectures are also tested, but a comparison of WER (Word Error Rate), which measures the proportion of word errors, shows that the BRNN model performs the normalization task better than the other architectures.
  • Junctorius, Lina (2024)
    The pressing challenge of climate change and its uncertainties require effective communication to engage people in mitigation efforts. Data visualizations make it possible to present complex data to laypeople and professionals. People’s perception, however, seems to be affected by their motivations and by uncertainty in the data. This study investigates the influence of prior beliefs and uncertainty representation on how climate-aware people interpret climate data visualizations. In an online experiment, participants estimated the correlation between variables displayed in scatterplots. The plots were labelled either with meaningful or with abstracted variables, and either included an uncertainty representation or did not. Participants also indicated how strongly they believed the meaningful variables to be correlated. When a displayed correlation triggered their beliefs, participants estimated higher correlations than when they had no beliefs about the displayed data. Uncertainty representation alone did not influence estimation performance. When participants had beliefs about a correlation and uncertainty was represented in the plot, their estimates were higher and less accurate than in any other condition. The findings suggest that people’s interpretation was biased by their prior beliefs, especially in combination with uncertainty representation. This might be explained by prior beliefs guiding participants’ attention to features of the visualization that support their views, and this biased perception seems to affect their interpretation. Uncertainty representation might increase the bias by expanding the range of possible interpretations, potentially prompting people to rely more strongly on their prior beliefs.
  • Calame, Héloïse (2024)
    Research on negation and evidentiality has increased significantly in recent decades, both from a typological perspective and for specific languages. The interaction of both domains with other categories has been investigated (e.g. Aikhenvald 2004, Miestamo 2005). However, the interaction of evidentiality with negation is heavily understudied: apart from a few mentions (e.g. de Haan 1997) and language-specific analyses, I am not aware of comparative research on the topic. The present study analyses how clausal negation and grammatical evidentiality interact cross-linguistically and draws a comparative picture of this interaction. Semantically, there are two possibilities, illustrated in example (1) with a visual source of evidence (expressed lexically, given the characteristics of English) and a negator: in (1a), the proposition is negated, and in (1b), the source of evidence is negated. (1) a. ‘I see that it is not raining.’ b. ‘I do not see that it is raining.’ Since negation is universally grammaticalized in natural languages (Dahl 1979: 79), whereas grammatical evidentiality is found in only about a fourth of the world’s languages (Aikhenvald 2004: 1), the typological sample for this study contains only languages known to have at least one evidential. De Haan’s typological study of evidentiality for the World Atlas of Language Structures (2013a) provides a good basis for sampling: the sample for the present study contains one language per family classified by de Haan as having evidentials, for a total of 70 languages. To show maximal variety, languages known to be of interest for this phenomenon, such as Akha (Aikhenvald 2004) and Cheyenne (Murray 2016), are also discussed. All in all, this study shows that the interaction of negation and evidentiality is of interest from both semantic and morphosyntactic points of view, and for language-specific research as much as for typological studies. It gives an overview of the diversity of interactions between negation and evidentiality and of their frequency in the 70-language sample. In short, it is a typology of not knowing what happened and knowing what did not.
  • Knapen, Martijn Gerardus Theodorus Maria (2021)
    Research on the interaction of the Amuric languages (referred to as “Nivkh” or “Ghilyak” when regarded as a single language) with the Tungusic languages was initiated by Grube (1892). Loanwords, the focus of his work, have remained the main object of study to the present day. Recently, Janhunen (2010: 292, 296; 2016: 23) has suggested that contact between the two families had already begun between their ultimate ancestors, Pre-Proto-Amuric and Proto-Tungusic. This thesis investigates whether some of the lexical parallels proposed in earlier research date to this period. As the thesis is written from the perspective of language contact, the parallels are regarded as the result of borrowing rather than inheritance. The distinction between these two modes of transmission formed the theoretical basis for the methodology employed. To prove ancient contact, it had to be shown that the Amuric and Tungusic languages inherited their shared lexemes from their respective ancestors and that these ancestors may have borrowed from each other. As the methodology relied on the literature on Amuric and Tungusic historical phonology, an overview of this topic is also included. First, fifty parallels were selected from those listed in previous research. These could be reconstructed to Proto-Amuric and Proto-Tungusic using the Comparative Method and thus could have been inherited from these proto-languages or, in the case of Proto-Amuric, from an earlier ancestor. Additionally, they exhibited the phonological similarities that could reasonably be expected from borrowing between Pre-Proto-Amuric and Proto-Tungusic. Next, the direction of borrowing had to be established, as this constitutes the principal evidence of borrowing. For that purpose, nine criteria were developed, which considered morphology, diachronic and synchronic phonology, extent of attestation, semantics, and extra-linguistic factors.
Finally, the data were separated into older and younger strata, since the selection phase had considered only the Proto-Amuric stage, while the target was Pre-Proto-Amuric. These strata were classified on the basis of phonological developments. For most of the fifty parallels, the direction of borrowing could be determined. At this stage of the analysis, fifteen of them were dismissed as recent or doubtful. The remaining thirty-five were examined for properties that could have resulted from the sound changes proposed in earlier research to have followed Pre-Proto-Amuric. Ultimately, it could only be proven that the absence of vowels in non-initial syllables was a characteristic property of ancient lexemes in the Amuric lineage. Consequently, although a considerably old stratum of Amuric-Tungusic parallels was found, further research is needed to show that any of them date back to Pre-Proto-Amuric and Proto-Tungusic times.
  • Weidinger, Lucas (2024)
    Speech, as the evolutionary pinnacle of human communication, is not defined by its content alone; it gains its full meaning only through prosody. Prosody plays a vital role in conveying not only the words spoken but also the underlying emotions and intentions of the speaker. While certain aspects of prosody and its relation to emotion have been studied, the concept of sincerity in speech remains a complex and active area of research, spanning linguistic, ethical, and philosophical dimensions. This thesis explored the perception of sincerity in speech by manipulating prosodic features with neural network-based speech synthesis. The primary research question concerned the impact of prosodic modification on the perceived sincerity of synthetic speech. To refine the analysis, three prosodic features were evaluated: speaking rate, f0 mean, and f0 standard deviation. Commissive utterances, which depend on sincerity, formed the basis of the research material. Forty commissive utterances were subjected to eight prosodic modifications, and linear regression confirmed that the modifications had the intended effects. A perception experiment involving 115 native Finnish speakers yielded mixed results. While the categories of prosodic modification showed no significant impact on perceived sincerity, an analysis of individual prosodic feature values uncovered significant correlations. Increased speaking rate and f0 standard deviation correlated positively with perceived sincerity, supporting the secondary hypotheses. However, no significant correlation was found for increased f0 mean. The null result in the category-based analysis suggests methodological limitations that may obscure direct conclusions. Nevertheless, the subtlety of the prosodic differences appears to have challenged participants' discernment, affecting their sincerity evaluations. As practical constraints allowed the hypotheses to be only partially confirmed, future research holds promise for deeper insights.
  • Pomare, Adriano (2023)
    Statistical learning (SL) is a set of cognitive mechanisms that allow an organism to subconsciously pick up recurring patterns in its environment. While research in this field has flourished over the past decades, its relationship with multilingualism remains unclear. Our goal is to estimate the extent of this relationship by comparing individual language skills with performance in a statistical language learning (SLL) task. For this purpose, we conducted an online experiment to collect information about the participants' linguistic backgrounds and to test their SL ability with a Statistically Induced Chunking Recall (SICR) task. Additionally, visual linguistic stimuli were generated to examine how the phonotactic rules of the participants' native language affect SL. In particular, we tested how violating Finnish vowel harmony affected the performance of participants with different degrees of multilingualism. To measure multilingualism, we created the Multilingualism Score (MS), a multifactorial index designed to gauge an individual's level of multilingualism, and analysed its relationship with performance in the SICR task. Our results show a significant positive correlation between these two factors, suggesting that multilingualism and SLL are related. We also observed lower overall performance associated with violations of vowel harmony. However, we were not able to establish a clear connection between multilingualism and this performance gap.
  • Busheva, Anna (2023)
    This thesis investigates the realisation of tone in dialects of Southern Angami, a Tibeto-Burman language spoken in the state of Nagaland, North-East India. Audio recordings of native speakers are analysed to determine how the tones differ in their pitch movement patterns, accounting for context and dialect variation. The research questions concern the significance of pitch contour and duration in a level tone system, as well as the interaction between tone units. It was concluded that fundamental frequency is the main determining factor and that neither pitch contour nor duration has a more prominent effect than pitch value; however, duration may play a role in distinguishing tones 2 and 3, and a pitch curve is a consistent feature of tones 1 and 4. No significant difference was found between the tone systems of the Jotsoma and Kigwema varieties.
  • Hyvönen, Anu (2024)
    This thesis examines the typology and stability of evidentiality in contact settings. The goal is to estimate the stability of evidentiality in different contact scenarios and to determine whether evidential structures are more likely to change than to remain stable in language contact situations. In addition, this study aims to develop a methodology for the typology of evidentiality that makes it possible to examine its contact effects in the first place. Earlier research has described evidentiality as an unstable feature that diffuses easily in contact situations, but systematic research examining evidentiality in multiple contact settings and its stability in contact is still lacking. Moreover, although evidentiality has been studied widely, there is no previous typological approach to evidentiality or to its contact effects that would be suitable for the purposes of this thesis. This study therefore takes a typological approach to evidentiality and language contact. The examination of contact effects is based on six sampling units across the globe, each a set of three languages, in which contact effects are estimated against an external benchmark. The linguistic data collected from the sampling units were analyzed into logical outcomes of contact and converted into probability distributions, which finally yielded an aggregated probability of convergence. This probability of convergence is contrasted with the stability of evidentiality, and along that continuum this study estimates how likely it is that evidentiality has been affected by contact. Furthermore, in pursuit of the first research goal, this thesis presents a typological approach to evidentiality and defines grammatical evidentiality. The primary results suggest a high probability of convergence, and evidentiality seems to be an unstable linguistic domain that diffuses easily.
These findings were further contrasted with some other linguistic domains, indicating that evidentiality is among the most unstable domains. This study also suggests that the semantic properties of evidentiality are more unstable than its morphological ones. The findings also highlight the sensitivity of the methodology, and these limitations are demonstrated and reflected upon.