
Browsing by master's degree program "Kielellisen diversiteetin ja digitaalisten ihmistieteiden maisteriohjelma" (Master's Programme in Linguistic Diversity and Digital Humanities)

  • Pänkäläinen, Anni (2024)
    This thesis examines sound symbolism in consonants. Sound symbolism is a phenomenon in which language uses individual sounds to resemble the extralinguistic world, such as sound, the shape of objects or colour, and the same sounds are often associated with the same phenomena worldwide. Previous research has found links between vowel quality in particular and light and brightness, but the results for consonants have been less certain. This thesis investigates whether consonants can sound-symbolically express the amount or absence of light, and what this might be based on. The analysis is based on six words/affixes (black/white, night/day and dark/light) collected from a worldwide, areally and genealogically balanced sample of 95 languages. The meaning pairs are examined not only in terms of individual consonants but also through distinctive features that cut across individual sounds and through sonority, both for the data as a whole and by comparing macro-areas. The aim was to determine whether particular sounds or features are overrepresented in the groups of semantically dark or light meanings. The statistical analyses used logistic regression and Student's t-test. The data show that certain sounds are prominent in both the dark ([m], [ŋ]) and the light ([r], [h]) meaning groups, but no worldwide generalisations can be made on the basis of sonority or distinctive features. More likely, the patterns reflect sound-specific combinations of many different factors, which are discussed in light of the results and previous research literature. The thesis also takes into account the multidisciplinary nature of sound symbolism.
  • Ahvenharju, Panu (2024)
    The topic of the thesis is the integration of Natural Language Processing (NLP) and Computer-assisted Language Learning (CALL) into teacher-led Spanish instruction. The aim is to present a development process and a CALL application that can be used to study learning outcomes. The study asks how an NLP-based CALL application can be used to investigate learning, and how the rate and manner of its use affect learning outcomes. It also focuses on usability: how usable the students rate the application, and what open feedback they give about it. A total of 108 secondary school students and four teachers from the Helsinki Metropolitan region participated in the study, in which a gamified application created a competitive setting among five teaching groups. The students used the application to solve textbook-based cloze exercises generated by combining a neural language model with rule-based exercise creation. The vocabulary tests measured learning with test words selected on the basis of usage analytics, so that the words came from outside the cloze gaps of the exercise sentences. The students who used the application were divided into two groups: those (N=26) who encountered the test words in the application and those (N=31) who did not. Their results were compared with those of a control group (N=8) who did not use the application. The group that encountered the test words performed 11.39 percentage points better than the control group. Interestingly, the students who did not encounter the words performed 25.21 percentage points better in the tests than the control group. Despite these positive results, statistical analysis revealed a significant relationship only between usage rate and encountering the test words, not between encountering the test words and the vocabulary test results. 
This may be explained by the different group sizes, the random way in which the application selected exercises, and the fact that the students did not encounter the words often enough. The method requires many enhancements before it can be used on a larger scale. The students rated the application's usability as good and left 18 open feedback responses, which were mostly positive.
  • Koivisto, Emma (2022)
    The so-called Helsinki s is a sociolinguistic phenomenon dating back to the 19th century, according to which Finnish speakers from Helsinki have a sharper, more sibilant [s] than speakers living elsewhere in Finland. Despite the phenomenon's long history, no phonetic research has previously examined whether the [s] produced by Helsinki speakers is acoustically sharper than ordinary Finnish [s]. The aim of this master's thesis is to examine the [s] sounds produced by Helsinki speakers using acoustic methods. The thesis investigates which factors affect the sharpness of long [s], whether sharpness differs between male and female speakers, and whether the [s] sounds produced by Helsinki speakers are sharper than usual when measured acoustically. The data consisted of the subcorpus collected in 2013 for the Longitudinal Corpus of Helsinki Finnish (Helsingin puhekielen pitkittäiskorpus), provided by the Language Bank of Finland (Kielipankki). There were 13 speakers, and 622 long [s] sounds extracted from their speech were analysed. For each long [s], the Centre of Gravity (COG) value was measured, which describes the frequency range in which the energy of the [s] is concentrated on average. This value can be taken to describe the sharpness of the [s], since in a sharp [s] the energy is concentrated at high frequencies, whereas in a less sharp [s], i.e. one with a more retracted place of articulation, it lies at lower frequencies. The study examined the effect of speaker characteristics (gender, age, educational background) and of the properties of the vowel preceding the long [s] (frontness/backness, closeness/openness and roundedness/unroundedness) on the COG value of the [s]. In addition, it examined the subset of long [s] sounds with particularly high COG values (above 6000 Hz) and the factors underlying these particularly high values. 
The COG value measured from long [s] was affected, at least at the level of statistical significance, by the speaker's gender, age and educational background, as well as by the frontness/backness, closeness/openness and roundedness/unroundedness of the preceding vowel. The long [s] sounds with the very highest COG values were found among middle-aged female speakers with a vocational education background, when the preceding vowel was front, unrounded, and close or close-mid. The [s] sounds produced by women had higher COG values than those produced by men, which supports the view that the Helsinki s is a feature of women's speech in particular. On the basis of this study, it cannot be concluded that the [s] sounds produced by Helsinki speakers are sharper than the average Finnish [s], since the data contained many kinds of long [s] sounds, judged both by COG value and by auditory assessment. However, the data also included several [s] sounds with very high COG values that were also judged auditorily to be very sharp. This suggests both that COG is a valid measure for examining the sharpness of [s] and, above all, that the Helsinki s may also have a phonetic basis.
  • Sidoroff, Teemu (2024)
    Topic modelling is an unsupervised machine learning method for extracting topics from a collection of documents. Topic models discover themes shared across the collection and output two distributions: a distribution over words for each topic and a distribution over topics for each document. This thesis introduces and compares three topic modelling techniques and their evaluation methods. Latent Dirichlet Allocation (LDA), Non-negative Matrix Factorization (NMF) and a type of neural topic model called Contextual Topic Model are presented, and their distinguishing features are described. Intrinsic and extrinsic evaluation methods and metrics for topic models are then described. Intrinsic metrics such as coherence describe how interpretable the generated topics are for humans. Coherence can be approximated by computable metrics, which allows topic models to be optimised iteratively. Finally, to complete the survey part, visualisation tools and libraries are discussed. The thesis applies the three modelling techniques to the domain of mobile game descriptions and seeks answers to two research questions: (1) To what extent can topic modelling be used to identify latent game genres or game features? (2) How well do the genres extracted from the text descriptions correlate with the existing categories in the dataset? First, a dataset of 13,000 game descriptions and their associated metadata is constructed, and the three topic modelling techniques are then applied. All of the models are optimised for the best coherence score, and the results are compared. The best results, i.e. the most coherent topics, are obtained with the NMF model, although all techniques show promise as long as they are properly applied. 
In answer to the research questions, topic modelling is shown to help extract information about mobile games that correlates with the existing category information in the dataset, and it can be used to identify new facets of game settings and themes.
  • Ahola, Noora (2023)
    This thesis investigates inalienably possessed lexical items in the languages of the New Guinea area. Inalienable possession is a linguistic feature of languages with an alienability distinction: these languages have two distinct possessive noun phrase constructions, and the choice between them depends on the semantics of the possessed noun. Inalienable possession covers possessive relations that are conceived as inherent, whereas alienable possession pertains to more prototypical ownership relations. Cross-linguistically, the nouns involved in inalienable possession are typically kinship terms, body part nouns, and spatial nouns. In addition, there are often language-specific concepts that are inalienably possessed. Inalienable lexical items are not semantically uniform across languages; which nouns are treated as inalienable is highly language-specific. This variation forms the core of the thesis: the objective is to examine inalienable possession from a lexical perspective to gain a detailed understanding of the semantic nature of inalienability. The study is based on a genealogically balanced sample of 23 languages, and the analysis of the inalienable lexical items draws on language descriptions and dictionaries of the individual languages. The results show that, in general, the semantic characteristics of inalienable possession follow the cross-linguistic tendencies: kinship nouns and body part terms are the most common inalienable lexical items. Spatial relations, however, are less commonly inalienable. The analysis shows that the most prototypical nouns in their respective semantic categories are the ones most frequently inalienably possessed. The languages also have inalienable nouns beyond kinship terms, body part nouns, and spatial nouns, and most of these are inalienable in one language only. 
Explanations for these semantic deviations are proposed, and explanations for the tendencies present in the data are also discussed. The study also briefly addresses the areal distribution of alienability distinctions. Alienability distinctions are relatively common in the languages of the New Guinea area, but the study shows that, although they are attested in languages spoken across the area, they are notably lacking in the southern part of the island.
  • Kapellis, Panagiotis (2024)
    There is a glaring lack of large-scale typological research on the relationship between grammatical contact effects and the social settings in which language contact occurs. The recent project Linguistic Adaptation: Typological and Sociolinguistic Perspectives to Language Variation (GramAdapt) closes this research gap to a degree. It provides a typological framework for analysing language contact from both a grammatical and a sociolinguistic point of view, and focuses specifically on how the intensity of social contact affects various aspects of 47 pairs of languages in contact with each other. A question still left open, however, is whether the linguistic evidence for contact effects reflects sociolinguistic information about the contact situations. The goal of this study is to determine this for one aspect of the social context of the contact scenarios: linguistic dominance. Specifically, it asks whether contact-induced changes in nominal number marking and noun phrase possession marking can indicate the social dominance of the change-inducing language. To answer this question, I used the framework developed in the GramAdapt project to estimate, for two languages in contact with each other, the likelihood that one of them induced contact effects in the nominal number marking and noun phrase possession marking strategies of the other. Using a similar methodology, I analysed the likelihood that one of the two languages is more dominant than the other in the social domains where language contact can take place. I carried out both analyses for 22 pairs of languages from across the world. I then compared the two likelihoods (for the direction of linguistic contact effects and for social dominance) across the language pairs using Kendall's tau-b. This yielded a p-value of 0.003207, suggesting a statistically significant correlation between them. 
This means that, based on my global sample of 22 pairs of languages in contact with each other, effects of language contact on nominal number marking and possession marking in noun phrases do indicate which language is more dominant in the social contexts of the language contact.
  • Tammilehto, Olli (2024)
    Threat processing prepares animals to act in the presence of threat, with adequate responses depending on the physical and psychological distance to the threat. Mounting evidence from human and animal studies indicates that this processing occurs in phylogenetically conserved large-scale networks that include the amygdala, with rodent studies pointing to the particular importance of its lateral, basal, and central subnuclei in fear conditioning and extinction. Previous fMRI studies have shown that the human amygdala activates both in threatening situations and in situations involving individuals perceived as belonging to an outgroup, suggesting a link between threat and prejudice. Not every outgroup receives the same prejudice: some are deemed more threatening than others. Yet amygdala studies taking multiple outgroups into account have been scarce. This thesis addresses this gap by investigating the relationship between outgroup types, derived from the stereotype content model, and amygdala responses, using machine learning to classify fMRI responses from a virtual intergroup contact task that included threatening elements and events. The results showed that the classifiers could distinguish outgroups above chance only in a few meaningful events, using data from specific regions. Classifying different task events provided evidence that the amygdala is sensitive to modulations of interpersonal distance. Although above chance, the classifiers performed modestly in both cases: the absolute differences from chance level were only marginal, and the classifiers tended to confuse some of the categories. The basal subnucleus of the amygdala and the situation in which an outgroup member starts to approach the perceiver were found to be particularly important for successful classification. Using multivariate pattern analysis (MVPA) to analyse amygdala subnuclei shows promise but is possibly limited by fMRI resolution.
  • Sholihat, Aliva (2023)
    Statistical learning is a universal cognitive mechanism that allows humans to detect patterns and regularities in their environment, playing a crucial role in various cognitive functions, including language acquisition. This research investigated the relationship between subjective sleep quality, measured with the Pittsburgh Sleep Quality Index (PSQI) questionnaire, and statistical language learning in adults. Participants' performance in statistical language learning was measured in two separate studies: Study 1 (N = 97) and its replication, Study 2 (N = 120). Both studies used a two-alternative forced choice (2AFC) recognition task, complemented by a confidence judgement rating. The results showed a significant learning effect above chance in both studies, highlighting adults' capability for statistical language learning. Explicit learning mechanisms contributed significantly to statistical language learning, pointing to the vital role of the hippocampal-prefrontal cortex declarative memory system in adult statistical language learning. In Study 1, a logarithmic model best represented the relationship between subjective sleep quality and statistical language learning performance. This model showed an initial drop in learning performance as subjective sleep quality declined, after which performance stabilised with further declines in sleep quality. However, this relationship was not statistically significant in Study 2. While this research provides novel insights into the interplay between sleep quality and statistical language learning, future studies should consider both subjective and objective sleep measures for a more comprehensive investigation. The findings have implications for understanding the cognitive mechanisms underpinning language learning and the potential influence of sleep quality on these processes.
  • Laine, Emma (2024)
    This thesis studies the use of four discourse particles (jees, jes, jess, yes) and their positions in a sequence. The data come from the Suomi24 corpus. The goal is to find out how the English-origin particle yes has been adapted in the borrowing process into distinct particles, and whether these have distinct or similar functions in Finnish. The study also considers the significance of sequence position: for example, does the end position correlate with the sequence-closing function? The theoretical frameworks are conversation analysis, discourse analysis and, marginally, sociolinguistics. The background covers English in Finland and elsewhere, including pragmatic borrowing. The methodology is corpus-based: 200 particles per variant (jees, jes, jess and yes) were collected from the Suomi24 corpus and imported into an Excel sheet, and their frequencies and percentages were counted by sequence position (start, middle, end) and function (e.g. adjective, affirmative). Jees is the most established of the particles, and its most frequent function is adjectival; it has thus gone through a pragmatic borrowing process. The particle yes is most often used as an interjection. The results for these two particles correlated most closely with earlier research. Jes and jess show dispersion in their functions and positions. The change-of-state token and sequence-closing functions are only marginal. Sequence positions are not in complementary distribution, although they show certain tendencies. The study illustrates that these particles, specifically jes, jess and yes, need more research, as there is considerable dispersion in both functions and positions. One limitation of the study is that the influence of spoken language has not been considered. These particles could be studied further in corpus and spoken language studies to gain a broader understanding of them. 
Nevertheless, this study gives guidelines regarding the properties of these English-based Finnish particles.
  • Nikula, Ottilia (2023)
    Recent progress in natural language generation tools has raised concerns that the tools are being used to generate neural fake news. Fake news affects society in many ways: it has been used for monetization schemes and to tip political elections, and it has been shown to have a severe effect on people’s mental health. Accordingly, detecting neural fake news and countering its spread is becoming increasingly important. The aim of the thesis is to explore whether there are linguistic features that can help detect neural news. Using Grover, a neural language model, I generate a set of articles based on both real and fake human-written news. I then extract a range of linguistic features, previously found to differ between human-written real and fake news, to investigate whether the same features can be used to detect Grover-written news, whether any features can differentiate between Grover-written news generated from different source material, and whether, based on these features, Grover-written news is more similar to real or to fake news. The data consist of 64 articles, of which 16 are real news articles sourced from reputable news sites and 16 are fake news articles from the ISOT Fake News Dataset. The other 32 articles were written by Grover, using either the real or the fake news articles as source text (16 each). A broad range of linguistic features is extracted from the article bodies and titles to capture the style, complexity, and sentiment of the articles, including punctuation, quotes, syntax tree depths, and emotion counts. The results show that the features that have been found to differ between real and fake news can, with some limitations, be used to discern Grover Fake News (Grover-written articles based on fake news). However, Grover Real News (Grover-written articles based on real news) cannot reliably be discerned from real news. 
Moreover, while the measured features do not provide a reliable method for discerning Grover Real News and Grover Fake News from each other, there are still noticeable differences between the two groups. Grover Fake News can be differentiated from real news, but these texts can be considered to be of better quality than human-written fake news. These findings align with previous research, showing that Grover is adept at rewriting misinformation and making it more credible to readers, and that feature extraction alone cannot reliably distinguish neural fake news; human evaluation also needs to be considered.
  • Zhixu, Gu (2023)
    Neural machine translation (NMT) has become the mainstream approach to machine translation (MT). Despite its remarkable progress, NMT systems still face many challenges in low-resource scenarios. Common approaches to the data scarcity problem include exploiting monolingual data or parallel data in other languages. In this thesis, transformer-based NMT models are trained on Finnish-Simplified Chinese, a language pair with limited parallel data, and the models are improved using techniques such as hyperparameter tuning, transfer learning and back-translation. The best NMT system is an ensemble model that combines different single models. The experiments also show that different hyperparameter settings can cause a performance gap of up to 4 BLEU points. The ensemble model shows a 35% improvement over the baseline model. Overall, the experiments suggest that hyperparameter tuning is crucial for training vanilla NMT models, and that back-translation offers more benefit for model improvement than transfer learning. The results also show that adding sampling to back-translation does not improve NMT model performance in this low-data setting. The findings may be useful for future research on low-resource NMT, especially the Finnish-Simplified Chinese MT task.
  • Matysek, Ida (2023)
    The linguistic landscape of the Podlasie region in Poland is characterized by the presence of multiple minority languages, particularly local dialects influenced by Belarusian and Ukrainian. Traditionally, Polish, Belarusian, Ukrainian and Lithuanian have been spoken in the area. Currently, Polish is the majority language, and Belarusian has the status of an official auxiliary language in five municipalities. As a result of prolonged language and cultural contact, multiple vernaculars (here called Podlachian Varieties) and a local identity have emerged. This questionnaire-based sociolinguistic study explores the relationship between minority language attitudes and identities among multilingual young adults (aged 18 to 29) from Podlasie. It adopts the poststructuralist understanding of identity as fluid, multidimensional, and socially constructed (Hall 1999, Norton 2013). As Anchimbe (2007) emphasises, language is an important marker of identity, especially in heterogeneous communities, as individuals and groups need to establish their boundaries to safeguard what they perceive as their distinct characteristics. Attitudes towards a language may determine whether it heads towards extinction or is preserved in the community. This study approaches minority language speakers’ attitudes through Communication Accommodation Theory (CAT), developed by Giles. According to CAT, individuals adjust their communication styles to converge with or diverge from others based on their social motivations, emphasising similarities or differences respectively. The material was gathered through an online questionnaire in December 2020. The questionnaire consisted of 23 questions and received 391 responses, of which 39 were discarded as irrelevant. 
Two-thirds of the participants believed that the Podlachian Varieties are disappearing due to the passing of older generations, the lack of intergenerational language transmission, and the young generation feeling ashamed of the language. These reasons reflect the low perceived status of the varieties, which leads to a convergent communication strategy towards the Polish majority and in turn results in intergenerational language shift and identity accommodation. This confirms the analysis of Barszczewska (2010), who observed a process of integration and language shift in the population. Polish identity holds the dominant position within the group, while Belarusian identity was seldom declared (5%). With respect to identity, tendencies towards both divergence and assimilation can be observed. People with a local identity strive to diverge from both Polish and Belarusian identities, with the stronger trend being divergence from Belarusian. The assimilation trend is seen in native speakers of Belarusian, as nearly half of them identified as Polish and one-third as local. In light of this study, the Varieties are evidently vulnerable, and if the situation does not change in the near future, their continued existence may be threatened. The ongoing assimilation and language shift pose a serious threat to the vitality of the Podlachian Varieties, and the rapidly progressing urbanization process will continue to foster language shift towards Polish.
  • Zolotilin, Mikhail (2024)
    Language tags are additional tokens in the source corpus that indicate the language of the corresponding sentence in the target corpus. Like all words, they receive their own numerical vector representations (embeddings) in the translation model, which can then be used for various experiments. This work explores the use of language tag transformations in a multilingual translation model to produce mixed-language output, aiming to create an "intermediate" language variant. It examines interpolation between multiple languages via their embeddings and the characteristics of language generation in these boundary regions. The experiments were conducted with two multilingual translation models, English-to-Slavic and Slavic-to-Slavic, using target languages represented in both models and comparing their embeddings in vector space. The study investigates the conditions under which maximum language mixing occurs, examining how factors such as the source language, target languages, and script influence the process. It analyzes outputs from both pre-trained models and trains several models with varied features to understand how these elements affect the potential for target language mixing during interpolation. In the absence of reference-based automatic evaluation, the degree of mixing was assessed using a language identification model. The study also conducts a detailed qualitative linguistic analysis of the mixed output, examining the extent to which the grammar and lexicon of several languages can be mixed. The findings indicate that the extent and location of mixing vary with the source and target languages. Notably, languages with similar scripts but different grammars yielded the most interesting results, suggesting that standardizing the script across the training data could enhance mixing quality. 
Several smaller multilingual translation models were trained from scratch, incorporating features such as alternative (character-based) word segmentation and script tags, which enable control over the script, not just the language, of the output. Despite being trained on significantly less data, the smaller models showed some of the same interpolation trends as the larger models, for example the influence of script. Additionally, introducing an extremely small number of alternative examples into a model's training corpus noticeably affected its perception of the script category. The results suggest that mixing or averaging multiple language variants is viable given a uniform script, effective segmentation/encoding, sufficient data, and in-depth exploration of the spaces between embeddings to identify the most balanced interlanguage variant.
  • Lang, Sean (2024)
    Communicative efficiency principles are an area of great interest in linguistic research, which investigates how a potentially infinite range of human language output can be produced within the bounds of limited memory. One way to measure the cognitive burden of a sentence is through dependency distances. In this thesis, the idea that morphological marking could alleviate communicative memory burdens was evaluated with token-based quantitative typological methods that extract tendencies of language use. Large, multilingual, labeled corpora were parsed to find and evaluate more than 300,000 simple transitive sentences for patterns of morphological agreement and case marking in relation to dependency distances. No significant, meaningful cross-linguistic correlation was found between morphological agreement and dependency distances in usual patterns of sentence construction. Nor was any correlation found to suggest that marking allows longer dependencies in exceptional circumstances, indicating that marking did not help alleviate memory burdens. Preliminary evidence suggests a possible inverse correlation between agreement and dependency distance, motivating future work on whether the process of ensuring agreement itself increases cognitive burden.
  • Hyttinen, Saana (2022)
    This thesis explores the language practices, attitudes, and identities of multilingual couples who use English as a lingua franca in their relationship (ELF couples). The goal is to investigate how these couples use their multilingual resources and whether they report translanguaging or other language mixing practices. As part of ELF couples’ language practices, the language practices of families in which ELF couples are the parents are also addressed. Furthermore, the study aims to find out what attitudes ELF couples have towards translanguaging, and how the use of English as a lingua franca shows in their language identities. Earlier research has shown that translanguaging is an essential part of the use of English as a lingua franca, especially in informal social contact and close relationships. However, ELF couples as a target group have been little studied, and most of the research so far has been qualitative. The focus of this thesis is quantitative: the study was conducted using an online questionnaire that received 563 suitable responses. The main findings show that while the primary language of ELF couples’ conversations is usually English, the partners’ first languages are also used to a varying extent. Translanguaging is also present in ELF couples’ language practices on a larger scale, although the varying results on this aspect showcase the uniqueness of individual couples’ language practices. Moreover, the couples have positive attitudes towards language mixing in general, and many of them respond to it in a relaxed manner. Regarding ELF couples’ language identities, the data show that the couples often identify as English speakers but also as multilinguals, both individually and as a couple. 
Consequently, English as a lingua franca seems to play an important role in the relationships, and many of the couples report difficulty with, or even unwillingness towards, changing the main language of the relationship to something other than English after having started the relationship using English as a lingua franca. The results also show that language mixing is used much less in the family context when addressing children, and that children seem to be one of the main triggers for more conscious language practices.
  • Alminas, Juozas (2023)
    Adopting the narrative approach of linguistic biographies as its data collection method, this thesis explores the linguistic practices and ideologies of Tibetans living in Finland. Although many multilingual communities are known to be present in Finland, few studies have been done on the topic, and there has been no previous work involving Tibetan speakers. I was curious about what Tibetans themselves think about their language and about ways to maintain it in an expatriate setting. I came to discover that the present-day linguistic situation and linguistic attitudes can only be understood through the socio-cultural landscape of the consultants' native Sikkim in India. Through this research I hope to answer two main questions: what are Tibetans' linguistic ideologies, and how do the consultants' multilingual practices manifest in daily life? The data are based on fieldwork interviews conducted with Tibetan consultants. In line with a more inclusive approach to linguistic fieldwork, I have tried to present the speakers through their own words, allowing them to speak for themselves. The consultants' lives have been shaped in the highly multilingual landscape of Sikkim. Their linguistic ideologies are deeply rooted in that landscape, but also in Tibetan Buddhism. Consequently, puristic ideologies and expectations of good linguistic performance can sometimes overshadow and hinder Tibetan language learning. However, the demands of the present world are beginning to reshape individuals' identities, so that linguistic performance no longer determines linguistic and ethnic belonging. In the second part of the thesis, I analyze how the consultants' linguistic ideologies have been shaped and which languages have a performative function in which contexts. 
I go on to discuss the consultants' linguistic practices and propose the label ‘translanguaging’ as the most adequate to describe their multilingual performance. The results of the study showcase the multilayered and complex linguistic and social landscape in which Tibetans live. I suggest that studies geared towards small-scale multilingualism could offer a deeply holistic approach to such landscapes and situations. Such an approach would in turn shed more light on language vitality and use. The study's findings suggest that the vitality of the Tibetan language lies in its ability to adapt to the speakers' world and mix fluidly with other languages. With this work I hope to highlight the importance of individuals' ideologies in studying linguistic change and to contribute to our understanding of complex multilingual practices.
  • Melander, Etta (2024)
    This thesis examines the obscuring of the perpetrator in crime headlines in Iltalehti and Ilta-Sanomat in 2022. Perpetrator obscuring is a phenomenon in which a headline backgrounds the perpetrator or does not mention them at all. It can help the media remain objective and neutral, but in crime headlines in particular it can distort readers' understanding of the situation. This thesis investigates whether perpetrator obscuring occurs in crime headlines and, if so, in which contexts. The study belongs to the field of language and gender research, but it is also relevant to communication studies. The analysis draws on discourse analysis, which is applied to a dataset of approximately 500 headlines collected by spot sampling. The data were collected from the print and digital editions of Iltalehti and Ilta-Sanomat in 2023 and 2024. The aim was to determine which factors are present in headlines that obscure the perpetrator, in which situations the perpetrator is obscured, and by what means. The data show that the perpetrator is obscured, for example, when the act is still under investigation. Most typically, references to the perpetrator are avoided by framing the event through the victim. Notably, however, when the suspect or perpetrator is a woman, the headline is more likely to state their gender. In reports of crimes committed by men, at both the suspicion and the conviction stage, the gender of the perpetrator or suspect is more likely to go unstated in the headline than in crimes where the (suspected) perpetrator is a woman.
  • Kylliäinen, Ilmari (2022)
    Automatic question answering and question generation are two closely related natural language processing tasks. Both have been studied for decades, and both have a wide range of uses. While systems that can answer questions formed in natural language can help with all kinds of information needs, automatic question generation can be used, for example, to automatically create reading comprehension tasks and improve the interactivity of virtual assistants. Currently, the best results in both question answering and question generation are obtained with pre-trained neural language models based on the transformer architecture. Such models are typically first pre-trained on raw language data and then fine-tuned for various tasks using task-specific annotated datasets. So far, no models that can answer or generate questions purely in Finnish have been reported. Creating them with modern transformer-based methods requires both a pre-trained language model and a sufficiently large dataset suitable for question answering or question generation fine-tuning. Although some suitable models pre-trained on Finnish or multilingual data are already available, a major bottleneck is the lack of annotated data needed for fine-tuning. In this thesis, I create the first transformer-based neural network models for Finnish question answering and question generation. I present a method for creating a dataset for fine-tuning pre-trained models for the two tasks. The dataset creation is based on automatic translation of an existing dataset (SQuAD) and automatic normalization of the translated data. Using the created dataset, I fine-tune several pre-trained models to answer and generate questions in Finnish and evaluate their performance. I use monolingual BERT and GPT-2 models as well as a multilingual BERT model. 
The results show that the transformer architecture is also well suited for Finnish question answering and question generation. They also indicate that the synthetically generated dataset can be a useful fine-tuning resource for these tasks. The best results in both tasks are obtained by fine-tuned BERT models pre-trained only on Finnish data. The fine-tuned multilingual BERT models come close, whereas fine-tuned GPT-2 models generally underperform. The data developed for this thesis will be released to the research community to support future research on question answering and generation, and the models will be released as benchmarks.
  • Raatikainen, Riikka (2022)
    This thesis examines the occurrence of optimism bias in future scenarios concerning climate change. While the scenario method can reduce the effect of certain cognitive biases on assessments of the future, other biases may in turn hinder the construction and evaluation of scenarios. It has been hypothesized that optimism bias, which occurs in many different contexts, would also appear in connection with the scenario method. The study tests experimentally whether optimism bias occurs in evaluations of climate change scenarios, that is, whether participants consider positive scenarios more likely than others. It also examines whether scenario optimism is associated with optimism bias measured in another context and with other variables. To address the research questions, a questionnaire was compiled and sent to the mailing lists of University of Helsinki student subject associations. The survey received 182 responses. Participants were presented with four scenarios, ranging from positive to negative, concerning the survival and population size of the Saimaa ringed seal 50 years from now. Participants ranked the scenarios in order of likelihood, and an optimism score was calculated for each respondent based on this ranking. On average, respondents were pessimistic in their assessments: the mean optimism score fell below the value regarded as neutral. Thus, no optimism bias appeared in the scenario assessments. Optimism bias was also measured by having participants estimate the likelihood of various life events happening to themselves compared with others. In these questions, optimism bias did appear: on average, respondents believed they were more likely than others to experience positive events and less likely to experience negative ones. Life-event optimism also correlated positively with scenario optimism. The questionnaire also explored how other variables relate to possible optimism bias in the scenario assessments. 
General optimism was measured with an existing questionnaire, but it did not correlate with scenario optimism. Attitude toward climate change, in contrast, correlated negatively with scenario optimism: respondents who take climate change seriously assessed the scenarios more pessimistically. Respondents' age, gender, and knowledge of the Saimaa ringed seal did not affect the scenario assessments. The absence of optimism bias in the scenario assessments was a surprising result whose exact cause cannot be determined. It may stem either from the scenario method's bias-reducing effect or from the negative associations evoked by climate change, the topic of the scenarios. Further research is therefore needed, with a more representative sample and a design that separates the effects of the scenario method from those of the climate change topic. From the perspective of scenario use, however, the absence of optimism bias can be seen as a positive finding.
  • Salmi, Vili (2023)
    In this master's thesis, I aim to describe the outcomes of Swedish-language teaching in an everyday setting, namely shopping centers. In addition, I aim to describe the debate over the status of Swedish as a largely compulsory school subject, as well as the phenomenon itself. My topic is best captured by the word “pakkoruotsi” (“compulsory Swedish”). The debate surrounding it is partly responsible for the weak learning outcomes themselves, but above all, the topic's recurrence and near-perpetual relevance were what originally prompted this study. I aim to approach the topic with the varied and multidimensional perspective it deserves, in contrast to simply opposing or defending it. My own contribution is a survey conducted in shopping centers in Lahti and Helsinki. In it, I sought to map three things among shopping-center employees: their perceptions of how much employers value Swedish skills, how likely customer-service staff are to at least attempt to serve a Swedish-speaking customer in Swedish, and the need to use Swedish in customer-service work. I also wanted to learn about respondents' consumption of Swedish-language entertainment and their beliefs regarding the compulsory Swedish question.