Browsing by Author "Kalinauskaite, Kristina"

  • Kalinauskaite, Kristina (2023)
    In this study, I use elements of two machine translation quality evaluation approaches: test suites and error analysis. I conduct a surname-focused error analysis of machine-translated news text segments. Surname rendering is not a widely studied topic in machine translation research, but it deserves attention: infrequent words, such as surnames, are more likely to be mistranslated. Furthermore, morphologically rich languages like Finnish are more prone to inaccurate translation because of the way they form words. This makes surnames in Finnish texts an interesting subject of study. The study is descriptive, and its goal is to gain a better understanding of the challenges of surname rendering in Finnish-to-English machine translation. My dataset is based on a corpus of Finnish-language news items sent to other media channels by the Finnish News Agency (Suomen Tietotoimisto) between 2019 and 2021. I compiled a dataset of 4,000 surname-containing segments and translated them from Finnish to English using two free web-based neural machine translation engines, DeepL and Google Translate. I then identified the errors, categorised them, and analysed the surname rendition errors from different perspectives. Most of the surnames in the news segments were rendered correctly; however, the error analysis still offers interesting insights. Most of the errors fell into just two error categories, both of which are characterised by incorrect lemmatisation of the surname. Although surnames in the nominative case made up more than half of all the surnames in the dataset, most of them were rendered correctly. The highest number of errors came from the second-largest grammatical case group, the genitive. None of the sentence-initial surnames were translated as common nouns, even though quite a few of them were based on root words.
    Both translation engines produced a similar number of incorrect surname renditions: 191 for DeepL and 186 for Google Translate. They also displayed similar patterns of error distribution across grammatical case groups and surname origins. However, there were some differences. Google Translate produced more case-ending errors than DeepL, whereas DeepL produced more than twice as many errors in the nonsensical-output category as Google Translate. The overrepresentation of Finnish surnames in the error list, together with the fact that the two largest error categories are closely linked to Finnish morphology, demonstrates that Finnish surnames are indeed problematic for machine translation.