
Browsing by Title


  • Kinnunen, Samuli (2024)
    Chemical reaction optimization is an iterative process that aims to identify reaction conditions that maximize reaction output, typically yield. The evolution of optimization techniques has progressed from intuitive approaches to simple heuristics and, more recently, to statistical methods such as the Design of Experiments approach. Bayesian optimization, which iteratively updates beliefs about a response surface and suggests parameters that both exploit conditions near the known optima and explore uncharted regions, has shown promising results by reducing the number of experiments needed to find the optimum in various optimization tasks. In chemical reaction optimization, the method allows minimizing the number of experiments required to find the optimal reaction conditions. Automated tools such as pipetting robots hold potential to accelerate optimization by executing multiple reactions concurrently. Integrating Bayesian optimization with automation not only reduces the workload and increases throughput but also improves optimization efficiency. However, adoption of these advanced techniques faces a barrier, as chemists often lack proficiency in machine learning and programming. To bridge this gap, Automated Chemical Reaction Optimization Software (ACROS) is introduced. This tool orchestrates an optimization loop: Bayesian optimization suggests reaction candidates, the parameters are translated into commands for a pipetting robot, the robot executes the operations, a chemist interprets the results, and the data are fed back to the software for suggesting the next reaction candidates. The optimization tool was evaluated empirically using a numerical test function, a Direct Arylation reaction dataset, and real-time optimization of Sonogashira and Suzuki coupling reactions. The findings demonstrate that Bayesian optimization efficiently identifies optimal conditions, outperforming the Design of Experiments approach, particularly in optimizing discrete parameters in batch settings. Three acquisition functions (Expected Improvement, Log Expected Improvement and Upper Confidence Bound) were compared. It can be concluded that expected improvement-based methods are more robust, especially in batch settings with process constraints.
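    As a rough illustration of the suggestion step in such an optimization loop, the sketch below runs one round of Gaussian-process-based Expected Improvement over a grid of candidate conditions; the reaction parameters, their ranges and the yields are invented for the example and are not the ACROS implementation.

      # Illustrative sketch (not ACROS): one Bayesian-optimization step with a
      # Gaussian-process surrogate and the Expected Improvement criterion.
      import numpy as np
      from scipy.stats import norm
      from sklearn.gaussian_process import GaussianProcessRegressor
      from sklearn.gaussian_process.kernels import Matern

      def expected_improvement(X_cand, gp, y_best, xi=0.01):
          """EI for maximization: expected amount by which a candidate beats y_best."""
          mu, sigma = gp.predict(X_cand, return_std=True)
          sigma = np.maximum(sigma, 1e-9)
          imp = mu - y_best - xi
          z = imp / sigma
          return imp * norm.cdf(z) + sigma * norm.pdf(z)

      # Hypothetical past experiments: conditions (temperature, reagent equivalents)
      # and the yields they produced.
      X_done = np.array([[60.0, 1.0], [80.0, 1.5], [100.0, 2.0]])
      y_done = np.array([0.32, 0.55, 0.48])

      gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
      gp.fit(X_done, y_done)

      # Score a grid of untried conditions and suggest the next experiment.
      temps, equivs = np.meshgrid(np.linspace(50, 120, 15), np.linspace(0.5, 3.0, 11))
      X_cand = np.column_stack([temps.ravel(), equivs.ravel()])
      ei = expected_improvement(X_cand, gp, y_best=y_done.max())
      print("next suggested conditions:", X_cand[np.argmax(ei)])

    In a batch setting, several candidates would be drawn per round (for example by penalizing already-selected points) before the robot executes them in parallel.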
  • Thapa Magar, Purushottam (2021)
    Rapid growth and advancement of next-generation sequencing (NGS) technologies have changed the landscape of genomic medicine. Today, clinical laboratories perform DNA sequencing on a regular basis, which is an error-prone process. Erroneous data affect downstream analysis and produce fallacious results. Therefore, external quality assessment (EQA) of laboratories working with NGS data is crucial. Validation of variations such as single nucleotide polymorphisms (SNPs) and InDels (<50 bp) is fairly accurate these days. However, detection and quality assessment of large changes such as copy number variations (CNVs) continue to be a concern. In this work, we aimed to study the feasibility of automated CNV concordance analysis for laboratory EQA services. We benchmarked variants reported by 25 laboratories against the highly curated gold standard for the son (HG002/NA24385) of the Ashkenazim trio from the Personal Genome Project, published by the Genome in a Bottle Consortium (GIAB). We employed two methods to assess the concordance of CNVs: a sequence-based comparison with Truvari and an in-house exome-based comparison. For deletion calls of two whole genome sequencing (WGS) submissions, Truvari achieved precision greater than 88% and recall greater than 68%. Conversely, the in-house method's precision and recall peaked at 39% and 7.9%, respectively, for one WGS submission for both deletion and duplication calls. The results indicate that automated CNV concordance analysis of deletion calls for WGS-based callsets might be feasible with Truvari. On the other hand, results for panel-based targeted sequencing showed precision and recall rates for deletion calls ranging from 0-80% and 0-5.6%, respectively, with Truvari. This suggests that automated concordance analysis of CNVs for targeted sequencing remains a challenge. In conclusion, CNV concordance analysis depends on how the sequence data is generated.
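    For orientation, the precision and recall figures quoted above are derived from counts of true-positive, false-positive and false-negative variant calls, which is what a benchmarking tool such as Truvari reports; the sketch below uses invented counts.

      # Toy illustration of the concordance metrics: precision and recall from
      # TP/FP/FN counts (the counts here are made up, not thesis results).
      def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
          precision = tp / (tp + fp) if tp + fp else 0.0   # fraction of calls that are correct
          recall = tp / (tp + fn) if tp + fn else 0.0      # fraction of truth-set calls recovered
          return precision, recall

      p, r = precision_recall(tp=412, fp=55, fn=190)
      print(f"precision={p:.1%}, recall={r:.1%}")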
  • Ilse, Tse (2019)
    Background: Electroencephalography (EEG) depicts electrical activity in the brain and can be used in clinical practice to monitor brain function. In neonatal care, physicians can use continuous bedside EEG monitoring to determine the cerebral recovery of newborns who have suffered birth asphyxia, which creates a need for frequent, accurate interpretation of the signals over a period of monitoring. An automated grading system can aid physicians in the Neonatal Intensive Care Unit by automatically distinguishing between different grades of abnormality in the neonatal EEG background activity patterns. Methods: This thesis describes using a support vector machine as a base classifier to classify seven grades of EEG background pattern abnormality in data provided by the BAby Brain Activity (BABA) Center in Helsinki. We are particularly interested in reconciling the manual grading of EEG signals by independent graders, and we analyze the inter-rater variability of EEG graders by comparing a classifier built using selected epochs graded in consensus to a classifier built using full-duration recordings. Results: The inter-rater agreement score between the two graders was κ=0.45, which indicates moderate agreement between the EEG grades. The most common grade of EEG abnormality was grade 0 (continuous), which made up 63% of the epochs graded in consensus. We first trained two baseline reference models using the full-duration recordings and the labels of the two graders, which achieved 71% and 57% accuracy. We achieved 82% overall accuracy in classifying selected patterns graded in consensus into seven grades using a multi-class classifier, though this model did not outperform the two baseline models when evaluated with the respective graders' labels. In addition, we achieved 67% accuracy in classifying all patterns from the full-duration recordings using a multilabel classifier.
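    The sketch below shows, with invented stand-in data, the general shape of such a pipeline: a multi-class support vector machine over per-epoch feature vectors, with Cohen's kappa as the inter-rater agreement measure. It is not the thesis's actual feature extraction or model.

      import numpy as np
      from sklearn.model_selection import train_test_split
      from sklearn.svm import SVC
      from sklearn.metrics import accuracy_score, cohen_kappa_score

      rng = np.random.default_rng(0)
      X = rng.normal(size=(700, 20))      # placeholder feature vectors, one per EEG epoch
      y = rng.integers(0, 7, size=700)    # seven grades of background abnormality

      X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
      clf = SVC(kernel="rbf", C=1.0).fit(X_tr, y_tr)   # multi-class SVM classifier
      print("accuracy:", accuracy_score(y_te, clf.predict(X_te)))

      # Inter-rater agreement between two graders' label sets is measured like this:
      grader_a = rng.integers(0, 7, size=100)
      grader_b = rng.integers(0, 7, size=100)
      print("kappa:", cohen_kappa_score(grader_a, grader_b))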
  • Valta, Akseli Eero Juhana (2023)
    Puumala orthohantavirus (PUUV) is a single-stranded, negative-sense RNA virus carried by the bank vole (Myodes glareolus). Like other orthohantaviruses, it does not cause visible symptoms in the host species, but when transmitted to humans it can cause a mild form of hemorrhagic fever with renal syndrome (HFRS) called nephropathia epidemica (NE). PUUV is the only pathogenic orthohantavirus that is endemic to Finland, where it has a relatively high incidence of approximately 35 per 100 000 inhabitants, or 1000 to 3000 diagnosed cases annually. Here we describe a miniaturized immunofluorescence assay (mini-IFA) for measuring antibody response against PUUV from bank vole whole blood and heart samples as well as from patient serum samples. The method outline was based on the work of Pietiäinen et al. (2022) but was adapted for the detection of PUUV antibodies. Transfected cells expressing the PUUV structural proteins (N, GPC, Gn and Gc) were used instead of PUUV-infected cells, which allowed all steps to be performed outside of biosafety level 3 (BSL3) conditions. The method also enables the simultaneous measurement of IgM, IgA and IgG antibody responses from each sample in a more efficient and higher-throughput manner than traditional immunofluorescence methods. Our results show that the method is effective for testing large numbers of samples for PUUV antibodies and allows quick and convenient access to high-quality images that can be used both for identifying interesting targets for future studies and for producing a visual archive of the test results.
  • Aaltonen, Topi (2024)
    Positron annihilation lifetime spectroscopy (PALS) is a method used to analyse the properties of materials, namely their composition and the kinds of defects they may contain. PALS is based on the annihilation of positrons with the electrons of a studied material. The average lifetime of a positron introduced into a studied material depends on the density of electrons in the surroundings of the positron, with higher densities of electrons naturally resulting in faster annihilations on average. Introducing positrons into a material and recording the annihilation times results in a spectrum that is, in general, a noisy sum of exponential decays. These decay components have lifetimes that depend on the different electron-density regions present in the material, and relative intensities that depend on the fractions of each region in the material. Thus, the problem in PALS is inverting the spectrum to obtain the lifetimes and intensities, a problem known in general as exponential analysis. A convolutional neural network architecture was trained and tested on simulated PALS spectra. The aim was to test whether simulated data could be used to train a network that could predict the components of PALS spectra accurately enough to be usable on spectra gathered from real experiments. Reasons for testing the approach included making the analysis of PALS spectra more automated and decreasing user-induced bias compared to some other approaches. Additionally, the approach was designed to require few computational resources, ideally being trainable and usable on a single computer. Overall, testing showed that the approach has some potential, but the prediction performance of the network depends on the parameters of the components of the target spectra, with the likely issues being similar to those reported in previous literature. The approach was, in turn, shown to be sufficiently automatable, particularly once training has been performed. Further, while some bias is introduced in specifying the variation of the training data used, this bias is not substantial. Finally, the network can be trained without considerable computational requirements within a sensible time frame.
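    The kind of simulated spectrum described above can be sketched as a noisy sum of exponential decay components, each defined by a lifetime and a relative intensity; the component values and binning below are illustrative assumptions, and the instrument resolution function is omitted.

      import numpy as np

      def simulate_pals_spectrum(lifetimes_ns, intensities, n_counts=1_000_000,
                                 n_bins=1024, bin_width_ns=0.025, seed=0):
          """Draw annihilation times from a mixture of exponentials and histogram them."""
          rng = np.random.default_rng(seed)
          intensities = np.asarray(intensities, dtype=float)
          intensities /= intensities.sum()
          component = rng.choice(len(lifetimes_ns), size=n_counts, p=intensities)
          times = rng.exponential(np.asarray(lifetimes_ns)[component])
          edges = np.arange(n_bins + 1) * bin_width_ns
          spectrum, _ = np.histogram(times, bins=edges)
          return spectrum   # counts per time bin, Poisson-like noise by construction

      # Three hypothetical components: lifetimes in ns and their relative intensities.
      spec = simulate_pals_spectrum([0.16, 0.38, 1.5], [0.70, 0.25, 0.05])
      print(spec[:10])

    A network trained on many such spectra would take the binned counts as input and predict the underlying lifetimes and intensities.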
  • Kovanen, Veikko (2020)
    Real estate appraisal, or property valuation, requires strong expertise to be performed successfully, making it a costly process. However, with structured data on historical transactions, the use of machine learning (ML) enables automated, data-driven valuation that is instant, virtually costless and potentially more objective than traditional methods. Yet fully ML-based appraisal is not widely used in real business applications, as the existing solutions are not sufficiently accurate and reliable. In this study, we introduce an interpretable ML model for real estate appraisal using hierarchical linear modelling (HLM). The model is learned and tested with an empirical dataset of apartment transactions in the Helsinki area, collected during the past decade. As a result, we introduce a model that has competitive predictive performance while being simultaneously explainable and reliable. The main outcome of this study is the observation that hierarchical linear modelling is a highly promising approach for automated real estate appraisal. The key advantage of HLM over alternative learning algorithms is its balance of performance and simplicity: the algorithm is complex enough to avoid underfitting but simple enough to be interpretable and easy to productize. In particular, the ability of these models to output complete probability distributions quantifying the uncertainty of the estimates makes them suitable for actual business use cases where high reliability is required.
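    A minimal sketch of the hierarchical idea, assuming a random intercept per district on top of a fixed size effect, is shown below with a synthetic data frame; the column names and grouping are assumptions for the example, not the thesis dataset or model specification.

      import numpy as np
      import pandas as pd
      import statsmodels.formula.api as smf

      rng = np.random.default_rng(1)
      district = rng.integers(0, 10, size=500)
      sqm = rng.uniform(25, 120, size=500)
      district_shift = rng.normal(0, 800, size=10)[district]      # per-district price level
      price = (4500 + district_shift) * sqm + rng.normal(0, 20_000, size=500)
      df = pd.DataFrame({"price": price, "sqm": sqm, "district": district})

      # Fixed effect for apartment size, random intercept for each district.
      model = smf.mixedlm("price ~ sqm", df, groups=df["district"])
      result = model.fit()
      print(result.summary())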
  • Kallonen, Leo (2020)
    RPA (robotic process automation) is an emerging field in software engineering that is applied in a wide variety of industries to automate repetitive business processes. While the tools for creating RPA projects have evolved quickly, testing in these projects has not yet received much attention. The purpose of this thesis was to study how the regression testing of RPA projects created using UiPath could be automated while avoiding the most common pitfalls of test automation projects: unreliability, excessive cost, lack of reusable components and overly difficult implementation. An automated regression test suite was created as a case study with UiPath for an existing RPA project that is currently being tested manually. The results imply that UiPath can be used to create the regression test suite as well, not just the RPA project. The automated test suite could be used to run all the tests in the regression test suite that is currently run manually. The common test automation pitfalls were also mostly avoided: the structure of the project can be reused for other test projects, the project can recover from unexpected errors, and the implementation of the tests does not require a high level of programming knowledge. The main challenge proved to be the implementation cost, which was increased by the longer than expected test development time. Another finding was that the measures taken to address test automation pitfalls will likely work only with RPA projects that are no more complex than the sample RPA project. With more complex projects, there will also likely be more challenges with test data creation. As a result, for complex projects, manual regression testing could be a better option.
  • Vainio, Antero (2020)
    Nowadays the Internet is used as a platform for providing a wide variety of services. This has created challenges related to scaling IT infrastructure management. Cloud computing is a popular solution for scaling infrastructure, either by building a self-hosted cloud or by using a cloud platform provided by an external organization. This way, some of the challenges related to large scale can be transferred to the cloud administrators. OpenStack is a group of open-source software projects for running cloud platforms. It is currently the most commonly used software for building private clouds. Since it was initially published by NASA and Rackspace, it has been used by various organizations such as Walmart, China Mobile and the CERN nuclear research institute. The largest production deployments of OpenStack clouds consist of thousands of physical server computers located in multiple datacenters. The OpenStack community has created many deployment methods that take advantage of automated software configuration management. The deployment methods are built with state-of-the-art software for automating different administrative tasks, and they take different approaches to automating infrastructure management for OpenStack. This thesis compares some of the automated deployment methods for OpenStack and examines the benefits of using automation for configuration management. We present comparisons based on technical documentation as well as reference literature. Additionally, we conducted a questionnaire for OpenStack administrators about the use of automation. Lastly, we tested one of the deployment methods in a virtualized environment.
  • Stenudd, Juho (2013)
    This Master's thesis describes one example of how to automatically generate tests for real-time protocol software. Automatic test generation is performed using model-based testing (MBT). In model-based testing, test cases are generated from a behaviour model of the system under test (SUT). This model expresses the requirements of the SUT. Many parameters can be varied and test sequences randomised. In this context, real-time protocol software means a system component of a Nokia Siemens Networks (NSN) Long Term Evolution (LTE) base station. This component, named MAC DATA, is the system under test (SUT) in this study. 3GPP has standardised the protocol stack for the LTE eNodeB base station. MAC DATA implements most of the functionality of the Medium Access Control (MAC) and Radio Link Control (RLC) protocols, two protocols of the LTE eNodeB. Because complex telecommunication software is discussed here, it is challenging to implement MBT for MAC DATA system component testing. First, the expected behaviour of the system component has to be modelled. Because it is not practical to model everything, the most relevant parts of the system component that need to be tested have to be identified. Also, the most important parameters have to be selected from the huge parameter space; these parameters have to be varied and randomised. With MBT, a vast number of different kinds of users can be created, which is not feasible in manual test design. Generating a very long test case takes only a short computing time. In addition to functional testing, MBT is used in performance and worst-case testing by executing a long test case based on traffic models, and MBT was found to be suitable for challenging performance and worst-case testing. This study uses three traffic models: smartphone-dominant, laptop-dominant and mixed. MBT is integrated into a continuous integration (CI) system, which automatically runs MBT test case generation and execution overnight. The main advantage of the MBT implementation is the possibility to create different kinds of users and simulate real-life system behaviour. This way, hidden defects can be found in the test environment and the SUT.
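    As a toy illustration of generating test sequences from a behaviour model, the sketch below random-walks a small state machine; the states and stimuli are invented placeholders, not the MAC DATA protocol model or the actual MBT tooling used.

      import random

      # Behaviour model: state -> {stimulus: next_state}.
      MODEL = {
          "IDLE":      {"attach_user": "CONNECTED"},
          "CONNECTED": {"send_data": "CONNECTED",
                        "detach_user": "IDLE",
                        "trigger_handover": "HANDOVER"},
          "HANDOVER":  {"complete_handover": "CONNECTED"},
      }

      def generate_test_case(length: int, seed: int) -> list[str]:
          """Randomised walk over the model; each step is one stimulus sent to the SUT."""
          rng = random.Random(seed)
          state, steps = "IDLE", []
          for _ in range(length):
              stimulus, next_state = rng.choice(sorted(MODEL[state].items()))
              steps.append(stimulus)
              state = next_state
          return steps

      print(generate_test_case(length=12, seed=42))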
  • Lehtimäki, Laura (2019)
    The assessment of nonverbal interaction is currently based on observations, interviews and questionnaires. Quantitative methods for the assessment of nonverbal interaction are few. Novel technology allows new ways to perform assessment, and new methods are constantly being developed. Many of them are based on movement tracking by sensors, cameras and computer vision. In this study the use of OpenPose, a pose estimation algorithm, was investigated in the detection of nonverbal interactional events. The aim was to find out whether the same meaningful interactional events could be found from videos by the algorithm and by human annotators. Another purpose was to find out the best way to annotate the videos in a study like this. The research material consisted of four videos of a child and a parent blowing soap bubbles. The videos were first run through OpenPose to track the poses of the child and the parent frame by frame. The data obtained from the algorithm were further processed in Matlab to extract the activities of the child and the parent, the coupling of the activities and the closeness of the child's and the parent's hands at each time point. The videos were manually annotated in two different ways: both the basic units, such as gaze directions and the handling of the soap bubble jar, and the interactional events, such as communication initiatives, turn-taking and joint attention, were annotated. The results obtained with the algorithm were visually compared to the annotations. Communication initiatives and turn-taking could be seen as peaks in hand closeness and as alternation in activities. However, interaction events were not the only causes of changes in hand closeness and in activities, so they could not be distinguished from other actions solely by these factors. There was also interaction that was not related to jar handling, which could not be seen from the hand closeness curves. With the current recording arrangements, gaze directions could not be detected by the algorithm, and therefore the moments of joint attention could not be determined either. To enable the detection of gaze directions in future studies, the faces of the subjects should be visible at all times. Distinguishing individual interaction events may not be the best way to assess interaction, and the focus of assessment should be on global units, such as synchrony between interaction partners. The best way to annotate the videos depends on the aim of the study.
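    The hand-closeness measure described above amounts to a per-frame distance between the child's and the parent's wrist keypoints; the sketch below uses random stand-in coordinates in place of real OpenPose output.

      import numpy as np

      rng = np.random.default_rng(0)
      child_wrist = rng.uniform(0, 640, size=(300, 2))     # 300 frames of (x, y) pixels
      parent_wrist = rng.uniform(0, 640, size=(300, 2))

      closeness = np.linalg.norm(child_wrist - parent_wrist, axis=1)   # distance per frame

      # Frames where the hands are closest can then be compared against manually
      # annotated events such as communication initiatives and turn-taking.
      close_frames = np.flatnonzero(closeness < np.percentile(closeness, 5))
      print("frames with closest hands:", close_frames[:10])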
  • Lintunen, Milla (2023)
    Fault management in mobile networks is required for detecting, analysing and fixing problems appearing in the network. When a large problem appears in the mobile network, multiple alarms are generated by the network elements. Traditionally, the Network Operations Center (NOC) processes the reported failures, creates trouble tickets for problems and performs a root cause analysis. However, alarms do not reveal the root cause of a failure, and the correlation of alarms is often complicated to determine. If the network operator can correlate alarms and manage clustered groups of alarms instead of separate ones, it saves costs, preserves the availability of the mobile network and improves the quality of service. Operators may have several electricity providers, and the network topology is not correlated with the electricity topology. Additionally, network sites and other network elements are not evenly distributed across the network. Hence, we investigate the suitability of density-based clustering methods for detecting mass outages and performing alarm correlation to reduce the number of created trouble tickets. This thesis focuses on assisting root cause analysis and detecting correlated power and transmission failures in the mobile network. We implement a Mass Outage Detection Service and form a custom density-based algorithm. Our service performs alarm correlation and creates clusters of possible power and transmission mass outage alarms. We have filed a patent application based on the work done in this thesis. Our results show that we are able to detect mass outages in real time from the data streams. The results also show that the detected clusters reduce the number of created trouble tickets and help reduce the costs of running the network. The number of trouble tickets decreases by 4.7-9.3% for the alarms we process in the service in the tested networks. When we consider only alarms included in the mass outage groups, the reduction is over 75%. Therefore, continuing to use, test and develop the implemented Mass Outage Detection Service is beneficial for operators and an automated NOC.
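    As an illustration of the density-based idea (not the custom algorithm developed in the thesis), the sketch below clusters alarms by the coordinates of their originating sites with DBSCAN: alarms from one dense area form a mass-outage candidate group, while isolated alarms remain noise. The coordinates are synthetic.

      import numpy as np
      from sklearn.cluster import DBSCAN

      rng = np.random.default_rng(2)
      # 40 alarms from sites inside one affected area, 20 scattered unrelated alarms.
      outage_sites = rng.normal(loc=[10.0, 10.0], scale=0.5, size=(40, 2))   # km east/north
      scattered_sites = rng.uniform(0, 50, size=(20, 2))
      alarm_sites = np.vstack([outage_sites, scattered_sites])

      labels = DBSCAN(eps=1.5, min_samples=5).fit_predict(alarm_sites)
      print("mass-outage candidate groups:", sorted(set(labels) - {-1}),
            "| uncorrelated alarms:", int((labels == -1).sum()))

    One trouble ticket per cluster, rather than one per alarm, is what produces the kind of ticket reduction reported above.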
  • Suomalainen, Lauri (2019)
    Hybrid clouds are one of the most notable trends in the current cloud computing paradigm, and bare-metal cloud computing is also gaining traction. This has created a demand for hybrid cloud management and abstraction tools. In this thesis I identify shortcomings in Cloudify's ability to handle generic bare-metal nodes. Cloudify is an open-source, vendor-agnostic hybrid cloud tool which allows using generic consumer-grade computers as cloud computing resources. It is not, however, capable of automatically managing hosts joining and leaving the cluster network, nor does it retrieve any hardware data from the hosts, making cluster management arduous and manual. I have designed and implemented a system which automates cluster creation and management and retrieves useful hardware data from the hosts. I also perform experiments using the system which validate its correctness, usefulness and expandability.
  • Gafurova, Lina (2018)
    Automatic fall detection is an important challenge in the public health care domain. The problem primarily concerns the growing population of the elderly, who are at considerably higher risk of falling down. Moreover, falls may result in serious injuries or even death for the elderly. In this work we propose a solution for fall detection based on machine learning, which can be integrated into a monitoring system as a detector of falls in image sequences. Our approach is solely camera-based and is intended for indoor environments. For successful detection of falls, we utilize a combination of human shape variation, determined with the help of an approximating ellipse, and the motion history. The feature vectors that we build are computed over sliding time windows of the input images and are fed to a Support Vector Machine for accurate classification. The decision for the whole set of images is based on additional rules, which help us restrict the sensitivity of the method. To evaluate our fall detector fairly, we conducted extensive experiments on a wide range of normal activities, which we used to contrast with the falls. Reliable recognition rates suggest the effectiveness of our algorithm and motivate further improvement.
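    The ellipse approximation mentioned above can be sketched via the second moments of a binary silhouette: the orientation and axis ratio of the fitted ellipse change sharply when a person goes from standing to lying. The silhouette below is a synthetic blob, and in the described approach such shape features would be combined with motion history before classification.

      import numpy as np

      def ellipse_features(mask: np.ndarray) -> tuple[float, float]:
          """Orientation (degrees) and major/minor axis ratio of a binary silhouette."""
          ys, xs = np.nonzero(mask)
          coords = np.column_stack([xs, ys]).astype(float)
          cov = np.cov(coords, rowvar=False)         # second central moments
          eigvals, eigvecs = np.linalg.eigh(cov)     # eigenvalues in ascending order
          major = eigvecs[:, 1]                      # direction of largest spread
          angle = np.degrees(np.arctan2(major[1], major[0]))
          axis_ratio = float(np.sqrt(eigvals[1] / max(eigvals[0], 1e-9)))
          return angle, axis_ratio

      # Synthetic "lying down" silhouette: a wide, short block of foreground pixels.
      mask = np.zeros((120, 160), dtype=bool)
      mask[80:95, 20:140] = True
      print(ellipse_features(mask))   # near-horizontal orientation, large axis ratio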
  • Sutinen, Marjo (2017)
    This Master's thesis concerns the automatic generation of multiple-choice cloze (fill-in-the-blank) exercises for practising Finnish word inflection. Cloze exercises are a popular format in language learning and language proficiency assessment. Because they are fairly well controlled in form, automating their creation has been the goal of several academic and commercial projects over the past couple of decades. The task has proved challenging. If a cloze exercise is generated simply by removing a word from a sentence and asking the learner to fill it in, the exercise easily fails to be meaningful: a gap produced this way often admits several alternative words or constructions. One of the biggest challenges in cloze generation is therefore so-called "gap reliability": ensuring that answers that fit the gap can be distinguished from answers that do not. One way to ensure this is to restrict the set of possible answers by providing answer options that are known to be wrong. The challenge for automatic generation then becomes finding options that are known to be wrong. The wrong options must not, however, be wrong in too obvious a way: choosing the correct option must pose a meaningful challenge to the learner. The main goal of my thesis is to study the generation of reliable and potentially challenging multiple-choice cloze exercises for studying Finnish word inflection. The method I test in the experimental section has previously been applied successfully to a comparable purpose in the context of English prepositions. In the method, a large text corpus is searched for prepositions that frequently occur as a collocate of one of the gap's context words but never as a collocate of two context words at the same time. My aim is to show that the method can also be applied to generating Finnish inflection exercises. I also test the use of different types of corpora for the task, namely n-grams based on linear adjacency on the one hand and n-grams based on syntactic dependency structure on the other. In addition to the experimental work, I give a comprehensive account of different ways to construct inflection cloze exercises and present a cloze exercise template of my own design. My central finding is that the method can increase the reliability of cloze exercises significantly: in test cases where the data, judged by a few simple criteria, is sufficient, up to 80% of initially unreliable gaps become reliable. Finally, I discuss the evaluation of exercise difficulty and questions of insufficient data. Regarding the latter, I argue that although solving the data-sufficiency issues that emerged would further improve the results, the method can be considered fit for purpose even as it is.
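    A toy sketch of the collocation-based reliability check is given below: a candidate distractor is accepted only if the corpus shows it co-occurring with one context word of the gap but never with both context words at the same time. The miniature "corpus" and the word choices are invented for the example and greatly simplify the thesis's n-gram setup.

      corpus_sentences = [
          "iso koira juoksi pihalla".split(),
          "iso poika nukkui sisällä".split(),
          "iso kissa istui pihalla".split(),
      ]

      def cooccurs(sentences, *words):
          """Number of sentences in which all the given words appear together."""
          return sum(1 for s in sentences if all(w in s for w in words))

      def is_reliable_distractor(candidate, left_ctx, right_ctx, sentences):
          seen_with_one = (cooccurs(sentences, candidate, left_ctx) > 0
                           or cooccurs(sentences, candidate, right_ctx) > 0)
          never_with_both = cooccurs(sentences, candidate, left_ctx, right_ctx) == 0
          return seen_with_one and never_with_both

      # Gap context "iso ___ pihalla": "nukkui" attaches to "iso" somewhere in the
      # corpus but is never seen with both context words, so it survives as a distractor.
      print(is_reliable_distractor("nukkui", "iso", "pihalla", corpus_sentences))   # True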
  • Huusari, Riikka (2016)
    This study is part of the TEKES-funded Electric Brain project of VTT and the University of Helsinki, whose goal is to develop novel techniques for automatic big data analysis. In this study we focus on potential methods for automated land cover type classification from time series satellite data. Developing techniques to identify different environments would be beneficial in monitoring the effects of natural phenomena, forest fires, urbanization or climate change. We tackle the resulting classification problem with two approaches: supervised and unsupervised machine learning methods. From the former category we use a technique called the support vector machine (SVM), while from the latter we consider the Gaussian mixture model clustering technique and its simpler variant, k-means. We introduce the techniques used in the study in chapter 1 and give motivation for the work. A detailed discussion of the data available for this study and the methods used for analysis is presented in chapter 2. In that chapter we also present the simulated data that were created as a proof of concept for the methods. The results obtained for both the simulated data and the satellite data are presented in chapter 3 and discussed in chapter 4, along with considerations for possible future work. The obtained results suggest that support vector machines could be suitable for the task of automated land cover type identification. While the clustering methods were not as successful, we obtained accuracy as high as 93% with the supervised implementation on the data available for this study.
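    An illustrative sketch of the unsupervised branch follows: Gaussian mixture model clustering of per-pixel time series, with k-means as the simpler baseline. The seasonal profiles and pixels are synthetic stand-ins, not the satellite data of the study.

      import numpy as np
      from sklearn.mixture import GaussianMixture
      from sklearn.cluster import KMeans

      rng = np.random.default_rng(3)
      # Three hypothetical land cover types, each with its own profile over 12 time points.
      profiles = rng.uniform(0.1, 0.9, size=(3, 12))
      true_type = rng.integers(0, 3, size=600)
      pixels = profiles[true_type] + rng.normal(0, 0.05, size=(600, 12))

      gmm_labels = GaussianMixture(n_components=3, random_state=0).fit_predict(pixels)
      km_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(pixels)
      print("GMM cluster sizes:", np.bincount(gmm_labels))
      print("k-means cluster sizes:", np.bincount(km_labels))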
  • Vehomäki, Varpu (2022)
    Social media provides huge amounts of potential data for natural language processing, but using this data may be challenging. Finnish social media text differs greatly from standard Finnish, and models trained on standard data may not be able to handle the differences adequately. Text normalization is the process of transforming non-standard language into its standardized form. It provides a way both to process non-standard data with standard natural language processing tools and to obtain more data for training new tools for different tasks. In this thesis I experiment with bidirectional recurrent neural network (BRNN) models and models based on the ByT5 foundation model, as well as with the Murre normalizer, to see whether existing tools are suitable for normalizing Finnish social media text. I manually normalize a small set of data from the Ylilauta and Suomi24 corpora to use as a test set. For training the models I use the Samples of Spoken Finnish corpus and Wikipedia data with added synthetic noise. The results of this thesis show that there are no existing tools suitable for normalizing Finnish written on social media, and that there is a lack of suitable data for training models for this task. The ByT5-based models perform better than the BRNN models.
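    The synthetic-noise step mentioned above can be sketched as random character corruptions applied to clean sentences, producing (noisy, clean) training pairs; the operations and rates below are illustrative assumptions, not the exact noise model of the thesis.

      import random

      def add_noise(text: str, rate: float = 0.08, seed: int = 0) -> str:
          """Randomly drop, duplicate or replace characters of a clean sentence."""
          rng = random.Random(seed)
          out = []
          for ch in text:
              r = rng.random()
              if r < rate / 3:
                  continue                          # drop the character
              elif r < 2 * rate / 3:
                  out.append(ch + ch)               # duplicate it
              elif r < rate and ch.isalpha():
                  out.append(rng.choice("aeiouptksnlrm"))   # swap for a random letter
              else:
                  out.append(ch)
          return "".join(out)

      clean = "tämä on esimerkkilause suomenkielisestä tekstistä"
      print(add_noise(clean), "->", clean)   # noisy input paired with its clean target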
  • Puonti, Oula (2012)
    Magnetic resonance imaging (MRI) provides spatially accurate, three-dimensional structural images of the human brain in a non-invasive way. This allows us to study the structure and function of the brain by analysing the shapes and sizes of different brain structures in an MRI image. Morphometric changes in different brain structures are associated with many neurological and psychiatric disorders, for example Alzheimer's disease. Tracking these changes automatically using automated segmentation methods would aid in diagnosing a particular brain disease and in following its progression. In this thesis we present a method for automatic segmentation of MRI brain scans using parametric generative models and Bayesian inference. Our method segments a given MRI scan into 41 different structures, including, for example, the hippocampus, thalamus and ventricles. In contrast to the current state-of-the-art methods in whole-brain segmentation, our method does not pose any constraints on the MRI scanning protocol used to acquire the images. Our model consists of two parts: the first part is a labeling model that models the anatomy of the brain, and the second part is an imaging model that relates the label images to intensity images. Using these models and Bayesian inference we can find the most probable segmentation of a given MRI scan. We show how to train the labeling model using manual segmentations performed by experts and how to find optimal imaging model parameters using an expectation-maximization (EM) optimizer. We compare our automated segmentations against expert segmentations by means of Dice scores and point out places for improvement. We then extend the labeling and imaging models and show, using a database consisting of MRI scans of 30 subjects, that the new models improve the segmentations compared to the original models. Finally, we compare our method against the current state-of-the-art segmentation methods. The results show that the new models are an improvement over the old ones and compare fairly well against other automated segmentation methods. This is encouraging, because there is still room for improvement in our models. The labeling model was trained using only nine expert segmentations, which is quite a small number, and the automated segmentations should improve as the number of training samples grows. A further upside of our method is that it is fast and generalizes straightforwardly to MRI images with varying contrast properties.
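    A toy version of the two-part generative idea, assuming just two labels, a flat atlas prior and one-dimensional intensities, is sketched below: the E-step computes soft label posteriors as prior times Gaussian likelihood, and the M-step re-estimates the imaging-model parameters. It is a drastically simplified stand-in for the thesis's 41-structure model.

      import numpy as np

      rng = np.random.default_rng(4)
      true_labels = rng.integers(0, 2, size=2000)
      intensity = np.where(true_labels == 0,
                           rng.normal(30, 5, 2000), rng.normal(70, 8, 2000))
      prior = np.full((2000, 2), 0.5)                   # flat "atlas" prior per voxel

      mu, sigma = np.array([20.0, 80.0]), np.array([10.0, 10.0])   # initial guesses
      for _ in range(20):
          # E-step: posterior label probabilities per voxel.
          lik = np.exp(-0.5 * ((intensity[:, None] - mu) / sigma) ** 2) / sigma
          post = prior * lik
          post /= post.sum(axis=1, keepdims=True)
          # M-step: update the Gaussian intensity parameters of each label.
          w = post.sum(axis=0)
          mu = (post * intensity[:, None]).sum(axis=0) / w
          sigma = np.sqrt((post * (intensity[:, None] - mu) ** 2).sum(axis=0) / w)

      segmentation = post.argmax(axis=1)                # most probable label per voxel
      print("estimated means:", mu.round(1), "| accuracy:",
            (segmentation == true_labels).mean())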
  • Gold, Ayoola (2021)
    The importance of automatic speech recognition (ASR) cannot be underestimated in today's world, as ASR systems play a significant role in human-computer interaction. ASR systems have been studied deeply over time, but their maximum potential is yet to be explored for the Finnish language. Development of a traditional ASR system involves a considerable amount of hand-crafted engineering, which has made this technology difficult and resource-intensive to develop. However, with advancements in the field of neural networks, end-to-end neural network ASR systems can be developed which automatically learn the mapping from audio to its corresponding transcript, thereby reducing the need for hand-crafted engineering. End-to-end neural network ASR systems have largely been developed commercially by tech giants such as Microsoft, Google and Amazon. However, these commercial services have limitations, such as data privacy concerns and the cost of usage. In this thesis, we explored existing studies on the development of an end-to-end neural network ASR for the Finnish language. One technique that has proved successful for developing neural network ASR systems when data is scarce is transfer learning. This is the approach explored in this thesis for the development of the end-to-end neural network ASR system, and the success of this approach was evaluated. For this purpose, datasets collected from the Bank of Finland and Kaggle were used to fine-tune the Mozilla DeepSpeech model, a pretrained end-to-end neural network ASR model for the English language. The results obtained by fine-tuning the pretrained English ASR network for the Finnish language showed a word error rate as low as 40% and a character error rate as low as 22%. We therefore concluded that transfer learning is a successful technique for creating an ASR model for a new language using a model pretrained in another language, with little effort, data and resources.
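    For reference, the word and character error rates quoted above come from the Levenshtein edit distance between a reference transcript and the ASR hypothesis, divided by the reference length; the sketch below uses invented sentences.

      def edit_distance(ref, hyp):
          """Dynamic-programming Levenshtein distance over token sequences."""
          d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
          for i in range(len(ref) + 1):
              d[i][0] = i
          for j in range(len(hyp) + 1):
              d[0][j] = j
          for i in range(1, len(ref) + 1):
              for j in range(1, len(hyp) + 1):
                  cost = 0 if ref[i - 1] == hyp[j - 1] else 1
                  d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
          return d[-1][-1]

      reference = "puhe tunnistettiin melko hyvin"
      hypothesis = "puhe tunnistetin melko hyvi"
      wer = edit_distance(reference.split(), hypothesis.split()) / len(reference.split())
      cer = edit_distance(list(reference), list(hypothesis)) / len(reference)
      print(f"WER={wer:.1%}  CER={cer:.1%}")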
  • Kjellman, Martin (2021)
    This thesis examines how representatives of service providers for news automation perceive a) journalists and news organisations and b) the service providers' relationship to these. By introducing new technology (natural language generation, i.e. the transformation of data into everyday language) that influences both the production and business models of news media, news automation represents a type of media innovation, and the service providers represent actors peripheral to journalism. The theoretical framework takes hybrid media logics as its starting point, meaning that the power dynamics of news production are thought to be influenced by the field-specific logics of the actors involved. The hybridity metaphor is deepened by using a typology of journalistic strangers that takes into account the different roles peripheral actors adopt in relation to journalists and news organisations. Journalism is understood throughout as a professional ideology encountered by service providers who work with news organisations. Semi-structured interviews were conducted with representatives from companies that create natural language generation software used to produce journalistic text based on data. Participants were asked about their experiences of working with news media, and the interviews (N=6) were analysed phenomenologically. The findings form three distinct but interrelated dimensions of how the service providers perceive news media and journalism: as an area that sorely needs innovators (potential), that lacks resources in terms of knowledge, money and the will to innovate (obstacles), but that they can ultimately learn from and collaborate with (solutions). The service providers' own relationship to journalism and news media is not fixed to one single role. Instead, they alternate between challenging news media (explicit interloping) and inhabiting a supportive role (implicit interloping). This thesis serves as an exploration of how service providers for news automation affect the power dynamics of news production. It does so by unveiling how journalists and news organisations are perceived, and by adding further understanding to previous research on actors peripheral to journalism. In order to further untangle how service providers for news automation shift the balance of power shaping news production, future research should attempt to unify the way traditional news media actors and service providers perceive each other and their collaborations.
  • Steen, Iida (2021)
    With digitalisation, financial services are becoming available to an ever larger audience. Digital use of services is growing continuously, and market participants are innovating new business models. Digital services increase economies of scale, making higher-quality services available at lower cost. The EU aims to promote the adoption of digital services for the benefit of consumers and businesses and to become a leading digital actor. At the same time, the risks associated with this change must be managed while ensuring investor protection. With digitalisation, investment advice has also begun to be automated. In automated investment advice, the investment recommendation is produced by an algorithm, and there may be no human interaction at all. Automation democratises investment services and brings considerable efficiency gains thanks to the affordability of the service. On the other hand, this new kind of business model may bring with it risks that endanger investor protection. This thesis seeks to outline the EU-law regulatory framework governing automated investment advice, in which the Markets in Financial Instruments Directive (MiFID II) plays a central role. The same regulation covers both traditional and automated investment advice, which is why the thesis seeks to formulate interpretive recommendations that take into account the special characteristics of digital services. As the thesis is a work of EU law, the interpretation emphasises the objectives and principles of EU law and the interpretive methodology typical of EU law, in particular teleological interpretation. Because of the economic nature of the objectives of EU law, the thesis also emphasises law and economics and, in places, regulatory theory oriented towards law and economics. In addition to the obligations imposed on investment service providers under MiFID II, the thesis examines the role of the relevant regulation in national implementation. Owing to the cross-border nature of the service, automated investment advice carries a heightened risk of regulatory arbitrage, which stems from differences in national implementation between Member States. In addition, the enforcement of rights involves certain problems arising from the use of technology, such as novel questions of liability and difficulties in presenting evidence caused by the opacity of the algorithm. In this respect, the thesis presents ways of interpreting MiFID II that lead to enforcement of rights that is as uniform as possible. The thesis concludes that, interpreted teleologically, the same regulation can successfully be applied to both traditional and automated investment advice. The interpretation is not unproblematic in all cases, but new EU regulation on the topic is already forthcoming. The upcoming regulation may reduce both the interpretive problems identified in the thesis and the risk of regulatory arbitrage.