
Browsing by master's degree program "Life Science Informatics -maisteriohjelma"


  • Niinikoski, Eerik (2020)
    The aim of this thesis is to predict the total career racing performance of Finnish trotter horses from the trotters' early career racing performance and other early career variables. The thesis gives a brief introduction to harness racing and to the horses used in Finnish trotting sport. The data is presented and prepared for prediction, with descriptive statistics shown in tables and figures. The machine learning method of random forests for regression is introduced and used for the predictions. After training the models, the thesis reports the prediction accuracy and variable importance for total career racing performance in both the Finnhorse and the Finnish Standardbred trotter populations. Finally, the writer discusses the shortcomings of the work and possible improvements for future research. The data was provided by the Finnish Trotting and Breeding Association (Suomen Hippos ry) and included information on all harness races run in Finland from 1984 to the end of 2019. From almost three million rows, the data was summarised into a table of 46,704 rows of trotters that started their careers in the three earliest permitted age groups. A total of 37 independent variables were used to predict three outcomes, each in a separate model: total career earnings, total number of career starts, and total number of career first placings. The predictors are derived from other studies that estimate the environmental and genetic factors of a trotter's racing performance. The three models performed poorly to moderately, with total earnings having the highest prediction accuracy. That model predicted larger amounts of earnings quite well, but tended to predict some earnings for horses that in fact had none. Prediction accuracy for the total number of starts was poor, especially when the true number of starts was low. The model that predicted the total number of career first placings performed the worst. This can partially be explained by the fact that winning is a rare event for a trotter in general. The models fit better for Finnish Standardbred trotters than for Finnhorse trotters. This thesis serves as a good basis for similar future research in which massive amounts of data and machine learning are used to predict a trotter's career, racing performance or other factors. The results suggest that treating the prediction of total career racing performance as a classification problem could be a better fit than regression. Suitable classes, as well as possibly better predictors and appropriate imputation of missing values, should be chosen in consultation with experts in harness racing.
  • Koch, Bradley (2024)
    Renewable energy is the key to a sustainable future in a world currently run on coal and oil, and one such source could be bioelectrochemical systems [McCormick et al., Energy Environ. Sci., 2015]. These systems differ from traditional renewable energy sources such as solar cells, whose production requires exotic materials or a relatively extensive manufacturing process [Ren et al., Solar Energy, 2020]. One type of bioelectrochemical system is the biophotovoltaic system, which utilizes solar energy and water to produce electrons or other reducing agents outside of the organism, which can then be harvested for external usage [McCormick et al., Energy Environ. Sci., 2015]. Efforts to improve the efficiency of this type of system have focused on several areas, including substrate design, reactor design, and electrode properties [Anam et al., Sustainable Energy Fuels, 2021]. While these are important, there is another avenue to be explored, namely the exoelectrogenesis pathway itself [Okedi et al., bioRxiv, 2021]. This pathway has been explored briefly with Hilbert-Huang transforms to identify its oscillatory components, which have been partially mapped to photosystem II core expression [Okedi et al., bioRxiv, 2021]. In my analysis, I will use generated data from cyanobacteria that exhibit enhanced photosystem II and see whether the exact mechanisms behind this phenomenon can be captured. The data provided by the sequencing vendor comes in a FASTA Extension format, so the process and tools used to translate this data into usable variant call format (VCF) files will be described. I will then go through the additional analyses, namely variant comparisons through strain concordance and gene comparisons, as well as phylogenetic trees. The first analysis compares a wild type to a mutated strain, and the subsequent analysis compares multiple wild type strains to each other. Further analysis of phenotype expression in relation to the variant calls will also be explored.
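
As a rough illustration of what a variant comparison between two strains can look like, the sketch below reads variant records from VCF-style text and scores how much two strains overlap. It is not the method used in the thesis: the function names, the choice of the Jaccard index as the concordance measure, and the example records are all assumptions made for illustration only.

```python
from typing import Iterable, Set, Tuple

# A variant is keyed by (chromosome, position, reference allele, alternate allele).
Variant = Tuple[str, int, str, str]


def parse_vcf_lines(lines: Iterable[str]) -> Set[Variant]:
    """Collect variant keys from tab-separated VCF text, skipping header lines."""
    variants: Set[Variant] = set()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # meta-information and column header lines start with '#'
        fields = line.split("\t")
        # Standard VCF column order: CHROM, POS, ID, REF, ALT, ...
        chrom, pos, ref, alts = fields[0], int(fields[1]), fields[3], fields[4]
        for alt in alts.split(","):  # multi-allelic sites list ALT alleles comma-separated
            variants.add((chrom, pos, ref, alt))
    return variants


def concordance(a: Set[Variant], b: Set[Variant]) -> float:
    """Jaccard index (an assumed measure): shared variants / variants in either strain."""
    union = a | b
    if not union:
        return 1.0  # two strains with no variants are treated as fully concordant
    return len(a & b) / len(union)


# Hypothetical records, not taken from the thesis data.
wild_type = parse_vcf_lines([
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t100\t.\tA\tG\t.\t.\t.",
    "chr1\t250\t.\tC\tT,G\t.\t.\t.",
])
mutant = parse_vcf_lines([
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t100\t.\tA\tG\t.\t.\t.",
    "chr1\t400\t.\tG\tA\t.\t.\t.",
])

shared = wild_type & mutant
score = concordance(wild_type, mutant)
```

Keying each variant on chromosome, position, and both alleles means two strains only count as sharing a variant when the change itself matches, not just the site. A real comparison would likely also need to normalise variant representation before matching, which this sketch omits.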