Browsing by Title


  • Lai, Ching Kwan (2012)
    Gas chromatography is considered a powerful tool for studying a wide range of physicochemical parameters. Gas chromatography is able to separate the components of a mixture and determine their physicochemical properties simultaneously; only simple modifications of the gas chromatographic instrument are required for various measurement conditions, and advances in technology have improved the gas chromatographic hardware for measurement. However, the analysis of aerosols or other complex samples requires excellent separation efficiency, which may not be achieved by a single chromatographic technique alone. Comprehensive two-dimensional gas chromatography is therefore a very powerful separation technique that can be used for complex samples. The focus of this thesis was the measurement of physicochemical parameters using gas chromatographic techniques. The experimental part focused on the development of a new method for the simultaneous determination of vapor pressures for several compounds in complex matrices. The main aim was to develop a technique utilizing comprehensive two-dimensional gas chromatography coupled to time-of-flight mass spectrometry in order to study unknown compounds with similar structures from an aerosol sample. In addition, several approaches for vapor pressure determination were studied for some environmentally important compounds. The development of the measurement technique included isothermal and temperature-programmed gas chromatography with different polar and derivatized homologous series as references. The new technique for vapor pressure determination using comprehensive two-dimensional gas chromatography coupled to time-of-flight mass spectrometry was successfully developed. It is a simple, rapid and versatile method that can be used to analyze other complex samples without extensive sample pretreatment. Inverse gas chromatography has become a powerful technique for evaluating the properties of materials, including their physicochemical properties. However, this thesis focused on the potential of other gas chromatographic techniques, such as those utilizing Kováts' retention indices, headspace gas chromatography and the gas-stripping method, to measure selected thermodynamic parameters. The fundamental theories of the methods, experimental practices and their applications are described.
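    The Kováts retention indices mentioned above follow a standard isothermal definition; the formula below is the textbook form (not a result specific to this thesis), in which the analyte's adjusted retention time is bracketed by those of the n-alkanes with n and n+1 carbon atoms.

```latex
% Standard isothermal Kovats retention index (textbook definition,
% not a formula taken from the thesis itself)
I = 100 \left[ n + \frac{\log t'_{R,\mathrm{analyte}} - \log t'_{R,n}}
                        {\log t'_{R,n+1} - \log t'_{R,n}} \right]
```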
  • Kari, Eetu (2013)
    Biological interactions have been studied by many different techniques. One of the frequently applied techniques is affinity capillary electrophoresis (ACE), which has proven to be effective, fast and simple, especially for the elucidation of molecular non-covalent interactions and for the determination of binding constants between receptors and ligands. However, traditional ACE has disadvantages when highly UV-absorbing compounds, such as proteins, are studied and when only a limited amount of sample is available. Fortunately, these disadvantages can be overcome by partial-filling affinity capillary electrophoresis (PFACE), which was developed to minimize the consumption of sample. Different PFACE methods have been successfully utilized for interaction studies between different biological compounds, including proteins, ligands, D-alanine-D-alanine terminus peptides and glycopeptide antibiotics. The literature part of this M.Sc. thesis concentrates on different PFACE methods and on their utilization for the clarification of different biological interactions. In the experimental part, the applicability of two different capillary electrophoretic systems, capillary electrochromatography and PFACE, to studies on high-density lipoprotein (HDL), apolipoprotein (apo) A-I, phospholipid transfer protein (PLTP) and cholesteryl ester transfer protein (CETP) was clarified. Because all of these biomolecules are linked to the development of atherosclerosis through their role in removing cholesterol from peripheral cells, their investigation is an important topic. Both capillary electromigration techniques demonstrated that not only the apoA-I of HDL, but also lipids, play a role in the interactions with PLTP and CETP.
  • Niittymäki, Henri Kalervo (University of Helsinki, 2007)
  • Kemppainen, Teemu (University of Helsinki, 2007)
    Real-time scheduling algorithms, such as Rate Monotonic and Earliest Deadline First, guarantee that calculations are performed within a pre-defined time. As many real-time systems operate on limited battery power, these algorithms have been enhanced with power-aware properties. In this thesis, 13 power-aware real-time scheduling algorithms for processor, device and system-level use are explored.
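    For context, the classical schedulability tests behind Rate Monotonic and Earliest Deadline First (standard textbook results, not the power-aware extensions surveyed in the thesis) can be sketched as follows.

```python
# A minimal sketch of two classical schedulability checks for periodic tasks
# with deadlines equal to periods: the Rate Monotonic sufficient utilization
# bound (Liu & Layland) and the exact Earliest Deadline First bound.
def utilization(tasks):
    """tasks: list of (execution_time, period) pairs."""
    return sum(c / t for c, t in tasks)

def rm_schedulable(tasks):
    n = len(tasks)
    return utilization(tasks) <= n * (2 ** (1 / n) - 1)   # sufficient condition

def edf_schedulable(tasks):
    return utilization(tasks) <= 1.0                      # necessary and sufficient

tasks = [(1, 4), (2, 6), (1, 8)]   # (C, T) in milliseconds, illustrative values
print(rm_schedulable(tasks), edf_schedulable(tasks))
```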
  • Nurmela, Janne (2022)
    The quantification of carbon dioxide emissions poses a significant and multi-faceted problem for the atmospheric sciences as part of the research on global warming and greenhouse gases. Emissions originating from point sources, referred to as plumes, can be simulated using mathematical and physical models, such as a convection-diffusion plume model and a Gaussian plume model. The convection-diffusion model is based on the convection-diffusion partial differential equation describing mass transfer in diffusion and convection fields. The Gaussian model is a special case, or solution, of the general convection-diffusion equation obtained when assumptions of a homogeneous wind field, relatively small diffusion and time independence are made. Both of these models are used to simulate the plumes in order to find the emission rate of the plume source. An equation for solving the emission rate can be formulated as an inverse problem written as y = F(x) + ε, where y is the observed data, F is the plume model, ε is a noise term and x is an unknown vector of parameters, including the emission rate, which needs to be solved. For an ill-posed inverse problem, where F is not well behaved, an exact solution does not exist, but a minimum-norm solution can be found; that is, the solution is a vector x which minimizes a chosen norm function, referred to as a loss function. This thesis focuses on the convection-diffusion and Gaussian plume models, and studies both the differences between and the sensitivity of these models. Additionally, this thesis investigates three different approaches for optimizing loss functions: optimal estimation for a linear model, the Levenberg–Marquardt algorithm for a non-linear model, and the adaptive Metropolis algorithm. The goodness of different fits can be quantified by comparing root mean square errors: the better the fit, the smaller the root mean square error. A plume inversion program was implemented in Python (version 3.9.11) to test the implemented models and the different algorithms. The parameters' effect on the estimated emission rate is assessed by performing sensitivity tests on simulated data. The plume inversion program is also applied to satellite data, and the validity of the results is considered. Finally, other more advanced plume models and improvements to the implementation are discussed.
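    As a concrete illustration of the inverse problem y = F(x) + ε described above, the sketch below fits an emission rate to noisy synthetic observations of a simplified ground-level Gaussian plume with the Levenberg–Marquardt method; the dispersion constants, wind speed and observation geometry are illustrative assumptions, not values from the thesis.

```python
# A minimal sketch (not the thesis implementation) of estimating the emission
# rate Q of a Gaussian plume from noisy observations via least squares.
import numpy as np
from scipy.optimize import least_squares

def gaussian_plume(Q, xs, ys, u=3.0, a=0.08, b=0.06):
    """Ground-level concentration downwind of a point source.

    Q  : emission rate (unknown parameter)
    xs : downwind distances of the observations (m)
    ys : crosswind offsets of the observations (m)
    u  : wind speed (m/s), assumed known
    a,b: illustrative dispersion-growth constants (sigma_y ~ a*x, sigma_z ~ b*x)
    """
    sig_y, sig_z = a * xs, b * xs
    return (Q / (np.pi * u * sig_y * sig_z)) * np.exp(-0.5 * (ys / sig_y) ** 2)

rng = np.random.default_rng(0)
xs = np.linspace(500, 5000, 40)                       # observation locations
ys = rng.normal(0, 200, size=xs.size)
y_obs = gaussian_plume(50.0, xs, ys) + rng.normal(0, 1e-6, xs.size)  # noisy data

# Levenberg-Marquardt fit of the emission rate
fit = least_squares(lambda q: gaussian_plume(q[0], xs, ys) - y_obs,
                    x0=[10.0], method="lm")
print("estimated emission rate:", fit.x[0])
```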
  • Utoslahti, Aki (2022)
    Lempel-Ziv factorization of a string is a fundamental tool used by myriad data compressors. Despite its optimality regarding the number of produced factors, it is rarely used without modification because of its computational cost. In recent years, Lempel-Ziv factorization has been a busy research subject, and we have witnessed the state of the art being completely changed. In this thesis, I explore the properties of the latest suffix array-based Lempel-Ziv factorization algorithms while experimenting with turning them into an efficient general-purpose data compressor. The setting of this thesis is purely exploratory, guided by reliable and repeatable benchmarking. I explore all aspects of a suffix array-based Lempel-Ziv data compressor. I describe how the chosen factorization method affects the development of the encoding and other components of a functional data compressor. I show how the chosen factorization technique, together with the capabilities of modern hardware, allows the length of the longest common prefix of two strings to be determined over 80% faster than the baseline approach. I also present a novel approach to optimizing the encoding cost of the Lempel-Ziv factorization of a string, i.e., bit-optimality, using a dynamic programming approach to the single-source shortest path problem. I observed that, in its current state, suffix array construction is a major computational bottleneck in suffix array-based Lempel-Ziv factorization. Additionally, using a suffix array to produce a Lempel-Ziv factorization leads to optimality regarding the number of factors, which does not necessarily correspond to bit-optimality. Finally, a comparison with common third-party data compressors revealed that relying exclusively on Lempel-Ziv factorization prevents reaching the highest compression efficiency. For these reasons, I conclude that current suffix array-based Lempel-Ziv factorization is unsuitable for general-purpose data compression.
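    The longest-common-prefix speed-up mentioned above rests on comparing several bytes per step instead of one; the sketch below shows the idea in plain Python as an illustration only, not the thesis implementation, which operates on machine words.

```python
# A minimal sketch of computing the longest common prefix (LCP) of two byte
# strings: a character-at-a-time baseline versus a block-at-a-time variant.
def lcp_bytewise(a: bytes, b: bytes) -> int:
    i, n = 0, min(len(a), len(b))
    while i < n and a[i] == b[i]:
        i += 1
    return i

def lcp_blockwise(a: bytes, b: bytes, block: int = 8) -> int:
    """Compare `block`-byte chunks first, then finish byte by byte."""
    i, n = 0, min(len(a), len(b))
    while i + block <= n and a[i:i + block] == b[i:i + block]:
        i += block
    while i < n and a[i] == b[i]:
        i += 1
    return i

assert lcp_bytewise(b"banana", b"bandana") == lcp_blockwise(b"banana", b"bandana") == 3
```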
  • Mikkola, Santeri (2024)
    The question of reunification, or ‘the Taiwan issue’, stands as one of the paramount geopolitical conundrums of the 21st century. China asserts that Taiwan is an inalienable part of its historical geo-body and socio-cultural chronicles under the unifying idea of ‘Chineseness’. Nevertheless, since Taiwan’s democratization process began to thrive in the 1990s, perceptions of national identity have diverged drastically from those in mainland China. As a corollary, the appeal of reunification in Taiwan is almost non-existent, and hence achieving peaceful unification under the ‘one country, two systems’ proposal seems highly unlikely. Furthermore, the United States assumes a pivotal role in cross-strait geopolitics, intricately tangling the question of Taiwan into the broader scheme of great power politics. This thesis examines the intricate dynamics of the Taiwan issue by analyzing the practical geopolitical reasoning of PRC intellectuals of statecraft over Taiwan. The theoretical and methodological foundations of this study draw from critical geopolitics and critical discourse analysis. The primary empirical research materials comprise the three Taiwan white papers published by the PRC. In addition, the analysis is supplemented by other official documents as well as a vast array of research literature published on cross-strait geopolitics. Building upon Ó Tuathail’s theorization of practical geopolitical reasoning, the thesis presents the ‘grammar of geopolitics’ of the Taiwan issue from the perspective of the PRC. Within this analytical framework, three guiding geopolitical storylines were identified: 1) Historical Sovereignty, 2) National Unity under ‘Chineseness’, and 3) Separatism and External Powers as Antagonist Forces. The results reveal that the CCP has constructed the imperative of reunification as a historically and geographically bound inevitability. Nevertheless, China's increasing geopolitical anxiety over achieving the objective of reunification with Taiwan is evident in its discourses. This increasing geopolitical anxiety is likely to compel the CCP to adopt more coercive actions in the near and mid-term future if it deems them necessary. Given the developments in Taiwan, in Sino-U.S. relations and domestically in China, it seems probable that pressure on Taiwan will continue to mount throughout the 2020s. Much of the strategic calculation and geopolitical discourse constructed regarding the Taiwan issue can be attributed to the CCP's concerns about its own legitimacy to rule. Within its geopolitical discourses, the issue of reunification is rendered an existential question for China, and arguably it constitutes a significant part of the modern CCP’s raison d'être. China’s increasing self-confidence as a superpower is continually unsettling the dynamics of international affairs and the geopolitical landscape, particularly within the Indo-Pacific region. Consequently, the project of Chinese geopolitics remains unfinished business and warrants further contributions from researchers in the field of critical geopolitics.
  • Pöllänen, Joonas (2021)
    This master’s thesis examines views on Finland’s security environment among Finnish security experts and analyses these views through the framework of critical geopolitics. Theoretically, the thesis draws both from earlier literature on perceived state security threats to Finland and from research on the security-geopolitics relationship within critical geopolitics. The thesis utilizes Q methodology, a relatively little-known approach with a long history and an active user base in the social sciences. The purpose of the methodology is to study personal viewpoints, in other words subjectivities, among a selected group of people, the participants of the study. Q methodology employs both qualitative and quantitative methods, and the result of a Q methodological study is a number of discourses, which can be further analysed. The group of participants whose views were examined consisted of nine geopolitical experts and policymakers, all of whom were civil servants of the Finnish Ministry of Foreign Affairs and the Finnish Defence Forces. Three separate discourses were distinguished in this group, in addition to which there was consensus on some of the issues examined. One of the resulting discourses, which was especially widespread among participants from the Defence Forces, viewed Russia as Finland’s geopolitical Other. According to this discourse, Finland’s security would be highly dependent on this Other, even though it may not be a realistic security threat at the moment. This view is in line with a traditional geopolitical discourse in Finland. Another discourse, which was common among the participants from the Ministry of Foreign Affairs, emphasized internal security threats and democracy’s role in security, while seemingly downplaying Russia’s role. A third discourse, on the other hand, highlighted non-state security issues, such as terrorism. The consensus discourse among the group of participants viewed the European Union strongly as the primary geopolitical framework of Finland. Even though two of the three individual discourses did not highlight Russia’s role, there was an indirectly implied consensus that Finland should not seek close cooperation with Russia in important security matters, such as cybersecurity.
  • Lassila, Petri (2021)
    Structural modification of lipid-based solid-fat substitutes (such as oleogels) using ultrasonic standing waves (USW) has recently been shown to potentially increase oleogel storage stability. To enable their application in food products, pharmaceuticals, and cosmetics, practical and economical production methods are needed; in previous work, USW-treated oleogel production was limited to volumes of 50-500 µL. The purpose of this work is to improve upon the previous procedure for producing structurally modified oleogels with USW by developing a scaled-up and convenient approach. To this aim, three different USW chamber prototypes were designed and developed, with common design goals: (i) increase process volumes to 10-100 mL, (ii) make the sample extractable from the treatment chamber, and (iii) avoid contact between the sample and the ultrasonic transducer. Imaging of the internal structure of USW-treated oleogels was used as the determining factor of successful chamber design. The best design was subsequently used to produce USW-treated oleogels, whose bulk mechanical properties were studied using uniaxial compression tests, along with their local mechanical properties, investigated using scanning acoustic microscopy. The results elucidated the foam-like mechanical behaviour of the oleogels. Finally, the stability of treated oleogels was compared to control samples using an automated image-analysis oil-release test. This work enables the effective mechanical-structural manipulation of oleogels in volumes of 10-100 mL, paving the way toward possible large-scale USW treatment of lipid-based materials.
  • Kekkonen, Tuukka (2021)
    Sub-λ/2 focusing, also known as super resolution, is widely studied in optics, but only a few practical realizations exist in acoustics. In this contribution, I show a novel way to produce sub-λ/2 focusing in the acoustic realm. I used an oil-filled cylinder immersed in liquid to focus an incident plane wave into a line focus. Three different immersion liquids were tested: water, olive oil, and pure ethanol. In addition to the practical experiment, we conducted a series of finite element simulations, courtesy of Joni Mäkinen, for comparison with the experimental results.
  • Hickman, Brandon (2016)
    The aim of this thesis is the development of the highest-quality quantitative precipitation estimate (QPE) for the Helsinki urban area through the creation of a quality-controlled multi-radar composite. Weather radars suffer from a number of errors, and these are typically compounded when located near urban areas. Through the use of radar calibration and several quality control methods, the three Helsinki-area radars (Kerava, Kumpula, Vantaa) were composited and a blended QPE was created. The three C-band dual-polarimetric weather radars were calibrated through the self-consistency theory, which relates Z, Zdr, and Kdp. The calibration was conducted over several summer days in 2014 for each radar, and all were found to be under-calibrated by about 2 dB. The influence of rain on top of the radome was also examined, and it was found that wet-radome attenuation can produce an offset of several dB in calibration. Composites of Z and Kdp were created using weights to correct for non-hydrometeor classes, beam blockage, attenuation of the beam, radome attenuation, range, and ground clutter. Noise in Kdp from light rain is reduced by utilizing the self-consistency theory. The composited reflectivity brought significant improvements by filling data gaps and reducing errors, as well as by providing additional observations for each echo. However, minor errors, such as multi-trip echoes and speckle, propagated into the composite. The blended QPE was created from the composite data to obtain the benefits of R(Kdp) in heavy rain and hail, while R(Z) was used in cases of light rain. Rain rate algorithms developed for the Helsinki climate were implemented to calculate the rain intensity for the selected precipitation type, which was determined through a series of threshold values obtained from the literature. R(Kdp) allows better estimation for heavy rain and hail because it is more closely related to hydrometeor diameter and is immune to a number of errors present in Z. The QPE created in this project provides improved precipitation intensity due to the use of multiple corrected radars. This data can be used for improved urban run-off modeling, emergency warnings and weather forecasting. However, the method presented here is only applicable to liquid, mixed liquid and hail precipitation because of the impact of frozen hydrometeors on the dual-polarimetric parameters. Additional quality control methods and different precipitation estimates would be required for wintertime precipitation.
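    The blending logic described above, R(Kdp) for heavy rain and hail and R(Z) for light rain, can be sketched as follows; the power-law coefficients and the reflectivity threshold are generic textbook-style placeholders, not the Helsinki-specific relations used in the thesis.

```python
# A minimal sketch of a blended rain-rate estimate from reflectivity (Z) and
# specific differential phase (Kdp). Coefficients are illustrative only.
import numpy as np

def rain_rate_from_z(z_dbz):
    """Marshall-Palmer-style R(Z): Z = 200 * R**1.6, with Z in linear units."""
    z_lin = 10.0 ** (np.asarray(z_dbz, float) / 10.0)
    return (z_lin / 200.0) ** (1.0 / 1.6)

def rain_rate_from_kdp(kdp):
    """Power-law R(Kdp) with illustrative C-band-style coefficients."""
    return 21.0 * np.maximum(np.asarray(kdp, float), 0.0) ** 0.72

def blended_qpe(z_dbz, kdp, heavy_rain_dbz=40.0):
    """Use R(Kdp) where reflectivity indicates heavy rain, R(Z) elsewhere."""
    z_dbz = np.asarray(z_dbz, float)
    return np.where(z_dbz >= heavy_rain_dbz,
                    rain_rate_from_kdp(kdp),
                    rain_rate_from_z(z_dbz))

print(blended_qpe([25.0, 48.0], [0.1, 2.0]))  # mm/h per radar pixel
```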
  • Anni, Andelin (2023)
    Predator-prey models can be studied from several perspectives, each telling its own story about real-life phenomena. The perspective chosen for this thesis is to include prey rescue in the standoff between the predator and the prey. Prey rescue is seen in nature for many species; one example is the standoff between a hyena and a lion: when a lion attacks a hyena, the hyena's herd tries to frighten the lion away. The rescue attempt can either succeed or fail. In this thesis the prey-rescue model is derived for an individual rescuer and for a group of prey. For both cases, the aim is to derive the functional and numerical responses of the predator, but the focus is on deriving and studying the functional responses. First, a brief background is given to motivate the study. The introduction goes through the most important aspects of predator-prey modelling and gives as an example the simple but widely known Lotka-Volterra predator-prey model. The study begins with the simplest case of prey rescue, individual prey rescue. First, the individual-level states, their processes and all the assumptions of the model are introduced. Then the model is derived and reduced with timescale separation to achieve more interpretable results. The functional response is formed after solving the quasi-equilibrium of the model. It was found that this way of constructing the model gives the popular Holling Type II functional response. It is then examined what follows when more and more prey get involved in the standoff, trying to rescue the individual being attacked. This is studied on three different timescales: ultra-fast, intermediate, and slow. The process of deriving the model and the functional response is similar to the simple case of individual prey rescue, but the calculations are more involved. The resulting functional response was found to be uninteresting, so the model was adjusted: one of the timescales was left out of the analysis in the hope of more interesting results. The derivation proceeded as in the third chapter, but with more advanced calculations and different results for the quasi-equilibrium and the functional response. The functional response obtained was found to be worth studying in detail, and this detailed study is done last. It was found that different parameter choices affect the shape of the functional response. The parameters were chosen to be biologically relevant. Assuming that rescue is certain for group size n = 2, the functional response was found to take a humpback form for some choices of the other parameters, and the parameter ranges for which the functional response had a humpback shape were identified.
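    For reference, the Holling Type II functional response mentioned above has the standard textbook form below (generic notation, not necessarily the symbols used in the thesis), where N is prey density, a the attack rate and h the handling time.

```latex
% Standard Holling Type II functional response (textbook form):
% per-predator consumption rate as a function of prey density N.
f(N) = \frac{aN}{1 + ahN}
```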
  • Falk, Sebastian (2018)
    The idea underlying this thesis is to use data gathered by building management systems to build machine learning models in order to improve these systems. Our goal is to create models which can use data from multiple different sensors as input and output predictions about that data. We then use these predictions when implementing new applications. At our disposal we have data gathered by both motion sensors and carbon dioxide (CO2) sensors. This data is gathered at regular intervals and, after some transformations (the first topic we cover), takes the form of time series. We want to improve the systems to which these sensors are connected. As a concrete example, consider the ventilation systems which control the air conditioning. They usually have CO2 sensors connected to them. By keeping an eye on the CO2 value, the system is able to adjust the air flow when the value becomes too high. The problem with this is that once that value is reached, it takes some time before it is lowered back to a normal level. If we were able to predict when this value will begin to rise, the system could increase the airflow beforehand, meaning that it can avoid reaching the threshold level. This improves the effectiveness of the system, keeping the air quality constantly at a comfortable level. Another example is the lighting control systems, which commonly have motion detection sensors that control the lights. A motion detection event occurs when one of these sensors sees some movement. Sensors are connected to one or multiple luminaires, turning the luminaires on when an event happens. The luminaires also turn off automatically after a set amount of time. Being able to predict when these events happen would make it possible to turn on the lights before a person actually walks into the room in question. The system would also be able to turn off the lights if it knows that no one will be in the room, which means that the lights will not be on unnecessarily. To create these models we use multiple different prediction methods. In the thesis we discuss time-series forecasting models such as the autoregressive integrated moving average model as well as supervised learning algorithms. The supervised learning models we cover are decision tree models, random forest models, feedforward neural network models and a recurrent neural network model called long short-term memory. We explain how all of these models are created as well as how they can be used for time-series prediction on the data which we have at our disposal.
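    As a small illustration of the time-series side, the sketch below fits an autoregressive integrated moving average (ARIMA) model to a synthetic CO2-like series and forecasts the next hour; the series, sampling interval and model order are placeholder assumptions, not the thesis setup.

```python
# A minimal sketch of ARIMA forecasting with statsmodels on a synthetic
# indoor-CO2-like series (placeholder for real building management data).
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(1)
co2 = 450 + np.cumsum(rng.normal(0, 2.0, size=288))  # one day at 5-min intervals

model = ARIMA(co2, order=(2, 1, 1))   # AR(2), first differencing, MA(1)
fitted = model.fit()
forecast = fitted.forecast(steps=12)  # predict the next hour
print(forecast[:3])
```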
  • Kailamäki, Kalle (2022)
    This thesis explores predicting current prices of individual agricultural fields in Finland based on historical data. The task is to predict field prices accurately with the data we have available while keeping model predictions interpretable and well explainable. The research question is to find which of the several models we try is best suited to the task. The motivation behind this research is the growing agricultural land market and the lack of publicly available field valuation services that could help market participants determine and identify reasonable asking prices. Previous studies on the topic have used standard statistics to establish relevant factors that affect field prices. Rather than creating a model whose predictions can be used on their own in every case, the primary purpose of previous work has been to identify information that should be considered in manual field valuation. We, on the other hand, focus on the predictive ability of models that do not require any manual labor. Our modelling approaches focus mainly, but not exclusively, on algorithms based on Markov chain Monte Carlo. We create a nearest-neighbors model and four hierarchical linear models of varying complexity. Performance comparisons lead us to recommend a nearest-neighbor-type model for this task.
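    A minimal sketch of the nearest-neighbors idea is given below; the features, their values and the price relation are hypothetical placeholders, not the thesis data.

```python
# A minimal sketch of nearest-neighbor field-price prediction: estimate a
# field's price per hectare from the most similar fields in the training set.
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(5)
n = 2000
X = np.column_stack([
    rng.uniform(60.0, 70.0, n),   # latitude (placeholder feature)
    rng.uniform(20.0, 31.0, n),   # longitude (placeholder feature)
    rng.uniform(1.0, 50.0, n),    # field area in hectares (placeholder feature)
])
price_per_ha = 8000 - 300 * (X[:, 0] - 60) + rng.normal(0, 500, n)

knn = KNeighborsRegressor(n_neighbors=10, weights="distance")
knn.fit(X[:-200], price_per_ha[:-200])
print(knn.predict(X[-200:])[:3])
```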
  • Zetterman, Elina (2024)
    When studying galaxy formation and evolution, the relationship between galaxy properties and dark matter halo properties is important, since galaxies form and evolve within these halos. This relationship can be determined using numerical simulations, but unfortunately these are computationally expensive and require vast amounts of computational resources. This provides an incentive to use machine learning instead, since training a machine learning model requires significantly less time and fewer resources. If machine learning could be used to predict galaxy properties from halo properties, numerical simulations would still be needed to find the halo population, but the more expensive hydrodynamical simulations would no longer be necessary. In this thesis, we use data from the IllustrisTNG hydrodynamical simulation to train five different types of machine learning models. The goal is to predict four different galaxy properties from multiple halo properties and to measure how accurate and reliable the predictions are. We also compare the different types of models with each other to find out which ones perform best. Additionally, we calculate confidence intervals for the predictions to evaluate the uncertainty of the models. We find that, of the four galaxy properties, stellar mass is the easiest to predict, whereas color is the most difficult. Of the five types of models, light gradient boosting is in all cases either the best-performing model or almost as good as the best-performing model. This, combined with the fact that training this type of model is extremely fast, means that light gradient boosting has good potential to be utilized in practice.
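    The sketch below shows what such a halo-to-galaxy regression with light gradient boosting could look like; the feature names, synthetic values and target relation are assumptions for illustration, not the IllustrisTNG pipeline.

```python
# A minimal sketch of predicting a galaxy property (here a stand-in for
# stellar mass) from halo properties with light gradient boosting (LightGBM).
import numpy as np
import lightgbm as lgb
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
n = 5000
halo = np.column_stack([
    rng.normal(12.0, 0.5, n),   # log10 halo mass (placeholder)
    rng.normal(2.2, 0.3, n),    # log10 maximum circular velocity (placeholder)
    rng.normal(0.5, 0.2, n),    # spin / concentration proxy (placeholder)
])
stellar_mass = 1.5 * halo[:, 0] + 0.8 * halo[:, 1] + rng.normal(0, 0.2, n)

X_tr, X_te, y_tr, y_te = train_test_split(halo, stellar_mass, random_state=0)
model = lgb.LGBMRegressor(n_estimators=500, learning_rate=0.05)
model.fit(X_tr, y_tr)
print("R^2 on held-out halos:", model.score(X_te, y_te))
```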
  • Holmström, Axi (2016)
    Quantum Neural Networks (QNN) were used to predict both future steering wheel signals and upcoming lane departures for N=34 drivers undergoing 37 h of sleep deprivation. The drivers drove in a moving-base truck simulator for 55 min once every third hour, resulting in 31 200 km of highway driving, out of which 8 432 km were on straights. Predicting the steering wheel signal one time step ahead, 0.1 s, was achieved with a 15-40-20-1 time-delayed feed-forward QNN with a root-mean-square error of RMSEtot = 0.007 a.u. corresponding to a 0.4 % relative error. The best prediction of the number of lane departures during the subsequent 10 s was achieved using the maximum peak-to-peak amplitude of the steering wheel signal from the previous ten 1 s segments as inputs to a 10-15-5-1 time-delayed feed-forward QNN. A correct prediction was achieved in 55 % of cases and the overall sensitivity and specificity were 31 % and 80 %, respectively.
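    The time-delayed feed-forward idea, predicting the next steering-wheel sample from a window of previous ones, can be sketched as below; a classical MLP stands in for the quantum neural network, and the synthetic signal, layer sizes and training split are illustrative assumptions only.

```python
# A minimal sketch of one-step-ahead steering prediction from lagged inputs.
# A classical MLP is used here; the hidden sizes echo the 15-40-20-1 topology.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(7)
t = np.arange(0, 600, 0.1)                       # 0.1 s sampling
steering = np.sin(0.2 * t) + 0.05 * rng.normal(size=t.size)

lags = 15
X = np.column_stack([steering[i:i - lags] for i in range(lags)])  # 15 delayed inputs
y = steering[lags:]                                               # one step ahead

model = MLPRegressor(hidden_layer_sizes=(40, 20), max_iter=500, random_state=0)
model.fit(X[:-1000], y[:-1000])
pred = model.predict(X[-1000:])
rmse = np.sqrt(np.mean((pred - y[-1000:]) ** 2))
print("one-step-ahead RMSE:", rmse)
```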
  • Kangassalo, Lauri (2020)
    We study the effect of word informativeness on brain activity associated with reading, i.e., whether the brain processes informative and uninformative words differently. Unlike most studies that investigate the relationship between language and the brain, we do not study linguistic constructs such as syntax or semantics, but informativeness, an attribute statistically computable from text. Here, informativeness is defined as the ability of a word to distinguish the topic to which it is related. For instance, the word 'Gandhi' is better at distinguishing the topic of India from other topics than the word 'hot'. We utilize Electroencephalography (EEG) data recorded from subjects reading Wikipedia documents on various topics. We report two experiments: 1) a neurophysiological experiment investigating the neural correlates of informativeness and 2) a single-trial Event-Related brain Potential (ERP) classification experiment, in which we predict word informativeness from brain signals. We show that word informativeness has a significant effect on the P200, P300, and P600 ERP components. Furthermore, we demonstrate that word informativeness can be predicted from ERPs with performance better than a random baseline using a Linear Discriminant Analysis (LDA) classifier. Additionally, we present a language-model-based statistical model that allows the estimation of word informativeness from a corpus of text.
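    A minimal sketch of single-trial ERP classification with an LDA classifier is shown below; the synthetic features and the injected class difference are placeholders standing in for the EEG-derived features of the thesis.

```python
# A minimal sketch of classifying single-trial ERP feature vectors as
# belonging to informative vs. uninformative words with shrinkage LDA.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(3)
n_trials, n_features = 400, 64          # e.g. channel x time-window averages
X = rng.normal(size=(n_trials, n_features))
y = rng.integers(0, 2, size=n_trials)   # 1 = informative word, 0 = uninformative
X[y == 1, :8] += 0.4                    # inject a small synthetic class difference

lda = LinearDiscriminantAnalysis(solver="lsqr", shrinkage="auto")
scores = cross_val_score(lda, X, y, cv=5, scoring="roc_auc")
print("cross-validated AUC:", scores.mean())
```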
  • Ranimäki, Jimi (2023)
    It is important that the financial system retains its functionality throughout the macroeconomic cycle. When people lose their trust in banks, the whole economy can face dire consequences. Therefore, accurate and stable predictions of the expected losses of borrowers or loan facilities are vital for the preservation of a functioning economy. The research question of the thesis is: what effect does the choice of calibration type have on the accuracy of the predicted probability of default values? The research question is addressed through an elaborate simulation of the whole probability of default model estimation exercise, with a focus on calibrating the rank-order model to the macroeconomic cycle. Various calibration functions are included in the study to offer more diversity in the results. Furthermore, the thesis provides insight into the regulatory environment of financial institutions, presenting relevant articles from accords, regulations and guidelines by international and European supervisory agents. In addition, the thesis introduces statistical methods for model calibration to the long-run average default rate. Finally, the thesis studies the effect of calibration type on probability of default parameter estimation. The investigation itself is done by first simulating the data and then applying multiple different calibration functions, including two logit functions and two Bayesian models, to the simulated data. The simulation exercise is repeated 1 000 times for statistically robust results. Predictive power was measured using mean squared error and mean absolute error. The main finding of the investigation was that the simple grades perform unexpectedly well in contrast to raw predictions. However, the quasi moment matching approach for the logit function generally resulted in higher predictive power for the raw predictions in terms of the error measures, except against the captured probability of default. Overall, simple grades and raw predictions yielded similar levels of predictive power, while the master scale approach led to lower predictive power. It is reasonable to conclude that the best selection of approaches according to the investigation would be the quasi moment matching approach for the logit function with either the simple grades or the raw predictions calibration type, as the difference in predictive power between these types was minuscule. The calibration approaches investigated were significantly simplified compared with the methods used in the industry; for example, industry calibration exercises mainly focus on deriving the correct long-run average default rate over time, whereas this study used only the central tendency of the portfolio as that value.
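    One common form of calibration to a long-run average default rate is an intercept shift on the logit scale; the sketch below illustrates that idea (a generic technique shown for illustration, not necessarily one of the calibration functions compared in the thesis).

```python
# A minimal sketch of calibrating raw probability-of-default scores so that
# their mean matches a target long-run average default rate, via an intercept
# shift on the logit scale.
import numpy as np
from scipy.optimize import brentq
from scipy.special import expit, logit

def calibrate_to_lrdr(pd_raw, target_rate):
    """Find shift c so that mean(sigmoid(logit(pd) + c)) equals target_rate."""
    z = logit(np.clip(pd_raw, 1e-6, 1 - 1e-6))
    f = lambda c: expit(z + c).mean() - target_rate
    c = brentq(f, -20.0, 20.0)
    return expit(z + c)

rng = np.random.default_rng(0)
pd_raw = rng.beta(1, 30, size=10_000)        # raw, uncalibrated PD scores
pd_cal = calibrate_to_lrdr(pd_raw, 0.02)     # calibrate to a 2% long-run rate
print(pd_raw.mean(), pd_cal.mean())
```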
  • Harju, Esa (2019)
    Teaching programming is increasingly widespread and starts at the primary school level in some countries. Part of that teaching consists of students writing small programs that demonstrate the learned theory and how various things fit together to form a functional program. Multiple studies indicate that programming is a difficult skill to learn and master. Some of the difficulty comes from the plethora of concepts that students are expected to learn in a relatively short time. Part of practicing to write programs involves feedback, which aids students’ learning of the assignment’s topic, and motivation, which encourages students to continue the course and their studies. For feedback it would be helpful to know students’ opinion of a programming assignment’s difficulty. A few studies attempt to find out whether there is a correlation between metrics obtained while students write a program and the difficulty they report for it. These studies apply statistical models to data after the course is over, which raises the question of whether the same could be done while students are working on programming assignments. Some sort of machine learning model would be a possible solution, but as of now no such models exist. We therefore utilize the idea from one of these studies to create a model that could make such predictions. We then improve on that coarse model with two additional models that are more tailored for the job. Our main results indicate that this kind of model shows promise in predicting programming assignment difficulty from the collected metrics. With further work these models could provide an indication of a student struggling with an assignment. Using this kind of model as part of existing tools, we could offer a student subtle help before their frustration grows too much. Further down the road, such a model could be used to provide further exercises if a student needs them, or to progress forward once the student masters a certain topic.
  • Sundquist, Henri (2024)
    Acute myeloid leukemia (AML) is a disease in which blood cell production is severely disrupted. Cell counts and morphological analysis of bone marrow (BM) samples are key in the diagnosis of AML. Recent advances in computer vision have led to algorithms, developed at the Hematoscope Lab, that can automatically classify cells from these BM samples and calculate various cell-level statistics. This thesis investigated the use of cytomorphological data along with standard clinical data to predict progression-free survival (PFS). A benchmark study using penalized Cox regression, random survival forests, and survival support vector machines was conducted to study the utility of the cytomorphology data. As the features greatly outnumber the samples, the methods are further compared over three feature filtering methods based on Spearman’s correlation coefficient, conditional Cox screening, and mutual information. In a dataset from the national VenEx trial, penalized Cox regression with ElasticNet penalization supplemented with conditional Cox screening was found to perform best in the nested cross-validation benchmarking. A post-hoc dissection of the two best-performing Cox models revealed potentially predictive cytomorphological features, while disease etiology and patient age were likewise important.
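    A minimal sketch of a penalized Cox model of PFS with an elastic-net-style penalty is shown below, using lifelines on synthetic data; the feature names and the event and time generation are hypothetical, and the exact benchmarking stack of the thesis is not reproduced.

```python
# A minimal sketch of penalized Cox regression for progression-free survival
# with an elastic-net-style penalty (L1/L2 mix) via lifelines.
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(0)
n = 300
df = pd.DataFrame({
    "age": rng.normal(65, 10, n),               # clinical covariate
    "blast_fraction": rng.uniform(0, 1, n),     # hypothetical cytomorphology feature
    "cell_size_median": rng.normal(12, 2, n),   # hypothetical cytomorphology feature
})
risk = 0.03 * df["age"] + 1.5 * df["blast_fraction"]
df["time"] = rng.exponential(scale=np.exp(-risk) * 24)   # months of PFS
df["event"] = rng.uniform(size=n) < 0.7                  # 1 = progression/death observed

cph = CoxPHFitter(penalizer=0.1, l1_ratio=0.5)   # elastic-net-style penalty
cph.fit(df, duration_col="time", event_col="event")
print("concordance index:", cph.concordance_index_)
```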