
Browsing by Title


  • Niittymäki, Henri Kalervo (Helsingin yliopisto / Helsingfors universitet / University of Helsinki, 2007)
  • Kemppainen, Teemu (Helsingin yliopisto / Helsingfors universitet / University of Helsinki, 2007)
    Real-time scheduling algorithms, such as Rate Monotonic and Earliest Deadline First, guarantee that calculations are performed within a pre-defined time. As many real-time systems operate on limited battery power, these algorithms have been enhanced with power-aware properties. In this thesis, 13 power-aware real-time scheduling algorithms for processor, device and system-level use are explored.
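    As a minimal illustration of the schedulability guarantees mentioned above, the sketch below (not from the thesis) implements the classical Liu and Layland utilization tests for Rate Monotonic and Earliest Deadline First scheduling; the task set is hypothetical.

```python
# Sketch only: classical utilization-based schedulability tests for
# Rate Monotonic (RM) and Earliest Deadline First (EDF).
# The task set below is hypothetical; each task is (execution_time, period).

def utilization(tasks):
    """Total processor utilization U = sum(C_i / T_i)."""
    return sum(c / t for c, t in tasks)

def rm_schedulable(tasks):
    """Sufficient (not necessary) Liu-Layland bound: U <= n * (2**(1/n) - 1)."""
    n = len(tasks)
    return utilization(tasks) <= n * (2 ** (1 / n) - 1)

def edf_schedulable(tasks):
    """Exact bound for implicit-deadline periodic tasks: U <= 1."""
    return utilization(tasks) <= 1.0

tasks = [(1, 4), (2, 8), (1, 10)]  # (C_i, T_i) in arbitrary time units
print(f"U = {utilization(tasks):.3f}")
print("RM bound satisfied:", rm_schedulable(tasks))
print("EDF bound satisfied:", edf_schedulable(tasks))
```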
  • Nurmela, Janne (2022)
    The quantification of carbon dioxide emissions poses a significant and multi-faceted problem for the atmospheric sciences as part of research on global warming and greenhouse gases. Emissions originating from point sources, referred to as plumes, can be simulated using mathematical and physical models, such as a convection-diffusion plume model and a Gaussian plume model. The convection-diffusion model is based on the convection-diffusion partial differential equation describing mass transfer in diffusion and convection fields. The Gaussian model is a special case, or solution, of the general convection-diffusion equation under the assumptions of a homogeneous wind field, relatively small diffusion and time independence. Both models are used to simulate the plumes in order to determine the emission rate of the plume source. An equation for solving the emission rate can be formulated as an inverse problem written as y = F(x) + ε, where y is the observed data, F is the plume model, ε is the noise term and x is an unknown vector of parameters, including the emission rate, which needs to be solved. For an ill-posed inverse problem, where F is not well behaved, the solution does not exist, but a minimum norm solution can be found; that is, the solution is a vector x which minimizes a chosen norm function, referred to as a loss function. This thesis focuses on the convection-diffusion and Gaussian plume models, and studies both the differences between and the sensitivity of these models. Additionally, the thesis investigates three different approaches for optimizing loss functions: optimal estimation for the linear model, the Levenberg–Marquardt algorithm for the non-linear model and the adaptive Metropolis algorithm. The goodness of different fits can be quantified by comparing root mean square errors; the better the fit, the smaller the root mean square error. A plume inversion program has been implemented in Python (version 3.9.11) to test the implemented models and the different algorithms. The parameters' effect on the estimated emission rate is assessed by performing sensitivity tests on simulated data. The plume inversion program is also applied to satellite data, and the validity of the results is considered. Finally, more advanced plume models and improvements to the implementation are discussed.
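    To make the inverse problem y = F(x) + ε concrete, the following sketch (not the thesis code) fits the emission rate of a heavily simplified Gaussian plume model to noisy synthetic observations with a least-squares solver; the wind speed, dispersion widths and source height are assumed values.

```python
# Sketch only: a simplified Gaussian plume model at a fixed downwind distance
# and a least-squares inversion of its emission rate Q. All constants below
# are illustrative assumptions, not values from the thesis.
import numpy as np
from scipy.optimize import least_squares

def gaussian_plume(q, y, u=5.0, sigma_y=50.0, sigma_z=30.0, h=10.0):
    """Ground-level concentration across the crosswind coordinate y; the
    dispersion widths sigma_y, sigma_z are held constant for simplicity."""
    return (q / (2 * np.pi * u * sigma_y * sigma_z)
            * np.exp(-y**2 / (2 * sigma_y**2))
            * 2 * np.exp(-h**2 / (2 * sigma_z**2)))

rng = np.random.default_rng(0)
y_obs = np.linspace(-200, 200, 100)                   # crosswind sampling points
q_true = 12.0                                         # "unknown" emission rate
data = gaussian_plume(q_true, y_obs) + rng.normal(0, 1e-5, y_obs.size)

# Solve min_q ||data - F(q)||^2, i.e. the inverse problem y = F(x) + eps.
fit = least_squares(lambda q: gaussian_plume(q[0], y_obs) - data, x0=[1.0])
rmse = np.sqrt(np.mean(fit.fun ** 2))
print(f"estimated Q = {fit.x[0]:.2f}, residual RMSE = {rmse:.2e}")
```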
  • Utoslahti, Aki (2022)
    Lempel-Ziv factorization of a string is a fundamental tool used by myriad data compressors. Despite its optimality regarding the number of produced factors, it is rarely used without modification because of its computational cost. In recent years, Lempel-Ziv factorization has been a busy research subject, and we have witnessed the state of the art being completely changed. In this thesis, I explore the properties of the latest suffix-array-based Lempel-Ziv factorization algorithms while experimenting with turning them into an efficient general-purpose data compressor. The setting of this thesis is purely exploratory, guided by reliable and repeatable benchmarking. I explore all aspects of a suffix-array-based Lempel-Ziv data compressor. I describe how the chosen factorization method affects the development of encoding and other components of a functional data compressor. I show how the chosen factorization technique, together with the capabilities of modern hardware, allows determining the length of the longest common prefix of two strings over 80% faster than the baseline approach. I also present a novel approach to optimizing the encoding cost of the Lempel-Ziv factorization of a string, i.e., bit-optimality, using a dynamic programming approach to the single-source shortest path problem. I observed that, in its current state, suffix array construction is a major computational bottleneck in suffix-array-based Lempel-Ziv factorization. Additionally, using a suffix array to produce a Lempel-Ziv factorization leads to optimality regarding the number of factors, which does not necessarily correspond to bit-optimality. Finally, a comparison with common third-party data compressors revealed that relying exclusively on Lempel-Ziv factorization prevents reaching the highest compression efficiency. For these reasons, I conclude that current suffix-array-based Lempel-Ziv factorization is unsuitable for general-purpose data compression.
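    As a point of reference for the factorization under discussion, here is a naive quadratic-time sketch of greedy Lempel-Ziv factorization; the suffix-array-based algorithms studied in the thesis compute the same factors far more efficiently.

```python
# Sketch only: naive greedy LZ factorization (longest previous factor).
# The thesis uses suffix-array-based algorithms; this quadratic-time version
# merely illustrates what the factors are.

def lz_factorize(s):
    """Return factors as (position, length) for repeats or (char, 0) for literals."""
    factors, i, n = [], 0, len(s)
    while i < n:
        best_len, best_pos = 0, -1
        for j in range(i):                        # candidate previous occurrence
            k = 0
            while i + k < n and s[j + k] == s[i + k]:
                k += 1
            if k > best_len:
                best_len, best_pos = k, j
        if best_len == 0:
            factors.append((s[i], 0))             # literal character
            i += 1
        else:
            factors.append((best_pos, best_len))  # copy from an earlier position
            i += best_len
    return factors

print(lz_factorize("abababcabab"))
```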
  • Mikkola, Santeri (2024)
    The question of reunification, or ‘the Taiwan issue’, stands as one of the paramount geopolitical conundrums of the 21st century. China asserts that Taiwan is an inalienable part of its historical geo-body and socio-cultural chronicles under the unifying idea of ‘Chineseness’. Nevertheless, since Taiwan’s democratization process began to thrive in the 1990s, perceptions of national identity have diverged drastically from those in mainland China. As a corollary, the appeal of reunification in Taiwan is almost non-existent, and hence achieving peaceful unification under the ‘one country, two systems’ proposal seems highly unlikely. Furthermore, the United States assumes a pivotal role in cross-strait geopolitics, intricately tangling the question of Taiwan into the broader scheme of great power politics. This thesis examines the intricate dynamics of the Taiwan issue by analyzing the practical geopolitical reasoning of the PRC intellectuals of statecraft over Taiwan. The theoretical and methodological foundations of this study draw from critical geopolitics and critical discourse analysis. The primary empirical research materials comprise the three Taiwan white papers published by the PRC. In addition, the analysis is supplemented by other official documents as well as a vast array of research literature on cross-strait geopolitics. Building upon Ó Tuathail’s theorization of practical geopolitical reasoning, the thesis presents the ‘grammar of geopolitics’ of the Taiwan issue from the perspective of the PRC. Within this analytical framework, three guiding geopolitical storylines were identified: 1) Historical Sovereignty, 2) National Unity under ‘Chineseness’, and 3) Separatism and External Powers as Antagonist Forces. The results reveal that the CCP has constructed the imperative of reunification as a historically and geographically bound inevitability. Nevertheless, China's increasing geopolitical anxiety over achieving the objective of reunification with Taiwan is evident in its discourses. This increasing geopolitical anxiety is likely to compel the CCP to adopt more coercive actions in the near and medium term if it deems them necessary. Given the developments in Taiwan, in Sino-U.S. relations and domestically in China, it seems probable that pressure on Taiwan will continue to mount throughout the 2020s. Much of the strategic calculation and geopolitical discourse constructed around the Taiwan issue can be attributed to the CCP's concerns about its own legitimacy to rule. Within its geopolitical discourses, the issue of reunification is rendered an existential question for China, and it arguably constitutes a significant part of the modern CCP’s raison d'être. China’s increasing self-confidence as a superpower continually unsettles the dynamics of international affairs and the geopolitical landscape, particularly within the Indo-Pacific region. Consequently, the project of Chinese geopolitics remains unfinished business and warrants further contributions from researchers in the field of critical geopolitics.
  • Pöllänen, Joonas (2021)
    This master’s thesis examines views on Finland’s security environment among Finnish security experts and analyses these views through the framework of critical geopolitics. Theoretically, the thesis draws both on earlier literature on perceived state security threats to Finland and on research on the security-geopolitics relationship within critical geopolitics. The thesis utilizes Q methodology, a relatively little-known approach with a long history and an active user base in the social sciences. The purpose of the methodology is to study personal viewpoints, in other words subjectivities, among a selected group of people, the participants of the study. Q methodology employs both qualitative and quantitative methods, and the result of a Q methodological study is a number of discourses, which can be further analysed. The group of participants whose views were examined consisted of nine geopolitical experts and policymakers, all of whom were civil servants of the Finnish Ministry of Foreign Affairs and the Finnish Defence Forces. Three separate discourses were distinguished in this group, on top of which there was consensus on some of the issues examined. One of the resulting discourses, which was especially widespread among participants from the Defence Forces, viewed Russia as Finland’s geopolitical Other. According to this discourse, Finland’s security is highly dependent on this Other, even though it may not be a realistic security threat at the moment. This view is in line with a traditional geopolitical discourse in Finland. Another discourse, which was common among the participants from the Ministry of Foreign Affairs, emphasized internal security threats and democracy’s role in security, while seemingly downplaying Russia’s role. A third discourse, on the other hand, highlighted non-state security issues, such as terrorism. The consensus discourse among the group of participants strongly viewed the European Union as the primary geopolitical framework of Finland. Even though two of the three individual discourses did not highlight Russia’s role, there was an indirectly implied consensus that Finland should not seek close cooperation with Russia in important security matters, such as cybersecurity.
  • Lassila, Petri (2021)
    Lipid-based solid-fat substitutes (such as oleogels) structurally modified using ultrasonic standing waves (USW) have recently been shown to potentially increase oleogel storage stability. To enable their application in food products, pharmaceuticals, and cosmetics, more practical and economical production methods are needed than in previous work, where USW-treated oleogel production was limited to 50-500 µL. The purpose of this work is to improve upon the previous procedure for producing structurally modified oleogels via USW by developing a scaled-up and convenient approach. To this aim, three different USW chamber prototypes were designed and developed, with common features in mind: (i) increase process volumes to 10-100 mL, (ii) make the sample extractable from the treatment chamber, and (iii) avoid contact between the sample and the ultrasonic transducer. Imaging of the internal structure of USW-treated oleogels was used as the determining factor of successful chamber design. The best design was subsequently used to produce USW-treated oleogels, whose bulk mechanical properties were studied using uniaxial compression tests, along with their local mechanical properties, investigated using scanning acoustic microscopy. The results elucidated the mechanical behaviour of oleogels as foam-like. Finally, the stability of treated oleogels was compared to control samples using an automated image-analysis oil-release test. This work enables the effective mechanical-structural manipulation of oleogels in volumes of 10-100 mL, paving the way to possible large-scale USW treatment of lipid-based materials.
  • Kekkonen, Tuukka (2021)
    Sub-λ/2 focusing, also known as super resolution, is widely studied in optics, but only a few practical realizations exist in acoustics. In this contribution, I show a novel way to produce sub-λ/2 focusing in the acoustic realm. I used an oil-filled cylinder immersed in liquid to focus an incident plane wave into a line focus. Three different immersion liquids were tested: water, olive oil, and pure ethanol. In addition to the practical experiment, we conducted a series of finite element simulations, courtesy of Joni Mäkinen, to compare with the experimental results.
  • Hickman, Brandon (2016)
    The aim of this thesis is the development of the highest quality quantitative precipitation estimate (QPE) for the Helsinki urban area through the creation of a quality-controlled multi-radar composite. Weather radars suffer from a number of errors, and these are typically compounded when the radars are located near urban areas. Through the use of radar calibration and several quality control methods, the three Helsinki-area radars (Kerava, Kumpula, Vantaa) were composited and a blended QPE was created. The three C-band dual-polarimetric weather radars were calibrated using the self-consistency theory, which relates Z and Zdr to Kdp. The calibration was conducted over several summer days in 2014 for each radar, and all were found to be under-calibrated by about 2 dB. The influence of rain on top of the radome was also examined, and it was found that wet-radome attenuation can produce an offset of several dB in calibration. Composites of Z and Kdp were created using weights to correct for non-hydrometeor echo class, beam blockage, attenuation of the beam, radome attenuation, range, and ground clutter. Noise in Kdp from light rain is reduced by utilizing the self-consistency theory. The composited reflectivity brought significant improvements by filling data gaps, reducing errors, and providing additional observations for each echo. However, minor errors, such as multi-trip echoes and speckle, propagated to the composite. The blended QPE was created from the composite data to obtain the benefits of R(Kdp) in heavy rain and hail, while R(Z) was used in cases of light rain. Rain-rate algorithms developed for the Helsinki climate were implemented to calculate the rain intensity for the selected precipitation type, which was determined through a series of threshold values obtained from the literature. R(Kdp) allows better estimation in heavy rain and hail because it is more closely related to the hydrometeor diameter and is immune to a number of errors present in Z. The QPE created in this project provides improved precipitation intensity estimates due to the use of multiple corrected radars. This data can be used for improved urban run-off modelling, emergency warnings and weather forecasting. However, the method presented here is only applicable to liquid, mixed, and hail precipitation because of the impact of frozen hydrometeors on the dual-polarimetric parameters. Additional quality control methods and different precipitation estimators would be required for wintertime precipitation.
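    To illustrate the blending logic described above, the sketch below switches between an R(Z) and an R(Kdp) rain-rate relation using a simple reflectivity threshold; the coefficients are generic textbook-style values, not the Helsinki-specific relations used in the thesis.

```python
# Sketch only: blended rain-rate estimate that uses R(Kdp) for heavy rain
# and R(Z) otherwise. Coefficients are generic illustrative values
# (Marshall-Palmer style R(Z)), NOT the Helsinki-tuned relations of the thesis.
import numpy as np

def rain_rate_z(dbz):
    """R(Z) from Z = 200 * R^1.6 (Marshall-Palmer), with Z given in dBZ."""
    z_lin = 10.0 ** (dbz / 10.0)
    return (z_lin / 200.0) ** (1.0 / 1.6)      # mm/h

def rain_rate_kdp(kdp, a=27.0, b=0.78):
    """Generic power-law R(Kdp) = a * Kdp^b, Kdp in deg/km (assumed a, b)."""
    return a * np.maximum(kdp, 0.0) ** b       # mm/h

def blended_qpe(dbz, kdp, heavy_rain_dbz=35.0):
    """Use R(Kdp) where reflectivity indicates heavy rain, R(Z) elsewhere."""
    dbz, kdp = np.asarray(dbz), np.asarray(kdp)
    return np.where(dbz >= heavy_rain_dbz, rain_rate_kdp(kdp), rain_rate_z(dbz))

print(blended_qpe([20.0, 45.0], [0.1, 1.5]))   # light vs. heavy rain pixels
```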
  • Anni, Andelin (2023)
    Predator-prey models can be studied from several perspectives, each telling its own story about real-life phenomena. The perspective chosen for this thesis is to include prey rescue in the standoff between the predator and the prey. Prey rescue is seen in nature for many species; to point out one occurrence, consider the standoff between a hyena and a lion. When a lion attacks a hyena, the hyena's herd tries to frighten the lion away. The rescue attempt can either succeed or fail. In this thesis the prey-rescue model is derived for an individual rescuer and for a group of prey. For both cases, the aim is to derive the functional and numerical responses of the predator, but the focus is on deriving and studying the functional responses. First, a brief background is given to motivate the study. The introduction goes through the most important aspects of predator-prey modelling and gives an example of the simple but widely known Lotka-Volterra predator-prey model. The study begins with the simplest case of prey rescue, individual prey rescue. First, the individual-level states, their processes and all the assumptions of the model are introduced. Then the model is derived and reduced with timescale separation to achieve more interpretable results. The functional response is formed after solving the quasi-equilibrium of the model. It was found that this way of constructing the model gives the popular Holling Type II functional response. Then it is examined what follows when more and more prey get involved in the standoff, trying to rescue the individual being attacked. This is studied on three different timescales: ultra-fast, intermediate, and slow. The process of deriving the model and the functional response is similar to the simple case of individual prey rescue, but the calculations are more involved. The resulting functional response was found to be uninteresting, so the model was adjusted: one of the timescales was left out of the analysis in the hope of more interesting results. The derivation proceeded as in the third chapter, but with more advanced calculations and different results for the quasi-equilibrium and the functional response. The functional response obtained was found to be worth studying in detail, and this detailed study is done last. It was found that different parameter choices affect the shape of the functional response. The parameters were chosen to be biologically relevant. Assuming that rescue is certain for group size n = 2, it was found that the functional response took a humpback form for some choices of the other parameters. The parameter ranges for which the functional response had a humpback shape were identified.
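    Since the individual-rescue derivation recovers the Holling Type II functional response, a minimal numerical sketch of that response (with made-up attack rate and handling time) is given below.

```python
# Sketch only: the classical Holling Type II functional response
# f(x) = a*x / (1 + a*h*x), where a is the attack rate, h the handling time
# and x the prey density. Parameter values here are illustrative.
import numpy as np

def holling_type_ii(prey_density, attack_rate=1.0, handling_time=0.5):
    """Per-predator capture rate; saturates at 1/handling_time."""
    x = np.asarray(prey_density, dtype=float)
    return attack_rate * x / (1.0 + attack_rate * handling_time * x)

densities = np.array([0.0, 0.5, 1.0, 5.0, 50.0])
print(holling_type_ii(densities))   # approaches 1/h = 2 as density grows
```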
  • Falk, Sebastian (2018)
    The idea underlying this thesis is to use data gathered by building management systems to build machine learning models in order to improve these systems. Our goal is to create models which can use data from multiple different sensors as input and output predictions about that data. We then use these predictions when implementing new applications. At our disposal we have data gathered by both motion sensors and carbon dioxide (CO2) sensors. This data is gathered at regular intervals and, after some transformations, is in the form of time series, which is the first topic we cover. We want to improve the systems to which these sensors are connected. As a concrete example we can consider the ventilation systems which control the air conditioning. They usually have CO2 sensors connected to them. By keeping an eye on the CO2 value, the system is able to adjust the air flow when the value becomes too high. The problem with this is that once that value is reached, it takes some time before it is lowered back to a normal level. If we were able to predict when this value will begin to rise, the system could increase the airflow beforehand, meaning that it could avoid reaching the threshold level. This improves the effectiveness of the system, keeping the air quality constantly at a comfortable level. Another example is lighting control systems, which commonly have motion detection sensors controlling the lights. A motion detection event occurs when one of these sensors sees some movement. Sensors are connected to one or multiple luminaires, turning the luminaires on when an event happens. The luminaires also turn off automatically after a set amount of time. Being able to predict when these events happen would make it possible to turn on the lights before a person actually walks into the room in question. The system would also be able to turn off the lights if it knows that no one will be in the room, which means that the lights will not be on unnecessarily. For creating these models we use multiple different prediction methods. In the thesis we discuss time-series forecasting models such as the autoregressive integrated moving average model as well as supervised learning algorithms. The supervised learning models we cover are decision tree models, random forest models, feedforward neural network models and a recurrent neural network model called long short-term memory. We explain how all of these models are created and how they can be used for time-series prediction on the data we have at our disposal.
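    As a minimal illustration of the time-series forecasting discussed above, the sketch below fits an ARIMA model to a synthetic CO2-like series and forecasts a few steps ahead; the series and the model order are assumptions, not the thesis setup.

```python
# Sketch only: one-step-ahead forecasting of a synthetic CO2-like series
# with an ARIMA model. The series and the (p, d, q) order are illustrative;
# the thesis compares several forecasting and supervised-learning models.
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(1)
# Synthetic indoor-CO2-style signal: baseline + slow occupancy swings + noise.
t = np.arange(288)                                   # e.g. 5-minute samples over a day
co2 = 420 + 150 * np.clip(np.sin(t / 40.0), 0, None) + rng.normal(0, 5, t.size)

model = ARIMA(co2, order=(2, 1, 1)).fit()
forecast = model.forecast(steps=6)                   # next 30 minutes
print(np.round(forecast, 1))
```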
  • Kailamäki, Kalle (2022)
    This thesis explores predicting current prices of individual agricultural fields in Finland based on historical data. The task is to predict field prices accurately with the data available while keeping model predictions interpretable and well explainable. The research question is which of several candidate models is best suited for the task. The motivation behind this research is the growing agricultural land market and the lack of publicly available field valuation services that could help market participants determine and identify reasonable asking prices. Previous studies on the topic have used standard statistics to establish relevant factors that affect field prices. Rather than creating a model whose predictions can be used on their own in every case, the primary purpose of previous work has been to identify information that should be considered in manual field valuation. We, on the other hand, focus on the predictive ability of models that do not require any manual labour. Our modelling approaches focus mainly, but not exclusively, on algorithms based on Markov chain Monte Carlo. We create a nearest-neighbours model and four hierarchical linear models of varying complexity. Performance comparisons lead us to recommend a nearest-neighbour-type model for this task.
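    For a sense of what the recommended nearest-neighbour-type model could look like, here is a minimal sketch on made-up field features; it is not the thesis implementation, which also includes hierarchical linear models estimated with Markov chain Monte Carlo.

```python
# Sketch only: a k-nearest-neighbours price predictor on made-up field features.
# Feature names and data are hypothetical; the thesis also builds hierarchical
# linear models estimated with MCMC.
import numpy as np
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(2)
n = 200
X = np.column_stack([
    rng.uniform(1, 30, n),        # field area (ha)
    rng.uniform(0, 300, n),       # distance to nearest town (km)
    rng.integers(2000, 2023, n),  # sale year
])
price = 8000 + 50 * X[:, 0] - 10 * X[:, 1] + rng.normal(0, 500, n)  # EUR/ha, toy relation

model = make_pipeline(StandardScaler(), KNeighborsRegressor(n_neighbors=5))
model.fit(X[:150], price[:150])
print(model.predict(X[150:155]).round())
```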
  • Zetterman, Elina (2024)
    When studying galaxy formation and evolution, the relationship between galaxy properties and dark matter halo properties is important, since galaxies form and evolve within these halos. This relationship can be determined using numerical simulations, but unfortunately these are computationally expensive and require vast amounts of computational resources. This provides an incentive to use machine learning instead, since training a machine learning model requires significantly less time and fewer resources. If machine learning could be used to predict galaxy properties from halo properties, numerical simulations would still be needed to find the halo population, but the more expensive hydrodynamical simulations would no longer be necessary. In this thesis, we use data from the IllustrisTNG hydrodynamical simulation to train five different types of machine learning models. The goal is to predict four different galaxy properties from multiple halo properties and to measure how accurate and reliable the predictions are. We also compare the different types of models with each other to find out which ones perform best. Additionally, we calculate confidence intervals for the predictions to evaluate the uncertainty of the models. We find that out of the four galaxy properties, stellar mass is the easiest to predict, whereas color is the most difficult. Of the five different types of models, light gradient boosting is in all cases either the best-performing model, or its performance is almost as good as that of the best-performing model. Combined with the fact that training this type of model is extremely fast, light gradient boosting has good potential to be utilized in practice.
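    A minimal sketch of the kind of halo-to-galaxy mapping described above is given below, using LightGBM regression on synthetic data; the halo features and the toy relation are invented for illustration, whereas the thesis trains on IllustrisTNG quantities.

```python
# Sketch only: predicting a galaxy property (stellar mass) from halo properties
# with LightGBM. The synthetic relation below is invented for illustration;
# the thesis trains on IllustrisTNG halo/galaxy catalogues.
import numpy as np
from lightgbm import LGBMRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

rng = np.random.default_rng(3)
n = 5000
log_halo_mass = rng.uniform(11, 14, n)           # log10(M_halo / M_sun)
spin = rng.lognormal(-3.5, 0.5, n)
concentration = rng.normal(8, 2, n)
X = np.column_stack([log_halo_mass, spin, concentration])
log_stellar_mass = 1.2 * log_halo_mass - 4.5 + rng.normal(0, 0.2, n)  # toy relation

X_tr, X_te, y_tr, y_te = train_test_split(X, log_stellar_mass, random_state=0)
model = LGBMRegressor(n_estimators=300, learning_rate=0.05).fit(X_tr, y_tr)
print("R^2 on held-out haloes:", round(r2_score(y_te, model.predict(X_te)), 3))
```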
  • Holmström, Axi (2016)
    Quantum Neural Networks (QNN) were used to predict both future steering wheel signals and upcoming lane departures for N=34 drivers undergoing 37 h of sleep deprivation. The drivers drove in a moving-base truck simulator for 55 min once every third hour, resulting in 31 200 km of highway driving, out of which 8 432 km were on straights. Predicting the steering wheel signal one time step ahead, 0.1 s, was achieved with a 15-40-20-1 time-delayed feed-forward QNN with a root-mean-square error of RMSEtot = 0.007 a.u. corresponding to a 0.4 % relative error. The best prediction of the number of lane departures during the subsequent 10 s was achieved using the maximum peak-to-peak amplitude of the steering wheel signal from the previous ten 1 s segments as inputs to a 10-15-5-1 time-delayed feed-forward QNN. A correct prediction was achieved in 55 % of cases and the overall sensitivity and specificity were 31 % and 80 %, respectively.
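    The thesis uses quantum neural networks; as a purely classical stand-in, the sketch below builds the same kind of time-delayed input (lagged samples) and fits an ordinary feed-forward regressor to predict the next steering sample one step ahead, on a synthetic signal.

```python
# Sketch only: a classical time-delayed feed-forward predictor on a synthetic
# "steering wheel" signal. This is NOT a quantum neural network; it only
# illustrates the lagged-input, one-step-ahead setup described in the thesis.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(4)
t = np.arange(5000) * 0.1                        # 0.1 s sampling, as in the thesis
signal = 0.2 * np.sin(0.3 * t) + 0.02 * rng.normal(size=t.size)

lags = 15                                        # 15 delayed input samples
X = np.column_stack([signal[i:i - lags] for i in range(lags)])
y = signal[lags:]                                # value one step (0.1 s) ahead

model = MLPRegressor(hidden_layer_sizes=(40, 20), max_iter=2000, random_state=0)
model.fit(X[:4000], y[:4000])
pred = model.predict(X[4000:])
rmse = np.sqrt(np.mean((pred - y[4000:]) ** 2))
print(f"one-step RMSE: {rmse:.4f}")
```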
  • Kangassalo, Lauri (2020)
    We study the effect of word informativeness on brain activity associated with reading, i.e. whether the brain processes informative and uninformative words differently. Unlike most studies that investigate the relationship between language and the brain, we do not study linguistic constructs such as syntax or semantics, but informativeness, an attribute statistically computable from text. Here, informativeness is defined as the ability of a word to distinguish the topic to which it is related. For instance, the word 'Gandhi' is better at distinguishing the topic of India from other topics than the word 'hot'. We utilize electroencephalography (EEG) data recorded from subjects reading Wikipedia documents on various topics. We report two experiments: 1) a neurophysiological experiment investigating the neural correlates of informativeness and 2) a single-trial event-related brain potential (ERP) classification experiment, in which we predict word informativeness from brain signals. We show that word informativeness has a significant effect on the P200, P300, and P600 ERP components. Furthermore, we demonstrate that word informativeness can be predicted from ERPs with a performance better than a random baseline using a Linear Discriminant Analysis (LDA) classifier. Additionally, we present a language-model-based statistical model that allows the estimation of word informativeness from a corpus of text.
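    To make the single-trial classification setup concrete, the sketch below trains an LDA classifier to separate two synthetic 'ERP' feature classes; the features are invented, whereas the thesis uses real EEG epochs.

```python
# Sketch only: LDA classification of synthetic single-trial "ERP" feature
# vectors into informative vs. uninformative word classes. The synthetic
# features are illustrative; the thesis uses real EEG epochs.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(5)
n_trials, n_features = 400, 32                  # e.g. mean amplitudes per channel/window
X_uninformative = rng.normal(0.0, 1.0, (n_trials, n_features))
X_informative = rng.normal(0.3, 1.0, (n_trials, n_features))   # shifted P300-like effect
X = np.vstack([X_uninformative, X_informative])
y = np.array([0] * n_trials + [1] * n_trials)

clf = LinearDiscriminantAnalysis(solver="lsqr", shrinkage="auto")  # shrinkage helps with many features
scores = cross_val_score(clf, X, y, cv=5)
print("mean CV accuracy vs. 0.5 chance:", scores.mean().round(3))
```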
  • Ranimäki, Jimi (2023)
    It is important that the financial system retains its functionality throughout the macroeconomic cycle. When people lose their trust in banks, the whole economy can face dire consequences. Therefore, accurate and stable predictions of the expected losses of borrowers or loan facilities are vital for the preservation of a functioning economy. The research question of the thesis is: what effect does the choice of calibration type have on the accuracy of probability of default predictions? The question is addressed through an elaborate simulation of the whole probability of default model estimation exercise, with a focus on calibrating the rank order model to the macroeconomic cycle. Various calibration functions are included in the study to offer more diversity in the results. Furthermore, the thesis provides insight into the regulatory environment of financial institutions, presenting relevant articles from accords, regulations and guidelines by international and European supervisory agents. In addition, the thesis introduces statistical methods for model calibration to the long-run average default rate. Finally, the thesis studies the effect of calibration type on the probability of default parameter estimation. The investigation itself is done by first simulating the data and then applying multiple different calibration functions, including two logit functions and two Bayesian models, to the simulated data. The simulation exercise is repeated 1 000 times for statistically robust results. Predictive power was measured using mean squared error and mean absolute error. The main finding of the investigation was that the simple grades performed unexpectedly well in contrast to raw predictions. However, the quasi moment matching approach for the logit function generally resulted in higher predictive power for the raw predictions in terms of the error measures, except against the captured probability of default. Overall, simple grades and raw predictions yielded similar levels of predictive power, while the master scale approach led to lower numbers. It is reasonable to conclude that the best selection of approaches according to the investigation would be the quasi moment matching approach for the logit function, with either the simple grades or the raw predictions calibration type, as the difference in predictive power between these types was minuscule. The calibration approaches investigated were significantly simplified from actual methods used in the industry; for example, calibration exercises mainly focus on the derivation of the correct long-run average default rate over time, whereas this study used only the central tendency of the portfolio as the value.
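    One simple calibration of the kind discussed above shifts the scores on the logit scale so that the portfolio's mean predicted PD matches the long-run average default rate; the sketch below, with made-up scores, shows this idea only and is far simpler than the approaches benchmarked in the thesis.

```python
# Sketch only: intercept-shift calibration of PDs to a long-run average
# default rate. The raw scores are made up; the thesis benchmarks several
# richer calibration functions (logit variants, Bayesian models).
import numpy as np
from scipy.optimize import brentq

rng = np.random.default_rng(6)
raw_pd = 1 / (1 + np.exp(-rng.normal(-2.5, 1.0, 10_000)))  # uncalibrated PDs
long_run_adr = 0.03                                          # target central tendency

def shifted_mean(c):
    """Mean PD (minus the target) after adding c to every score on the logit scale."""
    logit = np.log(raw_pd / (1 - raw_pd))
    return (1 / (1 + np.exp(-(logit + c)))).mean() - long_run_adr

shift = brentq(shifted_mean, -10, 10)        # find c so the mean matches the target
calibrated_pd = 1 / (1 + np.exp(-(np.log(raw_pd / (1 - raw_pd)) + shift)))
print(f"mean PD before: {raw_pd.mean():.4f}, after: {calibrated_pd.mean():.4f}")
```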
  • Harju, Esa (2019)
    Teaching programming is increasingly widespread and starts at primary school level in some countries. Part of that teaching consists of students writing small programs that demonstrate learned theory and how various things fit together to form a functional program. Multiple studies indicate that programming is a difficult skill to learn and master. Some of the difficulty comes from the plethora of concepts that students are expected to learn in a relatively short time. Part of practising writing programs involves feedback, which aids students' learning of the assignment's topic, and motivation, which encourages students to continue the course and their studies. For feedback it would be helpful to know students' opinion of a programming assignment's difficulty. A few studies attempt to find out whether there is a correlation between metrics obtained from students writing a program and the difficulty they report. These studies apply statistical models to data after the course is over. This leads to the idea that such prediction could be done while students are working on programming assignments. Some sort of machine learning model would be a possible solution, but as of now no such models exist. We therefore utilize the idea from one of these studies to create a model which could make such predictions. We then improve that model, which is coarse, with two additional models that are more tailored for the job. Our main results indicate that this kind of model shows promise in predicting programming assignment difficulty from the collected metrics. With further work these models could indicate when a student is struggling with an assignment. Using this kind of model as part of existing tools, we could offer a student subtle help before their frustration grows too much. Further down the road such a model could be used to provide further exercises, if a student needs them, or to progress forward once the student masters a certain topic.
  • Sundquist, Henri (2024)
    Acute myeloid leukemia (AML) is a disease in which blood cell production is severely disrupted. Cell counts and morphological analysis from bone marrow (BM) samples are key in the diagnosis of AML. Recent advances in computer vision have led to algorithms developed at the Hematoscope Lab that can automatically classify cells from these BM samples and calculate various cell-level statistics. This thesis investigated the use of cytomorphological data along with standard clinical data to predict progression-free survival (PFS). A benchmark study using penalized Cox regression, random survival forests, and survival support vector machines was conducted to study the utility of cytomorphology data. As features greatly outnumber samples, the methods are further compared over three feature filtering methods based on Spearman's correlation coefficient, conditional Cox screening, and mutual information. In a dataset from the national VenEx trial, the penalized Cox regression method with ElasticNet penalization supplemented with conditional Cox screening was found to perform best in the nested CV benchmarking. A post-hoc dissection of the two best-performing Cox models revealed potentially predictive cytomorphological features, while disease etiology and patient age were likewise important.
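    A minimal sketch of this kind of pipeline, using the lifelines library for univariate Cox screening followed by an elastic-net-penalized Cox fit on synthetic data, is shown below; the feature names, penalties and screening threshold are assumptions, not the thesis configuration.

```python
# Sketch only: univariate Cox screening followed by an elastic-net-penalized
# Cox model with lifelines, on synthetic data. Feature names, penalties and
# the screening threshold are illustrative, not the thesis configuration.
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(7)
n = 300
df = pd.DataFrame(rng.normal(size=(n, 20)), columns=[f"morph_{i}" for i in range(20)])
risk = 0.8 * df["morph_0"] + 0.5 * df["morph_1"]             # only two features drive risk
df["T"] = rng.exponential(np.exp(-risk))                      # synthetic survival times
df["E"] = (rng.uniform(size=n) < 0.7).astype(int)             # ~70% observed events

# Step 1: keep features whose univariate Cox p-value clears a loose threshold.
keep = []
for col in df.columns[:-2]:
    uni = CoxPHFitter().fit(df[[col, "T", "E"]], duration_col="T", event_col="E")
    if uni.summary.loc[col, "p"] < 0.10:
        keep.append(col)

# Step 2: elastic-net-penalized Cox fit on the screened features.
cph = CoxPHFitter(penalizer=0.1, l1_ratio=0.5)
cph.fit(df[keep + ["T", "E"]], duration_col="T", event_col="E")
print("screened features:", keep)
print("concordance index:", round(cph.concordance_index_, 3))
```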
  • Perälampi, Minna (2020)
    This thesis models and predicts clicks in smartphone email campaigns using the LightGBM algorithm. Click prediction is used to target email marketing at potentially interested customers. The data used for click prediction was retrieved from the database of DNA Oyj. At the beginning of the thesis I present the Gradient Boosting Decision Tree model used in the modelling, as well as the LightGBM model derived from it, both of which are based on decision trees. I first describe decision trees briefly, after which I present the theoretical background of Gradient Boosting Decision Tree models. I then move on to the LightGBM variant and the algorithms related to its implementation. After this I present the Bayesian optimization method used to fine-tune the model's hyperparameters. Next, I present the data used in the model. The variables in the data describe the customer's demographic information, devices, internet use, online store visits and purchase history at the time previous campaigns were sent. After this I go through the fitting of the model and a test campaign carried out to evaluate it. The model was assessed with metrics suitable for classification methods, and I evaluate its ability to predict clicks based on the results obtained from the test campaign. Finally, I discuss the performance of the model and the method. The model's training data did not correspond to the telecom operator's customer base, which is why the results were poor when the model was applied to the whole customer base. When applied to a situation matching the training data, the model's performance was reasonable. The model will be developed further at DNA Oyj.
  • Hietanen, Jesse (2016)
    Reducing greenhouse gas emissions and increasing carbon sequestration are critical for climate change mitigation. With the emergence of carbon markets and the development of compensatory mechanisms such as Reducing Emissions from Deforestation and Degradation in Developing Countries (REDD+), there is much interest in the measurement and monitoring of soil organic carbon (SOC). Detailed information on the distribution of SOC and other soil attributes, such as nitrogen (N), across the landscape is necessary in order to locate areas where carbon stocks can be increased and loss of soil carbon slowed down. SOC has large spatial variability, which often demands intensive sampling in the field. Airborne laser scanning (ALS) provides very accurate information about the topography and vegetation of the measured area and hence a possible means for improving soil property maps. The aim of this thesis was to study the feasibility of ALS and freely available ancillary data for predicting SOC and N in a tropical study area. The study area is located in the Taita Hills, in south-eastern Kenya, and has highly varying topography ranging between 930–2187 m. Land cover in the Taita Hills is very heterogeneous and consists of forest, woodlands, agroforestry and croplands. The field data consisted of SOC and N measurements for 150 sample plots (0.1 ha). The soil samples, along with several other soil and vegetation attributes, were collected in 2013. ALS data (Optech ALTM 3100, mean return density 11.4 m-1) was acquired in February 2013. The ALS data was pre-processed by classifying ground, low and high vegetation, buildings and power wires. The ALS point cloud was used to calculate two types of predictors for SOC and N: 1) topographical variables based on the high-resolution digital terrain model (DTM) and 2) ALS metrics describing the vertical distribution and cover of vegetation. The ancillary datasets included spectral predictors based on a Landsat 7 ETM+ time series and soil grids for Africa at 250 m resolution. In total, over 500 potential predictors were calculated for the modelling. A Random Forest model was constructed from the selected variables, and model performance was analysed by comparing the predicted values to the field measurements. The best model for SOC had a pseudo R2 of 0.66 and a relative root mean square error (RRMSE) of 30.98 %. The best model for N had a pseudo R2 of 0.43 and an RRMSE of 32.14 %. Using the Landsat time series as an ancillary dataset improved the modelling results slightly. For SOC, the most important variables were tangential curvature, maximum intensity and Landsat band 2 (green). Finally, the best model was applied to map SOC and N in the study area. The results of this study are in line with other remote sensing studies modelling soil properties in Africa. The soil properties in the study area do not correlate strongly with present vegetation and topography, leading to intermediate modelling results.
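    To illustrate the modelling step, here is a minimal sketch of Random Forest regression with the two reported accuracy measures (pseudo R2 and RRMSE) on synthetic predictors; the variable names are placeholders for the ALS and Landsat predictors used in the thesis.

```python
# Sketch only: Random Forest regression of a soil property from synthetic
# predictors, reporting pseudo R^2 and relative RMSE (RRMSE). Predictor names
# are placeholders for the ALS/Landsat variables used in the thesis.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(8)
n = 150                                            # one row per field plot
X = rng.normal(size=(n, 10))                       # e.g. terrain and canopy metrics
soc = 30 + 5 * X[:, 0] - 3 * X[:, 1] + rng.normal(0, 4, n)   # toy SOC values (g/kg)

X_tr, X_te, y_tr, y_te = train_test_split(X, soc, random_state=0)
rf = RandomForestRegressor(n_estimators=500, random_state=0).fit(X_tr, y_tr)
pred = rf.predict(X_te)

pseudo_r2 = 1 - np.sum((y_te - pred) ** 2) / np.sum((y_te - y_te.mean()) ** 2)
rrmse = np.sqrt(np.mean((y_te - pred) ** 2)) / y_te.mean() * 100
print(f"pseudo R^2 = {pseudo_r2:.2f}, RRMSE = {rrmse:.1f} %")
```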