
Browsing by Title


  • Vartia, Olli (2013)
    Porous structures, such as foams, make excellent thermal insulators. This is because thermal transfer by conduction is hindered by the voids in the material. However, heat can still radiate through the material or past the voids. Due to the Stefan-Boltzmann law, heat transfer by radiation can be especially significant at high temperatures, and it follows that thermal transfer models that account for radiation may be necessary in some cases. Several existing models for radiative thermal transfer in porous materials, such as continuum models and Monte Carlo methods, have been used in the past. What many of these models tend to have in common is that they are highly specific to the systems they were originally made for and require some rather limiting approximations. A more general method that would only require knowing the material and the geometry of the system would be useful. One candidate for such a method, the discrete dipole approximation for periodic structures, was tested. In the discrete dipole approximation a structure is discretized into a collection of polarizable points, or dipoles, and an incoming electromagnetic plane wave is set to polarize it. This has the benefit of accurately taking into account the target geometry and possible near-field effects. It was found that this method is limited at long wavelengths by computation time and at short wavelengths by errors. The errors at short wavelengths were not entirely caused by the discretization and remain not fully understood.
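    The T^4 scaling behind the Stefan-Boltzmann argument above is easy to illustrate; a minimal numpy sketch, with an emissivity value chosen purely for illustration:

    ```python
    import numpy as np

    SIGMA = 5.670374419e-8  # Stefan-Boltzmann constant, W m^-2 K^-4

    def radiated_power(T, emissivity=0.9, area=1.0):
        """Power radiated by a grey surface: P = eps * sigma * A * T^4."""
        return emissivity * SIGMA * area * T**4

    # Radiated power grows as T^4, so doubling the temperature
    # multiplies the radiative contribution by 16.
    for T in (300.0, 600.0, 1200.0):
        print(f"T = {T:6.0f} K -> P = {radiated_power(T):12.1f} W")
    ```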
  • Kallanranta, Antti (2018)
    The geological Discrete Fracture Network (DFN) model is a statistical model for stochastically simulating rock fractures and minor faults (Fox et al. 2007). Unlike continuum model approaches, DFN model geometries explicitly represent populations of individual or equivalent fractures (Wilson et al. 2011). Model construction typically involves stochastic approaches that create multiple deterministic realizations of the fracture network (Gringarten 1998). This study was made as a part of a broader Salpausselkä project to gain a deeper understanding of the brittle structures in the study area. This thesis can be broken down into three steps: a literature review of the DFN methodology, parameterization of the model variables, and the DFN modeling itself. For the purposes of the DFN modeling, one-dimensional fracture intensities measured in the field (P10) had to be converted into their volumetric counterpart (P32). Wang's (2005) C13 conversion factor was decided to be the most appropriate method. Calculation of the angles between the scanlines and fracture normals (α), the conversion factor C13, and P32 was done in Python by applying the methods presented by Wang (2005) and Fox et al. (2007, 2012). Fracture sets were weighted by their P10 intensities to get a clearer picture of the dominant fracturing orientations. For better and automated classification, clustering of the fracture poles into a desired number of mean vectors was conducted using the kmeans function of the Python module mplstereonet. The function finds the centers of multi-modal clusters of data by using a numpy einsum modified for spherical measurements. Fracture set poles were divided into populations by finding the mean vector with the smallest angular distance from each pole. The C13 calculation was done by integrating over the probability distribution function (PDF) of each population. The C13 values produced by the script fall within the expected range quoted by the reference literature (Wang 2005, Fox et al. 2007, 2012). In the final modeling phase, the clustered groups were modeled in MOVE as finite surfaces and the resulting DFN model was compared to the Local anisotropy interpolator (LAI) model created by Ruuska (2018). Fracture populations were modeled on an outcrop level as well as interpolated over the whole study area, producing two different interpretations of the most dominant fracturing orientations. Based on the results, fracture set pole clustering with open source methods (mplstereonet k-means) is a feasible approach. The k-means clustering algorithm was superior to the expert approach on every level, though more studies are needed to ascertain the soundness of the methodology. Statements made at this point are merely tentative due to the quality and amount of the available data. Taking into account the results of the parallel MSc thesis (Ruuska 2018), the DFN and clustered fracture populations constructed using the aforementioned methods can be used as a tentative approximation of the preferred fracturing orientations within the boundaries of the study area. The outcrop-level model shows the true, measured values and could be used as ground truth in future modeling efforts. Efficient production of large-scale brittle models could be possible with the added flexibility of implicit modeling methods and automated clustering.
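    The pole-classification step described above, assigning each fracture pole to the mean vector with the smallest angular distance, can be sketched in a few lines of numpy (the vectors below are hypothetical unit vectors, not data from the thesis):

    ```python
    import numpy as np

    def classify_poles(poles, mean_vectors):
        """Assign each pole (row unit vector) to the nearest mean vector.

        Poles are treated as axes: a pole and its antipode represent the
        same fracture plane, so the absolute cosine is used.
        """
        cos_angles = np.abs(poles @ mean_vectors.T)
        return np.argmax(cos_angles, axis=1)

    # Hypothetical example: two mean vectors near the x- and z-axes.
    means = np.array([[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
    poles = np.array([[0.99, 0.10, 0.0], [0.05, 0.0, 0.99], [-0.98, 0.0, 0.20]])
    poles /= np.linalg.norm(poles, axis=1, keepdims=True)
    print(classify_poles(poles, means))  # -> [0 1 0]
    ```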
  • Detrois, Kira Elaine (2023)
    Background/Objectives: Various studies have shown the advantage of incorporating polygenic risk scores (PRSs) into models with classic risk factors. However, systematic comparisons of PRSs with non-genetic factors are lacking. In particular, many studies on PRSs do not even report the predictive performance of the confounders included in the model, such as age and sex, which are already very predictive for most diseases. We looked at the ability of PRSs to predict the onset of 18 diseases in FinnGen R8 (N=342,499) and compared PRSs with the known non-genetic risk factors age, sex, Education, and the Charlson Comorbidity Index (CCI). Methods: We set up individual studies for the 18 diseases. A single study consisted of an exposure (1999-2009), a washout (2009-2011), and an observation period (2011-2019). Eligible individuals could not have the selected disease of interest inside the disease-free period, which ranged from birth until the beginning of the observation period. We then defined the case and control status based on the diagnoses in the observation period and calculated the phenotypic scores during the exposure period. The PRSs were calculated using MegaPRS and the latest publicly available genome-wide association study summary statistics. We then fitted separate Cox proportional hazards models for each disease to predict disease onset during the observation period. Results: In FinnGen, the model's predictive ability (c-index) with all predictors ranged from 0.565 (95% CI: 0.552-0.576) for Acute Appendicitis to 0.838 (95% CI: 0.834-0.841) for Atrial Fibrillation. The PRSs outperformed the phenotypic predictors, CCI and Education, for 6/18 diseases and still significantly enhanced onset prediction for 13/18 diseases when added to a model with only non-genetic predictors. Conclusion: Overall, we showed that for many diseases PRSs add predictive power over commonly used predictors such as age, sex, CCI, and Education. However, many important challenges must be addressed before implementing PRSs in clinical practice. Notably, we will need disease-specific cost-benefit analyses and studies to assess the direct impact of including PRSs in clinical use. Nonetheless, as more research is being conducted, PRSs could play an increasingly valuable role in identifying individuals at higher risk for certain diseases and enabling targeted interventions to improve health outcomes.
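    A minimal sketch of the kind of Cox proportional hazards fit described above, using the lifelines library on synthetic data (column names and data are hypothetical; the thesis's own pipeline is not reproduced here):

    ```python
    import numpy as np
    import pandas as pd
    from lifelines import CoxPHFitter

    rng = np.random.default_rng(0)
    n = 1000
    df = pd.DataFrame({
        "age": rng.normal(55, 10, n),
        "sex": rng.integers(0, 2, n),
        "prs": rng.normal(0, 1, n),       # standardized polygenic risk score
        "time": rng.exponential(8.0, n),  # follow-up time in years
        "event": rng.integers(0, 2, n),   # 1 = disease onset observed
    })

    cph = CoxPHFitter()
    cph.fit(df, duration_col="time", event_col="event")
    # Harrell's c-index, the predictive-ability measure quoted above.
    print(cph.concordance_index_)
    ```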
  • Muszynski, Johann Michael (2015)
    The presence of dislocations in metal crystals accounts for the plasticity of metals. These dislocations do not nucleate spontaneously, but require favorable conditions. These conditions include, but are not limited to, a high temperature, external stress, and an interface such as a grain boundary or a surface. The slip of dislocations leads to steps forming on the surface, as atomic planes are displaced along a line. If a void is placed very near a surface, the possibility of forming a dislocation platelet exists. The slip of the dislocation platelet would displace the surface atoms within a closed line. Repeating such a process may form a small protrusion on the surface. In this thesis, the mechanism by which dislocations displace the surface atoms within a closed loop is studied using molecular dynamics (MD) simulations of copper. A spherical void is placed within the lattice, and the lattice is then subjected to an external stress. The dislocation reactions which lead to the formation of the dislocation platelet after the initial dislocation nucleation on the void are studied by running MD simulations of a void with a radius of 3 nm under tensile stress. Since the dislocations are thermally activated, the simulation proceeded differently for each run. We describe the different ways the dislocations nucleate, and the dislocation reactions that occur when they intersect to form the platelet. The activation energy of this process was studied by simulating half of a much larger void, with a radius of 8 nm, in order to obtain a more realistic nucleation environment. Formulas connecting the observable and controllable simulation variables with the energies of the nucleation are derived. The activation energies are then calculated and compared with values from the literature.
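    Thermally activated nucleation of this kind is commonly described by an Arrhenius law, rate ∝ exp(-E_a / k_B T). As a hedged illustration of how an activation energy can be extracted from nucleation rates at a few temperatures, with invented rates rather than results from the thesis:

    ```python
    import numpy as np

    KB = 8.617333262e-5  # Boltzmann constant in eV/K

    # Hypothetical mean nucleation rates (1/s) at three temperatures.
    T = np.array([600.0, 800.0, 1000.0])
    rate = np.array([2.1e6, 9.5e7, 1.1e9])

    # Arrhenius: ln(rate) = ln(nu) - E_a / (kB * T); fit a line in 1/T.
    slope, intercept = np.polyfit(1.0 / T, np.log(rate), 1)
    E_a = -slope * KB        # activation energy in eV
    nu = np.exp(intercept)   # attempt-frequency prefactor
    print(f"E_a ~ {E_a:.2f} eV, prefactor ~ {nu:.2e} 1/s")
    ```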
  • Johansson, Maria (2013)
    Dispersive liquid-liquid microextraction was developed in 2006 for the extraction of organic compounds from water samples. Since then, more complex matrices have been processed, and the technique nowadays includes a variety of subsets. Among the advantages of the technique are, for example, its rapidity, low cost and high enrichment factors. A pretreatment and analysis method was developed for five harmful flame retardants, dechlorane plus (syn and anti) and dechloranes 602, 603 and 604 (component A), in solid environmental samples. The pretreatment method included extraction with pressurised liquid extraction and clean-up with multilayer silica and basic alumina columns. The analytes were separated and analysed with gas chromatography coupled to mass spectrometry. Electron capture negative ionisation was applied as the ionisation technique. The developed method was sensitive, resulting in acceptable recoveries and low detection limits. The chosen ionisation technique was proven to be superior to the more commonly used electron ionisation.
  • Sobolev, Anton (2020)
    When couples with children split up or divorce, they are often unable to come to a mutual agreement concerning their child's place of residency, custody, the child's meetings with the other parent and the frequency of these meetings, or the financial aid one parent is obliged to pay the other for the child. In many countries, these disagreements quite often lead to long disputes in court. A lot of research has been conducted (both in Finland and internationally) concerning the courts' consideration of disputes about children. This thesis studies disputes on the custody and residency of a child in the district courts of Finland. The objective is to find out which factors play the biggest role in resolving these disputes in court. Nine district courts of Finland kindly provided documents on disputes concerning the custody and residency of children from the period 2004-2015. Only the cases where a dispute was solely between the parents of a child (no other relatives) and where the final decision was made by the court (no agreement between the parties) are taken into the analysis. Disputes are divided into two types: those where the residency of a child was involved (residency disputes) and those where it was not (custody disputes). The winner of a dispute is the dependent variable. A logistic regression model is applied to the custody disputes, and a cumulative logistic regression model is applied to the residency disputes. According to the results of the analysis, mothers win more disputes than fathers, but the difference is statistically significant only for the residency disputes. When only the father is of a foreign background, it lowers the father's winning chances in a custody dispute, but neither the father's nor the mother's foreign background is statistically significant for the residency disputes. Substantiated violence by the father towards the mother likewise acts against fathers in custody disputes, as does a non-substantiated accusation of alcohol or drug abuse by the father. For the residency disputes, the main factors decreasing a father's probability of winning are the mother hiring a legal assistant and the father receiving legal aid (which takes place when the father is not financially capable of hiring a legal assistant). Established conditions of a child with one of the parents increase the winning chances of that parent, but the effect is larger for fathers. The accusations acting in favor of fathers (both substantiated and non-substantiated in court) are: the mother's substantiated mental disorder, non-substantiated alcohol or drug abuse by the mother, and a non-substantiated accusation of the father's violence towards the mother. At the same time, no variables regarding the genders of the children disputed over, or the genders of the judge or the legal assistants, are statistically significant in the models. The same concerns the parents' demands in court, as well as the ages of the parents (and their difference) and of the children involved in the disputes. This investigation could be extended by adding disputes from other years and other district courts to the analysis.
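    As an illustration of the binary model used for the custody disputes, a minimal logistic regression sketch with statsmodels on invented data (the variable names are hypothetical simplifications of the predictors described above):

    ```python
    import numpy as np
    import pandas as pd
    import statsmodels.api as sm

    rng = np.random.default_rng(1)
    n = 400
    df = pd.DataFrame({
        "father_foreign": rng.integers(0, 2, n),
        "substantiated_violence": rng.integers(0, 2, n),
        "established_conditions": rng.integers(0, 2, n),
    })
    # Invented outcome: 1 = father wins the custody dispute.
    logit_p = -0.3 - 0.8 * df["father_foreign"] + 1.0 * df["established_conditions"]
    df["father_wins"] = rng.binomial(1, 1 / (1 + np.exp(-logit_p)))

    X = sm.add_constant(df[["father_foreign", "substantiated_violence",
                            "established_conditions"]])
    model = sm.Logit(df["father_wins"], X).fit(disp=0)
    print(model.summary())  # coefficients and p-values for significance tests
    ```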
  • Singh, Maninder Pal (2016)
    Research in the healthcare domain is primarily focused on diseases based on the physiological changes of an individual. Physiological changes are often linked to multiple streams originating from different biological systems of a person. The streams from various biological systems together form attributes for the evaluation of symptoms or diseases. The interconnected nature of different biological systems encourages the use of an aggregated approach to understand symptoms and predict diseases. These streams, or physiological signals, obtained from healthcare systems contribute a vast amount of vital information to healthcare data. The advent of new technologies allows physiological signals to be captured over time, but most of the data acquired from patients are observed only momentarily or remain underutilized. The continuous nature of physiological signals demands context-aware real-time analysis. These research aspects are addressed in this thesis using a large-scale data processing solution. We have developed a general-purpose distributed pipeline for cumulative analysis of physiological signals in medical telemetry. The pipeline is built on top of a framework which performs computation on a cluster in a distributed environment. The emphasis is on the creation of a unified pipeline for processing streaming and non-streaming physiological time series signals. The pipeline provides fault-tolerance guarantees for the processing of signals and scales to multiple cluster nodes. In addition, the pipeline enables indexing of physiological time series signals and provides visualization of real-time and archived time series signals. The pipeline provides interfaces that allow physicians and researchers to use distributed computing for low-latency, high-throughput signal analysis in medical telemetry.
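    The abstract does not name the cluster framework; as one plausible illustration of this style of unified streaming pipeline, a windowed aggregation over a hypothetical heart-rate stream in Apache Spark Structured Streaming might look like this (the source path, schema and window size are assumptions):

    ```python
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F
    from pyspark.sql.types import StructType, StringType, DoubleType, TimestampType

    spark = SparkSession.builder.appName("telemetry-sketch").getOrCreate()

    schema = (StructType()
              .add("patient_id", StringType())
              .add("ts", TimestampType())
              .add("heart_rate", DoubleType()))

    # Hypothetical JSON stream of telemetry readings landing in a directory.
    readings = spark.readStream.schema(schema).json("/data/telemetry/incoming")

    # Rolling one-minute average heart rate per patient.
    averages = (readings
                .withWatermark("ts", "2 minutes")
                .groupBy("patient_id", F.window("ts", "1 minute"))
                .agg(F.avg("heart_rate").alias("avg_hr")))

    query = averages.writeStream.outputMode("update").format("console").start()
    query.awaitTermination()
    ```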
  • Laukkanen, Janne Johannes (2018)
    The vast amount of data created in the world today requires an unprecedented amount of processing power to be turned into valuable information. Importantly, more and more of this data is created at the edges of the Internet, where small computers, capable of sensing and controlling their environments, produce it. Traditionally these so-called Internet of Things (IoT) devices have been utilized as sources of data or as control devices, and their rising computing capabilities have not yet been harnessed for data processing. Also, the middleware systems created to manage these IoT resources have heterogeneous APIs, and thus cannot communicate with each other in a standardized way. To address these issues, the IoT Hub framework was created. It provides a RESTful API for standardized communication, and includes an execution engine for distributed task processing on the IoT resources. A thorough experimental evaluation shows that the IoT Hub platform can considerably lower the execution time of a task in a distributed IoT environment with resource-constrained devices. When compared to theoretical benchmark values, the platform scales well and can effectively utilize dozens of IoT resources for parallel processing.
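    The thesis's actual API is not reproduced here; as a hedged sketch of what a RESTful task-submission endpoint for such a framework might look like (the routes and payload fields are hypothetical), using Flask:

    ```python
    from flask import Flask, jsonify, request

    app = Flask(__name__)
    TASKS = {}  # in-memory store standing in for the execution engine

    @app.post("/tasks")
    def submit_task():
        """Accept a task description and queue it for distributed execution."""
        task = request.get_json()
        task_id = str(len(TASKS) + 1)
        TASKS[task_id] = {"spec": task, "status": "queued"}
        return jsonify({"id": task_id, "status": "queued"}), 201

    @app.get("/tasks/<task_id>")
    def task_status(task_id):
        """Poll the status of a previously submitted task."""
        task = TASKS.get(task_id)
        if task is None:
            return jsonify({"error": "unknown task"}), 404
        return jsonify(task)

    if __name__ == "__main__":
        app.run(port=8080)
    ```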
  • Hirvonen, Juho (2012)
    In this work we study a graph problem called edge packing in a distributed setting. An edge packing p is a function that associates a packing weight p(e) with each edge e of a graph such that the sum of the weights of the edges incident to each node is at most one. The task is to maximise the total weight of p over all edges. We are interested in approximating a maximum edge packing and in finding maximal edge packings, that is, edge packings such that the weight of no edge can be increased. We use the model of distributed computing known as the LOCAL model. A communication network is modelled as a graph, where nodes correspond to computers and edges correspond to direct communication links. All nodes start at the same time and they run the same algorithm. Computation proceeds in synchronous communication rounds, during each of which each node can send a message through each of its communication links, receive a message from each of its communication links, and then do unbounded local computation. When a node terminates the algorithm, it must produce a local output – in this case a packing weight for each incident edge. The local outputs of the nodes must together form a feasible global solution. The running time of an algorithm is the number of rounds it takes until all nodes have terminated and announced their outputs. The running time of a typical distributed algorithm is a function of n, the size of the communication graph, and ∆, the maximum degree of the communication graph. In this work we are interested in deterministic algorithms whose running time is a function of ∆, but not of n. We review an O(log ∆)-time constant-approximation algorithm for maximum edge packing, and an O(∆)-time algorithm for maximal edge packing. Maximal edge packing is an example of a problem where the best known algorithm has a running time linear in ∆. Other such problems include maximal matching and (∆ + 1)-colouring. However, few matching lower bounds exist for these problems: by prior work it is known that finding a maximal edge packing requires time Ω(log ∆), leaving an exponential gap between the best known lower and upper bounds. Recently Hirvonen and Suomela (PODC 2012) showed a linear-in-∆ lower bound for maximal matching. This lower bound, however, applies only in weaker, anonymous models of computation. In this work we show a linear-in-∆ lower bound for maximal edge packing. It applies also in the stronger port numbering model with orientation. Recently Göös et al. (PODC 2012) showed that for a large class of optimisation problems, the port numbering with orientation model is as powerful as a stronger, so-called unique identifier model. An open question is whether this result can be applied to extend our lower bound to the unique identifier model.
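    For intuition, maximality can be reached sequentially (though not in the distributed LOCAL model studied in the thesis) by a greedy sweep: give each edge the largest weight its endpoints' remaining slack allows. A minimal centralized sketch:

    ```python
    def greedy_maximal_edge_packing(nodes, edges):
        """Greedy centralized maximal edge packing.

        Assigns each edge (u, v) the weight min(slack[u], slack[v]), so
        afterwards no edge's weight can be increased without violating the
        constraint that the weights incident to a node sum to at most 1.
        """
        slack = {v: 1.0 for v in nodes}   # remaining capacity at each node
        packing = {}
        for (u, v) in edges:
            w = min(slack[u], slack[v])
            packing[(u, v)] = w
            slack[u] -= w
            slack[v] -= w
        return packing

    # Small example: a path a-b-c.
    print(greedy_maximal_edge_packing(["a", "b", "c"], [("a", "b"), ("b", "c")]))
    # -> {('a', 'b'): 1.0, ('b', 'c'): 0.0}; no weight can be increased.
    ```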
  • Mäki, Jussi Olavi Aleksis (2013)
    With the increasing growth of data traffic in mobile networks there is an ever-growing demand from operators for a more scalable and cost-efficient network core. Recent successes in deploying Software-Defined Networking (SDN) in data centers and large network backbones have given it credibility as a viable solution for meeting the requirements of even large core networks. Software-Defined Networking is a new paradigm where the control logic of the network is separated from the network elements into logically centralized controllers. This separation of concerns offers more flexibility in network control and makes writing new management applications, such as routing protocols, easier, faster and more manageable. This thesis is an empirical experiment in designing and implementing a scalable and fault-tolerant distributed SDN controller and management application for managing the GPRS Tunneling Protocol flows that carry the user data traffic within the Evolved Packet Core. The experimental implementation is built using modern open-source distributed system tools such as the Apache ZooKeeper distributed coordination service and Basho's Riak distributed key-value database. In addition to the design, a prototype implementation is presented and its performance is evaluated.
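    Coordination among the distributed controller instances rests on Apache ZooKeeper; as one hedged illustration of that style of coordination, leader election with the kazoo Python client might look like this (the ZooKeeper path and identifiers are hypothetical, not taken from the thesis):

    ```python
    from kazoo.client import KazooClient

    def lead():
        """Runs only while this controller instance holds leadership."""
        print("acting as primary SDN controller; managing GTP flow state")

    zk = KazooClient(hosts="127.0.0.1:2181")
    zk.start()

    # Each controller instance contends for leadership; kazoo's Election
    # recipe blocks until this instance wins, then calls lead().
    election = zk.Election("/sdn/controller-election", identifier="controller-1")
    election.run(lead)

    zk.stop()
    ```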
  • Alhalaseh, Rola (2018)
    Sensors of different kinds connect to the IoT network and generate a large number of data streams. We explore the possibility of performing stream processing at the network edge and an architecture for doing so. This thesis work is based on a prototype solution developed by Nokia. The system operates close to the data sources and retrieves the data based on requests made by applications through the system. Processing the data close to the place where it is generated can save bandwidth and assist in decision making. This work proposes a processing component operating at the far edge. The applicability of the prototype solution with the proposed processing component was illustrated in three use cases. These use cases involve analysis of Key Performance Indicator values, analysis of data streams generated by air quality sensors called Sensordrones, and recognition of car license plates by an application of deep learning.
  • Lange, Moritz Johannes (2020)
    In the context of data science and machine learning, feature selection is a widely used technique that focuses on reducing the dimensionality of a dataset. It is commonly used to improve model accuracy by preventing data redundancy and over-fitting, but can also be beneficial in applications such as data compression. The majority of feature selection techniques rely on labelled data. In many real-world scenarios, however, data is only partially labelled and thus requires so-called semi-supervised techniques, which can utilise both labelled and unlabelled data. While unlabelled data is often obtainable in abundance, labelled datasets are smaller and potentially biased. This thesis presents a method called distribution matching, which offers a way to do feature selection in a semi-supervised setup. Distribution matching is a wrapper method, which trains models to select the features that best affect model accuracy. It addresses the problem of biased labelled data directly by incorporating unlabelled data into a cost function which approximates the expected loss on unseen data. In experiments, the method is shown to successfully minimise the expected loss transparently on a synthetic dataset. Additionally, a comparison with related methods is performed on the more complex EMNIST dataset.
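    The distribution-matching cost itself is specific to the thesis and not reproduced here; a generic wrapper-style forward selection loop of the kind the method builds on, sketched with scikit-learn on synthetic data:

    ```python
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    X, y = make_classification(n_samples=500, n_features=20,
                               n_informative=4, random_state=0)

    selected, remaining = [], list(range(X.shape[1]))
    best_score = -np.inf

    # Greedy forward selection: add the feature that most improves
    # cross-validated accuracy; stop when no feature helps.
    while remaining:
        scores = {f: cross_val_score(LogisticRegression(max_iter=1000),
                                     X[:, selected + [f]], y, cv=5).mean()
                  for f in remaining}
        f_best, s_best = max(scores.items(), key=lambda kv: kv[1])
        if s_best <= best_score:
            break
        selected.append(f_best)
        remaining.remove(f_best)
        best_score = s_best

    print("selected features:", selected, "cv accuracy:", round(best_score, 3))
    ```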
  • Gaire, Surakshya (2016)
    The objective of this master's thesis was to better understand the impact of black carbon and its distribution in Northern Europe and the Arctic. To achieve the goal of the project, information on observations relevant to black carbon (BC) pollution in the Arctic was collected into a dataset. For the observational data, all main BC measurement campaigns along with active satellite operations were collected. In this study, the BC concentration and deposition were estimated with SILAM (System Integrated Modelling of Atmospheric coMposition), a chemical transport model (CTM). The model was driven with the MACCity (monitoring atmospheric composition and climate), EDGAR-HTAP (emission database for global atmospheric research - hemispheric transport of air pollution), and ECLIPSE (evaluating the climate and air quality impacts of short-lived pollutants) emission inventories. For the computations, the year 2010 was chosen because of the better availability of data during that year. In the literature section, the behaviour of black carbon in the atmosphere is explained along with its properties and characteristics. Furthermore, a data description and data analysis are included, followed by an interpretation of the model output on the seasonal deposition and concentration. As shown by the model-measurement comparison, the model basically captured the measured BC and organic carbon (OC) quite well for all emission inventories. However, the correlation coefficient for OC was still weak for most of the stations in Europe. The overall performance for BC at European stations is substantially better than in the Arctic areas. Deposition of BC and OC shows that the seasonal transport of BC from source regions is evident in the Arctic and near-Arctic areas. Dry deposition is higher in the winter period than in the summer period. The SILAM model suggests winter-period BC concentrations of 0.23 µg/m3 and 0.26 µg/m3 for the year 2010 with the MACCity and ECLIPSE inventories, respectively. This study provides a best-performing setup for the modeling of BC transport and deposition in Northern Europe and the Arctic, despite the absence of an ageing process. More observational data from Arctic stations would provide better results and model performance. Finally, the study gives insight into the quality of existing emission inventories and their capability of reproducing the seasonal deposition and concentration of BC in the Arctic.
  • Vento, Eero (2017)
    Tourism is one of the main contributors in the fight against poverty, as it has become one of the strongest drivers of trade and prosperity in the global south. Protected area tourism is an especially quickly growing segment of the industry, and it has an important role in regional development in many rural areas of the global south. However, territories labelled as protected areas represent a great variety of spaces. This research aims at unifying the holistic picture of protected area tourism governance by analysing how protected areas with divergent landownership arrangements, management objectives and associated regulations influence tourism development and its local socio-economic impacts at the grass roots. This comparative case-study survey scrutinizes local-level tourism governance and territorial regulations in three neighbouring protected areas in Taita Taveta County, Kenya. The Tsavo National Parks are state-owned conservancies focusing on conserving biodiversity. The LUMO community wildlife sanctuary is a nature tourism project owned and orchestrated by a local community, which aims to advance local socio-economic development via tourism while preserving the environment at the same time. The third area, Sarova, is a privately owned conservancy harnessed solely for nature tourism and profit-making. The areas are subject to the same legislative framework, and international phenomena have a similar influence on them, which makes a comparison of their divergent management objectives and local-level regulations expedient. By giving voice to local-level tourism stakeholders, it is possible to point out how the category (i.e. public, private or community) of the landowner and the areas' respective management objectives influence tourism operations and shape the socio-economic outcomes of both conservation and tourism. The comparative analyses focus first on spatial, socially constructed preconditions for tourism development and second on its developmental outcomes, which are primarily analysed by reflecting on the livelihood changes generated by protected area tourism and the protection regulations in place. The dataset was gathered during field research in February–March 2016, and it is mainly based on semi-structured interviews with tourism employees, employers and regional experts. The principal method of interviewing is supplemented by observation and statistics, and the data is analysed by thematic and qualitative content analyses. The protected areas' management objectives and associated regulations have drastic impacts on tourism development within their respective borders. The local administrations of the protected areas were identified as the primary institutions explaining the stark spatial differences in the case-study areas' tourist numbers. Instead of the mere "type" of the landowner, the areas' respective management objectives and associated regulations determined whether protected area tourism generated livelihoods or other positive socio-economic outcomes. Altogether, similar preconditions for tourism development and similar socio-economic outcomes cannot be expected from all territories labelled as protected areas.
  • Strömgård, Simon (2016)
    Multiple factors determine the diversity of diatoms in running waters. Diversity is a complex concept and is made up of different components. Diversity can be divided into alpha, beta and gamma diversity. These different types of diversity are regulated by factors operating on a large geographic scale and by local environmental factors. Studies concentrating on the diversity patterns of diatoms have become more common in the last 10 years. Especially beta diversity has received increasing interest. Despite the increasing interest in the subject, the driving mechanisms are still not fully understood in aquatic ecosystems. The aim of this thesis is to investigate which factors affect alpha and beta diversity in 10 streams in southern Finland. The influence of habitat heterogeneity on beta diversity is also investigated. In addition, the aim is to examine which local environmental factors structure the variation in species composition. The study area covers a 115 km wide area to minimize the effect of large-scale factors on species composition. The material consists of environmental data and diatom data from 49 study sites. The land use data used in the study are derived from the CORINE Land Cover 2012 dataset. All samples were collected during a two-week period (30.7.2014–11.8.2014). The statistical methods used were linear models, generalized linear models (GLM), distance-based redundancy analysis (db-RDA) and the test for homogeneity of multivariate dispersions (PERMDISP). Water conductivity and light conditions at the study sites were strong environmental factors determining diatom alpha diversity. Habitat heterogeneity showed only a marginally significant positive relationship to beta diversity, but a clear trend was visible in the data. The db-RDA results showed that different environmental factors accounted for the variation in species composition. Conductivity, light, water color, water temperature and stream width were important factors explaining the variation in species composition. These results suggest that there is a possible connection between habitat heterogeneity and beta diversity. Further research on the subject should be done to determine whether there is a significant relationship. The local environmental factors are important for structuring species composition. Possible anthropogenic stress factors influencing stream ecosystems can affect patterns of beta diversity and should be emphasized in future research.
  • Laurila, Tiia (2018)
    A Differential Mobility Particle Sizer (DMPS) system can be used to measure the number size distribution of atmospheric aerosol particles. The DMPS system consists of an impactor, a dryer, a bipolar diffusion charger, a Differential Mobility Analyzer (DMA) and a Condensation Particle Counter (CPC). This thesis compares the counting statistics of a modified A20 CPC and a TSI 3776 CPC measuring in parallel in a DMPS system. The smallest aerosol particles have an impact on the environment and health, which is why there is a growing need to measure the size distribution of even the smallest particles accurately. However, the uncertainties of the aerosol particle number size distribution and the quantities derived from it are not yet fully understood. This work aims to improve the counting statistics of a conventional CPC and to study the uncertainties of the number size distribution and of the quantities derived from it, such as the formation rate (J) and the growth rate (GR). A conventional CPC operating without a sheath flow can be modified to detect particles even below 3 nm by increasing the temperature difference between the saturator and the condenser and by changing the aerosol flow. In this work, the aerosol flow of the A20 CPC through the optics was increased to 2.5 litres per minute to minimize losses due to diffusion and to improve the counting statistics. Compared to the TSI 3776 CPC, the modified A20 CPC has a 50 times larger aerosol flow, so we can expect the modified A20 to count more particles with a smaller uncertainty than the TSI 3776 UCPC. The modified A20 CPC has better counting statistics, thanks to which the relative error of the size distribution due to counting is smaller. Compared to the TSI 3776 CPC, the modified A20 CPC has a 50 times larger aerosol flow and counts on average 50 times more particles over the whole size range measured by the DMPS system (1-40 nm). The GR calculated with the modified A20 CPC is about 60% larger for the smallest (3-6 nm) particles and about 3% larger for 6-11 nm particles. J is also about 30% larger for 3-6 nm particles when calculated with the modified A20 CPC. The uncertainty due to CPC counting must be taken into account when determining the total error of a DMPS measurement. Counting statistics matter not only for the number size distribution but also for the quantities derived from it.
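    The counting-statistics argument is Poissonian: the relative uncertainty of a count N scales as 1/√N, so a 50-fold larger aerosol flow, and thus roughly 50-fold more counted particles, reduces the relative error by a factor of √50 ≈ 7. A small sketch of this reasoning (the count values are illustrative only):

    ```python
    import numpy as np

    def relative_counting_error(counts):
        """Relative 1-sigma Poisson uncertainty of a particle count."""
        return 1.0 / np.sqrt(counts)

    # Illustrative counts in one DMPS size bin: the modified A20 CPC has a
    # ~50x larger aerosol flow, so it counts ~50x more particles.
    n_tsi3776 = 40.0
    n_a20 = 50.0 * n_tsi3776

    print(f"TSI 3776:     {relative_counting_error(n_tsi3776):.1%}")
    print(f"modified A20: {relative_counting_error(n_a20):.1%}")
    # The relative error shrinks by sqrt(50) ~ 7.1.
    ```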
  • Salmi, Joni (2019)
    Docker is an emerging technology that makes Linux containers easy to use for developers and system administrators. Unlike virtual machines, Linux containers share the host OS kernel and are isolated using kernel features. Docker containers are lightweight and package applications in a distributable format known as Docker images. In this thesis, we conduct a literature review and provide an overview of the Docker ecosystem. The literature review summarizes the current state of research and reports the most relevant and important findings. We explore the use cases, performance, security and orchestration of containers and compare them with virtual machines and bare metal.
  • Hansson, Kristian (2019)
    The purpose of edge computing is to move data processing closer to the data source, since the computing capacity of centralized servers will not be sufficient in the future to analyse all data simultaneously. The Internet of Things is one of the use cases of edge computing. Edge computing systems are fairly complex and increasingly require the application of agile DevOps practices. Suitable technologies must be found for implementing these practices. The first research question was: What kinds of technical solutions have been applied to delivering edge computing applications? This was answered by examining industry solutions, i.e. those of cloud service providers. The technical solutions revealed that either containers or packaged directories are used as the vehicle for delivering edge applications. Lightweight communication protocols or a VPN connection were used for communication between the edge and the server. In the literature review, container clusters were identified as a possible management tool for edge computing. The second research question was derived from the results of the first: Can Docker Swarm be utilized in operating edge computing applications? The question was answered with an empirical case study. A centralized delivery process for edge applications was built using the Docker Swarm container cluster software, cloud servers, and Raspberry Pi single-board computers. In addition to delivery, attention was paid to runtime monitoring of the software, rollback to the previous software version, grouping of the cluster's devices, attachment of physical peripherals, and the possibility of different processor architectures. The results showed that Docker Swarm can be used as-is for managing edge computing software. Docker Swarm is suitable for delivery, monitoring, rolling back to the previous version, and grouping. In addition, it can be used to create clusters that run the same software on processors with different architectures. However, Docker Swarm proved unsuitable for controlling peripherals attached to an edge device. The large number of edge computing solutions offered by industry demonstrated broad interest in the practical application of containers. Based on this study, container clusters in particular proved to be a promising technology for managing edge computing applications. To obtain further evidence, broader empirical follow-up studies using a similar setup are needed.
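    As one hedged illustration of the kind of centralized delivery evaluated in the study, deploying a service onto a Docker Swarm with the Python Docker SDK might look like this (the image name and node label are hypothetical; `docker service create` on the CLI is the equivalent):

    ```python
    import docker

    # Talks to the Swarm manager's Docker daemon.
    client = docker.from_env()

    # Deploy (or roll out a new version of) an edge application as a Swarm
    # service, constrained to nodes labelled as edge devices.
    service = client.services.create(
        image="registry.example.com/edge-app:1.2.0",  # hypothetical image
        name="edge-app",
        constraints=["node.labels.role == edge"],
    )
    print(service.id)

    # Rolling back to the previous version would be a service update
    # that points back to the earlier image tag.
    ```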
  • Harhio, Säde (2022)
    The importance of software architecture design decisions has been known for almost 20 years. Knowledge vaporisation is a problem in many projects, especially in the current fast-paced culture, where developers often switch from one project to another. Documenting software architecture design decisions helps developers understand the software better and make informed decisions in the future. However, documenting architecture design decisions is highly undervalued. It does not create any revenue in itself, and it is often a disliked and therefore neglected part of the job. This literature review explores what methods, tools and practices are being suggested in the scientific literature, as well as what practitioners are recommending within the grey literature. What makes these methods good or bad is also investigated. The review covers the past five years and 36 analysed papers. The evidence gathered shows that most of the scientific literature concentrates on developing tools to aid the documentation process. Twelve out of nineteen grey literature papers concentrate on Architecture Decision Records (ADRs). ADRs are small template files which, as a collection, describe the architecture of the entire system. ADRs appear to be what practitioners have become used to over the past decade, as they were first introduced in 2011. What is seen as beneficial in a method or tool is low cost and low effort while producing concise, good-quality content. What is seen as a drawback is high cost, high effort, and producing too much or badly organised content. The suitability of a method or tool depends on the project itself and its requirements.
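    ADRs follow a short, fixed template; a minimal example in the commonly used style introduced in 2011 might look like this (the decision content is invented for illustration):

    ```
    # ADR 0007: Use PostgreSQL as the primary data store

    ## Status
    Accepted

    ## Context
    The service needs transactional guarantees and structured queries,
    and the team already operates PostgreSQL for two other products.

    ## Decision
    We will use PostgreSQL for all persistent application data.

    ## Consequences
    Operational tooling can be shared across products; a future move to
    a document store would require a migration plan.
    ```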
  • Jokinen, Olli (2024)
    The rise of large language models (LLMs) has revolutionized natural language processing, particularly through transfer learning and fine-tuning paradigms that enhance the understanding of complex textual data. This thesis builds upon the concept of fine-tuning to improve the understanding of Finnish Wikipedia articles. Specifically, a BERT-based language model is fine-tuned to create high-quality document representations from Finnish texts. The learned representations are applied to downstream tasks, where the model's performance is evaluated against baseline models. This thesis draws on the SPECTER paper, published in 2020, which introduced a training framework for fine-tuning a general-purpose document embedder. SPECTER was trained using a document-level training objective that leveraged document link information. Originally, SPECTER was designed for scientific articles, utilizing citations between articles. The training instances consisted of triplets of query, positive, and negative papers, with the aim of capturing the semantic similarity of the documents. This work extends the SPECTER framework to Finnish Wikipedia data. While scientific articles have citations, Wikipedia's cross-references are used to build a document graph that captures the relatedness between articles. Additionally, Wikipedia data is publicly available as a full data dump, making it an attractive choice for the dataset in this thesis. One of the objectives is to demonstrate the flexibility of the SPECTER framework on a new dataset that has a similar networked structure to that of scientific articles. The fine-tuned model can be used as a general-purpose tool for various tasks and applications; however, in this thesis, its performance is measured in topic classification and cross-reference ranking. The Transformer-based language model produces fixed-length embeddings, which are used as features in the topic classification task and as vectors to measure the L2 distance of article vectors in the cross-reference prediction task. This thesis shows that the proposed model, WikiSpecter, optimized with a document-level objective, outperformed baseline models in both tasks. The performance indicates that Finnish Wikipedia provides relevant cross-references that help the model capture relationships across a range of topics.
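    The SPECTER-style training objective is a triplet margin loss over document embeddings; a condensed sketch with PyTorch and Hugging Face Transformers (the model name, example texts and margin are assumptions; the thesis's full training loop is not reproduced):

    ```python
    import torch
    from transformers import AutoModel, AutoTokenizer

    # A Finnish BERT is assumed here; the thesis fine-tunes a BERT-based model.
    name = "TurkuNLP/bert-base-finnish-cased-v1"
    tok = AutoTokenizer.from_pretrained(name)
    model = AutoModel.from_pretrained(name)

    def embed(texts):
        """Fixed-length document embedding: the [CLS] token's final state."""
        batch = tok(texts, padding=True, truncation=True, return_tensors="pt")
        return model(**batch).last_hidden_state[:, 0]

    # One training triplet: a query article, a cross-referenced (positive)
    # article, and an unrelated (negative) article.
    query = embed(["Helsinki on Suomen pääkaupunki..."])
    positive = embed(["Suomi on valtio Pohjois-Euroopassa..."])
    negative = embed(["Fotosynteesi on kasvien tapa tuottaa energiaa..."])

    # L2-distance triplet margin loss, as in the SPECTER-style objective.
    loss = torch.nn.functional.triplet_margin_loss(
        query, positive, negative, margin=1.0, p=2)
    loss.backward()  # gradients flow into the BERT encoder
    ```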