
Browsing by master's degree program "Tietojenkäsittelytieteen maisteriohjelma" (Master's Programme in Computer Science)


  • Song, Xingchang (2022)
    Quantum networking is developing fast as an emerging research field. Distributing entangled qubits between any two locations in a quantum network is one of the goals of quantum networking, and repeaters can be used to extend the distance over which entanglement is distributed. Although researchers have focused extensively on problems inside a single quantum network, further study of communication between quantum networks is necessary, because the next likely evolution of quantum networking is communication between two or more autonomous quantum networks. In this thesis, we adapt a time-slotted model from the literature to study the inter-network quantum routing problem. The quantum routing problem can be split into path selection and request scheduling. We focus on the latter, since the former has already received considerable interest in the literature. Five request scheduling policies are proposed to study the impact of preferring certain request types on the entanglement generation rate. The experiments also demonstrate that other factors should be considered in the context of the entanglement rate in communication between quantum networks, e.g., the number and distribution of requests and the inter-network distance.
  • Akkanen, Saara (2023)
    This Master's Thesis describes an original user study that took place at the University of Helsinki. The study compares and evaluates the usability of three different methods used in meeting rooms to share a private device's screen on a large public screen in order to give a slideshow presentation: HDMI, VIA, and Ubicast. There were 18 participants. The study was conducted in a controlled environment replicating a typical meeting room setup. The experiment consisted of screen mirroring tasks and an interview. In a screen mirroring task, the participants were asked to share their screen using each of the three technologies. They were provided with the necessary equipment and user guides if needed. The participants were then given training on how to use the technologies, and they performed the tasks again. During the tasks, the time taken to complete each screen mirroring session was recorded, and any errors or difficulties encountered were noted. After completing the screen mirroring tasks, participants were interviewed to gather qualitative data on their experiences and preferences. They were asked about the ease of use, efficiency, and any difficulties they faced while using each technology. This information was used to gain insights into user preferences and potential areas for improvement in the respective technologies. To analyze the data, the System Usability Scale (SUS) scores and the time taken to complete the screen mirroring tasks were calculated for each technology (a minimal sketch of the standard SUS calculation is given after this listing). Statistical analyses were conducted to determine any significant differences in SUS scores and completion times across the three technologies. Additionally, the interview data was analyzed using thematic analysis to identify common themes and patterns in the users' experiences. HDMI emerged on top, with Ubicast not far behind.
  • Hartzell, Kai (2023)
    The concept of big data has gained immense significance due to the constant growth of data sets. The primary challenge lies in effectively managing and extracting valuable conclusions from this ever-expanding data. To address this challenge, more efficient data processing frameworks have become essential. This thesis delves deeply into the concept of big data by first introducing and defining it comprehensively. Subsequently, the thesis explores a range of widely used open-source frameworks, some of which have existed for a considerable time already, while others have been developed to further improve efficiency or particular aspects of earlier ones. At the beginning of the thesis, three popular frameworks—MapReduce, Apache Hadoop, and Spark—are introduced. Following this, the thesis introduces popular data storage concepts and SQL engines, highlighting the growing adoption of SQL as an effective way of interacting with big data analytics systems. The reasons behind this choice are explored, and the performance and characteristics of these systems are compared. In the later sections of the thesis, the focus shifts towards big data cloud services, with a particular emphasis on AWS (Amazon Web Services). Alternative cloud service providers are also discussed briefly. The thesis culminates in a practical demonstration of data analysis conducted on a selected dataset within three selected AWS cloud services. This involves creating scripts to gather and process data, establishing ETL pipelines, configuring databases, conducting data analysis, and documenting the experiments. The goal is to assess the advantages and disadvantages of these services and to provide a comprehensive understanding of their functionalities.
  • Länsman, Olá-Mihkku (2020)
    Demand forecasts are required for optimizing multiple challenges in the retail industry, and they can be used to reduce spoilage and excess inventory sizes. Classical forecasting methods provide point forecasts and do not quantify the uncertainty of the process. We evaluate multiple predictive posterior approximation methods with a Bayesian generalized linear model that captures weekly and yearly seasonality, changing trends, and promotional effects. The model uses the negative binomial as the sampling distribution because its variance scales as a quadratic function of the mean. The forecasting methods provide highest posterior density intervals at credible levels ranging from 50% to 95%. They are evaluated with a proper scoring function and by calculating hit rates. We also measure the duration of the calculations as an important result due to the scalability requirements of the retail industry. The forecasting methods are Laplace approximation, the Markov chain Monte Carlo (MCMC) method, Automatic Differentiation Variational Inference, and maximum a posteriori inference. Our results show that the MCMC method is too slow for practical use, while the rest of the approximation methods can be considered for practical use. We found that Laplace approximation and Automatic Differentiation Variational Inference give results closest to the method with the best analytical guarantees, MCMC, suggesting that they were better approximations of the model. The model faced difficulties with highly promotional, slow-selling, and intermittent data. The best fit was obtained for high-selling SKUs, for which the model provided intervals with hit rates that matched the levels of the credible intervals. (A minimal sketch of the negative binomial mean-variance relationship and of computing highest posterior density intervals from samples is given after this listing.)
  • Rahikainen, Tintti (2023)
    Machine learning operations (MLOps) tools and practices help us continuously develop and deploy machine learning models as part of larger software systems. Explainable machine learning can support MLOps, and vice versa. The results of machine learning models depend on the data and features the models use, so understanding the features is important when we want to explain the decisions of a model. In this thesis, we aim to understand how feature stores can be used to help understand the features used by machine learning models. We compared two existing open-source feature stores, Feast and Hopsworks, from an explainability point of view to explore how they can be used for explainable machine learning. We were able to use both Feast and Hopsworks to aid us in understanding the features we extracted from two different datasets. The feature stores have significant differences: Hopsworks is part of a larger MLOps platform and has more extensive functionality. Feature stores provide useful tools for discovering and understanding the features for machine learning models. Hopsworks can help us understand the whole lineage of the data – where it comes from and how it has been transformed – while Feast focuses on serving the features consistently to models and needs complementary services to be as useful from an explainability point of view. (A minimal sketch of retrieving features from a Feast feature store appears after this listing.)
  • Laitala, Julius (2021)
    Arranging products in stores according to planograms, optimized product arrangement maps, is important for keeping up with the highly competitive modern retail market. The planograms are realized into product arrangements by humans, a process which is prone to mistakes. Therefore, for optimal merchandising performance, the planogram compliance of the arrangements needs to be evaluated from time to time. We investigate utilizing a computer vision problem setting – retail product detection – to automate planogram compliance evaluation. We introduce the relevant problems, the state-of-the-art approaches for solving them, and the background information necessary for understanding them. We then propose a computer vision based planogram compliance evaluation pipeline based on the current state of the art. We build our proposed models and algorithms using PyTorch, and run tests against public datasets and an internal dataset collected from a large Nordic retailer. We find that while the retail product detection performance of our proposed approach is quite good, the planogram compliance evaluation performance of our whole pipeline leaves a lot of room for improvement. Still, our approach seems promising, and we propose multiple ways of improving the performance enough to enable possible real-world utility. The code used for our experiments and the weights for our models are available at https://github.com/laitalaj/cvpce
  • Tauriainen, Juha (2023)
    Software testing is an important part of ensuring software quality. Studies have shown that having more tests results in a lower count of defects. Code coverage is a tool used in software testing to find parts of the software that require further testing and to learn which parts have been tested. Code coverage is generated automatically by the test suites during test execution. Many types of code coverage metrics exist, the most common being line coverage, statement coverage, function coverage, and branch coverage. These four common metrics are usually enough, but there are many specific coverage types for specific purposes, such as condition coverage, which tells how many Boolean conditions have been evaluated as both true and false. Each metric gives hints on how the codebase is tested. A common consensus amongst practitioners is that code coverage does not correlate much with software quality. The correlation of software quality with code coverage is a historically broadly researched topic, which has importance both in academia and in professional practice. This thesis investigates whether code coverage correlates with software quality by performing a literature review. The literature review yields surprising results, as most studies included in this thesis point towards code coverage correlating with software quality. This positive correlation comes from 22 studies conducted between 1995 and 2021, including both academic and industrial studies. The studies were grouped into categories such as Correlation or No correlation based on the key finding, and into categories such as survey studies, case studies, and open-source studies based on the study type. In each category, most studies point towards a correlation. This finding contradicts the opinions of professional practitioners.
  • Hippeläinen, Sampo (2022)
    One of the problems with the modern widespread use of cloud services pertains to geographical location. Modern services often employ location-dependent content, in some cases even data that should not end up outside a certain geographical region. A cloud service provider may, however, have reasons to move services to other locations. An application running in a cloud environment should have a way to verify the location of both itself and its data. This thesis describes a new solution to this problem that employs a permanently deployed hardware device which provides geolocation data to other computers in the same local network. A protocol suite for applications to check their geolocation is developed using the methodology of design science research. The protocol suite thus created uses many tried-and-true cryptographic protocols. A secure connection is established between an application server and the geolocation device, during which the authenticity of the device is verified. The location of data is ensured by checking that a storage server indeed has access to the data. Geographical proximity is checked by measuring round-trip times and setting limits for them. The new solution, with the protocol suite and hardware, is shown to solve the problem and fulfill strict requirements. It improves on the results presented in earlier work. A prototype is implemented, showing that the protocol suite can be feasible both in theory and in practice. Some details will, however, require further research.
  • Fred, Hilla (2022)
    Improving the monitoring of the health and well-being of dairy cows through computer vision based systems is a topic of ongoing research. A reliable and low-cost method for identifying individual cows would enable automatic detection of stress, sickness, or injury, and would make the daily observation of the animals easier. Neural networks have been used successfully in the identification of individual cows, but methods are needed that do not require incessant annotation work to generate training datasets when there are changes within a group. Methods for person re-identification and tracking have been researched extensively, with the aim of generalizing beyond the training set. These methods have been found suitable also for re-identifying and tracking previously unseen dairy cows in video frames. In this thesis, a metric-learning based re-identification model pre-trained on an existing cow dataset is compared to a similar model trained on new video data recorded at the Luke Maaninka research farm in spring 2021, which contains 24 individually labelled cows. The models are evaluated in a tracking context as appearance descriptors in a Kalman filter based tracking algorithm. The test data is video footage from a separate enclosure in Maaninka and a group of 24 previously unseen cows. In addition, a simple procedure is proposed for the automatic labeling of cow identities in images based on RFID data collected from cow ear tags and feeding stations, and the known feeding station locations.
  • Bui, Minh (2021)
    Background. In API requests to a confidential data system, there are always sets of rules that the users must follow to retrieve the desired data within their granted permissions. These rules are made to ensure the security of the system and to limit possible violations. Objective. The thesis is about detecting violations of these rules in such systems. For any violation found, the request is considered to contain an inconsistency, and it must be fixed before any data is retrieved. This thesis also looks for all diagnoses of inconsistent requests. These diagnoses support reconstructing the requests so that the inconsistencies are removed. Method. In this thesis, we choose the design science research methodology to work on solutions. In this methodology, the current problem in distributing data from a smart building serves as the main motivation. Then, a system is designed and developed to demonstrate the practicality of the found solutions, while a testing system is built to confirm their validity. Results. Inconsistency detection is treated as a diagnosis problem, and many algorithms have been developed over the decades to solve diagnosis problems. These algorithms are based on DAG algorithms and have been adapted for different purposes. This thesis builds on these algorithms and constraint programming techniques to resolve the issues faced by the given confidential data system. Conclusions. A combination of constraint programming techniques and DAG-based diagnosis algorithms can be used to detect inconsistencies in API requests. Despite the need for performance improvements in the application of these algorithms, the combination works effectively and can resolve the research problem.
  • Ahonen, Heikki (2020)
    The research group dLearn.Helsinki has created software for defining the working-life competence skills of a person working as part of a group. The software is a research tool for developing the mentioned skills of its users, and the users can be of any age, from school children to employees in a company. As the users can be of different age groups, the data privacy of the different groups has to be considered from different aspects. Children are more vulnerable than adults and may not understand all the risks imposed towards them. Thus, in the European Union, the General Data Protection Regulation (GDPR) determines that the privacy and data of children are more strongly protected, and this has to be taken into account when designing software which uses such data. For dLearn.Helsinki this caused changes not only in the data handling of children, but also of other users. To tackle this problem, existing and future use cases needed to be planned and possibly implemented. Another solution was to implement different versions of the software, where the organizations would be separate. One option would be to separate organizations within the existing SaaS solution. The other option would be creating on-premise versions, where organizations would be locked in accordance with the customer type. This thesis introduces said use cases, as well as installation options for both SaaS and on-premise versions. With these, broader views of data privacy and the different approaches are investigated, and it can be concluded that no matter the approach, the data privacy of children will always prove a challenge.
  • Hiillos, Nicolas (2023)
    This master's thesis describes the development and validation of a uniform control interface for drawing robots with ROS2. The robot control software was tasked with taking SVG images as input and producing them as drawings with three different robots. These robots are the Evil Mad Scientist AxiDraw V3/A3, the UFACTORY xArm Lite6, and a virtual xArm Lite6. The intended use case for the robots and the companion control software is experiments studying human perception of the creativity of the drawing robots. The control software was implemented over the course of a little over six months and used a combination of C++ and Python. The design of the software utilizes ROS2 abstractions such as nodes and topics to combine the different components of the software. The control software is validated against the given requirements and found to fulfil the main objectives of the project. The most important of these are that the robots successfully draw SVG images, that they do so in a similar time frame, and that the resulting images look very similar. Drawing similarity was tested by scanning images, aligning them by minimizing error, and then comparing them visually after overlaying the images. Comparing aligned images was useful in detecting subtle differences in the drawing similarity of the robots and was used to discover issues with the robot control software. MSE and SSIM were also calculated for a set of these aligned images, allowing the effect of future changes to the robot control software to be quantitatively evaluated (a minimal sketch of such an image comparison is given after this listing). Drawing time for the robots was evaluated by measuring the time taken to draw a set of images. This testing showed that the AxiDraw's velocity and acceleration needed to be reduced by 56% so that the xArm Lite6 could draw in a similar time.
  • Hertweck, Corinna (2020)
    In this work, we seek robust methods for designing affirmative action policies for university admissions. Specifically, we study university admissions under a real centralized system that uses grades and standardized test scores to match applicants to university programs. For the purposes of affirmative action, we consider policies that assign bonus points to applicants from underrepresented groups, with the goal of preventing large gaps in admission rates across groups while ensuring that the admitted students are for the most part those with the highest scores (a toy illustration of such a bonus-point policy is given after this listing). Since such policies have to be announced before the start of the application period, there is uncertainty about which students will apply to which programs. This poses a difficult challenge for policy-makers. Hence, we introduce a strategy to design policies for the upcoming round of applications that can address either a single demographic group or multiple groups. Our strategy is based on application data from previous years and a predictive model trained on this data. By comparing this predictive strategy to simpler strategies based only on application data from, e.g., the previous year, we show that the predictive strategy is generally more conservative in its policy suggestions. As a result, policies suggested by the predictive strategy lead to more robust effects and fewer cases where the gap in admission rates is inadvertently increased through the suggested policy intervention. Our findings imply that universities can employ predictive methods to increase the reliability of the effects expected from the implementation of an affirmative action policy.
  • Laaja, Oskari (2022)
    Mobile applications have become common, and end-users expect to be able to use either of the major platforms, iOS or Android, and to find the application in the respective platform store. The process of publishing mobile applications to these application stores can be cumbersome. A heavyweight process can reduce the frequency of mobile application updates, lowering end-user satisfaction. As manually completed processes are prone to human error, the robustness of the process decreases and the quality of the application may diminish. This thesis presents an automated pipeline for publishing a cross-platform mobile application to the App Store and Play Store. The goal of this pipeline is to make the process faster to complete, more robust, and more accessible to people without technical know-how. The work was done with the design science methodology. As results, two artifacts are produced in this thesis: a model of a pipeline design that improves the process, and an implementation of said model that functionally proves the feasibility of the design. The design is evaluated against requirements set by the company for which the implementation was done. As a result, the process used in the project where the implementation was taken into use became faster and simpler, and it became possible for non-development personnel to use.
  • Ikkala, Tapio (2020)
    This thesis presents a scalable method for identifying anomalous periods of non-activity in short periodic event sequences. The method is tested with real-world point-of-sale (POS) data from a grocery retail setting. However, the method can also be applied to other problem domains that produce similar sequential data. The proposed method models the underlying event sequence as a non-homogeneous Poisson process with a piecewise constant rate function. The rate function can be estimated with a change point detection algorithm that minimises a cost function consisting of the negative Poisson log-likelihood and a penalty term that is linear in the number of change points (a minimal sketch of this cost function is given after this listing). The resulting model can be queried for anomalously long periods of time with no events, i.e., waiting times, by defining a threshold below which the waiting time observations are deemed anomalies. The first experimental part of the thesis focuses on model selection, i.e., on finding a penalty value with which the change point detection algorithm detects the true changes in the intensity of the arrivals of the events while not reacting to random fluctuations in the data. In the second experimental part, the performance of the anomaly detection methodology is measured against stock-out data, which gives an approximate ground truth for the termination of a POS event sequence. The performance of the anomaly detector is found to be subpar in terms of precision and recall, i.e., the positive predictive value and the true positive rate. The number of false positives remains high even with small threshold values. This needs to be taken into account when considering applying the anomaly detection procedure in practice. Nevertheless, the methodology may have practical value in the retail setting, e.g., in guiding store personnel where to focus their resources in ensuring the availability of products.
  • Torppa, Tuomo (2021)
    User-centered design (UCD) and agile software development processes (ASDP) each address different issues that modern software development projects face, but no direct guidelines exist on how to implement both in one project. The relevant literature offers multiple separate detailed techniques, but their applicability depends on several features of the development team, e.g., the personnel and expertise available and the size of the team. In this thesis, we propose a new agile development process model, created by evaluating the existing UCD–ASDP combination methods suggested in the current literature to find the application methods most suitable for the case this study is applied to. In this new method, the development team is taken to do their daily work physically near the software's end-users for a short period of time, to make the software team as easily accessible as possible. This method is then applied within an ongoing software project for a two-week period, during which the team visits two separate locations where end-users have the possibility to meet the development team. This introduced "touring" method ended up offering the development team a valuable understanding of the skill and involvement level of the end-users they met, without causing significant harm to the developer experience. The end-users were pleased with the visits, and the method gained support and suggestions for future applications.
  • Sokkanen, Joel (2023)
    DevOps software development methodologies have steadily gained ground over the past 15 years. Properly implemented DevOps enables the software to be integrated and deployed at a rapid pace. The implementation of DevOps practices creates pressure for software testing. In the world of fast-paced integrations and deployments, software testing must perform its quality assurance function quickly and efficiently. The goal of this thesis was to identify the most relevant DevOps software testing practices and their impact on software testing. Software testing in general is a widely studied topic. This thesis looks into the recent developments of software testing in DevOps. The primary sources of this study consist of 15 academic papers, which were collected with systematic literature review source collection methodologies. The study combines both systematic literature review and rapid review methodologies. The DevOps software testing practices associated with a high level of automation, continuous testing, and DevOps culture adoption stood out in the results. These were followed by practices highlighting the need for flexible and versatile test tooling and test infrastructures. DevOps adoption requires the team composition and responsibilities to be carefully planned, and the testing practices to be chosen carefully. Software testing should primarily be organized in highly automated DevOps pipelines. Manual testing should be utilized to validate the results of the automatic tests. Continuous testing, multiple testing levels, and versatile test tooling should be utilized. Integration and regression testing should be run on all code changes. Application monitoring and the collection of telemetry data should be utilized to improve the tests.
  • Sarapalo, Joonas (2020)
    The page hit counter system processes, counts, and stores page hit counts gathered from page hit events on a news media company's websites and mobile applications. The system serves a public application interface which can be queried over the internet for page hit count information. In this thesis I will describe the process of replacing a legacy page hit counter system with a modern implementation in the Amazon Web Services ecosystem utilizing serverless technologies. The process includes the background information, the project requirements, the design and comparison of different options, the implementation details, and the results. Finally, I will show how the new system, implemented with Amazon Kinesis, AWS Lambda, and Amazon DynamoDB, has running costs that are less than half of those of the old one. (A minimal sketch of this kind of serverless counter is given after this listing.)
  • Duong, Quoc Quan (2021)
    Discourse dynamics is one of the important fields in digital humanities research. Over time, the perspectives and concerns of society on particular topics or events may change. Based on changes in the popularity of a certain theme, different patterns form, increasing or decreasing the prominence of the theme in the news. Tracking these changes is a challenging task. In a large text collection, discourse themes are intertwined and uncategorized, which makes it hard to analyse them manually. The thesis tackles the novel task of automatically extracting discourse trends from large text corpora. The main motivation for this work lies in the need in digital humanities to track discourse dynamics in diachronic corpora. Machine learning is a potential method for automating this task by learning patterns from the data. However, in many real use cases ground truth is not available, and annotating discourses at the corpus level is incredibly difficult and time-consuming. This study proposes a novel procedure for generating synthetic datasets for this task, a quantitative evaluation method, and a set of benchmarking models. Large-scale experiments are run using these synthetic datasets. The thesis demonstrates that a neural network model trained on such datasets can obtain meaningful results when applied to a real dataset, without any adjustments of the model.
  • Harhio, Säde (2022)
    The importance of software architecture design decisions has been known for almost 20 years. Knowledge vaporisation is a problem in many projects, especially in the current fast-paced culture, where developers often switch from one project to another. Documenting software architecture design decisions helps developers understand the software better and make informed decisions in the future. However, documenting architecture design decisions is highly undervalued. It does not create any revenue in itself, and it is often the disliked and therefore neglected part of the job. This literature review explores what methods, tools, and practices are being suggested in the scientific literature, as well as what practitioners are recommending within the grey literature. What makes these methods good or bad is also investigated. The review covers the past five years and 36 analysed papers. The evidence gathered shows that most of the scientific literature concentrates on developing tools to aid the documentation process. Twelve out of nineteen grey literature papers concentrate on Architecture Decision Records (ADRs). ADRs are small template files which, as a collection, describe the architecture of the entire system. ADRs appear to be what practitioners have become used to over the past decade, as they were first introduced in 2011. What is seen as beneficial in a method or tool is low cost and low effort while producing concise, good-quality content. What is seen as a drawback is high cost, high effort, and producing too much or badly organised content. The suitability of a method or tool depends on the project itself and its requirements.
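
Illustrative sketch for Akkanen (2023): the System Usability Scale score used in that study is computed from ten 1–5 Likert items with the standard SUS formula. This is a generic Python sketch; the response values are invented for illustration and are not data from the thesis.

```python
def sus_score(responses):
    """System Usability Scale score from ten 1-5 Likert responses.

    Odd-numbered items (1-indexed) contribute (value - 1), even-numbered
    items contribute (5 - value); the sum is scaled by 2.5 to give 0-100.
    """
    if len(responses) != 10:
        raise ValueError("SUS requires exactly 10 item responses")
    contributions = [
        (v - 1) if i % 2 == 0 else (5 - v)  # i is 0-indexed, so even i = odd-numbered item
        for i, v in enumerate(responses)
    ]
    return 2.5 * sum(contributions)

# Hypothetical responses for one participant:
print(sus_score([4, 2, 5, 1, 4, 2, 5, 2, 4, 1]))  # -> 85.0
```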
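
Illustrative sketch for Länsman (2020): the negative binomial sampling distribution, parameterised by mean and dispersion, has a variance that is a quadratic function of the mean, and a highest posterior density interval is the shortest interval covering a given share of posterior samples. The parameter values below are invented; the sketch does not reproduce the thesis model.

```python
import numpy as np

def hpd_interval(samples, cred=0.95):
    """Shortest interval containing a fraction `cred` of the samples."""
    x = np.sort(np.asarray(samples))
    n = len(x)
    k = int(np.ceil(cred * n))
    widths = x[k - 1:] - x[:n - k + 1]
    i = int(np.argmin(widths))
    return x[i], x[i + k - 1]

# Negative binomial with mean mu and dispersion phi: Var = mu + mu**2 / phi,
# i.e. the variance scales as a quadratic function of the mean.
rng = np.random.default_rng(0)
mu, phi = 20.0, 5.0
p = phi / (phi + mu)  # success probability in NumPy's (n, p) parameterisation
samples = rng.negative_binomial(phi, p, size=10_000)
print(hpd_interval(samples, cred=0.90))
```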
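
Illustrative sketch for Rahikainen (2023): a minimal example of serving features from a Feast feature store. The feature view and entity names follow Feast's public quickstart and are assumptions; they are not the feature definitions used in the thesis, and the exact call signature may vary between Feast versions.

```python
from feast import FeatureStore

# Assumes a configured feature repository in the current directory.
store = FeatureStore(repo_path=".")

# Fetch online feature values for one entity; the feature view and entity key
# names ("driver_hourly_stats", "driver_id") are placeholders from Feast's docs.
features = store.get_online_features(
    features=[
        "driver_hourly_stats:conv_rate",
        "driver_hourly_stats:avg_daily_trips",
    ],
    entity_rows=[{"driver_id": 1001}],
).to_dict()

print(features)
```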
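
Illustrative sketch for Hiillos (2023): comparing two aligned scans of drawings with MSE and SSIM, here using scikit-image. The file names are hypothetical, and the alignment step described in the thesis (minimising error over overlaid images) is assumed to have been done beforehand.

```python
from skimage import io, transform
from skimage.metrics import mean_squared_error, structural_similarity

def compare_drawings(path_a, path_b, size=(1024, 1024)):
    """Return (MSE, SSIM) for two scanned drawings resized to a common shape."""
    a = transform.resize(io.imread(path_a, as_gray=True), size, anti_aliasing=True)
    b = transform.resize(io.imread(path_b, as_gray=True), size, anti_aliasing=True)
    return (
        mean_squared_error(a, b),
        structural_similarity(a, b, data_range=1.0),  # grayscale floats in [0, 1]
    )

# Hypothetical scans of the same SVG drawn by two robots:
# print(compare_drawings("axidraw_scan.png", "xarm_scan.png"))
```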
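
Illustrative sketch for Hertweck (2020): a toy version of a bonus-point admission policy, showing how adding bonus points for an underrepresented group changes per-group admission rates when the highest adjusted scores are admitted. The scores, group labels, and bonus value are invented and unrelated to the data studied in the thesis.

```python
def admission_rates(scores, groups, bonus, capacity):
    """Admit the `capacity` highest-scoring applicants after adding `bonus`
    to members of the underrepresented group; return per-group admission rates."""
    adjusted = [s + (bonus if g == "underrepresented" else 0.0)
                for s, g in zip(scores, groups)]
    order = sorted(range(len(scores)), key=lambda i: adjusted[i], reverse=True)
    admitted = set(order[:capacity])
    return {
        label: sum(i in admitted for i, g in enumerate(groups) if g == label)
               / groups.count(label)
        for label in set(groups)
    }

# Invented applicants:
scores = [92, 88, 85, 84, 80, 78, 75, 70]
groups = ["majority", "majority", "underrepresented", "majority",
          "underrepresented", "majority", "underrepresented", "underrepresented"]
print(admission_rates(scores, groups, bonus=0.0, capacity=4))  # large gap in rates
print(admission_rates(scores, groups, bonus=6.0, capacity=4))  # gap closed
```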
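
Illustrative sketch for Ikkala (2020): the change point detection cost described there is a sum of per-segment negative Poisson log-likelihoods (with each segment's rate set to its mean) plus a penalty that is linear in the number of change points. The counts and penalty value below are invented, and this sketch only evaluates the cost; it does not implement the search over candidate change points.

```python
import numpy as np

def poisson_segment_cost(counts):
    """Negative Poisson log-likelihood of a segment under a constant rate
    (the segment mean), dropping the factorial term that is constant in the rate."""
    counts = np.asarray(counts, dtype=float)
    lam = counts.mean()
    if lam == 0.0:
        return 0.0
    return float(-(counts * np.log(lam) - lam).sum())

def total_cost(counts, change_points, penalty):
    """Piecewise-constant rate model cost: segment costs plus a penalty
    that is linear in the number of change points."""
    bounds = [0, *sorted(change_points), len(counts)]
    segments = [counts[a:b] for a, b in zip(bounds[:-1], bounds[1:])]
    return sum(poisson_segment_cost(s) for s in segments) + penalty * len(change_points)

# Invented hourly event counts with a drop in intensity halfway through:
counts = [5, 6, 4, 7, 5, 1, 0, 1, 0, 1]
print(total_cost(counts, change_points=[5], penalty=2.0))  # lower cost with the true change point
print(total_cost(counts, change_points=[], penalty=2.0))   # higher cost without it
```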
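
Illustrative sketch for Sarapalo (2020): a minimal AWS Lambda handler that consumes page hit events from an Amazon Kinesis stream and atomically increments per-page counters in Amazon DynamoDB with boto3. The table, key, and attribute names are assumptions, not the actual schema of the system described in the thesis.

```python
import base64
import json

import boto3

# Hypothetical table; the real system's table name and schema are not known here.
table = boto3.resource("dynamodb").Table("page_hit_counts")

def handler(event, context):
    """Consume Kinesis records and atomically increment DynamoDB hit counters."""
    for record in event["Records"]:
        payload = json.loads(base64.b64decode(record["kinesis"]["data"]))
        table.update_item(
            Key={"page_id": payload["page_id"]},
            UpdateExpression="ADD hit_count :inc",
            ExpressionAttributeValues={":inc": 1},
        )
    return {"processed": len(event["Records"])}
```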