Browsing by master's degree program "Tietojenkäsittelytieteen maisteriohjelma"

  • Colliander, Camilla (2022)
    Software development speed has significantly increased in recent years with methodologies like Agile and DevOps that use automation, among other techniques, to enable continuous delivery of new features and software updates to the market. This increased speed has given rise to concerns over guaranteeing security at such a pace. To improve security in today’s fast-paced software development, DevSecOps was created as an extension of DevOps. This thesis focuses on the experiences and challenges of organizations and teams striving to implement DevSecOps. We first view our concepts through existing literature. Then, we conduct an online survey of 37 professionals from both security and development backgrounds. The results present the participants’ overall sentiments towards DevSecOps and the challenges they struggle with. We also investigate what kinds of solutions have been tried to mitigate these issues and whether these solutions have indeed worked.
  • Ritala, Susanna (2021)
    Chatbots have been developed for decades, but interest in them has grown with recent advances in technology. Chatbots serve people for various purposes, and they operate by conversing with a human. Chatbots offer personal service at every hour of the day, which is why demand for them has increased in many fields, such as online retail and healthcare. In chatbot development, it is important to consider how they are implemented. Many users still prefer other sources of information for solving their problems. One way to measure the quality of chatbot systems is to study their user experience. This thesis empirically examines the user experience of chatbot applications. The empirical part consists of a qualitative study that seeks to answer the following research question: How could the user experience of chatbots be improved? The study was organized with the Osaamisbotti service, which provided a test environment for conducting the study. Eight people took part in the study and completed an assigned task by conversing with a chatbot. The data was collected through protocol analysis and a subsequent interview. The results indicate that human-like conversational abilities, longer responses, and an efficient conversation flow improve the user experience of chatbots. In addition, providing sufficient information guides the conversation and helps avoid error situations. Good availability and ease of use increase the acceptance and adoption of chatbots. The results of the thesis can be used in future research and in chatbot development work.
  • Wu, Qinglu (2023)
    Background: BIM (Building Information Modelling) has helped the construction industry with better workflow and collaboration. To further integrate technologies into the construction industry, research and applications are actively integrating cloud technologies into traditional BIM design workflow. Such integration can be referred to as Cloud BIM, which is considered the second generation of BIM development. Cloud BIM is related to many aspects including technical implementation, workflow improvement, and collaboration of different roles. Aims: In this thesis, we want to find the current situation of Cloud BIM, identifying the benefits and challenges as well as possible technical solutions to the challenges. Methods: We conducted a literature review and analyzed eleven selected papers to gather the necessary data for this thesis. We then did a case study of an integration of two applications, to understand the real challenges in an actual implementation of a cloud-based BIM solution. Results: Cloud BIM can mainly benefit collaboration and information exchange. However, many challenges still exist both in technical and non-technical aspects that require more work. Our integration explored a deeper and more cloud-based solution in a certain process of BIM projects. The main challenge we faced is inconsistent data standards. Conclusions: The results show that the industry is on the way to integrating the cloud into BIM. However, more work needs to be done to overcome the challenges.
  • Ma, Jun (2021)
    Sequence alignment by exact or approximate string matching is one of the fundamental problems in bioinformatics. As the volume of sequenced genomes grows rapidly, pairwise sequence alignment becomes inefficient for pan-genomic analyses involving multiple sequences. The graph representation of multiple genomes has been an increasingly useful tool in pan-genomics research. Therefore, sequence-to-graph alignment becomes an important and challenging problem. For pairwise approximate sequence alignment under Levenshtein (edit) distance, subquadratic algorithms for finding an optimal solution are unknown. As a result, aligning sequences of millions of characters optimally is too challenging and impractical. Thus, many heuristics and techniques are developed for possibly suboptimal alignments. Among them, co-linear chaining (CLC) is a powerful and popular technique that approximates the alignment by finding a chain of short aligned fragments that may come from exact matching. The optimal solution to CLC on sequences can be found efficiently in subquadratic time. For sequence-to-graph alignment, the CLC problem has been solved theoretically on a special class of graphs that are narrow and have no cycles, i.e. directed acyclic graphs (DAGs) with small width, by Mäkinen et al. (ACM Transactions on Algorithms, 2019). Pan-genome graphs such as variation graphs satisfy these restrictions but allowing cycles may enable more general applications of the algorithm. In this thesis, we introduce an efficient algorithm to solve the CLC problem on general graphs with small width that may have cycles, by reducing it to a slightly modified CLC problem on DAGs. We implemented an initial version of the new algorithm on DAGs as a sequence-to-graph aligner GraphChainer. The aligner is evaluated and compared to an existing state-of-the-art aligner GraphAligner (Genome Biology, 2020) in experiments using both simulated and real genome assembly data on variation graphs. 
    Our method improves the quality of alignments significantly in the task of aligning real human PacBio data. GraphChainer is freely available as an open source tool at
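    As a rough illustration of the co-linear chaining idea mentioned in the abstract above, the following minimal sketch chains anchors between two plain sequences. It is not the graph algorithm of the thesis and not the subquadratic method: it assumes non-overlapping anchors, uses a simple quadratic dynamic program, and scores a chain by the query length it covers. The `Anchor` tuple layout and the function name `chain_anchors` are hypothetical.

```python
from typing import List, Tuple

# An anchor is a matched fragment with inclusive coordinates:
# (query_start, query_end, target_start, target_end).
# This layout is an assumption of the sketch, not taken from the thesis.
Anchor = Tuple[int, int, int, int]


def chain_anchors(anchors: List[Anchor]) -> Tuple[int, List[Anchor]]:
    """Return (covered query length, best chain) using a quadratic DP.

    A chain is a list of anchors in which each anchor starts strictly
    after the previous one ends, in both the query and the target.
    """
    if not anchors:
        return 0, []
    # Sorting by query start guarantees every valid predecessor comes earlier.
    order = sorted(anchors)
    n = len(order)
    best = [0] * n    # best[i]: best score of a chain ending at anchor i
    prev = [-1] * n   # prev[i]: predecessor of anchor i in that chain
    for i, (qs, qe, ts, te) in enumerate(order):
        length = qe - qs + 1
        best[i] = length
        for j in range(i):
            _, pqe, _, pte = order[j]
            # Anchor j may precede anchor i only if it ends before i starts
            # in both coordinates (no overlaps allowed in this sketch).
            if pqe < qs and pte < ts and best[j] + length > best[i]:
                best[i] = best[j] + length
                prev[i] = j
    end = max(range(n), key=lambda i: best[i])
    chain = []
    i = end
    while i != -1:
        chain.append(order[i])
        i = prev[i]
    chain.reverse()
    return best[end], chain
```

    For example, given the anchors `(0, 4, 0, 4)`, `(5, 9, 20, 24)`, `(6, 8, 6, 8)` and `(10, 14, 10, 14)`, the second anchor cannot be followed by the fourth because its target position lies beyond it, so the best chain skips it.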
  • Martesuo, Kim (2019)
    Creating a user interface (UI) is often a part of software development. In the software industry, designated UI designers work side by side with developers in agile software development teams. While agile software processes have been researched, there is no general consensus on how UI designers should be integrated into the development team. Existing research suggests that the industry favors tight collaboration between developers and UI designers, with both working in the same team. The subject is attracting interest, and different ways of integration are emerging in the industry. In this thesis we researched the collaboration between developers and UI designers in agile software development. The goal was to understand the teamwork between UI designers and developers working in the same agile software teams. The research was conducted through semi-structured theme interviews held individually with UI designers and developers. The interviewees were from consulting firms located in the Helsinki metropolitan area in Finland. The subjects reported on a recent project in which they worked in an agile software team consisting of UI designers and developers. The data from the interviews was compared to the literature. For the most part, the results of the interviews were similar to the findings from the literature. Finding a suitable process for the teamwork, co-location, good social relations and an atmosphere of trust were factors present in both the literature and the interviews. The importance of good software tools for communicating designs, and of developers taking part in the UI design process, stood out from the interviews.
  • Xu, Weiyi (2024)
    Hashing is a one-way transformation that can be used for data integrity verification, for example, in digital signature systems. The Ssdeep algorithm is a classic context-triggered piecewise hashing function that is commonly used for file similarity checks. The input is divided into separate block segments for signature generation, so that modifying some parts of the input changes only certain bytes of the signature. This characteristic makes it one of the most popular fuzzy hash algorithms for detecting similar information. Nevertheless, a cryptanalysis of Ssdeep is missing from previous research. Therefore, in this thesis, we propose collision attack methods that exploit vulnerabilities in Ssdeep to test the feasibility of using two different inputs to obtain the same signature. Specifically, our objective is to let the attacker add custom comments to a code file so that the output signature is identical to a signature previously acknowledged by the target system using Ssdeep. In our work, we identify the vulnerabilities in the traditional hash and the rolling hash in the Ssdeep calculation. We further name three useful elements, the reset string, the matching character, and the trigger string, to control the Ssdeep process. Specifically for finding the matching character, we use brute force and modular multiplicative inverses, based on which we further propose two implementation versions, coarse-grained and fine-grained, which differ in the number of potential solution states. Additionally, we investigate the block size, a parameter that depends on the file content length, so that the proposed attack methods can work under various realistic scenarios. We test our work with comprehensive experiments, and the results verify the effectiveness of our methods in collision attacks on the Ssdeep algorithm. Our work gives insights into breaking data integrity in other fuzzy and context-triggered piecewise hashing algorithms.
  • Song, Xingchang (2022)
    Quantum networking is developing fast as an emerging research field. Distributing entangled qubits between any two locations in a quantum network is one of the goals of quantum networking, in which repeaters can be used to extend the length of entanglement. Although researchers focus extensively on problems inside one quantum network, further study on communication between quantum networks is necessary because the next possible evolution of quantum networking is communication between two or more autonomous quantum networks. In this thesis, we adapted a time-slotted model from the literature to study the inter-quantum-network routing problem. The quantum routing problem can be split into path selection and request scheduling. We focus on the latter, since the former has already received considerable interest in the literature. Five request scheduling policies are proposed to study the impact of preference for certain request types on the entanglement generation rate. Experiments also demonstrate that other factors should be considered in the context of the entanglement rate in communication between quantum networks, e.g., the number and distribution of requests and the inter-network distance.
  • Akkanen, Saara (2023)
    This Master’s Thesis describes an original user study that took place at the University of Helsinki. The study compares and evaluates the usability of three different methods that are used in meeting rooms to share a private device’s screen on a big public screen in order to give a slideshow presentation: HDMI, VIA, and Ubicast. There were 18 participants. The study was conducted in a controlled environment, replicating a typical meeting room setup. The experiment consisted of screen mirroring tasks and an interview. In a screen mirroring task, the participants were asked to share their screen using each of the three technologies. They were provided with the necessary equipment and user guides if needed. Then the participants were given training on how to use the technologies, and they performed the tasks again. During the task, the time taken to complete each screen mirroring session was recorded, and any errors or difficulties encountered were noted. After completing the screen mirroring tasks, participants were interviewed to gather qualitative data on their experiences and preferences. They were asked about the ease of use, efficiency, and any difficulties they faced while using each technology. This information was used to gain insights into user preferences and potential areas for improvement in the respective technologies. To analyze the data, the System Usability Scale (SUS) scores and time taken to complete the screen mirroring tasks were calculated for each technology. Statistical analyses were conducted to determine any significant differences in SUS scores and time across the three technologies. Additionally, the interview data was analyzed using thematic analysis to identify common themes and patterns in the experiences of the users. HDMI came out on top, with Ubicast not far behind.
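    For readers unfamiliar with the System Usability Scale mentioned above, here is a minimal sketch of the standard SUS scoring rule for one participant's ten answers on a 1 to 5 scale. The function name `sus_score` is hypothetical, and the abstract does not say how the thesis itself processed the questionnaires.

```python
from typing import Sequence


def sus_score(responses: Sequence[int]) -> float:
    """Compute a SUS score (0 to 100) from ten answers on a 1-5 scale.

    Standard scoring: odd-numbered items contribute (answer - 1),
    even-numbered items contribute (5 - answer), and the sum is
    multiplied by 2.5.
    """
    if len(responses) != 10:
        raise ValueError("SUS requires exactly ten responses")
    if any(r < 1 or r > 5 for r in responses):
        raise ValueError("each response must be between 1 and 5")
    total = 0
    for index, answer in enumerate(responses):
        if index % 2 == 0:
            # Items 1, 3, 5, 7, 9 are positively worded.
            total += answer - 1
        else:
            # Items 2, 4, 6, 8, 10 are negatively worded.
            total += 5 - answer
    return total * 2.5
```

    A participant who answers 3 to every item scores 50, and the best possible pattern (5 on odd items, 1 on even items) scores 100.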
  • Mykkänen, Arttu (2024)
    Lossy image compression algorithms are used everywhere, ranging from general photography to natively running applications and the web. The most important question regarding a lossy image compression algorithm is arguably its rate-distortion performance. However, recent research has emphasized the fact that subjective experience of image quality can come at the expense of added or more pronounced distortions for low bit rates. The main focus of this thesis is to form a comprehensive view into standard lossy image compression, and to compare both classical and neural lossy image compression methods in order to inform their selection and use. Moreover, we focus on image quality in addition to traditional rate-distortion considerations. Various lossy image compression algorithms are discussed in some detail for a comprehensive view. Furthermore, we compare them using the most widely used objective IQA measures, the PSNR, the SSIM, the MS-SSIM, as well as a crowdsourced subjective image quality comparison questionnaire. The results indicate that neural methods easily dominate with respect to both objective and subjective scores. However, rank ordering reveals some discrepancies, indicating that high objective measurement scores do not always align with subjective experience. The thesis gives detailed explanations as well as objective and subjective scores for eight different classical or neural lossy image compression algorithms: JPEG, JPEG2000, WebP, BPG, VVC intra coding, HiFiC, the Coarse-to-fine hyperprior model, and iWave++.
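    Of the objective measures listed above, PSNR is the simplest to state. The sketch below computes it for two equally sized images given as flat sequences of pixel values, assuming 8-bit data (peak value 255). The function name `psnr` and the flat-list input format are assumptions of this example, not details from the thesis.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[float], distorted: Sequence[float],
         max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio in decibels between two pixel sequences."""
    if len(reference) != len(distorted) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all pixels.
    mse = sum((r - d) ** 2 for r, d in zip(reference, distorted)) / len(reference)
    if mse == 0:
        # Identical images: PSNR is unbounded.
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / mse)
```

    A higher PSNR means less pixel-wise distortion, which, as the abstract notes, does not always match subjective quality.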
  • Gaisins, Edgars (2024)
    Graph drawing in two dimensions is an established area of research with many proposed drawing algorithms and drawing quality measures. Most research has remained in two dimensions because of the limitations of paper and traditional monitors. However, with the advancements in VR and AR headsets, it is possible to visualise graphs in true three dimensions, which allows the viewer to comprehend larger graphs. To provide a useful graph viewing experience, the graph needs to be laid out in a way that highlights important features and structures. In this paper, we provide a collection of measures of the quality of 3D graph layouts and analyse their usefulness. We also measure the quality and performance of eleven graph layout algorithms on synthetic and real-world graphs and give suggestions for choosing an algorithm.
  • Hartzell, Kai (2023)
    The concept of big data has gained immense significance due to the constant growth of data sets. The primary challenge lies in effectively managing and extracting valuable conclusions from this ever-expanding data. To address this challenge, the need for more efficient data processing frameworks has become essential. This thesis delves deeply into the concept of big data by first introducing and defining it comprehensively. Subsequently, the thesis explores a range of widely used open-source frameworks, some of which have been in existence for a considerable period already, while others have been developed to enhance the efficiency and particular aspects further. At the beginning of the thesis, three popular frameworks—MapReduce, Apache Hadoop, and Spark—are introduced. Following this, the thesis introduces popular data storage concepts and SQL engines, highlighting the growing adoption of SQL as an effective way of interaction within the field of big data analytics. The reasons behind this choice are explored, and the performances and characteristics of these systems are compared. In the later sections of the thesis, the focus shifts towards big data cloud services, with a particular emphasis on AWS (Amazon Web Services). Alternative cloud service providers are also discussed in brief. The thesis culminates in a practical demonstration of data analysis conducted on a selected dataset within three selected AWS cloud services. This involves creating scripts to gather and process data, establishing ETL pipelines, configuring databases, conducting data analysis, and documenting the experiments. The goal is to assess the advantages and disadvantages of these services and to provide a comprehensive understanding of their functionalities.
  • Länsman, Olá-Mihkku (2020)
    Demand forecasts are needed for multiple optimization problems in the retail industry, and they can be used to reduce spoilage and excess inventory. Classical forecasting methods provide point forecasts and do not quantify the uncertainty of the process. We evaluate multiple predictive posterior approximation methods with a Bayesian generalized linear model that captures weekly and yearly seasonality, changing trends and promotional effects. The model uses the negative binomial as the sampling distribution because of its ability to scale the variance as a quadratic function of the mean. The forecasting methods provide highest posterior density intervals at credible levels ranging from 50% to 95%. They are evaluated with a proper scoring function and by calculating hit rates. We also measure the duration of the calculations as an important result due to the scalability requirements of the retail industry. The forecasting methods are Laplace approximation, the Markov Chain Monte Carlo method, Automatic Differentiation Variational Inference, and maximum a posteriori inference. Our results show that the Markov Chain Monte Carlo method is too slow for practical use, while the rest of the approximation methods can be considered for practical use. We found that Laplace approximation and Automatic Differentiation Variational Inference produce results closer to those of the method with the best analytical guarantees, the Markov Chain Monte Carlo method, suggesting that they are better approximations of the model. The model faced difficulties with highly promotional, slow-selling, and intermittent data. The best fit was obtained for high-selling SKUs, for which the model provided intervals with hit rates that matched the levels of the credible intervals.
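    Two quantities from the abstract above can be sketched in a few lines. The first function shows the quadratic mean-variance relationship of a negative binomial under one common parameterization, variance = mean + mean² / dispersion; the abstract does not state which parameterization the thesis uses, so this is an assumption. The second computes a hit rate as the share of observations that fall inside their forecast intervals. Both function names are hypothetical.

```python
from typing import Sequence, Tuple


def nb_variance(mean: float, dispersion: float) -> float:
    """Variance of a negative binomial, assuming var = mean + mean^2 / dispersion."""
    if mean < 0 or dispersion <= 0:
        raise ValueError("mean must be non-negative and dispersion positive")
    return mean + mean ** 2 / dispersion


def hit_rate(observations: Sequence[float],
             intervals: Sequence[Tuple[float, float]]) -> float:
    """Fraction of observations lying inside their (lower, upper) interval."""
    if len(observations) != len(intervals) or not observations:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(1 for y, (low, high) in zip(observations, intervals)
               if low <= y <= high)
    return hits / len(observations)
```

    For a well-calibrated model, the hit rate of intervals at a given credible level should be close to that level, which is the comparison the abstract describes for high-selling SKUs.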
  • Rahikainen, Tintti (2023)
    Machine learning operations (MLOps) tools and practices help us continuously develop and deploy machine learning models as part of larger software systems. Explainable machine learning can support MLOps, and vice versa. The results of machine learning models are dependent on the data and features the models use, so understanding the features is important when we want to explain the decisions of the model. In this thesis, we aim to understand how feature stores can be used to help understand the features used by machine learning models. We compared two existing open source feature stores, Feast and Hopsworks, from an explainability point of view to explore how they can be used for explainable machine learning. We were able to use both Feast and Hopsworks to aid us in understanding the features we extracted from two different datasets. The feature stores have significant differences, Hopsworks being a part of a larger MLOps platform, and having more extensive functionalities. Feature stores provide useful tools for discovering and understanding the features for machine learning models. Hopsworks can help us understand the whole lineage of the data – where it comes from and how it has been transformed – while Feast focuses on serving the features consistently to models and needs complementing services to be as useful from an explainability point of view.
  • Laitala, Julius (2021)
    Arranging products in stores according to planograms, optimized product arrangement maps, is important for keeping up with the highly competitive modern retail market. The planograms are realized into product arrangements by humans, a process which is prone to mistakes. Therefore, for optimal merchandising performance, the planogram compliance of the arrangements needs to be evaluated from time to time. We investigate utilizing a computer vision problem setting – retail product detection – to automate planogram compliance evaluation. We introduce the relevant problems, the state-of-the-art approaches for solving them and background information necessary for understanding them. We then propose a computer vision based planogram compliance evaluation pipeline based on the current state of the art. We build our proposed models and algorithms using PyTorch, and run tests against public datasets and an internal dataset collected from a large Nordic retailer. We find that while the retail product detection performance of our proposed approach is quite good, the planogram compliance evaluation performance of our whole pipeline leaves a lot of room for improvement. Still, our approach seems promising, and we propose multiple ways for improving the performance enough to enable possible real world utility. The code used for our experiments and the weights for our models are available at
  • Rensing, Fabian (2024)
    Accurately predicting a ship’s fuel consumption is essential for an efficient shipping operation. A prediction model has to be regularly retrained to minimize drift between its predictions and the actual consumption of the ship since a ship’s performance is constantly changing because of weather influences and constant hull fouling. Continuous Learning (CL) promises repeated retraining of an ML model while also mitigating catastrophic forgetting. The so-called catastrophic forgetting happens when a model is trained on new data without proper measures to “remind” the model of its previous knowledge. In the context of Ship Performance Prediction, this might be previously encountered weather or performance patterns in certain conditions. This thesis explores the adaptability of CL to set up a production-ready training pipeline to regularly retrain a model that predicts a ship’s fuel consumption.
  • Tauriainen, Juha (2023)
    Software testing is an important part of ensuring software quality. Studies have shown that having more tests results in a lower defect count. Code coverage is a tool used in software testing to find parts of the software that require further testing and to learn which parts have been tested. Code coverage is generated automatically by the test suites during test execution. Many types of code coverage metrics exist, the most common being line coverage, statement coverage, function coverage, and branch coverage. These four common metrics are usually enough, but there are many specific coverage types for specific purposes, such as condition coverage, which tells how many boolean conditions have been evaluated as both true and false. Each metric gives hints on how the codebase is tested. A common consensus amongst practitioners is that code coverage does not correlate much with software quality. The correlation of software quality with code coverage has historically been broadly researched and is important both in academia and in professional practice. This thesis investigates whether code coverage correlates with software quality by performing a literature review. The results of the literature review are surprising, as most studies included in this thesis point towards code coverage correlating with software quality. This positive correlation comes from 22 studies conducted between 1995 and 2021, which include academic and industrial studies. The studies are categorized by key finding, such as Correlation or No correlation, and by study type, such as survey studies, case studies, and open-source studies. In each category, most studies point towards a correlation. This finding contradicts the opinions of professional practitioners.
  • Hippeläinen, Sampo (2022)
    One of the problems with the modern widespread use of cloud services pertains to geographical location. Modern services often employ location-dependent content, in some cases even data that should not end up outside a certain geographical region. A cloud service provider may however have reasons to move services to other locations. An application running in a cloud environment should have a way to verify the location of both it and its data. This thesis describes a new solution to this problem by employing a permanently deployed hardware device which provides geolocation data to other computers in the same local network. A protocol suite for applications to check their geolocation is developed using the methodology of design science research. The protocol suite thus created uses many tried-and-true cryptographic protocols. A secure connection is established between an application server and the geolocation device, during which the authenticity of the device is verified. The location of data is ensured by checking that a storage server indeed has access to the data. Geographical proximity is checked by measuring round-trip times and setting limits for them. The new solution, with the protocol suite and hardware, is shown to solve the problem and fulfill strict requirements. It improves on the results presented in earlier work. A prototype is implemented, showing that the protocol suite can be feasible both in theory and practice. Details will however require further research.
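    The round-trip-time check described above rests on a simple physical bound: a signal cannot travel faster than light, so a measured RTT limits how far away the other endpoint can be. The sketch below illustrates that bound and a threshold test on the smallest measured RTT. The function names, the use of the minimum sample, and the vacuum speed of light as the bound are assumptions of this example; the abstract does not give the actual limits or procedure used in the thesis.

```python
from typing import Iterable

# Speed of light in vacuum; real networks are slower, so this is an upper bound.
SPEED_OF_LIGHT_M_PER_S = 299_792_458


def max_distance_m(rtt_seconds: float) -> float:
    """Upper bound on one-way distance implied by a round-trip time."""
    if rtt_seconds < 0:
        raise ValueError("round-trip time cannot be negative")
    # The signal covers the distance twice, hence the division by two.
    return SPEED_OF_LIGHT_M_PER_S * rtt_seconds / 2


def within_limit(rtt_samples: Iterable[float], rtt_limit_seconds: float) -> bool:
    """Accept proximity if the smallest measured RTT is within the limit.

    The minimum is used because queuing and processing delays can only
    inflate an RTT, never shorten it.
    """
    samples = list(rtt_samples)
    if not samples:
        return False
    return min(samples) <= rtt_limit_seconds
```

    With this bound, an RTT of one millisecond implies the peer is at most about 150 km away, regardless of the network path.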
  • Fred, Hilla (2022)
    Improving the monitoring of health and well-being of dairy cows through the use of computer vision based systems is a topic of ongoing research. A reliable and low-cost method for identifying cow individuals would enable automatic detection of stress, sickness or injury, and the daily observation of the animals would be made easier. Neural networks have been used successfully in the identification of cow individuals, but methods are needed that do not require incessant annotation work to generate training datasets when there are changes within a group. Methods for person re-identification and tracking have been researched extensively, with the aim of generalizing beyond the training set. These methods have been found suitable also for re-identifying and tracking previously unseen dairy cows in video frames. In this thesis, a metric-learning based re-identification model pre-trained on an existing cow dataset is compared to a similar model that has been trained on new video data recorded at Luke Maaninka research farm in Spring 2021, which contains 24 individually labelled cow individuals. The models are evaluated in tracking context as appearance descriptors in Kalman filter based tracking algorithm. The test data is video footage from a separate enclosure in Maaninka and a group of 24 previously unseen cow individuals. In addition, a simple procedure is proposed for the automatic labeling of cow identities in images based on RFID data collected from cow ear tags and feeding stations, and the known feeding station locations.
  • Bui, Minh (2021)
    Background. In API requests to a confidential data system, there are always sets of rules that users must follow to retrieve the desired data within their granted permissions. These rules are made to assure the security of the system and limit all possible violations. Objective. This thesis is about detecting violations of these rules in such systems. When a violation is found, the request is considered inconsistent, and it must be fixed before any data is retrieved. This thesis also looks for all diagnoses of inconsistent requests. These diagnoses support reconstructing the requests to remove any inconsistency. Method. In this thesis, we use the design science research methodology to work on solutions. In this methodology, a current problem in distributing data from a smart building serves as the main motivation. A system is then designed and developed to demonstrate the practicality of the solutions, and a testing system is built to confirm their validity. Results. Inconsistency detection is treated as a diagnostic problem, and many algorithms for solving diagnostic problems have been developed over the decades. These algorithms are based on DAG algorithms and have been kept in use for different purposes. This thesis builds on these algorithms and on constraint programming techniques to resolve the issues facing the given confidential data system. Conclusions. A combination of constraint programming techniques and DAG algorithms for diagnostic problems can be used to detect inconsistencies in API requests. Although the performance of these algorithms in this application needs improvement, the combination works effectively and can resolve the research problem.
  • Ahonen, Heikki (2020)
    The research group dLearn.Helsinki has created software for defining the work life competence skills of a person working as part of a group. The software is a research tool for developing these skills, and its users can be of any age, from school children to employees in a company. As the users can be of different age groups, the data privacy of the different groups has to be considered from different perspectives. Children are more vulnerable than adults and may not understand all the risks imposed on them. Thus, in the European Union, the General Data Protection Regulation (GDPR) determines that the privacy and data of children are more protected, and this has to be taken into account when designing software that uses such data. For dLearn.Helsinki this caused changes not only in the data handling of children, but also of other users. To tackle this problem, existing and future use cases needed to be planned and possibly implemented. Another solution was to implement different versions of the software, where the organizations would be separate. One option would be determining organizational differences in the existing SaaS solution. The other option would be creating on-premise versions, where organizations would be locked in accordance with the customer type. This thesis introduces these use cases, as well as installation options for both SaaS and on-premise. With these, broader views of data privacy and the different approaches are investigated, and it can be concluded that no matter the approach, the data privacy of children will always prove a challenge.