
Browsing by master's degree program "Magisterprogrammet i datavetenskap"

  • Vidjeskog, Martin (2022)
    The traditional way of computing the Burrows-Wheeler transform (BWT) has been to first build a suffix array, and then use this newly formed array to obtain the BWT. While this approach runs in linear time, the space requirement is far from optimal. When the length of the input string increases, the required working space quickly becomes too large for normal computers to handle. To overcome this issue, researchers have proposed many different types of algorithms for building the BWT. In 2009, Daisuke Okanohara and Kunihiko Sadakane presented a new linear time algorithm for BWT construction. This algorithm is relatively fast and requires far less working space than the traditional way of computing the BWT. It is based on a technique called induced sorting and can be seen as a state-of-the-art approach for internal memory BWT construction. However, a proper exploration of how to implement the algorithm efficiently has not been undertaken. One 32-bit implementation of the algorithm is known to exist, but due to the limitations of 32-bit programs, it can only be used for input strings under the size of 4 GB. This thesis introduces the algorithm from Okanohara and Sadakane and implements a 64-bit version of it. The implemented algorithm can in theory support input strings that are thousands of gigabytes in size. In addition to the explanation of the algorithm, the time and space requirements of the 64-bit implementation are compared to some other fast BWT algorithms.
  • Zhou, Xinyuan (2024)
    This thesis explores the integration of Augmented Reality (AR) into social media platforms, taking Snapchat's AR as a case study. Design science research is used as the research method. The primary question addressed is how augmented reality enhances user interaction and engagement in social media. The second research question concerns the challenges and considerations in integrating AR into social media platforms. An artifact comprising a series of AR features is developed using Lens Studio. User interaction features such as 3D object manipulation, distance-opacity mapping and camera interaction transformations are expected to bring an immersive AR experience to users. The combination of AR and cloud-based technologies for location-based AR, data management and multi-user scenarios is also discussed. A structured experiment is conducted to evaluate the effectiveness of the AR features in enhancing user engagement. The developed AR features are distributed to participants via Snapchat QR codes, after which the participants provide feedback through a detailed questionnaire. The evaluation focused on metrics such as time spent, interaction frequency, depth of interaction, and technical performance, revealing significant insights into both user engagement and technical challenges. The findings confirm that AR significantly increases user engagement, with a majority of participants willing to spend more time and interact more frequently with Snapchat due to the AR features. However, technical challenges like battery drain and response time were highlighted. The thesis concludes that while AR has great potential to enhance social media experiences, ongoing improvements in technical infrastructure are essential to fully realize this potential. Future research should explore the long-term impacts of AR and its scalability across different platforms.
  • Luopajärvi, Kalle (2022)
    In independent component analysis the data is decomposed into its statistically independent components. In recent years, statistical models have been developed that solve a non-linear version of the independent component analysis. This thesis focuses on the estimation methods of a particular non-linear independent component analysis model called iVAE. It is shown on simulated data that the generative adversarial networks can significantly improve the iVAE model estimation compared with the previously used default iVAE estimation method. The improved model estimation might enable new applications for the iVAE model.
  • Laakso, Atte (2023)
    This thesis conducts a systematic literature review on ethical issues of large language models (LLMs). These models are a highly topical subject, as both their presence and demand have skyrocketed since the release of ChatGPT, a free-to-use generative language model. The literature review of 116 studies, both conceptual and empirical, identifies 39 recurring ethical issues. The issues range from methodological to fundamental ones, for example "Environmental impacts" and "Biased training data or outputs". These identified issues are analyzed based on the Ethics guidelines for trustworthy AI (Artificial Intelligence), released by the European Commission’s High-Level Expert Group on AI. The guidelines detail requirements that all trustworthy and ethical AI applications should adhere to, e.g., Human agency, Transparency, Accountability. All identified issues are mapped to these requirements, and the conclusion is that LLMs have significant challenges relating to each one. The findings indicate that the use of LLMs comes with significant issues, both demonstrated and theorized. While some methods for mitigating these issues are identified, many issues still remain unanswered. One of these unanswered issues is the most frequently identified one: inherent biases in LLMs. Since there is no universal understanding of biases, there is no way to make LLMs seem unbiased to everyone. This thesis collates the current talking points and issues identified with LLMs. It provides a comprehensive, but not exhaustive, list of these issues and shows that there is much discussion on the topic. The conclusion is that more discussion is required, but more vitally, even more (regulatory) action is needed along with it.
  • Zafar, Muhammad Zeeshan (2024)
    In higher education, student recruitment and marketing play a prominent role in the success of educational institutions, maintaining a robust student population and fostering diversity. Institutions compete for the attention of prospective students, and in this data-driven era, a strategic and data-driven approach is required to compete and make informed business decisions. The student recruitment and marketing team of the University of Helsinki possesses various data sources that require storage, transformation, and visualization to get insights from that data. This thesis aims to solve these problems by creating a cloud database using Azure SQL Database, building Extract, Transform, and Load (ETL) pipelines using Azure Data Factory, and developing dashboards in Power BI. Together, these allow the student recruitment and marketing team to transform and load their data into a database and visualize it in Power BI, which helps in making better strategic decisions and in sharing the dashboards with stakeholders across the institution. The results establish the ability to use Azure services for data management. Results include interactive dashboards in Power BI consisting of various visualizations that meet the requirements of the student recruitment and marketing team by providing key performance indicators (KPIs). This approach enabled data-driven decision-making for the student recruitment and marketing team.
  • Pukkila, Eero (2022)
    The aim of the calculation for a defined-benefit pension arrangement is to determine whether the pension insurance policyholder's savings and pension plans are compatible, while also taking into account the insurance covers and other costs included in the contract. As a result of legislative changes to voluntary pension contracts, this kind of calculation has become considerably more complex during the 2000s, and calculation models created for old systems do not always perform at the desired speed. The topic of this thesis is the optimization of Profit Software Oy's Profit Life & Pension insurance management system with respect to the calculation described above.
  • Auvo, Markus (2022)
    As everyday life becomes digital, more and more daily things are done online. In particular, the increased use of mobile devices has accelerated this development. People are increasingly leaving information online about themselves that can be used to identify a person. On 25 May 2018, the European Union’s General Data Protection Regulation, the GDPR, became applicable in the European Union, repealing the previous European Union Data Protection Directive. The GDPR sets out how personal information should be stored and who can process it. The thesis examined how the introduction of the GDPR has affected the customer data storage solutions and IT processes of Finnish SMEs during 2018-2020. The companies were examined in three phases: before, during and after the introduction of the GDPR. The study looked at the number of data breaches in the EU and the penalties imposed for them, and compared the situation in Finland. In addition, Finnish SMEs were interviewed for the thesis. The interview was conducted as a questionnaire interview with 15 companies. The thesis found that Finland did not stand out in any way among other EU countries in GDPR violations. The answers received as a result of the survey revealed that there has been a clear variation in the interpretation of the content of the GDPR in Finland, which has affected the measures taken by companies. Based on the survey, the measures have also been influenced by the organization and organizational culture. However, the reliability of the results is affected by the small sample size.
  • Karis, Peter (2020)
    This thesis presents a user study to evaluate the usability and effectiveness of a novel search interface as compared to a more traditional solution. InnovationMap is a novel search interface by Khalil Klouche, Tuukka Ruotsalo and Giulio Jacucci (University of Helsinki). It is a tool for helping the user perform ‘exploratory searching’, a type of search activity where the user is exploring an information space unknown to them and thus cannot form a specific search phrase to perform a traditional ‘lookup’ search as with conventional search interfaces. In this user study InnovationMap is compared against TUHAT, a search portal that is currently in use at the University of Helsinki for searching for research works and research personnel from the university databases. The user evaluation is conducted as a qualitative within-subject study using volunteer users from the University of Helsinki. Each participant uses both systems in an alternating order over the course of two sessions. During the two sessions the volunteer user carries out information finding tasks defined in the experiment design, answers a SUS (System Usability Scale) questionnaire and participates in a semi-structured interview. The answers from the assigned tasks are then evaluated and scored by field experts. The combined results from these methods are then used to formulate an educated assessment of the usability, effectiveness and future development potential of the InnovationMap search system.
  • Aula, Kasimir (2019)
    Air pollution is considered to be one of the biggest environmental risks to health, causing symptoms from headache to lung diseases, cardiovascular diseases and cancer. To improve awareness of pollutants, air quality needs to be measured more densely. Low-cost air quality sensors offer one solution to increase the number of air quality monitors. However, they suffer from low accuracy of measurements compared to professional-grade monitoring stations. This thesis applies machine learning techniques to calibrate the values of a low-cost air quality sensor against a reference monitoring station. The calibrated values are then compared to a reference station’s values to compute the error after calibration. In the past, the evaluation phase has been carried out very lightly. A novel method of selecting data is presented in this thesis to ensure diverse conditions in training and evaluation data, which would yield a more realistic impression of the capabilities of a calibration model. To better understand the level of performance, selected calibration models were trained with data corresponding to different levels of air pollution and meteorological conditions. Regarding pollution level, using homogeneous training and evaluation data, the error of a calibration model was found to be up to 85% lower than when using a diverse training and evaluation pollution environment. Also, using diverse meteorological training data instead of more homogeneous data was shown to reduce the size of the error and provide stability in the behavior of calibration models.
  • Joswig, Niclas (2021)
    Simultaneous Localization and Mapping (SLAM) research is gaining a lot of traction as the available computational power and the demand for autonomous vehicles increase. A SLAM system solves the problem of localizing itself during movement (Visual Odometry) and, at the same time, creating a 3D map of its surroundings. Both tasks can be solved on the basis of expensive and spacious hardware like LiDARs and IMUs, but the subarea of visual SLAM aims at replacing those costly sensors with, ultimately, inexpensive monocular cameras. In this work I applied the current state-of-the-art in end-to-end deep learning-based SLAM to a novel dataset comprising images recorded from cameras mounted to an indoor crane from the Konecranes CXT family. One major aspect that is unique about our proposed dataset is the camera angle, which resembles a classical bird’s-eye view towards the ground. This orientation change, together with a novel scene structure, has a large impact on the subtask of mapping the environment, which in this work is done through monocular depth prediction. Furthermore, I will assess which properties of the given industrial environments have the biggest impact on the system’s performance to identify possible future research opportunities for improvement. The main performance impairments I examined, which are characteristic of most types of industrial premises, are non-Lambertian surfaces, occlusion and texture-sparse areas along the ground and walls.
  • Nieminen, Jeremi (2023)
    This thesis examines the render speeds of WebViews in React Native applications. React Native is a popular cross-platform framework for developing mobile applications, and WebViews allow embedding web content within mobile applications. While WebViews offer the advantage of bringing readily available web content into applications, the cost of using this technology in terms of application responsiveness is not well researched. The goal of this thesis is to evaluate this cost so that developers and stakeholders can make more informed decisions regarding the use of WebViews in React Native applications. A series of tests was performed using a React Native application that was developed for the purpose of this study. In these tests, we rendered WebViews and similar-looking views that consist of React Native components, and measured their mean render times. Our analysis of these results revealed that using React Native components instead of WebViews offers significant benefits in terms of rendering performance on both iOS and Android platforms. The use of WebViews in rendering user interfaces can bring a notable disadvantage in terms of user experience, especially on Android devices. These findings suggest that rendering native user interface components instead of WebViews should be preferred if we want to maximize user experience across different devices and platforms.
  • Kangas, Vilma (2020)
    Software testing is an important process when ensuring a program's quality. However, testing has not traditionally been a very substantial part of computer science education. Some attempts to integrate it into the curriculum have been made, but best practices still prove to be an open question. This thesis discusses multiple attempts at teaching software testing over the years. It also introduces CrowdSorcerer, a system for gathering programming assignments with tests from students. It has been used in introductory programming courses at the University of Helsinki. To study whether the students benefit from creating assignments with CrowdSorcerer, we analysed the number of assignments and tests they created and whether these correlate with their performance in a testing-related question in the course exam. We also gathered feedback from the students on their experiences of using CrowdSorcerer. Looking at the results, it seems that more research on how to teach testing would be beneficial. Improving CrowdSorcerer would also be a good idea.
  • Ahlfors, Dennis (2022)
    As the role of IT and computer science in society is on the rise, interest in computer science education is also on the rise. Research covering study success and study paths is important both for understanding student needs and for developing the educational programmes further. Using a data set covering student records from 2010 to 2020, this thesis aims to find key insights and provide base research on the topic of computer science study success and study paths at the University of Helsinki. Using novel visualizations and descriptive statistics, this thesis builds a picture of the evolution of study paths and student success during a 10-year timeframe, providing much needed contextual information to be used as inspiration for future focused research into the phenomena discovered. The visualizations combined with statistical results show that certain student groups seem to have better study success and that there are differences in the study paths chosen by the student groups. It is also shown that the graduation rates from the Bachelor’s Programme in Computer Science are generally low, with some student groups showing higher than average graduation rates. Time from admission to graduation is longer than suggested and the sample study paths provided by the university are not generally followed, leading to the conclusion that the programme structure would need some assessment to better incorporate students with diverse academic backgrounds and differing personal study plans.
  • Heinonen, Ava (2020)
    The design of instructional material affects learning from it. Abstraction, or limiting details and presenting difficult concepts by linking them with familiar objects, can limit the burden on working memory and make learning easier. The presence of visualizations and the level to which students can interact with them and modify them, also referred to as engagement, can promote information processing. This thesis presents the results of a study using a 2x3 experimental design with abstraction level (high abstraction, low abstraction) and engagement level (no viewing, viewing, presenting) as the factors. The study consisted of two experiments with different topics: hash tables and multidimensional arrays. We analyzed the effect of these factors on instructional efficiency and learning gain, accounting for prior knowledge and prior cognitive load. We observed that high abstraction conditions limited study cognitive load for all participants, but were particularly beneficial for participants with some prior knowledge of the topic they studied. We also observed that higher engagement levels benefit participants with no prior knowledge of the topic they studied, but not necessarily participants with some prior knowledge. Low cognitive load in the pre-test phase makes studying easier regardless of the instructional material, as does knowledge of the topic being studied. Our results indicate that the abstractions and engagement with learning materials need to be designed with the students and their knowledge levels in mind. However, further research is needed to assess the components in different abstraction levels that affect learning outcomes and why and how cognitive load in the pre-test phase affects cognitive load throughout studying and testing.
  • Diseth, Anastasia Chabounina (2024)
    Combinatorial optimization problems arise in many applications. Finding solutions that are as good as possible, ideally optimal, with respect to given criteria is important. Additionally, many real-world combinatorial optimization problems are NP-hard. The so-called declarative approach to solving combinatorial optimization problems has proven to be successful in practice. In this work we focus on the implicit hitting set-based (IHS) maximum satisfiability (MaxSAT) paradigm for solving combinatorial optimization problems declaratively. In the MaxSAT paradigm the problem at hand is formulated as a linear objective function to minimize subject to a set of constraints expressed in the language of propositional logic. In the IHS approach the problem is solved by alternating calls to two subroutines. An optimizer procedure computes optimal solutions over the variables in the objective function without the constraints available, and a feasibility oracle verifies the solutions in terms of the constraints. In this work we study alternative divisions of the constraints of a given problem formulation between the optimizer and the oracle. We allow the optimizer to compute solutions over any variables of the problem instance, thus extending the hitting set formulations of IHS-based MaxSAT. We focus on two specific combinatorial optimization problems and existing MaxSAT encodings of these problems. The problems we focus on are computing the treewidth of a graph and finding an optimal k-undercover Boolean matrix factorization. We have also extended a state-of-the-art IHS-based MaxSAT solver to support extended divisions of encodings and provide the implementation as open source.
  • Tilander, Vivianna (2023)
    Context: An abundance of research on the productivity of software development teams and developers exists, identifying many factors and their effects in different contexts and concerning different aspects of productivity. Objective: This thesis aims to collect and analyse existing recent research results on factors that are related to or directly influence the productivity of teams or developers and on how they influence it in different contexts, and to briefly summarise the metrics used in recent studies to measure productivity. Method: The method selected to reach these aims was to conduct a systematic literature review of relevant studies published between 2017 and 2022. Altogether, 48 studies were selected and analysed during the review. Results: The metrics used by the reviewed studies for measuring productivity range from time used for completing a task to self-evaluated productivity to the number of commits contributed. Some of these are used by multiple studies and many by only one or a few, and they measure productivity from different angles. Various factors were found, and these range from team size to experienced emotion to working from home during the COVID-19 pandemic. The relationships found between these factors and some aspects of the productivity of developers and teams range from positive to negative and sometimes both, depending on the context and the productivity metric in question. Conclusions: While many relationships were found between various factors and the productivity of software developers and development teams in this review, these do not cover all possible factors, relationships or measurable productivity aspects in all possible contexts. Additionally, one should keep in mind that most of the found relationships do not imply causality.
  • Kilpinen, Arttu (2022)
    The objective of the shortest common superstring problem is to find a string of minimum length that contains all keywords in the given input as substrings. Shortest common superstrings have many applications in the fields of data compression and bioinformatics. For example, a common superstring can be seen as a compressed form of the keywords it is generated from. Since the shortest common superstring problem is NP-hard, we focus on the approximation algorithms that implement a so-called greedy heuristic. It turns out that the actual shortest common superstring is not always needed. Instead, it is often enough to find an approximate solution of sufficient quality. We provide an implementation of Ukkonen's linear time algorithm for the greedy heuristic. The practical performance of this implementation is measured by comparing it to another implementation of the same heuristic. We also hypothesize that shortest common superstrings can potentially be used to improve the compression ratio of the Relative Lempel-Ziv data compression algorithm. This hypothesis is examined and shown to be valid.
  • Mann, Eshita (2023)
    Software engineers frequently deal with state machines and protocols while building telecommunications systems. Finite state machines have grown to become an essential tool for designing and implementing networks due to their ability to model complicated behaviour in a structured and efficient manner. They offer a framework for defining systems as a collection of states and transitions, enabling programmers to create software that can respond to a variety of situations and events. This thesis explores the use of finite state machines in network software, exploring their various applications, advantages, and limitations with a focus on cellular technologies and mobile communications. The study covers a wide range of state machine methods, including control structures, Unified Modelling Language, Specification and Description Language, state patterns, state machine frameworks, and code generators. The objectives of the research include a comprehensive review of existing state machine techniques, analysis of their relative merits as well as shortcomings, modelling and implementation of selected methods to evaluate their effectiveness, and identification of required features to meet network requirements. The thesis compares the Boost Meta State Machine against the TeleNokia Specification and Description Language for a case study followed by a feature-based comparison of quality attributes to evaluate their performance in areas such as system design, development, and evolution and maintenance. The results show that the Boost framework is better suited as a state machine implementation technique for most network software application scenarios. Finally, the thesis identifies potential directions for further research and technical approaches to address the issues discussed, highlighting emerging trends and technologies that are likely to shape the future of this important area of network architecture.
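
    The idea of describing a system as a collection of states and transitions, mentioned in the abstract above, can be illustrated with a minimal table-driven sketch. This is not taken from the thesis: the states, events and function names below are hypothetical and stand in for a heavily simplified call setup, and the sketch uses plain Python rather than the Boost Meta State Machine or the Specification and Description Language.

```python
from enum import Enum, auto


class State(Enum):
    # Hypothetical states of a simplified call setup (illustration only).
    IDLE = auto()
    CONNECTING = auto()
    CONNECTED = auto()


class Event(Enum):
    # Hypothetical events that can trigger a transition.
    DIAL = auto()
    ANSWER = auto()
    HANG_UP = auto()


# Transition table: (current state, event) -> next state.
TRANSITIONS = {
    (State.IDLE, Event.DIAL): State.CONNECTING,
    (State.CONNECTING, Event.ANSWER): State.CONNECTED,
    (State.CONNECTING, Event.HANG_UP): State.IDLE,
    (State.CONNECTED, Event.HANG_UP): State.IDLE,
}


def step(state, event):
    """Return the next state; events with no table entry leave the state unchanged."""
    return TRANSITIONS.get((state, event), state)


def run(events, start=State.IDLE):
    """Feed a sequence of events through the machine and return the final state."""
    state = start
    for event in events:
        state = step(state, event)
    return state
```

    Keeping the transitions in a single table makes the set of legal state changes easy to inspect. Ignoring unexpected events is one possible design choice; a stricter variant could raise an error instead.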