Skip to main content
Login | Suomeksi | På svenska | In English

Browsing by master's degree program "Tietojenkäsittelytieteen maisteriohjelma"

Sort by: Order: Results:

  • Heikkinen, Juuso (2021)
    Telecommunication companies are moving towards even more digitalized and agile ways of working. They are expanding their business in other fields, such as television, thus moving further away from the traditional telecommunications model. Recently, Telia has become the largest television company in the Nordics. One of the their main products in the field of television is channel packages, which allow customers to access specific television content. In this study, a benefit analysis for Telia Finland Oyj was conducted to inspect the benefits that test automation brings for the channel package testing process. 8 interviews in total were conducted with Telia employees with knowledge on channel packages. To receive both a business and a technical perspective, the interviewees were divided into two groups fitting their expertise. In general, test automation was seen as a useful tool. The main business related benefits of test automation mentioned were a faster and cheaper testing process, and a faster time-to-market. It was also seen that test automation could help achieve a more efficient testing process, and increase confidence in test automation. Based on the interview results, an epic was defined and analyzed according to the principles of Scaled Agile Framework (SAFe). This included describing the solution in detail and defining a Minimum Viable Product (MVP). By using example variables and generalized values, several calculations were made to present a framework on the costs of implementing the MVP and the estimated reduction of channel package testing costs. By utilizing the MVP as a part of the channel package testing process, the return on investment (ROI) was not as desirable as expected. With more automated tests compared to the number of test cases, combined with regular use of test automation, the investment would pay itself back and start generating additional savings faster. Based on the epic analysis, a Lean Business Case was defined.
  • Pulkka, Robert (2022)
    In recent years, the concept of Metaverse has become a popular buzzword in the media and different communities. In 2021, the company behind Facebook rebranded itself into Meta Platforms, inc. in order to match their new vision of developing the Metaverse. The Metaverse is becoming reality as intersecting technologies, including head-mounted virtual reality displays (HMDs) and non-fungible tokens (NFTs), have been developed. Different communities, such as media, researchers, consumers and companies have different perspectives on the Metaverse and its opportunities and problems. Metaverse technology has been researched thoroughly, while little to none research has been done on gray literature, i.e. non-scientific sources, to gain insight on the ongoing hype. The conducted research analyzed 44 sources in total, ranging from news articles to videos and forum discussions. The results show that people are seeing opportunities in Metaverse entrepreneurship in the changing career landscape. However, the visions of Meta Platforms, inc. also receive a fair amount of critique in the analyzed articles and threads. The results suggest that most of the consumers are only interested in a smaller subset of features than what is being marketed. The conducted research gives insight on how different sources are seeing the Metaverse and can therefore be used as a starting point for more comprehensive gray literature studies on the Metaverse. While making innovations to the underlying technology is important, studying people’s viewpoints is a requirement for the academia to understand the phenomenon and for the industry to produce a compelling product.
  • Ratilainen, Katja-Mari (2023)
    Context: The Bank of Finland, as the national monetary and central bank of Finland, possesses an extensive repository of data that fulfills both the statistical needs of international organizations and the federal requirements. Data scientists within the bank are increasingly interested in investing in machine learning (ML) capabilities to develop predictive models. MLOps offers a set of practices that ensure the reliable and efficient maintenance and deployment of ML models. Objective: In this thesis, we focus on addressing how to implement an ML pipeline within an existing environment. The case study is explorative in nature, with the primary objective of gaining deeper insight into MLOps tools and their practical implementation within the organization. Method: We apply the design science research methodology to divide design and development into six tasks: problem identification, objective definition, design and development, demonstration, evaluation, and communication. Results: We select the tools for the MLOps based on the user requirements and the existing environment, and then we design and develop a simplified end-to-end ML pipeline utilizing the chosen tools. Lastly, we conduct an evaluation to measure the alignment between the selected tools and the initial user requirements.
  • Leskinen, Juno (2022)
    The continuously evolving cyber threat landscape has become a major concern because sophisticated attacks against systems connected to the Internet have become frequent. The concern is on particular threats that are known as Advanced Persistent Threats (APT). The thesis aims to introduce what APTs are and illustrate other topics under the scope, such as tools and methods attackers can use. Attack models will also be explained, providing example models proposed in the literature. The thesis also introduces which kind of operational objectives attacks can have, and for each objective, one example attack is given that characterizes the objective. In addition, the thesis also uncovers various countermeasures, including most essential security solutions, complemented with more advanced methods. The last countermeasure that the thesis introduces is attribution analysis.
  • Jabs, Christoph (2022)
    Many real-world problem settings give rise to NP-hard combinatorial optimization problems. This results in a need for non-trivial algorithmic approaches for finding optimal solutions to such problems. Many such approaches—ranging from probabilistic and meta-heuristic algorithms to declarative programming—have been presented for optimization problems with a single objective. Less work has been done on approaches for optimization problems with multiple objectives. We present BiOptSat, an exact declarative approach for finding so-called Pareto-optimal solutions to bi-objective optimization problems. A bi-objective optimization problem arises for example when learning interpretable classifiers and the size, as well as the classification error of the classifier should be taken into account as objectives. Using propositional logic as a declarative programming language, we seek to extend the progress and success in maximum satisfiability (MaxSAT) solving to two objectives. BiOptSat can be viewed as an instantiation of the lexicographic method and makes use of a single SAT solver that is preserved throughout the entire search procedure. It allows for solving three tasks for bi-objective optimization: finding a single Pareto-optimal solution, finding one representative solution for each Pareto point, and enumerating all Pareto-optimal solutions. We provide an open-source implementation of five variants of BiOptSat, building on different algorithms proposed for MaxSAT. Additionally, we empirically evaluate these five variants, comparing their runtime performance to that of three key competing algorithmic approaches. The empirical comparison in the contexts of learning interpretable decision rules and bi-objective set covering shows practical benefits of our approach. Furthermore, for the best-performing variant of BiOptSat, we study the effects of proposed refinements to determine their effectiveness.
  • Avikainen, Jari (2019)
    This thesis presents a wavelet-based method for detecting moments of fast change in the textual contents of historical newspapers. The method works by generating time series of the relative frequencies of different words in the newspaper contents over time, and calculating their wavelet transforms. Wavelet transform is essentially a group of transformations describing the changes happening in the original time series at different time scales, and can therefore be used to pinpoint moments of fast change in the data. The produced wavelet transforms are then used to detect fast changes in word frequencies by examining products of multiple scales of the transform. The properties of the wavelet transform and the related multi-scale product are evaluated in relation to detecting various kinds of steps and spikes in different noise environments. The suitability of the method for analysing historical newspaper archives is examined using an example corpus consisting of 487 issues of Uusi Suometar from 1869–1918 and 250 issues of Wiipuri from 1893–1918. Two problematic features in the newspaper data, noise caused by OCR (optical character recognition) errors and uneven temporal distribution of the data, are identified and their effects on the results of the presented method are evaluated using synthetic data. Finally, the method is tested using the example corpus, and the results are examined briefly. The method is found to be adversely affected especially by the uneven temporal distribution of the newspaper data. Without additional processing, or improving the quality of the examined data, a significant amount of the detected steps are due to the noise in the data. Various ways of alleviating the effect are proposed, among other suggested improvements on the system.
  • Lehtonen, Leevi (2021)
    Quantum computing has an enormous potential in machine learning, where problems can quickly scale to be intractable for classical computation. A Boltzmann machine is a well-known energy-based graphical model suitable for various machine learning tasks. Plenty of work has already been conducted for realizing Boltzmann machines in quantum computing, all of which have somewhat different characteristics. In this thesis, we conduct a survey of the state-of-the-art in quantum Boltzmann machines and their training approaches. Primarily, we examine variational quantum Boltzmann machine, a specific variant of quantum Boltzmann machine suitable for the near-term quantum hardware. Moreover, as variational quantum Boltzmann machine heavily relies on variational quantum imaginary time evolution, we effectively analyze variational quantum imaginary time evolution to a great extent. Compared to the previous work, we evaluate the execution of variational quantum imaginary time evolution with a more comprehensive collection of hyperparameters. Furthermore, we train variational quantum Boltzmann machines using a toy problem of bars and stripes, representing more multimodal probability distribution than the Bell states and the Greenberger-Horne-Zeilinger states considered in the earlier studies.
  • Kulbitski, Mikita (2023)
    Nowadays power consumption is a hot and actual domain. Efficient energy consumption allows you to use resources from the environment wisely and, moreover, switch to alternative energy sources where it is possible. This thesis is aimed at analyzing elevator’s power consumption data (average per every 5 minutes and 1 hour). The data has been gathered for several years, so it is a time series. This thesis includes review of time series models, which then can be used for the consequent analysis. Main directions are forecasting power consumption, capturing trends and anomalies. In addition, time series data may also be used for calculating average power consumption for each elevator inside the elevator group. As an outcome, spread of the power consumption across 4 elevators inside the elevator group may be seen. One of the thesis’ goals is to check whether it is even or not.
  • Meriläinen, Roosa (2020)
    In the world of constantly growing data masses the efficient extraction, saving and accessing that data for business intelligence and analytics has become increasingly important to businesses. Analytics and business intelligence software is offered by many providers in the market for all sizes of organizations and there are multiple ways to build an analytics system, or pipeline from scratch or integrated with tools available on the market. In this case study we explore and re-design the analytics pipeline solution of a medium sized software product company by utilizing the design science research methodology. We discuss the current technologies and tools on the market for business intelligence and analytics and consider how they fit into our case study context. As design science suggests, we design, implement and evaluate two prototypes of an analyt- ics pipeline with an Extract, Transform and Load (ETL) solution and data warehouse. The prototypes represent two different approaches to building an analytics pipeline - an in-house approach, and a partially outsourced approach. Our study brings out typical challenges similar businesses may face when designing and building their own business intelligence and analytics software. In our case we lean towards an analytics pipeline with an outsourced ETL process to be able to pass various different types of event data with a consistent data schema into our data warehouse with minimal maintenance work. However, we also show the value of near real time analytics with an in-house solution, and offer some ideas on how such a pipeline may be built.
  • Xue, Jiayue (2021)
    The semantic shifts in natural language is a well established phenomenon and have been studied for many years. Similarly, the meanings of scientific publications may also change as time goes by. In other words, the same publication may be cited in distinct contexts. To investigate whether the meanings of citations have changed in different scenarios, which is also called in the semantic shifts in citations, we followed the same ideas of how researchers studied semantic shifts in language. To be more specific, we combined the temporal referencing model and the Word2Vec model to explore the semantic shifts of scientific citations in two aspects: their usages over time and their usages across different domains. By observing how citations themselves changed over time and comparing the closest neighbors of citations, we concluded that the semantics of scientific publications did shift in terms of cosine distances.
  • Wargelin, Matias (2021)
    Musical pattern discovery refers to the automated discovery of important repeated patterns, such as melodies and themes, from music data. Several algorithms have been developed to solve this problem, but evaluating the algorithms has been difficult without proper visualisations of the output of the algorithms. To address this issue a web application named Mupadie was built. Mupadie accepts MIDI music files as input and visualises the outputs of musical pattern discovery algorithms, with implementations of SIATEC and TTWIA built in the application. Other algorithms can be visualised if the algorithm output is uploaded to Mupadie as a JSON file that follows a specified data structure. Using Mupadie, an evaluation of SIATEC and TTWIA was conducted. Mupadie was found to be a useful tool in the qualitative evaluation of these musical pattern discovery algorithms; it helped reveal systematically recurring issues with the discovered patterns, some previously known and some previously undocumented. The findings were then used to suggest improvements to the algorithms.
  • Yumo, Luo (2023)
    Machine Learning Operations (MLOps), derived from DevOps, aims to unify the development, deployment, and maintenance of machine learning (ML) models. Continuous training (CT) automatically retrains ML models, and continuous deployment (CD) automatically deploys the retrained models to production. Therefore, they are essential for maintaining ML model performance in dynamic production environments. The existing proprietary solutions suffer from drawbacks such as a lack of transparency and potential vendor lock-in. Additionally, current MLOps pipelines built using open-source tools still lack flexible CT and CD for ML models. This study proposes a cloud-agnostic and open-source MLOps pipeline that enables users to retrain and redeploy their ML models flexibly. We applied the Design Science methodology, consisting of identifying the problem, defining the solution objectives, and implementing, demonstrating, and evaluating the solution. The resulting solution is an MLOps pipeline called CTCD-e MLOps pipeline. We formed a conceptual model of the needed functionalities of our MLOps pipeline and implemented the pipeline using only open-source tools. The CTCD-e MLOps pipeline runs atop Kubernetes. It can autonomously adapt ML models to dynamic production data by automatically starting retraining ML models when their performance degrades. It can also automatically A/B test the performance of the retrained models in production and fully deploys them only when they outperform their predecessors. Our demonstration and evaluation of the CTCD-e MLOps pipeline show that it is cloud-agnostic and can also be installed in on-premises environments. Additionally, the CTCD-e MLOps pipeline enables its users to flexibly configure model retraining and redeployment as well as production A/B test of the retrained models based on various requirements.
  • Ollila, Risto (2021)
    The Web has become the world's most important application distribution platform, with web pages increasingly containing not static documents, but dynamic, script-driven content. Script-based rendering relies on imperative browser APIs which become unwieldy to use as an application's complexity grows. An increasingly common solution is to use libraries and frameworks which provide an abstraction over rendering and enable a less error-prone declarative programming model. The details of how web frontend frameworks implement rendering vary widely and can potentially have significant consequences for application performance. Frameworks' rendering strategies are typically invisible to the application developer, and may consequently be poorly understood despite their potential impact. In this thesis, we review rendering strategies used in a number of influential and popular web frontend frameworks. By studying their implementation details, we discover ways to categorize and estimate rendering strategies' performance based on input sizes in update loops. To verify and measure the effects of these differences, we implement a number of benchmarks that measure different aspects of rendering. In our benchmarks, we discover significant performance differences ranging up to an order of magnitude under some conditions. Additionally, we confirm that categorizing rendering strategies based on input sizes of update loops is an effective way to estimate their relative performance. The best performing rendering strategies are found to be ones which minimize input sizes in update loops using techniques such as compile-time optimization and reactive programming models.
  • Törnroos, Topi (2021)
    Application Performance Management (APM) is a growing field, and APM tools on the market tend to be complex enterprise solutions with features ranging from traffic analysis and error reporting to real- user monitoring and business transaction management. This thesis is a study done on behalf of Veikkaus Oy, a Finnish government-owned game company and betting agency. It serves as a look into the current state-of-the-art field of leading APM tools as well as a requirements analysis done from the perspective of the company’s IT personnel. A list of requirements was gathered and scored based on perceived importance, and four APM tools on the market—Datadog APM, Dynatrace, New Relic and AppDynamics—were each compared to each other and scored based on the gathered requirements. In addition, open-source alternatives were considered and investigated. Our results suggest that the leading APM vendors have products very similar to each other with marginal differences between them, feature-wise. In general, APMs were deemed useful and valuable to the company, able to assist in the work of a wide variety of IT personnel, as well as able to replace many tools currently in use by Veikkaus Oy and simplify their application ecosystem.
  • Stepanenko, Artem (2022)
    The rapid progress of communication technologies combined with the growing competition for talents and knowledge has made it necessary to reassess the potential of distributed development which has significantly changed the landscape of the IT industry introducing a variety of cooperation models and making notable changes to the software team work environment. Along with this, enterprises pay more attention to teams’ performance improvement, employing emerging management tools for building up efficient software teams, and trying to get the most out of understanding factors which significantly impact a team’s overall performance. The objective of the research is to systematize factors characterizing high-performing software teams; indicate the benefits of global software development (GSD) models positively influencing software teams’ development performance; and study how companies’ strategies can benefit from distributed development approaches in building high-performing software teams. The thesis is designed as a combination of a systematic literature review followed by qualitative research in the form of semi-structured interviews to validate the findings regarding classification of GSD models’ benefits and their influence on the development of high-performing software teams. At a literature review stage, the research (1) introduces a team performance factors’ model reflecting the aspects which impact the effectiveness of development teams; (2) suggests the classification of GSD models based on organizational, legal, and temporal characteristics, and (3) describes the benefits of GSD models which influence the performance of software development teams. Within the empirical part of the study, we refine the classification of GSD models’ benefits based on the qualitative analysis results of semi-structured interviews with practitioners from IT industry, form a comparison table of GSD benefits depending on the model in question, and introduce recommendations for company and team management regarding the application of GSD in building high-performing software teams. IT corporations, to achieve their strategic goals, can enrich their range of available tools for managing high-performing teams by considering the peculiarities of different GSD models. Company and team management should evaluate the advantages of the distributed operational models, and use the potential and benefits of available configurations to increase teams’ performance and build high-performing software teams.
  • Harviainen, Juha (2021)
    Computing the permanent of a matrix is a famous #P-hard problem with a wide range of applications. The fastest known exact algorithms for the problem require an exponential number of operations, and all known fully polynomial randomized approximation schemes are rather complicated to implement and have impractical time complexities. The most promising recent advancements on approximating the permanent are based on rejection sampling and upper bounds for the permanent. In this thesis, we improve the current state of the art by developing the deep rejection sampling method, which combines an exact algorithm with the rejection sampling method. The algorithm precomputes a dynamic programming table that tightens the initial upper bound used by the rejection sampling method. In a sense, the table is used to jump-start the sampling process. We give a high probability upper bound for the time complexity of the deep rejection sampling method for random (0, 1)-matrices in which each entry is 1 with probability p. For matrices with p < 1/5, our high probability bound is stronger than in previous work. In addition to that, we empirically observe that our algorithm outperforms earlier rejection sampling methods by testing it with different parameters against other algorithms on multiple classes of matrices. The improvements in sampling times are especially notable in cases in which the ratios of the permanental upper bounds and the exact value of the permanent are huge.
  • Kahilakoski, Marko (2022)
    Various Denial of Service (DoS) attacks are common phenomena in the Internet. They can consume resources of servers, congest networks, disrupt services, or even halt systems. There are many machine learning approaches that attempt to detect and prevent attacks on multiple levels of abstraction. This thesis examines and reports different aspects of creating and using a dataset for machine learning purposes to detect attacks in a web server environment. We describe the problem field, origins and reasons behind the attacks, typical characteristics, and various types of attacks. We detail ways to mitigate the attacks and provide a review of current benchmark datasets. For the dataset used in this thesis, network traffic was captured in a real-world setting, and flow records were labeled. Experiments performed include selecting important features, comparing two supervised learning algorithms, and observing how a classifier model trained on network traffic on a specific date performs in detecting new malicious records over time in the same environment. The model was also tested with a recent benchmark dataset.
  • Virtanen, Lasse (2023)
    The multi-armed bandit is a sequential decision making problem where an agent chooses actions and receives rewards. The agent faces an explore-exploit dilemma: it has to balance exploring its options to find the optimal actions, and exploiting choosing the empirically best actions. This problem can also be solved by multiple agents who collaborate in a federated learning setting, where agents do not share their raw data samples. Instead, small updates containing learned parameters are shared. In this setting, the learning process can happen with a central server that coordinates the agents to learn the global model, or in a fully decentralized fashion where agents communicate with each other to collaborate. The distribution of rewards may be heterogeneous, meaning that the agents face distributions with local biases. Depending on the context, this can be handled by cancelling the biases by averaging, or by personalizing the global model to fit each individual agent’s local biases. Another common characteristic of federated multi-armed bandits is preserving privacy. Even though only parameter updates are shared, they can be used to infer the original data. To privatize the data, a method known as differential privacy is applied by adding enough random noise to mask the effect of a single contribution. The newest area of interest for federated multi-armed bandits is security. Collaboration between multiple agents means more opportunities for attacks. Achieving robust security means defending against Byzantine attacks that inject arbitrary data into the learning process to affect the model accuracy in an undesirable way. This thesis is a literature review that explores how the federated multi-armed bandit problem is solved, and how privacy and security for it is achieved.
  • Nygren, Henrik (2024)
    The MOOC Center of University of Helsinki maintains a learning management system, primarily used in the online courses offered by the Department of Computer Science. The learning management system is being used in more courses, leading to a need for additional exercise types. In order to satisfy this need, we plan to use additional teams of developers to create these exercise types. However, we would like to minimize any negative effects that the new exercise types may have on the overall system, specifically regarding stability and security. In this work, we propose a plugin system for creating new exercise types, and implement it to production system used by real students. The system's plugins are deployed as separate services and use sandboxed IFrames for their user interfaces. Communication with the plugins occurs through the use of HTTP requests and message passing. The designed plugin system fulfilled its aims and worked in its production deployment. Notably, it was concluded that it is challenging for plugins to disrupt the host system. This plugin system serves as an example that it is possible to create a plugin system where the plugins are isolated from the host system.
  • Liu, Yang (2020)
    Automatic readability assessment is considered as a challenging task in NLP due to its high degree of subjectivity. The majority prior work in assessing readability has focused on identifying the level of education necessary for comprehension without the consideration of text quality, i.e., how naturally the text flows from the perspective of a native speaker. Therefore, in this thesis, we aim to use language models, trained on well-written prose, to measure not only text readability in terms of comprehension but text quality. In this thesis, we developed two word-level metrics based on the concordance of article text with predictions made using language models to assess text readability and quality. We evaluate both metrics on a set of corpora used for readability assessment or automated essay scoring (AES) by measuring the correlation between scores assigned by our metrics and human raters. According to the experimental results, our metrics are strongly correlated with text quality, which achieve 0.4-0.6 correlations on 7 out of 9 datasets. We demonstrate that GPT-2 surpasses other language models, including the bigram model, LSTM, and bidirectional LSTM, on the task of estimating text quality in a zero-shot setting, and GPT-2 perplexity-based measure is a reasonable indicator for text quality evaluation.