Browsing by master's degree program "Tietojenkäsittelytieteen maisteriohjelma" (Master's Programme in Computer Science)
Now showing items 1-20 of 251
-
(2021) Telecommunication companies are moving towards increasingly digitalized and agile ways of working. They are expanding their business into other fields, such as television, thus moving further away from the traditional telecommunications model. Recently, Telia has become the largest television company in the Nordics. One of their main products in the field of television is channel packages, which allow customers to access specific television content. In this study, a benefit analysis for Telia Finland Oyj was conducted to inspect the benefits that test automation brings to the channel package testing process. In total, eight interviews were conducted with Telia employees with knowledge of channel packages. To obtain both a business and a technical perspective, the interviewees were divided into two groups according to their expertise. In general, test automation was seen as a useful tool. The main business-related benefits of test automation mentioned were a faster and cheaper testing process and a faster time-to-market. Test automation was also seen as a way to achieve a more efficient testing process and to increase confidence in test automation. Based on the interview results, an epic was defined and analyzed according to the principles of the Scaled Agile Framework (SAFe). This included describing the solution in detail and defining a Minimum Viable Product (MVP). Using example variables and generalized values, several calculations were made to present a framework for the costs of implementing the MVP and the estimated reduction in channel package testing costs. When the MVP was used as part of the channel package testing process, the return on investment (ROI) was not as desirable as expected. With a larger share of the test cases automated, combined with regular use of test automation, the investment would pay itself back and start generating additional savings faster. Based on the epic analysis, a Lean Business Case was defined.
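The abstract mentions cost calculations built from example variables and generalized values; the sketch below shows, with purely illustrative placeholder figures (not the thesis's numbers), how an MVP implementation cost, per-cycle savings and testing frequency combine into a payback period and ROI.

```python
# Illustrative payback/ROI sketch with placeholder values; the thesis's actual
# variables and figures are not reproduced here.

mvp_cost = 20_000.0               # one-time cost of implementing the MVP (hypothetical)
manual_cost_per_cycle = 800.0     # manual channel package testing cost per cycle (hypothetical)
automated_cost_per_cycle = 200.0  # cost per cycle once tests are automated (hypothetical)
cycles_per_year = 52              # how often the test suite is run (hypothetical)

savings_per_cycle = manual_cost_per_cycle - automated_cost_per_cycle
annual_savings = savings_per_cycle * cycles_per_year
payback_years = mvp_cost / annual_savings
roi_after_two_years = (2 * annual_savings - mvp_cost) / mvp_cost

print(f"Savings per test cycle: {savings_per_cycle:.0f}")
print(f"Annual savings: {annual_savings:.0f}")
print(f"Payback period: {payback_years:.2f} years")
print(f"ROI after two years: {roi_after_two_years:.0%}")
```

Increasing the share of automated tests (larger savings per cycle) or running the suite more often shortens the payback period, which is the relationship the abstract describes.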
-
(2024) Message-oriented middleware (MOM) serves as the intermediary component between the nodes of a distributed system, facilitating their communication and data exchange. By decoupling the interconnected nodes of a system, MOM technologies enable scalable and fault-tolerant messaging, supporting real-time data streams, event-driven architectures and microservices communication. Given the increasing reliance on distributed computing and data-intensive applications, understanding the performance and operational characteristics of MOM technologies is paramount. This master's thesis investigates the comparative performance and operational aspects of two prominent MOM solutions, Apache Kafka and Apache Pulsar, through a systematic literature review (SLR). The key characteristics under inspection are throughput, latency, resource utilization, fault tolerance, security and operational complexity. This study offers a comprehensive analysis to aid informed decision-making in real-world deployment scenarios and augments the existing body of literature. The results of this SLR show that consensus on throughput and latency superiority between Kafka and Pulsar remains elusive. Pulsar demonstrates advantages in resource utilization and security, whereas Kafka stands out for its maturity and operational simplicity.
-
(2024) Monolithic and microservice architectures represent two different approaches to building and organizing software systems. Monolithic architecture offers various advantages, such as simplicity of application deployment, smaller resource requirements, and lower latency. On the other hand, microservice architecture provides benefits in aspects including scalability, reliability, and availability. However, the relative advantages of each architecture can vary in different contexts, especially when it comes to application performance and resource consumption. This thesis aims to provide insights into the differences in application performance and resource consumption between the two architectures by conducting a systematic literature review of the existing literature and research results on this topic, and by benchmarking, with various load tests, two applications of identical functionality built with the two architectures. Results from the load tests revealed that the applications built with both software architectures delivered satisfactory outcomes. However, the test outputs indicated that the microservice system outperformed the monolith by a wide margin in nearly all test cases in aspects including throughput, efficiency, stability, scalability, and resource effectiveness. Based on the research outcomes from the reviewed literature, monolithic design is in general more efficient and cost-effective for simple applications with small user loads, while microservice architecture is more advantageous for large and complex applications targeting high traffic and deployment in cloud environments. Nevertheless, the overall research results indicated that both architectures have strengths and drawbacks in different aspects, and both are used in many successful applications. The differences between the two architectures in application performance and resource effectiveness depend on various factors, including application scale and complexity, traffic load, resource availability, and deployment environments.
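As a rough illustration of the kind of load test described above, the sketch below fires concurrent HTTP requests at a service and reports throughput and latency percentiles. The URL, request count and concurrency level are placeholders; the thesis's actual tooling and workloads are not specified here.

```python
# Minimal load-test sketch (placeholder URL and parameters; not the thesis's setup).
import statistics
import time
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen

URL = "http://localhost:8080/health"  # hypothetical endpoint of the system under test
REQUESTS = 200
CONCURRENCY = 20

def timed_request(_):
    start = time.perf_counter()
    with urlopen(URL) as resp:
        resp.read()
    return time.perf_counter() - start

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=CONCURRENCY) as pool:
    latencies = sorted(pool.map(timed_request, range(REQUESTS)))
elapsed = time.perf_counter() - start

print(f"Throughput: {REQUESTS / elapsed:.1f} req/s")
print(f"Median latency: {statistics.median(latencies) * 1000:.1f} ms")
print(f"95th percentile latency: {latencies[int(0.95 * len(latencies))] * 1000:.1f} ms")
```

Running the same script against a monolithic deployment and a functionally identical microservice deployment yields directly comparable throughput and latency figures, which is the comparison the benchmark part of the thesis performs with heavier tooling.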
-
(2022) In recent years, the concept of the Metaverse has become a popular buzzword in the media and in different communities. In 2021, the company behind Facebook rebranded itself as Meta Platforms, Inc. to match its new vision of developing the Metaverse. The Metaverse is becoming reality as intersecting technologies, including head-mounted virtual reality displays (HMDs) and non-fungible tokens (NFTs), have been developed. Different communities, such as media, researchers, consumers and companies, have different perspectives on the Metaverse and its opportunities and problems. Metaverse technology has been researched thoroughly, while little to no research has been done on gray literature, i.e. non-scientific sources, to gain insight into the ongoing hype. The conducted research analyzed 44 sources in total, ranging from news articles to videos and forum discussions. The results show that people see opportunities in Metaverse entrepreneurship in the changing career landscape. However, the visions of Meta Platforms, Inc. also receive a fair amount of critique in the analyzed articles and threads. The results suggest that most consumers are interested in only a smaller subset of features than what is being marketed. The conducted research gives insight into how different sources see the Metaverse and can therefore be used as a starting point for more comprehensive gray literature studies on the Metaverse. While making innovations to the underlying technology is important, studying people’s viewpoints is a requirement for academia to understand the phenomenon and for the industry to produce a compelling product.
-
(2024) Indexes are data structures that are used for retrieving records from a database. They are used in database management systems (DBMS) to optimize queries. The abundance of available data has motivated research on indexes that provide faster query times and smaller memory usage. With advances in machine learning, new variations of indexes have been created. These indexes utilize the data distribution to achieve faster query times and smaller footprints in comparison to traditional indexes like the B-tree and B+ tree, and they are known as learned indexes. In this thesis, we study the effect that a distribution change between normal distributions has on one-dimensional learned indexes and compare it to the B+ tree. We conduct an experiment in which we simulate the distribution change and measure the insertion and query times. In this experiment, we include three learned indexes: ALEX, PGM, and LIPP.
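To illustrate the core idea behind learned indexes (without reproducing ALEX, PGM or LIPP themselves), the sketch below fits a simple linear model over sorted keys to predict a record's position and then corrects the prediction with a bounded local search. The key distribution, window size and fallback strategy are assumptions made for illustration only.

```python
# Toy learned-index sketch: a single linear model over sorted keys, corrected
# by a local search around the predicted position. Not an implementation of
# ALEX, PGM or LIPP; just the basic "learn the data distribution" idea.
import bisect
import numpy as np

rng = np.random.default_rng(0)
keys = np.sort(rng.normal(loc=0.0, scale=1.0, size=10_000))  # sorted keys from a normal distribution
positions = np.arange(len(keys))

# "Learn" the cumulative distribution with a least-squares line: pos ~ a * key + b.
a, b = np.polyfit(keys, positions, deg=1)

def lookup(key: float, window: int = 64) -> int:
    """Return the position of `key`, or -1 if it is absent."""
    pred = int(a * key + b)
    lo = max(0, pred - window)
    hi = min(len(keys), pred + window)
    i = lo + bisect.bisect_left(keys[lo:hi].tolist(), key)
    if i < len(keys) and keys[i] == key:
        return i
    # Fall back to a full binary search if the model's error exceeded the window.
    i = bisect.bisect_left(keys.tolist(), key)
    return i if i < len(keys) and keys[i] == key else -1

sample = keys[1234]
print(lookup(sample), 1234)  # both print 1234
```

If the underlying key distribution changes (for example, the mean or variance of the normal distribution shifts), the learned model's predictions drift and corrections become more expensive, which is the effect the thesis measures for the studied indexes.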
-
(2023) Context: The Bank of Finland, as the national monetary and central bank of Finland, possesses an extensive repository of data that fulfills both the statistical needs of international organizations and the federal requirements. Data scientists within the bank are increasingly interested in investing in machine learning (ML) capabilities to develop predictive models. MLOps offers a set of practices that ensure the reliable and efficient maintenance and deployment of ML models. Objective: In this thesis, we focus on addressing how to implement an ML pipeline within an existing environment. The case study is explorative in nature, with the primary objective of gaining deeper insight into MLOps tools and their practical implementation within the organization. Method: We apply the design science research methodology to divide design and development into six tasks: problem identification, objective definition, design and development, demonstration, evaluation, and communication. Results: We select the MLOps tools based on the user requirements and the existing environment, and then we design and develop a simplified end-to-end ML pipeline utilizing the chosen tools. Lastly, we conduct an evaluation to measure the alignment between the selected tools and the initial user requirements.
-
(2024) Agile software development and DevOps are both well-studied methodologies in the field of computer science. Agile software development is an iterative development approach that focuses on collaboration, customer feedback and fast deliveries. DevOps, on the other hand, highlights the cooperation between developers and IT operations personnel, in addition to describing how to continuously deploy working software with the use of tools and automation. Even though these two methodologies share similarities, and DevOps as a concept can even be seen as a descendant of agile software development, the relationship between the two is not yet as well explored as the effects of the individual practices. In this thesis, a systematic literature review is conducted to examine the relationship between agile software development and DevOps and how they perform in combination. The aim was to find the benefits and drawbacks of implementing agile software development and DevOps together, the key similarities and differences between the two, and how the adoption of one methodology influences the implementation of the other. The results showed that agile software development and DevOps share a complex yet symbiotic relationship. The methodologies complement and enhance each other, and in unison they address a wider variety of aspects of the software development lifecycle. This combination shows a wide array of promising benefits, such as improvements in productivity, delivery speed and collaboration. It does, however, present challenges, for example the required culture shift and a lack of knowledge, that organizations need to be wary of and acknowledge.
-
(2022) The continuously evolving cyber threat landscape has become a major concern, as sophisticated attacks against systems connected to the Internet have become frequent. Of particular concern are threats known as Advanced Persistent Threats (APTs). The thesis aims to introduce what APTs are and to illustrate related topics, such as the tools and methods attackers can use. Attack models are also explained, with example models proposed in the literature. The thesis also introduces the kinds of operational objectives attacks can have, and for each objective one example attack is given that characterizes the objective. In addition, the thesis covers various countermeasures, including the most essential security solutions, complemented with more advanced methods. The last countermeasure that the thesis introduces is attribution analysis.
-
(2024) AI is becoming more and more common in everyday life, and thus setting guidelines to help create ethical AI is critical. To be able to set guidelines, it is necessary to understand what is regarded as ethical AI. To tackle this issue, this study attempts to answer the following questions: which ethical values are regarded as the most important ones for artificial intelligence, are there differences between personal ethical values and ethical values for artificial intelligence, and does culture influence personal ethical values or the ethical values chosen for artificial intelligence? The study uses data from the open online course Ethics of AI, where students study different ethical aspects of AI. Two exercises from this course were chosen to be studied. In the first exercise, students had to pick the five most important ethical values out of 21. In the second exercise, students had to rate 18 ethical values according to how important they are for AI. As the course is arranged in Finnish and English, it was possible to compare results between the languages and to create a third dataset from the English data gathered after the Finnish version of the course was launched in late 2021. The English dataset contained 2650 students, the Finnish dataset 488 students, and the English dataset from 2022 onwards 1159 students. The data were studied both with the language datasets combined and individually. First, the combined data were analyzed to learn which were the most popular personal ethical values and the most popular ethical values for AI. The same analyses were then performed on the individual datasets to see whether their results differed. An exploratory factor analysis (EFA) was performed to find factors among the ethical values for artificial intelligence, followed by a K-Means cluster analysis to classify the different variations of ethical values students gave for AI. The results indicate that, when considering what is important for AI, personal ethical values shift towards safety-, fairness- and society-related ethical values, reflecting that the students found these values the most important for AI. While some differences were found in the ethical values students prioritized between the course iterations, similar ethical values were the driving force in each dataset.
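As a hedged illustration of the analysis pipeline described above (exploratory factor analysis followed by K-Means clustering), the sketch below applies scikit-learn's FactorAnalysis and KMeans to a synthetic ratings matrix. The synthetic data, the number of factors and the number of clusters are assumptions; the thesis's actual course data and settings are not reproduced here.

```python
# Sketch of an EFA + K-Means pipeline on synthetic ratings data (not the course data).
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(42)
n_students, n_values = 500, 18  # 18 ethical values rated by each student (synthetic)
ratings = rng.integers(1, 6, size=(n_students, n_values)).astype(float)  # e.g. a 1-5 importance scale

# Exploratory factor analysis: reduce the 18 ratings to a few latent factors.
efa = FactorAnalysis(n_components=4, random_state=0)
factor_scores = efa.fit_transform(ratings)
print("Factor loadings shape:", efa.components_.shape)  # (4 factors, 18 values)

# K-Means on the factor scores to find groups of students with similar value profiles.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(factor_scores)
print("Cluster sizes:", np.bincount(labels))
```

Whether the clustering is run on factor scores or on the raw ratings, and how many factors and clusters are used, are analysis choices; the sketch only shows the general shape of the workflow.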
-
(2022) Many real-world problem settings give rise to NP-hard combinatorial optimization problems. This results in a need for non-trivial algorithmic approaches for finding optimal solutions to such problems. Many such approaches—ranging from probabilistic and meta-heuristic algorithms to declarative programming—have been presented for optimization problems with a single objective. Less work has been done on approaches for optimization problems with multiple objectives. We present BiOptSat, an exact declarative approach for finding so-called Pareto-optimal solutions to bi-objective optimization problems. A bi-objective optimization problem arises, for example, when learning interpretable classifiers where both the size and the classification error of the classifier should be taken into account as objectives. Using propositional logic as a declarative programming language, we seek to extend the progress and success of maximum satisfiability (MaxSAT) solving to two objectives. BiOptSat can be viewed as an instantiation of the lexicographic method and makes use of a single SAT solver that is preserved throughout the entire search procedure. It allows for solving three tasks of bi-objective optimization: finding a single Pareto-optimal solution, finding one representative solution for each Pareto point, and enumerating all Pareto-optimal solutions. We provide an open-source implementation of five variants of BiOptSat, building on different algorithms proposed for MaxSAT. Additionally, we empirically evaluate these five variants, comparing their runtime performance to that of three key competing algorithmic approaches. The empirical comparison in the contexts of learning interpretable decision rules and bi-objective set covering shows the practical benefits of our approach. Furthermore, for the best-performing variant of BiOptSat, we study the effects of proposed refinements to determine their effectiveness.
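To make the notion of a Pareto-optimal solution concrete (without reproducing BiOptSat's SAT-based search), the sketch below enumerates the Pareto front of a tiny, invented bi-objective set covering instance by brute force; a declarative approach like BiOptSat finds the same kind of points, but via MaxSAT-style search rather than exhaustive enumeration.

```python
# Brute-force Pareto front of a tiny bi-objective set covering instance.
# Objective 1: number of sets used; objective 2: total cost of the chosen sets.
# Purely an illustration of Pareto optimality; BiOptSat itself works
# declaratively with a SAT solver rather than by enumeration.
from itertools import combinations

universe = {1, 2, 3, 4, 5}
sets = {  # name -> (elements covered, cost); a tiny invented instance
    "A": ({1, 2, 3}, 6),
    "B": ({4, 5}, 6),
    "C": ({1, 2}, 2),
    "D": ({3, 4}, 2),
    "E": ({5}, 2),
}

solutions = []  # (number of sets used, total cost) of every feasible cover
for r in range(1, len(sets) + 1):
    for chosen in combinations(sets, r):
        covered = set().union(*(sets[s][0] for s in chosen))
        if covered == universe:
            solutions.append((r, sum(sets[s][1] for s in chosen)))

def dominated(n, c):
    # (n, c) is dominated if some cover is at least as good in both objectives and better in one.
    return any(m <= n and d <= c and (m < n or d < c) for m, d in solutions)

pareto = sorted({(n, c) for n, c in solutions if not dominated(n, c)})
print("Pareto points (number of sets, total cost):", pareto)  # [(2, 12), (3, 6)]
```

The two printed points show the trade-off that defines the Pareto front: using fewer sets costs more, and the cheapest cover needs more sets; neither point can be improved in one objective without worsening the other.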
-
(2019) This thesis presents a wavelet-based method for detecting moments of fast change in the textual contents of historical newspapers. The method works by generating time series of the relative frequencies of different words in the newspaper contents over time and calculating their wavelet transforms. The wavelet transform is essentially a group of transformations describing the changes happening in the original time series at different time scales, and it can therefore be used to pinpoint moments of fast change in the data. The produced wavelet transforms are then used to detect fast changes in word frequencies by examining products of multiple scales of the transform. The properties of the wavelet transform and the related multi-scale product are evaluated with respect to detecting various kinds of steps and spikes in different noise environments. The suitability of the method for analysing historical newspaper archives is examined using an example corpus consisting of 487 issues of Uusi Suometar from 1869–1918 and 250 issues of Wiipuri from 1893–1918. Two problematic features of the newspaper data, noise caused by OCR (optical character recognition) errors and the uneven temporal distribution of the data, are identified, and their effects on the results of the presented method are evaluated using synthetic data. Finally, the method is tested on the example corpus and the results are examined briefly. The method is found to be adversely affected especially by the uneven temporal distribution of the newspaper data. Without additional processing, or improving the quality of the examined data, a significant proportion of the detected steps is due to noise in the data. Various ways of alleviating this effect are proposed, among other suggested improvements to the system.
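To sketch the multi-scale product idea described above, the example below builds a word-frequency-like series containing an abrupt step, takes a stationary wavelet transform with PyWavelets, and multiplies the detail coefficients of several scales so that changes visible at all scales stand out. The wavelet, the number of levels and the synthetic data are placeholders rather than the thesis's actual choices.

```python
# Sketch of step detection via a multi-scale product of wavelet detail
# coefficients (PyWavelets). Wavelet, levels and data are illustrative only.
import numpy as np
import pywt

rng = np.random.default_rng(0)
n = 256
freq = np.full(n, 0.02) + rng.normal(0, 0.004, n)  # relative word frequency over time
freq[160:] += 0.03                                 # abrupt increase in usage at t = 160

# The stationary (undecimated) wavelet transform keeps coefficients aligned with time.
levels = 3
coeffs = pywt.swt(freq, "haar", level=levels)      # list of (approximation, detail) pairs

# Multiply detail coefficients across scales: noise rarely persists over all
# scales, while genuine steps do, so the product sharpens real change points.
product = np.ones(n)
for _, detail in coeffs:
    product *= detail

print("Strongest change detected near index:", int(np.argmax(np.abs(product))))
```

In the thesis's setting the series would be a word's relative frequency per time bin in the newspaper corpus, and the step reflects a sudden change in how often the word is used.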
-
(2021) Quantum computing has enormous potential in machine learning, where problems can quickly scale to be intractable for classical computation. A Boltzmann machine is a well-known energy-based graphical model suitable for various machine learning tasks. Plenty of work has already been conducted on realizing Boltzmann machines in quantum computing, and the resulting approaches have somewhat different characteristics. In this thesis, we conduct a survey of the state of the art in quantum Boltzmann machines and their training approaches. Primarily, we examine the variational quantum Boltzmann machine, a specific variant of the quantum Boltzmann machine suitable for near-term quantum hardware. Moreover, as the variational quantum Boltzmann machine relies heavily on variational quantum imaginary time evolution, we also analyze variational quantum imaginary time evolution in depth. Compared to previous work, we evaluate the execution of variational quantum imaginary time evolution with a more comprehensive collection of hyperparameters. Furthermore, we train variational quantum Boltzmann machines using a toy problem of bars and stripes, which represents a more multimodal probability distribution than the Bell states and the Greenberger-Horne-Zeilinger states considered in earlier studies.
-
(2023) Power consumption is currently a topical and important domain. Efficient energy consumption allows environmental resources to be used wisely and, where possible, enables switching to alternative energy sources. This thesis is aimed at analyzing elevator power consumption data (averaged over 5-minute and 1-hour intervals). The data have been gathered over several years, so they form a time series. The thesis includes a review of time series models, which are then used in the subsequent analysis. The main directions are forecasting power consumption and capturing trends and anomalies. In addition, the time series data can be used to calculate the average power consumption of each elevator in the elevator group. As an outcome, the spread of power consumption across the four elevators in the group can be seen, and one of the thesis's goals is to check whether this spread is even or not.
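A minimal pandas sketch of the kind of aggregation mentioned above: resampling 5-minute power readings to hourly averages and comparing the mean consumption of the four elevators in a group. The column names and the synthetic readings are assumptions, not the thesis's data.

```python
# Sketch: hourly averages and per-elevator means from 5-minute power readings.
# Column names and the synthetic data are illustrative assumptions.
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
index = pd.date_range("2023-01-01", periods=7 * 24 * 12, freq="5min")  # one week of 5-minute samples
df = pd.DataFrame(
    {f"elevator_{i}": rng.gamma(shape=2.0, scale=50.0, size=len(index)) for i in range(1, 5)},
    index=index,
)  # average power per 5-minute interval, e.g. in watts

hourly = df.resample("1h").mean()  # 5-minute averages -> hourly averages
per_elevator = df.mean()           # long-run average per elevator in the group

print(hourly.head())
print("Average power per elevator:")
print(per_elevator)
print("Spread across the group (max - min):", per_elevator.max() - per_elevator.min())
```

The final spread figure is the quantity the abstract asks about: if it is small relative to the averages, consumption is distributed evenly across the four elevators.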
-
(2020) In a world of constantly growing data masses, efficiently extracting, saving and accessing that data for business intelligence and analytics has become increasingly important to businesses. Analytics and business intelligence software is offered by many providers on the market for organizations of all sizes, and there are multiple ways to build an analytics system, or pipeline, from scratch or integrated with tools available on the market. In this case study, we explore and re-design the analytics pipeline solution of a medium-sized software product company by utilizing the design science research methodology. We discuss the current technologies and tools on the market for business intelligence and analytics and consider how they fit into our case study context. As design science suggests, we design, implement and evaluate two prototypes of an analytics pipeline with an Extract, Transform and Load (ETL) solution and a data warehouse. The prototypes represent two different approaches to building an analytics pipeline: an in-house approach and a partially outsourced approach. Our study brings out typical challenges similar businesses may face when designing and building their own business intelligence and analytics software. In our case, we lean towards an analytics pipeline with an outsourced ETL process in order to be able to pass various types of event data with a consistent data schema into our data warehouse with minimal maintenance work. However, we also show the value of near real-time analytics with an in-house solution and offer some ideas on how such a pipeline may be built.
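As a hedged illustration of the ETL step discussed above, the sketch below normalizes heterogeneous event records into one consistent schema and loads them into a warehouse table, with sqlite3 standing in for the data warehouse. The event fields and the target schema are invented for illustration.

```python
# Minimal ETL sketch: normalize differently shaped event records into a
# consistent schema and load them into a warehouse table (sqlite3 stands in
# for the warehouse). Field names and schema are illustrative assumptions.
import json
import sqlite3

raw_events = [  # hypothetical events arriving with inconsistent field names
    '{"type": "signup", "user": "u1", "ts": "2020-05-01T10:00:00"}',
    '{"type": "purchase", "customer_id": "u2", "timestamp": "2020-05-01T11:30:00", "amount": 19.9}',
]

def transform(raw: str) -> tuple:
    """Map differing field names onto one (event_type, user_id, event_time, amount) row."""
    e = json.loads(raw)
    return (
        e["type"],
        e.get("user") or e.get("customer_id"),
        e.get("ts") or e.get("timestamp"),
        e.get("amount"),
    )

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (event_type TEXT, user_id TEXT, event_time TEXT, amount REAL)")
conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?)", [transform(r) for r in raw_events])
print(conn.execute("SELECT * FROM events").fetchall())
```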
-
(2021) Semantic shift in natural language is a well-established phenomenon and has been studied for many years. Similarly, the meanings of scientific publications may also change as time goes by; in other words, the same publication may be cited in distinct contexts. To investigate whether the meanings of citations have changed in different scenarios, which is also called semantic shift in citations, we followed the same ideas that researchers have used to study semantic shifts in language. To be more specific, we combined the temporal referencing model and the Word2Vec model to explore the semantic shifts of scientific citations in two respects: their usage over time and their usage across different domains. By observing how the citations themselves changed over time and comparing the closest neighbors of citations, we concluded that the semantics of scientific publications did shift in terms of cosine distances.
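A sketch of the temporal-referencing idea combined with Word2Vec using gensim: citation markers are suffixed with a time bin so that each period gets its own vector while ordinary context words stay shared, and the period-specific vectors can then be compared with cosine similarity. The tiny corpus, the CIT:<paper>_<period> token scheme and the hyperparameters are illustrative assumptions, not the thesis's actual setup.

```python
# Temporal referencing + Word2Vec sketch (gensim). Corpus, token scheme and
# hyperparameters are assumptions made for illustration.
from gensim.models import Word2Vec

# Citation contexts, with the cited paper tagged by the decade of the citing paper.
sentences = [
    ["neural", "networks", "for", "speech", "CIT:paperX_1990s"],
    ["statistical", "models", "of", "speech", "CIT:paperX_1990s"],
    ["deep", "learning", "for", "image", "recognition", "CIT:paperX_2010s"],
    ["convolutional", "networks", "for", "image", "classification", "CIT:paperX_2010s"],
] * 50  # repeat the toy contexts so the model has something to fit

model = Word2Vec(sentences, vector_size=50, window=5, min_count=1, sg=1, epochs=20, seed=7)

# Compare the same citation across periods, and inspect its nearest neighbours.
print("Similarity 1990s vs 2010s:",
      model.wv.similarity("CIT:paperX_1990s", "CIT:paperX_2010s"))
print("Neighbours in the 1990s:", model.wv.most_similar("CIT:paperX_1990s", topn=3))
print("Neighbours in the 2010s:", model.wv.most_similar("CIT:paperX_2010s", topn=3))
```

A low cross-period similarity, or clearly different nearest neighbours per period, would indicate that the citation's usage context has shifted, which is the kind of evidence the thesis examines via cosine distances.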
-
(2021) Musical pattern discovery refers to the automated discovery of important repeated patterns, such as melodies and themes, from music data. Several algorithms have been developed to solve this problem, but evaluating the algorithms has been difficult without proper visualisations of their output. To address this issue, a web application named Mupadie was built. Mupadie accepts MIDI music files as input and visualises the outputs of musical pattern discovery algorithms, with implementations of SIATEC and TTWIA built into the application. Other algorithms can be visualised if the algorithm output is uploaded to Mupadie as a JSON file that follows a specified data structure. Using Mupadie, an evaluation of SIATEC and TTWIA was conducted. Mupadie was found to be a useful tool in the qualitative evaluation of these musical pattern discovery algorithms; it helped reveal systematically recurring issues with the discovered patterns, some previously known and some previously undocumented. The findings were then used to suggest improvements to the algorithms.
-
(2024) Audio signals offer invaluable insights into system operational conditions and potential malfunctions. Proactive fault detection in machinery and other infrastructures through audio monitoring provides significant advantages in numerous sectors, such as industrial maintenance, healthcare, and urban security. Localizing anomalies within the spectral content of audio data opens possibilities to not only diagnose but also effectively address the underlying issues. This thesis addresses the challenge of comprehensively capturing the full context of anomalies detected within audio data. To achieve this, we have developed a novel unsupervised method that adapts visual anomaly localization techniques specifically for the analysis of audio data. This approach utilizes visual representations of audio signals, particularly spectrograms, to apply the Student-Teacher Feature Pyramid Matching Method (STFPM) within an unsupervised learning framework. By harnessing the inherent visual patterns in audio data, our method enables precise localization of anomalies. By augmenting the MIMII dataset with synthetic anomalies and conducting extensive testing, we validated our approach’s ability to localize anomalies in audio data. The findings confirm that our model not only detects but also precisely pinpoints the location of these artificially introduced anomalies within audio spectrograms in terms of both time and frequency. This demonstrates the precision and reliability of our approach, highlighting its potential as a promising solution for accurately localizing anomalies in various audio applications.
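A hedged sketch of the first step in the pipeline described above: turning an audio signal into a log-mel spectrogram "image" that a visual anomaly-localization model such as STFPM could consume. The tone-plus-burst signal and the librosa parameters are placeholders; the thesis's exact preprocessing of the MIMII data is not reproduced here.

```python
# Sketch: audio -> log-mel spectrogram "image" for a visual anomaly model.
# The synthetic signal and the spectrogram parameters are assumptions.
import numpy as np
import librosa

sr = 16_000
t = np.linspace(0, 2.0, 2 * sr, endpoint=False)
machine_hum = 0.5 * np.sin(2 * np.pi * 120 * t)  # steady "normal" machine tone
anomaly = np.zeros_like(t)
anomaly[sr : sr + sr // 10] = 0.5 * np.sin(2 * np.pi * 3_000 * t[: sr // 10])  # brief high-pitched burst
signal = machine_hum + anomaly

mel = librosa.feature.melspectrogram(y=signal, sr=sr, n_fft=1024, hop_length=256, n_mels=64)
log_mel = librosa.power_to_db(mel, ref=np.max)  # 2-D array: mel bands x time frames

print("Spectrogram shape (mel bands, frames):", log_mel.shape)
# Each time-frequency cell now behaves like a pixel; an image-based model such
# as STFPM can be trained on normal spectrograms and asked to flag the regions
# (in time and frequency) where a new spectrogram deviates from what it learned.
```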
-
(2023) Machine Learning Operations (MLOps), derived from DevOps, aims to unify the development, deployment, and maintenance of machine learning (ML) models. Continuous training (CT) automatically retrains ML models, and continuous deployment (CD) automatically deploys the retrained models to production; they are therefore essential for maintaining ML model performance in dynamic production environments. Existing proprietary solutions suffer from drawbacks such as a lack of transparency and potential vendor lock-in. Additionally, current MLOps pipelines built using open-source tools still lack flexible CT and CD for ML models. This study proposes a cloud-agnostic and open-source MLOps pipeline that enables users to retrain and redeploy their ML models flexibly. We applied the Design Science methodology, which consists of identifying the problem, defining the solution objectives, and implementing, demonstrating, and evaluating the solution. The resulting solution is an MLOps pipeline called the CTCD-e MLOps pipeline. We formed a conceptual model of the needed functionalities of our MLOps pipeline and implemented the pipeline using only open-source tools. The CTCD-e MLOps pipeline runs atop Kubernetes. It can autonomously adapt ML models to dynamic production data by automatically starting to retrain ML models when their performance degrades. It can also automatically A/B test the performance of the retrained models in production and fully deploy them only when they outperform their predecessors. Our demonstration and evaluation of the CTCD-e MLOps pipeline show that it is cloud-agnostic and can also be installed in on-premises environments. Additionally, the CTCD-e MLOps pipeline enables its users to flexibly configure model retraining and redeployment, as well as production A/B tests of the retrained models, based on various requirements.
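A minimal sketch of the champion/challenger gating logic described above: the retrained model is fully deployed only if it outperforms the incumbent on the same evaluation data. The function names, metric and toy models are assumptions; the actual CTCD-e pipeline orchestrates this with its open-source tooling on Kubernetes and evaluates on production traffic.

```python
# Sketch of a champion/challenger promotion gate; names, metric and models are
# illustrative assumptions, not the CTCD-e pipeline's actual interface.
from sklearn.metrics import accuracy_score

def should_promote(champion, challenger, X_eval, y_eval, min_improvement=0.0):
    """Promote the retrained model only if it beats the model in production."""
    champ_score = accuracy_score(y_eval, champion.predict(X_eval))
    chall_score = accuracy_score(y_eval, challenger.predict(X_eval))
    print(f"champion={champ_score:.3f} challenger={chall_score:.3f}")
    return chall_score > champ_score + min_improvement

if __name__ == "__main__":
    # Toy example with two different model types standing in for the old and retrained models.
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=2000, random_state=0)
    X_train, X_eval, y_train, y_eval = train_test_split(X, y, random_state=0)
    champion = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    challenger = DecisionTreeClassifier(max_depth=5, random_state=0).fit(X_train, y_train)
    print("Deploy challenger:", should_promote(champion, challenger, X_eval, y_eval))
```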
-
(2021) The Web has become the world's most important application distribution platform, with web pages increasingly containing not static documents, but dynamic, script-driven content. Script-based rendering relies on imperative browser APIs which become unwieldy to use as an application's complexity grows. An increasingly common solution is to use libraries and frameworks which provide an abstraction over rendering and enable a less error-prone declarative programming model. The details of how web frontend frameworks implement rendering vary widely and can potentially have significant consequences for application performance. Frameworks' rendering strategies are typically invisible to the application developer, and may consequently be poorly understood despite their potential impact. In this thesis, we review rendering strategies used in a number of influential and popular web frontend frameworks. By studying their implementation details, we discover ways to categorize and estimate rendering strategies' performance based on input sizes in update loops. To verify and measure the effects of these differences, we implement a number of benchmarks that measure different aspects of rendering. In our benchmarks, we discover significant performance differences ranging up to an order of magnitude under some conditions. Additionally, we confirm that categorizing rendering strategies based on input sizes of update loops is an effective way to estimate their relative performance. The best performing rendering strategies are found to be ones which minimize input sizes in update loops using techniques such as compile-time optimization and reactive programming models.
-
(2021) Application Performance Management (APM) is a growing field, and APM tools on the market tend to be complex enterprise solutions with features ranging from traffic analysis and error reporting to real-user monitoring and business transaction management. This thesis is a study done on behalf of Veikkaus Oy, a Finnish government-owned gaming company and betting agency. It serves as a look into the current state of the art among leading APM tools, as well as a requirements analysis done from the perspective of the company’s IT personnel. A list of requirements was gathered and scored based on perceived importance, and four APM tools on the market—Datadog APM, Dynatrace, New Relic and AppDynamics—were compared with each other and scored based on the gathered requirements. In addition, open-source alternatives were considered and investigated. Our results suggest that the leading APM vendors have products very similar to each other, with marginal feature-wise differences between them. In general, APMs were deemed useful and valuable to the company, able to assist in the work of a wide variety of IT personnel, as well as able to replace many tools currently in use by Veikkaus Oy and simplify its application ecosystem.
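To illustrate the kind of requirements-weighted comparison described above, the sketch below scores a few hypothetical tools against requirements weighted by importance. The requirement names, weights and scores are invented for illustration and are not the thesis's actual evaluation of Datadog APM, Dynatrace, New Relic or AppDynamics.

```python
# Weighted-scoring sketch with invented requirements, weights and scores;
# not the thesis's actual assessment of any real APM product.
requirements = {  # requirement -> importance weight (hypothetical)
    "distributed tracing": 5,
    "real-user monitoring": 3,
    "alerting integrations": 4,
    "on-premises support": 2,
}

tool_scores = {  # tool -> per-requirement score on a 0-5 scale (hypothetical)
    "Tool A": {"distributed tracing": 5, "real-user monitoring": 4,
               "alerting integrations": 4, "on-premises support": 2},
    "Tool B": {"distributed tracing": 4, "real-user monitoring": 5,
               "alerting integrations": 3, "on-premises support": 4},
}

maximum = sum(weight * 5 for weight in requirements.values())
for tool, scores in tool_scores.items():
    total = sum(requirements[req] * scores[req] for req in requirements)
    print(f"{tool}: {total}/{maximum} weighted points")
```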