
Browsing by master's degree program "Tietojenkäsittelytieteen maisteriohjelma" (Master's Programme in Computer Science)


  • Heikkinen, Juuso (2021)
    Telecommunication companies are moving towards increasingly digitalized and agile ways of working. They are expanding their business into other fields, such as television, thus moving further away from the traditional telecommunications model. Recently, Telia has become the largest television company in the Nordics. One of their main products in the field of television is channel packages, which allow customers to access specific television content. In this study, a benefit analysis was conducted for Telia Finland Oyj to examine the benefits that test automation brings to the channel package testing process. In total, eight interviews were conducted with Telia employees with knowledge of channel packages. To capture both a business and a technical perspective, the interviewees were divided into two groups according to their expertise. In general, test automation was seen as a useful tool. The main business-related benefits of test automation mentioned were a faster and cheaper testing process and a faster time-to-market. It was also seen that test automation could help achieve a more efficient testing process and increase confidence in test automation. Based on the interview results, an epic was defined and analyzed according to the principles of the Scaled Agile Framework (SAFe). This included describing the solution in detail and defining a Minimum Viable Product (MVP). Using example variables and generalized values, several calculations were made to present a framework for the costs of implementing the MVP and the estimated reduction in channel package testing costs. When the MVP was used as part of the channel package testing process, the return on investment (ROI) was not as desirable as expected. With more automated tests relative to the number of test cases, combined with regular use of test automation, the investment would pay itself back and start generating additional savings sooner. Based on the epic analysis, a Lean Business Case was defined.
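    As a rough illustration of the kind of cost framework described above, the sketch below computes a first-year ROI and a payback period from an implementation cost and an annual saving. The euro figures and the function are hypothetical placeholders, not values or code from the thesis.

    ```python
    def roi_and_payback(implementation_cost: float,
                        annual_saving: float) -> tuple[float, float]:
        """First-year ROI and payback period for a test-automation investment.
        All figures passed in are hypothetical placeholders."""
        roi = (annual_saving - implementation_cost) / implementation_cost
        payback_years = implementation_cost / annual_saving
        return roi, payback_years

    # e.g. a 50 000 € MVP that cuts manual channel-package testing costs by 20 000 €/year
    roi, payback = roi_and_payback(50_000, 20_000)
    print(f"first-year ROI: {roi:.0%}, payback: {payback:.1f} years")  # -60%, 2.5 years
    ```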
  • Pulkka, Robert (2022)
    In recent years, the concept of the Metaverse has become a popular buzzword in the media and in different communities. In 2021, the company behind Facebook rebranded itself as Meta Platforms, Inc. to match its new vision of developing the Metaverse. The Metaverse is becoming reality as intersecting technologies, including head-mounted virtual reality displays (HMDs) and non-fungible tokens (NFTs), have been developed. Different communities, such as the media, researchers, consumers and companies, have different perspectives on the Metaverse and its opportunities and problems. Metaverse technology has been researched thoroughly, while little to no research has been done on gray literature, i.e. non-scientific sources, to gain insight into the ongoing hype. The conducted research analyzed 44 sources in total, ranging from news articles to videos and forum discussions. The results show that people see opportunities in Metaverse entrepreneurship in the changing career landscape. However, the visions of Meta Platforms, Inc. also receive a fair amount of critique in the analyzed articles and threads. The results suggest that most consumers are interested in only a smaller subset of features than what is being marketed. The conducted research gives insight into how different sources see the Metaverse and can therefore be used as a starting point for more comprehensive gray literature studies on the Metaverse. While making innovations to the underlying technology is important, studying people's viewpoints is a requirement for academia to understand the phenomenon and for the industry to produce a compelling product.
  • Leskinen, Juno (2022)
    The continuously evolving cyber threat landscape has become a major concern, as sophisticated attacks against systems connected to the Internet have become frequent. Of particular concern are threats known as Advanced Persistent Threats (APTs). The thesis aims to introduce what APTs are and to cover related topics, such as the tools and methods attackers can use. Attack models are also explained, with example models proposed in the literature. The thesis further introduces the kinds of operational objectives attacks can have and, for each objective, gives one example attack that characterizes it. In addition, the thesis covers various countermeasures, including the most essential security solutions, complemented with more advanced methods. The last countermeasure the thesis introduces is attribution analysis.
  • Jabs, Christoph (2022)
    Many real-world problem settings give rise to NP-hard combinatorial optimization problems. This results in a need for non-trivial algorithmic approaches for finding optimal solutions to such problems. Many such approaches—ranging from probabilistic and meta-heuristic algorithms to declarative programming—have been presented for optimization problems with a single objective. Less work has been done on approaches for optimization problems with multiple objectives. We present BiOptSat, an exact declarative approach for finding so-called Pareto-optimal solutions to bi-objective optimization problems. A bi-objective optimization problem arises, for example, when learning interpretable classifiers where both the size and the classification error of the classifier should be taken into account as objectives. Using propositional logic as a declarative programming language, we seek to extend the progress and success in maximum satisfiability (MaxSAT) solving to two objectives. BiOptSat can be viewed as an instantiation of the lexicographic method and makes use of a single SAT solver that is preserved throughout the entire search procedure. It allows for solving three tasks for bi-objective optimization: finding a single Pareto-optimal solution, finding one representative solution for each Pareto point, and enumerating all Pareto-optimal solutions. We provide an open-source implementation of five variants of BiOptSat, building on different algorithms proposed for MaxSAT. Additionally, we empirically evaluate these five variants, comparing their runtime performance to that of three key competing algorithmic approaches. The empirical comparison in the contexts of learning interpretable decision rules and bi-objective set covering shows practical benefits of our approach. Furthermore, for the best-performing variant of BiOptSat, we study the effects of proposed refinements to determine their effectiveness.
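    For readers unfamiliar with the terminology, the short Python sketch below gives the textbook definition of Pareto-optimality for bi-objective minimization; it is generic background, not part of BiOptSat or the thesis code.

    ```python
    def dominates(a, b):
        """a Pareto-dominates b (minimization): no worse in both objectives,
        strictly better in at least one."""
        return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

    def pareto_front(points):
        """Keep only the Pareto-optimal (objective1, objective2) pairs."""
        return [p for p in points if not any(dominates(q, p) for q in points if q != p)]

    # e.g. classifier candidates as (size, classification error) pairs
    print(pareto_front([(3, 0.20), (4, 0.20), (5, 0.10), (6, 0.25)]))
    # -> [(3, 0.2), (5, 0.1)]
    ```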
  • Avikainen, Jari (2019)
    This thesis presents a wavelet-based method for detecting moments of fast change in the textual contents of historical newspapers. The method works by generating time series of the relative frequencies of different words in the newspaper contents over time and calculating their wavelet transforms. The wavelet transform is essentially a group of transformations describing the changes happening in the original time series at different time scales, and it can therefore be used to pinpoint moments of fast change in the data. The produced wavelet transforms are then used to detect fast changes in word frequencies by examining products of multiple scales of the transform. The properties of the wavelet transform and the related multi-scale product are evaluated with respect to detecting various kinds of steps and spikes in different noise environments. The suitability of the method for analysing historical newspaper archives is examined using an example corpus consisting of 487 issues of Uusi Suometar from 1869–1918 and 250 issues of Wiipuri from 1893–1918. Two problematic features in the newspaper data, noise caused by OCR (optical character recognition) errors and the uneven temporal distribution of the data, are identified, and their effects on the results of the presented method are evaluated using synthetic data. Finally, the method is tested using the example corpus, and the results are examined briefly. The method is found to be adversely affected especially by the uneven temporal distribution of the newspaper data. Without additional processing or improvements to the quality of the examined data, a significant number of the detected steps are due to noise in the data. Various ways of alleviating this effect are proposed, along with other suggested improvements to the system.
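    The Python sketch below illustrates the general multi-scale product idea on a synthetic signal: Haar-like detail responses are computed at several dyadic scales and multiplied, so a genuine step responds at every scale while isolated noise usually does not. It is a simplified stand-in, assuming plain Haar-style difference filters rather than the exact wavelet pipeline used in the thesis.

    ```python
    import numpy as np

    def haar_detail(x, scale):
        """Haar-like detail response at a dyadic scale: difference between
        the means of the trailing and leading half-windows of width 2**scale."""
        w = 2 ** scale
        kernel = np.concatenate([np.full(w, 1.0 / w), np.full(w, -1.0 / w)])
        return np.convolve(x, kernel, mode="same")

    def multiscale_product(x, scales=(1, 2, 3)):
        """Product of detail responses across scales; noise rarely responds
        at every scale, so true steps stand out in the product."""
        return np.prod([haar_detail(x, s) for s in scales], axis=0)

    # toy word-frequency series: a step at t=500 buried in noise
    rng = np.random.default_rng(0)
    series = np.concatenate([np.zeros(500), np.ones(500)]) + rng.normal(0, 0.3, 1000)
    top = np.argsort(np.abs(multiscale_product(series)))[-5:]
    print(sorted(top))  # the strongest responses should cluster around index 500
    ```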
  • Lehtonen, Leevi (2021)
    Quantum computing has enormous potential in machine learning, where problems can quickly scale to be intractable for classical computation. A Boltzmann machine is a well-known energy-based graphical model suitable for various machine learning tasks. Plenty of work has already been conducted on realizing Boltzmann machines in quantum computing, with the proposed approaches having somewhat different characteristics. In this thesis, we conduct a survey of the state of the art in quantum Boltzmann machines and their training approaches. Primarily, we examine the variational quantum Boltzmann machine, a specific variant of the quantum Boltzmann machine suitable for near-term quantum hardware. Moreover, as the variational quantum Boltzmann machine relies heavily on variational quantum imaginary time evolution, we analyze variational quantum imaginary time evolution in considerable depth. Compared to previous work, we evaluate the execution of variational quantum imaginary time evolution with a more comprehensive collection of hyperparameters. Furthermore, we train variational quantum Boltzmann machines on a toy problem of bars and stripes, which represents a more multimodal probability distribution than the Bell states and the Greenberger-Horne-Zeilinger states considered in earlier studies.
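    As classical background only (the thesis itself concerns quantum variants), the sketch below enumerates the exact Boltzmann distribution of a tiny, fully visible Boltzmann machine with the standard energy E(v) = -1/2 v^T W v - b^T v over binary states; the weights and biases are arbitrary illustrative values.

    ```python
    import numpy as np
    from itertools import product

    def boltzmann_distribution(W, b):
        """Exact distribution p(v) proportional to exp(-E(v)) over v in {0,1}^n,
        with E(v) = -0.5 * v.T @ W @ v - b @ v (fully visible, classical)."""
        n = len(b)
        states = np.array(list(product([0, 1], repeat=n)))
        energies = np.array([-0.5 * v @ W @ v - b @ v for v in states])
        weights = np.exp(-energies)
        return states, weights / weights.sum()

    # arbitrary symmetric weights with no self-connections, for illustration
    W = np.array([[0.0, 1.0, -0.5],
                  [1.0, 0.0, 0.3],
                  [-0.5, 0.3, 0.0]])
    b = np.array([0.1, -0.2, 0.0])
    states, probs = boltzmann_distribution(W, b)
    print(states[np.argmax(probs)], round(probs.max(), 3))  # most probable configuration
    ```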
  • Meriläinen, Roosa (2020)
    In a world of constantly growing data masses, efficiently extracting, saving and accessing that data for business intelligence and analytics has become increasingly important to businesses. Analytics and business intelligence software is offered by many providers on the market for organizations of all sizes, and there are multiple ways to build an analytics system, or pipeline, either from scratch or by integrating tools available on the market. In this case study we explore and re-design the analytics pipeline solution of a medium-sized software product company by utilizing the design science research methodology. We discuss the current technologies and tools on the market for business intelligence and analytics and consider how they fit into our case study context. As design science suggests, we design, implement and evaluate two prototypes of an analytics pipeline with an Extract, Transform and Load (ETL) solution and a data warehouse. The prototypes represent two different approaches to building an analytics pipeline: an in-house approach and a partially outsourced approach. Our study brings out typical challenges similar businesses may face when designing and building their own business intelligence and analytics software. In our case we lean towards an analytics pipeline with an outsourced ETL process to be able to pass various types of event data with a consistent data schema into our data warehouse with minimal maintenance work. However, we also show the value of near real-time analytics with an in-house solution, and offer some ideas on how such a pipeline may be built.
  • Xue, Jiayue (2021)
    Semantic shift in natural language is a well-established phenomenon and has been studied for many years. Similarly, the meanings of scientific publications may also change as time goes by. In other words, the same publication may be cited in distinct contexts. To investigate whether the meanings of citations have changed in different scenarios, which is also called semantic shift in citations, we followed the same ideas researchers have used to study semantic shifts in language. To be more specific, we combined the temporal referencing model and the Word2Vec model to explore the semantic shifts of scientific citations in two respects: their usage over time and their usage across different domains. By observing how citations themselves changed over time and comparing the closest neighbors of citations, we concluded that the semantics of scientific publications did shift in terms of cosine distances.
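    A minimal sketch of the temporal-referencing idea with gensim's Word2Vec is shown below: occurrences of the same citation are tagged with their time period so that each period receives its own vector in a shared space, and the cosine distance between the period vectors indicates a shift. The toy contexts and the tag CITE_42 are invented for illustration and are not drawn from the thesis data.

    ```python
    from gensim.models import Word2Vec
    from scipy.spatial.distance import cosine

    # Each "sentence" is the textual context in which a citation appears; the
    # citation token carries a period tag (temporal referencing).
    contexts = [
        ["neural", "networks", "for", "speech", "CITE_42_1995"],
        ["statistical", "models", "of", "speech", "CITE_42_1995"],
        ["deep", "learning", "for", "images", "CITE_42_2015"],
        ["convolutional", "networks", "benchmark", "CITE_42_2015"],
    ]

    model = Word2Vec(contexts, vector_size=50, window=3, min_count=1, epochs=200, seed=1)

    # Cosine distance between the two period-tagged vectors of the same citation;
    # on real corpora a large distance suggests the citation is used in shifted contexts.
    shift = cosine(model.wv["CITE_42_1995"], model.wv["CITE_42_2015"])
    print(f"semantic shift (cosine distance): {shift:.3f}")
    ```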
  • Wargelin, Matias (2021)
    Musical pattern discovery refers to the automated discovery of important repeated patterns, such as melodies and themes, from music data. Several algorithms have been developed to solve this problem, but evaluating the algorithms has been difficult without proper visualisations of their output. To address this issue, a web application named Mupadie was built. Mupadie accepts MIDI music files as input and visualises the outputs of musical pattern discovery algorithms, with implementations of SIATEC and TTWIA built into the application. Other algorithms can be visualised if the algorithm output is uploaded to Mupadie as a JSON file that follows a specified data structure. Using Mupadie, an evaluation of SIATEC and TTWIA was conducted. Mupadie was found to be a useful tool in the qualitative evaluation of these musical pattern discovery algorithms; it helped reveal systematically recurring issues with the discovered patterns, some previously known and some previously undocumented. The findings were then used to suggest improvements to the algorithms.
  • Ollila, Risto (2021)
    The Web has become the world's most important application distribution platform, with web pages increasingly containing not static documents, but dynamic, script-driven content. Script-based rendering relies on imperative browser APIs which become unwieldy to use as an application's complexity grows. An increasingly common solution is to use libraries and frameworks which provide an abstraction over rendering and enable a less error-prone declarative programming model. The details of how web frontend frameworks implement rendering vary widely and can potentially have significant consequences for application performance. Frameworks' rendering strategies are typically invisible to the application developer, and may consequently be poorly understood despite their potential impact. In this thesis, we review rendering strategies used in a number of influential and popular web frontend frameworks. By studying their implementation details, we discover ways to categorize and estimate rendering strategies' performance based on input sizes in update loops. To verify and measure the effects of these differences, we implement a number of benchmarks that measure different aspects of rendering. In our benchmarks, we discover significant performance differences ranging up to an order of magnitude under some conditions. Additionally, we confirm that categorizing rendering strategies based on input sizes of update loops is an effective way to estimate their relative performance. The best performing rendering strategies are found to be ones which minimize input sizes in update loops using techniques such as compile-time optimization and reactive programming models.
  • Törnroos, Topi (2021)
    Application Performance Management (APM) is a growing field, and APM tools on the market tend to be complex enterprise solutions with features ranging from traffic analysis and error reporting to real-user monitoring and business transaction management. This thesis is a study done on behalf of Veikkaus Oy, a Finnish government-owned gaming company and betting agency. It serves as a look into the current state of the art in leading APM tools as well as a requirements analysis done from the perspective of the company's IT personnel. A list of requirements was gathered and scored based on perceived importance, and four APM tools on the market—Datadog APM, Dynatrace, New Relic and AppDynamics—were compared to each other and scored based on the gathered requirements. In addition, open-source alternatives were considered and investigated. Our results suggest that the leading APM vendors have products very similar to each other, with marginal feature-wise differences between them. In general, APM tools were deemed useful and valuable to the company, able to assist in the work of a wide variety of IT personnel, as well as able to replace many tools currently in use by Veikkaus Oy and simplify its application ecosystem.
  • Stepanenko, Artem (2022)
    The rapid progress of communication technologies, combined with the growing competition for talent and knowledge, has made it necessary to reassess the potential of distributed development, which has significantly changed the landscape of the IT industry by introducing a variety of cooperation models and making notable changes to the software team work environment. Along with this, enterprises pay more attention to improving teams' performance, employing emerging management tools for building up efficient software teams, and trying to get the most out of understanding the factors which significantly impact a team's overall performance. The objective of the research is to systematize the factors characterizing high-performing software teams; to indicate the benefits of global software development (GSD) models that positively influence software teams' development performance; and to study how companies' strategies can benefit from distributed development approaches in building high-performing software teams. The thesis is designed as a systematic literature review followed by qualitative research in the form of semi-structured interviews to validate the findings regarding the classification of GSD models' benefits and their influence on the development of high-performing software teams. At the literature review stage, the research (1) introduces a model of team performance factors reflecting the aspects which impact the effectiveness of development teams; (2) suggests a classification of GSD models based on organizational, legal, and temporal characteristics; and (3) describes the benefits of GSD models which influence the performance of software development teams. Within the empirical part of the study, we refine the classification of GSD models' benefits based on the qualitative analysis of semi-structured interviews with practitioners from the IT industry, form a comparison table of GSD benefits depending on the model in question, and introduce recommendations for company and team management regarding the application of GSD in building high-performing software teams. To achieve their strategic goals, IT corporations can enrich their range of available tools for managing high-performing teams by considering the peculiarities of different GSD models. Company and team management should evaluate the advantages of the distributed operational models and use the potential and benefits of the available configurations to increase teams' performance and build high-performing software teams.
  • Harviainen, Juha (2021)
    Computing the permanent of a matrix is a famous #P-hard problem with a wide range of applications. The fastest known exact algorithms for the problem require an exponential number of operations, and all known fully polynomial randomized approximation schemes are rather complicated to implement and have impractical time complexities. The most promising recent advances in approximating the permanent are based on rejection sampling and upper bounds for the permanent. In this thesis, we improve the current state of the art by developing the deep rejection sampling method, which combines an exact algorithm with the rejection sampling method. The algorithm precomputes a dynamic programming table that tightens the initial upper bound used by the rejection sampling method. In a sense, the table is used to jump-start the sampling process. We give a high-probability upper bound for the time complexity of the deep rejection sampling method for random (0, 1)-matrices in which each entry is 1 with probability p. For matrices with p < 1/5, our high-probability bound is stronger than in previous work. In addition, we empirically observe that our algorithm outperforms earlier rejection sampling methods by testing it with different parameters against other algorithms on multiple classes of matrices. The improvements in sampling times are especially notable in cases in which the ratios of the permanental upper bounds to the exact value of the permanent are huge.
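    For context, the sketch below implements Ryser's formula, a classic exact permanent algorithm that runs in exponential time; it illustrates the kind of exact baseline that sampling-based estimators aim to sidestep, and it is not the method developed in the thesis.

    ```python
    from itertools import combinations

    def permanent_ryser(A):
        """Exact permanent of an n-by-n matrix via Ryser's inclusion-exclusion
        formula, using on the order of 2^n * n^2 arithmetic operations."""
        n = len(A)
        total = 0
        for k in range(1, n + 1):
            for cols in combinations(range(n), k):
                prod = 1
                for row in A:
                    prod *= sum(row[j] for j in cols)
                total += (-1) ** k * prod
        return (-1) ** n * total

    print(permanent_ryser([[1, 1, 0],
                           [1, 1, 1],
                           [0, 1, 1]]))  # 3
    ```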
  • Liu, Yang (2020)
    Automatic readability assessment is considered a challenging task in NLP due to its high degree of subjectivity. The majority of prior work on assessing readability has focused on identifying the level of education necessary for comprehension, without considering text quality, i.e., how naturally the text flows from the perspective of a native speaker. Therefore, in this thesis, we aim to use language models, trained on well-written prose, to measure not only text readability in terms of comprehension but also text quality. We developed two word-level metrics based on the concordance of article text with predictions made using language models to assess text readability and quality. We evaluate both metrics on a set of corpora used for readability assessment or automated essay scoring (AES) by measuring the correlation between scores assigned by our metrics and human raters. According to the experimental results, our metrics are strongly correlated with text quality, achieving correlations of 0.4–0.6 on 7 out of 9 datasets. We demonstrate that GPT-2 surpasses other language models, including a bigram model, an LSTM, and a bidirectional LSTM, on the task of estimating text quality in a zero-shot setting, and that a GPT-2 perplexity-based measure is a reasonable indicator for text quality evaluation.
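    A minimal sketch of a GPT-2 perplexity measure using the Hugging Face transformers library is given below; it scores whole passages rather than individual words, so it only approximates the word-level metrics developed in the thesis.

    ```python
    import torch
    from transformers import GPT2LMHeadModel, GPT2TokenizerFast

    tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
    model = GPT2LMHeadModel.from_pretrained("gpt2")
    model.eval()

    def perplexity(text: str) -> float:
        """Perplexity of the passage under GPT-2; lower values indicate text
        closer to the well-formed prose the model was trained on."""
        ids = tokenizer(text, return_tensors="pt").input_ids
        with torch.no_grad():
            loss = model(ids, labels=ids).loss  # mean token-level cross-entropy
        return torch.exp(loss).item()

    print(perplexity("The cat sat quietly on the warm windowsill."))
    print(perplexity("Cat the quietly warm sat the on windowsill."))  # noticeably higher
    ```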
  • Alho, Riku (2021)
    Modularity is often used to manage the complexity of monolithic software systems. This is done by reducing maintenance costs through minimizing the entanglement of software code and functionality. Modularity also lowers future development costs by enabling the reuse and stacking of different types of modular functionality and software code for different environments and software engineering problems. Although there are important differences between the problem-solving processes and practices of machine learning system developers and software engineers, machine learning system developers have been shown to be able to adopt a lot from traditional software engineering. A systematic literature review is used to identify 484 studies published in four electronic sources from January 1990 to October 2021. After examination of the papers, statistical and qualitative results are formed for the 86 selected studies which provide sufficient information regarding the presence of modular operators and comparison to monolithic solutions. The selected studies addressed a wide range of tasks and domains and reported performance benefits compared to monolithic machine learning and deep learning methods. Nearly two thirds of the studies found that Modular Neural Networks (MNNs) provide improvements in task accuracy when compared to monolithic solutions. Only 16.3% of the studies reported efficiency values in their comparisons. Over 82.5% of the studies that reported their MNNs' efficiency found benefits in computation time, memory/size and energy consumption when compared to monolithic solutions. The majority of studies were carried out in laboratory environments on singular focused tasks with static requirements, which may have limited the visibility of modular operators. MNNs show positive promise for performance and efficiency in machine learning. More comparable studies are needed, especially from industry, that use MNNs under constantly changing requirements and thus apply multiple modular operators.
  • Porkka, Otto (2022)
    Blockchain technologies and cryptocurrencies have gained massive popularity in the past few years. Smart contracts extend the utility of these distributed ledgers to distributed state machines, where anyone can store and run code and then mutually agree on the next state. This opens up a whole new world of possibilities, but also many new security challenges. In this thesis we give an up-to-date survey of smart contract security issues. First, we give a brief introduction to blockchains and smart contracts and explain the most common attack types and some mitigations against them. Then we sum up and analyse our findings. We find that many of the attacks could be avoided, or at least severely mitigated, if developers followed good coding practices and used design patterns that are proven to be good. Another finding is that changing the underlying blockchain technology to counter the issues is usually not the best approach, as it is hard and troublesome to do and might restrict the usability of contracts too much. Lastly, we find that many new automated security tools are being developed and used, which indicates a movement towards more conventional coding where automated tools such as scanners and analysers are used to cover a large set of security issues.
  • Mylläri, Juha (2022)
    Anomaly detection in images is the machine learning task of classifying inputs as normal or anomalous. Anomaly localization is the related task of segmenting input images into normal and anomalous regions. The output of an anomaly localization model is a 2D array, called an anomaly map, of pixel-level anomaly scores. For example, an anomaly localization model trained on images of non-defective industrial products should output high anomaly scores in image regions corresponding to visible defects. In unsupervised anomaly localization the model is trained solely on normal data, i.e. without labelled training observations that contain anomalies. This is often necessary as anomalous observations may be hard to obtain in sufficient quantities and labelling them is time-consuming and costly. Student-teacher feature pyramid matching (STFPM) is a recent and powerful method for unsupervised anomaly detection and localization that uses a pair of convolutional neural networks of identical architecture. In this thesis we propose two methods of augmenting STFPM to produce better segmentations. Our first method, discrepancy scaling, significantly improves the segmentation performance of STFPM by leveraging pre-calculated statistics containing information about the model’s behaviour on normal data. Our second method, student-teacher model assisted segmentation, uses a frozen STFPM model as a feature detector for a segmentation model which is then trained on data with artificially generated anomalies. Using this second method we are able to produce sharper anomaly maps for which it is easier to set a threshold value that produces good segmentations. Finally, we propose the concept of expected goodness of segmentation, a way of assessing the performance of unsupervised anomaly localization models that, in contrast to current metrics, explicitly takes into account the fact that a segmentation threshold needs to be set. Our primary method, discrepancy scaling, improves segmentation AUROC on the MVTec AD dataset over the base model by 13%, measured in the shrinkage of the residual (1.0 − AUROC). On the image-level anomaly detection task, a variant of the discrepancy scaling method improves performance by 12%.
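    The reported improvement is measured as shrinkage of the residual (1.0 − AUROC) rather than as an absolute gain. The small sketch below makes that arithmetic concrete, using a hypothetical baseline AUROC of 0.90 purely for illustration; the actual baseline values are reported in the thesis.

    ```python
    def improved_auroc(base_auroc: float, shrinkage: float) -> float:
        """Shrink the residual (1.0 - AUROC) by the given fraction."""
        return 1.0 - (1.0 - base_auroc) * (1.0 - shrinkage)

    # A 13% shrinkage of the residual turns a hypothetical 0.90 into 0.913,
    # not 0.90 + 0.13.
    print(improved_auroc(0.90, 0.13))  # 0.913
    ```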
  • Thapa Magar, Purushottam (2021)
    Rapid growth and advancement of next generation sequencing (NGS) technologies have changed the landscape of genomic medicine. Today, clinical laboratories perform DNA sequencing on a regular basis, which is an error-prone process. Erroneous data affects downstream analysis and produces fallacious results. Therefore, external quality assessment (EQA) of laboratories working with NGS data is crucial. Validation of variations such as single nucleotide polymorphisms (SNPs) and InDels (<50 bp) is fairly accurate these days. However, detection and quality assessment of large changes such as copy number variations (CNVs) continues to be a concern. In this work, we aimed to study the feasibility of an automated CNV concordance analysis for laboratory EQA services. We benchmarked variants reported by 25 laboratories against the highly curated gold standard for the son (HG002/NA24385) of the Ashkenazim trio from the Personal Genome Project, published by the Genome in a Bottle Consortium (GIAB). We employed two methods to assess the concordance of CNVs: a sequence-based comparison with Truvari and an in-house exome-based comparison. For deletion calls of two whole genome sequencing (WGS) submissions, Truvari achieved values greater than 88% and 68% for precision and recall, respectively. Conversely, the in-house method's precision and recall scores peaked at 39% and 7.9%, respectively, for one WGS submission for both deletion and duplication calls. The results indicate that automated CNV concordance analysis of deletion calls for WGS-based callsets might be feasible with Truvari. On the other hand, results for panel-based targeted sequencing for deletion calls showed precision and recall rates ranging from 0–80% and 0–5.6%, respectively, with Truvari. This suggests that automated concordance analysis of CNVs for targeted sequencing remains a challenge. In conclusion, CNV concordance analysis depends on how the sequence data is generated.
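    For reference, the precision and recall figures above reduce to simple ratios over true-positive, false-positive and false-negative CNV calls, as in the sketch below; the counts shown are hypothetical and chosen only to mirror the reported magnitudes.

    ```python
    def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
        """Precision: share of reported CNV calls matching the gold standard.
        Recall: share of gold-standard CNVs that were recovered."""
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        return precision, recall

    # hypothetical counts for one submission, for illustration only
    print(precision_recall(tp=88, fp=12, fn=42))  # (0.88, ~0.68)
    ```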
  • Gold, Ayoola (2021)
    The importance of automatic speech recognition (ASR) cannot be overstated in today's world, as it plays a significant role in human-computer interaction. ASR systems have been studied deeply over time, but their maximum potential is yet to be explored for the Finnish language. Development of a traditional ASR system involves a substantial amount of hand-crafted engineering, which has made this technology difficult and resource-intensive to develop. However, with advancements in the field of neural networks, end-to-end ASR neural networks can be developed which automatically learn the mapping of audio to its corresponding transcript, thereby reducing the need for hand-crafted engineering. End-to-end neural network ASR systems have largely been developed commercially by tech giants such as Microsoft, Google and Amazon. However, there are limitations to these commercial services, such as data privacy and cost of usage. In this thesis, we explored existing studies on the development of an end-to-end neural network ASR for the Finnish language. One successful technique utilized in the development of neural network ASR when data is scarce is transfer learning. This is the approach explored in this thesis for the development of the end-to-end neural network ASR system, and the success of this approach was evaluated. To achieve this, datasets collected from the Bank of Finland and Kaggle were used to fine-tune the Mozilla DeepSpeech model, a pretrained end-to-end neural network ASR model for the English language. The results obtained by fine-tuning the pretrained English ASR network for Finnish showed a word error rate as low as 40% and a character error rate as low as 22%. We therefore concluded that transfer learning is a successful technique for creating an ASR model for a new language using a pretrained model in another language with little effort, data and resources.
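    Word error rate, the headline metric above, is the word-level edit distance between the reference transcript and the ASR output divided by the reference length; character error rate is the same computation over characters. A minimal sketch, with an invented Finnish example pair, follows.

    ```python
    def word_error_rate(reference: str, hypothesis: str) -> float:
        """WER = (substitutions + insertions + deletions) / reference word count,
        computed with standard edit distance over words."""
        ref, hyp = reference.split(), hypothesis.split()
        d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
        for i in range(len(ref) + 1):
            d[i][0] = i
        for j in range(len(hyp) + 1):
            d[0][j] = j
        for i in range(1, len(ref) + 1):
            for j in range(1, len(hyp) + 1):
                cost = 0 if ref[i - 1] == hyp[j - 1] else 1
                d[i][j] = min(d[i - 1][j] + 1,         # deletion
                              d[i][j - 1] + 1,         # insertion
                              d[i - 1][j - 1] + cost)  # substitution
        return d[len(ref)][len(hyp)] / len(ref)

    # invented example: one substituted word out of four -> WER 0.25
    print(word_error_rate("minä puhun vähän suomea", "minä puhun vähän suomen"))
    ```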