Browsing by department "Tietojenkäsittelytieteen osasto" (Department of Computer Science)


  • Pirttinen, Nea (2020)
    Crowdsourcing has been used in computer science education to alleviate the teachers’ workload in creating course content, and as a learning and revision method for students through its use in educational systems. Tools that utilize crowdsourcing can act as a great way for students to further familiarize themselves with the course concepts, all while creating new content for their peers and future course iterations. In this study, student-created programming assignments from the second week of an introductory Java programming course are examined alongside the peer reviews these assignments received. The quality of the assignments and the peer reviews is inspected, for example, through comparing the peer reviews with expert reviews using inter-rater reliability. The purpose of this study is to inspect what kinds of programming assignments novice students create, and whether the same novice students can act as reliable reviewers. While it is not possible to draw definite conclusions from the results of this study due to limitations concerning the usability of the tool, the results seem to indicate that novice students are able to recognise differences in programming assignment quality, especially with sufficient guidance and well thought-out instructions.
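    The inter-rater reliability comparison mentioned above is typically computed with a statistic such as Cohen's kappa. A minimal sketch (not the thesis's actual analysis code; the label scheme is a hypothetical example), assuming peer and expert reviews are mapped onto the same ordinal quality labels:

        from sklearn.metrics import cohen_kappa_score

        # Hypothetical quality labels (0 = poor, 1 = adequate, 2 = good)
        # assigned to the same ten assignments by a peer and an expert.
        peer_labels   = [2, 1, 2, 0, 1, 2, 2, 1, 0, 2]
        expert_labels = [2, 1, 1, 0, 1, 2, 2, 2, 0, 2]

        # Kappa corrects raw agreement for agreement expected by chance:
        # values near 1 indicate strong agreement, values near 0 chance level.
        kappa = cohen_kappa_score(peer_labels, expert_labels)
        print(f"Cohen's kappa: {kappa:.2f}")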
  • Haapasalmi, Risto (2020)
    In recent years, highly compact succinct text indexes developed in bioinformatics have spread to the domain of natural language processing, in particular n-gram indexing. One line of research has been to utilize compressed suffix trees as both the text index and the language model. Compressed suffix trees have several favourable properties for compressing n-gram strings and associated satellite data while allowing for both fast access and fast computation of the language model probabilities over the text. When it comes to count-based n-gram language models, and especially to low-order n-gram models, the Kneser-Ney language model has long been the de facto industry standard. Shareghi et al. showed how to utilize a compressed suffix tree to build a highly compact index that is competitive with state-of-the-art language models in space. In addition, they showed how the index can work as a language model and allows computing modified Kneser-Ney probabilities straight from the data structure. This thesis analyzes and extends the work of Shareghi et al. in building a compressed-suffix-tree-based modified Kneser-Ney language model. We explain their solution and present three attempts to improve the approach. Out of the three experiments, one performed far worse than the original approach, but two showed minor gains in time with no real loss in space.
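    For reference, the interpolated Kneser-Ney probability that such an index must compute from its stored counts has the standard form (shown with a single discount $D$; the modified variant replaces $D$ with count-dependent discounts $D_1$, $D_2$, $D_{3+}$):

        P_{KN}(w \mid h) = \frac{\max(c(hw) - D,\, 0)}{c(h)} + \frac{D \cdot N_{1+}(h\,\bullet)}{c(h)}\, P_{KN}(w \mid h')

    where $c(\cdot)$ is an n-gram count, $h'$ is the history $h$ with its oldest word dropped, and $N_{1+}(h\,\bullet)$ is the number of distinct words observed after $h$; a compressed suffix tree can answer all three kinds of queries directly.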
  • Laitinen, Niko (2020)
    The nature of developing digital products and services has changed to adjust to emergent markets that change fast and are difficult to predict. This has prioritized time-to-market and customer validation. The increase in expectations and complexity of digital products has caused organizations to transform to better operate in the present market space. As a consequence, the demand for user experience design in digital product development has grown. Design in this context is defined as a plan or specification for the construction of software applications or systems or for the implementation of an activity or process, or the result of that plan or specification in the form of a prototype, product or process. New ways of organizing design work are needed to adjust to the evolving organizations and end-customer markets. In this case study, digital product design was examined as a craft, a process and a set of methods for defining, creating and validating digital products for consumer markets. The adoption of Lean-Agile software methodologies has successfully spread through organizations in response to the changed market space, yet their incorporation into digital design has not yet reached maturity. Extensive studies have shown that successfully applying Lean-Agile methodologies can improve the quality of user experience and team productivity, which has increased the integration of user experience practices into Lean-Agile methodologies. Successfully integrating Lean-Agile development and design has been shown to have immense effects on business growth on a large scale, largely due to increased customer engagement through user-centered design approaches. This thesis investigates how Lean-Agile methodologies, user-centered design, Lean UX, Design Thinking, the Lean Startup Method, Agile Software Development, DesignOps and Design Systems could be incorporated into the design process in digital product development to improve the impact, efficiency and quality of design work outcomes. A case study was conducted to demonstrate the benefits of using Lean-Agile methodologies in the development of customer-facing digital products and services. The design organization was examined with participatory and action research to establish a model of design operations in the community of practice, a group of people who share design as a discipline. This constructed model was then evaluated against a DesignOps model constructed from applicable literature. The participation allowed the use of operations management methods aimed at making the design process rapid, robust and scalable; these operations are later referred to as DesignOps (design operations). Quantitative and qualitative research methods were used to gather data. Quantitative methods included a team survey and an analysis of business metrics, including the effects on business value and growth; qualitative methods included discussions, interviews and observation in the design organization. The sustainability of the design practice was also evaluated. In addition, the design organization put effort into building a company-wide Design System, and the benefits and considerations of building one are examined. The results indicated that the design practice at the studied company was still low in maturity. DesignOps methods in particular were noticeably beneficial: establishing design practices with DesignOps brought efficiency to digital product development and increased employee satisfaction. Beyond these, the establishment of a design organization with DesignOps enabled continuous improvement inside the company. Lean UX showed promising signs for establishing highly functioning digital product development teams that can efficiently produce design work in a sustainable manner. Design Systems spread knowledge and cultivate collaboration across disciplines, which grows the impact of design inside the company. Establishing a design practice for continuous learning can increase market value, generate growth and improve employee and customer satisfaction.
  • Louhio, Jaana (2020)
    In the late 2010s, the classical games of Go, chess and shogi were considered 'solved' by deep reinforcement learning AI agents. Competitive online video games may offer a new, more challenging environment for deep reinforcement learning and serve as a stepping stone on the path to real-world applications. This thesis first gives a short introduction to the concepts of reinforcement learning, deep networks and deep reinforcement learning. It then looks into a few popular competitive online video games and the general problems of AI development in these types of games. The deep reinforcement learning algorithms, techniques and architectures used in the development of highly competitive AI agents in StarCraft 2, Dota 2 and Quake 3 are overviewed. Finally, the results are reviewed and discussed.
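    As a concrete anchor for the reinforcement learning concepts introduced in the thesis, the tabular Q-learning update rule, a precursor of the deep variants discussed, can be sketched in a few lines of Python (the environment interface is hypothetical):

        import random
        from collections import defaultdict

        alpha, gamma, epsilon = 0.1, 0.99, 0.1   # learning rate, discount, exploration
        Q = defaultdict(float)                   # Q[(state, action)] -> estimated return

        def choose_action(state, actions):
            # Epsilon-greedy policy: explore occasionally, otherwise act greedily.
            if random.random() < epsilon:
                return random.choice(actions)
            return max(actions, key=lambda a: Q[(state, a)])

        def update(state, action, reward, next_state, actions):
            # Q-learning target: immediate reward plus discounted best next value.
            best_next = max(Q[(next_state, a)] for a in actions)
            Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])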
  • Pelkonen, Mikko (2020)
    The general topic of this thesis is computer-aided music analysis on point-set data, utilising theories outlined in Timo Laiho's Analytic-Generative Methodology (AGM) [19]. The topic is in the field of music information retrieval and is related to previous work on both pattern discovery and computational models of music. The thesis aims to provide analysis results that can be compared to existing studies. AGM introduces two concepts based on perception, sensation and cognitive processing: the interval–time complex (IntiC) and musical vectors (muV). These provide a mathematical framework for the analysis of music. IntiC is a value associated with the velocity, or rate of change, between musical notes; musical vectors are the vector representations of these rates of change. Laiho presents these attributes as meaningful both for music analysis and as tools for music generation. Both attributes can be computed from a point-set representation of music data. The concepts in AGM can be viewed as related to the geometric pattern discovery algorithms of Meredith, Lemström et al. [24], who introduce a family of 'Structure Induction Algorithms' used to find repeating patterns in multidimensional point-set data. Algorithmic implementations of IntiC and muV were made for this thesis and examined for rating and selecting patterns output by the pattern discovery algorithms. In addition, software tools for using these concepts of AGM were created, and the concepts of AGM and pattern discovery were further related to existing work in computer-aided musicology.
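    A rough illustration of how the two concepts could be computed from point-set data, under the reading that a musical vector is the (time, pitch) difference between successive notes and IntiC is the corresponding rate of change; the exact definitions in AGM [19] may differ:

        # Point-set representation: (onset_time, pitch) pairs, e.g. MIDI note numbers.
        notes = [(0.0, 60), (0.5, 62), (1.0, 64), (2.0, 60)]

        def musical_vectors(points):
            # muV read as the difference vector between consecutive notes.
            return [(t2 - t1, p2 - p1)
                    for (t1, p1), (t2, p2) in zip(points, points[1:])]

        def intic_values(points):
            # IntiC read as pitch change per unit time (a "velocity").
            return [dp / dt for dt, dp in musical_vectors(points) if dt > 0]

        print(musical_vectors(notes))  # [(0.5, 2), (0.5, 2), (1.0, -4)]
        print(intic_values(notes))     # [4.0, 4.0, -4.0]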
  • Karjalainen, Antti (2020)
    Join indices are used in relational databases to make join operations faster. A join index essentially materialises the results of a join operation and thus accrues maintenance cost, which makes join indices more suitable for use cases where modifications are rare and joins are performed frequently. To keep the maintenance cost low, incrementally updating existing indices is preferable. This thesis explores the use of persistent data structures for join indices. The motivation for this research was the ability of persistent data structures to construct multiple, partially different versions of the same data structure memory-efficiently. This is useful because different versions of join indices can exist simultaneously due to the use of multi-version concurrency control (MVCC) in a database. The techniques used in the Relaxed Radix Balanced Tree (RRB-Tree) persistent data structure were found promising, but none of the popular implementations were found directly suitable for the use case. This exploration was done in the context of a particular proprietary embedded in-memory columnar multidimensional database called FastormDB, developed by RELEX Solutions. This focused the research on Java Virtual Machine (JVM) based data structures, as FastormDB is implemented in Java. Multiple persistent data structures implemented for the thesis, together with ones from Scala, Clojure and Paguro, were evaluated with Java Microbenchmark Harness (JMH) and Java Object Layout (JOL) based benchmarks, and their results were analysed via visualisations.
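    The core idea of a join index, materializing matching row-id pairs and maintaining them incrementally rather than recomputing the join, can be sketched as follows (a plain-dictionary illustration, not the persistent RRB-Tree variant the thesis evaluates):

        from collections import defaultdict

        # Two tables as row_id -> join-key mappings.
        R = {1: "a", 2: "b", 3: "a"}
        S = {10: "a", 11: "c", 12: "b"}

        def build_join_index(R, S):
            # Materialize all (r_id, s_id) pairs whose join keys match.
            by_key = defaultdict(lambda: ([], []))
            for rid, k in R.items():
                by_key[k][0].append(rid)
            for sid, k in S.items():
                by_key[k][1].append(sid)
            return {k: [(r, s) for r in rs for s in ss]
                    for k, (rs, ss) in by_key.items() if rs and ss}

        index = build_join_index(R, S)   # {'a': [(1, 10), (3, 10)], 'b': [(2, 12)]}

        def insert_into_R(index, S, rid, key):
            # Incremental maintenance: only pairs involving the new row are added.
            index.setdefault(key, []).extend(
                (rid, sid) for sid, k in S.items() if k == key)

        insert_into_R(index, S, 4, "c")  # adds (4, 11) under key 'c'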
  • Bärlund-Vihtola, Nina (2020)
    The protection of personal data, i.e. data protection, is a fundamental right affirmed in the Charter of Fundamental Rights of the European Union. The data protection of private individuals was strengthened further when, in 2018, all EU member states began applying Regulation (EU) 2016/679 of the European Parliament and of the Council on the protection of natural persons with regard to the processing of personal data and on the free movement of such data, and repealing Directive 95/46/EC, in short the EU General Data Protection Regulation (GDPR). The regulation was later complemented by the national Data Protection Act 1050/2018. The organisation that owns a register of personal data is the controller, and a person whose data is in the register is a data subject. The GDPR defines the controller's obligations, through which the controller must ensure that the data subject's rights are realised. In addition, the controller has an accountability obligation to demonstrate that personal data is processed in accordance with the regulation's processing principles, as well as an obligation to notify the national supervisory authority of personal data breaches. The GDPR also affects software development. The regulation mandates Privacy by Design, i.e. implementing privacy-preserving functionality in information systems from the very beginning. The regulation is long and unwieldy, so it has been laborious for organisations to identify all the obligations it contains. This thesis addresses that problem by extracting the Privacy by Design requirements from the regulation and anchoring them to the TOGAF enterprise architecture framework. On this basis, a skeleton of data protection guidelines for software development has been produced; by developing it further, an organisation can create working guidance to support building Privacy by Design.
  • Ghasemi, Mandana (2019)
    Over the last years, Location-Based Services (LBSs) have become popular due to the global use of smartphones and improvements in the Global Positioning System (GPS) and other positioning methods. Location-based services employ users' location to offer relevant information or useful recommendations. Meanwhile, with the development of social applications, location-based social networking services (LBSNS) have attracted millions of users, because the geographic position of users can be used to enhance the services provided by those social applications. Proximity detection, as one type of location-based function, makes LBSNS more flexible and notifies mobile users when they are in proximity. Despite all the desirable features that such applications provide, disclosing the exact location of individuals to a centralized server and/or their social friends might put users at risk of their information falling into the wrong hands, since locations may disclose sensitive information about people, including political and religious affiliations, lifestyle, health status, etc. Consequently, users might be unwilling to participate in such applications. To this end, private proximity detection schemes enable two parties to check whether they are in close proximity while keeping their exact locations secret. In particular, running a private proximity detection protocol between two parties only reveals a boolean value to the querier, and it guarantees that no other information regarding the other party's location can be leaked to the participants. However, most proposed private proximity detection protocols allow users to choose only a simple geometric range on the map, such as a circle or a rectangle, in which to test for proximity. In this thesis, we take inspiration from the field of Computational Geometry and develop two privacy-preserving proximity detection protocols that allow a mobile user to specify an arbitrary complex polygon on the map and check whether his/her friends are located therein. We also analyze the efficiency of our solutions in terms of computational and communication costs. Our evaluation shows that, compared to similar earlier work, the proposed solution increases computational efficiency by up to 50% and reduces the communication overhead by up to 90%. We have therefore achieved a significant reduction of computational and communication complexity.
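    The geometric primitive underlying such a protocol is the point-in-polygon test; the cryptographic layer that keeps both parties' inputs private is omitted here. A plain ray-casting sketch of that primitive:

        def point_in_polygon(x, y, polygon):
            """Even-odd ray casting: count edge crossings of a ray going right."""
            inside = False
            n = len(polygon)
            for i in range(n):
                x1, y1 = polygon[i]
                x2, y2 = polygon[(i + 1) % n]
                # Does the edge (x1, y1)-(x2, y2) straddle the horizontal line at y?
                if (y1 > y) != (y2 > y):
                    # x-coordinate where the edge crosses that horizontal line.
                    x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
                    if x < x_cross:
                        inside = not inside
            return inside

        # A hypothetical non-convex zone and two queried positions.
        zone = [(0, 0), (4, 0), (4, 4), (2, 2), (0, 4)]
        print(point_in_polygon(1, 1, zone))  # True
        print(point_in_polygon(2, 3, zone))  # False (inside the notch)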
  • Ahlskog, Niki (2019)
    The purpose of a Progressive Web Application (PWA) is to blur, or even remove, the boundary between an application downloaded from an app store and a normal website. A PWA is like any normal website, but it additionally meets the following criteria: the application scales to any device; the application is served over an encrypted connection; and the application can be installed as a shortcut on a phone's home screen, in which case it opens without the navigation controls familiar from the browser and can also be opened without a network connection. This thesis reviews techniques for building a PWA and defines when an application is a PWA. The speed of a PWA is measured with and without the Service Worker's caching features enabled. The creation and deployment of a PWA is examined in an existing private client project, paying attention to the benefits and pain points the PWA brought. To evaluate the outcome, the application's progressiveness and speed are measured using Google Chrome's Lighthouse tool. In addition, a load-speed test is run against the application several times using the Puppeteer library, and the usefulness of the PWA's Service Worker cache is examined in terms of performance and load time. To draw conclusions about the use of the Service Worker cache, the change in speed is examined with the progressive features enabled and disabled. The effects of a Service Worker on application speed are also examined through a Google case study. The test results show that using the Service Worker cache is faster in all cases: the Service Worker cache is faster than the browser's own cache. A Service Worker may also be stopped and in a waiting state in the user's browser; even then, activating the Service Worker and using its cache is faster than loading from the browser cache or directly from the network.
  • Hou, Jue (2019)
    Named entity recognition is a challenging task in the field of NLP. Like other machine learning problems, it requires a large amount of data for training a workable model. This is still a problem for languages such as Finnish due to the lack of data in linguistic resources. In this thesis, I propose an approach to automatic annotation in Finnish using limited linguistic rules and data from a resource-rich language, English, as a reference. Training a BiLSTM-CRF model, the preliminary results show that automatic annotation can produce annotated instances with high accuracy and that the model can achieve good performance for Finnish. In addition to automatic annotation and NER model training, two related experiments are conducted and discussed at the end of the thesis to show the practical application of the Finnish NER model.
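    One simple form that the automatic annotation described above could take is gazetteer projection: matching known entity names against unlabeled Finnish text to produce BIO tags for training. A hedged sketch (the thesis's actual rules and its use of English as a reference are more involved):

        # Hypothetical gazetteer of known entity spans mapped to their types.
        GAZETTEER = {
            ("Helsingin", "yliopisto"): "ORG",
            ("Suomi",): "LOC",
        }

        def bio_annotate(tokens):
            """Tag tokens with B-/I-/O labels by longest gazetteer match."""
            labels = ["O"] * len(tokens)
            i = 0
            while i < len(tokens):
                for length in (2, 1):  # try longer matches first
                    span = tuple(tokens[i:i + length])
                    if span in GAZETTEER:
                        labels[i] = "B-" + GAZETTEER[span]
                        for j in range(i + 1, i + length):
                            labels[j] = "I-" + GAZETTEER[span]
                        i += length
                        break
                else:
                    i += 1
            return labels

        tokens = ["Helsingin", "yliopisto", "sijaitsee", "Suomessa"]
        print(list(zip(tokens, bio_annotate(tokens))))
        # The inflected "Suomessa" is missed ('O'), illustrating why exact
        # matching alone is insufficient for a morphologically rich language.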
  • Store, Joakim (2020)
    In software configuration management, branching is a common practice that can enable efficient parallel development between developers and teams. However, developers might not be aware of the different branching practice options or how exactly to formulate a branching strategy. This can have the opposite effect on productivity and cause other issues as well. The focus of this thesis is on which branching practices are considered beneficial, what affects their usability, what risks are involved, and how to plan these practices in a structured manner. There are plenty of branching practices presented in the literature, which can either complement each other or be completely incompatible. Much of a practice's benefit depends on the surrounding context, such as the tools in use and project characteristics. The most significant risk in branching is merge conflicts, but there are others as well. The approaches for planning a branching strategy, however, are found to be too narrow in the reviewed literature. Thus, the Branching Strategy Formulation and Analysis Method (BSFAM) is proposed to help teams and organizations plan their branching strategy in a structured manner. Additionally, the issues of branching are explored in the context of an organization that has multiple concurrent projects ongoing for a single product. Information on this is gathered through a survey, semi-structured interviews, and available documentation. The issues that were found can be attributed to the lack of a proper base strategy, difficulties in coordination and awareness, and test automation management in relation to branching. The proposed method is then applied in that same context in order to provide solutions to the organization's issues and to provide an example case. BSFAM will be taken into use in upcoming projects in the organization, and it will be improved if necessary. If the proposed method is adopted more widely and its resulting information published, it could support further research into how different branching practices fit different contexts and help new, generally better branching practices emerge.
  • Salonen, Antti (2020)
    Garbage collection refers to an automatic memory-management mechanism in which a garbage collector frees the memory areas allocated by an application that the application no longer references. The basic garbage collection techniques are reference counting and tracing collectors such as mark-sweep collection and copying collection. In real-time and interactive applications, the pauses caused by garbage collection must not be too long. In such applications, collection cannot be implemented as a single atomic operation during which program execution is suspended. Instead, garbage collection can target only part of the program's memory at a time, or collection can be implemented to proceed concurrently with program execution. True real-time collection techniques schedule the collector so that the pauses it causes are precisely bounded. This thesis compared Java garbage collectors on different workloads and with different heap sizes. The measurements examined the duration of the benchmark runs, the duration of garbage collection pauses, and how the pauses were distributed over the program's execution. Significant differences were found between the compared collectors. Java's newer G1 collector performs the whole-heap marking phase concurrently, and its copying phase targets only a small part of the program's memory at a time. In the measurements, the G1 collector was only slightly slower than the older Parallel collector, but the G1 collector's pauses were considerably shorter. When a pause-time target was set for the G1 collector, the longest pauses were only a few tens of milliseconds. With the Shenandoah collector, which was also included in the comparison and is designed to guarantee especially short pauses, the pauses imposed on the program's execution were only a few milliseconds.
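    The pause-time target mentioned above is set with a standard JVM flag; for example (collector-selection flags shown for JDK builds where each collector is available):

        # G1 with a 50 ms pause-time goal:
        java -XX:+UseG1GC -XX:MaxGCPauseMillis=50 -jar app.jar

        # Shenandoah, designed for very short pauses:
        java -XX:+UseShenandoahGC -jar app.jar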
  • Brandtberg, Ronnie (2020)
    Re-engineering can be described as a process for updating an existing system in order to meet new requirements. Restructuring and refactoring are activities that can be performed as part of the re-engineering process. Supporting new requirements, such as migrating to new frameworks, environments and architectural styles, is essential for preserving quality attributes like maintainability and evolvability. Many larger legacy systems slowly deteriorate in quality over time, and adding new functionality becomes increasingly difficult and costly as technical debt accumulates. To modernize a legacy system and improve the cost-effectiveness of implementing new features, a re-engineering process is often needed. The alternative is to develop a completely new system, but this can lead to the loss of years of accumulated functionality and be too expensive. Re-engineering strategies can be specialized, solving specific needs like cloud migration, or more generic in nature, supporting several kinds of needs. Different approaches suit different kinds of source and target systems, and the choice of a re-engineering strategy is also influenced by organisational and business factors. Re-engineering a highly tailored legacy system in a small organisation differs from re-engineering a scalable system in a large organisation; generic and flexible solutions are well suited especially to smaller organisations with complex systems. The re-engineering strategy Renaissance was applied in a case study at Roima Intelligence Oy in order to find out whether such a strategy is realistically usable, useful and valuable for a smaller organization. The results show that a re-engineering strategy can be used with low overhead to prioritize different parts of the system and to determine a suitable modernization plan. Renaissance was also shown to add value, especially in the form of a deeper understanding of the system and a structured way to evaluate different options for modernization. This is achieved by assessing the system from different views, taking into account especially business and technical aspects. A lesson learned about Renaissance is that determining an optimal scope for the system assessment is challenging. The results are applicable to other organisations dealing with complex legacy systems under constrained resources. Limitations of the study are that the number of different re-engineering strategies discussed is small, and a systematic mapping study could uncover strategies more suitable than Renaissance. The number of experts participating in the process itself, as well as in the evaluation, was also low, introducing some uncertainty to the validity of the results. Further research is needed to determine how specialized and generic re-engineering strategies compare in terms of needed resources and added value.
  • Ramirez Lahti, Jacinto (2020)
    Modern software development is faster than ever before, with products needing to hit the market in record time to be tested and modified until they find a place in the market and start generating profit. This process often leads to an excessive amount of technical debt, accrued especially in the early, experimental stages of the development of a new software product. This accumulated technical debt must then be amortized, or it will hinder the future development of the product. This can often be difficult, not only because of the time pressure of new requirements but also because of the nature of the problems behind the technical debt: the problems might not be apparent, appearing only as symptoms that do not directly indicate the real source. In this thesis, an AntiPattern-centric approach to identifying and fixing the root causes of technical debt was implemented in a case study of the five-year-old codebase of a startup company. AntiPatterns were found and fixed not only in the codebase but also in the Scrum methodologies used in the project, which were likewise analyzed and improved through AntiPattern analysis. The case study showed promise in this approach, generating concrete plans and actions towards decreasing the technical debt in the project. As the study was limited to the context of this one company and project, more research should be done on a larger scale to be able to generalize the results.
  • Kiistala, Ilkka (2020)
    This thesis deals with managing the updating of software composed of software components. The aim of the study is to find out how the impact of an update can be assessed, so that updating is controlled and different options can be evaluated. The thesis draws on scientific research and professional literature to form a view of the maintenance of a component-based software system, integration testing of software components, configuration management, and management of software component updates. The case study concerns a Python upgrade made to the test automation system developed for regression testing in the Finnish Tax Administration's Valmis programme. The system is based on the Robot Framework test automation framework, which is designed to be adapted to the needs of its operating environment. The case study investigated why the impact of the upgrade extended further into the test automation software than expected, and what the consequences were.
  • Timonen, Jussi (2020)
    Enormous datasets are a common occurrence today, and compressing them is often beneficial. Fast direct access to any element in the compressed data is a requirement in the field of compressed data structures, and it is not easily supported by traditional compression methods. Variable-byte encoding is a method for compressing integers of differing byte lengths: it removes unused leading bytes and adds a continuation bit to each byte to denote whether the compressed integer continues into the next byte or not. An existing solution using a rank data structure performs well in this task. This thesis introduces an alternative solution using a select data structure and compares the two implementations. Experiments are also conducted on retrieving a subarray from the compressed data structure. The rank implementation performs better on data consisting mostly of small integers, while the select implementation benefits from larger integers. The select implementation has significant advantages in subarray fetching due to how the data is compressed.
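    A minimal sketch of the variable-byte scheme described above, using the convention that a set high bit marks the final byte of an integer (conventions differ, and the thesis's exact layout and its rank/select acceleration structures are not reproduced here):

        def vbyte_encode(n):
            """Encode a non-negative integer as 7-bit groups, one per byte;
            the high bit is set only on the terminating byte."""
            out = []
            while True:
                out.append(n & 0x7F)
                if n < 0x80:
                    break
                n >>= 7
            out.reverse()      # most significant 7-bit group first
            out[-1] |= 0x80    # mark the final byte
            return bytes(out)

        def vbyte_decode(data):
            """Decode a concatenation of variable-byte encoded integers."""
            numbers, n = [], 0
            for b in data:
                n = (n << 7) | (b & 0x7F)
                if b & 0x80:   # final byte of this integer
                    numbers.append(n)
                    n = 0
            return numbers

        encoded = b"".join(vbyte_encode(x) for x in [5, 300, 70000])
        print(vbyte_decode(encoded))  # [5, 300, 70000]

    Direct access to the i-th integer then amounts to locating the i-th byte whose marker bit is set, which is exactly the kind of query a rank or select data structure over the marker bits answers efficiently.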
  • Tilles, Jan (2020)
    Serverless computing, also known as Function as a Service (FaaS), is a cloud service model in which the cloud provider manages computing resources and tenants deploy their code without knowing the details of the underlying infrastructure. The promise of serverless is to drive costs down so that a tenant pays only for the computing resources it actually utilizes, instead of paying for idle containers or virtual machines. In this thesis, we discuss how serverless computing does not always fulfill this promise. For instance, some serverless frameworks keep certain resources, such as containers or functions, idle in order to reduce latency during function invocation. This may be particularly problematic in edge domains, where computing power and resources are limited. In Function as a Service, the smallest unit of deployment is a function. These functions can be used, for example, to deploy traditional microservice-based applications, and serverless computing allows a tenant to run and scale them with high availability. Serverless computing also involves tradeoffs: developers have less control over the underlying environment, testing serverless functions is cumbersome, and commercial cloud service providers impose a high degree of lock-in with their serverless technologies. A serverless application is stateless by nature: it runs in a stateless container that is event-triggered and managed by the cloud provider. A serverless application can access databases but, in general, state related to the function itself is not stored in files or databases. A number of commercial offerings and a wide range of open-source serverless frameworks are available. In this thesis, we present an overview of the different alternatives and a qualitative comparison. We also present our benchmarking results with OpenFaaS running on a Kubernetes edge cloud (Raspberry Pi), based on algorithms typically utilized in machine learning.
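    For context, a function deployed to OpenFaaS with its python3 template is simply a module exposing a handler. A minimal sketch of a compute-bound function of the kind that could be benchmarked (the thesis's actual workloads are machine learning algorithms, not this toy loop):

        # handler.py -- entry point of the OpenFaaS python3 template.
        def handle(req):
            """Receives the request body as a string, returns the response."""
            n = int(req or "10")
            # Toy compute-bound workload standing in for an ML algorithm.
            a, b = 0, 1
            for _ in range(n):
                a, b = b, a + b
            return str(a)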
  • Joentausta, Jussi (2020)
    As the amount of information available on the Internet keeps growing, new problems arise. The abundance of information makes it more difficult to find relevant content in a timely manner. Recommender systems have been developed as an answer to this problem: they aim to provide relevant information to the user. This thesis presents how information can be described, how information retrieval works in general, and how recommender systems work. The operation of recommender systems is based on finding similarities between items of content and between users; by combining these, recommender systems try to estimate what kind of information would be most suitable for the user at a specific moment. In the research section, a content-based recommender was built for a website. The aim of the research was to find out whether it is possible to extract keywords and create recommendations with the method chosen for the research. The quality of the extracted keywords was measured with a survey, and the recommender system as a whole was piloted on a website. Based on the results, keyword extraction and the creation of recommendations were, on average, successful.
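    A compact sketch of the content-based approach described above, using TF-IDF weighting and cosine similarity (the thesis's keyword-extraction method is not specified here, so this is only illustrative):

        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.metrics.pairwise import cosine_similarity

        # Hypothetical page texts from a website.
        pages = [
            "electric cars and charging stations",
            "charging an electric car at home",
            "recipes for a quick weeknight dinner",
        ]

        # TF-IDF turns each page into a weighted term vector.
        vectors = TfidfVectorizer(stop_words="english").fit_transform(pages)

        # Recommend the page most similar to page 0, excluding itself.
        similarities = cosine_similarity(vectors[0], vectors).ravel()
        similarities[0] = -1.0
        print("recommend page:", similarities.argmax())  # 1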
  • Rodriguez Villanueva, Cesar Adolfo (2019)
    Spam detection techniques have made our lives easier by unclogging our inboxes and keeping unsafe messages from being opened. With the automation of text messaging solutions and the increase in telecommunication companies and message providers, the volume of text messages has been on the rise, and with this growth came malicious traffic that users had little control over. In this thesis, we present an implementation of a spam detection system in a real-world text messaging platform. Using well-established machine learning algorithms, we make an in-depth analysis of the performance of the models on two different datasets: one publicly available (N=5,574) and the other gathered from actual traffic of the platform (N=1,477). Using the empirical results, we outline the models and hyperparameters that can be used in the platform and the scenarios in which they produce optimal performance. The results indicate that our dataset poses a significant challenge for accurate classification, most likely due to the small sample size and class imbalance, along with other nuances in the data. Nevertheless, some models were found to have good all-around performance, and they can be trained and used in the platform.
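    A baseline version of the kind of model compared in the thesis can be sketched with scikit-learn (the datasets, feature choices and tuned hyperparameters of the thesis are not reproduced here):

        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.naive_bayes import MultinomialNB
        from sklearn.pipeline import make_pipeline

        # Tiny hypothetical sample; the thesis trains on far larger datasets.
        texts = ["Win a free prize now!!!", "Are we still meeting at noon?",
                 "URGENT: claim your reward", "See you at the office tomorrow"]
        labels = ["spam", "ham", "spam", "ham"]

        # Bag-of-words features weighted by TF-IDF, fed to a Naive Bayes classifier.
        model = make_pipeline(TfidfVectorizer(), MultinomialNB())
        model.fit(texts, labels)

        print(model.predict(["Claim your free reward today"]))  # likely ['spam']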