
Browsing by discipline "Tietojenkäsittelytiede" (Computer Science)


  • Nietosvaara, Joonas (2019)
    We examine a previously known sublinear-time algorithm for approximating the length of a string's optimal (i.e. shortest) Lempel-Ziv parsing (a.k.a. LZ77 factorization). This length is a measure of compressibility under the LZ77 compression algorithm, so the algorithm also estimates a string's compressibility. The algorithm's approximation approach is based on a connection between optimal Lempel-Ziv parsing length and the number of distinct substrings of different lengths in a string. Some aspects of the algorithm are described more explicitly than in earlier work, including the constraints on its input and how to distinguish between strings with short vs. long optimal parsings in sublinear time; several proofs (and pseudocode listings) are also more detailed than in earlier work. An implementation of the algorithm is provided. We experimentally investigate the algorithm's practical usefulness for estimating the compressibility of large collections of data. The algorithm is run on real-world data under a wide range of approximation parameter settings. The accuracy of the resulting estimates is evaluated. The estimates turn out to be consistently highly inaccurate, albeit always inside the stated probabilistic error bounds. We conclude that the algorithm is not promising as a practical tool for estimating compressibility. We also examine the empirical connection between optimal parsing length and the number of distinct substrings of different lengths. The latter turns out to be a surprisingly accurate predictor of the former within our test data, which suggests avenues for future work.
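The two quantities compared in the abstract above can be illustrated with a naive, quadratic-time sketch (my own illustration, not the thesis' sublinear algorithm): the number of phrases in a greedy LZ77 parsing versus the number of distinct substrings of each length.

```python
# Naive illustration of the quantities discussed above; not the sublinear
# approximation algorithm studied in the thesis.

def lz77_parse_length(s: str) -> int:
    """Number of phrases in the optimal (greedy) LZ77 parsing of s."""
    n, i, phrases = len(s), 0, 0
    while i < n:
        longest = 0
        # Longest prefix of s[i:] that also occurs starting before position i
        # (overlapping the current position is allowed, as in LZ77).
        for length in range(1, n - i + 1):
            if s.find(s[i:i + length], 0, i + length - 1) != -1:
                longest = length
            else:
                break
        i += max(longest, 1)   # emit a literal character if no earlier match
        phrases += 1
    return phrases

def distinct_substrings(s: str, length: int) -> int:
    """Number of distinct substrings of the given length in s."""
    return len({s[i:i + length] for i in range(len(s) - length + 1)})

if __name__ == "__main__":
    for text in ["abababab", "abcdefgh", "aaaaaaaa"]:
        counts = [distinct_substrings(text, k) for k in range(1, 5)]
        print(text, lz77_parse_length(text), counts)
```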
  • Pennanen, Teppo (2015)
    This thesis is a study of Lean Startup metrics. It asks what is measured in Lean Startup and how, how this differs from other software measurement, and what information needs today's start-ups have. The study comprises a literature review and an empirical survey of real start-ups. It explains how software measurement has changed over the years and what kinds of metrics Lean Startup suggests should be used and why, and it shows differences in the use of measurement between traditional start-ups and Lean Startups. The study suggests reasons and motivations for using measurement in start-ups and gives examples of when not to. Within the scope of this study, a survey with questionnaires and interviews was conducted. It showed distinctly different attitudes towards measurement between traditional start-up entrepreneurs and those who like to call themselves Lean Startup entrepreneurs. Measurement in Lean Startup is not an end in itself, but a useful tool for obtaining feedback on an entrepreneur's gut feelings. Metrics, when meaningful and correct, communicate the focus within a start-up and objectively evaluate the business's success.
  • Islam, Hasan Mahmood Aminul (2013)
    The Web has introduced a more distributed and collaborative form of communication, where the browser and the user replace the web server as the nexus of communications: after call establishment through web servers, communication proceeds directly between browsers in peer-to-peer fashion without intervention of the web servers. The goal of the Real Time Collaboration on the World Wide Web (RTCWeb) project is to allow browsers to natively support voice, video, and gaming in interactive peer-to-peer communications and real-time data collaboration. Several transport protocols such as TCP, UDP, RTP, SRTP, SCTP, and DCCP exist for communicating media and non-media data, but no single protocol alone can meet all the requirements of RTCWeb. Moreover, deploying a new transport protocol runs into problems traversing middleboxes such as Network Address Translation (NAT) boxes and firewalls. Furthermore, the very first versions of the RTCWeb non-media data transport do not include any congestion control on the end-points. With media (i.e., audio, video), the amount of traffic can be determined and limited by the codec and profile used during communication, whereas an RTCWeb user could generate enough non-media data to congest the network. Therefore, a suitable transport protocol stack is required that provides congestion control, a NAT traversal solution, and authentication, integrity, and privacy of user data. This master's thesis analyses the transport protocol stack for the RTCWeb data channel and selects the Stream Control Transmission Protocol (SCTP), a reliable, message-oriented general-purpose transport layer protocol operating on top of both IPv4 and IPv6, providing congestion control similar to TCP and, additionally, new functionality regarding security, multihoming, multistreaming, mobility, and partial reliability. However, because SCTP is not universally available in operating systems, the SCTP userland implementation is used. WebKit is an open source web browser engine for rendering web pages used by Safari, Dashboard, Mail, and many other OS X applications. In the WebKit RTCWeb implementation using the GStreamer multimedia framework, RTP/UDP is used for media data and UDP tunnelling for non-media data. Therefore, to allow a smooth integration within WebKit, we have decided to implement GStreamer plugins using the SCTP userland stack. This thesis also investigates how Mozilla has integrated those protocols into the browser's network stack and how the Data Channel has been designed and implemented using the SCTP userland stack.
  • Peltonen, Ella (2013)
    Cloud computing offers important resources, performance, and services now that it has become popular to collect, store, and analyze large data sets. This thesis builds on the Berkeley Data Analysis Stack (BDAS), a cloud computing environment designed for Big Data handling and analysis. Two parts of BDAS in particular are introduced: the cluster resource manager Mesos and the distributed processing framework Spark. They offer features important for cloud computing, such as efficiency, multi-tenancy, and fault tolerance. The Spark system extends MapReduce, the well-known cloud computing paradigm. Machine learning algorithms can predict trends and anomalies in large data sets. This thesis presents one of them, a distributed decision tree algorithm, implemented on the Spark system. As an example case, the decision tree is applied to the versatile energy consumption data collected by the Carat project from mobile devices such as smartphones and tablets. The data consists of information about the usage of the device, for example which applications have been running, network connections, battery temperatures, and screen brightness. The decision tree aims to find chains of data features that might lead to energy consumption anomalies. Results of the analysis can be used to advise users on how to improve their battery life. This thesis presents selected analysis results together with advantages and disadvantages of the decision tree analysis.
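As a rough sketch of the kind of pipeline described in the abstract above (the column names, input path, and label are hypothetical placeholders, not the actual Carat schema), a decision tree can be trained on Spark along these lines:

```python
# Hypothetical sketch of a Spark decision tree on device-usage features;
# feature names and input path are illustrative, not the Carat data model.
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import DecisionTreeClassifier

spark = SparkSession.builder.appName("energy-anomaly-tree").getOrCreate()

# Each row: one device sample with usage features and an anomaly label.
df = spark.read.parquet("carat_samples.parquet")  # assumed input

features = ["cpu_usage", "screen_brightness", "battery_temperature",
            "network_connections", "running_apps"]
assembled = VectorAssembler(inputCols=features, outputCol="features").transform(df)

train, test = assembled.randomSplit([0.8, 0.2], seed=42)
tree = DecisionTreeClassifier(labelCol="energy_anomaly", featuresCol="features",
                              maxDepth=5)
model = tree.fit(train)

# Chains of features leading to anomalies can be read off the learned tree.
print(model.toDebugString)
```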
  • Niemistö, Juho (2014)
    Android, developed by Google, has in recent years become the mobile operating system with the largest market share. Anyone can develop applications for Android, and the tools needed for development are freely available; over a million different applications have already been developed. Application quality is especially important on the Android platform, where competition is plentiful and application prices are so low that they pose no barrier to switching to another application. The application store is also always available directly from the device. This places challenges on application testing as well: applications should reach the store quickly, but their quality should also be good, so testing tools need to be easy to use and efficient. Numerous testing tools have indeed been developed for Android in addition to Google's own tools. This thesis examines the structure of Android applications, their testing, and automated testing tools that run on the Android platform, focusing in particular on unit testing and functional testing tools. Among unit testing tools, Android's own unit testing framework is compared with Robolectric; among functional testing tools, Uiautomator, Robotium, and Troyd are compared.
  • Pulliainen, Laur (2018)
    Software defect prediction is the process of improving the software testing process by identifying likely defects in the software. It is accomplished by using supervised machine learning with software metrics and defect data as variables. While the theory behind software defect prediction has been validated in previous studies, it has not been widely implemented in practice. In this thesis, a software defect prediction framework is implemented for improving testing process resource allocation and software release time optimization at RELEX Solutions. For this purpose, code and change metrics are collected from the RELEX software. The metrics are selected based on how frequently they are used in other software defect prediction studies and on their availability in metric collection tools. In addition to metric data, defect data is collected from the issue tracker. A framework for classifying the collected data is then implemented and experimented on. The framework leverages existing machine learning algorithm libraries to provide classification functionality, using classifiers that have been found to perform well in similar software defect prediction experiments. The classification results are validated using commonly used classifier performance metrics, and the suitability of the predictions is additionally verified from a use case point of view. Software defect prediction is found to work in practice, with the implementation achieving results comparable to other similar studies when measured by classifier performance metrics. When validated against the defined use cases, the performance is found acceptable, although it varies between data sets. It is thus concluded that while the results are tentatively positive, further monitoring with future software versions is needed to verify the performance and reliability of the framework.
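A hedged sketch of the classification step described above, using a generic machine learning library (the metric names, input file, and classifier are placeholders; the thesis does not specify which libraries or classifiers RELEX used):

```python
# Illustrative defect-prediction classification; feature names and data are
# hypothetical placeholders, not the RELEX metrics described in the thesis.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import precision_score, recall_score, roc_auc_score

# One row per file/module: code and change metrics plus a defect label
# joined from the issue tracker.
data = pd.read_csv("metrics_with_defects.csv")        # assumed input
X = data[["loc", "cyclomatic_complexity", "churn", "num_authors"]]
y = data["defective"]                                  # 1 = defect reported

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(X_train, y_train)
pred = clf.predict(X_test)

# Commonly used classifier performance metrics for defect prediction.
print("precision:", precision_score(y_test, pred))
print("recall:   ", recall_score(y_test, pred))
print("ROC AUC:  ", roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1]))
```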
  • Enberg, Pekka (2016)
    Hypervisors and containers are the two main virtualization techniques that enable cloud computing. Both techniques have performance overheads on CPU, memory, networking, and disk performance compared to bare metal. Unikernels have recently been proposed as an optimization for hypervisor-based virtualization to reduce these overheads. In this thesis, we evaluate network I/O performance overheads for hypervisor-based virtualization using the Kernel-based Virtual Machine (KVM) and the OSv unikernel, and for container-based virtualization using Docker, comparing different configurations and optimizations. We measure raw networking latency, throughput, and CPU utilization using the Netperf benchmarking tool, and measure network-intensive application performance using the Memcached key-value store and the Mutilate benchmarking tool. We show that compared to bare metal Linux, Docker with bridged networking has the least performance overhead, with OSv using vhost-net coming a close second.
  • Ibbad, Hafeez (2016)
    The number of devices connected to the Internet is growing exponentially. These devices include smartphones, tablets, workstations, and Internet of Things devices, which offer cost and time savings by automating routine tasks for their users. However, these devices also introduce a number of security and privacy concerns. They are connected to small office/home office (SOHO) and enterprise networks, where users have little to no information about the threats associated with these devices or about how the devices can be managed properly to ensure the user's privacy and data security. We propose a new platform to automate the security and management of the networks providing connectivity to billions of connected devices. Our platform is a low-cost, scalable, and easy-to-deploy system that provides network security and management features as a service. It consists of two main components: the Securebox and the Security and Management Service (SMS). Securebox is a newly designed OpenFlow-enabled gateway residing in edge networks and is responsible for enforcing the security and management decisions provided by the SMS. The SMS runs a number of traffic analysis services to analyze user traffic on demand for botnet, spamnet, and malware detection. The SMS also supports deploying software-based middleboxes on demand for analyzing user traffic in an isolated environment, and it handles configuration updates, load balancing, and scalability of these middlebox deployments as well. In contrast to the current state of the art, the proposed platform offloads security and management tasks to an external entity, providing a number of advantages in terms of deployment, management, configuration updates, and device security. We have tested this platform in real-world scenarios. Evaluation results show that the platform can be efficiently deployed in traditional networks in an incremental manner, and that a similar user experience can be achieved with security features embedded in the connectivity.
  • Schneider, Jenna (2017)
    Missing user needs and requirements often lead to sub-optimal software products that users find difficult to use. Software development approaches that follow the user-centered design paradigm try to overcome this problem by focusing on the needs and goals of end users throughout the process life cycle. The purpose of this thesis is to examine how three different user-centered design methodologies approach the development of user requirements and determine whether the ideal requirements development processes described in these methodologies are applicable to the practice of the software industry. The results of this investigation are finally used as a guideline for defining a high-level requirements development framework for the IT department of a large multinational company.
  • Wang, Ziran (2013)
    This thesis considers the problem of finding a process that, given a collection of news, can detect significant dates and breaking news related to different themes. The themes are learned in an unsupervised manner from training corpora, and they mostly have intuitive meanings, like 'finance', 'disaster', 'wars' and so on. They are constructed only from the textual information in the corpora, without any human intervention. For this learning the thesis uses several types of component models, specifically Latent Dirichlet Allocation (LDA) and the Correlated Topic Model (CTM); to enrich the experiments, Latent Semantic Indexing (LSI) and Multinomial Principal Component Analysis (MPCA) are also adopted for comparison. The learning assigns every news item a relevance weight for each theme, which can be viewed as a theme distribution from a statistical perspective. With the help of the news time-stamp information, one can sum and normalize these distributions over all news items per day, and then plot the accumulated relevance weight of each theme over time. It is natural to treat these curves as describing the strength of media attention paid to different themes, and one can assume that behind every peak there are striking events whose associated news can be detected. This thesis is valuable for Media Studies research, and it can further be connected to stock or currency markets to create real value.
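A small sketch of the daily aggregation step described above (the topic models themselves are omitted; the per-document theme weights, dates, and theme names are assumed placeholders, not the thesis' actual corpus):

```python
# Illustrative aggregation of per-document theme weights into daily curves;
# the input columns are assumptions, not the thesis' corpus format.
import pandas as pd

# Each row: one news item with a timestamp and a relevance weight per theme
# (as produced by a topic model such as LDA).
docs = pd.DataFrame({
    "date":     pd.to_datetime(["2013-01-01", "2013-01-01", "2013-01-02"]),
    "finance":  [0.7, 0.1, 0.4],
    "disaster": [0.1, 0.8, 0.3],
    "wars":     [0.2, 0.1, 0.3],
})

# Sum weights over all news items per day, then normalize each day to 1,
# giving the share of daily media attention received by each theme.
daily = docs.groupby("date").sum()
daily = daily.div(daily.sum(axis=1), axis=0)

# Peaks in a theme's curve suggest striking events on those dates.
print(daily)
```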
  • Longi, Joonas (2020)
    Understanding users' needs and delivering solutions to them is a demanding task that is often based on guesses. Data can be a capable tool for making those guesses more educated and, more importantly, for validating them. Developing software is expensive, and doing so based on experiences or opinions poses a large monetary risk. Continuous experimentation introduces an approach where data is used in a systematic manner to reduce these development risks by constantly validating hypotheses, providing crucial knowledge about whether the innovation is on the right path. There are some existing paths in the form of experimentation models, but implementing one and adjusting it to fit a specific environment may be difficult. This thesis presents a case study on a mobile application and its journey towards using data in the decision-making process. We examine whether the existing set of written and event data can be utilized and what its limitations are. The data reveals that there are multiple lessons to be learned from it. We then look at how to take a more systematic approach and apply continuous experimentation practices in the context of the application. Some initial steps are presented, along with an experimentation road map and further experiments. We conclude that the key to initializing continuous experimentation practices is to start small and gradually build the knowledge of the team.
  • Harkonsalo, Olli-Pekka (2018)
    This systematic literature review investigated what the decision-making process used for architecturally significant design decisions looks like in practice, which factors influence design decisions, and how architects' rational decision-making process can be supported. The review found that architects make their decisions at least mostly rationally and appeared to benefit from doing so. Architects did not favour the use of systematic decision-making or documentation methods. The architects' level of experience affected the decision-making process: less experienced architects made their decisions less rationally (and presumably also less successfully) than more experienced ones. Architects' own experiences and beliefs emerged as an important factor influencing decisions. In addition to these and various requirements and constraints, different context-related factors also influenced decision-making, including who actually makes the design decisions and how. The review found that most design decisions are made in groups rather than by a single architect. Group decision-making most often took place such that the architect was prepared to make the final decision but was also willing to take others' opinions into account. Group decision-making involved both benefits and challenges. The review also found that, especially for less experienced architects, the rational decision-making process could be supported comprehensively by using documentation methods intended for recording architecturally significant design decisions and their rationale. One may speculate that these methods would also benefit more experienced architects, although they can be suspected of avoiding them due to, among other things, their heaviness. On the other hand, a more rational decision-making process could also be supported by encouraging architects in various ways to use different reasoning techniques, which would be a lighter alternative to documentation methods, although in that case the other benefits of documentation methods would be given up as a trade-off. ACM Computing Classification System (CCS): • Software and its engineering~Software architectures • Software and its engineering~Software design engineering
  • Nikunlassi, Arvi (2013)
    As the Internet and information technology have become ubiquitous, software development increasingly emphasizes collaboration and interaction. There has been a shift from plan-driven approaches towards more agile methods, in which the importance of change and communication is recognized. The customer and the customer relationship are a very significant component in the success of a software project. In modern agile software development teams, the customer's representative is in close contact with the developers through meetings or other presence. The customer's close involvement in development is a way to make development more effective and to produce more satisfying products. Less attention has, however, been paid to how the implemented customer relationship affects all parties of the project. This thesis examines software development from the perspective of the customer and the customer relationship. First, the basic characteristics of software development are analyzed and the most common agile methods are presented. Then various studies on the customer relationship are presented and their results analyzed. Finally, a summary of the observed problems is drawn up and solutions to them are proposed.
  • Lehtinen, Sampo (2014)
    The aim of this thesis was to examine software quality and the long-term preservation of the value of investments made in software from the customer's point of view, and to present ways of avoiding vendor lock-in. The theoretical framework of the thesis is formed by the second and third chapters, which cover software quality as well as testing and quality assurance; they are based on literature sources. To avoid vendor lock-in, the conflict between the interests of the vendor and the customer must be removed. Concrete ways of steering the software vendor towards thinking of the customer's long-term interest are based on lowering switching costs. My aim has been to write in such a way that the text is easy to read and understand for parties procuring software who do not necessarily have education in the field. I have tried to invent and find easily understandable and descriptive Finnish translations for the field's terminology.
  • Huttunen, Jyri-Petteri (2013)
    The aim of this thesis was to study the behaviour of a modularized, real-time-trained neural network system compared to a similar, non-modular neural network system. A simple game world and various training scenarios were created as a platform for the study, and a modular control network system built on top of the rtNEAT method developed at the University of Texas was implemented. The behaviour of the constructed system was compared to that of a basic rtNEAT system, with particular attention paid to the retention of previously learned behaviour. The results showed that there was no significant difference in performance between the systems constructed for the thesis, most likely because of the simplicity of the game world used as the test environment. If the systems do differ significantly, for example in recall of learned behaviour, further research would be required to bring these differences out.
  • Savolainen, Sakari (2013)
    People in organizations and communities possess IT expertise. It can be seen as a resource that peers in the organization need. Allocating expertise to those who need it is implemented with an allocation mechanism that knows the resources and offers users a way to request the help they need. Educational institutions and other organizations use peer support systems that allocate peers' expertise to those seeking help. Resources should be allocated quickly and efficiently to the person in need. There are many resource allocation mechanisms in environments and organizations that make use of information technology. The properties and principles of allocation mechanisms vary, but five phases can be identified in allocation: defining the need for help, identifying resources, selecting resources, allocating resources, and using resources. In peer support allocation, expertise is the principal resource considered in this work, but other resources can also be allocated, for example learning materials. The I-Help (Intelligent Helpdesk) system used in the university world is a large and complex system that allocates peer help between students. I-Help has been chosen as the example application because of its highly developed allocation capabilities. I-Help and the properties of allocation mechanisms in general form the background for evaluating the allocation capabilities of the self-designed Apu application. Large systems have the advantages of versatility and allocation accuracy, while small systems are inexpensive and easy to learn. The weaknesses of large systems are high cost, heavy maintenance, and complexity, which hampers, among other things, learnability; the weakness of a small system can be inaccurate allocation of expertise. The allocation mechanism of the self-developed Apu application and its properties are evaluated. Once a critical mass is reached, the mechanism finds helpers well if the helpers' competences are evenly distributed. Good expertise allocation results can thus also be achieved with a small system.
  • Pagels, Max (2013)
    Productivity is an important aspect of any software development project as it has direct implications on both the cost of software and the time taken to produce it. Though software development as a field has evolved significantly during the last few decades in terms of development processes, best practices and the emphasis thereon, the way in which the productivity of software developers is measured has remained comparatively stagnant. Some established metrics focus on a sole activity, such as programming, which paints an incomplete picture of productivity given the multitude of different activities that a software project consists of. Others are more process-oriented — purporting to measure all types of development activities — but require the use of estimation, a technique that is both time-consuming and prone to inaccuracy. A metric that is comprehensive, accurate and suitable in today's development landscape is needed. In this thesis, we examine productivity measurement in software engineering from both theoretical and pragmatic perspectives in order to determine if a proposed metric, implicitly estimated velocity, could be a viable alternative for productivity measurement in Agile and Lean software teams. First, the theory behind measurement — terminology, data types and levels of measurement — is presented. The definition of the term productivity is then examined from a software engineering perspective. Based on this definition and the IEEE standard for validating software quality metrics, a set of criteria for validating productivity metrics is proposed. The motivations for measuring productivity and the factors that may impact it are then discussed and the benefits and drawbacks of established metrics — chief amongst which is productivity based on lines of code written — explored. To assess the accuracy and overall viability of implicitly estimated velocity, a case study comparing the metric to LoC-based productivity measurement was carried out at the University of Helsinki's Software Factory. Two development projects were studied, both adopting Agile and Lean methodologies. Following a linear-analytical approach, quantitative data from both project artefacts and developer surveys indicated that implicitly estimated velocity is a metric more valid than LoC-based measurement in situations where the overall productivity of an individual or team is of more importance than programming productivity. In addition, implicitly estimated velocity was found to be more consistent and predictable than LoC-based measurement in most configurations, lending credence to the theory that implicitly estimated velocity can indeed replace LoC-based measurement in Agile and Lean software development environments.
  • Hamberg, Jiri (2018)
    Sophisticated mobile devices have rapidly become essential tools for various daily activities of billions of people worldwide, and the demand for longer battery life is constantly increasing. The Carat project advances the understanding of mobile energy consumption by using collaborative mobile data to estimate and model the energy consumption of mobile devices. This thesis presents a method for estimating mobile application energy consumption from mobile device system settings and context factors using association rules. These settings and factors include CPU usage, device travel distance, battery temperature, battery voltage, screen brightness, used mobile networking technology, network type, WiFi signal strength, and WiFi connection speed. The association rules are mined with the Apache Spark cluster-computing framework from collaborative mobile data collected by the Carat project. Additionally, this thesis presents a prototype of a web-based API for discovering these association rules. The web service integrates an Apache Spark based analysis engine with a user-friendly front-end, allowing an aggregated view of the dataset to be accessed without revealing data of individual participants of the Carat project. This thesis shows that association rules can be used effectively in modelling mobile device energy consumption. Example rules are presented and the performance of the implementation is evaluated experimentally.
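A hedged sketch of mining such association rules with Spark's built-in FP-growth implementation (the item encoding, thresholds, and sample data are illustrative assumptions; the thesis' actual pipeline and the Carat schema are not reproduced here):

```python
# Illustrative association-rule mining over discretized device settings;
# the items and thresholds are hypothetical, not the Carat data model.
from pyspark.sql import SparkSession
from pyspark.ml.fpm import FPGrowth

spark = SparkSession.builder.appName("carat-association-rules").getOrCreate()

# Each row: one sample discretized into categorical items, e.g.
# "cpu=high", "screen=bright", "net=wifi", plus an energy-rate bucket.
samples = spark.createDataFrame([
    (0, ["cpu=high", "screen=bright", "net=wifi", "energy=high"]),
    (1, ["cpu=low",  "screen=dim",    "net=lte",  "energy=low"]),
    (2, ["cpu=high", "screen=bright", "net=lte",  "energy=high"]),
], ["id", "items"])

fp = FPGrowth(itemsCol="items", minSupport=0.2, minConfidence=0.6)
model = fp.fit(samples)

# Rules of the form {settings...} => {energy=high} point to setting and
# context combinations that co-occur with high energy consumption.
model.associationRules.show(truncate=False)
```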
  • Pyykkö, Joel (2014)
    In this thesis, we describe Forward Sparse Sampling Search (FSSS), an algorithm published in 2010 by Walsh et al. that combines model-based reinforcement learning with sample-based planning. We show how it can be applied to an appropriate set of problems, and we extend the original tests to give a better view of how the parameters of the algorithm work and to further the understanding of the method. First, we introduce the concept of reinforcement learning and identify key environments and points of interest where FSSS is applicable. Next, we explain the terminology and the relevant theories the method is based on. The aim is to introduce the reader to a powerful tool for control problems and to show where to apply it and how to parameterize it. After reading this thesis, one is hopefully equipped to deal with the basic setup and usage of FSSS. In the final sections of the thesis, we report a series of tests which demonstrate how FSSS works in one particular environment, the Paint/Polish world. The tests focus on understanding the effects of the various parameters the method uses, yielding further understanding of how to apply it effectively, analyzing its performance, and comparing it to more basic algorithms in the field. The principal theories and proofs are explained, and possible paths to improve the algorithm are explored.
  • Nygren, Saara (2020)
    A relational database management system's configuration is essential when optimizing database performance. Finding the optimal knob configuration for the database requires tuning multiple interdependent knobs. Over the past few years, relational database vendors have added machine learning models to their products, and Oracle announced the first autonomous (i.e. self-driving) database in 2017. This thesis clarifies the autonomous database concept and surveys the latest research on machine learning methods for relational database knob tuning. The study aimed to find solutions that can tune multiple database knobs and be applied to any relational database. The survey found three machine learning implementations that tune multiple knobs at a time: OtterTune, CDBTune, and QTune. OtterTune uses traditional machine learning techniques, while CDBTune and QTune rely on deep reinforcement learning. These implementations are presented in this thesis, along with a discussion of the features they offer. The thesis also presents basic concepts of autonomic systems, such as self-CHOP and the MAPE-K feedback loop, and a knowledge model defining the knowledge needed to implement them. These can be used in the autonomous database context, along with Intelligent Machine Design and the Five Levels of AI-Native Database, to present requirements for an autonomous database.
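As a rough illustration of the MAPE-K feedback loop mentioned above (a generic sketch; the monitored metric, knob, thresholds, and database interface are placeholders, not OtterTune, CDBTune, or QTune internals):

```python
# Minimal, generic MAPE-K loop sketch for knob tuning; db.measure_latency and
# db.apply_config are hypothetical interfaces to the managed database.
from dataclasses import dataclass, field

@dataclass
class Knowledge:
    history: list = field(default_factory=list)    # shared knowledge base

def monitor(db) -> dict:
    return {"latency_ms": db.measure_latency()}    # collect symptoms

def analyze(metrics: dict, k: Knowledge) -> bool:
    k.history.append(metrics)
    return metrics["latency_ms"] > 100             # is adaptation needed?

def plan(k: Knowledge) -> dict:
    return {"shared_buffers_mb": 1024}             # choose new knob values

def execute(db, change: dict) -> None:
    db.apply_config(change)                        # act on the managed DB

def mape_k_iteration(db, k: Knowledge) -> None:
    metrics = monitor(db)
    if analyze(metrics, k):
        execute(db, plan(k))
```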