Browsing by discipline "Datavetenskap"

  • Wang, Ziran (2013)
    This thesis considers the problem of finding a process that, given a collection of news articles, can detect significant dates and breaking news related to different themes. The themes are learned from training corpora in an unsupervised fashion, and they mostly have intuitive meanings such as 'finance', 'disaster' or 'wars'. They are constructed solely from the textual information in the corpora, without any human intervention. For this learning, the thesis uses various component models, specifically Latent Dirichlet Allocation (LDA) and the Correlated Topic Model (CTM); to enrich the experiments, Latent Semantic Indexing (LSA) and Multinomial Principal Component Analysis (MPCA) are also adopted for comparison. The learning assigns every news article a relevance weight for each theme, which can be viewed as a theme distribution from a statistical perspective. With the help of the news time-stamps, these distributions can be summed and normalized per day, and the accumulated relevance weight of each theme can then be plotted over the timeline. It is natural to interpret these curves as the strength of the attention the media pays to different themes, and to assume that behind every peak there are striking events whose associated news articles can be detected. This thesis is valuable for Media Studies research, and it could further be connected to stock or currency markets to create real value.
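    The aggregation step described above is easy to picture in code. The sketch below is not the thesis implementation: it fits scikit-learn's LDA on a tiny invented corpus and sums the per-article theme distributions per day; the corpus, the theme count and the library choice are all illustrative assumptions.

    ```python
    import pandas as pd
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.decomposition import LatentDirichletAllocation

    # Toy corpus: (publication date, text) pairs standing in for a news archive.
    news = [
        ("2013-01-01", "stocks bank market crash finance"),
        ("2013-01-01", "earthquake disaster rescue flood damage"),
        ("2013-01-02", "bank finance interest rates market growth"),
        ("2013-01-02", "war troops conflict border attack"),
    ]
    dates, texts = zip(*news)

    # Bag-of-words counts, then unsupervised theme learning with LDA.
    counts = CountVectorizer().fit_transform(texts)
    lda = LatentDirichletAllocation(n_components=3, random_state=0)
    theme_weights = lda.fit_transform(counts)  # one theme distribution per article

    # Sum and normalize the distributions per day; each column then traces the
    # accumulated media attention paid to one theme along the timeline.
    daily = (pd.DataFrame(theme_weights, index=pd.to_datetime(dates))
               .groupby(level=0).sum())
    daily = daily.div(daily.sum(axis=1), axis=0)
    print(daily)  # peaks in a column hint at striking events for that theme
    ```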
  • Harkonsalo, Olli-Pekka (2018)
    This systematic literature review examined what the decision-making process used for making architecturally significant design decisions looks like in practice, which factors influence design decisions, and how architects' rational decision-making process can be supported. The review found that architects make their decisions at least mostly rationally, and they appeared to benefit from doing so. Architects also did not favour the use of systematic decision-making or documentation methods. The architects' level of experience affected the decision-making process: less experienced architects made their decisions less rationally (and presumably also less successfully) than more experienced ones. Architects' own experiences and beliefs emerged as an important factor influencing decisions. In addition to these and to various requirements and constraints, several context-related factors also emerged as influences on decision-making, among them the question of who actually makes the design decisions and how. The review found that most design decisions are made in groups rather than by a single architect. Group decision-making most often took the form of the architect being prepared to make the final decision while also being willing to take others' opinions into account; it involved both benefits and challenges. The review also found that the rational decision-making process of less experienced architects in particular could be supported comprehensively through documentation methods intended for recording architecturally significant design decisions and their rationale. One may speculate that these would also benefit more experienced architects, although they can be suspected of avoiding them because of, among other things, their heaviness. On the other hand, a more rational decision-making process could also be supported by encouraging architects in various ways to use different reasoning techniques, a lighter alternative to documentation methods, although as a trade-off the other benefits of documentation methods would then be lost. ACM Computing Classification System (CCS): • Software and its engineering~Software architectures • Software and its engineering~Software design engineering
  • Nikunlassi, Arvi (2013)
    As the Internet and information technology have become widespread, software development places an ever greater emphasis on collaboration and interaction. Plan-driven development has given way to more agile methods that recognize the importance of change and communication. The customer and the customer relationship are a highly significant component in the success of a software project. In modern agile development teams, the customer's representative stays in close contact with the developers through meetings or other forms of presence. Close customer involvement is a way to make development more effective and to obtain more satisfactory products. Less attention has been paid, however, to how the realized customer relationship affects all parties of the project. This thesis examines software development from the perspective of the customer and the customer relationship. First, the basic characteristics of software development are analyzed and the most common agile methods are introduced. Various studies on the customer relationship are then presented and their results analyzed. Finally, the observed problems are summarized and solutions to them are proposed.
  • Lehtinen, Sampo (2014)
    The goal of this thesis was to examine software quality and the long-term preservation of the value of a software investment from the customer's point of view, and to present ways to avoid vendor lock-in. The theoretical framework is formed by the second and third chapters, which cover software quality as well as testing and quality assurance and are based on literature sources. To avoid vendor lock-in, the conflict between the vendor's and the customer's interests must be removed. The concrete means of steering a software vendor towards the customer's long-term interest are based on lowering switching costs. My aim has been to write in such a way that the thesis is easy to read and understand for parties procuring software who do not necessarily have formal training in the field, and I have sought to find or coin easily understandable, descriptive Finnish translations for the field's terminology.
  • Huttunen, Jyri-Petteri (2013)
    The goal of this thesis was to study the behaviour of a modular, real-time-trained neural network system compared with a similar non-modular system. A simple game world and various training scenarios were created as the research platform, and a modular control network system was implemented on top of the rtNEAT method developed at the University of Texas. The behaviour of the constructed system was compared with that of a plain rtNEAT system, paying particular attention to the retention of previously learned behaviour. The study found no significant difference in performance between the systems constructed for the thesis, most likely because of the simplicity of the game world used as the test environment. If the systems do differ significantly, for example in the recall of learned behaviour, further research would be required to reveal such differences.
  • Savolainen, Sakari (2013)
    People in organizations and communities possess IT expertise, which can be seen as a resource that their peers need. Expertise is matched to those who need it through an allocation mechanism that knows the resources and offers users a way to request help. Educational institutions and other organizations use peer support systems that allocate peer expertise to help seekers; the resources should be allocated to those who need them quickly and efficiently. Environments and organizations that make use of information technology have plenty of resource allocation mechanisms. Their properties and principles vary, but five phases of allocation can be identified: defining the need for help, identifying resources, selecting resources, allocating resources, and using resources. In this thesis expertise is the central resource to be allocated in peer support, but other resources, such as learning materials, can be allocated as well. I-Help (Intelligent Helpdesk), used in the university world, is a large and complex system that allocates peer help among students; it was chosen as the example application because of its highly developed allocation features. I-Help and the properties of allocation mechanisms in general serve as the background for evaluating the allocation features of the self-designed Apu application. Large systems have the advantages of versatility and allocation accuracy, while small systems are inexpensive and easy to learn. The weaknesses of large systems are cost, heavy maintenance, and a complexity that hampers, among other things, learnability; the weakness of a small system can be inaccurate allocation of expertise. The allocation mechanism of the self-developed Apu application and its properties are evaluated. Once it reaches a critical mass, the mechanism finds helpers well, provided the helpers' competences are evenly distributed. Good expertise allocation results can thus be achieved with a small system as well.
  • Pagels, Max (2013)
    Productivity is an important aspect of any software development project as it has direct implications on both the cost of software and the time taken to produce it. Though software development as a field has evolved significantly during the last few decades in terms of development processes, best practices and the emphasis thereon, the way in which the productivity of software developers is measured has remained comparatively stagnant. Some established metrics focus on a sole activity, such as programming, which paints an incomplete picture of productivity given the multitude of different activities that a software project consists of. Others are more process-oriented — purporting to measure all types of development activities — but require the use of estimation, a technique that is both time-consuming and prone to inaccuracy. A metric that is comprehensive, accurate and suitable in today's development landscape is needed. In this thesis, we examine productivity measurement in software engineering from both theoretical and pragmatic perspectives in order to determine if a proposed metric, implicitly estimated velocity, could be a viable alternative for productivity measurement in Agile and Lean software teams. First, the theory behind measurement — terminology, data types and levels of measurement — is presented. The definition of the term productivity is then examined from a software engineering perspective. Based on this definition and the IEEE standard for validating software quality metrics, a set of criteria for validating productivity metrics is proposed. The motivations for measuring productivity and the factors that may impact it are then discussed and the benefits and drawbacks of established metrics — chief amongst which is productivity based on lines of code written — explored. To assess the accuracy and overall viability of implicitly estimated velocity, a case study comparing the metric to LoC-based productivity measurement was carried out at the University of Helsinki's Software Factory. Two development projects were studied, both adopting Agile and Lean methodologies. Following a linear-analytical approach, quantitative data from both project artefacts and developer surveys indicated that implicitly estimated velocity is a metric more valid than LoC-based measurement in situations where the overall productivity of an individual or team is of more importance than programming productivity. In addition, implicitly estimated velocity was found to be more consistent and predictable than LoC-based measurement in most configurations, lending credence to the theory that implicitly estimated velocity can indeed replace LoC-based measurement in Agile and Lean software development environments.
  • Hamberg, Jiri (2018)
    Sophisticated mobile devices have rapidly become essential tools for various daily activities of billions of people worldwide. Subsequently, the demand for longer battery life is constantly increasing. The Carat project is advancing the understanding of mobile energy consumption by using collaborative mobile data to estimate and model the energy consumption of mobile devices. This thesis presents a method for estimating mobile application energy consumption from mobile device system settings and context factors using association rules. These settings and factors include CPU usage, device travel distance, battery temperature, battery voltage, screen brightness, mobile networking technology, network type, WiFi signal strength, and WiFi connection speed. The association rules are mined with the Apache Spark cluster-computing framework from collaborative mobile data collected by the Carat project. Additionally, the thesis presents a prototype of a web-based API for discovering these association rules. The web service integrates the Apache Spark based analysis engine with a user-friendly front-end, allowing an aggregated view of the dataset to be accessed without revealing data of individual participants of the Carat project. The thesis shows that association rules can be used effectively in modelling mobile device energy consumption. Example rules are presented and the performance of the implementation is evaluated experimentally.
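    As a hedged sketch of the mining step (not the Carat pipeline itself), the snippet below runs Spark's FPGrowth over invented samples in which each row combines discretized device settings with an energy label; the item names and thresholds are assumptions for illustration.

    ```python
    from pyspark.sql import SparkSession
    from pyspark.ml.fpm import FPGrowth

    spark = SparkSession.builder.appName("energy-rules").getOrCreate()

    # Each row is one sample: discretized settings plus an energy label.
    samples = spark.createDataFrame([
        (0, ["cpu=high", "screen=bright", "net=3g", "energy=high"]),
        (1, ["cpu=low", "screen=dim", "net=wifi", "energy=low"]),
        (2, ["cpu=high", "screen=bright", "net=wifi", "energy=high"]),
    ], ["id", "items"])

    # Mine frequent itemsets and derive association rules from them.
    fp = FPGrowth(itemsCol="items", minSupport=0.3, minConfidence=0.6)
    model = fp.fit(samples)

    # Rules of the form {settings...} => {energy=...} with confidence and lift.
    model.associationRules.show(truncate=False)
    spark.stop()
    ```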
  • Pyykkö, Joel (2014)
    In this thesis, we describe Forward Sparse Sampling Search (FSSS), an algorithm published in 2010 by Walsh et al. that combines model-based reinforcement learning with sample-based planning. We show how it can be applied to an appropriate set of problems, and extend the original tests to give a better view of how the algorithm's parameters work and to further the understanding of the method. First, we introduce the concept of reinforcement learning and identify the key environments and cases where FSSS is applicable. Next, we explain the terminology and the relevant theories the method is based on. The aim is to introduce the reader to a powerful tool for control problems and to show where to apply it and how to parameterize it; after reading this thesis, the reader should be equipped to handle the basic setup and usage of FSSS. In the final sections of the thesis, we report a series of tests that demonstrate how FSSS works in one particular environment, the Paint/Polish world. The tests focus on understanding the effects of the various parameters the method uses, yielding further insight into how to apply it effectively, analyzing its performance, and comparing it to more basic algorithms in the field. The principal theories and proofs are explained, and possible paths to improving the algorithm are explored.
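    For orientation, the following sketch shows plain sparse sampling (Kearns et al.), the planning scheme that FSSS refines with value bounds and pruning; it is not FSSS itself, and the chain-world generative model below is invented for illustration.

    ```python
    import random

    def value(sim, state, actions, depth, width, gamma=0.95):
        """Estimate V(state) by sampling `width` successors per action and
        recursing to `depth`; FSSS adds bounds and pruning to this scheme."""
        if depth == 0:
            return 0.0
        best = float("-inf")
        for a in actions:
            total = 0.0
            for _ in range(width):
                nxt, reward = sim(state, a)  # one call to the generative model
                total += reward + gamma * value(sim, nxt, actions,
                                                depth - 1, width, gamma)
            best = max(best, total / width)
        return best

    def q(sim, state, a, actions, depth, width, gamma=0.95):
        """Estimate Q(state, a) at the root by one extra sampling layer."""
        samples = [sim(state, a) for _ in range(width)]
        return sum(r + gamma * value(sim, s2, actions, depth - 1, width, gamma)
                   for s2, r in samples) / width

    # Toy chain world: actions move the state left/right, state 3 pays reward 1.
    def sim(s, a):
        s2 = max(0, min(3, s + a + random.choice([0, 0, -1])))
        return s2, 1.0 if s2 == 3 else 0.0

    print("chosen action:",
          max([-1, 1], key=lambda a: q(sim, 0, a, [-1, 1], depth=3, width=4)))
    ```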
  • Smirnova, Inna (2014)
    There is an increasing need for organizations to collaborate with internal and external partners on a global scale for creating software-based products and services. Many aspects and risks need to be addressed when setting up such global collaborations. Different types of collaborations such as engineering collaborations or innovation-focused collaborations need to be considered. Further aspects such as cultural and social aspects, coordination, infrastructure, organizational change process, and communication issues need to be examined. Although there are already experiences available with respect to setting up global collaborations, they are mainly focusing on certain specific areas. An overall holistic approach that guides companies in systematically setting up global collaborations for software-based products is widely missing. The goal of this thesis is to analyze existing literature and related information and to extract topics that need be taken into account while establishing global software development collaborations - to identify solutions, risks, success factors, strategies, good experiences as well as good examples. This information is structured in a way so that it can be used by companies as a well-grounded holistic approach to guide companies effectively in setting up long-term global collaborations in the domain 'software development'. The presented approach is based on scientific findings reported in literature, driven by industry needs, and confirmed by industry experts. The content of the thesis consists of two main parts: In the first part a literature study analyzes existing experience reports, case studies and other available literature in order to identify what aspects and practices need to be considered by organizations when setting up global collaborations in the domain software development. Secondly, based on the results from the literature review and consultation with the industrial partner Daimler AG, the identified aspects and practices are structured and prioritized in the form of activity roadmaps, which present a holistic guide for setting up global collaborations. The developed guidance worksheet, the so-called 'Global canvas', is meant to be a guide and reminder of all major activities that are necessary to perform when doing global collaborations for software-based products and services. The main contributions of this thesis are an analysis of the state of the practice in setting-up of global software development collaborations, identification of aspects and successful practices that need to be addressed by organizations when doing global collaborations for software-based products and services and the creation of a holistic approach that presents scientific findings to industry in an effective and credible way and guides companies in systematically setting up global collaborations.
  • Thakur, Mukesh (2017)
    Over the past decade, cloud services have enabled individuals and organizations to perform different types of tasks such as online storage, email, and on-demand movies and TV shows. Cloud services have also enabled on-demand deployment of applications at low cost, on elastic, scalable and fault-tolerant systems. These services are offered by cloud providers who use an authentication, authorization and accounting framework based on the client-server model. Though this model has been used for decades, studies show it is vulnerable to different attacks, and it is also inconvenient for end users. In addition, the cloud provider has total control over user data, which it can monitor, trace, leak and even modify at will. Thus, user data ownership, digital identity and the use of cloud services have raised privacy and security concerns for users. In this thesis, Blockchain and its applications are studied, and an alternative model for authentication, authorization and accounting is proposed based on the Ethereum Blockchain. Furthermore, a prototype is developed that enables users to consume cloud services by authenticating, authorizing and accounting with a single identity without sharing any private user data. Experiments are run with the prototype to verify that it works as expected, and measurements are made to assess the feasibility and scalability of the solution. In the final part of the thesis, the pros and cons of the proposed solution are discussed and perspectives for further research are sketched.
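    The authentication idea can be illustrated with a small challenge-response sketch using the eth-account library: the service issues a nonce, the user signs it with their Ethereum key, and the service recovers the address from the signature, so no password or private data changes hands. This is an assumed simplification for illustration, not the thesis prototype's actual protocol.

    ```python
    import secrets
    from eth_account import Account
    from eth_account.messages import encode_defunct

    # Client side: an Ethereum identity (normally held in the user's wallet).
    acct = Account.create()

    # Server side: issue a one-time challenge.
    nonce = secrets.token_hex(16)

    # Client: sign the challenge with the private key.
    message = encode_defunct(text=nonce)
    signed = Account.sign_message(message, private_key=acct.key)

    # Server: recover the signer's address and compare with the claimed identity.
    recovered = Account.recover_message(message, signature=signed.signature)
    assert recovered == acct.address
    print("authenticated address:", recovered)
    ```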
  • Stenudd, Juho (2013)
    This Master's thesis describes one example of how to automatically generate tests for real-time protocol software. Automatic test generation is performed using model-based testing (MBT), in which test cases are generated from a behaviour model of the system under test (SUT). This model expresses the requirements of the SUT; many parameters can be varied and test sequences randomised. In this context, the real-time protocol software is a system component of a Nokia Siemens Networks (NSN) Long Term Evolution (LTE) base station. This component, named MAC DATA, is the system under test in this study. 3GPP has standardised the protocol stack for the LTE eNodeB base station, and MAC DATA implements most of the functionality of the Medium Access Control (MAC) and Radio Link Control (RLC) protocols, two protocols of the LTE eNodeB. Because complex telecommunication software is discussed here, implementing MBT for MAC DATA system component testing is challenging. First, the expected behaviour of the system component has to be modelled; because it is not sensible to model everything, the most relevant parts that need to be tested have to be identified. The most important parameters also have to be selected from the huge parameter space, then varied and randomised. With MBT, a vast number of different kinds of users can be created, which is not feasible in manual test design, and generating a very long test case takes only a short computing time. In addition to functional testing, MBT is used in performance and worst-case testing by executing a long test case based on traffic models; MBT has proven suitable for this challenging kind of testing. This study uses three traffic models: smartphone-dominant, laptop-dominant and mixed. MBT is integrated into a continuous integration (CI) system, which automatically runs MBT test case generation and execution overnight. The main advantage of the MBT implementation is the ability to create different kinds of users and simulate real-life system behaviour; this way, hidden defects can be found in the test environment and the SUT.
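    The core MBT idea, generating randomized test sequences from a behaviour model, can be shown with a toy sketch; the states, stimuli and oracle below are invented and bear no relation to the actual MAC DATA model or NSN tooling.

    ```python
    import random

    # Behaviour model of an imaginary SUT: state -> {stimulus: expected next state}.
    MODEL = {
        "idle":      {"setup": "connected"},
        "connected": {"send_data": "connected", "teardown": "idle"},
    }

    def generate_test(model, start="idle", length=8, seed=None):
        """Random walk over the model; each step carries the expected state
        transition, which serves as the test oracle during execution."""
        rng = random.Random(seed)
        state, steps = start, []
        for _ in range(length):
            stimulus, nxt = rng.choice(sorted(model[state].items()))
            steps.append((state, stimulus, nxt))
            state = nxt
        return steps

    for before, stimulus, expected in generate_test(MODEL, seed=42):
        print(f"{before:>9} --{stimulus}--> expect {expected}")
    ```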
  • Gafurova, Lina (2018)
    Automatic fall detection is a very important challenge in the public health care domain. The problem primarily concerns the growing population of the elderly, who are at considerably higher risk of falling, and for whom falls may result in serious injuries or even death. In this work we propose a machine learning based solution for fall detection, which can be integrated into a monitoring system as a detector of falls in image sequences. Our approach is solely camera-based and is intended for indoor environments. To detect falls, we combine the variation of the human shape, determined with the help of an approximated ellipse, with the motion history. The feature vectors we build are computed over sliding time windows of the input images and are fed to a Support Vector Machine for classification. The decision for a whole set of images is based on additional rules, which help us restrict the sensitivity of the method. To evaluate our fall detector fairly, we conducted extensive experiments on a wide range of normal activities, which we contrasted with the falls. Reliable recognition rates suggest the effectiveness of our algorithm and motivate further improvement.
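    A condensed sketch of the classification stage under stated assumptions: per-frame shape and motion features are stacked over sliding time windows and classified with an SVM, with a consecutive-window rule restricting sensitivity. Real feature extraction from images is omitted; the arrays are random stand-ins.

    ```python
    import numpy as np
    from sklearn.svm import SVC

    def window_features(frames, win=10, step=5):
        """Stack per-frame feature vectors over sliding time windows."""
        return np.array([frames[i:i + win].ravel()
                         for i in range(0, len(frames) - win + 1, step)])

    rng = np.random.default_rng(0)
    fall_seq = rng.normal(1.0, 0.3, size=(60, 3))    # 3 invented features/frame
    normal_seq = rng.normal(0.0, 0.3, size=(60, 3))  # e.g. ellipse ratio, angle, motion

    X = np.vstack([window_features(fall_seq), window_features(normal_seq)])
    y = np.array([1] * len(window_features(fall_seq)) +
                 [0] * len(window_features(normal_seq)))
    clf = SVC(kernel="rbf").fit(X, y)

    # Sequence-level rule: flag a fall only if three consecutive windows are
    # classified positive, which restricts the sensitivity of the detector.
    preds = clf.predict(window_features(fall_seq))
    print("fall detected:", bool((np.convolve(preds, np.ones(3), "valid") == 3).any()))
    ```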
  • Torkko, Petteri (2013)
    Organizations' business systems are typically closed entities tailored to the organization's operations, yet they need to integrate with other systems. Integration platforms provide models and services with which the sharing of data and processes between heterogeneous systems can be simplified at a high level. The purpose of this thesis is to find a more modern platform to stand alongside an outdated integration platform. By comparing the existing platform with the methods, architectures, and design patterns typically used in systems integration, one platform (Spring Framework and its extensions) is selected from among many candidates for closer study. Goal-based requirements and associated metrics for the new platform are derived from the problem areas of the existing platform identified through a user survey. Based on the results of the platform comparison, the new platform meets the requirements set for it and remedies the problems of the existing platform.
  • Riippa, Väinö (2016)
    This Master's thesis is an empirical literature review that studies open data in the area of healthcare. The study describes what open data is and how it has evolved into the concept it stands for today. The first chapter looks at open data from a general viewpoint. The next chapter compares open data processes from the publisher's and the consumer's points of view. After the processes, open data is examined in the healthcare and welfare sectors by reviewing current practices, application solutions, and the expectations placed on open data. The thesis offers the reader an informative review of process models related to open data; after reading it, the process models can be applied to data openings in the reader's own organization.
  • Koivisto, Timo (2016)
    This thesis is a review of bandit algorithms in information retrieval. In information retrieval, a result list should include the most relevant documents, and the results should also be non-redundant and diverse. Achieving this requires some form of feedback. This thesis focuses on implicit feedback collected from user interactions, using interleaving methods that allow alternative rankings of documents to be presented in result lists. Bandit algorithms can then be used to learn from user interactions in a principled way. The reviewed algorithms include dueling bandits, contextual bandits, and contextual dueling bandits. Coactive learning and preference learning are also described. Finally, the algorithms are summarized using regret as a performance measure.
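    To make the interleaving step concrete, here is a small sketch of team-draft interleaving, one common interleaving method: two rankings are merged by alternating drafts, and clicks on the merged list are credited to the ranker that contributed the clicked document. The rankings and the click are invented.

    ```python
    import random

    def team_draft_interleave(a, b, rng):
        """Merge rankings a and b; remember which ranker drafted each document."""
        merged, team = [], []
        docs = set(a) | set(b)
        while len(merged) < len(docs):
            na, nb = team.count("A"), team.count("B")
            pick_a = na < nb or (na == nb and rng.random() < 0.5)
            source, label = (a, "A") if pick_a else (b, "B")
            doc = next(d for d in source if d not in merged)
            merged.append(doc)
            team.append(label)
        return merged, team

    merged, team = team_draft_interleave(["d1", "d2", "d3", "d4"],
                                         ["d3", "d1", "d4", "d2"],
                                         random.Random(0))
    clicks = {"d3"}  # the user clicked d3 in the interleaved result list
    credit = {"A": 0, "B": 0}
    for doc, label in zip(merged, team):
        credit[label] += doc in clicks
    print(merged, credit)  # the ranker with more credited clicks wins this duel
    ```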
  • Sotala, Kaj (2015)
    This thesis describes the development of 'Bayes Academy', an educational game which aims to teach an understanding of Bayesian networks. A Bayesian network is a directed acyclic graph describing a joint probability distribution over n random variables, where each node in the graph represents a random variable. To turn this subject into an interesting game, this work draws on the theoretical background of meaningful play. Among other requirements, actions in the game need to affect the game experience not only in the immediate moment, but also at later points in the game. This is accomplished by structuring the game as a series of minigames in which observing the value of a variable consumes 'energy points', a resource whose use the player needs to optimize, as the pool of points is shared across the minigames. The goal of the game is to maximize the amount of 'experience points' earned by minimizing the uncertainty in the networks presented to the player, which in turn requires a basic understanding of Bayesian networks. The game was empirically tested on online volunteers who were asked to fill in a survey measuring their understanding of Bayesian networks both before and after playing the game. Players demonstrated an increased understanding of Bayesian networks after playing, in a manner that suggested a successful transfer of learning from the game to a more general context. The learning benefits were gained despite the players generally not finding the game particularly fun. ACM Computing Classification System (CCS): - Applied computing - Computer games - Applied computing - Interactive learning environments - Mathematics of computing - Bayesian networks
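    The quantity the game rewards, uncertainty reduction, can be made concrete with a worked micro-example: observing one variable in a network lowers the entropy of another. The two-node Rain -> WetGrass network and its probabilities below are invented for illustration.

    ```python
    import numpy as np

    def entropy(p):
        p = p[p > 0]
        return -(p * np.log2(p)).sum()

    # Two-node network Rain -> WetGrass given by P(Rain) and P(WetGrass | Rain).
    p_rain = np.array([0.8, 0.2])                 # P(Rain = no), P(Rain = yes)
    p_wet_given_rain = np.array([[0.9, 0.1],      # P(WetGrass | Rain = no)
                                 [0.1, 0.9]])     # P(WetGrass | Rain = yes)

    joint = p_rain[:, None] * p_wet_given_rain    # P(Rain, WetGrass)

    h_before = entropy(joint.sum(axis=0))         # H(WetGrass), marginal entropy
    h_after = sum(p_rain[r] * entropy(p_wet_given_rain[r]) for r in range(2))

    print(f"H(WetGrass)        = {h_before:.3f} bits")
    print(f"H(WetGrass | Rain) = {h_after:.3f} bits")
    # The difference is the information gained by spending energy to observe Rain.
    ```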
  • Tuominen, Pasi (2015)
    Data collections often contain multiple records describing the same object. This thesis compares methods for finding such records. The experiments were run on a dataset of 6.4 million bibliographic records, using the titles of the works in the dataset to compare the methods. Two key characteristics of each method were measured: the number of duplicates found and its ratio to the number of candidate pairs generated. A combination of two methods proved best for deduplicating the dataset. The sorted neighbourhood method found the most actual duplicates, but also the most irrelevant candidates. Suffix array clustering additionally found a set of duplicates that no other method found. Together, these two methods found nearly all of the duplicates found by all of the methods compared in the thesis. Fault-tolerant methods based on Levenshtein distance proved inefficient for deduplicating titles.
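    As an illustration of the best-performing method, the sketch below implements a minimal sorted neighbourhood: records are sorted by a key and only records within a sliding window are compared, here with a standard-library similarity measure rather than the matchers evaluated in the thesis. The titles and the threshold are invented.

    ```python
    from difflib import SequenceMatcher

    titles = [
        "Seitsemän veljestä",
        "Seitseman veljesta",       # near-duplicate with typos
        "Sinuhe egyptiläinen",
        "Sinuhe, egyptiläinen",     # near-duplicate with extra punctuation
        "Tuntematon sotilas",
    ]

    def sorted_neighbourhood(records, window=3, threshold=0.85):
        """Sort by a key, then compare each record only to its neighbours."""
        ordered = sorted(records, key=str.lower)
        pairs = []
        for i, rec in enumerate(ordered):
            for other in ordered[i + 1:i + window]:
                sim = SequenceMatcher(None, rec.lower(), other.lower()).ratio()
                if sim >= threshold:
                    pairs.append((rec, other))
        return pairs

    print(sorted_neighbourhood(titles))  # both near-duplicate pairs are found
    ```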
  • Toivonen, Mirva (2015)
    Big data creates a variety of business possibilities and helps to gain competitive advantage through prediction, optimization and adaptability. Yet much big data analysis does not consider the impact of errors or inconsistencies across the different sources from which the data originates, or how frequently the data is acquired. This thesis examines big data quality challenges in the context of business analytics, with the intent of improving the understanding of big data quality issues and of testing big data. Most of the quality challenges are related to understanding the data, coping with messy source data and interpreting analytical results. Producing analytics requires subjective decisions along the analysis pipeline, and analytical results may not lead to an objective truth. Errors in big data are not corrected as they are in traditional data; instead, the focus of testing moves towards process-oriented validation.
  • Suominen, Kalle (2013)
    Business and operational environments are becoming more and more frenetic, forcing companies and organizations to respond to changes faster. This trend is reflected in software development as well: IT units have to deliver needed features faster in order to bring business benefits sooner. During the last decade, agile methodologies have provided tools to answer this ever-growing demand. Scrum is one of the agile methodologies, and it is widely used. It is said that in large-scale organizations Scrum should be implemented using both bottom-up and top-down approaches. In big organizations software systems are complicated and deeply integrated with each other, so no single team can handle the whole software development process alone. Individual teams may want to start using Scrum before the whole organization is ready to support it, which leads to a situation where one team applies agile principles while most of the teams and organizations around it continue with old, established non-agile practices. In these cases the bottom-up approach is the only option. When the top-down part is missing, are the benefits also lost? The target of this case study is to find out whether implementing Scrum using only the bottom-up approach brought benefits. In the target unit, which was part of a large organization, Scrum-based practices were implemented to replace an earlier waterfall-based approach. The analyses were made on data collected through a survey and from a requirement management tool that was in use during both the old and the new ways of working. The expression 'Scrum-based practices' is used because not all the finer flavours of Scrum could be implemented, owing to the surrounding non-agile teams and official non-agile procedures; this was also an obstacle to implementing Scrum as fully as possible. Most of the targets set for the implementation of Scrum-based practices were achieved, and other non-targeted benefits emerged, so in this context we can conclude that benefits were gained. The absence of the top-down approach clearly made the implementation more difficult and incomplete; however, it did not prevent benefits from being gained. The target unit also faced the aforementioned difficulties in using Scrum-based practices while the surrounding units used non-agile processes. The lack of well-established numerical estimates of the requirements' business values lowered the power of Scrum at the company level, because these values were relative, subjective opinions of the business representatives. In backlog prioritization, when most of the items are so-called high-priority ones, there is no way to evaluate which one is more valuable, and prioritization becomes more or less a lottery.