
Browsing by department "Tietojenkäsittelytieteen laitos"


  • Stenberg, Mika (University of Helsinki, 2008)
    The Open Access movement seeks to free scientific knowledge from commercial restrictions by promoting the open and unhindered online archiving of parallel versions of articles. To make this possible, publication repositories are established on the web, with the mission of storing the scientific output of their parent community openly and centrally in one place. Open source repository applications share their content via the OAI protocol and thus form a global virtual information network. When handling large amounts of data, particular attention must be paid to the role of descriptive metadata in implementing efficient searches, and to identifying resources on the web with persistent identifiers such as Handle or URN. The open availability of scientific knowledge also has a significant impact from the perspective of learning. Besides learning material, publication repositories offer new possibilities for integrating the publication channel with the learning environment. This thesis presents the central themes of open access and the technical solutions developed for its practical implementation. Building on these, an open publication repository is implemented for the Meilahti campus. The thesis also considers the suitability of publication repositories for supporting the learning process within the frameworks of inquiry-based and blended learning. ACM Computing Classification System (CCS): H.3 [INFORMATION STORAGE AND RETRIEVAL], H.3.7 [Digital Libraries], H.3.3 [Information Search and Retrieval], H.3.5 [Online Information Services], K.3 [COMPUTERS AND EDUCATION], K.3.1 [Computer Uses in Education]
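    As an illustrative sketch (not from the thesis), the harvesting model behind such repository networks can be exercised with a single OAI-PMH ListRecords request using only Python's standard library. The endpoint URL is hypothetical, and a real harvester would also follow resumptionTokens for paging and handle error responses:

```python
# Minimal OAI-PMH harvesting sketch (hypothetical endpoint URL).
from urllib.request import urlopen
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"
DC_NS = "{http://purl.org/dc/elements/1.1/}"

def list_record_titles(base_url):
    query = urlencode({"verb": "ListRecords", "metadataPrefix": "oai_dc"})
    with urlopen(f"{base_url}?{query}") as response:
        tree = ET.parse(response)
    # Each record carries Dublin Core metadata, including titles
    # and persistent identifiers such as URNs.
    for record in tree.iter(f"{OAI_NS}record"):
        for title in record.iter(f"{DC_NS}title"):
            yield title.text

# for t in list_record_titles("https://repository.example.org/oai"):
#     print(t)
```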
  • Riippa, Väinö (2016)
    This Master's thesis is an empirical literature review that studies open data in the area of healthcare. The study describes what open data is and how it became the concept it stands for today. The first chapter looks at open data from a general viewpoint. The next chapter compares open data processes from the perspectives of the publisher and the consumer. After the processes, open data is examined in the healthcare and welfare sectors by reviewing current practices, application solutions, and the expectations placed on open data. The study offers the reader an informative review of process models for open data; after reading the thesis, the reader should be able to apply such a process model to data openings in their own organization.
  • Koivisto, Timo (2016)
    This thesis is a review of bandit algorithms in information retrieval. In information retrieval, a result list should include the most relevant documents, and the results should also be non-redundant and diverse. To achieve this, some form of feedback is required. This document describes implicit feedback collected from user interactions by using interleaving methods that allow alternative rankings of documents to be presented in result lists. Bandit algorithms can then be used to learn from user interactions in a principled way. The reviewed algorithms include dueling bandits, contextual bandits, and contextual dueling bandits. Additionally, coactive learning and preference learning are described. Finally, the algorithms are summarized using regret as a performance measure.
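    As a rough sketch of where the implicit pairwise feedback comes from, the following implements team-draft interleaving, one common interleaving method (the function names are ours, not the thesis's): two rankings are merged, each document is credited to the ranker that contributed it, and clicks decide the duel that a dueling-bandit algorithm then learns from.

```python
# Sketch of team-draft interleaving for collecting duel outcomes.
import random

def team_draft_interleave(ranking_a, ranking_b, rng=None):
    """Merge two rankings; remember which 'team' placed each doc."""
    rng = rng or random.Random(0)
    interleaved, teams = [], {}
    a, b = list(ranking_a), list(ranking_b)
    while a or b:
        # Random order decides which ranker picks first this round.
        for team, ranking in sorted((("A", a), ("B", b)),
                                    key=lambda _: rng.random()):
            while ranking and ranking[0] in teams:
                ranking.pop(0)          # skip docs already placed
            if ranking:
                doc = ranking.pop(0)
                teams[doc] = team
                interleaved.append(doc)
    return interleaved, teams

def duel_outcome(clicked_docs, teams):
    """Credit each ranker with clicks on its own documents."""
    wins = {"A": 0, "B": 0}
    for doc in clicked_docs:
        if doc in teams:
            wins[teams[doc]] += 1
    return wins
```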
  • Sotala, Kaj (2015)
    This thesis describes the development of 'Bayes Academy', an educational game which aims to teach an understanding of Bayesian networks. A Bayesian network is a directed acyclic graph describing a joint probability distribution function over n random variables, where each node in the graph represents a random variable. To find a way to turn this subject into an interesting game, this work draws on the theoretical background of meaningful play. Among other requirements, actions in the game need to affect the game experience not only in the immediate moment, but also at later points in the game. This is accomplished by structuring the game as a series of minigames where observing the value of a variable consumes 'energy points', a resource whose use the player needs to optimize, as the pool of points is shared across individual minigames. The goal of the game is to maximize the amount of 'experience points' earned by minimizing the uncertainty in the networks that are presented to the player, which in turn requires a basic understanding of Bayesian networks. The game was empirically tested on online volunteers who were asked to fill in a survey measuring their understanding of Bayesian networks both before and after playing the game. Players demonstrated an increased understanding of Bayesian networks after playing the game, in a manner that suggested a successful transfer of learning from the game to a more general context. The learning benefits were gained despite the players generally not finding the game particularly fun. ACM Computing Classification System (CCS): Applied computing → Computer games; Applied computing → Interactive learning environments; Mathematics of computing → Bayesian networks
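    For reference, the joint distribution such a network encodes factorizes over the graph; with Pa(X_i) denoting the parents of node X_i:

```latex
P(X_1, \dots, X_n) \;=\; \prod_{i=1}^{n} P\bigl(X_i \mid \mathrm{Pa}(X_i)\bigr)
```

    Observing a variable, as the game's energy-point mechanic allows, conditions this distribution and reduces the remaining uncertainty, which is what the player is rewarded for.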
  • Santana Vega, Carlos (2018)
    The scope of this project is to provide a set of Bayesian methods for the task of predicting potential energy barriers. Energy barriers define a physical property of atoms that can be used to characterise their molecular dynamics, with applications in quantum-mechanical simulations for the design of new materials. The goal is to replace the currently used artificial neural network (ANN) with a method that, apart from providing accurate predictions, can also assess the predictive certainty of the model. We propose several Bayesian methods and evaluate them on this task, demonstrating that sparse Gaussian processes (SGPs) are capable of providing predictions, and their confidence intervals, with a level of accuracy equivalent to the current ANN, at bounded computational cost.
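    A minimal sketch of the kind of prediction-with-uncertainty described above, using an exact GP regressor with an RBF kernel in plain NumPy rather than the sparse variant the thesis evaluates (an SGP replaces the training set with a smaller set of inducing points, but the predictive equations have the same shape); all names here are ours:

```python
# Exact GP regression sketch: predictive mean and variance.
import numpy as np

def rbf_kernel(a, b, lengthscale=1.0, variance=1.0):
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return variance * np.exp(-0.5 * d2 / lengthscale**2)

def gp_predict(X_train, y_train, X_test, noise=1e-2):
    K = rbf_kernel(X_train, X_train) + noise * np.eye(len(X_train))
    K_s = rbf_kernel(X_train, X_test)
    K_ss = rbf_kernel(X_test, X_test)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_s.T @ alpha                        # predictive mean
    v = np.linalg.solve(L, K_s)
    var = np.diag(K_ss) - (v * v).sum(axis=0)   # predictive variance
    return mean, var   # var gives the confidence-interval widths
```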
  • Meiling, Li (2017)
    In the fields of scientific research, computer simulation, Internet applications, e-commerce and many other applications, the amount of data is growing at an extremely fast pace. In order to analyze and utilize these large data resources, it is necessary to rely on effective data analysis techniques. The relational database (RDBMS) model has long been the dominant model in database management. However, traditional relational data management technology encounters great obstacles in scalability, as it has difficulties with big data analysis. Today, cloud databases and NoSQL databases are attracting widespread attention and have become viable alternatives to the relational database. This thesis focuses on benchmarking studies of two multi-model NoSQL databases, ArangoDB and OrientDB, and discusses the use of NoSQL for big data analysis.
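    As a neutral sketch of the measurement side of such a benchmark (without assuming either database's client API, which the thesis does not detail here), a harness like the following times a pluggable operation and reports latency statistics:

```python
# Generic micro-benchmark harness sketch: `operation` would wrap a
# query against ArangoDB or OrientDB via their respective clients.
import statistics
import time

def benchmark(operation, warmup=10, runs=100):
    for _ in range(warmup):            # warm caches / connections
        operation()
    latencies = []
    for _ in range(runs):
        start = time.perf_counter()
        operation()
        latencies.append(time.perf_counter() - start)
    latencies.sort()
    return {
        "median_s": statistics.median(latencies),
        "p95_s": latencies[int(0.95 * len(latencies)) - 1],
        "mean_s": statistics.fmean(latencies),
    }
```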
  • Wikström, Axel (2019)
    Continuous integration (CI) and continuous delivery (CD) can be seen as an essential part of modern software development. CI/CD consists of always having software in a deployable state. This is accomplished by continuously integrating the code into a main branch, in addition to automatically building and testing it. Version control and dedicated CI/CD tools can be used to accomplish this. This thesis consists of a case study whose aim was to find the benefits and challenges related to implementing CI/CD in the context of a Finnish software company. The study was conducted with semi-structured interviews. The benefits of CD that were found include faster iteration, better assurance of quality, and easier deployments. The challenges identified were related to testing practices, infrastructure management and company culture. It was also found difficult to implement a full continuous deployment pipeline for the case project, mostly due to the risks involved in updating software in business-critical production use. The results of this study were found to be similar to the results of previous studies. The case company's adoption of modern CI/CD tools such as GitLab and cloud computing is also discussed. While the tools can make the implementation of CI/CD easier, they still come with challenges in adapting them to specific use cases.
  • Tuominen, Pasi (2015)
    Data repositories often contain multiple records that describe the same object. This thesis compares methods for finding such records. The experiments were performed on a dataset of 6.4 million bibliographic records, using the titles of the works in the dataset to compare the methods. Two key characteristics of each method were measured: the number of duplicates found, and its ratio to the number of candidate pairs generated. A combination of two methods proved best for deduplicating the dataset. The sorted neighborhood method found the most actual duplicates, but also the most irrelevant candidates. Suffix array grouping additionally found a set of duplicates that the other methods missed. Together these two methods found nearly all of the duplicates found by any of the methods compared in the thesis. Fault-tolerant methods based on Levenshtein distance proved inefficient for deduplicating titles.
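    A minimal sketch of the sorted-neighborhood idea evaluated above, with hypothetical helper names: records are sorted by a normalized key and only pairs inside a sliding window become candidates, which is why the method generates many candidates yet scales to millions of records. The similarity test here uses the standard library's SequenceMatcher as a stand-in for a proper edit-distance comparison.

```python
# Sorted-neighborhood sketch: sort by key, slide a fixed window,
# emit candidate pairs, then compare candidates more closely.
from difflib import SequenceMatcher

def normalize(title):
    return "".join(ch for ch in title.lower() if ch.isalnum())

def sorted_neighborhood(titles, window=5):
    ordered = sorted(range(len(titles)), key=lambda i: normalize(titles[i]))
    for pos, i in enumerate(ordered):
        for j in ordered[pos + 1 : pos + window]:
            yield i, j                    # candidate pair of indices

def is_duplicate(a, b, threshold=0.9):
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio() >= threshold

# duplicates = [(i, j) for i, j in sorted_neighborhood(titles)
#               if is_duplicate(titles[i], titles[j])]
```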
  • Toivonen, Mirva (2015)
    Big data creates a variety of business possibilities and helps to gain competitive advantage through prediction, optimization and adaptability. Much big data analysis does not consider the impact of errors or inconsistencies across the different sources from which the data originates, or how frequently the data is acquired. This thesis examines big data quality challenges in the context of business analytics. The intent of the thesis is to improve knowledge of big data quality issues and of testing big data. Most of the quality challenges are related to understanding the data, coping with messy source data and interpreting analytical results. Producing analytics requires subjective decisions along the analysis pipeline, and analytical results may not lead to objective truth. Errors in big data are not corrected as in traditional data; instead the focus of testing moves towards process-oriented validation.
  • Ronimus, Tomi (2013)
    Botnets have proven to be a persistent nuisance on the Internet. They are the cause of many security concerns and issues that currently plague the Internet. Mitigating these issues is an important task, and more research is needed in order to win the battle against constantly evolving botnets. In this thesis, botnets are reviewed thoroughly, starting from what botnets are and how they manage to stay operational, and then moving on to explore some of the more promising methods that can be used to detect botnet activity. A more detailed look is taken at DNS-based botnet detection methods, as these methods show great promise and are capable of detecting many different types of botnets. Finally, a review of the DNS-based botnet detection methods is compiled. Some of the best features of botnet detection are gathered to form an overall picture of the characteristics of a good detection method. As botnets evolve over time, botnet detection methods need to keep up with this progress. Gathering the characteristics of a good detection method will help to suggest future directions for improving and developing new botnet detection methods. ACM Computing Classification System (CCS): A.1 [Introductory and Survey], C.2.0 [Computer Communication Networks]
  • Suominen, Kalle (2013)
    Business and operational environments are becoming more and more frenetic, forcing companies and organizations to respond to changes faster. This trend is reflected in software development as well: IT units have to deliver needed features faster in order to bring business benefits sooner. During the last decade, agile methodologies have provided tools to answer this ever-growing demand. Scrum is one of the agile methodologies and is widely used. It is said that in large-scale organizations Scrum implementation should be done using both bottom-up and top-down approaches. In big organizations software systems are complicated and deeply integrated with each other, meaning that no single team can handle the whole software development process alone. Individual teams want to start using Scrum before the whole organization is ready to support it. This leads to a situation where one team applies agile principles while most of the teams and organizations around it continue with old, established non-agile practices. In such cases the bottom-up approach is the only option. When the top-down part is missing, are the benefits also lost? The target of this case study is to find out whether implementing Scrum using only the bottom-up approach brought benefits. In the target unit, which was part of a large organization, Scrum-based practices were implemented to replace an earlier waterfall-based approach. The analysis was made on data collected by a survey and from a requirement management tool that was in use during both the old and the new ways of working. The expression 'Scrum-based practices' is used because not all of the finer flavours of Scrum could be implemented, owing to the surrounding non-agile teams and official non-agile procedures; this was also an obstacle to implementing Scrum as fully as would otherwise have been possible. Most of the targets set for the implementation of Scrum-based practices were achieved, and other non-targeted benefits emerged, so in this context we can conclude that benefits were gained. The absence of the top-down approach clearly made the implementation more difficult and incomplete; however, it did not prevent gaining benefits. The target unit also faced the aforementioned difficulties in using Scrum-based practices while the surrounding units used non-agile processes. The lack of well-established numerical estimates of requirements' business values lowered the power of Scrum at the company level, because these values were relative and subjective opinions of the business representatives. In backlog prioritization, when most of the items are so-called high-priority ones, there is no way to evaluate which one is more valuable, and prioritization becomes more or less a lottery.
  • Markkanen, Jani (2012)
    B-trees are widely used index structures. This thesis examines concurrency control and recovery of B-trees, particularly from the perspective of a database management system. Of the algorithms for the Blink-tree, which provides efficient concurrency control, two are presented: one based on tracking node deletions and one based on completing structural modifications during traversal. The latter is implemented and its performance is evaluated experimentally. The experimental evaluation shows that in insert and delete operations the cost of concurrency control rises to as much as 94% of the total time at the evaluation's maximum operation rate, while at the same rate the concurrency control of the search operation takes less than one percent of the total time. The high concurrency-control cost of insert and delete operations is caused by update operations U-latching the root node. U-latching the root is often an unnecessarily strong measure: it is actually needed, i.e. the latch must be upgraded to an X-latch for writing, in only 0.06% of update operations. To relieve congestion at the root, further development ideas for the algorithm are presented, based on the rarity of the need for U-latching the root and on the possibility of always restarting the tree traversal from the root.
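    To illustrate why reads are so cheap in a Blink-tree, the sketch below shows latch-coupled search with right-link chasing; the node API (s_latch, high_key, right_sibling) is hypothetical, and update operations, which the thesis found expensive because of U-latching the root, are omitted.

```python
# Hypothetical sketch of Blink-tree search with latch coupling.
# A reader holds at most two shared latches at a time and never
# needs the U-latch that makes updates expensive at the root.
def blink_search(root, key):
    node = root
    node.s_latch()                       # shared latch for reading
    while True:
        # A concurrent split may have moved the key range to a
        # right sibling; chase the right-link in that case.
        while node.high_key is not None and key > node.high_key:
            right = node.right_sibling
            right.s_latch()
            node.unlatch()
            node = right
        if node.is_leaf:
            value = node.lookup(key)
            node.unlatch()
            return value
        child = node.child_for(key)      # latch-couple down a level
        child.s_latch()
        node.unlatch()
        node = child
```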
  • Levitski, Andres (2016)
    With the increase in bandwidths available for internet users, cloud storage services have emerged to offer home users an easy way to share files and extend the storage space available to them. Most systems offer a limited free storage quota, and combining these resources from multiple providers could be intriguing to cost-oriented users. In this study, we will implement a virtual file system that utilizes multiple different commercial cloud storage services (Dropbox, Google Drive, Microsoft OneDrive) to store its data. The data will be distributed among the different services, and the structure of the data will be managed locally by the file system. The file system will be run in user space using FUSE and will use the APIs provided by the cloud storage services to access the data. Our goal is to show that it is feasible to combine the free space offered by multiple services into a single easily accessible storage medium. Building such a system requires making design choices in multiple problem areas ranging from data distribution and performance to data integrity and data security. We will show how our file system is designed to address these requirements and will then conduct several tests to measure and analyze the level of performance provided by our system in different file system operation scenarios. The results will also be compared to the performance of using the distinct cloud storage services directly without distributing the data. This will help us to estimate the overhead or possible gain in performance caused by the distribution of data, and to locate the bottlenecks of the system. Finally, we will discuss some of the ways that could be used to improve the system based on the test results and examples from existing distributed file systems.
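    As a sketch of the data-distribution layer only (independent of FUSE and of any provider's actual API, which are abstracted behind a hypothetical `upload` callable), a file can be split into fixed-size chunks placed round-robin across providers, with a local manifest recording where each chunk went:

```python
# Sketch of distributing a file's chunks across cloud providers.
# `providers` maps a name to a hypothetical upload(chunk_id, data)
# callable; real code would wrap the Dropbox / Google Drive /
# OneDrive APIs and persist the manifest with the file system.
CHUNK_SIZE = 4 * 1024 * 1024             # 4 MiB chunks

def store_file(path, data, providers):
    manifest = []                        # ordered chunk locations
    names = sorted(providers)
    for i in range(0, len(data), CHUNK_SIZE):
        chunk = data[i:i + CHUNK_SIZE]
        name = names[(i // CHUNK_SIZE) % len(names)]   # round robin
        chunk_id = f"{path}#{i // CHUNK_SIZE}"
        providers[name](chunk_id, chunk)
        manifest.append((name, chunk_id))
    return manifest                      # needed to reassemble later
```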
  • Osmani, Lirim (2013)
    With recent advances in efficient virtualization techniques on commodity servers, cloud computing has emerged as a powerful technology for meeting new requirements in supporting a new generation of computing services based on the utility model. However, barriers to widespread adoption still exist, and the dominant platform is yet to emerge in the years to come. Hence the challenge of providing scalable cloud infrastructures requires continuous exploration of new technologies and techniques. This thesis describes an experimental investigation of integrating two such open source technologies, OpenStack and GlusterFS, to build our cloud environment. We designed a number of test case scenarios that help us answer questions about the performance, stability and scalability of the deployed cloud infrastructure. Additionally, work based on this thesis was accepted to the Conference on Computing in High Energy and Nuclear Physics (CHEP2013), and the paper is due to be published.
  • Koolaji, Mohsen (2014)
    Business ecosystems, where services from enterprises across the world are marketed and acquired, demand efficient collaborative project management facilities. In particular, reputation and breach management systems are essential for partner selection and proper project delivery. Reputation systems need to provide measurable scales for the collection of objective and arbitrable information about members of the ecosystem. In addition, how breaches or disputes can affect the reputation of collaborating partners, and how such disputes can be resolved (i.e. breach recovery), are interesting questions. Furthermore, the role of business process management (BPM) systems in resolving breach or dispute situations is also an interesting point of study. This thesis proposes a modern model-driven reputation and breach management system of its own, named the Reputation and Breach Management System (RAB_MS). The purpose of RAB_MS is to improve and refine trust between business partners in business ecosystems. The presented models are based on state-of-the-art techniques of service-oriented architecture (SOA). The models are verified by formal automated verification mechanisms in the YAWL system to avoid syntactical, structural, and semantic errors, as well as interpretation ambiguities. The results of the formal verification ensure that the business processes in the proposed reputation and breach management system meet necessary properties such as soundness and weak soundness; in simpler words, RAB_MS has no deadlocks, livelocks, or dead tasks within its business process models.
  • Raatikka, Vilho (University of Helsinki, 2004)
  • Hämäläinen, Heikki (2016)
    This thesis examines the Clojure programming language, a dialect of Lisp designed specifically for concurrent programming. Clojure is tightly integrated with the Java environment, and programs written in it run on the JVM. The thesis reviews the history of the Lisp languages, the general challenges of concurrent programming, and the fundamentals of the functional programming paradigm. In addition, the concurrency features of Java, the JVM and Clojure are covered. The analysis part of the thesis compares the concurrency constructs of Clojure and Java with respect to, among other things, performance and usability. Of Clojure's concurrency constructs, software transactional memory proved computationally very heavy. Furthermore, the lock-freedom of the concurrency constructs means that certain concurrent programming problems are difficult to implement without resorting to Java's concurrency constructs; particularly with respect to synchronous concurrency constructs, the language leaves room for improvement. Compared to Java, Clojure's concurrency constructs are somewhat simpler to use, though this is largely due to Clojure's dynamic typing and functional foundations.
  • Kesseli, Henri (2013)
    Embedded systems are everywhere, and the variety of their types and purposes is wide. Yet many of these systems are islands in an age where more and more systems are being connected to the Internet. The ability to connect to the Internet can be taken advantage of in multiple ways; one is to exploit the resources that cloud computing can offer. Currently, there is no comprehensive overview of how embedded systems could be enhanced by cloud computing. In this thesis we study what cloud-enhanced embedded systems are and what their benefits, risks, typical implementation methods, and platforms are. The study is executed as an extended systematic mapping study. It shows that interest from academia and practice in cloud-enhanced embedded systems has grown significantly in recent years. The most prevalent research area is wireless sensor networks, followed by the more recent research area of the Internet of things. Most of the technology needed for implementing cloud-enhanced embedded systems is available, but comprehensive development tools such as frameworks or middleware are scarce. The results of the study indicate that existing embedded systems and other non-computing devices would benefit from connectivity and cloud resources. This enables the development of new applications for consumers and industry that would not be possible without cloud resources; as an indication of this we see several systems developed for consumers, such as remotely controlled thermostats, media players that depend on cloud resources, and network attached storage systems that integrate cloud access and discovery. The academic literature is full of use cases for cloud-enhanced embedded systems and model implementations. However, the actual integration process as well as specific engineering techniques are rarely explained or scrutinized; currently, the typical integration process is highly custom to the application. There are few examples of efforts to create specific development tools, more transparent protocols, and open hardware to support the development of ecosystems for cloud-enhanced embedded systems.
  • Linnanvuo, Sami (University of Helsinki, 2006)
    Online content services can greatly benefit from personalisation features that enable delivery of content suited to each user's specific interests. This thesis presents a system that applies text analysis and user modeling techniques in an online news service for the purposes of personalisation and user interest analysis. The system creates a detailed thematic profile for each content item and observes the user's actions towards content items to learn their preferences. A handcrafted taxonomy of concepts, or ontology, is used in profile formation to extract relevant concepts from the text. User preference learning is automatic, with no need for explicit preference settings or ratings from the user. Learned user profiles are segmented into interest groups using clustering techniques, with the objective of providing a source of information for the service provider. Some theoretical background for the chosen techniques is presented, while the main focus is on finding practical solutions to some current information needs that are not optimally served by traditional techniques.
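    A small sketch of the segmentation step described above, assuming each learned user profile is already a vector of per-concept interest weights (the thesis does not name a specific clustering library; scikit-learn's KMeans is used here purely for illustration, on synthetic data):

```python
# Sketch: segment learned user-interest profiles into groups.
# Each row of `profiles` is one user's weights over the ontology's
# concepts (hypothetical data; real profiles would come from the
# system's preference-learning component).
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
profiles = rng.random((200, 50))         # 200 users x 50 concepts

kmeans = KMeans(n_clusters=5, n_init=10, random_state=0).fit(profiles)
segments = kmeans.labels_                # interest group per user
centroids = kmeans.cluster_centers_      # typical profile per group
```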
  • Lv, Guowei (2014)
    This master's thesis discusses two main tasks of computational etymology: first, finding cognates in multilingual text; second, finding underlying correspondence rules by aligning cognates. For the first part, I briefly describe two categories of methods for identifying cognates: symbol-based and phonetic-based. For the second part, I describe the Etymon project, in which I have been working. The Etymon project uses a probabilistic method and the Minimum Description Length principle to align cognate sets. The objective of the project is to build a model which can automatically find as much information in the cognates as possible without linguistic knowledge, as well as find genetic relationships between the languages. I also discuss the experiment I performed to explore the uncertainty in the data source.
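    The Minimum Description Length principle mentioned above can be stated as a two-part code: among candidate alignment models M for the cognate data D, prefer the one minimizing the total description length

```latex
M^{*} \;=\; \arg\min_{M} \bigl( L(M) + L(D \mid M) \bigr)
```

    so that a model of sound correspondences is favoured exactly when the rules it adds pay for themselves by compressing the aligned cognate sets.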