
An approach to Machine Learning with Big Data


dc.date.accessioned 2013-10-02T06:59:02Z und
dc.date.accessioned 2017-10-24T12:24:35Z
dc.date.available 2013-10-02T06:59:02Z und
dc.date.available 2017-10-24T12:24:35Z
dc.date.issued 2013-10-02T06:59:02Z
dc.identifier.uri http://radr.hulib.helsinki.fi/handle/10138.1/3104 und
dc.identifier.uri http://hdl.handle.net/10138.1/3104
dc.title An approach to Machine Learning with Big Data en
ethesis.discipline Computer science en
ethesis.discipline Tietojenkäsittelytiede fi
ethesis.discipline Datavetenskap sv
ethesis.discipline.URI http://data.hulib.helsinki.fi/id/1dcabbeb-f422-4eec-aaff-bb11d7501348
ethesis.department.URI http://data.hulib.helsinki.fi/id/225405e8-3362-4197-a7fd-6e7b79e52d14
ethesis.department Institutionen för datavetenskap sv
ethesis.department Department of Computer Science en
ethesis.department Tietojenkäsittelytieteen laitos fi
ethesis.faculty Matematisk-naturvetenskapliga fakulteten sv
ethesis.faculty Matemaattis-luonnontieteellinen tiedekunta fi
ethesis.faculty Faculty of Science en
ethesis.faculty.URI http://data.hulib.helsinki.fi/id/8d59209f-6614-4edd-9744-1ebdaf1d13ca
ethesis.university.URI http://data.hulib.helsinki.fi/id/50ae46d8-7ba9-4821-877c-c994c78b0d97
ethesis.university Helsingfors universitet sv
ethesis.university University of Helsinki en
ethesis.university Helsingin yliopisto fi
dct.creator Peltonen, Ella
dct.issued 2013
dct.language.ISO639-2 eng
dct.abstract Cloud computing offers important resources, performance, and services now that it has become popular to collect, store, and analyze large data sets. This thesis builds on the Berkeley Data Analysis Stack (BDAS), a cloud computing environment designed for Big Data handling and analysis. Two parts of the BDAS in particular are introduced: the cluster resource manager Mesos and the distribution manager Spark. They offer important features for cloud computing, such as efficiency, multi-tenancy, and fault tolerance. The Spark system expands MapReduce, the well-known cloud computing paradigm. Machine learning algorithms can predict trends and anomalies in large data sets. This thesis presents one of them, a distributed decision tree algorithm, implemented on the Spark system. As an example case, the decision tree is applied to the versatile energy consumption data collected from mobile devices, such as smartphones and tablets, by the Carat project. The data consists of information about the usage of the device, such as which applications have been running, network connections, battery temperatures, and screen brightness. The decision tree aims to find chains of data features that might lead to energy consumption anomalies. Results of the analysis can be used to advise users on how to improve their battery life. This thesis presents selected analysis results together with the advantages and disadvantages of the decision tree analysis. en
dct.language en
ethesis.language.URI http://data.hulib.helsinki.fi/id/languages/eng
ethesis.language English en
ethesis.language englanti fi
ethesis.language engelska sv
ethesis.thesistype pro gradu-avhandlingar sv
ethesis.thesistype pro gradu -tutkielmat fi
ethesis.thesistype master's thesis en
ethesis.thesistype.URI http://data.hulib.helsinki.fi/id/thesistypes/mastersthesis
dct.identifier.urn URN:NBN:fi-fe2017112251347
dc.type.dcmitype Text

Files in this item

Files Size Format View
pro-gradu-epeltonen.pdf 1.131Mb PDF
