
Optimizing MapReduce For Strongly Heterogeneous Environments

dc.date.accessioned 2014-06-03T11:30:13Z und
dc.date.accessioned 2017-10-24T12:23:50Z
dc.date.available 2014-06-03T11:30:13Z und
dc.date.available 2017-10-24T12:23:50Z
dc.date.issued 2014-06-03T11:30:13Z
dc.identifier.uri http://radr.hulib.helsinki.fi/handle/10138.1/3755 und
dc.identifier.uri http://hdl.handle.net/10138.1/3755
dc.title Optimizing MapReduce For Strongly Heterogeneous Environments en
ethesis.department.URI http://data.hulib.helsinki.fi/id/225405e8-3362-4197-a7fd-6e7b79e52d14
ethesis.department Institutionen för datavetenskap sv
ethesis.department Department of Computer Science en
ethesis.department Tietojenkäsittelytieteen laitos fi
ethesis.faculty Matematisk-naturvetenskapliga fakulteten sv
ethesis.faculty Matemaattis-luonnontieteellinen tiedekunta fi
ethesis.faculty Faculty of Science en
ethesis.faculty.URI http://data.hulib.helsinki.fi/id/8d59209f-6614-4edd-9744-1ebdaf1d13ca
ethesis.university.URI http://data.hulib.helsinki.fi/id/50ae46d8-7ba9-4821-877c-c994c78b0d97
ethesis.university Helsingfors universitet sv
ethesis.university University of Helsinki en
ethesis.university Helsingin yliopisto fi
dct.creator Kåll, Simon
dct.issued 2014
dct.language.ISO639-2 eng
dct.abstract Over the last ten years MapReduce has emerged as one of the staples of distributed computing in both small- and large-scale applications. MapReduce has been employed successfully for batch parallel computing applications such as web indexing and data mining. Hadoop in particular, an open source implementation of the MapReduce model, has become widely adopted and researched. In MapReduce the input data typically consists of a long list of key/value pairs that has been split into smaller parts and stored in the cluster performing the computation. The computation consists of two distinct steps, map and reduce. In the map step, nodes are assigned input splits, which they process by applying the user-supplied map function to each element of the designated part of the list. The result of the map step is a new intermediate list of key/value pairs, which constitutes the input for the reduce step. In the reduce step, a user-supplied reduce function is applied to the intermediate data. The reduce function performs a summary operation on the elements of the intermediate data list, and its result is the output of the MapReduce job. The performance of a MapReduce implementation is closely tied to its scheduling algorithm. The scheduler decides when and on which node the map and reduce tasks of the computation are executed in the cluster. The scheduler implementation in Hadoop and other systems relies on the underlying cluster being relatively homogeneous, with tasks progressing in a linear fashion. Experience has shown, however, that this is rarely the case. Differing hardware generations, faults in both hardware and software, and varying workloads all contribute to making the environment MapReduce runs in far from homogeneous. In this thesis the performance of nodes executing reduce tasks is shown to correlate strongly with the run-time of the MapReduce job.
This correlation is used to improve performance in a heterogeneous environment through a reduce delay scheduling algorithm. The algorithm schedules reduce tasks based on historic node performance in order to minimize the likelihood of reduce tasks being executed on poorly performing nodes. In the best case the algorithm improves performance under heterogeneity, and even in the worst case it minimizes the effect of heterogeneity. This thesis demonstrates that, with heterogeneity modeled as a normal distribution of node performance, reduce delay scheduling decreases MapReduce job run-times by up to 30% when compared to a homogeneous model of node performance. en
dct.language en
ethesis.language.URI http://data.hulib.helsinki.fi/id/languages/eng
ethesis.language English en
ethesis.language englanti fi
ethesis.language engelska sv
ethesis.thesistype pro gradu-avhandlingar sv
ethesis.thesistype pro gradu -tutkielmat fi
ethesis.thesistype master's thesis en
ethesis.thesistype.URI http://data.hulib.helsinki.fi/id/thesistypes/mastersthesis
ethesis.degreeprogram Networking and Service en
dct.identifier.urn URN:NBN:fi-fe2017112251721
dc.type.dcmitype Text
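
The map and reduce steps described in the abstract can be sketched in a few lines of Python. This is a single-process illustration only, not Hadoop code and not the thesis's implementation. The names `map_step`, `reduce_step`, the word-count functions, and `pick_reduce_nodes` are hypothetical. In particular, `pick_reduce_nodes` is a simplified stand-in that merely ranks nodes by mean historic task time; the actual reduce delay scheduling algorithm is not specified in this record.

```python
from collections import defaultdict


def map_step(splits, map_fn):
    """Apply map_fn to each (key, value) element of every input split
    and collect the resulting intermediate key/value pairs."""
    intermediate = []
    for split in splits:
        for key, value in split:
            intermediate.extend(map_fn(key, value))
    return intermediate


def reduce_step(intermediate, reduce_fn):
    """Group intermediate pairs by key and apply reduce_fn to each group."""
    grouped = defaultdict(list)
    for key, value in intermediate:
        grouped[key].append(value)
    return {key: reduce_fn(key, values) for key, values in grouped.items()}


def word_count_map(_key, line):
    # Emit (word, 1) for every whitespace-separated word in the line.
    return [(word, 1) for word in line.split()]


def word_count_reduce(_key, counts):
    # Summary operation: total number of occurrences of the word.
    return sum(counts)


def pick_reduce_nodes(history, n_tasks):
    """Hypothetical stand-in for performance-aware reduce placement:
    return the n_tasks nodes with the lowest mean historic task time.
    Nodes without any recorded history are skipped."""
    mean_time = {
        node: sum(times) / len(times)
        for node, times in history.items()
        if times
    }
    return sorted(mean_time, key=mean_time.get)[:n_tasks]


if __name__ == "__main__":
    splits = [[(0, "a b a")], [(1, "b c")]]
    pairs = map_step(splits, word_count_map)
    print(reduce_step(pairs, word_count_reduce))

    history = {"n1": [10, 12], "n2": [30, 28], "n3": [11, 9]}
    print(pick_reduce_nodes(history, 2))
```

Ranking by mean task time is the simplest possible use of historic node performance; a real scheduler would also have to decide how long to delay a reduce task while waiting for a well-performing node to become free.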

Files in this item

Files Size Format View
thesis.pdf 358.0Kb PDF
