Optimizing MapReduce For Strongly Heterogeneous Environments

Optimizing MapReduce For Strongly Heterogeneous Environments

dc.date.accessioned	2014-06-03T11:30:13Z	und
dc.date.accessioned	2017-10-24T12:23:50Z
dc.date.available	2014-06-03T11:30:13Z	und
dc.date.available	2017-10-24T12:23:50Z
dc.date.issued	2014-06-03T11:30:13Z
dc.identifier.uri	http://radr.hulib.helsinki.fi/handle/10138.1/3755	und
dc.identifier.uri	http://hdl.handle.net/10138.1/3755
dc.title	Optimizing MapReduce For Strongly Heterogeneous Environments	en
ethesis.department.URI	http://data.hulib.helsinki.fi/id/225405e8-3362-4197-a7fd-6e7b79e52d14
ethesis.department	Institutionen för datavetenskap	sv
ethesis.department	Department of Computer Science	en
ethesis.department	Tietojenkäsittelytieteen laitos	fi
ethesis.faculty	Matematisk-naturvetenskapliga fakulteten	sv
ethesis.faculty	Matemaattis-luonnontieteellinen tiedekunta	fi
ethesis.faculty	Faculty of Science	en
ethesis.faculty.URI	http://data.hulib.helsinki.fi/id/8d59209f-6614-4edd-9744-1ebdaf1d13ca
ethesis.university.URI	http://data.hulib.helsinki.fi/id/50ae46d8-7ba9-4821-877c-c994c78b0d97
ethesis.university	Helsingfors universitet	sv
ethesis.university	University of Helsinki	en
ethesis.university	Helsingin yliopisto	fi
dct.creator	Kåll, Simon
dct.issued	2014
dct.language.ISO639-2	eng
dct.abstract	Over the last ten years MapReduce has emerged as one of the staples of distributed computing both in small and large scale applications. MapReduce has successfully been employed to perform batch parallel computing applications such as web indexing and data mining. Especially Hadoop, an open source implementation of the MapReduce model has become widely adopted and researched. In MapReduce the input data typically consists of a long list of key/- values pairs which has been split up into smaller parts and stored in the cluster performing the computation. The computation consists of two distinct steps, map and reduce. In the map step nodes are assigned input splits, which they process by applying the user supplied map function to each element of the designated part of the list. The result of the Map step is a new intermediate list of key/value pairs which constitutes the input for the reduce step. In the reduce step, a user supplied reduce function is applied to the intermediate data. The reduce function performs a summary operation on the elements in the intermediate data list, the result of which is the output for the MapReduce job. The performance of a MapReduce implementation is closely tied to its scheduler algorithm. The scheduler decides when and on which node the map and reduce tasks of the computation are executed in the cluster. The implementation of the scheduler in Hadoop and other systems relies on the underlying cluster being relatively homogenous with task progressing in a linear fashion. Experience has however shown that this is rarely the case. Differing hardware generations, faults in both hardware and software as well as varying workloads all contribute to make the environment MapReduce runs in far from homogeneous. In this thesis the performance of nodes executing reduce tasks is shown to strongly correlate with the run-time of the MapReduce job. This correlation is utilized to improve performance in an heterogeneous environment though a reduce delay scheduling algorithm. The algorithm schedules reduce tasks based on historic node performance in order to minimize the likelihood of reduce tasks being executed on poorly performing nodes. In the best case scenario the algorithm improves performance under heterogeneity, and even in the worst case minimizes the effect of heterogeneity. This thesis demonstrates how with heterogeneity modeled as a normal distribution of node performance, reduce delay scheduling decreases MapReduce job run-times with up to 30% when compared to a homogeneous model of node performance.	en
dct.language	en
ethesis.language.URI	http://data.hulib.helsinki.fi/id/languages/eng
ethesis.language	English	en
ethesis.language	englanti	fi
ethesis.language	engelska	sv
ethesis.thesistype	pro gradu-avhandlingar	sv
ethesis.thesistype	pro gradu -tutkielmat	fi
ethesis.thesistype	master's thesis	en
ethesis.thesistype.URI	http://data.hulib.helsinki.fi/id/thesistypes/mastersthesis
ethesis.degreeprogram	Networking and Service	en
dct.identifier.urn	URN:NBN:fi-fe2017112251721
dc.type.dcmitype	Text

Files in this item

Files	Size	Format	View
thesis.pdf	358.0Kb	PDF

This item appears in the following Collection(s)

Faculty of Science [4203]

Show simple item record

Optimizing MapReduce For Strongly Heterogeneous Environments

Files in this item

This item appears in the following Collection(s)

Yhteystiedot

HELSINGIN YLIOPISTO