
dc.contributor.advisor Cox, Alan L.
dc.creator Chadha, Mehul
dc.date.accessioned 2016-01-06T21:00:20Z
dc.date.available 2016-01-06T21:00:20Z
dc.date.created 2014-12
dc.date.issued 2014-10-03
dc.date.submitted December 2014
dc.identifier.citation Chadha, Mehul. "Improving the Efficiency of Map-Reduce Task Engine." (2014) Master’s Thesis, Rice University. http://hdl.handle.net/1911/87731.
dc.identifier.uri http://hdl.handle.net/1911/87731
dc.description.abstract Map-Reduce is a popular distributed programming framework for parallelizing computation on huge datasets across a large number of compute nodes. This year marks a decade since it was invented by Google in 2004. Hadoop, a popular open-source implementation of Map-Reduce, was introduced by Yahoo in 2005. Over these years, many researchers have worked on problems related to Map-Reduce and similar distributed programming models, and Hadoop itself has been the subject of numerous research projects. Prior work in this field has focused on making Map-Reduce more efficient for iterative processing or more pipelined across different jobs, which has improved performance for iterative applications. However, little attention has been given to the task engine that carries out the Map-Reduce computation itself. Our analysis of applications running on Hadoop shows that more than 50% of the time is spent in the framework on tasks such as sorting, serialization, and deserialization. We address this problem by introducing an extension to the Map-Reduce programming model. This extension allows us to use more efficient data structures, such as hash tables, and to lower the cost of serializing and deserializing key-value pairs. With these changes, we have reduced the framework's overhead, and the performance of certain important applications, such as PageRank and Join, has improved by 1.5 to 2.5 times.
dc.format.mimetype application/pdf
dc.language.iso eng
dc.subject Hadoop
Map-Reduce
Pagerank
Join
Barrier free
dc.title Improving the Efficiency of Map-Reduce Task Engine
dc.contributor.committeeMember Rixner, Scott
dc.contributor.committeeMember Sarkar, Vivek
dc.date.updated 2016-01-06T21:00:20Z
dc.type.genre Thesis
dc.type.material Text
thesis.degree.department Computer Science
thesis.degree.discipline Engineering
thesis.degree.grantor Rice University
thesis.degree.level Masters
thesis.degree.name Master of Science
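
The abstract attributes much of the framework overhead to sorting and credits the model extension with enabling hash tables. The Python sketch below contrasts the two grouping strategies on in-memory key-value pairs. It is an illustration under stated assumptions, not the thesis's implementation or Hadoop's API. The functions `sort_based_reduce` and `hash_based_reduce` are hypothetical. The hash-based path also assumes that the reduce step can be written as an incremental fold with an associative, commutative combine function, such as summing PageRank contributions.

```python
from itertools import groupby
from operator import itemgetter
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical intermediate record: (key, value) as emitted by a map task.
Pair = Tuple[str, float]


def sort_based_reduce(
    pairs: Iterable[Pair],
    reducer: Callable[[str, List[float]], float],
) -> Dict[str, float]:
    """Sort all pairs by key, then pass each key's full value list to the reducer.

    This mimics a sort-then-group reduce path: O(n log n) comparisons, and all
    values for a key are materialized before the reducer runs.
    """
    ordered = sorted(pairs, key=itemgetter(0))
    return {
        key: reducer(key, [value for _, value in group])
        for key, group in groupby(ordered, key=itemgetter(0))
    }


def hash_based_reduce(
    pairs: Iterable[Pair],
    combine: Callable[[float, float], float],
    initial: float,
) -> Dict[str, float]:
    """Fold each value into a per-key accumulator held in a hash table.

    No sort is needed (expected O(n) work). This is only valid when the reduce
    step is an associative, commutative fold, which is an assumption here.
    """
    table: Dict[str, float] = {}
    for key, value in pairs:
        table[key] = combine(table.get(key, initial), value)
    return table


if __name__ == "__main__":
    # Toy PageRank-style contributions: (target page, rank mass sent to it).
    contributions = [("b", 0.5), ("c", 0.5), ("c", 1.0), ("a", 0.25)]
    by_sort = sort_based_reduce(contributions, lambda key, values: sum(values))
    by_hash = hash_based_reduce(contributions, lambda acc, v: acc + v, 0.0)
    print(by_sort)  # {'a': 0.25, 'b': 0.5, 'c': 1.5}
    print(by_hash)  # {'b': 0.5, 'c': 1.5, 'a': 0.25}
```

Both functions produce the same per-key totals. The hash-based path never sorts and never buffers a key's complete value list, which is the kind of framework overhead the abstract targets. The sketch does not model serialization cost, partitioning, or spilling to disk.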

