BitFlood: Multicast Enabled P2P Data Sharing in Datacenters
Cox, Alan L
Master of Science
One-to-many data transfers are a common activity within datacenters. Iterative machine learning algorithms, code and VM distribution, and fragment-replicate joins in Hadoop all perform one-to-many data transfers. Moreover, these transfers can be costly: some data analytics applications distribute hundreds of gigabytes of data from a single node to hundreds of receiver nodes. Finally, the time these transfers take can represent a large part of an application's overall execution time. For example, data analytics applications used by Twitter and Netflix are reported to spend 30% to 45% of their execution time performing one-to-many data transfers. To address this problem, we propose BitFlood, which exploits the multicast capabilities of commodity switches in datacenters to speed up data transfers and lessen their impact on other applications sharing the network. BitFlood is an extension to the BitTorrent protocol that uses IP multicast. IP multicast is advantageous because, in a one-to-many transfer, the sender transmits the data once and the switches in the network duplicate it. Since IP multicast does not guarantee delivery of the data, BitFlood takes advantage of its P2P mechanism to recover lost data locally. To evaluate BitFlood, we implement it and compare it with two state-of-the-art data transmission approaches: NORM, a reliable multicast protocol, and BitTorrent, a P2P protocol. To achieve a fair comparison, we tune the implementations of both NORM and BitTorrent so that they perform better in a datacenter network. Our analysis shows that BitFlood achieves 10% to 50% faster transfer times than both approaches when transferring data to multiple receivers. Furthermore, compared with the P2P approach of BitTorrent, BitFlood can reduce the load on the network by a factor of n for a data transfer to n receivers.
Peer to peer; File distribution; Multicast layering; Datacenters