Enabling QoS Controls in Modern Distributed Storage Platforms
Doctor of Philosophy
Distributed storage systems provide a scalable approach for hosting multiple clients on a consolidated storage platform. Shared infrastructure can lower costs, but it makes fair allocation of IO resources harder. Providing performance Quality-of-Service (QoS) guarantees in a distributed storage environment poses unique challenges: the workload demands of clients shift unpredictably between servers as their locality and IO intensities fluctuate. This complicates both the enforcement of QoS controls such as reservations and limits, which are defined on the aggregate service a client receives across all servers, and the provision of differentiated tail latency guarantees to clients. In this thesis, we present novel approaches for providing bandwidth allocation and response time QoS in distributed storage platforms. For bandwidth allocation QoS, we develop a token-based scheduling framework that guarantees the minimum (reservation) and maximum (limit) aggregate throughput of each client. We introduce a novel algorithm called pTrans for solving the token allocation problem. pTrans is provably optimal and has better theoretical and empirical scalability than competing approaches based on linear programming or max-flow formulations. For response time QoS, we introduce Fair-EDF, a framework that extends the earliest deadline first (EDF) scheduler to provide fairness control while supporting latency guarantees.
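For readers unfamiliar with the baseline that Fair-EDF extends, the following is a minimal sketch of a plain earliest deadline first queue. It is not Fair-EDF and contains no fairness control. The names EDFQueue, submit, next_request, and latency_bound are hypothetical and chosen only for illustration, as is the assumption that a request's deadline equals its arrival time plus a per-request latency bound.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class Request:
    # Requests are ordered by deadline; seq breaks ties in arrival order.
    deadline: float
    seq: int
    client: str = field(compare=False)


class EDFQueue:
    """Plain EDF queue (illustrative only; no fairness control)."""

    def __init__(self):
        self._heap = []
        self._seq = 0

    def submit(self, client, arrival, latency_bound):
        # Assumption: deadline = arrival time + the request's latency bound.
        req = Request(arrival + latency_bound, self._seq, client)
        self._seq += 1
        heapq.heappush(self._heap, req)
        return req

    def next_request(self):
        # Dispatch the pending request with the earliest deadline, if any.
        return heapq.heappop(self._heap) if self._heap else None


q = EDFQueue()
q.submit("A", arrival=0.0, latency_bound=10.0)  # deadline 10.0
q.submit("B", arrival=1.0, latency_bound=2.0)   # deadline 3.0
q.submit("C", arrival=2.0, latency_bound=5.0)   # deadline 7.0

order = []
while (r := q.next_request()) is not None:
    order.append(r.client)
print(order)
```

In this sketch the request with the tightest deadline is always served first, so the dispatch order depends only on deadlines and not on which client issued the request. That property is what leaves room for an added fairness mechanism of the kind the thesis describes.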