To handle the growing demands of data intensive applications, storage consolidation is becoming an attractive organizational paradigm. The benefits of consolidation include universal data access, sharing among users, flexibility in allocation and provisioning, and economies of centralized infrastructure management. While high-end storage arrays deliver high performance and reliability, they generally lack support for providing quality of service (QoS) guarantees. Providing best effort service alone may no longer be sufficient in future shared storage systems serving diverse client workloads. Consequently, there is a growing need for storage systems to provide mechanisms for isolating applications from each other and providing differentiated QoS to them based on their needs.
Existing approaches for QoS provisioning in storage servers fall into three different categories: (1) Bandwidth allocation mechanisms, (2) Time slicing at the devices, and (3) Control theoretic approaches. However all of these approaches have shortcomings that discourage their practical use. The challenges include dealing with bursty workloads, access locality, variable workload-dependent server capacity, and the impact of fairness mechanisms on overall throughput.
This dissertation makes two major contributions towards making QoS in storage systems a reality and bridging the gap between theory and practice. First, we provide algorithms with stronger performance guarantees, robust behavior, and ability to handle storage specific issues such as bursts and variable capacity. Second, we provide a framework to evaluate and improve I/O efficiency of QoS mechanisms.
Towards the first goal, we propose pClock, which has several benefits over existing approaches, namely (i) ability to handle bursts in workloads (ii) de-coupling bandwidth and latency allocation (iii) allocating spare capacity to background jobs in large chunks thereby increasing efficiency. To handle the variable system capacity of storage systems, we propose the Reservation Based Fair Queuing (RBFQ) algorithm, that guarantees a certain reserved capacity to each application, while allocating the remaining capacity in a proportionate manner. We also adapt RBFQ to work in distributed storage systems such as FAB [SFV+04] and IceCube [W +06]. The algorithm provides global bandwidth in proportion to application weights while avoiding starvation at any resource.
QoS and efficiency are two strongly opposing forces when it comes to sharing of I/O devices, and it is a challenge to achieve both simultaneously. We explore this trade off in further detail, and propose an adaptive algorithm for improving the I/O efficiency of fair schedulers. We show the robustness of our model by analysis and by implementing and evaluating the algorithms on real systems.