Predictive Parallelization: A Framework for Reducing Tail Latencies of Web Search Queries
Doctor of Philosophy
We have become dependent on web search in our everyday lives. Web search services aim to provide fast responses to user queries, making the tail latency more important to reduce than the average latency. With modern multicore servers, intra-query parallelization becomes a desirable technique for reducing the query response time. Our workload characterization of commercial search engine servers shows that using parallelization to reduce the tail latency is challenging: (1) The search workload consists of mainly short-running queries that do not benefit from parallelism, and a few long-running queries which significantly impact the tail but exhibit high parallelism speedup. (2) The spare resources available to parallelize queries vary over time. This thesis presents predictive parallelization, a framework designed for addressing these challenges and reducing tail latencies in web search. There are two fundamental techniques used as key elements of framework design. First, intra-query parallelization of index searching parallelizes each individual query with small overhead. The key idea is for a parallel search to mimic the sequential order of execution that almost never scans the entire index. Second, query execution time predictor identifies a majority of long-running queries through machine learning. The predictor covers a comprehensive feature set to improve prediction accuracy while avoiding expensive features that have excessive requirements such as large memory footprints. In turn, heuristic algorithms in the framework exploit both query and system load information to decide parallelism degree on a query-by-query basis. At runtime, they selectively parallelize long-running queries with high parallelism efficiency and adapt the parallelism degree to system load. All of the techniques and mechanisms proposed in this thesis have been implemented and evaluated experimentally on production servers and workloads.
Web search; Parallelization; Query execution time prediction; Concurrency; Resource management