Automatic Matrix Format Exploration for Large Scale Linear Algebra
Doctor of Philosophy
The inputs of a linear algebra (LA) operation, such as matrices and vectors, can be stored in multiple formats: by rows or columns, in strips, in blocks, and so on. It is usually very difficult for a programmer to determine which format will make an LA computation run fast. Predicting and optimizing the runtime behavior of an LA computation is hard even for someone with expert knowledge of the underlying execution engine. The problem becomes harder still when the computation consists of thousands of operations that must run in a distributed manner. In this work, we argue that a parallel relational database can be used to automatically explore the storage formats of LA computations. Specifically, our system takes in existing code, analyzes the operations it contains, explores different formats for those operations, selects the most efficient ones, and automatically generates new code that runs each operation in its selected format. We show that our implementation finds formats that perform better than those manually chosen by an expert user of the system.
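To make the "explore formats, then select the most efficient" step concrete, the following is a minimal sketch, not the system's actual algorithm. It assumes a simple linear pipeline of operations (real LA computations are usually DAGs), a hypothetical set of formats, made-up per-operation costs, and a flat re-partitioning cost. It uses dynamic programming to pick one format per operation so that total operation cost plus format-conversion cost is minimized.

```python
# Hypothetical storage formats; the names are illustrative only.
FORMATS = ["row_strip", "col_strip", "block"]

# Hypothetical cost of running each operation in each format.
# A real system would obtain these from a cost model or measurements.
OP_COST = {
    "multiply": {"row_strip": 9.0, "col_strip": 7.0, "block": 4.0},
    "add": {"row_strip": 1.0, "col_strip": 1.0, "block": 1.5},
}


def convert_cost(src, dst):
    """Hypothetical cost of re-partitioning data from src to dst."""
    return 0.0 if src == dst else 2.5


def choose_formats(ops, start_format):
    """Choose one format per operation in a linear pipeline, minimizing
    operation cost plus conversion cost via dynamic programming."""
    # best[f] = (cost, plan) of the cheapest plan whose last step is in format f
    best = {start_format: (0.0, [])}
    for op in ops:
        new_best = {}
        for dst in FORMATS:
            # Extend every surviving plan by running `op` in format `dst`.
            candidates = [
                (cost + convert_cost(src, dst) + OP_COST[op][dst], plan + [dst])
                for src, (cost, plan) in best.items()
            ]
            new_best[dst] = min(candidates, key=lambda c: c[0])
        best = new_best
    return min(best.values(), key=lambda c: c[0])


cost, plan = choose_formats(["multiply", "add", "multiply"], "row_strip")
print(cost, plan)  # 12.0 ['block', 'block', 'block']
```

With these made-up costs, converting once to blocks and staying there beats keeping the input's row-strip layout, which illustrates why per-operation format choices interact and must be optimized jointly rather than one operation at a time.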
Distributed Database Systems; Large-scale Linear Algebra; Machine Learning