User-Specified and Automatic Data Layout Selection for Portable Performance
McGraw, James R.
This paper describes a new approach to managing array data layouts to optimize performance for scientific codes. Prior research has shown that changing data layouts (e.g., interleaving arrays) can improve performance. However, there have been two major reasons why such optimizations are not widely used: (1) the need to select different layouts for different computing platforms, and (2) the cost of re-writing codes to use to new layouts. We describe a source-to-source translation process that allows us to generate codes with different array interleavings, based on a data layout specification. We used this process to generate 19 different data layouts for an ASC benchmark code (IRSmk) and 32 different data layouts for the DARPA UHPC challenge application (LULESH). Performance results for multicore versions of the benchmarks with different layouts show significant benefits on four computing platforms (IBM POWER7, AMD APU, Intel Sandybridge, IBM BG/Q). For IRSmk, our results show performance improvements ranging from 22.23× on IBM POWER7 to 1.10× on Intel Sandybridge. For LULESH, we see improvements ranging from 1.82× on IBM POWER7 to 1.02× on Intel Sandybridge. We also developed a new optimization algorithm to recommend a layout for an input source program and specific target machine characteristics. Our results show that the performance of this automated layout algorithm outperforms the manual layouts in one case and performs within 10% of the best architecture-specific layout in all the other cases, but one.