Wave Equation Based Stencil Optimizations on a Multi-core CPU
Symes, William W.
Master of Arts
Wave propagation stencil kernels are engines of seismic imaging algo- rithms. These kernels are both compute- and memory-intensive. This work targets improving the performance of wave equation based stencil code parallelized by OpenMP on a multi-core CPU. To achieve this goal, we explored two techniques: improving vectorization by using hardware SIMD technology, and reducing memory traffic to mitigate the bottle- neck caused by limited memory bandwidth. We show that with loop interchange, memory alignment, and compiler hints, both icc and gcc compilers can provide fully-vectorized stencil code of any order with per- formance comparable to that of SIMD intrinsic code. To reduce cache misses, we present three methods in the context of OpenMP paralleliza- tion: rearranging loop structure, blocking thread accesses, and temporal loop blocking. Our results demonstrate that fully-vectorized high-order stencil code will be about 2X faster if implemented with either of the first two methods, and fully-vectorized low-order stencil code will be about 1.2X faster if implemented with the combination of the last two methods. Our final best-performing code achieves 20%∼30% of peak GFLOPs/sec, depending on stencil order and compiler.