Efficient call path profiles on unmodified, optimized code
Identifying performance bottlenecks and their associated calling contexts is critical for tuning high-performance applications. This thesis presents a new approach to measuring resource utilization and its calling context. ...
Exploring the potential for accelerating sparse matrix-vector product on a Processing-in-Memory architecture
As the importance of memory access delays on performance has mushroomed over the past few decades, researchers have begun exploring Processing-in-Memory (PIM) technology, which offers higher memory bandwidth, lower memory ...
Performance analysis for parallel programs from multicore to petascale
Cutting-edge science and engineering applications require petascale computing. Petascale computing platforms are characterized by both extreme parallelism (systems of hundreds of thousands to millions of cores) and hybrid ...
Expressiveness, programmability and portable high performance of global address space languages
The Message Passing Interface (MPI) is the library-based programming model employed by most scalable parallel applications today; however, it is not easy to use. To simplify program development, Partitioned Global Address ...