Search
Now showing items 1-10 of 13
Performance Analysis and Optimization of a Hybrid Seismic Imaging Application
(Elsevier, 2016)
Applications to process seismic data are computationally expensive and, therefore, employ scalable parallel systems to produce timely results. Here we describe our experiences of using performance analysis tools to gain insight into an MPI+OpenMP code developed by Shell that performs Reverse Time Migration on a cluster to produce models of the ...
A Sample-Driven Call Stack Profiler
(2004-07-15)
Call graph profiling reports measurements of resource utilization along with information about the calling context in which the resources were consumed. We present the design of a novel profiler that measures resource utilization and its associated calling context using a stack sampling technique. Our scheme has a novel combination of features and ...
Improving Performance with Integrated Program Transformations
(2004-09-09)
Achieving a high fraction of peak performance on today’s computer systems is difficult for complex scientific applications. To do so, an application’s characteristics must be tailored to exploit the characteristics of its target architecture. Today, commercial compilers do not adequately tailor programs automatically; thus, application scientists ...
Tools for Application-Oriented Performance Tuning
(2001-03-14)
Application performance tuning is a complex process that requires assembling various types of information and correlating it with source code to pinpoint the causes of performance bottlenecks. Existing performance tools don't adequately support this process in one or more dimensions. We discuss some of the critical utility and usability issues for ...
The Platform-Aware Compilation Environment: Status and Future Directions
(2012-06-13)
The Platform-Aware Compilation Environment (PACE) is an ambitious attempt to construct a portable compiler that produces code capable of achieving high levels of performance on new architectures. The key strategies in PACE are the design and development of an optimizer and runtime system that are parameterized by system characteristics, the automatic ...
Effective Communication Coalescing for Data-Parallel Applications
(2005-07-29)
Communication coalescing is a static optimization that can reduce both communication frequency and redundant data transfer in compiler-generated code for regular, data parallel applications. We present an algorithm for coalescing communication that arises when generating code for regular, data-parallel applications written in High-Performance Fortran ...
Effective Performance Measurement and Analysis of Multithreaded Applications
(2008-10-13)
Understanding why the performance of a multithreaded program does not improve linearly with the number of cores in a sharedmemory node populated with one or more multicore processors is a problem of growing practical importance. This paper makes three contributions to performance analysis of multithreaded programs. First, we describe how to measure ...
Understanding Unfulfilled Memory Reuse Potential in Scientific Applications
(2007-10-05)
The potential for improving the performance of data-intensive scientific programs by enhancing data reuse in cache is substantial because CPUs are significantly faster than memory. Traditional performance tools typically collect or simulate cache miss counts or rates and attribute them at the function level. While such information identifies program ...
The Platform-Aware Compilation Environment: Preliminary Design Document
(2010-09-15)
The Platform-Aware Compilation Environment (PACE) is an ambitious attempt to construct a portable compiler that produces code capable of achieving high levels of performance on new architectures. The key strategies in PACE are the design and development of an optimizer and runtime system that are parameterized by system characteristics, the automatic ...
Compiling Stencils in High Performance Fortran
(1997-11-12)
For many Fortran90 and HPF programs performing dense matrix computations, the main computational portion of the program belongs to a class of kernels known as stencils. Stencil computations are commonly used in solving partial differential equations, image processing, and geometric modeling. The efficient handling of such stencils is critical for ...