Developing a scalable, extensible parallel performance analysis toolkit
Master of Arts
Modern parallel systems and applications continue to grow in scale and complexity, and good parallel performance is consequently difficult to achieve without the help of performance tools. However, monitoring application performance on these large-scale systems generates massive amounts of performance data. Current performance tools are insufficient for practical analysis of data at this scale: they typically either show only basic summary information or overwhelm the user with every performance detail while offering little help in pinpointing useful patterns. This thesis presents HPCVision, an extensible tool framework that takes a novel approach to scalable parallel performance analysis and visualization. The framework provides two performance toolkits for examining similarities and differences in parallel performance across an ensemble of processes, identifying equivalence classes of behavior, and pinpointing performance anomalies. HPCVision presents the performance data and analysis results in an intuitive, scalable manner that provides insight into application performance, automating the tuning cycle and increasing the productivity of the human analyst.