Computer Science Technical Reports
CS at VT

CampProf: A Visual Performance Analysis Tool for Memory Bound GPU Kernels

Aji, Ashwin M. and Daga, Mayank and Feng, Wu-chun (2010) CampProf: A Visual Performance Analysis Tool for Memory Bound GPU Kernels. Technical Report TR-10-10, Computer Science, Virginia Tech.

Abstract

Current GPU tools and performance models provide some common architectural insights that guide programmers in writing optimal code. We challenge these performance models by modeling and analyzing a lesser-known but very severe performance pitfall in NVIDIA GPUs called 'Partition Camping'. Partition Camping is caused by memory accesses that are skewed towards a subset of the available memory partitions, and it may degrade the performance of memory-bound CUDA kernels by up to a factor of seven. No existing tool can detect the partition camping effect in CUDA kernels. We complement the existing tools by developing 'CampProf', a spreadsheet-based visual analysis tool that detects the degree to which any memory-bound kernel suffers from partition camping. In addition, CampProf predicts the kernel's performance at all execution configurations if its performance parameters are known at any one of them. To demonstrate the utility of CampProf, we analyze three different applications with the tool and show how it can be used to discover partition camping. We also show how CampProf can be used to monitor the performance improvements in the kernels as the partition camping effect is removed. The performance model that drives CampProf was developed by applying multiple linear regression techniques to a set of specific micro-benchmarks that simulate the partition camping behavior. Our results show that the geometric mean of the errors in our prediction model is within 12% of the actual execution times. In summary, CampProf is a new, accurate, and easy-to-use tool that can be used in conjunction with the existing tools to analyze and improve the overall performance of memory-bound CUDA kernels.
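
The skew described in the abstract can be illustrated with a small sketch. It is not CampProf's model and is not taken from the report: the partition count, the partition width, the round-robin address mapping, and the `camping_degree` metric are all assumptions chosen only to show how accesses can pile onto one partition.

```python
from collections import Counter

# Assumed values for illustration only (not stated in the abstract).
NUM_PARTITIONS = 8       # hypothetical number of memory partitions
PARTITION_WIDTH = 256    # hypothetical partition width in bytes


def partition_of(address):
    """Map a byte address to a partition, assuming round-robin interleaving."""
    return (address // PARTITION_WIDTH) % NUM_PARTITIONS


def partition_histogram(addresses):
    """Count how many accesses land in each partition."""
    counts = Counter(partition_of(a) for a in addresses)
    return [counts.get(p, 0) for p in range(NUM_PARTITIONS)]


def camping_degree(addresses):
    """Hypothetical metric: busiest partition's load relative to an even load.

    1.0 means the accesses are spread evenly; NUM_PARTITIONS means every
    access hits the same partition.
    """
    hist = partition_histogram(addresses)
    return max(hist) * NUM_PARTITIONS / sum(hist)


# 64 concurrent accesses, one per hypothetical thread block.
full_cycle = NUM_PARTITIONS * PARTITION_WIDTH
camped = [block * full_cycle for block in range(64)]        # all hit partition 0
spread = [block * PARTITION_WIDTH for block in range(64)]   # rotate over partitions

print(partition_histogram(camped), camping_degree(camped))
print(partition_histogram(spread), camping_degree(spread))
```

Under these assumptions the first pattern sends every access to a single partition, while the second distributes them evenly. Real hardware mappings and the resulting slowdown depend on the specific GPU.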

Item Type: Departmental Technical Report
Keywords: Partition Camping, Analysis, Optimization, NVIDIA GPUs
Subjects: Computer Science > Information Visualization
ID Code: 1123
Deposited By: Administrator, Eprints
Deposited On: 25 October 2010