An Application-Oriented Approach for Accelerating Data-Parallel Computation with Graphics Processing Unit

Ponce, Sean and Jing, Huang and Park, Seung In and Khoury, Chase and Quek, Francis and Cao, Yong (2009) An Application-Oriented Approach for Accelerating Data-Parallel Computation with Graphics Processing Unit. Technical Report TR-09-05, Computer Science, Virginia Tech.

Abstract

This paper presents a novel parallelization and quantitative characterization of various optimization strategies for data-parallel computation on a graphics processing unit (GPU) using NVIDIA's new GPU programming framework, Compute Unified Device Architecture (CUDA). CUDA is an easy-to-use development framework that has drawn the attention of developers in many application areas who seek dramatic speed-ups in their code. However, the performance tradeoffs in CUDA are not yet fully understood, especially for data-parallel applications. Consequently, we study two fundamental mathematical operations that are common in many data-parallel applications: convolution and accumulation. Specifically, we profile and optimize the performance of these operations on a 128-core NVIDIA GPU. We then characterize the impact of these operations on a video-based motion-tracking algorithm called vector coherence mapping, which consists of a series of convolutions and dynamically weighted accumulations, and we present a comparison of different implementations and their respective performance profiles.

Item Type:	Departmental Technical Report
Keywords:	Video tracking, GPGPU, convolution, accumulation
Subjects:	Computer Science > Parallel Computation; Computer Science > Algorithms and Data Structure
ID Code:	1064
Deposited By:	Cao, Yong (Assistant Professor)
Deposited On:	04 March 2009