Prediction-based Power-Performance Adaptation of Multithreaded Scientific Codes
2007) Prediction-based Power-Performance Adaptation of Multithreaded Scientific Codes. Technical Report TR-07-21, Computer Science, VT. (
Computing is currently at an inflection point, with the degree of on-chip thread-level parallelism doubling every one to two years. The number of cores has become one of the most important architectural parameters that characterize performance and power-efficiency of a modern microprocessor, and a computer system in general. Concurrency lends itself naturally to allowing a program to trade some of its performance for power savings, by regulating the number of active cores. Unfortunately, in several computing domains, users are unwilling to sacrifice performance to save power. Futhermore, the opportunities for saving power via other means, such as voltage and frequency scaling, may be limited in heavily optimized applications. In this paper, we present a prediction model for identifying energy-efficient operating points of concurrency in well-tuned multithreaded scientific applications, and a runtime system which uses live analysis of hardware event rates through the prediction model, to optimize applications dynamically. The runtime system throttles concurrency so that power consumption can be reduced and performance can be set at the knee of the scalability curve of each parallel execution phase. We present a dynamic, phase-aware performance prediction model (DPAPP), which combines multivariate regression techniques with runtime analysis of data collected from hardware event counters, to locate optimal operating points of concurrency. DPAPP is hardware-aware, in the sense that it takes into account the dimensions of parallelism in the architecture, using distinct predictors and hardware events for each dimension. It is also phase-aware. Using DPAPP, we develop a prediction-driven runtime optimization scheme, which drastically reduces the overhead of searching the optimization space for power-performance efficiency, while achieving near-optimal performance and power savings in real parallel applications.
|Item Type:||Departmental Technical Report|
|Subjects:||Computer Science > Parallel Computation|
|Deposited By:||Curtis-Maury, Matthew|
|Deposited On:||27 February 2013|