Computer Science Technical Reports
CS at VT

Shortening Time-to-Discovery with Dynamic Software Updates for Parallel High Performance Applications

Kim, Dong Kwan and Tilevich, Eli and Ribbens, Calvin J. (2009) Shortening Time-to-Discovery with Dynamic Software Updates for Parallel High Performance Applications. Technical Report TR-09-11, Computer Science, Virginia Tech.

Full text available as:
techReport.pdf (1304805)

Abstract

Despite using multiple concurrent processors, a typical high performance parallel application is long-running, taking hours or even days to arrive at a solution. To modify a running high performance parallel application, the programmer has to stop the computation, change the code, redeploy, and enqueue the updated version to be scheduled to run, wasting not only the programmer’s time but also expensive computing resources. To address these inefficiencies, this article describes how dynamic software updates can be used to modify a parallel application on the fly, without stopping and rescheduling it. The net effect of updating parallel applications dynamically is a reduction in their time-to-discovery, the total time it takes from posing a problem to arriving at a solution. To explore the benefits of dynamic updates for high performance applications, this article takes a two-pronged approach. First, we describe our experience in building and evaluating a system for dynamically updating applications running on a parallel cluster. Second, we review a large body of literature describing the state of the art in dynamic software updates and point out how this research can be applied to high performance applications. Our experimental results indicate that dynamic software updates have the potential to become a powerful tool for reducing the time-to-discovery of high performance parallel applications.

Item Type: Departmental Technical Report
Keywords: dynamic software updates; high performance applications; binary rewriting; HotSwap
Subjects: Computer Science > Parallel Computation; Computer Science > Software Engineering
ID Code: 1075
Deposited By: Administrator, Eprints
Deposited On: 09 July 2009