Date of Completion

12-13-2013

Embargo Period

12-13-2013

Keywords

High Performance Computing, Dynamic Voltage & Frequency Scaling, Scheduling, Duplication, Recursive, Energy, Distributed Systems, Message Passing Interface

Major Advisor

Reda Ammar

Associate Advisor

Sanguthevar Rajasekaran

Associate Advisor

Zhijie Shi

Associate Advisor

Mohammad Khan

Field of Study

Computer Science and Engineering

Degree

Doctor of Philosophy

Open Access

Campus Access

Abstract

The massive demand for running parallel applications on distributed systems has led to an upsurge in system power consumption. These systems often consist of thousands or millions of cores, storage disks, interconnection devices and other power-hungry components. Some of these applications might not be ready for parallelism. Yet, parallelism can be achieved by partitioning a parallel program represented as a Directed Acyclic Graph (DAG) into sets of tasks called clusters, and assigning each of these clusters to a distinct processor. In a distributed memory model, tasks assigned to different processors communicate solely by message passing. In existing parallel machines, message-passing overhead is quite large. Furthermore, competing communication traffic caused by message passing can saturate the available network bandwidth, and synchronization is usually required between tasks. Moreover, the Environmental Protection Agency reported that the energy consumption of data centers and servers is almost 1.5 percent of the total U.S. electricity consumption. To address these problems, we present the following algorithms.

First, we present a duplication-based scheduling heuristic, the Recursive Critical Path Approach (RCPA*), with a time complexity of O(|V| · (|V| + |E|)), where |V| is the number of tasks and |E| is the number of edges of a parallel program represented by a DAG. RCPA* finds a schedule by recursively scanning through the critical path of every sink task. The main purpose of this approach is to obtain a schedule with minimal schedule length and a minimum number of processors while reducing the inter-processor communication between tasks.
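To make the notion of "the critical path of a sink task" concrete, the following minimal Python sketch computes the longest entry-to-sink path in a small task graph. It is an illustration only and not the RCPA* heuristic itself: the function name `critical_path_to`, the cost dictionaries and the example graph are all hypothetical, and task duplication and processor assignment are not modeled.

```python
from collections import defaultdict


def critical_path_to(sink, cost, edges):
    """Return (length, path) of the longest entry-to-sink path.

    cost:  dict task -> computation cost (hypothetical units)
    edges: dict (u, v) -> communication cost of the edge u -> v
    """
    # Build the predecessor list of every task.
    preds = defaultdict(list)
    for (u, v), comm in edges.items():
        preds[v].append((u, comm))

    memo = {}

    def longest(v):
        # Longest path ending at v, including v's own computation cost.
        if v not in memo:
            best_len, best_path = 0, []
            for u, comm in preds[v]:
                length, path = longest(u)
                if length + comm > best_len:
                    best_len, best_path = length + comm, path
            memo[v] = (best_len + cost[v], best_path + [v])
        return memo[v]

    return longest(sink)


# Hypothetical four-task DAG: a -> b -> d and a -> c -> d.
cost = {"a": 2, "b": 3, "c": 1, "d": 2}
edges = {("a", "b"): 1, ("a", "c"): 4, ("b", "d"): 1, ("c", "d"): 1}
length, path = critical_path_to("d", cost, edges)
print(length, path)
```

In this example the path through c is critical because of the expensive a -> c message, even though c itself is cheap. Placing two communicating tasks on the same processor would remove that message cost, which is the kind of saving a duplication-based scheduler aims for.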

Second, we present an energy-aware scheduling algorithm, namely the Energy-Aware with Duplication (EED) algorithm, with a time complexity of O(|V|^2 · (|V| + |E|)). EED achieves energy savings by merging clusters with subset relationships generated by our RCPA* approach. This strategy leads to energy savings on both the CPUs and the interconnection devices. Furthermore, more task replicas are eliminated from the clusters if doing so reduces the overall energy consumption subject to the schedule length constraints. In this context, we devised a novel objective function γ that aids in assessing the performance of our algorithm in terms of schedule length, number of processors and energy consumption.
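As a rough illustration of merging clusters with subset relationships, the sketch below treats each cluster as a set of task identifiers and drops every cluster that is fully contained in another one. This is one plausible reading of the idea, not the EED algorithm: the function name `merge_subset_clusters` and the example clusters are hypothetical, and the energy model and schedule length constraints mentioned above are deliberately ignored.

```python
def merge_subset_clusters(clusters):
    """Drop every cluster whose task set is contained in another cluster.

    clusters: iterable of iterables of task identifiers (hypothetical input).
    Returns a list of frozensets, one per surviving cluster.
    """
    unique = {frozenset(c) for c in clusters}  # identical clusters collapse
    kept = []
    # Visit larger clusters first so that smaller ones can be absorbed.
    for candidate in sorted(unique, key=len, reverse=True):
        if not any(candidate <= other for other in kept):
            kept.append(candidate)
    return kept


# Hypothetical clusters: {1, 2} is a subset of {1, 2, 3}; {4} of {1, 4}.
clusters = [{1, 2, 3}, {1, 2}, {1, 4}, {4}]
merged = merge_subset_clusters(clusters)
print(len(clusters), "clusters reduced to", len(merged))
```

Each absorbed cluster frees one processor in this simplified picture, since every task it held is already scheduled in the cluster that contains it.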

Third, we present an Energy-Aware with Non Duplication (EEND) scheduling algorithm, with a time complexity of O(|V|^2 · (|V| + |E|)). EEND conserves CPU energy by omitting all task replicas belonging to distinct processors. Because this can severely degrade the overall schedule length, the clusters from which task replicas are eliminated must be selected carefully so that the overall schedule length increases as little as possible. To assess the performance of this approach, we used the objective function γ presented above.
