Date of Completion


Embargo Period



Omer Khan, Lei Wang and John Chandy

Field of Study

Electrical Engineering


Master of Engineering

Open Access

Open Access


Sequential graph algorithms are implemented through ordered execution of tasks to achieve high work efficiency. Exposing parallelism in these ordered workloads tends to be an elusive problem. Strict-ordered parallel implementations find nodes that don't have read-write dependencies and hence can be executed in parallel. They have the work efficiency of their sequential counter-parts due to strict ordering constraints. Larger amount of parallelism can be achieved at the expense of redundant work. Relax-ordered implementations remove the global order and only impose the local order. They go through multiple iterations and have the property of monotonically increasing or decreasing output values allowing them to converge efficiently. Unordered implementations move one step ahead and remove the local order as well. Due to the absence of the order, a large amount of redundant work is done but at the same time more parallelism is exposed. Different parallel implementations perform optimally for different algorithms. Similarly, as the graph input changes, the optimal parallel version may change. The choice of optimal parallel implementations is strongly correlated with the characteristics of graph benchmark and input. This work proposes an analytical prediction model that chooses the optimal parallel implementation for a given benchmark-input combination on a single accelerator setup e.g. multicore or GPU. The prediction model is also integrated with a state-of-the-art performance predictor that lacks this capability on a multi-accelerator setup. The proposed predictor shows geometric performance gains of 54% on a multicore, 14% on a GPU, and 31.5% in a multi-accelerator setup.

Major Advisor

Omer Khan