Date of Completion


Embargo Period


Major Advisor

Reda Ammar

Co-Major Advisor

Sanguthevar Rajasekaran

Associate Advisor

Chun-Hsi Huang

Associate Advisor

Yufeng Wu

Field of Study

Computer Science and Engineering


Doctor of Philosophy

Open Access

Campus Access


Designing parallel models that fully utilize the computational capabilities of Graphics Processing Units (GPUs) faces several challenges: the large size of input data, the relatively high overhead of global barrier synchronization on GPUs, the limited size of the on-chip shared memory, and the high cost of repeatedly transferring data between the CPU's main memory and the GPU's global memory for out-of-core computations. Designing suitable techniques and algorithms that efficiently exploit the enormous computational capabilities of GPUs while resolving these challenges is essential. This dissertation presents fast GPU algorithms and techniques for implementing computationally intensive problems efficiently on GPUs.

The first technique is Hyperspectral Image Reduction for Endmembers Extraction (HIREE). HIREE reduces the size of a hyperspectral image data set as a preprocessing step before any endmember extraction algorithm is applied. When used with the N-FINDER endmember extraction algorithm, HIREE improved performance by about 20%.

The main idea of the second technique is to eliminate unnecessary global barrier synchronization in spectral processing algorithms. This technique is used to model the N-FINDER algorithm on GPUs, achieving an overall speedup of about 2 over the most recently published model.

The third technique is called the Memory Extension with Communication Enabled technique (MECE). This technique utilizes the available register file in each Streaming Multiprocessor to improve overall performance. It was used to enhance the performance of the red-black Gauss-Seidel method for solving partial differential equations on GPUs, yielding an overall speedup of 2.6 relative to the most recently published algorithm.
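The GPU-specific MECE details are beyond the scope of this abstract, but the red-black coloring that makes Gauss-Seidel parallelizable can be illustrated with a minimal sketch. This is an assumed, serial reference version for the 2-D Poisson equation (the function name and interface are illustrative, not the dissertation's implementation): grid points are colored like a checkerboard, and all points of one color depend only on points of the other color, so each half-sweep can be updated fully in parallel on a GPU.

```python
import numpy as np

def red_black_gauss_seidel(u, f, h, iterations):
    """Red-black Gauss-Seidel sweeps for -laplace(u) = f on a
    square grid with fixed (Dirichlet) boundary values.

    Points with (i + j) even are "red", the rest are "black".
    Each half-sweep reads only the opposite color, so within a
    half-sweep every update is independent -- the property that
    GPU implementations exploit for parallelism."""
    n = u.shape[0]
    for _ in range(iterations):
        for parity in (0, 1):              # 0 = red sweep, 1 = black sweep
            for i in range(1, n - 1):
                for j in range(1, n - 1):
                    if (i + j) % 2 == parity:
                        # Standard 5-point stencil update.
                        u[i, j] = 0.25 * (u[i - 1, j] + u[i + 1, j] +
                                          u[i, j - 1] + u[i, j + 1] +
                                          h * h * f[i, j])
    return u
```

In a GPU version, each half-sweep maps naturally onto a kernel launch (or, with techniques like MECE, onto threads communicating through on-chip storage), since no two same-color updates conflict.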

Finally, a GPU out-of-core sorting algorithm is proposed. The new algorithm uses randomized sampling to efficiently sort large data sets on GPUs. It has been tested on data sets of various sizes and distributions, reaching an overall speedup of about 1.7 over the most recently published algorithm and showing good scalability as the data set size grows.
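The abstract does not detail the sorting algorithm itself; as an illustration of the general randomized-sampling idea behind out-of-core sample sort, here is a minimal in-memory sketch (function name, `chunk_size`, and `oversample` are assumptions for illustration). Random splitters keep the buckets balanced with high probability regardless of the input distribution, so each bucket can then be sorted independently within a memory budget.

```python
import bisect
import random

def out_of_core_sample_sort(data, chunk_size, oversample=8):
    """Randomized sample sort: pick splitters from a random sample,
    scatter elements into buckets by splitter, then sort each
    bucket independently.  In a true out-of-core GPU setting each
    bucket would be transferred to device memory and sorted there;
    here every step stays in host memory for clarity."""
    num_buckets = max(1, (len(data) + chunk_size - 1) // chunk_size)
    if num_buckets == 1:
        return sorted(data)                 # already fits "in core"

    # Oversample, then pick evenly spaced splitters from the sorted sample.
    sample = sorted(random.sample(data, min(len(data),
                                            num_buckets * oversample)))
    splitters = [sample[i * len(sample) // num_buckets]
                 for i in range(1, num_buckets)]

    # Scatter: bucket index found by binary search over the splitters.
    buckets = [[] for _ in range(num_buckets)]
    for x in data:
        buckets[bisect.bisect_right(splitters, x)].append(x)

    # Each bucket is expected to be about chunk_size elements; sort
    # them one at a time and concatenate the results.
    out = []
    for b in buckets:
        out.extend(sorted(b))
    return out
```

Oversampling trades a slightly larger sample for tighter bucket-size bounds; a production out-of-core sorter would also have to handle the rare bucket that still exceeds the memory budget, e.g. by splitting it recursively.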