General-Purpose Computation on Graphics Hardware

This paper presents a novel algorithm for solving dense linear systems using graphics processors (GPUs). It reduces matrix decomposition and row operations to a series of rasterization problems on the GPU. These include new techniques for streaming index pairs, swapping rows and columns, and parallelizing the computation to utilize multiple vertex and fragment processors. The paper describes the implementation of the algorithm on different GPUs and compares the performance with optimized CPU implementations. In particular, the implementation on an NVIDIA GeForce 7800 GTX GPU outperforms a CPU-based ATLAS implementation. Moreover, the results show that the algorithm is cache and bandwidth efficient and scales well with the number of fragment processors within the GPU and the core GPU clock rate. The algorithm is demonstrated in the context of fluid flow simulation. (LU-GPU: Efficient Algorithms for Solving Dense Linear Systems on Graphics Hardware. To appear in Proceedings of the 2005 ACM/IEEE Supercomputing Conference, November 12-18, 2005.)
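For readers unfamiliar with the underlying method, the following is a minimal CPU-side sketch of LU decomposition with partial pivoting, not the paper's GPU implementation. It is included because the trailing update in each step touches every remaining entry independently, which is the kind of per-element work the paper maps to rasterization. The function names `lu_decompose` and `lu_solve` are hypothetical and chosen only for this illustration.

```python
def lu_decompose(a):
    """Factor a square matrix (list of rows) with partial pivoting.

    Returns (lu, perm): lu stores the multipliers of L below the diagonal
    (unit diagonal implied) and U on and above it; perm records row swaps.
    """
    n = len(a)
    lu = [row[:] for row in a]
    perm = list(range(n))
    for k in range(n):
        # Pivot search: pick the row with the largest entry in column k.
        p = max(range(k, n), key=lambda i: abs(lu[i][k]))
        if lu[p][k] == 0.0:
            raise ValueError("matrix is singular")
        if p != k:
            lu[k], lu[p] = lu[p], lu[k]
            perm[k], perm[p] = perm[p], perm[k]
        # Scale the column below the pivot to obtain the multipliers.
        for i in range(k + 1, n):
            lu[i][k] /= lu[k][k]
        # Trailing update: each (i, j) entry is independent of the others,
        # so this double loop is the data-parallel part of the step.
        for i in range(k + 1, n):
            for j in range(k + 1, n):
                lu[i][j] -= lu[i][k] * lu[k][j]
    return lu, perm


def lu_solve(lu, perm, b):
    """Solve A x = b using the factors returned by lu_decompose."""
    n = len(lu)
    x = [b[perm[i]] for i in range(n)]
    # Forward substitution with the unit lower-triangular factor.
    for i in range(n):
        for j in range(i):
            x[i] -= lu[i][j] * x[j]
    # Back substitution with the upper-triangular factor.
    for i in reversed(range(n)):
        for j in range(i + 1, n):
            x[i] -= lu[i][j] * x[j]
        x[i] /= lu[i][i]
    return x


if __name__ == "__main__":
    a = [[2.0, 1.0, 1.0], [4.0, 3.0, 3.0], [8.0, 7.0, 9.0]]
    lu, perm = lu_decompose(a)
    print(lu_solve(lu, perm, [4.0, 10.0, 24.0]))
```

In a GPU setting, the row swap and the trailing update would presumably each become one or more rendering passes over a region of the matrix texture; the sketch only shows the arithmetic those passes would have to reproduce.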

This IEEE Visualization 2005 paper (accepted for publication) describes a new algorithm for the illustrative rendering of iso-surfaces and polygonal models. Using a combination of multi-pass rendering and image-space processing passes, hidden structures and optional additional inner geometry are displayed in real-time. No pre-processing of the geometric models is necessary. This work is part of Jan Fischer’s PhD thesis. (Illustrative Display of Hidden Iso-Surface Structures, Jan Fischer et al., IEEE Visualization 2005)

Sudipto Guha, Shankar Krishnan and Suresh Venkatasubramanian are presenting a tutorial on the use of the GPU for data visualization and mining at the ACM International Conference on Knowledge Discovery and Data Mining (KDD 2005). (Data Visualization and Mining on the GPU)

Once again this year ACM SIGGRAPH will feature a full-day course titled “GPGPU: General-Purpose Computing on Graphics Hardware”. The course, organized by Mark Harris of NVIDIA and David Luebke of the University of Virginia, will feature GPGPU experts from industry and academia. The course will discuss core computational building blocks such as sorting, searching, and linear algebra, using case studies ranging from adaptive shadow mapping to database queries and data mining. Particular focus will be given to tools, perils, and tricks of the trade in general-purpose GPU programming. The course has been updated from SIGGRAPH 2004, with all new case studies. (http://www.gpgpu.org/s2005)

Genetic Algorithms (GA) comprise a class of evolutionary computation (EC). A difficulty with GA is that the traditional crossover operation introduces order-dependency and hence an increase in rendering passes on SIMD GPUs. To parallelize EC on GPUs, this project proposes to use another class of EC called Evolutionary Programming (EP), which applies mutations locally. The project studies in depth how to efficiently map an EP algorithm to SIMD GPUs, including a scalable and visualizable genome map, mutation, tournament and selection, and finally convergence visualization. Intensive experiments and careful comparisons are conducted to demonstrate its performance speedup and accuracy. The project also shows that it is conceptually wrong and infeasible to generate high-quality random numbers on the current generation of GPUs and that the low-quality random numbers will lead to poor performance of EC. (K. L. Fok, T. T. Wong, and M. L. Wong, “Evolutionary Computing on Consumer-Level Graphics Hardware”, IEEE Intelligent Systems, and “Parallel Evolutionary Algorithms on Graphics Processing Unit” in Proc. of IEEE Congress on Evolutionary Computation 2005.)
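To illustrate why EP suits a SIMD machine, here is a minimal CPU-side sketch of a generic EP loop: every individual is mutated on its own (no crossover between individuals), and survivors are picked by a tournament over parents and offspring. It is not the authors' GPU mapping. The function names, the sphere test function, and all parameter values are assumptions made for this example, and random numbers are drawn on the CPU.

```python
import random


def sphere(x):
    """Hypothetical test objective: sum of squares, minimum at the origin."""
    return sum(v * v for v in x)


def evolve(fitness, dim=4, mu=20, q=5, generations=50, sigma=0.3, seed=0):
    """Run a simple evolutionary programming loop that minimizes fitness.

    Returns (population, history), where history holds the best fitness
    found in each generation.
    """
    rng = random.Random(seed)
    pop = [[rng.uniform(-5.0, 5.0) for _ in range(dim)] for _ in range(mu)]
    history = []
    for _ in range(generations):
        # Mutation is local: each offspring depends on one parent only,
        # so all offspring could be produced in parallel.
        offspring = [[v + rng.gauss(0.0, sigma) for v in parent]
                     for parent in pop]
        union = pop + offspring
        scores = [fitness(ind) for ind in union]
        # Tournament: each individual meets q random opponents and earns
        # a win whenever its fitness is no worse than the opponent's.
        wins = []
        for s in scores:
            opponents = [rng.randrange(len(union)) for _ in range(q)]
            wins.append(sum(1 for o in opponents if s <= scores[o]))
        # Selection: keep the mu individuals with the most wins,
        # breaking ties in favor of better fitness.
        order = sorted(range(len(union)), key=lambda i: (-wins[i], scores[i]))
        pop = [union[i] for i in order[:mu]]
        history.append(scores[order[0]])
    return pop, history


if __name__ == "__main__":
    _, history = evolve(sphere)
    print(history[0], history[-1])
```

Because the best individual always wins all of its tournaments and ties are broken by fitness, the best fitness in this sketch never gets worse from one generation to the next.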

This new report by Owens et al. is a comprehensive survey of the history and state of the art in GPGPU. It describes, summarizes and analyzes the latest research in mapping general-purpose computation to graphics hardware. The report begins with the technical motivations that underlie general-purpose computation on graphics processors (GPGPU) and describes the hardware and software developments that have led to the recent interest in this field. The authors describe the techniques used in mapping general-purpose computation to graphics hardware, and survey and categorize the latest developments in general-purpose application development on graphics hardware. (A Survey of General-Purpose Computation on Graphics Hardware, by John D. Owens, David Luebke, Naga Govindaraju, Mark Harris, Jens Krüger, Aaron E. Lefohn, Timothy J. Purcell. To appear in proceedings of Eurographics 2005, State of the Art Reports.)

This paper by Govindaraju et al. describes a cache-efficient bitonic sorting algorithm on GPUs. The algorithm uses the special-purpose texture mapping and programmable hardware to sort IEEE 32-bit floating point data including pointers, and has been used to perform stream data mining and relational database queries. Their results indicate a significant performance improvement over prior CPU-based and GPU-based sorting algorithms. (“GPUSORT: A High Performance Sorting Library”. Also see this Tom’s Hardware article)
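As background, the sketch below shows a plain bitonic sorting network on the CPU; it is a textbook formulation, not the cache-efficient GPUSORT algorithm. It is written in a gather style, where each output element reads itself and one partner and keeps either the minimum or the maximum, which is roughly how a fragment program would compute one pass. The function name `bitonic_sort` is hypothetical, and the input length is assumed to be a power of two.

```python
def bitonic_sort(values):
    """Sort a sequence whose length is a power of two, in ascending order."""
    n = len(values)
    if n & (n - 1):
        raise ValueError("length must be a power of two")
    data = list(values)
    k = 2
    while k <= n:          # size of the bitonic sequences being merged
        j = k // 2
        while j >= 1:      # distance between compared elements
            # One pass: every output element is computed independently
            # from the previous pass, so the loop body is data-parallel.
            nxt = data[:]
            for i in range(n):
                partner = i ^ j
                ascending = (i & k) == 0
                low = min(data[i], data[partner])
                high = max(data[i], data[partner])
                # The lower index keeps the minimum in an ascending block
                # and the maximum in a descending block; the upper index
                # keeps the opposite value.
                keep_min = (i < partner) == ascending
                nxt[i] = low if keep_min else high
            data = nxt
            j //= 2
        k *= 2
    return data


if __name__ == "__main__":
    print(bitonic_sort([3.0, -1.0, 7.5, 0.0, 2.0, 2.0, -4.0, 9.0]))
```

The network performs a fixed sequence of compare-and-select passes regardless of the input values, which is what makes it attractive on hardware without efficient data-dependent branching.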

Bioinformatics applications are among the most compute-demanding applications today. While traditionally these applications are executed on clusters or dedicated parallel systems, this paper by M. Charalambous, P. Trancoso, and A. Stamatakis at the University of Cyprus and FORTH explores the use of an alternative architecture. The authors focus on exploiting the characteristics offered by graphics processors (GPUs) in order to accelerate a bioinformatics application. This paper presents the initial results on porting RAxML, a bioinformatics program for phylogenetic tree inference, to the GPU. (Initial Experiences Porting a Bioinformatics Application to a Graphics Processor. M. Charalambous, P. Trancoso, and A. Stamatakis. Proceedings of the 10th Panhellenic Conference in Informatics (PCI 2005))

gDEBugger, an OpenGL debugger and profiler, traces application activity on top of the OpenGL API, letting programmers see what is happening within the graphics system implementation. The new V1.5 introduces a Shader Viewer that displays a list of the shading programs and shaders that exist in each render context. This viewer displays each shader’s source code and parameters. Also displayed is a list of each program’s attached shaders, active uniform values and program parameters. In addition, this version supports multithreaded applications, displaying a list of the debugged process’s threads and each thread’s current render context. The Call Stack View now displays the call stack of any chosen thread. (www.gremedy.com)

A new version of the Sh language for GPU programming in C++ has been released. This version features a new backend infrastructure implementation that allows, for example, running part of a stream application on the GPU and part on the CPU at the same time. Many other fixes and platform compatibility enhancements were also added. (http://libsh.org)