GPUTeraSort is a novel external sorting algorithm using graphics processors (GPUs) on large databases composed of billions of records and wide keys. It uses the data parallelism within a GPU along with task parallelism by scheduling memory-intensive and compute-intensive tasks on the GPU. This sorting architecture leverages multiple memory interfaces on the same PC . using both the high-bandwidth GPU memory interface along with the general CPU main memory interface. GPUTeraSort is structured as a task pipeline: read disk, build keys, sort using the GPU, generate runs, write disk, and then in the second phase, read, merge, write. As a result of these design decisions, it achieves higher memory bandwidth than purely CPU-based algorithms running on commodity PCs. It takes into account the limited communication bandwidth between the CPU and the GPU, and reduces the data communication between the two processors. It also pipelines disk transfers and achieves near-peak I/O performance. We tested the performance of GPUTera- Sort on the standard Sort benchmark at hundred Gigabyte scale. A 3 GHz Pentium IV PC with $300 NVIDIA 7800 GT GPU indicates significant performance improvements over optimized CPU-based algorithms on high-end PCs with 3.6 GHz Dual Xeon processors. GPUTeraSort outperforms the reported PennySort benchmark results and has better priceperformance. Overall, the results indicate that using a GPU as a co-processor can significantly improve the performance of sorting algorithms on large databases.