Since the discovery of Rotating Radio Transients (RRATs), interest in single-pulse radio searches has increased dramatically. Because of the large data volumes these searches generate, especially in surveys planned for future radio telescopes, they have to be conducted in real time. This has led to the development of a multitude of search techniques and real-time pipeline prototypes. In this work we investigate the applicability of GPUs to this problem. We have designed and implemented a scalable, flexible, GPU-based transient search pipeline composed of several processing stages, including RFI mitigation, dedispersion, event detection and classification, as well as data quantisation and persistence. These stages are encapsulated as a standalone framework. The optimised GPU implementation of direct dedispersion achieves a speedup of more than an order of magnitude over an optimised CPU implementation. We use a density-based clustering algorithm, coupled with a candidate selection mechanism, to group detections caused by the same event and to automatically classify them as either RFI or of celestial origin. The pipeline was deployed at the Medicina BEST-II array, where several test observations were conducted. We also calculate the number of GPUs required to process all the beams for the SKA1-mid non-imaging pipeline. In addition, we have investigated the applicability of GPUs to beamforming, where our implementation achieves more than 50% of the peak theoretical performance. We demonstrate that for large arrays, and in observations where the generated beams need to be processed outside the GPU, the system becomes PCIe bandwidth limited. This can be alleviated by processing the synthesised beams on the GPU itself, which we demonstrate by integrating the beamformer with the transient detection pipeline.
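To make the direct dedispersion stage concrete, the following is a minimal CPU sketch in NumPy of the brute-force technique: for each trial dispersion measure (DM), every frequency channel is shifted by its cold-plasma dispersion delay and the channels are summed. It is not the optimised GPU implementation described above. The function names, the `(nchan, nsamp)` array layout, and the band, sampling time and DM values in the usage lines are all illustrative assumptions.

```python
import numpy as np

# Dispersion constant in s MHz^2 pc^-1 cm^3 (standard cold-plasma value).
K_DM = 4.148808e3


def channel_delays(dm, freqs_mhz, tsamp):
    """Per-channel delay, in samples, relative to the highest frequency."""
    freqs = np.asarray(freqs_mhz, dtype=float)
    delay_s = K_DM * dm * (freqs ** -2.0 - freqs.max() ** -2.0)
    return np.rint(delay_s / tsamp).astype(int)


def dedisperse(data, freqs_mhz, tsamp, dms):
    """Direct dedispersion (hypothetical helper, assumes non-negative DMs).

    data: array of shape (nchan, nsamp); returns shape (len(dms), nout).
    """
    nchan, nsamp = data.shape
    # Trim the output so that every shifted read stays inside the input.
    max_shift = channel_delays(max(dms), freqs_mhz, tsamp).max()
    nout = nsamp - max_shift
    out = np.zeros((len(dms), nout))
    for i, dm in enumerate(dms):
        shifts = channel_delays(dm, freqs_mhz, tsamp)
        for c in range(nchan):
            out[i] += data[c, shifts[c]:shifts[c] + nout]
    return out


# Usage: inject one dispersed pulse (illustrative parameters) and recover it.
freqs = np.linspace(400.0, 416.0, 32)   # assumed band, MHz
tsamp = 1e-3                            # assumed sampling time, s
dms = [0.0, 10.0, 20.0, 30.0]           # trial DMs, pc cm^-3
data = np.zeros((32, 512))
true_shifts = channel_delays(20.0, freqs, tsamp)
data[np.arange(32), 100 + true_shifts] = 1.0
out = dedisperse(data, freqs, tsamp, dms)
best_dm = dms[int(np.argmax(out.max(axis=1)))]
```

The two nested loops over trial DMs and channels are independent of one another apart from the final sum, which is what makes this stage a natural candidate for GPU parallelisation.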