If you're going to the upcoming SuperComputing conference in Seattle, you'll have the opportunity to attend John Humphrey's presentation on CULA. John's talk this year will focus on the new product features, including sparse solvers and the zero-effort Link Interface for instant acceleration. He will show how easy it is to use the Link Interface with MATLAB, and will also share examples of how users are taking advantage of the new feature.

If you can't make it to John's presentation, stop by our EM Photonics booth (#244) to meet the entire CULA team, myself included. Finding us may be tricky this year, so you may want to check out the exhibitor map online first. Hope to see you there!

CULA Sparse Beta 2 is undergoing final packaging and testing and will be sent to our beta testers very soon. This is a feature update with the following changes:

Added the BiCGSTAB solver

Added the BiCGSTAB(L) solver

Complex (Z) data types available for all solvers

Fortran module added

Configuration parameter to return the best solution encountered

Maximum runtime configuration parameter

New example for Fortran interface

New example for MatrixMarket data

Several important bug fixes, as reported by the beta testers
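For readers unfamiliar with the newly added solver, BiCGSTAB (the stabilized biconjugate gradient method) is a standard Krylov iteration for nonsymmetric linear systems. Below is a minimal, unpreconditioned sketch of the algorithm in Python with NumPy, purely to illustrate the method; it is not CULA Sparse's API, which runs on the GPU and operates on sparse matrices.

```python
import numpy as np

def bicgstab(A, b, tol=1e-10, max_iter=1000):
    """Unpreconditioned BiCGSTAB for Ax = b (van der Vorst's formulation)."""
    x = np.zeros_like(b)
    r = b - A @ x
    r_hat = r.copy()                     # fixed shadow residual
    rho = alpha = omega = 1.0
    v = np.zeros_like(b)
    p = np.zeros_like(b)
    for _ in range(max_iter):
        rho_new = r_hat @ r
        beta = (rho_new / rho) * (alpha / omega)
        rho = rho_new
        p = r + beta * (p - omega * v)
        v = A @ p
        alpha = rho / (r_hat @ v)
        s = r - alpha * v                # intermediate residual
        t = A @ s
        omega = (t @ s) / (t @ t)        # stabilization step
        x = x + alpha * p + omega * s
        r = s - omega * t
        if np.linalg.norm(r) < tol:
            break
    return x

# Example: a small nonsymmetric, diagonally dominant system
A = np.array([[4., 1., 0.],
              [2., 5., 1.],
              [0., 1., 3.]])
b = np.array([1., 2., 3.])
x = bicgstab(A, b)
print(np.linalg.norm(A @ x - b))   # residual norm, near zero at convergence
```

BiCGSTAB(L) generalizes the stabilization step above with a degree-L polynomial, which often smooths the erratic convergence BiCGSTAB can show on difficult systems.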

This release also contains the first steps toward interoperability with CULA for dense linear algebra, which some hybrid methods require. Users will now need to link against both cula_core and cula_sparse rather than just the sparse library. Full interoperability will require CULA R13, which is also coming soon.

We've had plenty of questions regarding the performance of the upcoming CULA Sparse package; hopefully the following performance plot will answer some of them!

Here, we have plotted the performance of CULA Sparse (beta-1) against another GPU library, CUSP (0.2), and an optimized CPU library, Intel MKL (10.3). As you can see, the GPU-accelerated libraries perform over an order of magnitude faster than the CPU counterpart, with CULA coming out about 10-20% faster than CUSP!

For this benchmark, we measured the throughput of the conjugate gradient (CG) iterative solver in GB/s, which normalizes execution time against the size of the matrix. The CPU benchmarks were obtained on dual hex-core Intel Xeon X5560s (all 12 cores active), and the GPU benchmarks were obtained on an NVIDIA C2050. No preconditioners were used, and all solvers converged within very similar iteration counts.
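To make the benchmarked quantity concrete, here is a rough Python sketch of an unpreconditioned CG solve, timed and converted to an approximate throughput figure. The per-iteration byte count (one full read of the matrix per matrix-vector product) is a simplified model of our own, not necessarily the accounting used in the plot, and the small dense random matrix merely stands in for the real sparse test systems.

```python
import time
import numpy as np

def cg(A, b, tol=1e-8, max_iter=1000):
    """Unpreconditioned conjugate gradient for symmetric positive definite A."""
    x = np.zeros_like(b)
    r = b - A @ x
    p = r.copy()
    rs = r @ r
    iters = 0
    for iters in range(1, max_iter + 1):
        Ap = A @ p
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x, iters

# Time a solve and convert to a rough throughput figure.
n = 500
R = np.random.rand(n, n)
A = R @ R.T + n * np.eye(n)       # symmetric positive definite test matrix
b = np.random.rand(n)

start = time.perf_counter()
x, iters = cg(A, b)
elapsed = time.perf_counter() - start

bytes_per_iter = n * n * 8        # one float64 read of A per matvec
gbps = iters * bytes_per_iter / elapsed / 1e9
print(f"{iters} iterations, ~{gbps:.1f} GB/s")
```

Because each CG iteration is dominated by a single matrix-vector product, bytes moved per second is a natural way to compare the same solve across matrices of different sizes and across CPU and GPU implementations.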

Stay tuned for more performance numbers and the upcoming CULA Sparse (beta-2) release!