Nvidia wants to boost mobile-device performance with programming tools that let CPUs and graphics processors work together coherently.

On Tuesday the company released its CUDA 5.5 programming tools, which for the first time support the ARM CPUs used in most smartphones and tablets. The tools could bring to mobile devices the type of performance gains that have helped supercomputers surpass a petaflop. On mobile devices, however, those gains will have to fit within a set power limit.

Many tablets and smartphones already come with Nvidia's Tegra chips, which offer a strong gaming experience.

Developers use CUDA parallel programming tools to write and manage applications that harness the combined processing power of GPUs, CPUs and other processors.
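To give a sense of what that programming model looks like, here is a minimal sketch of a CUDA program that adds two arrays on the GPU. It is an illustration only, not code from Nvidia or from the CUDA 5.5 release: the kernel name `vectorAdd`, the array size and the block size of 256 threads are arbitrary choices, and it assumes a machine with a CUDA-capable GPU and the `nvcc` compiler.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Kernel: each GPU thread adds one pair of elements.
// The name vectorAdd is illustrative, not part of any Nvidia API.
__global__ void vectorAdd(const float* a, const float* b, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        out[i] = a[i] + b[i];
    }
}

// Abort with a message if a CUDA runtime call fails.
static void check(cudaError_t err, const char* what) {
    if (err != cudaSuccess) {
        std::fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        std::exit(1);
    }
}

int main() {
    const int n = 1 << 20;                      // arbitrary problem size
    const size_t bytes = n * sizeof(float);
    std::vector<float> hostA(n, 1.0f), hostB(n, 2.0f), hostOut(n, 0.0f);

    // The CPU (host) allocates separate GPU (device) buffers.
    float *devA = nullptr, *devB = nullptr, *devOut = nullptr;
    check(cudaMalloc(&devA, bytes), "cudaMalloc devA");
    check(cudaMalloc(&devB, bytes), "cudaMalloc devB");
    check(cudaMalloc(&devOut, bytes), "cudaMalloc devOut");

    // Inputs are copied from CPU memory to GPU memory explicitly.
    check(cudaMemcpy(devA, hostA.data(), bytes, cudaMemcpyHostToDevice), "copy A");
    check(cudaMemcpy(devB, hostB.data(), bytes, cudaMemcpyHostToDevice), "copy B");

    // Launch enough 256-thread blocks to cover all n elements.
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    vectorAdd<<<blocks, threads>>>(devA, devB, devOut, n);
    check(cudaGetLastError(), "kernel launch");

    // Copy the result back; this call also waits for the kernel to finish.
    check(cudaMemcpy(hostOut.data(), devOut, bytes, cudaMemcpyDeviceToHost), "copy out");

    // Verify on the CPU that every element is 1.0 + 2.0.
    bool ok = true;
    for (int i = 0; i < n; ++i) {
        if (hostOut[i] != 3.0f) { ok = false; break; }
    }
    std::printf("%s\n", ok ? "result verified" : "result mismatch");

    cudaFree(devA);
    cudaFree(devB);
    cudaFree(devOut);
    return ok ? 0 : 1;
}
```

The CPU sets up the data and launches the work, while the GPU runs the same small function across many threads at once. Image-processing tasks, where the same operation is applied to every pixel, tend to map naturally onto this pattern.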

The CUDA-related performance boosts will be especially felt in image processing, said Ian Buck, general manager for GPU computing at Nvidia. Smartphones with cameras could see improved image processing and recognition, and CUDA could open the door for more feature-rich smartphones.

"This is a progression where we started GPU computing on the desktop, and now it's coming to Tegra," Buck said. "This is the first time we are bringing CUDA to the mobile market."

Nvidia's graphics processors for mobile chips are already considered the best, and the company is steadily improving their capabilities. CUDA tools will make graphics processing faster and more power-efficient, said Buck, who also invented CUDA.

Supercomputers have been moving to graphics processors and other accelerators to boost performance. Nvidia's graphics processors are used in Titan, the world's second-fastest supercomputer, at the U.S. Department of Energy's Oak Ridge National Laboratory. Titan achieves a performance of 17.59 petaflops with 299,008 Opteron CPU cores and 261,632 Nvidia Tesla K20X GPU cores.

An upcoming mobile Tegra chip code-named Logan, due next year, will be the first to support CUDA 5.5. Logan's CUDA support will come through an integrated graphics processor based on the Kepler architecture, which is in the Titan supercomputer today. The current Tegra chips have GeForce graphics cores and are not optimized for CUDA.

While the CUDA-compatible mobile chips aren't ready, Nvidia is giving mobile developers an early look at the benefits of CUDA 5.5 through a prototype board, Buck said. Nvidia has introduced hardware for developers that connects Tegra 3 chips with a CUDA-compatible GPU called Kayla through a PCI-Express slot. The hardware, which was introduced earlier this year at Nvidia's GPU Technology Conference, is also being shown this week at the International Supercomputing Conference in Leipzig.

Beyond mobile devices, the benefits of CUDA will also come to supercomputers running on ARM processors or Nvidia's graphics cards, Buck said.

Right now more than 400 of the top 500 supercomputers use x86 processors from Intel or Advanced Micro Devices, and many of them also use Nvidia's graphics cards as accelerators. ARM is making its way into servers, and the Barcelona Supercomputing Center (BSC) last week announced a prototype supercomputer running on ARM processors. A presentation at the ISC focused on CUDA 5.5 for supercomputing.

There are multiple parallel programming development tools for mobile devices and supercomputing. The most popular is perhaps OpenCL, which is backed by Nvidia. Intel offers its own software development tools to work with its Xeon Phi accelerator chip, while AMD was a founding member of the HSA (Heterogeneous System Architecture) Foundation, which aims to make applications easily portable across different chip architectures and devices. Nvidia is not a member of the HSA Foundation, though ARM and other chip makers such as Qualcomm and Texas Instruments are.

Industry observers speculate that Nvidia isn't a member of the HSA Foundation primarily because it's focused on CUDA. There isn't a one-size-fits-all approach that can be taken to parallel programming, Buck said.

In the long run there won't be one way to approach programming for GPUs, and CUDA will provide the best tools for Nvidia chips, Buck said. Programmers already use multiple tools -- C, C++, Java, Ruby on Rails and Python, among others -- to write applications, and similarly, there will be multiple approaches to bringing parallelism to mobile devices and supercomputers, he said.

Nvidia is also making hardware improvements that will make programming for its chips easier. Nvidia's upcoming Tegra 6 processor, code-named Parker, will make memory a resource shared between CPUs and GPUs. Currently, GPU and CPU memory are separate. In Parker, the amount of addressable memory will also expand.

"A developer doesn't have to manage where data is to take advantage of the GPU," Buck said.
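What that could mean for a programmer can be sketched with CUDA's managed-memory call, `cudaMallocManaged`, under which a single pointer is usable from both CPU and GPU code and no explicit copies are written. This is an assumption made for illustration: the call arrived in CUDA releases later than 5.5, and nothing here confirms that Parker exposes shared memory through this exact interface. The kernel name `scale` and the sizes are hypothetical.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Kernel: multiply every element in place. The name scale is illustrative.
__global__ void scale(float* data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        data[i] *= factor;
    }
}

int main() {
    const int n = 1 << 20;                      // arbitrary problem size
    float* data = nullptr;

    // One allocation visible to both CPU and GPU (assumes a CUDA version
    // and GPU that support managed memory; not available in CUDA 5.5).
    cudaError_t err = cudaMallocManaged(&data, n * sizeof(float));
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaMallocManaged failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // The CPU writes directly through the pointer; no cudaMemcpy is needed.
    for (int i = 0; i < n; ++i) {
        data[i] = 1.0f;
    }

    // The GPU works on the same pointer.
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    scale<<<blocks, threads>>>(data, n, 3.0f);

    // Wait for the GPU to finish before the CPU reads the results.
    err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        std::fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(err));
        cudaFree(data);
        return 1;
    }

    // The CPU reads the results through the same pointer.
    bool ok = true;
    for (int i = 0; i < n; ++i) {
        if (data[i] != 3.0f) { ok = false; break; }
    }
    std::printf("%s\n", ok ? "result verified" : "result mismatch");

    cudaFree(data);
    return ok ? 0 : 1;
}
```

Compared with code that allocates separate CPU and GPU buffers and copies between them, the program no longer tracks where the data lives, which is the kind of simplification Buck describes.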