Where Andi notes about his Ph.D … and beyond!

How does Thrust determine the number of threads and blocks with which a kernel is launched?

The relevant file is thrust/detail/device/cuda/detail/launch_closure.inl (with definitions in thrust/detail/device/cuda/arch.inl).

In short: “[it] implements a simple heuristic that selects launch parameters to maximize occupancy. Specifically, it looks at the footprint of the kernel (registers, shared memory, etc.) and picks launch parameters that maximize the number of threads that can execute on the device at once. Essentially, it runs the “CUDA Occupancy Calculator” spreadsheet automatically for you.

Maximum occupancy doesn’t imply maximum performance, but it’s the best solution we’ve come up with so far :)”

The CUDA Occupancy Calculator is an .xls spreadsheet provided by NVIDIA. Occupancy is the ratio of active warps to the maximum number of warps that a given multiprocessor supports.
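To make the definition concrete, here is a minimal host-side sketch of the ratio in plain C++. It is not code from Thrust or from the spreadsheet; the function name `occupancy` and all of its parameters are made up for illustration, and the number of resident blocks is simply passed in rather than derived from register or shared memory usage.

```cpp
#include <cassert>

// Occupancy = active warps on a multiprocessor / maximum warps it supports.
// All names here are illustrative assumptions, not Thrust or CUDA identifiers.
double occupancy(int threads_per_block, int resident_blocks_per_sm,
                 int warp_size, int max_warps_per_sm)
{
    assert(threads_per_block > 0 && warp_size > 0 && max_warps_per_sm > 0);
    // A partially filled warp still takes up a whole warp slot, so round up.
    int warps_per_block = (threads_per_block + warp_size - 1) / warp_size;
    int active_warps = warps_per_block * resident_blocks_per_sm;
    return static_cast<double>(active_warps) / max_warps_per_sm;
}
```

For example, `occupancy(256, 3, 32, 48)` evaluates to 0.5: three resident blocks of 256 threads are 24 active warps out of a (here assumed) maximum of 48.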

Jared describes the heuristic in a bit more detail: “The idea is to first choose a block_size which is large but which also results in large occupancy. Then, we launch the largest number of blocks num_blocks of this size which can be resident concurrently.

If num_blocks * block_size < n, then threads will process multiple elements of the computation serially”
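The heuristic Jared describes could look roughly like the following host-side sketch in plain C++. This is my own simplified model, not Thrust's actual implementation: it only considers thread, block and register limits (no shared memory), and `DeviceLimits`, `LaunchConfig`, `resident_blocks_per_sm`, `choose_launch_config` and `elements_per_thread` are hypothetical names I introduce here.

```cpp
#include <cstddef>

// Hypothetical, simplified device description (not a CUDA or Thrust type).
struct DeviceLimits {
    int num_sms;
    int max_threads_per_sm;
    int max_blocks_per_sm;
    int registers_per_sm;
    int max_threads_per_block;
    int warp_size;
};

struct LaunchConfig {
    int block_size;
    int num_blocks;
};

// Blocks of the given size that fit on one multiprocessor at once,
// limited by thread count, block count and register use only.
int resident_blocks_per_sm(const DeviceLimits& d, int block_size, int regs_per_thread)
{
    int by_threads = d.max_threads_per_sm / block_size;
    int by_regs = d.registers_per_sm / (regs_per_thread * block_size);
    int blocks = by_threads < by_regs ? by_threads : by_regs;
    return blocks < d.max_blocks_per_sm ? blocks : d.max_blocks_per_sm;
}

// Pick the block size with the most resident threads per multiprocessor;
// on ties the larger block size wins. Then fill every multiprocessor.
LaunchConfig choose_launch_config(const DeviceLimits& d, int regs_per_thread)
{
    LaunchConfig best = {0, 0};
    int best_threads_per_sm = 0;
    for (int bs = d.warp_size; bs <= d.max_threads_per_block; bs += d.warp_size) {
        int blocks = resident_blocks_per_sm(d, bs, regs_per_thread);
        int threads_per_sm = blocks * bs;
        if (threads_per_sm > 0 && threads_per_sm >= best_threads_per_sm) {
            best_threads_per_sm = threads_per_sm;
            best.block_size = bs;
            best.num_blocks = blocks * d.num_sms;
        }
    }
    return best;
}

// If num_blocks * block_size < n, each thread handles several elements serially.
std::size_t elements_per_thread(std::size_t n, const LaunchConfig& c)
{
    std::size_t total = static_cast<std::size_t>(c.num_blocks) * c.block_size;
    if (total == 0) return 0;
    return (n + total - 1) / total;  // ceiling division
}
```

With made-up limits of 14 multiprocessors, 1536 threads and 8 blocks per multiprocessor, 32768 registers, 1024 threads per block and a warp size of 32, a kernel using 20 registers per thread ends up with block_size = 768 and num_blocks = 28 in this model. For n = 1048576 elements, each thread would then process up to 49 elements.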

So, the short answer: automatically, by analyzing the kernel and then maximizing its occupancy.