Parallel Programming with OpenCL 1.2

printf-style debugging and the ability to partition computing devices into subdevices make OpenCL 1.2 a very useful upgrade.

The most complex parameter of clCreateSubDevices is properties, because you must build a list of properties whose required values vary according to the selected partition type. OpenCL 1.2 supports the following three partition types as the first value of the properties list:

CL_DEVICE_PARTITION_EQUALLY

CL_DEVICE_PARTITION_BY_COUNTS

CL_DEVICE_PARTITION_BY_AFFINITY_DOMAIN

CL_DEVICE_PARTITION_EQUALLY requires you to specify the number of compute units you want for each subdevice. The partition scheme splits the device into as many subdevices as possible, each with the specified number of compute units. If that number does not divide evenly into the CL_DEVICE_MAX_COMPUTE_UNITS value retrieved from the device, the remaining compute units are not assigned to any subdevice.

The following lines show sample C code that defines the three partition properties (properties) to split a device (device_id) with eight compute units into two subdevices, each containing four compute units. The second value specifies the desired number of compute units per subdevice (4). Notice that the third element of the properties array is 0, which marks the end of the partition properties list. If successful, the call to clCreateSubDevices stores two cl_device_id elements in sub_device_ids. You can then pass sub_device_ids to clCreateContext to create an OpenCL context (cl_context) for the subdevices. In the code sample, I've removed OpenCL error checking to make it easier to understand the calls.

#include <CL/cl.h>
// Partition properties
cl_device_partition_property properties[3];
// Partition type
properties[0] = CL_DEVICE_PARTITION_EQUALLY;
// Desired number of compute units per subdevice
properties[1] = 4;
// End of the properties list
properties[2] = 0;
// Specifies the size of the out_devices array
cl_uint num_sub_devices = 2;
// Provides a buffer for the generated subdevices with a number of elements specified by num_sub_devices
cl_device_id sub_device_ids[2];
// Receives the number of subdevices that the device can be partitioned into, given the partition type and the other values in the properties list
cl_uint num_devices_ret = 0;
// Create the subdevices for the device_id device
cl_int result = clCreateSubDevices(device_id, properties, num_sub_devices, sub_device_ids, &num_devices_ret);
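The remainder rule for CL_DEVICE_PARTITION_EQUALLY can be checked with plain integer arithmetic before you call clCreateSubDevices. The following is a minimal sketch and not part of the OpenCL API: the equal_split struct and the plan_equal_split function are hypothetical names introduced here for illustration, and the sketch assumes you have already read CL_DEVICE_MAX_COMPUTE_UNITS from the device.

```c
/* Hypothetical planning result; not an OpenCL type. */
typedef struct {
    unsigned sub_devices; /* subdevices an equal split would produce */
    unsigned unassigned;  /* compute units left out of every subdevice */
} equal_split;

/* Predicts the outcome of an equal split, following the rule described
   in the text: as many full subdevices as possible, remainder unused.
   A request of zero units per subdevice yields no subdevices. */
equal_split plan_equal_split(unsigned max_compute_units,
                             unsigned units_per_sub_device)
{
    equal_split plan;
    if (units_per_sub_device == 0) {
        plan.sub_devices = 0;
        plan.unassigned = max_compute_units;
        return plan;
    }
    plan.sub_devices = max_compute_units / units_per_sub_device;
    plan.unassigned = max_compute_units % units_per_sub_device;
    return plan;
}
```

For the eight-unit device used above, plan_equal_split(8, 4) reports two subdevices with nothing left over, while plan_equal_split(8, 3) reports two subdevices and two unassigned compute units.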

CL_DEVICE_PARTITION_BY_COUNTS requires you to specify a list of compute unit counts, one for each subdevice you want to create. OpenCL creates one subdevice for each count in the list, with that number of compute units. The following lines show sample C code that defines six partition properties (properties) to split a device (device_id) with eight compute units into three subdevices with the following configuration: subdevice #0 with two compute units, subdevice #1 with four compute units, and subdevice #2 with two compute units. Notice that the fifth element of the properties array is CL_DEVICE_PARTITION_BY_COUNTS_LIST_END, which ends the list of counts, and the sixth element is 0, which ends the partition properties list. If successful, the call to clCreateSubDevices stores three cl_device_id elements in sub_device_ids, with the specified number of compute units for each subdevice.

#include <CL/cl.h>
// Partition properties
cl_device_partition_property properties[6];
// Partition type
properties[0] = CL_DEVICE_PARTITION_BY_COUNTS;
// Desired number of compute units for subdevice #0
properties[1] = 2;
// Desired number of compute units for subdevice #1
properties[2] = 4;
// Desired number of compute units for subdevice #2
properties[3] = 2;
// End of the list of counts
properties[4] = CL_DEVICE_PARTITION_BY_COUNTS_LIST_END;
// End of the properties list
properties[5] = 0;
// Specifies the size of the out_devices array
cl_uint num_sub_devices = 3;
// Provides a buffer for the generated subdevices with a number of elements specified by num_sub_devices
cl_device_id sub_device_ids[3];
// Receives the number of subdevices that the device can be partitioned into, given the partition type and the other values in the properties list
cl_uint num_devices_ret = 0;
// Create the subdevices for the device_id device
cl_int result = clCreateSubDevices(device_id, properties, num_sub_devices, sub_device_ids, &num_devices_ret);
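When the counts are only known at run time, the properties list for CL_DEVICE_PARTITION_BY_COUNTS has to be assembled in a loop. The sketch below shows one way to do that under stated assumptions: build_by_counts_properties, partition_property, and the SKETCH_ constants are hypothetical stand-ins so that the example compiles without the OpenCL headers, and real code would use cl_device_partition_property and the CL_DEVICE_PARTITION_ constants from CL/cl.h instead. The helper also rejects zero counts and requests whose total exceeds the device's compute units, which is a design choice of this sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for the OpenCL type and constants (assumptions of this
   sketch); include CL/cl.h and use the real names in production code. */
typedef intptr_t partition_property;
#define SKETCH_PARTITION_BY_COUNTS          ((partition_property)0x1087)
#define SKETCH_PARTITION_BY_COUNTS_LIST_END ((partition_property)0)

/* Writes: partition type, each count, the end-of-counts marker, and the
   end-of-properties marker. Returns the number of elements written, or 0
   if the request is invalid or out[] is too small. */
size_t build_by_counts_properties(const unsigned *counts, size_t num_counts,
                                  unsigned max_compute_units,
                                  partition_property *out,
                                  size_t out_capacity)
{
    size_t needed = num_counts + 3; /* type + counts + two terminators */
    unsigned total = 0;
    size_t i;

    if (num_counts == 0 || out_capacity < needed)
        return 0;

    for (i = 0; i < num_counts; ++i) {
        /* Reject empty subdevices and totals above the device size. */
        if (counts[i] == 0 || counts[i] > max_compute_units - total)
            return 0;
        total += counts[i];
    }

    out[0] = SKETCH_PARTITION_BY_COUNTS;
    for (i = 0; i < num_counts; ++i)
        out[1 + i] = (partition_property)counts[i];
    out[num_counts + 1] = SKETCH_PARTITION_BY_COUNTS_LIST_END;
    out[num_counts + 2] = 0;
    return needed;
}
```

For the counts 2, 4, and 2 on an eight-unit device, the helper fills six elements, matching the layout of the handwritten properties array. Validating the total up front gives an earlier, more specific failure than waiting for clCreateSubDevices to return an error.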

CL_DEVICE_PARTITION_BY_AFFINITY_DOMAIN enables you to partition the device into subdevices according to the affinity of the compute units, so that the compute units in each subdevice share either a cache level or a NUMA node. This partition type requires you to specify, in the partition properties list, the element that you want the compute units to share. The following list shows the possible values and their effect on the partitioning of the device:

CL_DEVICE_AFFINITY_DOMAIN_NUMA: Split the device into subdevices with compute units that share a NUMA node.

CL_DEVICE_AFFINITY_DOMAIN_L4_CACHE: Split the device into subdevices with compute units that share the Level 4 cache, also known as L4 cache.

CL_DEVICE_AFFINITY_DOMAIN_L3_CACHE: Split the device into subdevices with compute units that share the Level 3 cache, also known as L3 cache.

CL_DEVICE_AFFINITY_DOMAIN_L2_CACHE: Split the device into subdevices with compute units that share the Level 2 cache, also known as L2 cache.

CL_DEVICE_AFFINITY_DOMAIN_L1_CACHE: Split the device into subdevices with compute units that share the Level 1 cache, also known as L1 cache.

CL_DEVICE_AFFINITY_DOMAIN_NEXT_PARTITIONABLE: Split the device along the first level at which it can be further subdivided, checking the shared elements of the compute units in this order: NUMA node, Level 4 cache, Level 3 cache, Level 2 cache, and Level 1 cache.
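The search order behind CL_DEVICE_AFFINITY_DOMAIN_NEXT_PARTITIONABLE can be modeled in a few lines. This is only an illustrative sketch of the ordering described in the list, not how any OpenCL implementation is written: the affinity_domain enum and the next_partitionable function are hypothetical, and the partitionable argument is assumed to be a bitmask of the levels along which a device can still be split.

```c
#include <stddef.h>

/* Hypothetical bit flags, one per affinity domain (not the OpenCL
   constants themselves). */
enum affinity_domain {
    DOMAIN_NONE     = 0,
    DOMAIN_NUMA     = 1 << 0,
    DOMAIN_L4_CACHE = 1 << 1,
    DOMAIN_L3_CACHE = 1 << 2,
    DOMAIN_L2_CACHE = 1 << 3,
    DOMAIN_L1_CACHE = 1 << 4
};

/* Returns the first domain, in the order NUMA, L4, L3, L2, L1, whose
   bit is set in partitionable, or DOMAIN_NONE if no bit is set. */
unsigned next_partitionable(unsigned partitionable)
{
    static const unsigned order[] = {
        DOMAIN_NUMA, DOMAIN_L4_CACHE, DOMAIN_L3_CACHE,
        DOMAIN_L2_CACHE, DOMAIN_L1_CACHE
    };
    size_t i;

    for (i = 0; i < sizeof order / sizeof order[0]; ++i) {
        if (partitionable & order[i])
            return order[i];
    }
    return DOMAIN_NONE;
}
```

With this model, a device that can be split by L3 or L1 cache but not by NUMA node or L4 cache resolves to the L3 level, because L3 comes first in the search order.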

The following lines show sample C code that defines the three partition properties (properties) to split a device (device_id) with 16 compute units into two subdevices, each containing eight compute units that share the L3 cache. As before, the third element of the properties array is 0, which marks the end of the partition properties list.

#include <CL/cl.h>
// Partition properties
cl_device_partition_property properties[3];
// Partition type
properties[0] = CL_DEVICE_PARTITION_BY_AFFINITY_DOMAIN;
// Desired affinity of the compute units
properties[1] = CL_DEVICE_AFFINITY_DOMAIN_L3_CACHE;
// End of the properties list
properties[2] = 0;
// Specifies the size of the out_devices array
cl_uint num_sub_devices = 2;
// Provides a buffer for the generated subdevices with a number of elements specified by num_sub_devices
cl_device_id sub_device_ids[2];
// Receives the number of subdevices that the device can be partitioned into, given the partition type and the other values in the properties list
cl_uint num_devices_ret = 0;
// Create the subdevices for the device_id device
cl_int result = clCreateSubDevices(device_id, properties, num_sub_devices, sub_device_ids, &num_devices_ret);

After you successfully create the subdevices with any of the previously explained partitioning schemes, you can make calls to clGetDeviceInfo on each generated subdevice to retrieve information about the following partition properties:

CL_DEVICE_PARTITION_TYPE: Retrieve the properties list, starting with the partition type, that was passed to clCreateSubDevices when the subdevice was created.

CL_DEVICE_PARTITION_PROPERTIES: Retrieve the list of partition types that the device or subdevice supports for further partitioning.

CL_DEVICE_PARENT_DEVICE: Retrieve the parent device or subdevice for the subdevice.

Conclusion

The printf debug tool and the new device fission feature added in OpenCL 1.2 provide clear benefits. With device fission incorporated into OpenCL 1.2, there are new opportunities for using OpenCL to target multicore CPUs or Cell Broadband Engine devices. The ability to partition a device makes it simpler to assign work to the appropriate subdevices and so achieve the best performance. However, developers who work with device fission must have a very good understanding of the underlying hardware architecture to choose the most efficient partitioning scheme for their performance needs.
