Revision History:

Support threadIdx, blockIdx, and blockDim directly (no need for hipify conversions in kernels). HIP
kernel syntax is now identical to CUDA kernel syntax: no extra parameters or conversions are needed.

Refactor launch syntax. HIP now extracts kernels from the executable and launches them using the
existing module interface. Kernel dispatch no longer flows through HCC. The result is faster
kernel launches with lower resource usage (no signals required).

Add cross-linking support between G++ and HCC, in particular for interfaces that use
standard C++ library types (e.g. std::vector, std::string). HIPCC now uses libstdc++ by default on the HCC
compilation path.
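
As a rough illustration of the kind of interface this affects, the sketch below shows a function that passes std::string and std::vector across what would be an object-file boundary. The function names (describe_buffer, caller_side_example) are hypothetical and not part of HIP, and both sides are shown in one file only to keep the example self-contained. In a real project the two functions would live in separate files, one built with g++ and one built with hipcc on the HCC path. Such a call links and behaves correctly only when both sides agree on the standard library implementation, which is what the libstdc++ default is meant to ensure.

```cpp
// Minimal sketch of a standard-library-typed interface shared between a
// g++-compiled file and an HCC-compiled file. All names are hypothetical.
#include <cassert>
#include <string>
#include <vector>

// Provider side: imagine this definition lives in a file built with g++.
// Its signature exposes std::string and std::vector, so the caller must be
// built against the same standard library implementation.
std::string describe_buffer(const std::string& label,
                            const std::vector<float>& data) {
    return label + ": " + std::to_string(data.size()) + " elements";
}

// Caller side: imagine this lives in a file built by hipcc on the HCC path.
void caller_side_example() {
    std::vector<float> samples = {1.0f, 2.0f, 3.0f};
    std::string text = describe_buffer("samples", samples);
    assert(text == "samples: 3 elements");
}
```

Keeping such interfaces small, or falling back to plain pointers and sizes, remains a reasonable choice when the toolchains on each side cannot be controlled.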

HIP_TRACE_API now prints the arguments to the HIP function (in addition to the function name).

Deprecate hipDeviceGetProp (Replace with hipGetDeviceProp)

Deprecate hipMallocHost (Replace with hipHostMalloc)

Deprecate hipFreeHost (Replace with hipHostFree)

The mixbench benchmark tool for measuring operational intensity now has a HIP target, in addition to CUDA and OpenCL. Let the comparisons begin. :)
For more information, see https://github.com/ekondis/mixbench.