Parallel performance on GPUs today surpasses that of traditional multi-core CPUs. Many AI algorithms are now being evaluated on GPUs rather than CPUs, especially after the release of libraries such as CUDA and OpenCL, which allow general-purpose algorithms to be implemented on the GPU. One of the best-known game tree search algorithms is Negamax, which finds the optimal next move in zero-sum games. In this research, a parallel GPU implementation of an enhanced Negamax algorithm is presented, built with the CUDA library and featuring divergence-free execution, dynamic parallelism, and a shared GPU transposition table. The approach was tested on the games of checkers and chess and compared with previous work: CPU multithreading, which achieves up to a 6x speedup on an 8-core processor, and GPU threading with iterative deepening and fixed grid and block sizes, which achieves up to a 40x speedup at depth 14. The approach was also evaluated at various search depths on both CPU and GPU. The results show speedups for the parallel GPU implementation of up to 80x at depth 14 for checkers and 90x at depth 14 for chess, roughly doubling the results of previous research.
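For readers unfamiliar with the algorithm, the sequential Negamax recursion that the paper parallelizes can be sketched as follows. This is a minimal illustrative baseline, not the paper's GPU implementation; the toy game tree and its leaf evaluations are hypothetical.

```python
def negamax(node, depth, color):
    """Return the best achievable score for the side to move.

    color is +1 for the root player and -1 for the opponent; leaf values
    are static evaluations from the root player's point of view, so a leaf
    scores color * value for the side currently to move.
    """
    if depth == 0 or not isinstance(node, list):  # leaf node
        return color * node
    best = float("-inf")
    for child in node:
        # The opponent's best score, negated, is our score for this move.
        best = max(best, -negamax(child, depth - 1, -color))
    return best

# Hypothetical two-ply toy tree: the root player picks a move, then the
# opponent picks a reply leading to a leaf evaluation.
tree = [[3, 12], [2, 4]]
print(negamax(tree, 2, 1))  # prints 3: max(min(3, 12), min(2, 4))
```

The GPU approach described in the abstract parallelizes this search, e.g. by exploring subtrees in separate threads, while avoiding thread divergence and sharing results through a table in GPU memory.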