In this paper, we present an optimization of particle-based volume rendering for GPU platforms. In general, data transfer between the CPU and GPU incurs long latency. Using the page-locked (pinned) memory of the CUDA runtime API, the transfer buffers are allocated so that data transfer between the CPU and GPU becomes faster, reducing the execution time. In addition, using CUDA streams, data transfer is overlapped with kernel execution. In an experiment with the voxel data of a typhoon (1,188 x 979 x 64, 140 MB), both the data transfer time and the kernel execution time are improved, yielding a performance gain of about 30% on the GPU.
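The two techniques described above, pinned host memory and stream-based transfer/compute overlap, can be sketched as follows. This is an illustrative example only: the `process` kernel, buffer sizes, and stream count are hypothetical placeholders, not the paper's actual rendering kernel or data layout.

```cuda
#include <cuda_runtime.h>
#include <cstdio>

// Placeholder kernel standing in for the particle/volume rendering kernel.
__global__ void process(float *d, size_t n) {
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (i < n) d[i] *= 2.0f;
}

int main() {
    const size_t N = 1 << 22;          // total elements (illustrative size)
    const int NSTREAMS = 4;            // number of CUDA streams
    const size_t CHUNK = N / NSTREAMS; // elements handled per stream

    // Page-locked (pinned) host memory: required for truly asynchronous
    // cudaMemcpyAsync, and its transfers run faster than pageable memory.
    float *h;
    cudaHostAlloc(&h, N * sizeof(float), cudaHostAllocDefault);
    for (size_t i = 0; i < N; ++i) h[i] = 1.0f;

    float *d;
    cudaMalloc(&d, N * sizeof(float));

    cudaStream_t streams[NSTREAMS];
    for (int s = 0; s < NSTREAMS; ++s) cudaStreamCreate(&streams[s]);

    // Each stream copies its chunk in, runs the kernel, and copies it back.
    // Transfers queued on one stream overlap with kernels running on another,
    // hiding part of the CPU-GPU transfer latency.
    for (int s = 0; s < NSTREAMS; ++s) {
        size_t off = s * CHUNK;
        cudaMemcpyAsync(d + off, h + off, CHUNK * sizeof(float),
                        cudaMemcpyHostToDevice, streams[s]);
        process<<<(CHUNK + 255) / 256, 256, 0, streams[s]>>>(d + off, CHUNK);
        cudaMemcpyAsync(h + off, d + off, CHUNK * sizeof(float),
                        cudaMemcpyDeviceToHost, streams[s]);
    }
    cudaDeviceSynchronize();

    printf("h[0] = %f\n", h[0]);

    for (int s = 0; s < NSTREAMS; ++s) cudaStreamDestroy(streams[s]);
    cudaFree(d);
    cudaFreeHost(h);
    return 0;
}
```

Splitting the data into per-stream chunks is what enables the overlap: with a single stream, the copy-kernel-copy sequence would simply serialize.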