It is difficult to implement optimal control for a system whose model is unknown and whose operating environment is uncertain, such as the intelligent cruise control of vehicles. This article addresses the problem from the perspective of reinforcement learning by learning the optimal policy from state transition data. A model-free optimal control algorithm is employed to approximate the optimal control policy for the intelligent cruise control system, which accounts for both comfort and safety through a combined performance index. The algorithm is implemented with two multilayer neural networks, a critic network and an actor network, which approximate the state-action value function and the control action, respectively. In addition, a data collection strategy is proposed to obtain state transition data distributed uniformly over the state-action space from the running trajectory of the host car. The critic and actor networks are trained alternately on the collected data until they converge, and the convergent actor network yields the optimal control policy. Finally, the policy is tested on a hardware-in-the-loop simulator built upon dSPACE and compared with a linear quadratic regulator (LQR) controller and a proportional-integral-derivative (PID) controller. The results show excellent performance in terms of both safety and comfort.
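The alternating critic/actor training described above can be illustrated with a minimal sketch. All model parameters, weights, and names below are assumptions for illustration, not taken from the paper: the host-car dynamics are reduced to a toy double-integrator car-following model (state = gap error and relative speed, action = host acceleration), and the paper's multilayer neural networks are replaced by a quadratic critic and a linear actor so that the greedy actor update is available in closed form (an LSPI-style scheme). The reward combines safety terms with a comfort (acceleration) penalty, and the transition data are sampled uniformly over the state-action space, mirroring the data collection strategy.

```python
import numpy as np

dt, gamma = 0.1, 0.99
A = np.array([[1.0, dt], [0.0, 1.0]])   # state: [gap error, relative speed] (assumed toy model)
B = np.array([0.0, -dt])                # action: host-car acceleration

def step(s, a):
    # deterministic transition of the toy car-following model
    return A @ s + B * a

def reward(s, a):
    # safety terms (gap error, relative speed) plus a comfort term (acceleration)
    return -(1.0 * s[0]**2 + 0.5 * s[1]**2 + 0.1 * a**2)

def phi(s, a):
    # quadratic features for the critic: Q(s, a) = w . phi(s, a)
    z = (s[0], s[1], a)
    return np.array([z[i] * z[j] for i in range(3) for j in range(i, 3)] + [1.0])

rng = np.random.default_rng(0)
# data collection: transitions spread uniformly over the state-action space
S = rng.uniform(-5, 5, size=(2000, 2))
U = rng.uniform(-3, 3, size=2000)
data = [(s, a, reward(s, a), step(s, a)) for s, a in zip(S, U)]

K = np.zeros(2)                          # linear actor: a = K . s
for _ in range(20):
    # critic update: least-squares TD fit of Q under the current actor
    n = phi(S[0], 0.0).size
    M, b = np.zeros((n, n)), np.zeros(n)
    for s, a, r, s2 in data:
        p, p2 = phi(s, a), phi(s2, K @ s2)
        M += np.outer(p, p - gamma * p2)
        b += r * p
    w = np.linalg.solve(M + 1e-8 * np.eye(n), b)
    # actor update: Q is quadratic in a, so argmax_a Q(s, a) has a closed form
    # (w[2], w[4], w[5] multiply s0*a, s1*a, a^2 in phi); valid when w[5] < 0
    if w[5] < 0:
        K = -np.array([w[2], w[4]]) / (2 * w[5])

print("learned feedback gain:", K)
```

Because the critic features can represent the true quadratic value function of this toy model exactly, the alternation converges in a few iterations; with neural-network approximators, as in the paper, the same loop runs with gradient-based fits in place of the least-squares solve and closed-form maximization.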