Designing dialog management (DM) policies that are robust to environmental noise is a nontrivial task. Approaches based on reinforcement learning (RL) are popular in academia and have been empirically shown to outperform handcrafted policies by a large margin. However, the policies trained with RL are mostly incomprehensible, which limits their deployment in commercial applications. Policy optimization using genetic algorithms (GA) is a relatively new approach to spoken DM; its most notable advantage is that the trained policies can be directly interpreted by human experts. In this letter, we make several contributions to the GA-based framework. First, a structural policy learning procedure is presented. Second, a new fitness estimation method based on fitted policy evaluation is proposed. Finally, combining these methods, an online evolutionary policy learning algorithm is designed that is far more data-efficient than direct policy search with Monte Carlo simulations. The proposed approaches are empirically evaluated and compared with several state-of-the-art methods in a simulated environment; the experiments show favorable results for our approach.
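To make the baseline concrete, the following is a minimal, hypothetical sketch of direct GA policy search with Monte Carlo fitness estimation, the data-hungry approach the letter improves upon. All names, the toy binary-rule policy encoding, the noisy reward, and every hyperparameter are illustrative assumptions, not the letter's actual method or environment.

```python
import random

random.seed(0)

# Hypothetical toy setup: a policy is a vector of binary rule switches,
# and the (noisy) simulated environment rewards enabling the right rules.
POLICY_LEN = 12   # number of binary rules per policy (assumed)
POP_SIZE = 20
GENERATIONS = 30
N_ROLLOUTS = 5    # Monte Carlo episodes averaged per fitness estimate

def simulate_episode(policy):
    """Stand-in for one noisy dialog episode: return = number of
    correctly enabled rules, corrupted by Gaussian noise."""
    return sum(policy) + random.gauss(0.0, 0.5)

def fitness(policy):
    """Monte Carlo fitness estimate: average return over rollouts.
    This repeated simulation is what makes direct search data-inefficient."""
    return sum(simulate_episode(policy) for _ in range(N_ROLLOUTS)) / N_ROLLOUTS

def crossover(a, b):
    """Single-point crossover of two parent policies."""
    point = random.randrange(1, POLICY_LEN)
    return a[:point] + b[point:]

def mutate(policy, rate=0.05):
    """Flip each rule switch independently with a small probability."""
    return [1 - g if random.random() < rate else g for g in policy]

def evolve():
    """Elitist GA loop: score, keep the best quarter, breed the rest."""
    pop = [[random.randint(0, 1) for _ in range(POLICY_LEN)]
           for _ in range(POP_SIZE)]
    for _ in range(GENERATIONS):
        scored = sorted(pop, key=fitness, reverse=True)
        elite = scored[: POP_SIZE // 4]
        children = [mutate(crossover(random.choice(elite), random.choice(elite)))
                    for _ in range(POP_SIZE - len(elite))]
        pop = elite + children
    return max(pop, key=fitness)

best = evolve()
print("best policy:", best)
```

Because each fitness call burns N_ROLLOUTS simulated dialogs, the total episode count grows as population size times generations times rollouts, which motivates replacing pure Monte Carlo estimation with a fitted policy evaluation as proposed in the letter.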