Learning is enabled by using -F -L. By adding the -O argument, you're telling it to parse the already-generated learning log and produce the policy. Remove the -O argument and learning will be enabled properly.