Automatic creation of animated crowd scenes involving multiple interacting characters is currently a field of extensive research, because automatic generation of animation finds immediate applications in film post-production and special effects, computer games, and event simulation in crowded areas. The work presented here addresses the problem of inadequate application of AI techniques in current animation research. The thesis presents a survey of different industrial and academic approaches and identifies a number of missing features. After an extensive review of existing systems, an agent-based system and an animation framework are chosen for extension, and the cognitive architecture FreeWill is proposed. The architecture extends the principles of these underlying systems and combines software agent solutions with typical animation elements. It also allows for easy integration with existing tools. Another important contribution of FreeWill is an algorithm for automatic generation of high-level actions using reinforcement learning. The chosen learning technique lends itself well to the animation task, because reinforcement learning makes the learning task easy to define: only the ultimate goal of the learning agent must be specified. The process of defining and conducting the learning task is explained in detail. The learning module further automates the generation of animation, shortens the time needed to create crowd scenes, and reduces production costs. Newly learnt actions can be applied to increase the quality of the generated sequences. The learning module can be used in both deterministic and non-deterministic environments. Experiments in both modes are presented, and conclusions are drawn. Two modes of control, inverse and forward kinematics, are also compared in a number of experiments. Learning with inverse kinematics control was found to converge faster for the same task.
A working prototype of the architecture is presented and the learnt motion is compared with human motion. The results of the comparison demonstrate that the learning scheme could be used to imitate human motion in crowd scenes. Finally, a number of metrics are defined that allow easy selection of the most relevant actions from the learnt set, again helping to automate the process. The work concludes by pointing out further directions of research and suggesting possible extensions and applications.
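
The claim that reinforcement learning needs only the ultimate goal to be defined can be illustrated with a minimal sketch. The code below is not the FreeWill learning module: it is plain tabular Q-learning on a hypothetical one-dimensional corridor, and the names `train_corridor`, `greedy_policy` and the `slip` parameter are assumptions introduced for illustration. The only task-specific information given to the agent is a reward of 1 on reaching the goal cell. Setting `slip` above zero makes the environment non-deterministic by occasionally reversing the chosen move.

```python
import random


def train_corridor(n_states=6, slip=0.0, episodes=500,
                   alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning on a corridor of n_states cells (illustrative only).

    The agent starts in cell 0; the goal is the last cell. Actions are
    0 (step left) and 1 (step right). With probability `slip` the executed
    move is the opposite of the chosen one (non-deterministic mode).
    """
    rng = random.Random(seed)
    goal = n_states - 1
    q = [[0.0, 0.0] for _ in range(n_states)]  # q[state][action]

    for _ in range(episodes):
        state = 0
        for _ in range(100):  # cap on episode length
            left, right = q[state]
            # Epsilon-greedy choice; ties are broken at random.
            if rng.random() < epsilon or left == right:
                action = rng.randrange(2)
            else:
                action = 0 if left > right else 1

            # The environment may reverse the move when slip > 0.
            executed = 1 - action if rng.random() < slip else action
            if executed == 0:
                next_state = max(0, state - 1)
            else:
                next_state = min(goal, state + 1)

            # The whole task definition: reward 1 at the goal, 0 elsewhere.
            reward = 1.0 if next_state == goal else 0.0

            # Standard Q-learning update; the goal state is terminal.
            if next_state == goal:
                target = reward
            else:
                target = reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])

            state = next_state
            if state == goal:
                break
    return q


def greedy_policy(q):
    """Return the preferred action for every non-goal state."""
    return [0 if left > right else 1 for left, right in q[:-1]]


if __name__ == "__main__":
    print(greedy_policy(train_corridor(slip=0.0)))  # deterministic mode
    print(greedy_policy(train_corridor(slip=0.2)))  # non-deterministic mode
```

In the deterministic mode the learnt policy steps right in every cell, although no intermediate behaviour was ever specified. In the non-deterministic mode the value estimates fluctuate with a learning rate this high, so a smaller `alpha` would be a reasonable choice there.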