Abstract : This thesis aims at improving the complex task of hand gesture recognition by utilizing machine learning techniques to learn from features calculated from 3D point cloud data. The main contributions of this work are embedded in the domains of machine learning and in the human-machine interaction. Since the goal is to demonstrate that a robust real-time capable system can be set up which provides a supportive means of interaction, the methods researched have to be light-weight in the sense that descriptivity balances itself with the calculation overhead needed to, in fact, remain real-time capable. To this end several approaches were tested:Initially the fusion of multiple ToF-sensors to improve the overall recognition rate was researched. It is examined, how employing more than one sensor can significantly boost recognition results in especially difficult cases and get a first grasp on the influence of the descriptors for this task as well as the influence of the choice of parameters on the calculation of the descriptor. The performance of MLPs with standard parameters is compared with the performance of SVMs for which the parameters have been obtained via grid search.Building on these results, the integration of the system into the car interior is shown. It is demonstrated how such a system can easily be integrated into an outdoor environment subject to strongly varying lighting conditions without the need for tedious calibration procedures. Furthermore the introduction of a modified light-weight version of the descriptor coupled with an extended database significantly boosts the frame rate for the whole recognition pipeline. Lastly the introduction of confidence measures for the output of the MLPs allows for more stable classification results and gives an insight on the innate challenges of this multiclass problem in general.In order to improve the classification performance of the MLPs without the need for sophisticated algorithm design or extensive parameter search a simple method is proposed which makes use of the existing recognition routines by exploiting information already present in the output neurons of the MLPs. A simple fusion technique is proposed which combines descriptor features with neuron confidences coming from a previously trained net and proves that augmented results can be achieved in nearly all cases for problem classes and individuals respectively.These findings are analyzed in-depth on a more theoretical scale by comparing the effectiveness of learning solely on neural activities in the output layer with the previously introduced fusion approach. In order to take into account temporal information, the thesis describes a possible approach on how to exploit the fact that we are dealing with a problem within which data is processed in a sequential manner and therefore problem-specific information can be taken into account. This approach classifies a hand pose by fusing descriptor features with neural activities coming from previous time steps and lays the ground work for the following section of making the transition towards dynamic hand gestures. Furthermore an infotainment system realized on a mobile device is introduced and coupled with the preprocessing and recognition module which in turn is integrated into an automotive setting demonstrating a possible testing environment for a gesture recognition system.In order to extend the developed system to allow for dynamic hand gesture interaction a simplified approach is proposed. This approach demonstrates that recognition of dynamic hand gesture sequences can be achieved with the simple definition of a starting and an ending pose based on a recognition module working with sufficient accuracy and even allowing for relaxed restrictions in terms of defining the parameters for such a sequence.