Abstract : The increase of available data in almost every domain raises the necessity of employing algorithms for automated data analysis. This necessity is highlighted in predictive maintenance, where the ultimate objective is to predict failures of hardware components by continuously observing their status, in order to plan maintenance actions well in advance. These observations are generated by monitoring systems usually in the form of time series and event logs and cover the lifespan of the corresponding components. Analyzing this history of observation in order to develop predictive models is the main challenge of data driven predictive maintenance.Towards this direction, Machine Learning has become ubiquitous since it provides the means of extracting knowledge from a variety of data sources with the minimum human intervention. The goal of this dissertation is to study and address challenging problems in aviation related to predicting failures of components on-board. The amount of data related to the operation of aircraft is enormous and therefore, scalability is a key requirement in every proposed approach.This dissertation is divided in three main parts that correspond to the different data sources that we encountered during our work. In the first part, we targeted the problem of predicting system failures, given the history of Post Flight Reports. We proposed a regression-based approach preceded by a meticulous formulation and data pre-processing/transformation. Our method approximates the risk of failure with a scalable solution, deployed in a cluster environment both in training and testing. To our knowledge, there is no available method for tackling this problem until the time this thesis was written.The second part consists analyzing logbook data, which consist of text describing aircraft issues and the corresponding maintenance actions and it is written by maintenance engineers. The logbook contains information that is not reflected in the post-flight reports and it is very essential in several applications, including failure prediction. However, since the logbook contains text written by humans, it contains a lot of noise that needs to be removed in order to extract useful information. We tackled this problem by proposing an approach based on vector representations of words (or word embeddings). Our approach exploits semantic similarities of words, learned by neural networks that generated the vector representations, in order to identify and correct spelling mistakes and abbreviations. Finally, important keywords are extracted using Part of Speech Tagging.In the third part, we tackled the problem of assessing the health of components on-board using sensor measurements. In the cases under consideration, the condition of the component is assessed by the magnitude of the sensor's fluctuation and a monotonically increasing trend. In our approach, we formulated a time series decomposition problem in order to separate the fluctuation from the trend by solving a convex program. To quantify the condition of the component, we compute a risk function which measures the sensor's deviation from it's normal behavior, which is learned using Gaussian Mixture Models.