Nowadays, the condition of wind turbines is monitored with SCADA data and purpose-specific sensors. If predictive maintenance is understood as a data-driven decision-making process, it should aim to balance low cost, prediction accuracy and actionable results. In developing a low-cost, high-accuracy monitoring technique, we previously demonstrated that fatigue loads on wind turbines can be monitored based solely on available SCADA data, and that these loads can be used to assess lifetime consumption. However, when working towards actionable results, we found the process limited by the complete-case assumption, a widely used technique that removes incomplete records beforehand. Since datasets are almost never complete, owing to sensor malfunctions, failed records and similar problems, this may lead to biased results. To overcome this limitation, we investigated one year of blade load and SCADA records from the offshore wind farm EnBW Baltic 1. We assessed the impact of various missing-data mechanisms, which describe how randomly data are lost, and evaluated three traditional replacement procedures, which make different levels of assumptions about data patterns. Our results indicate that hot-deck imputation with a K-Nearest Neighbour (K-NN) algorithm is a robust approach to missing-data problems. Finally, although it relies on a simple, traditional approach, our method improves decision-making compared with a data-driven model based on the complete-case assumption.
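As an illustration of the idea only, the following minimal sketch shows one way K-NN hot-deck imputation could be written. It is not the implementation used in this study: the function name `knn_hot_deck`, the choice of `k`, the standardised Euclidean distance, and the synthetic wind speed, power and blade load channels are all assumptions made for the example, and the data loss is simulated as missing completely at random.

```python
import numpy as np


def knn_hot_deck(data, k=5, seed=0):
    """Fill NaNs in each incomplete row with values copied from a donor.

    The donor is drawn at random from the k complete rows that are closest
    to the incomplete row on the channels that row did record.
    (Hypothetical helper, written for illustration only.)
    """
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    filled = data.copy()

    complete = ~np.isnan(data).any(axis=1)
    donors = data[complete]
    if donors.shape[0] == 0:
        raise ValueError("no complete rows available as donors")

    # Scale by the donor standard deviation so that no channel
    # dominates the distance purely because of its units.
    std = donors.std(axis=0)
    std[std == 0] = 1.0

    for i in np.flatnonzero(~complete):
        missing = np.isnan(data[i])
        observed = ~missing
        if not observed.any():
            # Nothing to match on: fall back to a random donor.
            donor = donors[rng.integers(len(donors))]
        else:
            diff = (donors[:, observed] - data[i, observed]) / std[observed]
            dist = np.sqrt((diff ** 2).sum(axis=1))
            nearest = np.argsort(dist)[:k]
            donor = donors[rng.choice(nearest)]
        # Hot-deck step: copy the donor's recorded values into the gaps.
        filled[i, missing] = donor[missing]
    return filled


# Synthetic stand-in for SCADA and blade load records (assumed shapes and units).
rng = np.random.default_rng(42)
n = 500
wind = rng.uniform(4.0, 20.0, n)
power = np.clip(wind, None, 12.0) ** 3 + rng.normal(0.0, 20.0, n)
load = 50.0 * wind + rng.normal(0.0, 25.0, n)
records = np.column_stack([wind, power, load])

# Simulate losing 20 % of the load measurements completely at random.
with_gaps = records.copy()
lost = rng.random(n) < 0.2
with_gaps[lost, 2] = np.nan

complete_case = with_gaps[~np.isnan(with_gaps).any(axis=1)]  # rows a complete-case analysis keeps
imputed = knn_hot_deck(with_gaps, k=5)                        # all rows, with gaps filled from donors
```

Because every imputed value is copied from an observed record, the filled dataset contains only physically recorded load values, whereas the complete-case dataset simply drops every row with a gap.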

The main quantifiable objective is to raise awareness about the impact that missing data might have on data-driven models used for decision-making. The second is to present a practical approach which, even though it is built on simple and traditional statistical techniques, can already help the modeller address the main issue. Finally, the qualitative goal is to inspire the community to address the question that motivated this research: How do missing data affect your data-driven model?