Höb, M. (2017):

Performance Predictions for Large-Scale Data Applications

The cloud computing community is constantly developing new applications, job submission frameworks, and data processing paradigms. New approaches are emerging in all of these areas, and every user should be able to profit from this rapid development. To help users decide which of the many possible cloud configurations is optimal for a given application, this thesis introduces a methodology for predicting the runtime behavior of any large-scale data application in a cloud. To this end, significant features of these configurations are varied during runtime measurements of the application under study, and the resulting values are examined mathematically in a function analysis. The main statistical method is Multiple Linear Regression, which evaluates the relationship between the selected features. Based on these relationships, runtime prediction and configuration selection become possible. The methodology is instantiated for two use cases, the clustering algorithm K-Means and Wordcount, using Apache Flink as the job processing framework.
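
The idea of fitting a Multiple Linear Regression to runtime measurements and then using it to choose a configuration can be illustrated with a minimal sketch. It is not the implementation from the thesis. The two features (input size in GB and number of task slots), the measurement values, and all function names are assumptions invented for illustration. The runtimes are synthetic and were generated from an exact linear relation, so they say nothing about real Flink behavior. The fit uses ordinary least squares via the normal equations, written in plain Python.

```python
# Illustrative sketch only: features, data, and names are hypothetical.

def solve_linear_system(a, b):
    """Solve a * x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: features are linearly dependent")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_mlr(features, runtimes):
    """Ordinary least squares: solve (X^T X) beta = X^T y, with intercept."""
    X = [[1.0] + list(row) for row in features]
    k = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(X, runtimes)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict(beta, row):
    """Predicted runtime for one configuration (feature row)."""
    return beta[0] + sum(b * v for b, v in zip(beta[1:], row))


# Hypothetical measurements: (input size in GB, task slots) -> runtime in s.
# Synthetic values, generated from runtime = 20 + 6 * size - 1.5 * slots.
measured_features = [(10.0, 4.0), (10.0, 8.0), (20.0, 4.0),
                     (20.0, 8.0), (40.0, 16.0), (40.0, 4.0)]
measured_runtimes = [74.0, 68.0, 134.0, 128.0, 236.0, 254.0]

beta = fit_mlr(measured_features, measured_runtimes)

# Configuration selection: pick the candidate with the lowest predicted runtime.
candidates = [(30.0, 4.0), (30.0, 8.0), (30.0, 16.0)]
best = min(candidates, key=lambda cfg: predict(beta, cfg))

print("coefficients:", beta)
print("best candidate:", best)
```

A real study would need features chosen so that the runtime is approximately linear in them, and it would check the quality of the fit before trusting the predictions. The sketch skips both steps.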