OCTWAS - Online Check-pointer for Workflows on Apache Spark

Abstract

With the advent of Web 2.0, Big Data workflows designed to extract useful information from data are run on many engines, one of which is Apache Spark. Apache Spark is a fast and efficient in-memory parallel-processing framework. By retaining data in memory, it speeds up the execution of a wide variety of applications. It maintains fault tolerance by keeping, in stable storage, the information required to recompute data that is lost in a failure. For Big Data, however, recomputing large amounts of data incurs a performance penalty. In addition, because Apache Spark runs on commodity hardware, failures are more likely, which strengthens the need for a fault-tolerance mechanism.

To reduce the impact of failures on the processing of these workflows, we propose a framework called OCTWAS (Online Check-pointer for Workflows on Apache Spark). OCTWAS uses knowledge of how the data is used in a workflow and takes into account the time that would be saved if the data were check-pointed at a particular stage and a failure then occurred. The framework is designed to be lightweight so that performance does not degrade when there are no failures. OCTWAS selects the check-points and writes the data to disk at these check-points while the workflow is executing. OCTWAS was tested by simulating failures at fixed intervals on a sample workflow designed for Apache Spark. The performance of the workflow with OCTWAS, both with and without failures, is compared to its performance without OCTWAS under the same conditions. Results indicate that OCTWAS reduces the impact of failures on the processing time of Big Data workflows without degrading performance when failures do not occur.
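
The trade-off described above, between the time a check-point saves after a failure and the cost of writing it, can be illustrated with a minimal sketch. This is not the OCTWAS algorithm, which the abstract does not specify. The `Stage` fields, the `failure_prob` parameter, and the greedy rule are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    """Hypothetical description of one workflow stage (all fields are assumptions)."""
    name: str
    compute_s: float   # seconds to compute this stage from its parent
    write_s: float     # seconds to write this stage's output to disk
    reuse_count: int   # number of downstream stages that read this output


def recompute_cost(stages, last_checkpoint, failed_at):
    """Seconds needed to rebuild stage `failed_at` starting after `last_checkpoint`.

    `last_checkpoint` is an index into `stages`, or -1 if nothing is check-pointed.
    """
    return sum(s.compute_s for s in stages[last_checkpoint + 1 : failed_at + 1])


def choose_checkpoints(stages, failure_prob):
    """Greedy illustrative rule: check-point a stage when the expected
    recomputation time it would save exceeds the time needed to write it."""
    chosen = []
    last = -1
    for i, stage in enumerate(stages):
        # Time that would be lost if this stage's output had to be rebuilt,
        # weighted by how many downstream consumers depend on it.
        saved = recompute_cost(stages, last, i) * stage.reuse_count
        if failure_prob * saved > stage.write_s:
            chosen.append(i)
            last = i
    return chosen


if __name__ == "__main__":
    workflow = [
        Stage("load", compute_s=100.0, write_s=20.0, reuse_count=1),
        Stage("join", compute_s=400.0, write_s=30.0, reuse_count=2),
        Stage("aggregate", compute_s=50.0, write_s=10.0, reuse_count=1),
    ]
    print(choose_checkpoints(workflow, failure_prob=0.1))
```

Under these assumed numbers, only the expensive and reused "join" stage is selected. With a failure probability of zero, nothing is check-pointed, which mirrors the stated goal of adding no overhead when failures do not occur.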