Testing

Big Data Testing

Testing a Big Data application is more about verifying its data processing than testing the individual features of the software product. In big data testing, performance and functional testing are the key focus areas.

In big data testing, QA engineers verify the successful processing of terabytes of data using a commodity cluster and other supporting components. Because the processing is very fast, it demands a high level of testing skill.

Processing may be of three types: batch, real-time, or interactive.

Along with this, data quality is also an important factor in big data testing. Before testing the application, the quality of the data should be checked as part of database testing. This involves verifying characteristics such as conformity, accuracy, duplication, consistency, validity, and completeness.
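As an illustration, several of these quality checks (duplication, completeness, validity) can be sketched in a few lines of Python. The record layout and field names below ("user_id", "amount") are hypothetical, not taken from any real schema:

```python
# Minimal data-quality sketch: count duplicates, incomplete records,
# and invalid values in a batch of staged records (dicts).
# Field names "user_id" and "amount" are illustrative assumptions.

def check_quality(records, required_fields=("user_id", "amount")):
    """Return basic quality metrics for a list of record dicts."""
    seen = set()
    duplicates = incomplete = invalid = 0
    for rec in records:
        key = tuple(sorted(rec.items()))
        if key in seen:                      # duplication check
            duplicates += 1
        seen.add(key)
        if any(rec.get(f) in (None, "") for f in required_fields):
            incomplete += 1                  # completeness check
        amount = rec.get("amount")
        if not isinstance(amount, (int, float)) or amount < 0:
            invalid += 1                     # validity/conformity check
    return {"total": len(records),
            "duplicates": duplicates,
            "incomplete": incomplete,
            "invalid": invalid}
```

In practice these metrics would be computed at scale (for example with a distributed query engine), but the checks themselves stay this simple: every rule is a predicate applied per record, with counts rolled up for the test report.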

Big Data Testing can be broadly divided into three steps:

Step 1: Data Staging Validation

The first step of big data testing, also referred to as the pre-Hadoop stage, involves process validation:

Validate data from various sources such as RDBMS tables and web logs to make sure the correct data is pulled into the system.

Compare the source data with the data pushed into the Hadoop system to make sure they match.

Verify that the right data is extracted and loaded into the correct HDFS location.
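One common way to implement the source-versus-HDFS comparison above is to reconcile row counts and a content fingerprint on both sides rather than diffing terabytes row by row. A minimal sketch, assuming each side can be read as an iterable of rows:

```python
import hashlib

def fingerprint(rows):
    """Order-insensitive fingerprint of a row set: (count, combined hash)."""
    digest = 0
    for row in rows:
        h = hashlib.sha256(repr(row).encode()).hexdigest()
        digest ^= int(h, 16)     # XOR-combine so row order doesn't matter
    return len(rows), digest

def staging_matches(source_rows, hdfs_rows):
    """True when both row counts and content fingerprints agree."""
    return fingerprint(list(source_rows)) == fingerprint(list(hdfs_rows))
```

Including the row count guards against the XOR's blind spot (a pair of identical rows cancels out); a production check would typically also compare per-partition counts and column-level aggregates.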

Step 2: MapReduce Validation

The second step is validation of the MapReduce logic. In this stage, the tester verifies the business logic on a single node and then validates it after running against multiple nodes, ensuring that:

The MapReduce process works correctly

Data aggregation and segregation rules are applied to the data

Key-value pairs are generated correctly

The data is valid after the MapReduce process completes
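The map, shuffle, and reduce stages being verified here can be sketched locally, which is how testers often validate the business logic on a single node before running it on the cluster. The word-count job below is the standard illustrative example, not a specific job from this article:

```python
from collections import defaultdict

def map_phase(lines):
    """Mapper: emit a (word, 1) key-value pair for every word."""
    for line in lines:
        for word in line.split():
            yield (word.lower(), 1)

def shuffle(pairs):
    """Group values by key, as Hadoop's shuffle/sort stage does."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reducer: aggregate the grouped values for each key."""
    return {key: sum(values) for key, values in groups.items()}
```

A tester can assert on the key-value pairs emitted by the mapper and on the aggregated output of the reducer, e.g. `reduce_phase(shuffle(map_phase(["big data", "big tests"])))` should yield `{"big": 2, "data": 1, "tests": 1}`; the same assertions are then repeated against multi-node runs to confirm the distributed result matches the single-node one.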

Step 3: Output Validation Phase

The final stage of big data testing is output validation. The output data files are generated and ready to be moved to an EDW (Enterprise Data Warehouse) or any other system, based on the requirements. This stage includes:

Checking that the transformation rules are correctly applied

Checking data integrity and successful data load into the target system

Checking that there is no data corruption, by comparing the target data with the HDFS file system data
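The corruption check above amounts to a keyed comparison between the HDFS output and the loaded target data. A minimal sketch, assuming both sides can be read as dict records with a shared key field (the name "id" is an illustrative assumption):

```python
def find_corruption(hdfs_rows, target_rows, key_field="id"):
    """Compare HDFS output with target-system rows by key.

    Returns keys missing from the target and keys whose records differ.
    The key field name "id" is a hypothetical example.
    """
    hdfs = {row[key_field]: row for row in hdfs_rows}
    target = {row[key_field]: row for row in target_rows}
    missing = sorted(set(hdfs) - set(target))          # failed to load
    mismatched = sorted(k for k in set(hdfs) & set(target)
                        if hdfs[k] != target[k])        # corrupted in transit
    return {"missing": missing, "mismatched": mismatched}
```

An empty result for both lists indicates a clean load; any reported key pinpoints exactly which record to investigate, which is far more actionable for testers than a single pass/fail count comparison.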