Introduction

What is BigDataBench

BigDataBench is a big data benchmark suite drawn from web search engines. The first release (BigDataBench 1.0)
provides 6 representative big data applications from search engines, the most important domain of Internet
services in terms of the number of page views and daily visitors. It also provides an innovative data
generation tool that generates scalable volumes of big data from a small-scale real data set while preserving
the semantics and locality of the real data. The data sets in BigDataBench are generated by this tool. Users
can also combine other applications with those in BigDataBench according to their own requirements.
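To make the idea of scaling a small real data set up to an arbitrary volume more concrete, here is a
deliberately simplified Python sketch. It is a stand-in, not the BigDataBench data generation tool: it fits
only a word-frequency (unigram) model to a seed text and samples new words from it. It therefore keeps the
seed's word distribution but does not model data locality or any semantics beyond word frequencies. The
function names fit_unigram and generate_text are hypothetical.

```python
import random
from collections import Counter

# Hypothetical, simplified stand-in: NOT the BigDataBench data generator.
# It preserves only word frequencies, not locality.


def fit_unigram(seed_text: str) -> Counter:
    """Count word frequencies in the small-scale real (seed) data set."""
    return Counter(seed_text.split())


def generate_text(model: Counter, num_words: int, rng: random.Random) -> list[str]:
    """Sample num_words words independently, each drawn with probability
    proportional to its frequency in the seed data."""
    words = list(model.keys())
    weights = list(model.values())
    return rng.choices(words, weights=weights, k=num_words)


if __name__ == "__main__":
    seed = "search engine index query search result page search"
    model = fit_unigram(seed)
    rng = random.Random(42)  # fixed seed so runs are reproducible
    big = generate_text(model, 100_000, rng)
    # "search" is 3 of the 8 seed words, so this fraction should be close to 0.375
    print(Counter(big)["search"] / len(big))
```

Because the output volume is just the num_words parameter, the same seed can be scaled to any size; a
generator that also preserves locality would need a richer model than independent word sampling.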

Who can use BigDataBench

The big data benchmark suite, with all its applications and input sets, is available free of charge as open
source. It is available to researchers interested in pursuing research in the field of big data
applications.

Overview

Motivation

In the era of information explosion, more and more data are being produced, and people create and share
data continuously. As a result, there is growing pressure to evaluate and compare the performance, energy
efficiency, and cost effectiveness of big data systems. However, few benchmark suites for big data
applications exist. To address this gap, we propose a new big data benchmark suite, BigDataBench.

Key Features

BigDataBench differs from other big data benchmark suites in the following ways:

Incremental Approach: First, we investigate application domains and single out the most important one,
search engines, considering their daily visitors and page views. Second, we choose typical workloads from
search engines as candidates for BigDataBench.

Variety of Workloads: BigDataBench consists of six representative workloads, including analysis workloads
and service workloads. They have different characteristics in terms of computation, memory, and I/O access
patterns.

Benchmark Programs

The current version of the suite contains the following 6 workloads from web search engines and a
data generation tool:

Sort: sorts the input directory into the output directory;

Wordcount: reads text files and counts how often words occur;

Grep: extracts matching strings from text files and counts how many times each one occurs;
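The listed workloads can be illustrated by what each one computes on a small input. The Python sketch below
is hypothetical: the function names are invented here, the input is modeled as an in-memory list of text
lines rather than an input directory, and it shows only single-machine semantics, not how BigDataBench
implements or runs these workloads at scale.

```python
import re
from collections import Counter
from typing import Iterable

# Hypothetical single-machine illustrations of the Sort, Wordcount and Grep
# workloads; not BigDataBench's own implementation.


def sort_records(lines: Iterable[str]) -> list[str]:
    """Sort: return all input records (here, text lines) in sorted order."""
    return sorted(lines)


def word_count(lines: Iterable[str]) -> Counter:
    """Wordcount: count how often each whitespace-separated word occurs."""
    counts: Counter = Counter()
    for line in lines:
        counts.update(line.split())
    return counts


def grep_count(lines: Iterable[str], pattern: str) -> Counter:
    """Grep: extract every substring matching a regular expression and count
    how many times each distinct match occurs."""
    regex = re.compile(pattern)
    counts: Counter = Counter()
    for line in lines:
        # group(0) is the whole match, even if the pattern has capture groups
        counts.update(m.group(0) for m in regex.finditer(line))
    return counts


if __name__ == "__main__":
    data = ["the quick fox", "a lazy dog", "the dog barks"]
    print(sort_records(data))         # ['a lazy dog', 'the dog barks', 'the quick fox']
    print(word_count(data)["the"])    # 2
    print(grep_count(data, r"d\w+"))  # Counter({'dog': 2})
```

Grep here counts whole matches, so a pattern that contains capture groups still counts the full matched
string rather than the individual groups.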

License

BigDataBench is made available as open source. Each software component is governed by its own licensing
terms. Users intending to use BigDataBench are required to fully understand and abide by the licensing terms
of the various components.

Downloads

If you want the complete contents of BigDataBench and do not mind large file sizes, you can use the
following link to download the whole distribution.