Summary

As the architecture, system, data management, and machine learning communities pay greater attention to innovative big data and data-driven artificial intelligence (in short, AI) algorithms, architectures, and systems, the pressure to benchmark them rises. However, the complexity, diversity, frequently changing workloads, and rapid evolution of big data and especially AI systems raise great challenges for benchmarking. First, for the sake of conciseness, benchmarking scalability, portability, cost, reproducibility, and better interpretation of performance data, we need to understand the most time-consuming classes of units of computation among big data and AI workloads. Second, for the sake of fairness, the benchmarks must cover a diversity of data and workloads. Third, for co-design of software and hardware, the benchmarks should be consistent across different communities.

We consider each big data and AI workload as a pipeline of one or more classes of units of computation performed on different initial or intermediate data inputs; each class we call a data motif. For the first time, across a wide variety of big data and AI workloads, we identify eight data motifs: Matrix, Sampling, Logic, Transform, Set, Graph, Sort, and Statistic computation. Each motif captures the common requirements of its class of unit of computation while being reasonably divorced from individual implementations. Significantly different from traditional kernels, a data motif's behavior is affected by the sizes, patterns, types, and sources of its data inputs; moreover, it reflects not only computation and memory access patterns but also disk and network I/O patterns.

As a multi-discipline research and engineering effort spanning the architecture, system, data management, and machine learning communities in both industry and academia, we set up an open-source big data and AI benchmark suite: BigDataBench. The current version, BigDataBench 4.0, provides 13 representative real-world data sets and 47 benchmarks. Rather than creating a new benchmark or proxy for every possible workload, we propose using data motif-based benchmarks, i.e., combinations of the eight data motifs, to represent the diversity of big data and AI workloads. Our benchmark suite includes micro benchmarks, each of which is a single data motif; component benchmarks, which are data motif combinations; and end-to-end application benchmarks, which are combinations of component benchmarks.

The benchmarks cover seven workload types, i.e., AI, online services, offline analytics, graph analytics, data warehouse, NoSQL, and streaming, drawn from important application domains, i.e., search engines, social networks, e-commerce, multimedia processing, and bioinformatics. Meanwhile, data sets have a great impact on workload behavior and performance (CGO'18). Hence, we consider data variety across the whole spectrum of data types, including structured, semi-structured, and unstructured data. Currently, the included data sources are text, graph, table, and image data. Using real data sets as seeds, our data generators (BDGS) produce synthetic data by scaling the seed data while preserving the characteristics of the raw data.
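The seed-based scaling idea can be illustrated with a deliberately simplified sketch: sample new text from the empirical word distribution of a small seed, so relative frequencies are preserved as the data grows. This is not the actual BDGS algorithm (which uses richer generative models); the function name and seed text are hypothetical.

```python
import random
from collections import Counter

def generate_synthetic_text(seed_words, scale_factor, rng=None):
    """Simplified sketch of seed-based scaling: draw new words from the
    empirical word-frequency distribution of the seed, so the synthetic
    corpus keeps the seed's relative frequencies at any scale.
    (Illustrative only; the real BDGS uses richer models.)"""
    rng = rng or random.Random(42)
    counts = Counter(seed_words)
    words, weights = zip(*counts.items())
    n = int(len(seed_words) * scale_factor)
    return rng.choices(words, weights=weights, k=n)

# Hypothetical 6-word seed, scaled 100x to 600 words.
seed = "big data benchmark big data motif".split()
synthetic = generate_synthetic_text(seed, scale_factor=100)
```

Because sampling only reuses words from the seed at their observed frequencies, coarse statistics (vocabulary, relative counts) carry over, which is the property BDGS aims to preserve when scaling to much larger sizes.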

To achieve consistency of benchmarks across different communities, we absorb state-of-the-art algorithms from the machine learning community, taking the model's prediction accuracy into account. For the benchmarking requirements of the system and data management communities, we provide diverse implementations using state-of-the-art techniques. For offline analytics, we provide Hadoop, Spark, Flink, and MPI implementations. For graph analytics, we provide Hadoop, Spark GraphX, Flink Gelly, and GraphLab implementations. For AI, we provide TensorFlow and Caffe implementations. For data warehouse, we provide Hive, Spark SQL, and Impala implementations. For NoSQL, we provide MongoDB and HBase implementations. For streaming, we provide Spark Streaming and JStorm implementations.

For the architecture community, whether early in the architecture design process or later in system evaluation, running a comprehensive benchmark suite is time-consuming, and the complex software stacks of big data and AI workloads aggravate this issue. To tackle this challenge, we propose data motif-based simulation benchmarks for the architecture community, which speed up simulation by a factor of 100 while preserving system and micro-architectural characteristic accuracy. We also propose another methodology to reduce benchmarking cost: we select a small number of representative benchmarks, called the BigDataBench subset, according to workload characteristics from an architecture perspective. We provide the BigDataBench architecture subset for the MARSSx86, gem5, and Simics simulators, respectively.

On a typical state-of-practice processor, the Intel Xeon E5-2620 v3, we also perform comprehensive characterizations of the benchmarks of all seven workload types in BigDataBench 4.0, in addition to traditional benchmarks like SPEC CPU, PARSEC, and HPCC, in a hierarchical manner, drilling down through five levels using Top-Down analysis from an architecture perspective. We have the following observations. First, as listed in Figure 1, the ILP (instruction-level parallelism) of the AI benchmarks is 1.26 on average, slightly lower than SPEC CPU (1.32). The MLP (memory-level parallelism) of AI is 2.65, similar to HPCC (2.78). Big data has lower ILP (0.85 on average) and MLP (1.86 on average) than AI for almost all types, except that Hive-based data warehouse workloads have slightly higher ILP than AI. Further, performance varies across workload types and software stacks.

Figure 1 Average Execution Performance.

Second, as listed in Figure 2, in terms of the uppermost-level breakdown, the AI benchmarks exhibit pipeline behaviors similar to the traditional benchmarks, with approximately equal retiring (35% vs. 39.8%), bad speculation (6.3% vs. 6.1%), frontend bound (both about 9%), and backend bound (49.7% vs. 45.1%) percentages. The frontend bound of big data is more severe than that of the traditional benchmarks (9% on average). However, we notice that the frontend bound varies across workload types: NoSQL has the highest percentage at 35%, data warehouse has 25%, and the others have only 15% on average. Please see our technical report for more details.

Figure 2 Uppermost Level Breakdown of All Benchmarks.
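The uppermost-level breakdown above follows the standard Top-Down method: every pipeline slot is attributed to exactly one of four categories, so the fractions sum to 1. A minimal sketch of the top-level formulas is below; the parameter names mirror the usual Intel counter roles, but the exact event names and the issue width (4 on this class of core) vary by microarchitecture, and the input numbers in the example are made up for illustration.

```python
def topdown_level1(cycles, fetch_bubbles, uops_issued, uops_retired_slots,
                   recovery_cycles, width=4):
    """Top-level Top-Down breakdown as fractions of pipeline slots.

    cycles            - unhalted core cycles
    fetch_bubbles     - issue slots not delivered by the frontend
    uops_issued       - uops issued by the rename stage
    uops_retired_slots- retirement slots used by retired uops
    recovery_cycles   - cycles spent recovering from mispredictions etc.
    width             - issue width (slots per cycle; 4 here, varies by uarch)
    """
    slots = width * cycles
    frontend_bound = fetch_bubbles / slots
    # Issued-but-not-retired uops plus recovery slots are wasted speculation.
    bad_speculation = (uops_issued - uops_retired_slots
                       + width * recovery_cycles) / slots
    retiring = uops_retired_slots / slots
    # Whatever slots remain were stalled waiting on the backend.
    backend_bound = 1.0 - frontend_bound - bad_speculation - retiring
    return {"frontend": frontend_bound, "bad_spec": bad_speculation,
            "retiring": retiring, "backend": backend_bound}

# Illustrative counter values (not measured data).
br = topdown_level1(cycles=1000, fetch_bubbles=400, uops_issued=2500,
                    uops_retired_slots=2200, recovery_cycles=25)
```

Because backend bound is computed as the remainder, the four fractions always partition the slot budget, which is what makes the per-category percentages in Figure 2 comparable across workloads.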

To model and reproduce multi-application or multi-user scenarios on clouds or in datacenters, we provide a multi-tenancy version of BigDataBench, which allows flexible setting and replaying of mixed workloads according to real workload traces: the Facebook, Google, and Sogou traces.
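At its core, replaying mixed workloads means merging per-tenant traces into one time-ordered submission schedule. The sketch below illustrates that step with a hypothetical, minimal trace format of (timestamp, job) pairs; the real Facebook, Google, and Sogou traces carry far richer fields (resource demands, job sizes, inter-job dependencies).

```python
def replay_schedule(traces):
    """Merge per-tenant traces into one global submission order, as a
    multi-tenancy replayer would. `traces` maps a tenant name to a list
    of (timestamp, job) pairs. Hypothetical, simplified trace format."""
    events = [(t, tenant, job)
              for tenant, trace in traces.items()
              for t, job in trace]
    # Sorting by timestamp yields the order in which a replayer would
    # submit jobs to the system under test.
    return sorted(events)

# Two illustrative tenants with interleaved submission times.
traces = {"tenantA": [(0.0, "wordcount"), (2.5, "sort")],
          "tenantB": [(1.0, "pagerank")]}
schedule = replay_schedule(traces)
```

A real replayer would then sleep until each event's timestamp (possibly time-scaled) before launching the corresponding benchmark, so the offered load matches the trace's burstiness.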

Together with several industry partners, including Telecom Research Institute Technology, Huawei, Intel (China), Microsoft (China), IBM CDL, Baidu, Sina, INSPUR, and ZTE, we also released China's first industry-standard big data benchmark suite, BigDataBench-DCA, which is a subset of BigDataBench.

Why BigDataBench?

As Table 1 shows, with respect to seven desired properties, BigDataBench is more sophisticated than the other state-of-the-art big data benchmarks.

Table 1: The Differences of BigDataBench from Other Benchmark Suites.

Micro Benchmark Specification. Data motifs are fundamental concepts and units of computation in a majority of big data and AI workloads. We design a suite of micro benchmarks, each of which is a single data motif, as listed in Table 2. We also note that these micro benchmark implementations have different characteristics, being CPU-intensive, memory-intensive, or I/O-intensive.

Component Benchmark Specification. Combinations of data motifs can compose complex real-world workloads. In addition to the micro benchmarks, each consisting of a single data motif, we also provide component benchmarks, which are representative workloads in different application domains. Component benchmarks combine one or more data motifs in a DAG-like structure, as listed in Table 3. For example, SIFT is a combination of five data motifs: matrix, sampling, transform, sort, and statistic computations.
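The DAG-like composition can be sketched as follows: a component benchmark is an adjacency map over motif nodes, and any topological order of that DAG is a valid execution order. The edges shown for SIFT below are illustrative only; SIFT's real dataflow (per Table 3) is more involved.

```python
# The eight data motifs identified in BigDataBench.
MOTIFS = {"matrix", "sampling", "logic", "transform", "set",
          "graph", "sort", "statistic"}

# Hypothetical SIFT composition: each motif maps to its successor motifs.
# (Edge structure is illustrative, not SIFT's actual dataflow.)
sift_dag = {
    "transform": ["matrix"],
    "matrix": ["sampling"],
    "sampling": ["sort"],
    "sort": ["statistic"],
    "statistic": [],
}

def topo_order(dag):
    """Return one valid execution order of a motif DAG (Kahn's algorithm)."""
    indeg = {n: 0 for n in dag}
    for succs in dag.values():
        for s in succs:
            indeg[s] += 1
    ready = [n for n, d in indeg.items() if d == 0]
    order = []
    while ready:
        n = ready.pop()
        order.append(n)
        for s in dag[n]:
            indeg[s] -= 1
            if indeg[s] == 0:
                ready.append(s)
    return order
```

Validating that every node is a known motif (`set(sift_dag) <= MOTIFS`) and scheduling by topological order is all a driver needs to execute a component benchmark as a pipeline over intermediate data.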

Application Benchmark Specification. To model an application domain, we define an end-to-end application benchmark specification that considers user characteristics and processing logic, based on the real processes of the application domain. Given the complexity and difficulty of benchmarking a real application domain, we simplify and model its primary process and provide portable and usable end-to-end benchmarks. We use combinations of component benchmarks to represent the processing logic. For example, for online services, we generate queries whose number, rate, distribution, and locality reflect user characteristics.
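The query-generation idea can be sketched with two standard modeling choices, which are our assumptions here rather than the specification's stated method: exponential inter-arrival times (a Poisson process at a target rate) for the request stream, and a Zipf-like popularity distribution over query terms to capture locality.

```python
import random

def generate_queries(vocab, n_queries, rate, zipf_s=1.0, seed=1):
    """Hypothetical sketch of online-service query generation.

    Arrival times follow a Poisson process at `rate` queries/second
    (exponential inter-arrival gaps); query terms are drawn from a
    Zipf-like distribution over `vocab`, so a few popular terms
    dominate, modeling request locality."""
    rng = random.Random(seed)
    # Zipf-like weight for the term at each popularity rank.
    weights = [1.0 / (rank + 1) ** zipf_s for rank in range(len(vocab))]
    t = 0.0
    queries = []
    for _ in range(n_queries):
        t += rng.expovariate(rate)  # next arrival gap
        queries.append((t, rng.choices(vocab, weights=weights)[0]))
    return queries

vocab = ["bigdata", "ai", "motif"]
qs = generate_queries(vocab, n_queries=5, rate=100.0)
```

Tuning `rate` and `zipf_s` lets a driver reproduce different load intensities and locality profiles against the same online-service benchmark.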

Benchmarks

BigDataBench is expanding and evolving rapidly. Currently, we have proposed benchmark specifications modeling five typical application domains. The current version, BigDataBench 4.0, includes 13 real-world data sets and 47 big data workloads, covering the seven workload types. Table 2 summarizes the real-world data sets and scalable data generation tools included in BigDataBench 4.0, covering the whole spectrum of data types (structured, semi-structured, and unstructured) and different data sources, including text, graph, image, audio, video, and table data. Table 3 and Table 4 present the micro benchmarks and component benchmarks in BigDataBench 4.0 from the perspectives of involved data motifs, application domain, workload type, data set, and software stack. Some end users may only care about big data applications of a specific type; for example, to perform an apples-to-apples comparison of software stacks for offline analytics, they need only choose the benchmarks of the offline analytics type. But if users want to measure or compare big data systems and architectures overall, we suggest they cover all the benchmarks.

Evolution

As shown in Figure 2, the evolution of BigDataBench has gone through three major stages. At the first stage, we released three benchmark suites: BigDataBench 1.0 (6 workloads from search engines), DCBench 1.0 (11 workloads from data analytics), and CloudRank 1.0 (mixed data analytics workloads).

At the second stage, we merged the previous three benchmark suites and released BigDataBench 2.0, after investigating the top three most important application domains of internet services in terms of page views and daily visitors. BigDataBench 2.0 includes 6 real-world data sets and 19 big data workloads with different implementations, covering six application scenarios: micro benchmarks, Cloud OLTP, relational query, search engine, social networks, and e-commerce. Moreover, BigDataBench 2.0 provides big data generation tools (BDGS) to generate scalable big data, e.g., at PB scale, from small-scale real-world data while preserving its original characteristics.

Alumni

Dr. Zhen Jia, Princeton University

Hainan Ye, BAFST

Dr. Yingjie Shi

Zijian Ming, Tencent

Yuanqing Guo, Sohu

Yongqiang He, Dropbox

Kent Zhan, WUBA

Xiaona Li

Bizhu Qiu, Yahoo

License

BigDataBench is available to researchers interested in big data. BigDataBench itself is open source under the Apache License, Version 2.0; please use all files in compliance with that License. The software components of BigDataBench are each available as open-source software governed by their own licensing terms, and researchers intending to use BigDataBench must fully understand and abide by the licensing terms of the various components.

Software developed externally (not by the BigDataBench group)

Software developed internally (by the BigDataBench group)

BigDataBench_4.0 License

BigDataBench_4.0 Suite. Copyright (c) 2013-2018, ICT, Chinese Academy of Sciences. All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

Redistributions of source code must comply with the license and notice disclaimers.

Redistributions in binary form must reproduce the above copyright notice, this list of conditions, and the following disclaimers in the documentation and/or other materials provided with the distribution.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS “AS IS” AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE ICT CHINESE ACADEMY OF SCIENCES BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.