Demonstrating Scalability in Hybrid Workloads

This is the second in a series of blogs benchmarking the performance of Splice Machine as a hybrid transactional/analytical processing (HTAP) platform.

Syed Mahmood, VP Product Marketing, Splice Machine

Splice Machine is a data platform with a dual-engine architecture that automatically analyzes each incoming query and determines the best execution path based on query type and data size. The platform routes analytic queries to Apache Spark, while transactional queries are handled by Apache HBase. This hybrid architecture makes benchmarking Splice Machine’s performance an interesting challenge.
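Splice Machine does not publish its internal routing logic, but the idea of a dual-engine dispatcher can be sketched with a hypothetical heuristic. The function name, threshold, and input signals below are illustrative assumptions, not the actual cost-based optimizer:

```python
# Illustrative sketch only: Splice Machine's real optimizer is cost-based;
# the threshold and signals here are hypothetical stand-ins.

def route_query(estimated_rows: int, is_write: bool, has_aggregation: bool) -> str:
    """Pick an execution engine for a query using simple heuristics."""
    OLAP_ROW_THRESHOLD = 10_000  # hypothetical cutoff for a "large scan"
    if is_write or (estimated_rows < OLAP_ROW_THRESHOLD and not has_aggregation):
        return "hbase"  # short, transactional work stays on HBase
    return "spark"      # large scans and aggregations go to Spark

print(route_query(50, is_write=True, has_aggregation=False))          # hbase
print(route_query(5_000_000, is_write=False, has_aggregation=True))   # spark
```

A real planner would weigh estimated cost rather than a fixed row cutoff, but the routing outcome is the same in spirit: point reads and writes stay on the row-oriented engine, heavy analytics move to Spark.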

Recap

In September of 2018, we published the results of the first performance benchmark for the Splice Machine HTAP platform in a blog titled, How to Measure an HTAP Data Platform for AI Applications. In that blog, we demonstrated how transactional throughput remained relatively steady as we increased the number of concurrent OLAP users (see chart below). This performance testing has important implications not only for use cases that require OLAP workloads, but also for data science and machine learning use cases. In this context, HTAP offerings such as Splice Machine, which now offers ML capabilities, can serve as a unified platform that runs multiple analytic jobs, such as OLAP queries or ML workloads, alongside transactional workloads simultaneously and at scale.

To benchmark the performance of our platform, we used CH-benCHmark, which we believe is an appropriate performance measure for a platform such as Splice Machine. Since the TPC-C and TPC-H schemas are not compatible, the TPC-H queries have to be rewritten to work against the TPC-C schema. And while TPC-C and TPC-H follow different scaling models, CH-benCHmark scales in the same fashion as TPC-C.
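As a toy illustration of what such a rewrite looks like, the sketch below runs a TPC-H Q1-style aggregate against a simplified TPC-C `order_line` table, in the spirit of CH-benCHmark's first query. SQLite, the trimmed-down columns, and the toy rows stand in for a real HTAP engine and the full schema:

```python
import sqlite3

# Toy reproduction of a CH-benCHmark-style query: a TPC-H Q1-like aggregate
# expressed against TPC-C's order_line table (columns simplified here).
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE order_line (
        ol_number   INTEGER,  -- line number within the order
        ol_quantity INTEGER,
        ol_amount   REAL
    )
""")
conn.executemany(
    "INSERT INTO order_line VALUES (?, ?, ?)",
    [(1, 5, 50.0), (1, 3, 30.0), (2, 7, 70.0)],
)

rows = conn.execute("""
    SELECT ol_number,
           SUM(ol_quantity) AS sum_qty,
           SUM(ol_amount)   AS sum_amount,
           COUNT(*)         AS count_order
    FROM order_line
    GROUP BY ol_number
    ORDER BY ol_number
""").fetchall()

for row in rows:
    print(row)  # (1, 8, 80.0, 2) then (2, 7, 70.0, 1)
```

The point of the rewrite is that the analytic question (TPC-H-style aggregation) is answered from the transactional tables directly, which is what lets the benchmark exercise both workloads on one dataset.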

Benchmark

Definition

TPC-C

TPC Benchmark C or TPC-C is a transaction processing (OLTP) benchmark that simulates an environment in which operators execute transactions against a database.

TPC-H

TPC Benchmark H or TPC-H is a decision support (OLAP) benchmark that simulates an environment in which complex and ad-hoc queries are executed against a database to answer business questions.

CH-benCHmark

CH-benCHmark is a hybrid benchmark that overlays TPC-H queries on the generic TPC-C benchmark to constitute an “HTAP” workload (the “H” stands for “hybrid”).

OLTP-Bench

OLTP-Bench is a load generator built for online transaction processing (OLTP) and Web-oriented workloads.

We ran the benchmark to simulate the operation of 1000 warehouses (hence the name HTAP-1000) with 1,000 simultaneous transactional worker threads. To determine the impact of the analytics load on transactional throughput, and how well the system handles both workloads simultaneously, we increased the cluster size, adding more HBase RegionServers and Spark executors, so that the system could process additional short-running transactional queries and longer-running OLAP queries at the same time.

The results are summarized in the graph below. To recap, using OLTP-Bench running CH-benCHmark on a 4-node cluster, Splice Machine’s throughput was 11,588 tpmC (transactions per minute) with no analytics workers. With one analytics worker, the transactional throughput actually improved slightly. And when we increased the analytical workload to four workers, throughput dropped to 10,772 tpmC, only a 7% reduction in tpmC throughput.
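The degradation figure can be checked directly from the reported numbers:

```python
# Recomputing the reported impact of four analytics workers on transactional
# throughput, using the tpmC figures quoted above.
baseline_tpmc = 11_588  # 4-node cluster, no analytics workers
loaded_tpmc = 10_772    # same cluster, four concurrent analytics workers

reduction = (baseline_tpmc - loaded_tpmc) / baseline_tpmc
print(f"{reduction:.1%}")  # 7.0%
```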

Latest HTAP Benchmark

In January of 2019, we embarked on the second phase of testing our benchmark HTAP performance. Our objective was to demonstrate that as the number of resources or servers is increased, both transactional and analytical throughput scale in parallel on the same cluster.

The value for an organization is that if throughput scales as we expect, then in a horizontally scalable architecture such as Hadoop, users can simply add nodes to achieve their desired performance targets. This testing is also relevant for machine learning use cases. Since ML uses algorithms to predict outcomes based on real-time transactional data, the fact that both model performance and transactional updates can be enhanced on a single platform can drive companies toward their goal of making intelligent decisions in near real time. In this regard, Splice Machine also provides a native Spark DataSource that greatly enhances performance for large-scale data operations.

For this benchmark, we simulated the operation of 1000 warehouses (HTAP-1000) and measured throughput using the HTAP tpmC and HTAP QpH (queries per hour) benchmarks as we doubled the compute nodes from 4 to 8 and then to 16. The results are depicted in the graph below (Note: these tests were performed using AWS i3.8xl servers instead of our existing 1U commodity servers in our colo facility). Splice Machine’s transactional throughput, as measured by HTAP tpmC, rose from 12,880 with 4 servers to 25,565 with 8 servers, and reached 37,285 with 16 servers. On the analytical workload front, we observed a similar trend: the HTAP QpH benchmark went from 26 OLAP queries per hour with 4 servers to 30 with 8 servers, and then nearly doubled to 59 with 16 servers.
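Using the figures above, scale-out efficiency (measured throughput relative to ideal linear scaling from the 4-node baseline) can be computed for both workloads:

```python
# Scale-out efficiency relative to perfect linear scaling, computed from the
# HTAP-1000 figures quoted above (AWS i3.8xl clusters).
tpmc = {4: 12_880, 8: 25_565, 16: 37_285}  # transactional throughput (tpmC)
qph = {4: 26, 8: 30, 16: 59}               # OLAP queries per hour

def efficiency(results: dict[int, float]) -> dict[int, float]:
    """Measured throughput divided by ideal linear throughput vs. 4 nodes."""
    base_nodes, base = 4, results[4]
    return {n: results[n] / (base * n / base_nodes) for n in results}

print(efficiency(tpmc))  # near 1.0 at 8 nodes, about 0.72 at 16
print(efficiency(qph))   # roughly 0.57 at both 8 and 16 nodes
```

This makes the qualitative observation in the text concrete: transactional scaling is close to linear through 8 nodes and tapers at 16, while the analytic workload scales less smoothly, which motivates the isolation and load-saturation questions discussed below.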

Analysis, Next Steps & Areas of Investigation

As we can see from the graphs, adding resources improves both the analytic and transactional throughput of the hybrid benchmark. Transactional throughput scales fairly linearly, while the analytic results also trend upward, though less smoothly. Important factors that can impact these numbers include:

How isolated are these workloads from each other? We need to assess the total resources (and cgroups) on the hardware to ensure that, even though Splice Machine’s architecture performs analytic and transactional processing on independent engines, the resources for each are adequately partitioned from each other.

Is the load sufficient to drive the hardware? This applies to both transactional and analytic loads. The load comes from a higher number of concurrent users and/or larger data sets.

The degree to which the Splice Machine’s own tuning parameters (for task parallelism, for example) influence these results.

As a follow-up, we will investigate the data collected thus far and assess the results. For example, how hard were we driving the hardware? To what degree do parameters need to be tuned as more resources are brought to bear? How does adding more data or greater concurrency impact the profile of results? We plan to look closely into query isolation and use larger datasets to ensure the server resources are maximally utilized. In the next phase of testing, we also plan to continue to grow the number of servers to benchmark these hybrid workloads.

We believe this is important for standard analytic jobs as well as for machine learning pipelines that can run while continuing to meet the SLA for transactional processing. If an organization is looking to further improve the performance of its hybrid workload of analytic and transactional processing, it can do so by simply adding compute resources.