Efficient In-memory Transactional Processing Using HTM

Overview

The commercial availability of Intel’s Haswell processor suggests that hardware transactional memory (HTM), a technique inspired by database transactions, is likely to be widely exploited by commercial in-memory databases in the near future. HTM features such as hardware-maintained read/write sets and automatic conflict detection naturally raise the question of how HTM can best support concurrency control for fast in-memory transaction processing.

We aim to study the feasibility of applying hardware transactional memory to boost the performance of database transactions.

Approaches

1) Characterizing commercial HTM for multicore scaling: Does hardware transactional memory deliver on its promise in practice? To shed some light on this question, we study the performance scalability of a concurrent skip list under competing synchronization techniques: fine-grained locking, lock-free techniques, and RTM. Our experience suggests that RTM indeed simplifies the implementation, but considerable care is needed to get good performance. Specifically, to avoid excessive aborts due to RTM capacity misses or conflicts, programmers should move memory allocation/deallocation out of RTM regions, tune fallback functions, and apply compiler optimizations. [APSys'13]
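The advice on tuning fallback functions can be made concrete as a retry policy. The sketch below is a Python model of such a policy, not RTM code and not the APSys'13 implementation: `run_with_fallback`, `MAX_RETRIES` and `FALLBACK_LOCK` are hypothetical names, and the hardware transaction is replaced by a caller-supplied `attempt` function that returns `None` on commit or an abort status word on abort. The status bits mirror the `_XABORT_*` constants from `<immintrin.h>`. The policy retries aborts the hardware marks as retryable (typically conflicts) a bounded number of times, but goes straight to the lock on a capacity abort, since re-running a transaction whose working set does not fit will most likely abort again.

```python
import threading

# Abort-status bits reported by Intel RTM (the _XABORT_* constants in <immintrin.h>).
XABORT_RETRY = 1 << 1
XABORT_CONFLICT = 1 << 2
XABORT_CAPACITY = 1 << 3

MAX_RETRIES = 5                    # hypothetical tuning knob
FALLBACK_LOCK = threading.Lock()   # hypothetical global fallback lock


def run_with_fallback(attempt, critical_section, fallback_lock=FALLBACK_LOCK):
    """Model of an RTM retry policy with a lock-based fallback path.

    `attempt` stands in for _xbegin() / critical section / _xend(): it returns
    None if the (simulated) hardware transaction committed, or an abort status
    word otherwise. Returns "htm" or "lock" depending on which path ran.
    """
    for _ in range(MAX_RETRIES):
        status = attempt()
        if status is None:
            return "htm"
        if status & XABORT_CAPACITY:
            break  # working set too large: a retry would most likely abort again
        if not status & XABORT_RETRY:
            break  # hardware hints that an immediate retry is unlikely to succeed
    with fallback_lock:
        critical_section()
    return "lock"
```

In real RTM code, `attempt` would also read the fallback lock inside the transaction and abort explicitly if it is held, so hardware and lock-based executions never overlap; new skip-list nodes would be allocated before `_xbegin()`, following the advice above.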

3) Reusing HTM for Concurrency Control of IMDB: HTM and concurrency control in databases share two significant similarities: 1) both track read/write sets; 2) both detect conflicting accesses. These similarities motivate us to study the feasibility of directly reusing HTM for database concurrency control. However, HTM provides only “ACI” rather than “ACID” (it offers no durability) and has a limited working set. We address these challenges with an optimized transaction chopping algorithm and an efficient snapshot algorithm for durability. The resulting system, DBX-TC, achieves a peak throughput of 604,220 txns/sec for TPC-C at 8 threads, outperforming DBX by 36% to 43% at 8 cores under different contention levels. [TR]
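Transaction chopping splits a large transaction into pieces small enough to fit HTM's working set, each executed as its own hardware transaction. The text does not describe DBX-TC's optimized algorithm, so the Python sketch below shows only the classic correctness test from the transaction-chopping literature (Shasha et al.): pieces of the same transaction are joined by S-edges, pieces of different transactions whose read/write sets conflict are joined by C-edges, and a chopping is serializable if the graph contains no cycle with both kinds of edge (an SC-cycle). The `Piece` type and the function names are hypothetical. Note that `conflicts` is the same read/write-set check HTM performs in hardware.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Piece:
    txn: str                          # transaction this piece belongs to
    idx: int                          # position of the piece within its transaction
    reads: frozenset = frozenset()    # keys read by the piece
    writes: frozenset = frozenset()   # keys written by the piece


def conflicts(a: Piece, b: Piece) -> bool:
    """Two pieces conflict if at least one of them writes a key the other touches."""
    return bool(a.writes & (b.reads | b.writes) or b.writes & a.reads)


def _edges_without(pieces, txn):
    """Adjacency over all S- and C-edges except the S-edges of `txn`."""
    adj = defaultdict(set)
    for i, a in enumerate(pieces):
        for b in pieces[i + 1:]:
            same_txn = a.txn == b.txn
            if (same_txn and a.txn != txn) or (not same_txn and conflicts(a, b)):
                adj[a].add(b)
                adj[b].add(a)
    return adj


def has_sc_cycle(pieces: list) -> bool:
    """True if the chopping graph has an SC-cycle.

    Such a cycle exists iff two distinct pieces of some transaction T stay
    connected after T's own S-edges are removed: that path must leave T via a
    C-edge, and closing it with the S-edge between the two pieces forms the cycle.
    """
    for txn in {p.txn for p in pieces}:
        adj = _edges_without(pieces, txn)
        for start in (p for p in pieces if p.txn == txn):
            seen, stack = {start}, [start]
            while stack:
                for nxt in adj[stack.pop()]:
                    if nxt in seen:
                        continue
                    if nxt.txn == txn:
                        return True  # reached another piece of the same transaction
                    seen.add(nxt)
                    stack.append(nxt)
    return False
```

If the check reports an SC-cycle, a chopper would typically merge pieces of the offending transaction until the cycle disappears, trading a larger HTM working set for correctness.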

4) Persistent Transactional Memory: Due to its lack of durability support, HTM usually requires a complex software mechanism to asynchronously log transactions to persistent storage, which affects both performance and durability. With the emergence of persistent memory, we propose persistent transactional memory (PTM), a new design that adds (eventual) durability to transactional memory (TM) by incorporating emerging non-volatile memory (NVM). We describe a PTM design based on Intel’s restricted transactional memory (RTM). A preliminary evaluation with a cache-based simulator, using a concurrent key/value store and a database, shows that the number of additional cache line flushes is small. [IEEE CAL]
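The extra work PTM adds is flushing, after commit, the cache lines a transaction wrote so they reach NVM. The Python sketch below only counts those lines and is not the PTM design itself; it assumes 64-byte cache lines and represents each write as an `(address, size)` pair, and the name `dirty_lines` is hypothetical.

```python
CACHE_LINE = 64  # bytes; assumed cache-line size (typical on x86)


def dirty_lines(writes):
    """Return the sorted start addresses of the cache lines covered by `writes`.

    `writes` is an iterable of (address, size) pairs recorded during one
    transaction. Each distinct line needs one flush after commit, no matter
    how many times it was written inside the transaction.
    """
    lines = set()
    for addr, size in writes:
        if size <= 0:
            continue
        first = addr // CACHE_LINE
        last = (addr + size - 1) // CACHE_LINE
        lines.update(range(first, last + 1))
    return [line * CACHE_LINE for line in sorted(lines)]
```

Because repeated writes to the same line coalesce, the flush count is bounded by the number of distinct lines in the write set, which HTM already limits.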

5) Scalable and Efficient Distributed Transactions using HTM and RDMA (DrTM): We further study how HTM and RDMA can be used together to scale out distributed transactions. The key to DrTM’s high performance lies mainly in offloading concurrency control within a local processor to HTM, and in leveraging the strong atomicity of RDMA operations with respect to HTM to ensure serializability among concurrent transactions across machines. DrTM uses a lease-based protocol that allows read-read sharing of database records among concurrent transactions, as well as an RDMA-friendly hash table that leverages HTM to notably reduce the number of RDMA operations. An evaluation using typical transactional workloads, including TPC-C and SmallBank, shows that DrTM scales well on a 6-node cluster and achieves over 5.52 million transactions per second for TPC-C, outperforming a state-of-the-art distributed transaction system (Calvin) by at least 17.9X. [SOSP'15] We also extended DrTM with support for high availability using an optimistic replication protocol. [EuroSys'16]
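A lease lets several transactions read a record at the same time, while a writer may lock it exclusively only once every outstanding lease has expired. The Python sketch below is a simplified single-process model of that idea, not DrTM's code: the `Record` state layout, the clock-skew margin `DELTA`, and all function names are assumptions, and the RDMA compare-and-swap on a record's state word is modeled by a plain `cas` method.

```python
from dataclasses import dataclass

DELTA = 1  # hypothetical bound on clock skew between machines (time units)


@dataclass
class Record:
    # State word: ("lease", end_time) while shared by readers (end_time 0 = free),
    # or ("locked", owner) while held by one writer.
    state: tuple = ("lease", 0)
    value: object = None

    def cas(self, expected, new):
        """Model of an atomic compare-and-swap on the state word."""
        if self.state == expected:
            self.state = new
            return True
        return False


def acquire_read_lease(rec, now, duration):
    """Ensure the record is read-protected until now + duration."""
    old = rec.state
    if old[0] == "locked":
        return False                  # a writer holds the record exclusively
    if old[1] >= now + duration:
        return True                   # existing lease is long enough: share it
    return rec.cas(old, ("lease", now + duration))  # extend (never shorten) the lease


def acquire_write_lock(rec, now, owner):
    """Lock exclusively, only once any reader lease has surely expired."""
    old = rec.state
    if old[0] == "locked" or old[1] + DELTA >= now:
        return False                  # still locked, or a lease may still be live
    return rec.cas(old, ("locked", owner))


def release_write_lock(rec, owner):
    return rec.cas(("locked", owner), ("lease", 0))
```

In this model, a reader that finds a long-enough lease shares it without writing the state word at all, so read-read sharing never blocks; a writer pays by waiting out the lease plus the skew margin.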

6) Fast and Concurrent RDF queries using RDMA (Wukong): We further extend an existing graph-based store with built-in index vertices, and leverage differentiated graph partitioning to distribute vertices and indexes. Wukong's design is centered on low-latency, high-throughput one-sided RDMA operations, including a predicate-based RDMA-friendly distributed hashtable, RDMA-cost-aware adaptation between migrating code and migrating data, and RDMA-aware full-history pruning. To support highly concurrent queries, Wukong further uses a worker-obliger work-stealing design that minimizes the impact of lengthy queries. [OSDI 2016]
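A predicate-based store with built-in index vertices can be pictured as a key/value map whose key combines a vertex id, a predicate id, and an edge direction, plus special index entries listing every vertex that takes part in a predicate. The Python sketch below uses a plain dict in place of the RDMA-friendly distributed hashtable; the key layout, the reserved id `INDEX_VID = 0`, and the function names are illustrative assumptions, not Wukong's exact format.

```python
from collections import defaultdict

OUT, IN = 0, 1   # edge direction
INDEX_VID = 0    # hypothetical reserved id for index vertices; real vertex ids must be > 0


def build_store(triples):
    """Map (vid, pid, dir) -> set of neighbor vids for (subject, predicate, object) triples."""
    store = defaultdict(set)
    for s, p, o in triples:
        store[(s, p, OUT)].add(o)          # s --p--> o
        store[(o, p, IN)].add(s)           # o <--p-- s
        store[(INDEX_VID, p, OUT)].add(s)  # index: every subject with an outgoing p
        store[(INDEX_VID, p, IN)].add(o)   # index: every object with an incoming p
    return store


def match(store, pid):
    """All (s, o) pairs for the pattern ?s pid ?o, starting from the index vertex."""
    return sorted(
        (s, o)
        for s in store.get((INDEX_VID, pid, OUT), ())
        for o in store.get((s, pid, OUT), ())
    )
```

Starting from the index entry for a predicate lets a query with no constant vertex begin without scanning every vertex; each subsequent step is a single key lookup, which maps naturally onto one-sided RDMA reads.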

Source Code

The source code of DrTM is available through either of the following:
git clone git@github.com:SJTU-IPADS/drtm.git
git clone http://ipads.se.sjtu.edu.cn:1312/opensource/drtm.git

Acknowledgements

The project is supported in part by the Program for New Century Excellent Talents in University of Ministry of Education of China (No. ZXZY037003), a foundation for the Author of National Excellent Doctoral Dissertation of PR China (No. TS0220103006), the Doctoral Fund of Ministry of Education of China (No. 20130073120040), China National Natural Science Foundation (61572314, 61402284), and Singapore CREATE E2S2.