HBase-2.0.0 has been a couple of years in the making. It is chock-a-block with new features and fixes. In this session, the 2.0.0 release manager will perform the impossible: describing the release content within the session's time bounds.

HBase Practice At XiaoMi

Zheng Hu

We'll share some of our HBase experience at XiaoMi:

1. How we tuned G1GC for HBase clusters.

2. Development and performance of the async HBase client.

Track 1

Offheap bucket cache success story and Offheaping the write path in HBase

Ramkrishna Vasudevan and Anoop Sam John

The first part of the talk covers the success story of deploying the latest improvements to the off-heap mode of the bucket cache on one of Alibaba's biggest clusters.

It highlights how off-heap reads from the bucket cache improved the average QPS and avoided the frequent GC-induced dips in QPS.

The second part covers the work that went into making the HBase write path use off-heap memory effectively, the design changes this required in size accounting, and the performance gains we achieved as a result.

HBase Multi-tenancy Use Cases and Various Solutions

Bhupendra Jain

In a multi-tenant scenario, the biggest challenge is to achieve QoS for each tenant without impacting other tenants' workloads. This session will talk about the multi-tenancy use cases and challenges present in HBase. The session will talk in detail about

HBase is the core storage of Alibaba's search infrastructure and faces a big challenge in improving its throughput, which determines how fast machine learning programs are processed and thus the accuracy of the recommendations made. In this session we will talk about work done and in progress to increase both read and write throughput, as well as the real-world performance during the past Singles' Day and the latest benchmark data from the lab.

HBase is used to serve online-facing traffic at Pinterest, which means no downtime is allowed. However, we were on HBase 0.94. To upgrade to the latest version, we needed to figure out a way to upgrade live while keeping the Pinterest site up. Recently, we successfully upgraded our 0.94 HBase cluster to 1.2 with no downtime. We made changes to both AsyncHBase and the HBase server side. We will talk about what we did and how we did it, as well as our findings from the configuration and performance tuning we did to achieve low latency.

HBase Disaster Recovery Solution at Huawei

Ashish Singhi

The HBase disaster recovery solution aims to keep the HBase service highly available, with minimal user intervention, when one HBase cluster suffers a disaster. This session will introduce HBase disaster recovery use cases and the various solutions adopted at Huawei, such as

In the talk, we are going to cover the newly merged phases 2 and 3 of backup and restore.

Previously, users could take snapshots to back up data. However, the execution cost could be high because of the flush across region servers, and there were no incremental snapshots either.

Backup and restore functionality provides two types of backup:

Full backup – foundation for incremental backups

Incremental backup – can be periodic to capture changes over time

We'll cover three types of backup strategies:

Intra-cluster backup

Backup on a separate HDFS archive cluster

Backup involving the cloud or a storage vendor

Best practices for Backup-and-Restore will be presented next.

We'll explain concepts such as Backup Image and Backup Set, with example commands showing how they are used.

The mechanism for incremental backups is covered next.

Finally, we'll cover bulk load support for backup.

HBase on Beam

Jingcheng Du

Apache Beam is an open-source, unified programming model for defining batch and streaming jobs that run on many execution engines. HBase on Beam is a connector that allows Beam to use HBase as a bounded data source and as a target data store for both batch and streaming data sets. With this connector, HBase can work directly with many batch and streaming engines, such as Spark, Flink, and Google Cloud Dataflow. In this session, I will introduce Apache Beam, the current implementation of HBase on Beam, and our future plans for it.

Track 2

HBase: recent improvement and practice at Alibaba

Wenlong Yang and Han Yang

AliHB, an HBase branch tailored to Alibaba Group's business characteristics and requirements, is widely used as a basic storage service supporting online and nearline applications across the whole Alibaba economy, at companies such as taobao.com, tmall.com, alipay.com, and cainiao.com.

In this talk, we will share our experience maintaining clusters totaling more than ten thousand nodes with high availability and at low cost:

1. An introduction to several typical scenarios at Alibaba

2. SQL (based on Apache Phoenix) improvements

3. Range-level data copy across clusters

4. Prefix-Bloomfilter for scan performance

5. Dual-Service, based on the async API, enabling concurrent access to two clusters to achieve low latency
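Item 5 describes sending the same request to two clusters and using whichever answers first. The snippet below is a minimal sketch of that idea in Python, assuming both clusters hold replicated copies of the data. The names `read_from` and `dual_read` are hypothetical, reads are simulated with `time.sleep`, and thread-pool futures stand in for the async HBase API the talk builds on; this is not AliHB's implementation.

```python
import concurrent.futures as cf
import time


def read_from(cluster, delay_s, row):
    """Hypothetical stand-in for a Get against one cluster; sleeps to simulate latency."""
    time.sleep(delay_s)
    return f"{cluster}:{row}"


def dual_read(row, primary_delay_s, secondary_delay_s, timeout_s=1.0):
    """Send the same read to both clusters and return the first successful answer."""
    pool = cf.ThreadPoolExecutor(max_workers=2)
    try:
        pending = {
            pool.submit(read_from, "primary", primary_delay_s, row),
            pool.submit(read_from, "secondary", secondary_delay_s, row),
        }
        deadline = time.monotonic() + timeout_s
        while pending:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                break
            done, pending = cf.wait(
                pending, timeout=remaining, return_when=cf.FIRST_COMPLETED
            )
            for fut in done:
                if fut.exception() is None:
                    return fut.result()
            # Any read that completed here failed; keep waiting for the other one.
        raise TimeoutError("no cluster returned a result in time")
    finally:
        # Do not block on the slower request; it finishes in the background.
        pool.shutdown(wait=False)
```

Racing both clusters roughly doubles the read load, so a design like this trades capacity for lower tail latency; a common variant sends the second request only if the first has not answered after a short delay.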

When loading data into HBase in real time, we use the put/put-list interface. After receiving a put request, the RegionServer writes the WAL, writes the data into the MemStore, flushes the MemStore to store files on disk, and then compacts those files again and again. This procedure consumes too many resources and degrades read/write performance. To solve the problem, we provide a near-line loading method and architecture that greatly increases loading bandwidth and reduces the impact on read operations.

First, we will give a brief introduction to the HBase service at NetEase, including basic cluster information and the key HBase services. Then we will share some tips on HBase tuning practices. Finally, we will introduce some improvements in our internal HBase version.

Building online HBase cluster of Zhihu based on Kubernetes

Zhiyong Bai

Zhihu uses HBase, a high-performance and scalable key-value database, to provide an online data store alongside MySQL and Redis. Zhihu's platform team had accumulated some experience with container technology, and this time, based on Kubernetes, we built a flexible platform for online HBase that quickly creates multiple logically isolated HBase clusters on a shared physical cluster and provides customized services for different business needs. Combined with Consul and a DNS server, we implemented highly available access to HBase using a client written mainly in Python. This presentation mainly shares the architecture of Zhihu's online HBase platform and some practical experience from the production environment.