11.
Separating hot and cold data
The problems of separating data by KV timestamp
• The timestamp may not represent the heat of business data very well
• A KeyValue's timestamp is also used as the version number in HBase
e.g. An order ID is written with a ts in advance of the current ts
e.g. A delayed data source (Kafka, Spark…) results in ts lag
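
The mismatch described on this slide can be illustrated with a minimal sketch. It is not HBase code: the `Cell` class, the `order_time_ms` business field, and the 7-day hot window are all assumptions made up for the example. It shows a record that arrives late through a delayed pipeline, so its KV timestamp looks recent while the business data is actually old.

```python
from dataclasses import dataclass

DAY_MS = 24 * 60 * 60 * 1000


@dataclass
class Cell:
    row: str
    kv_ts_ms: int       # KeyValue timestamp (also acts as the version number)
    order_time_ms: int  # hypothetical business field carried in the value


def is_hot(reference_ms: int, now_ms: int, hot_window_ms: int = 7 * DAY_MS) -> bool:
    """Treat a value as hot if it falls inside the most recent window (assumed 7 days)."""
    return now_ms - reference_ms <= hot_window_ms


now = 100 * DAY_MS

# An order placed 30 days ago whose record is only written now (delayed source).
late = Cell("order-1", kv_ts_ms=now, order_time_ms=now - 30 * DAY_MS)

by_kv_ts = is_hot(late.kv_ts_ms, now)          # judged by write time
by_business = is_hot(late.order_time_ms, now)  # judged by business time

print(by_kv_ts, by_business)
```

Judged by KV timestamp the record is classified as hot, while the business field says it is a 30-day-old order; that gap is the motivation for slicing on a secondary field.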

17.
Separating hot and cold data
• Only Cold/Warm/Hot windows are needed
• Data moves from the Hot window to the Warm window, then to the Cold window
• A secondary field or the timestamp is used to separate data
Our layered compaction is inspired by Date Tiered Compaction.
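
As a rough sketch of the three-window idea, the function below maps a slicing value (a secondary field or the timestamp, in milliseconds) to a window by age. The function name `pick_window` and the 7-day and 30-day spans are assumptions chosen for illustration; the slides do not state the actual boundaries.

```python
DAY_MS = 24 * 60 * 60 * 1000


def pick_window(slice_value_ms: int, now_ms: int,
                hot_span_ms: int = 7 * DAY_MS,
                warm_span_ms: int = 30 * DAY_MS) -> str:
    """Return 'hot', 'warm' or 'cold' based on the age of the slicing value."""
    age_ms = now_ms - slice_value_ms
    if age_ms <= hot_span_ms:
        return "hot"
    if age_ms <= warm_span_ms:
        return "warm"
    return "cold"


# The same record, observed 1, 10 and 60 days after its slicing value.
written_ms = 0
windows = [pick_window(written_ms, day * DAY_MS) for day in (1, 10, 60)]
print(windows)
```

As time advances, the same record drifts from the hot window to the warm one and finally to the cold one, without any change to the record itself.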

18.
Layered Compaction
• HFiles flushed from the MemStore are always in L0
• The Hot/Warm/Cold layers each have their own compaction strategy
• Data is separated by secondary field or timestamp
• Data outside a layer's boundary is compacted out to the next layer
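
The flow on this slide can be modelled with a small in-memory toy. This is a sketch under stated assumptions, not the authors' implementation or any HBase API: `LayeredStore`, `in_boundary`, the layer order, and the 7-day and 30-day boundaries are all hypothetical, and real per-layer compaction strategies are reduced to a single merge-and-spill step.

```python
DAY_MS = 24 * 60 * 60 * 1000
ORDER = ["L0", "hot", "warm", "cold"]  # assumed layer order


def in_boundary(layer: str, age_ms: int) -> bool:
    """Hypothetical boundaries: L0 keeps nothing, cold keeps everything."""
    if layer == "L0":
        return False
    if layer == "hot":
        return age_ms <= 7 * DAY_MS
    if layer == "warm":
        return age_ms <= 30 * DAY_MS
    return True  # cold


class LayeredStore:
    def __init__(self):
        # Each layer holds a list of files; a file is a list of (row, slice_value_ms).
        self.files = {layer: [] for layer in ORDER}

    def flush(self, cells):
        """A MemStore flush always produces a new file in L0."""
        self.files["L0"].append(list(cells))

    def compact(self, layer, now_ms):
        """Merge a layer's files; spill out-of-boundary cells to the next layer."""
        cells = sorted(c for f in self.files[layer] for c in f)
        keep = [c for c in cells if in_boundary(layer, now_ms - c[1])]
        spill = [c for c in cells if not in_boundary(layer, now_ms - c[1])]
        self.files[layer] = [keep] if keep else []
        if spill:
            next_layer = ORDER[ORDER.index(layer) + 1]
            self.files[next_layer].append(spill)


now = 100 * DAY_MS
store = LayeredStore()
store.flush([("a", now - 1 * DAY_MS), ("b", now - 10 * DAY_MS), ("c", now - 60 * DAY_MS)])
for layer in ORDER:
    store.compact(layer, now)

rows = {layer: [c[0] for f in store.files[layer] for c in f] for layer in ORDER}
print(rows)
```

In this toy run, one pass over the layers in order is enough to settle every cell; a real system would schedule each layer's compaction independently.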

35.
Conclusion
• A new approach to separating hot and cold data was introduced
• A new Secondary Field Slicer was used to decide layer boundaries, in addition to timestamp
• Layered compaction was used to separate data into different layers
• Heterogeneous storage was used to balance cost and performance
• New techniques such as Prefix Bloom Filter and Secondary Field Range Lazy Seek were used for automatic query optimization
• Production tests show that our approach can lower query RT by 50% and decrease storage usage by 25%

36.
We are hiring!
• If you are interested in or familiar with the Hadoop ecosystem or any other NoSQL database
• If you are eager to accept the challenge of building high-concurrency, low-latency, flexible systems