
From cache to in-memory data grid. Introduction to Hazelcast.


4.
What?
This presentation:
• covers the basics of caching and popular cache types
• explains the evolution from a simple cache to a distributed cache, and from a distributed cache to an IMDG
• does not describe the use of NoSQL solutions for caching
• is not intended as a product comparison or as a promotion of Hazelcast as the best solution

5.
Why?
• to expand horizons regarding modern distributed architectures and solutions
• to share experience from my current project, where Infinispan was replaced with Hazelcast as the in-memory distributed cache solution

15.
Cache-Aside Pattern
• the application is responsible for reading from and writing to the storage; the cache does not interact with the storage at all
• the cache is “kept aside” as a faster and more scalable in-memory data store
[diagram: the Client talks to the Cache and to the Storage separately]
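A minimal cache-aside sketch in Java; the UserDao and User types below are hypothetical stand-ins, not part of the presentation:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Cache-aside sketch: the application talks to both the cache and the
// storage; the cache itself never touches the storage.
public class UserService {

    // Minimal hypothetical stand-ins, just to keep the sketch compilable.
    public static class User {
        final long id;
        public User(long id) { this.id = id; }
        public long getId() { return id; }
    }
    public interface UserDao {
        User load(long id);
        void save(User user);
    }

    private final ConcurrentMap<Long, User> cache = new ConcurrentHashMap<>();
    private final UserDao storage;

    public UserService(UserDao storage) { this.storage = storage; }

    public User findUser(long id) {
        User user = cache.get(id);       // 1. try the cache first
        if (user == null) {
            user = storage.load(id);     // 2. on a miss, read from storage
            if (user != null) {
                cache.put(id, user);     // 3. populate the cache for next time
            }
        }
        return user;
    }

    public void updateUser(User user) {
        storage.save(user);              // write to storage...
        cache.put(user.getId(), user);   // ...and keep the cache in sync
    }
}
```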

16.
Read-Through/Write-Through
• the application treats the cache as the main data store and reads/writes data from/to it
• the cache is responsible for reading this data from, and writing it to, the database
[diagram: Client → Cache → Storage]
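A generic read-through/write-through sketch, assuming a hypothetical Store abstraction for the underlying database:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Read-through/write-through sketch: the client only talks to the cache;
// the cache reads from and writes to the storage on the client's behalf.
public class ReadWriteThroughCache<K, V> {

    public interface Store<K, V> {       // hypothetical storage abstraction
        V load(K key);
        void save(K key, V value);
    }

    private final ConcurrentMap<K, V> entries = new ConcurrentHashMap<>();
    private final Store<K, V> storage;

    public ReadWriteThroughCache(Store<K, V> storage) { this.storage = storage; }

    public V get(K key) {
        // read-through: a cache miss transparently triggers a storage load
        return entries.computeIfAbsent(key, storage::load);
    }

    public void put(K key, V value) {
        storage.save(key, value);        // write-through: storage first...
        entries.put(key, value);         // ...then the in-memory copy
    }
}
```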

17.
Write-Behind Pattern
• modified cache entries are asynchronously written to the storage after a configurable delay
[diagram: Client → Cache → Storage]
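A simplified write-behind sketch; a production implementation (such as Hazelcast's write-behind MapStore, shown later) also handles batching, ordering, and failure, all omitted here:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.BiConsumer;

// Write-behind sketch: puts complete in memory immediately; a background
// task flushes dirty entries to storage after a configurable delay.
public class WriteBehindCache<K, V> {

    private final ConcurrentMap<K, V> entries = new ConcurrentHashMap<>();
    private final ConcurrentMap<K, V> dirty = new ConcurrentHashMap<>();
    private final ScheduledExecutorService flusher =
            Executors.newSingleThreadScheduledExecutor();

    public WriteBehindCache(BiConsumer<K, V> store, long delay, TimeUnit unit) {
        flusher.scheduleWithFixedDelay(() -> {
            for (Map.Entry<K, V> e : dirty.entrySet()) {
                store.accept(e.getKey(), e.getValue()); // persist asynchronously
                dirty.remove(e.getKey(), e.getValue()); // unless overwritten meanwhile
            }
        }, delay, delay, unit);
    }

    public void put(K key, V value) {
        entries.put(key, value);   // the client sees the write immediately
        dirty.put(key, value);     // the storage catches up later
    }

    public V get(K key) { return entries.get(key); }
}
```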

34.
Put in Distributed Cache
• the data is sent to a primary cluster node, and to a backup cluster node if the backup count is 1
• modifications to the cache are not considered complete until all backups have acknowledged receipt of the modification, i.e. there is a slight performance penalty
• this overhead guarantees that data consistency is maintained and no data is lost
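A hedged sketch of setting the backup count programmatically, assuming the Hazelcast 3.x API; the map name and value are illustrative:

```java
import com.hazelcast.config.Config;
import com.hazelcast.config.MapConfig;
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;

// One synchronous backup per entry: every put is acknowledged by the
// backup before it completes, trading a little latency for no data loss
// on a single node failure.
public class BackupConfigExample {
    public static void main(String[] args) {
        Config config = new Config();
        MapConfig mapConfig = config.getMapConfig("orders"); // "orders" is illustrative
        mapConfig.setBackupCount(1);        // synchronous backups (the default)
        mapConfig.setAsyncBackupCount(0);   // no fire-and-forget backups

        HazelcastInstance hz = Hazelcast.newHazelcastInstance(config);
        hz.getMap("orders").put("order-1", "payload"); // returns after the backup acks
    }
}
```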

38.
Distributed Cache Summary
A distributed in-memory key/value store supports a simple set of “put” and “get” operations, and optionally read-through and write-through behavior for reading values from, and writing them to, underlying disk-based storage such as an RDBMS.
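A minimal put/get example against a distributed map, assuming the Hazelcast 3.x API; map and key names are illustrative:

```java
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;
import java.util.concurrent.ConcurrentMap;

// Basic distributed key/value usage: the distributed map exposes the
// familiar put/get operations; read/write-through to an RDBMS would be
// added via a MapStore (shown later).
public class PutGetExample {
    public static void main(String[] args) {
        HazelcastInstance hz = Hazelcast.newHazelcastInstance();
        ConcurrentMap<String, String> users = hz.getMap("users"); // "users" is illustrative
        users.put("u-1", "Alice");
        System.out.println(users.get("u-1")); // prints "Alice"
        hz.shutdown();
    }
}
```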

41.
Remote Cache
a cache that is located remotely and is accessed by one or more clients

42.
Remote Cache
The majority of existing distributed/replicated cache solutions support two modes:
• embedded mode
  • the cache instance is started within the same JVM as your application
• client-server mode
  • a remote cache instance is started and clients connect to it using a variety of different protocols
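Both modes sketched against the Hazelcast 3.x API (the client network config shown assumes a 3.1+ client jar); the address is illustrative:

```java
import com.hazelcast.client.HazelcastClient;
import com.hazelcast.client.config.ClientConfig;
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;

public class ModesExample {
    public static void main(String[] args) {
        // Embedded mode: the node lives in the same JVM as the application
        // and holds its share of the cluster's data.
        HazelcastInstance embedded = Hazelcast.newHazelcastInstance();

        // Client-server mode: a lightweight client connects to remote nodes
        // over the Hazelcast binary protocol.
        ClientConfig clientConfig = new ClientConfig();
        clientConfig.getNetworkConfig().addAddress("192.168.1.10:5701"); // illustrative
        HazelcastInstance client = HazelcastClient.newHazelcastClient(clientConfig);

        client.getMap("users").put("u-1", "Alice"); // stored on the remote cluster
    }
}
```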

49.
In-memory Data Grid
An in-memory distributed cache plus:
• the ability to co-locate computations with data in a distributed context and to move computation to data
• distributed MPP processing based on standard SQL and/or Map/Reduce, which allows effective computation over data stored in-memory across the cluster
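One concrete way Hazelcast moves computation to data is an EntryProcessor, which executes on the member that owns the key instead of shipping the value over the network; a sketch assuming the Hazelcast 3.x API, with illustrative names:

```java
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.core.IMap;
import com.hazelcast.map.AbstractEntryProcessor;
import java.util.Map;

public class EntryProcessorExample {
    public static void main(String[] args) {
        HazelcastInstance hz = Hazelcast.newHazelcastInstance();
        IMap<String, Integer> counters = hz.getMap("counters");
        counters.put("page-views", 0);

        // The processor is serialized and executed on the owner of the key.
        counters.executeOnKey("page-views", new AbstractEntryProcessor<String, Integer>() {
            @Override
            public Object process(Map.Entry<String, Integer> entry) {
                entry.setValue(entry.getValue() + 1); // runs where the data lives
                return entry.getValue();
            }
        });
    }
}
```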

50.
IMDC vs. IMDG
• in-memory distributed caches were developed in response to a growing need for high availability of data
• in-memory data grids were developed in response to the growing complexity of data processing

57.
Hazelcast Configuration
• programmatic configuration
• XML configuration
• Spring configuration
Nuance:
It is very important that the configuration on all members of the cluster is exactly the same; it does not matter whether you use the XML-based or the programmatic configuration.
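A minimal programmatic-configuration sketch, assuming the Hazelcast 3.x API; the same settings could equally come from hazelcast.xml or a Spring context, as long as every member uses identical values. Names and values are illustrative:

```java
import com.hazelcast.config.Config;
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;

public class ProgrammaticConfigExample {
    public static void main(String[] args) {
        Config config = new Config();
        config.setInstanceName("orders-cache");                  // illustrative name
        config.getMapConfig("orders").setTimeToLiveSeconds(300); // 5-minute expiry
        HazelcastInstance hz = Hazelcast.newHazelcastInstance(config);
    }
}
```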

62.
Hazelcast Instance
Each module that uses Hazelcast for distributed caching should have its own separate Hazelcast instance.
The “Hazelcast Instance” is a factory for creating individual cache objects.
Each cache has a name and potentially distinct configuration settings (expiration, eviction, replication, and more).
Multiple instances can live within the same JVM.
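A sketch of this factory role, assuming the Hazelcast 3.x API; cache names and settings are illustrative:

```java
import com.hazelcast.config.Config;
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;

public class InstanceFactoryExample {
    public static void main(String[] args) {
        Config config = new Config();
        config.getMapConfig("sessions").setTimeToLiveSeconds(1800); // 30-min expiry
        config.getMapConfig("reference-data").setBackupCount(2);    // more redundancy

        HazelcastInstance hz = Hazelcast.newHazelcastInstance(config);
        hz.getMap("sessions");        // two distinct caches, each with its own
        hz.getMap("reference-data");  // settings, created by the same instance
    }
}
```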

63.
Hazelcast Cluster Group
Groups are used to have multiple isolated clusters on the same network instead of a single cluster.
A JVM can host multiple Hazelcast instances (nodes).
Each node can participate in only one group: it joins its own group and does not interfere with others.
To achieve this, the group name and group password configuration properties are used.
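A group-isolation sketch, assuming the Hazelcast 3.x API; the group name and password are illustrative:

```java
import com.hazelcast.config.Config;
import com.hazelcast.core.Hazelcast;

public class GroupConfigExample {
    public static void main(String[] args) {
        Config config = new Config();
        config.getGroupConfig()
              .setName("billing-cluster")   // nodes from other groups are ignored
              .setPassword("billing-pass");
        Hazelcast.newHazelcastInstance(config);
    }
}
```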

64.
Hazelcast Network Config
In our environment the multicast mechanism for joining the cluster is not supported, so only the TCP/IP cluster approach will be used.
In this case there should be one or more well-known members to connect to.
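A TCP/IP-join sketch with multicast disabled, assuming the Hazelcast 3.x API; member addresses are illustrative:

```java
import com.hazelcast.config.Config;
import com.hazelcast.config.JoinConfig;
import com.hazelcast.core.Hazelcast;

public class TcpIpJoinExample {
    public static void main(String[] args) {
        Config config = new Config();
        JoinConfig join = config.getNetworkConfig().getJoin();
        join.getMulticastConfig().setEnabled(false);  // no multicast on this network
        join.getTcpIpConfig()
            .setEnabled(true)
            .addMember("192.168.1.10")                // well-known members; reaching
            .addMember("192.168.1.11");               // one of them is enough to join
        Hazelcast.newHazelcastInstance(config);
    }
}
```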

66.
Hazelcast Map Store
• useful for reading and writing map entries from and to an external data source
• one instance per map per node will be created
• word of caution: the map store should NOT call distributed map operations, otherwise you might run into deadlocks

67.
Hazelcast Map Store
• map pre-population via the loadAllKeys method, which returns the set of all “hot” keys that need to be loaded for the partitions owned by the member
• write-through vs. write-behind via the “write-delay-seconds” configuration property (0 or bigger)
• MapLoaderLifecycleSupport to be notified of lifecycle events, i.e. init and destroy
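A MapStore sketch against the Hazelcast 3.x interface; the UserDao backing store is a hypothetical in-memory stand-in, only there to keep the sketch compilable:

```java
import com.hazelcast.core.MapStore;
import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hazelcast calls these methods to read/write entries from/to an external
// data source. Per the warning above, none of them should call back into
// distributed map operations.
public class UserMapStore implements MapStore<Long, String> {

    // Minimal in-memory stand-in for a real DAO (hypothetical).
    static class UserDao {
        private final Map<Long, String> table = new HashMap<>();
        void save(Long id, String row) { table.put(id, row); }
        void delete(Long id) { table.remove(id); }
        String find(Long id) { return table.get(id); }
        Collection<Long> findHotKeys() { return table.keySet(); }
    }

    private final UserDao dao = new UserDao();

    @Override public void store(Long key, String value) { dao.save(key, value); }
    @Override public void storeAll(Map<Long, String> map) { map.forEach(dao::save); }
    @Override public void delete(Long key) { dao.delete(key); }
    @Override public void deleteAll(Collection<Long> keys) { keys.forEach(dao::delete); }
    @Override public String load(Long key) { return dao.find(key); }

    @Override
    public Map<Long, String> loadAll(Collection<Long> keys) {
        Map<Long, String> result = new HashMap<>();
        for (Long key : keys) { result.put(key, dao.find(key)); }
        return result;
    }

    @Override
    public Set<Long> loadAllKeys() {
        // Pre-population hook: return the "hot" keys to load eagerly,
        // or null to skip pre-population entirely.
        return new HashSet<>(dao.findHotKeys());
    }
}
```

In the map configuration, write-delay-seconds selects between the two write modes: 0 means Hazelcast calls store() synchronously (write-through), while a positive value makes it batch writes asynchronously after that delay (write-behind).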

69.
Hazelcast Executor Service
• extends java.util.concurrent.ExecutorService, but is designed to be used in a distributed environment
• scaling up via the thread pool size
• scaling out is automatic via the addition of new Hazelcast instances

70.
Hazelcast Executor Service
• provides different ways to route tasks:
• any member
• specific member
• the member hosting a specific key
• all or subset of members
• supports execution callback
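A routing sketch assuming the Hazelcast 3.x IExecutorService API; task and key names are illustrative, and tasks must be Serializable because they travel across the network:

```java
import com.hazelcast.core.ExecutionCallback;
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.core.IExecutorService;
import java.io.Serializable;
import java.util.concurrent.Callable;

public class ExecutorRoutingExample {

    // Tasks are serialized and shipped to other members.
    static class Ping implements Callable<String>, Serializable {
        public String call() { return "pong"; }
    }
    static class Touch implements Runnable, Serializable {
        public void run() { /* runs on the member that owns the key */ }
    }

    public static void main(String[] args) {
        HazelcastInstance hz = Hazelcast.newHazelcastInstance();
        IExecutorService executor = hz.getExecutorService("default");

        executor.submit(new Ping());                        // routed to any member
        executor.executeOnKeyOwner(new Touch(), "user-42"); // member hosting the key

        // Execution callback: invoked asynchronously when the task completes.
        executor.submit(new Ping(), new ExecutionCallback<String>() {
            public void onResponse(String response) { System.out.println(response); }
            public void onFailure(Throwable t) { t.printStackTrace(); }
        });
    }
}
```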

71.
Hazelcast Executor Service
Drawbacks:
• the work queue has no high availability:
  • each member creates local ThreadPoolExecutors with ordinary work queues that do the real work but are not backed up by Hazelcast
• the work queue is not partitioned:
  • it may happen that one member has a lot of unprocessed work while another is idle
• no customizable load balancing

75.
Infinispan vs. Hazelcast: Pros
Infinispan:
• backed by a relatively large company (JBoss) for use in largely distributed environments
• has been in active use for several years
• well-written documentation
• a lot of examples of different configurations, as well as solutions to common problems
Hazelcast:
• easy setup
• more performant than Infinispan
• simple node/cluster discovery mechanism
• relies on only one jar to be included on the classpath
• brief documentation complemented with simple code samples

76.
Infinispan vs. Hazelcast: Cons
Infinispan:
• relies on JGroups, which has proven to be buggy, especially under high load
• configuration can be overly complex
• ~9 jars are needed to get Infinispan up and running
• code appears very complex and is hard to debug/trace
Hazelcast:
• backed by a startup based in Palo Alto and Turkey that just received $2.5M in Series A funding from Bain Capital Ventures
• customization points are fairly limited
• some exceptions can be difficult to diagnose due to poorly written exception messages
• still quite buggy

78.
Best Practices
• each Hazelcast instance should have its own unique instance name
• each Hazelcast instance should have its own unique group name and password
• each Hazelcast instance should start on a separate port according to predefined ranges
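One possible way to apply these three rules programmatically, assuming the Hazelcast 3.x API; all values are illustrative:

```java
import com.hazelcast.config.Config;
import com.hazelcast.core.Hazelcast;

public class BestPracticeConfigExample {
    public static void main(String[] args) {
        Config config = new Config();
        config.setInstanceName("billing-cache-node-1");            // unique per instance
        config.getGroupConfig()
              .setName("billing-cluster")                          // unique group name...
              .setPassword("billing-pass");                        // ...and password
        config.getNetworkConfig().setPort(5801);                   // start of this module's range
        config.getNetworkConfig().setPortAutoIncrement(true);      // probe 5801, 5802, ...
        Hazelcast.newHazelcastInstance(config);
    }
}
```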

79.
Personal Recommendations
• use XML configuration in production, but don’t use the spring:hz schema; our Spring-based “lego bricks” approach for building the resulting Hazelcast instance is quite decent
• don’t use Hazelcast for local caches, as it was never designed for that purpose and always performs serialization/deserialization
• don’t use library-specific classes; use common interfaces, e.g. ConcurrentMap, and you will be able to replace the underlying cache solution easily (see the sketch below)
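A sketch of that last recommendation: code against java.util.concurrent.ConcurrentMap so the provider stays swappable. IMap implements ConcurrentMap, so a Hazelcast map and a plain ConcurrentHashMap are interchangeable to the caller; names are illustrative:

```java
import com.hazelcast.core.Hazelcast;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class SwappableCacheExample {

    // No Hazelcast types leak into callers of this method.
    static String lookup(ConcurrentMap<String, String> cache, String key) {
        return cache.get(key);
    }

    public static void main(String[] args) {
        ConcurrentMap<String, String> distributed =
                Hazelcast.newHazelcastInstance().getMap("users"); // IMap is a ConcurrentMap
        ConcurrentMap<String, String> local = new ConcurrentHashMap<>();

        lookup(distributed, "u-1"); // same calling code works against
        lookup(local, "u-1");       // either cache implementation
    }
}
```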