Consistent hashing is a distributed hashing scheme proposed by researchers at MIT in 1997. It was designed to address the hot spot problem on the Internet.

The consistent hash algorithm proposes four criteria for judging a hash algorithm in a dynamically changing cache environment:

Balance: The hash results should be distributed across all nodes as evenly as possible, so that load balancing is addressed at the level of the algorithm itself.

Monotonicity: Adding or removing nodes should not disrupt the normal operation of the system. Content that is already assigned should stay where it is or move only to a newly added node, rather than being reshuffled among the existing nodes.

Spread: Different clients may see different subsets of the nodes, so the same content can end up mapped to different nodes by different clients. A good algorithm keeps this scattering as small as possible. Data should be distributed across the nodes of the cluster (nodes can have backups) rather than every node storing all the data.

Load: Load looks at the spread problem from the other side. Since different clients may map the same content to different nodes, a particular node may also be assigned different content by different clients. As with spread, this should be avoided as far as possible, so a good hash algorithm should minimize the load on each node.

A simple hash algorithm

Hashing is a common technique for distributing data: compute a hash value for each object, reduce it with a modulo operation, and map the object to a storage location accordingly. For a storage space made up of N storage nodes, the simple hashing formula for mapping a data object is hash(object) % N. Because the calculation is so simple, simple hashing has several disadvantages:

Low update efficiency when nodes are added or removed. When the number of storage nodes increases or decreases by one, the mapping formula becomes hash(object) % (N ± 1). This changes the mapping location of almost every object, so the placement of data objects across the whole system has to be recalculated. While that happens the system cannot respond normally to external requests and may effectively be down.

Poor balance, with no consideration of differences in node performance. As hardware improves, newly added nodes usually have more capacity than older ones. How to adapt the algorithm so that this capacity is put to better use is also a problem that needs to be solved.

Lack of monotonicity.
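The cost of changing N can be illustrated with a small sketch. It assumes MD5 truncated to 32 bits as the hash function and uses made-up object names; neither comes from the original article. The helper names `stable_hash`, `node_for`, and `moved_fraction` are hypothetical.

```python
import hashlib


def stable_hash(key: str) -> int:
    """Deterministic 32-bit hash (assumption: MD5, truncated to 4 bytes).

    Python's built-in hash() is salted per process, so it is avoided here.
    """
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


def node_for(key: str, node_count: int) -> int:
    """Simple modulo placement: hash(object) % N."""
    return stable_hash(key) % node_count


def moved_fraction(keys, old_count: int, new_count: int) -> float:
    """Fraction of keys whose node index changes when N changes."""
    moved = sum(
        1 for k in keys if node_for(k, old_count) != node_for(k, new_count)
    )
    return moved / len(keys)


# Hypothetical object names, used only for illustration.
keys = [f"object-{i}" for i in range(10_000)]
fraction = moved_fraction(keys, 4, 5)
print(f"{fraction:.1%} of keys move when N goes from 4 to 5")
```

With a uniform hash, a key keeps its node only when hash % 4 equals hash % 5, which holds for 4 of every 20 consecutive hash values, so about four out of five keys should be expected to move when a fifth node is added.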

Principle of the consistent hash algorithm

Consistent hashing is designed so that when a server is removed or added, the existing mapping between service requests and the servers that process them changes as little as possible, satisfying the monotonicity requirement as far as it can.

In a typical distributed cluster, service requests can be tied to the servers that process them. That is, the mapping between a request and its server is fixed, and a given request is always handled by the same server. This approach does not balance load across the system: some servers may be too busy to handle new requests while others sit idle, so overall resource utilization is low. And when a server in the cluster goes down, the requests mapped to it can no longer be handled.

A further improvement is to use a hash algorithm to map service requests to processing servers, so that requests are allocated dynamically. The usual approach is a simple modulo: the hash value modulo the number of servers identifies the server that handles the request. This works well as long as the set of nodes does not change. When nodes come and go, however, it clearly fails the monotonicity requirement: when a machine is added or removed, almost all stored content has to be rehashed.

A well-designed distributed system should have good monotonicity: adding or removing a server should not cause large-scale rehashing and relocation of data. Consistent hashing solves this problem.

The basic principle of the consistent hash algorithm is to map both the machine nodes and the keys onto a ring covering the range 0 to 2^32 - 1, using the same hash algorithm for both. When a write request arrives, compute hash(K) for its key K. If the value coincides exactly with the hash value of a machine node, the write goes directly to that machine. Otherwise, search clockwise for the next node and write there. If the search passes the end of the range without finding a node, it wraps around and continues from 0.
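The lookup described above can be sketched with a sorted list and a binary search. This is a minimal illustration, not a production implementation: the class name `HashRing`, the node names, and the choice of MD5 truncated to 32 bits are all assumptions made for the example.

```python
import bisect
import hashlib


def ring_hash(value: str) -> int:
    """Map a string to a position in [0, 2**32) (assumption: truncated MD5)."""
    digest = hashlib.md5(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


class HashRing:
    """Hypothetical minimal ring: one position per machine node."""

    def __init__(self, nodes=()):
        self._positions = []  # sorted node positions on the ring
        self._owner = {}      # position -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        pos = ring_hash(node)
        if pos in self._owner:
            raise ValueError(f"position collision for {node!r}")
        bisect.insort(self._positions, pos)
        self._owner[pos] = node

    def remove_node(self, node: str) -> None:
        pos = ring_hash(node)
        if self._owner.get(pos) != node:
            raise KeyError(node)
        del self._owner[pos]
        self._positions.remove(pos)

    def get_node(self, key: str) -> str:
        if not self._positions:
            raise LookupError("ring is empty")
        # bisect_left finds the first node position >= hash(key):
        # an exact match if one exists, otherwise the next node clockwise.
        idx = bisect.bisect_left(self._positions, ring_hash(key))
        if idx == len(self._positions):
            idx = 0  # ran past the end of the ring: wrap around to the start
        return self._owner[self._positions[idx]]


# Hypothetical node and key names.
ring = HashRing(["node-A", "node-B", "node-C"])
print(ring.get_node("user:42"))
```

Keeping the positions sorted makes each lookup a binary search, so the cost grows with the logarithm of the number of nodes rather than requiring a walk around the ring.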

When there are relatively few machines on the hash ring, they may divide the ring unevenly, so that some machines handle a large share of the data while others handle very little. To counter this, each physical node can be mapped to multiple virtual nodes, with the number chosen according to the processing power of the machine.

A virtual node is a replica of an actual node (machine) in the hash space. Each actual node corresponds to several virtual nodes, and that number is called the replica count. Virtual nodes are placed in the hash space according to their hash values.
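A rough sketch of virtual nodes follows. It assumes each virtual node is hashed from a label of the form "name#i" and again uses truncated MD5; both conventions, along with the function names and replica counts, are illustrative choices rather than anything specified in the article.

```python
import bisect
import hashlib
from collections import Counter


def ring_hash(value: str) -> int:
    """Map a string to a position in [0, 2**32) (assumption: truncated MD5)."""
    digest = hashlib.md5(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


def build_ring(replica_counts):
    """Place `count` virtual nodes on the ring for each actual node.

    Virtual node labels of the form "name#i" are an assumption of this sketch.
    Position collisions are not handled specially here.
    """
    points = sorted(
        (ring_hash(f"{node}#{i}"), node)
        for node, count in replica_counts.items()
        for i in range(count)
    )
    positions = [pos for pos, _ in points]
    owners = [node for _, node in points]
    return positions, owners


def lookup(positions, owners, key: str) -> str:
    """Return the actual node that owns the next virtual node clockwise."""
    idx = bisect.bisect_left(positions, ring_hash(key)) % len(positions)
    return owners[idx]


def key_shares(replica_counts, keys):
    """Fraction of keys that land on each actual node."""
    positions, owners = build_ring(replica_counts)
    counts = Counter(lookup(positions, owners, k) for k in keys)
    return {node: counts[node] / len(keys) for node in replica_counts}


# Hypothetical workload and replica counts.
keys = [f"object-{i}" for i in range(30_000)]
even = key_shares({"node-A": 200, "node-B": 200, "node-C": 200}, keys)
weighted = key_shares({"big": 400, "small-1": 100, "small-2": 100}, keys)
print(even)
print(weighted)
```

Giving a more capable machine a larger replica count, as in the `weighted` example, is one way to let it take a proportionally larger share of the keys.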

With consistent hashing, when a new machine joins, the data of only one existing machine is affected. For example, if a new node H is inserted between A and B, some of the data previously handled by B moves to H, and the assignments of all other nodes stay unchanged. This shows good monotonicity.

If a machine is deleted, for example node C, the data originally handled by C is transferred to node D, and the assignments of the other nodes stay unchanged. Because the same hash algorithm is used for both the machine nodes and the cached data, spread and load are also kept low.
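The effect of adding and removing a node can be checked with a short experiment. The sketch below uses one ring position per node and truncated MD5, both assumptions of the example. The node names follow the text, but where each node actually lands on the ring depends on its hash, so the neighbour that gives up or inherits data is whichever node follows clockwise, not necessarily B or D.

```python
import bisect
import hashlib


def ring_hash(value: str) -> int:
    """Map a string to a position in [0, 2**32) (assumption: truncated MD5)."""
    digest = hashlib.md5(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


def assign(keys, nodes):
    """Return {key: owning node} for a ring with one position per node."""
    points = sorted((ring_hash(node), node) for node in nodes)
    positions = [pos for pos, _ in points]
    result = {}
    for key in keys:
        idx = bisect.bisect_left(positions, ring_hash(key)) % len(positions)
        result[key] = points[idx][1]
    return result


# Hypothetical object names.
keys = [f"object-{i}" for i in range(5_000)]
before = assign(keys, ["A", "B", "C", "D"])

# Add node H: every key that changes owner should now belong to H.
after_add = assign(keys, ["A", "B", "C", "D", "H"])
moved = [k for k in keys if before[k] != after_add[k]]
donors = {before[k] for k in moved}
print(f"adding H moved {len(moved)} keys, taken from {sorted(donors)}")

# Remove node C: only keys that belonged to C should change owner.
after_remove = assign(keys, ["A", "B", "D"])
changed = [k for k in keys if before[k] != after_remove[k]]
heirs = {after_remove[k] for k in changed}
print(f"removing C reassigned {len(changed)} keys to {sorted(heirs)}")
```

In both cases the keys that do not touch the added or removed node keep their original owner, which is the monotonicity property the text describes.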

This article is an English version of an article originally published in Chinese on aliyun.com.
