Version 0.3 of HSCALE is almost in the door

After working on build and test improvements (for example, incorporating lualint and LuaCov) as well as other Lua “side-projects” (such as Log4LUA), we are now heading towards HSCALE 0.3.

The focus of the forthcoming version 0.3 of HSCALE is Dictionary Based Partition Lookup. Using this partition lookup module lets you take full control over how your partitions are created and where they are actually located.

Update: Dictionary Based Partition Lookup is fully implemented. See this blog post and the wiki page about it.

Please note: Due to problems with backend connection handling, version 0.3 will still focus on single-server backends, even though support for multiple backends is already implemented in HSCALE. Please look at the proxyUnit tests for a glimpse of how multi-server backends will be handled.

Please check out the current development snapshot at svn.hscale.org to see what is already there. Currently the partition lookup is fully functional but the administrative commands and some further hashing functions are missing (see below).

How does dictionary partition lookup work?

As the name implies, partitions are looked up in a dictionary. The dictionary itself is stored in the main database and cached internally, so you can now freely move partitions around and create new ones.

What is done internally is this:

Apply a hashing function (read further) to the value.

Look up the partition based on the hashed value.

If no partition is found, use the default partition that is created for every partitioned table.

Return the partition (and its assigned backend).
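
The four steps above can be sketched in a few lines. This is only an illustration in Python, not HSCALE's actual Lua code: the function name `lookup_partition`, the `(table, backend)` tuple and the sample data are all assumptions made up for this example.

```python
from typing import Callable, Dict, Tuple

# Hypothetical representation: a partition is (table name, backend index).
Partition = Tuple[str, int]


def lookup_partition(
    value: str,
    hash_fn: Callable[[str], str],
    dictionary: Dict[str, Partition],
    default: Partition,
) -> Partition:
    """Hash the value, look it up, fall back to the default partition."""
    hashed = hash_fn(value)          # step 1: apply the hashing function
    found = dictionary.get(hashed)   # step 2: look up by the hashed value
    if found is None:                # step 3: nothing found, use the default
        return default
    return found                     # step 4: partition and assigned backend


# Made-up sample data: table "users", partitioned by nickname prefix.
users_dictionary = {"mar": ("users_nick1", 0)}
users_default = ("users_default", 0)


def first_three(value: str) -> str:
    return value[:3]


print(lookup_partition("marvel", first_three, users_dictionary, users_default))
print(lookup_partition("zoe", first_three, users_dictionary, users_default))
```

The first call finds the 'mar' entry, the second falls through to the default partition.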

Hashing functions

To reduce the number of partitions to be created and the overall administration overhead, a hashing function may be applied to the partition value before the partition is looked up.
Currently there are 3 hashing functions available:

MOD(X): A modulus function that groups partition values by their remainder when divided by X. It works only for numbers, of course. If you have *lots* of different partition values with only a few rows each, it might be better to group them together instead of creating *lots* of partitions.
Example: Using MOD(3), the values 1, 4 and 7 will end up in the same partition.

PREFIX(length): Partition values are grouped together by their first length characters. Works on everything (every value is treated as a string).
Example: Using PREFIX(3), the values “foo”, “foobar” and “footalicious” will end up in the same partition.

NONE(): This function does … nothing! Use it if you really want a 1:1 relationship between partition values and partitions.
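
To make the grouping behaviour concrete, here is a small Python sketch of the three functions. The names `mod_hash`, `prefix_hash` and `none_hash` are invented for this illustration and are not HSCALE's API; the sketch only mirrors the behaviour described above.

```python
def mod_hash(x):
    """MOD(X): group numeric values by their remainder modulo x."""
    def apply(value):
        return int(value) % x
    return apply


def prefix_hash(length):
    """PREFIX(length): group values by their first `length` characters."""
    def apply(value):
        return str(value)[:length]  # every value is treated as a string
    return apply


def none_hash():
    """NONE(): leave the value untouched (1:1 value-to-partition mapping)."""
    def apply(value):
        return value
    return apply


mod3 = mod_hash(3)
prefix3 = prefix_hash(3)
identity = none_hash()

# 1, 4 and 7 share the remainder 1, so they share a partition.
print([mod3(v) for v in (1, 4, 7)])
# All three strings start with "foo".
print([prefix3(v) for v in ("foo", "foobar", "footalicious")])
```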

Further hashing functions are planned (and might make it to version 0.3):

DIV(X): In contrast to MOD(X), this function divides the partition value by X, so it works like a fixed-range function. While MOD(X) creates at most X partitions, DIV(X) can create an unlimited number of partitions.

DATE(pattern): Enables date-range based partitions.
Example: Using DATE("%Y-%m") will group by (year-)month.
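
Since these two functions are only planned, the following Python sketch is a guess at how they might behave, not a description of HSCALE. It assumes that DIV(X) uses integer division and that DATE(pattern) takes a strftime-style pattern; `div_hash` and `date_hash` are made-up names.

```python
import datetime


def div_hash(x):
    """DIV(X), assumed to be integer division: fixed ranges of width x."""
    def apply(value):
        return int(value) // x
    return apply


def date_hash(pattern):
    """DATE(pattern), assumed to format a date with a strftime pattern."""
    def apply(value):
        return value.strftime(pattern)
    return apply


div100 = div_hash(100)
by_month = date_hash("%Y-%m")

# With DIV(100), the values 100..199 all fall into range 1.
print(div100(100), div100(150), div100(199), div100(200))
# Two days in the same month share one partition value.
print(by_month(datetime.date(2008, 5, 3)), by_month(datetime.date(2008, 5, 28)))
```

Under these assumptions the number of DIV(X) partitions grows with the largest partition value, which is why it is unbounded, unlike MOD(X).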

Administrative commands

Because handling partitions is a delicate matter, creating and maintaining partitions should not be left to some obscure SQL statements. Therefore the dictionary partition lookup will provide administrative commands that try to prevent misconfiguration, for example by checking whether partitions overlap.

Because this is still work in progress the commands might change.

Table setup

First of all your table has to be set up:
HSCALE SETUP_TABLE('[table]', '[column]', '[default table]', [backend], '[hashing function]')

Add partitions

The example above creates a partition with the name 'nick1' for table 'users' and partition value 'mar' (users with nickname 'marvel', 'martin' etc.). The partition name maps directly to the table name of the partition, so the table for this partition will be users_nick1. Usually you would use the partition value ('mar') as the partition name, but you don’t have to. You are able to store multiple partitions in the same table. This sounds strange, but it makes sense to define a finer partitioning scheme upfront and actually use a wider scheme until you really need to split up the data. This makes it a lot easier to split things up afterwards.
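
The idea of several partitions sharing one table can be pictured with a plain mapping. This Python sketch is purely illustrative: the second partition value 'mat', the name 'nick2' and the helper `partition_table` are invented here, and only the users_nick1 naming comes from the text above.

```python
# Hypothetical dictionary for table "users": partition value -> partition name.
partitions = {
    "mar": "nick1",
    "mat": "nick1",  # made-up second value stored in the same table for now
}


def partition_table(base_table, partition_name):
    """The partition name directly determines the partition's table name."""
    return base_table + "_" + partition_name


print(partition_table("users", partitions["mar"]))
print(partition_table("users", partitions["mat"]))

# Splitting later only means pointing 'mat' at a new partition name
# (and moving the matching rows); the 'mar' entry stays as it is.
split = dict(partitions, mat="nick2")
print(partition_table("users", split["mat"]))
```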

Moving partition data

In version 0.3, partition data will not be moved to a newly created partition; you will have to do that by hand. Automatic moving will be implemented as soon as multiple backends are fully supported, because data then has to be moved between different servers, which calls for a different approach. An administrative command to move partitions will also be available then (HSCALE MOVE_PARTITION(...)).

Multiple instances of HSCALE working on the same data

HSCALE is designed to support multiple instances running in parallel, mainly so that it does not become a bottleneck or a single point of failure. Every instance of HSCALE periodically refreshes its internal partition configuration to reflect changes made by other instances. While partitions are being created or moved, they are locked inside HSCALE so all clients wait until the operation finishes. This guarantees data integrity.
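
The periodic refresh can be pictured as a cache that reloads the dictionary once it is older than some interval. The Python sketch below is an assumption about the general technique, not HSCALE's implementation: the class `PartitionCache`, its parameters and the 30-second interval are invented, and the locking during create and move operations is not modelled.

```python
import time


class PartitionCache:
    """Caches the partition dictionary and reloads it after an interval."""

    def __init__(self, load, refresh_seconds, clock=time.monotonic):
        self._load = load                    # reads the dictionary from the database
        self._refresh_seconds = refresh_seconds
        self._clock = clock                  # injectable so the sketch can be tested
        self._config = load()
        self._loaded_at = clock()

    def get(self, hashed_value, default):
        # Reload when the cached copy is older than the refresh interval,
        # so changes made by another instance become visible eventually.
        if self._clock() - self._loaded_at >= self._refresh_seconds:
            self._config = self._load()
            self._loaded_at = self._clock()
        return self._config.get(hashed_value, default)


# Made-up stand-ins for the shared database and the passing of time.
shared_store = {"mar": "nick1"}
now = [0.0]
cache = PartitionCache(lambda: dict(shared_store), 30.0, clock=lambda: now[0])

shared_store["foo"] = "nick2"            # another instance adds a partition
stale = cache.get("foo", "default")      # still served from the old copy
now[0] = 31.0                            # the refresh interval has passed
fresh = cache.get("foo", "default")      # triggers a reload
print(stale, fresh)
```

The window between the two reads is exactly the kind of gap that locking during create and move operations has to cover.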

Version 0.3 will be released within the next few weeks but definitely after MySQL Proxy 0.7 has been released so we can thoroughly test it against the newest version.

Please feel free to discuss certain features and design decisions shown above. Any feedback is welcome!
