Wednesday, May 04, 2011

Architecture Overview of an Open Source Low TCO cloud storage system

Here I present a possible architecture for a low-TCO, open source cloud storage system, aimed at those of you building your own cloud (or hosting service).

I am not claiming that it will suit everyone's needs, but I hope it will at least give you some valuable pointers and alternatives.
You may also want to adapt it to your specific needs, since you might not require every single feature of the system.

Summary:

This setup allows you to build your own redundant storage network from commodity PC hardware. An easier, but far more expensive, way to achieve this would be to buy a SAN and some Fibre Channel attached hosts. The setup provides features similar to Amazon EBS (Elastic Block Store), as well as a high-reliability (HR) cluster file system for your cloud storage.

Features:

High availability (DRBD, cluster file system)

High reliability (DRBD)

Flexible, dynamic storage resource management

File system export or block device export ("Amazon EBS style")

Dynamic failover configuration (Pacemaker, Corosync):

Active/Passive; N+1; N-to-N; split site

Overview:

The back end is a set of paired storage hosts; each pair uses DRBD to keep its data replicated between the two hosts.
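To make the replication semantics concrete, here is a toy Python model of one storage pair (a minimal sketch, not DRBD itself). It assumes synchronous replication in the spirit of DRBD's protocol C: a write counts as replicated only once both hosts hold it, and if the peer is unreachable the primary keeps writing in degraded mode. The class and method names, and the host names `store-a`/`store-b` used below, are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One storage host holding a full replica of the device (toy model)."""
    name: str
    up: bool = True
    blocks: dict = field(default_factory=dict)  # block number -> data


@dataclass
class ReplicatedPair:
    """Toy model of a DRBD-style pair: one primary, one secondary.

    Writes are mirrored synchronously (in the spirit of DRBD protocol C).
    If the secondary is unreachable, the primary keeps writing in degraded
    mode and write() reports that only one copy exists.
    """
    primary: Node
    secondary: Node

    def write(self, block: int, data: bytes) -> bool:
        """Return True if the block now exists on both hosts."""
        if not self.primary.up:
            raise RuntimeError(f"{self.primary.name} is down; fail over first")
        self.primary.blocks[block] = data
        if self.secondary.up:
            self.secondary.blocks[block] = data
            return True
        return False  # degraded: only one copy exists

    def read(self, block: int) -> bytes:
        return self.primary.blocks[block]

    def fail_over(self) -> None:
        """Promote the secondary (the step the cluster manager automates)."""
        self.primary, self.secondary = self.secondary, self.primary
```

For example, with `pair = ReplicatedPair(Node("store-a"), Node("store-b"))`, after `pair.write(0, b"hello")` you can mark `store-a` as down, call `pair.fail_over()`, and `pair.read(0)` still returns `b"hello"` from `store-b`. In the real system, that promotion is what Pacemaker/Corosync automate.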

On top of DRBD sits LVM (or CLVM). LVM lets us resize logical volumes on the fly and take snapshots (including hosting snapshot+diffs), and a logical volume can even span, and be resized across, multiple underlying DRBD devices.
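The idea of a logical volume spanning several DRBD devices can be sketched as follows. This is a simplified model under stated assumptions: linear allocation only (real LVM supports several allocation policies), LVM2's default 4 MiB extent size, and hypothetical device and volume names.

```python
EXTENT_MIB = 4  # LVM2's default physical extent size


class VolumeGroup:
    """Simplified LVM volume group whose physical volumes are DRBD devices.

    Extents are allocated linearly: fill the first PV, then spill over to
    the next one. Real LVM has several allocation policies; this is a sketch.
    """

    def __init__(self, pvs: dict[str, int]):
        # pvs maps a device path (hypothetical, e.g. /dev/drbd0) to its size in MiB
        self.free = {dev: size // EXTENT_MIB for dev, size in pvs.items()}
        self.lvs: dict[str, list[tuple[str, int]]] = {}  # lv -> [(pv, extents)]

    def extend(self, lv: str, size_mib: int) -> None:
        """Create or grow a logical volume by size_mib, rounded up to whole extents."""
        needed = -(-size_mib // EXTENT_MIB)  # ceiling division
        if needed > sum(self.free.values()):
            raise ValueError("not enough free extents in the volume group")
        segments = self.lvs.setdefault(lv, [])
        for dev in self.free:
            take = min(needed, self.free[dev])
            if take:
                self.free[dev] -= take
                segments.append((dev, take))
                needed -= take
            if needed == 0:
                break

    def size_mib(self, lv: str) -> int:
        return sum(n for _, n in self.lvs[lv]) * EXTENT_MIB
```

With two 1 GiB DRBD devices, `vg = VolumeGroup({"/dev/drbd0": 1024, "/dev/drbd1": 1024})`, an 800 MiB volume fits on `/dev/drbd0`; extending it by another 600 MiB uses the remaining 56 extents of `/dev/drbd0` and 94 extents of `/dev/drbd1`, giving a 1400 MiB volume spread over both devices. On a real system this corresponds to `lvextend` followed by growing the file system.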

Note: LVM can be used both as a front end (on top of DRBD) and as a back end (underneath DRBD).

The LVM block devices are exported to the cluster nodes using GNBD. A node that performs a GNBD import sees the device as a local block device, ready to be mounted into its file hierarchy.
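A toy model of the export/import relationship, assuming each node imports every export of a server and that imported devices appear under `/dev/gnbd/<export name>` (treat that path layout as an assumption and check your GNBD version). It does not speak the GNBD protocol; it only illustrates how server-side devices map to client-side block devices. The class names are hypothetical.

```python
class GnbdServer:
    """Toy stand-in for a GNBD server that exports named block devices."""

    def __init__(self, host: str):
        self.host = host
        self.exports: dict[str, str] = {}  # export name -> local device path

    def export(self, name: str, device: str) -> None:
        if name in self.exports:
            raise ValueError(f"export {name!r} already exists")
        self.exports[name] = device


class GnbdClient:
    """Toy stand-in for a cluster node that imports every export of a server."""

    def __init__(self):
        self.imported: dict[str, tuple[str, str]] = {}  # local path -> (server, export)

    def import_from(self, server: GnbdServer) -> list[str]:
        paths = []
        for name in server.exports:
            path = f"/dev/gnbd/{name}"  # assumed naming of imported devices
            self.imported[path] = (server.host, name)
            paths.append(path)
        return paths
```

For example, exporting `/dev/vg_cloud/vm01-disk` as `vm01-disk` on a server `store-a` makes `GnbdClient().import_from(srv)` return `["/dev/gnbd/vm01-disk"]`; the volume group name `vg_cloud` is hypothetical.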

OCFS2, as a cluster file system, allows all cluster nodes to access the file system concurrently.

Another possibility is to export a GNBD device for each virtual machine (although you still need a distributed or network file system for configuration files, etc.).
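Per-VM exports mostly come down to bookkeeping: one logical volume and one export per virtual machine, plus a capacity check. A minimal sketch, where the `<vm>-disk` naming scheme, the function name and the volume group name are hypothetical; only the `/dev/<vg>/<lv>` path follows LVM's usual convention.

```python
def plan_vm_devices(vms: dict[str, int], vg_name: str, free_gib: int) -> dict[str, dict[str, str]]:
    """Plan one logical volume and one GNBD export per virtual machine.

    vms maps a VM name to its disk size in GiB. Raises ValueError if the
    volume group does not have enough free space for all of them.
    """
    total = sum(vms.values())
    if total > free_gib:
        raise ValueError(f"need {total} GiB but only {free_gib} GiB free in {vg_name}")
    plan = {}
    for vm, size in sorted(vms.items()):
        lv = f"{vm}-disk"  # hypothetical naming scheme
        plan[vm] = {
            "lv_path": f"/dev/{vg_name}/{lv}",  # LVM's /dev/<vg>/<lv> path
            "export": lv,                        # one GNBD export per VM
            "size": f"{size}G",
        }
    return plan
```

For example, `plan_vm_devices({"web01": 20, "db01": 100}, "vg_cloud", 500)["db01"]` gives `{"lv_path": "/dev/vg_cloud/db01-disk", "export": "db01-disk", "size": "100G"}`. The VM configuration files would still live on the shared file system.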

Pacemaker and Corosync manage the resources to provide HA/HR.
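Pacemaker decides placement from its own scores and constraints configured in the cluster information base (CIB); the function below is only a toy illustration of the decision it automates, not Pacemaker's algorithm. It assumes a simple "least-loaded survivor" policy, which makes an idle spare the first choice in an N+1 layout and spreads resources out in an N-to-N layout. The function, resource and node names are hypothetical.

```python
def reassign_resources(placement: dict[str, str], failed: str, nodes: list[str]) -> dict[str, str]:
    """Move every resource of a failed node to the least-loaded survivor.

    placement maps resource name -> node name; every node it mentions must be
    in nodes. Ties are broken by node name so the result is deterministic.
    Toy policy only: Pacemaker uses its own scores and constraints.
    """
    survivors = [n for n in nodes if n != failed]
    if not survivors:
        raise RuntimeError("no surviving node to fail over to")
    load = {n: 0 for n in survivors}
    for node in placement.values():
        if node != failed:
            load[node] += 1
    result = {}
    for resource, node in sorted(placement.items()):
        if node == failed:
            node = min(survivors, key=lambda n: (load[n], n))
            load[node] += 1
        result[resource] = node
    return result
```

With `placement = {"vol1": "node1", "vol2": "node2"}` and nodes `["node1", "node2", "spare"]`, a failure of `node1` moves `vol1` to `spare` and leaves `vol2` on `node2`.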

For management, control and monitoring, a custom-made solution might be needed:

GRAM could be used to expose resource management.

Any monitoring framework should do the trick.
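As an example of the kind of check a monitoring framework could run, here is a minimal parser for DRBD's `/proc/drbd` that flags devices that are not connected or not fully up to date. It assumes the DRBD 8.x per-device line format shown in the comment; DRBD 9 no longer reports per-resource state there (`drbdadm status` replaces it). The function names are hypothetical.

```python
import re

# Matches the per-device status line printed by DRBD 8.x in /proc/drbd, e.g.
#  0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
STATUS_RE = re.compile(
    r"^\s*(?P<minor>\d+):\s+cs:(?P<cs>\S+)\s+ro:(?P<local_role>[^/\s]+)/(?P<peer_role>\S+)"
    r"\s+ds:(?P<local_disk>[^/\s]+)/(?P<peer_disk>\S+)"
)


def parse_proc_drbd(text: str) -> dict[int, dict[str, str]]:
    """Return connection state, roles and disk states keyed by device minor."""
    devices = {}
    for line in text.splitlines():
        m = STATUS_RE.match(line)
        if m:
            info = m.groupdict()
            devices[int(info.pop("minor"))] = info
    return devices


def unhealthy(devices: dict[int, dict[str, str]]) -> list[int]:
    """Minors that are not connected or whose disks are not both UpToDate."""
    return sorted(
        minor for minor, d in devices.items()
        if d["cs"] != "Connected" or d["local_disk"] != "UpToDate" or d["peer_disk"] != "UpToDate"
    )
```

On a storage host, feeding it the contents of `/proc/drbd` and alerting whenever `unhealthy(...)` is non-empty gives a first, very basic health check.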

"Simple" Schema:

Pro:

Most of the individual components are proven solutions used in large-scale production environments

Open source / readily available tools

OCFS2 provides back-end storage for images, while GNBD can provide on-demand block storage for cloud instances

Easy accounting: as with any other file system (though custom-built tools might be needed, depending on your requirements)

COTS components
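Because the images live on an ordinary (cluster) file system, basic accounting can be done with standard file system calls. A minimal sketch, assuming one directory per tenant directly under the OCFS2 mount point (a hypothetical layout); it sums apparent file sizes per tenant and skips symlinks.

```python
import os


def usage_by_tenant(root: str) -> dict[str, int]:
    """Sum file sizes in bytes under each top-level directory of root.

    Assumes a hypothetical layout with one directory per tenant directly
    under the shared mount point; files directly in root are ignored.
    """
    usage = {}
    for entry in os.scandir(root):
        if not entry.is_dir(follow_symlinks=False):
            continue
        total = 0
        for dirpath, _dirnames, filenames in os.walk(entry.path):
            for fname in filenames:
                path = os.path.join(dirpath, fname)
                if not os.path.islink(path):
                    total += os.path.getsize(path)
        usage[entry.name] = total
    return usage
```

Apparent size overstates sparse VM images; summing `os.stat(path).st_blocks * 512` instead would count allocated space on Unix systems.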

Con:

DRBD provides HA/HR through replication (think RAID 1), which means you get HA/HR and speed at the expense of half of your storage (slightly more if you also use RAID for the underlying disks)

Complexity, and a risk of cascading failures through a domino effect (similar to what recently happened to Amazon's cloud with EBS)

Performance will depend heavily on the physical resources available, as well as on the topology and usage:

Extracting the best performance will require a lot of tweaking and customization (e.g. dual-head DRBD, load balancing, etc.), and every setup will be different

Dedicated monitoring tools will be required in order to manage and automate the performance tuning
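To put numbers on the storage cost of replication listed among the cons: DRBD mirrors each device to its peer, so a pair only offers one host's worth of space, and local RAID under DRBD costs a little more. A minimal sketch for one pair of identical hosts, ignoring DRBD's small metadata overhead and file system overhead; the function name and parameters are hypothetical.

```python
def usable_capacity_tb(disks_per_host: int, disk_tb: float, raid: str = "none") -> tuple[float, float]:
    """Return (raw TB across both hosts of a DRBD pair, usable TB of the pair).

    DRBD mirrors the whole device to the peer (like RAID 1), so the pair
    only offers one host's worth of space; local RAID reduces it further.
    """
    if raid == "none":
        per_host = disks_per_host * disk_tb
    elif raid == "raid5":
        if disks_per_host < 3:
            raise ValueError("RAID 5 needs at least 3 disks")
        per_host = (disks_per_host - 1) * disk_tb  # one disk's worth of parity
    elif raid == "raid6":
        if disks_per_host < 4:
            raise ValueError("RAID 6 needs at least 4 disks")
        per_host = (disks_per_host - 2) * disk_tb  # two disks' worth of parity
    elif raid == "raid10":
        if disks_per_host < 4 or disks_per_host % 2:
            raise ValueError("RAID 10 needs an even number of disks, at least 4")
        per_host = disks_per_host * disk_tb / 2    # local mirroring halves it again
    else:
        raise ValueError(f"unsupported RAID level: {raid}")
    raw = 2 * disks_per_host * disk_tb  # both hosts of the pair
    return raw, per_host
```

Two hosts with four 2 TB disks each give 16 TB raw: `usable_capacity_tb(4, 2.0)` returns `(16.0, 8.0)` (50% usable), while `usable_capacity_tb(4, 2.0, "raid5")` returns `(16.0, 6.0)` (37.5% usable).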

Some links provide a step in the right direction. However, none of them provides the full range of features I presented in the overview/schema, but it shouldn't be too hard for you to figure out how to get there (if I have time, I might post the actual how-to).

About Me

Provides advisory services via Blopeur Ltd. Used to work on and do research for SAP, more specifically the HANA Enterprise Cloud. Also used to lead the open source project Hecatonchire (now retired by SAP), which aims to bring together the flexibility of virtualization, cloud and high performance computing in order to break free of current cloud limitations. Hecatonchire delivers a framework of tools aiming to provide memory, I/O and CPU resource aggregation capabilities to native x86/Linux applications.