I've lived around databases all my life, and the 21st century is challenging for them: big data, throughput, complexity, virtualization, global distribution. It all comes down to scalability.
I'm the founder and CTO of ScaleBase. Solving this problem is a workaholic's heaven, so I'm having a great time!
My agenda is to stay technical, with no marketing and sales BS, and to give my summarized views and opinions on urgent topics, events, and the latest news in database scalability.

The webinar was successful: we had many attendees and great participation in questions and answers throughout the session and at the end. Only after the webinar did it occur to me that one specific graphic was missing from the deck. I realized this after answering several audience questions such as "what is the difference between partitioning and sharding?" and "why doesn't partitioning qualify as scale-out?"

If I were giving the webinar today, I would definitely include the following picture, which describes the core difference between Scale Up, Partitioning, and Scale Out:

In the (poor) graphic above, the black server box is the database server machine, the good old cylinder is the disk or storage device, and the colorful square thingy stands for the database engine. Believe it or not, it is a real, complete architecture chart of the Oracle 10gR2 SGA, shrunk to a small scale. Yes, all databases, Oracle and MySQL included, are complex beasts, and a lot goes on inside the database engine for every command.

If my DB looks like the "starting point", then either I'm really small or I'm in really bad shape by now.

Partitioning works wonders as data grows toward "big data". It optimizes data placement across separate files or disks, and it keeps every partition "thin" and less fragmented than you would expect from a gigantic, busy monolithic table. Still, although the data is split across files, we're "stuck" with one busy monolithic database engine that relies on the "compute", or computing power, of a single box.

While we distributed the data, we didn't distribute the "compute".

When there is a heavy join operation, there is one busy monolithic database engine to collect data from all partitions and process this join.

When there are 10,000 concurrent transactions to handle right here and now, there is one busy monolithic database engine to do all database-engine activities such as buffer management, locking, thread locks/semaphores, and recovery tasks. Buffers, locking queues, transaction queues... are still shared by all partitions.
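To make the point concrete, here is a toy sketch of a partitioned table behind a single engine. It is not how any real database is implemented: `PartitionedEngine`, its single `engine_lock`, and the modulo placement of integer keys are all assumptions made for illustration. The lock stands in for the shared buffers, lock queues, and transaction queues mentioned above.

```python
import threading


class PartitionedEngine:
    """Hypothetical model: data is split into partitions, engine state is shared."""

    def __init__(self, num_partitions):
        # Each dict stands in for one partition (a separate file or disk).
        self.partitions = [dict() for _ in range(num_partitions)]
        # One lock stands in for the engine-wide buffers and queues.
        self.engine_lock = threading.Lock()
        # Counts every operation the single engine had to process.
        self.ops_handled = 0

    def _partition_for(self, key):
        # Assumed placement rule: integer key modulo partition count.
        return key % len(self.partitions)

    def put(self, key, value):
        with self.engine_lock:  # every write queues on the same engine
            self.ops_handled += 1
            self.partitions[self._partition_for(key)][key] = value

    def get(self, key):
        with self.engine_lock:  # every read queues on the same engine
            self.ops_handled += 1
            return self.partitions[self._partition_for(key)].get(key)


engine = PartitionedEngine(num_partitions=4)
for k in range(100):
    engine.put(k, f"row-{k}")

# The rows are spread evenly over the partitions...
partition_sizes = [len(p) for p in engine.partitions]
# ...but all 100 operations went through the one engine.
ops_through_single_engine = engine.ops_handled
```

The data ends up nicely spread out, yet `ops_handled` grows with every request no matter which partition it touches.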

This is where scale-out differs from partitioning. It distributes and parallelizes the data as well as the all-important compute, brings the compute closer to the data, and lets several database engines process different sets of data, each handling a different share of the overall session concurrency.

You can think of it as one step beyond partitioning, and it comes with great results. It's not a simple step, though: an abstraction layer is required to present the database grid to the application as one database, just like the one it is used to.
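As a rough illustration of what such a layer does, here is a minimal sketch. It does not describe ScaleBase's implementation. The `ScaleOutLayer` class, the `orders` table, and the modulo routing rule are hypothetical, and each in-memory SQLite connection merely stands in for a separate database server with its own engine.

```python
import sqlite3


class ScaleOutLayer:
    """Hypothetical layer that presents several independent engines as one database."""

    def __init__(self, num_shards):
        # Each connection is a separate engine with its own buffers and locks.
        self.shards = [sqlite3.connect(":memory:") for _ in range(num_shards)]
        for db in self.shards:
            db.execute(
                "CREATE TABLE orders (id INTEGER PRIMARY KEY, amount INTEGER)"
            )

    def _shard_for(self, order_id):
        # Assumed routing rule: order id modulo shard count.
        return self.shards[order_id % len(self.shards)]

    def insert_order(self, order_id, amount):
        # A single-row write touches exactly one engine.
        self._shard_for(order_id).execute(
            "INSERT INTO orders (id, amount) VALUES (?, ?)", (order_id, amount)
        )

    def get_order(self, order_id):
        # A single-row read is routed to the engine that owns the row.
        return self._shard_for(order_id).execute(
            "SELECT id, amount FROM orders WHERE id = ?", (order_id,)
        ).fetchone()

    def total_amount(self):
        # Scatter-gather: each engine aggregates its own rows,
        # then the layer merges the partial results.
        partials = [
            db.execute("SELECT COALESCE(SUM(amount), 0) FROM orders").fetchone()[0]
            for db in self.shards
        ]
        return sum(partials)


grid = ScaleOutLayer(num_shards=3)
for order_id in range(1, 10):
    grid.insert_order(order_id, amount=order_id * 10)

rows_per_shard = [
    db.execute("SELECT COUNT(*) FROM orders").fetchone()[0] for db in grid.shards
]
```

The application only ever calls `insert_order`, `get_order`, and `total_amount`; it never needs to know how many engines sit behind the layer. This sketch queries the shards one after another for simplicity, whereas a real layer would presumably send the partial queries to the engines in parallel.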

In further posts I'll go deeper into this "Scale Out Abstraction Layer", and also into ScaleBase, which is a provider of such a layer.

About Me

Technology leader, data and databases expert, hands-on system
architect, senior consultant and project manager, with extensive experience in
understanding various aspects of organizations, distributed applications, and
the integration of various technologies and hardware and software solutions.

I'm the founder and CTO of ScaleBase, a venture-backed startup company building a next-generation distributed database engine based on standard MySQL databases, to bring true cloud elasticity and scale-out capabilities to standard relational databases.

An Oracle DBA since 1997, I have administered database versions from Oracle7 through Oracle11g. My experience includes senior and lead DBA roles in large organizations in the government and hi-tech industries, administering complex databases that serve critical applications and data warehouses with large volumes and 24x7 availability, and integrating almost every feature and product in the Oracle database offering.

An enterprise application software architect since 2001, I specialize in the Java/JEE technology stack, with a specific focus on the Oracle middleware offering, from Oracle9iAS and OC4J to, nowadays, the WebLogic platform.