I’m happy to announce that the last couple of years of hard work is paying off and the Gluent Offload Engine is production now! After beta testing with our early customers, we are now out of complete stealth mode and are ready talk more about what exactly are we doing :-)

If you haven’t read the previous parts of this series yet, here are the links: [ Part 1 | Part 2 ].

A Refresher

In the first part of this series I said that RAM access is the slow component of a modern in-memory database engine and for performance you’d want to reduce RAM access as much as possible. Reduced memory traffic thanks to the new columnar data formats is the most important enabler for the awesome In-Memory processing performance and SIMD is just icing on the cake.

In the second part I also showed how to measure the CPU efficiency of your (Oracle) process using a Linux perf stat command. How well your applications actually utilize your CPU execution units depends on many factors. The biggest factor is your process’es cache efficiency that depends on the CPU cache size and your application’s memory access patterns. Regardless of what the OS CPU accounting tools like top or vmstat may show you, your “100% busy” CPUs may actually spend a significant amount of their cycles internally idle, with a stalled pipeline, waiting for some event (like a memory line arrival from RAM) to happen.

Luckily there are plenty of tools for measuring what’s actually going on inside the CPUs, thanks to modern processors having CPU Performance Counters (CPC) built in to them.

A key derived metric for understanding CPU-efficiency is the IPC (instructions per cycle). Years ago people were actually talking about the inverse metric CPI (cycles per instruction) as on average it took more than one CPU cycle to complete an instruction’s execution (again, due to the abovementioned reasons like memory stalls). However, thanks to today’s superscalar processors with out-of-order execution on a modern CPU’s multiple execution units – and with large CPU caches – a well-optimized application can execute multiple instructions per a single CPU cycle, thus it’s more natural to use the IPC (instructions-per-cycle) metric. With IPC, higher is better.

It took a while (1.5 years since my last class – I’ve been busy!), but I am ready with my Advanced Oracle Troubleshooting training (version 2.5) that has plenty of updates, including some more modern DB kernel tracing & ASH stuff and of course Oracle 12c topics!

The online training will take place on 16-20 November & 14-18 December 2015 (Part 1 and Part 2).

A notable improvement of AOT v2.5: now attendees will get downloadable video recordings after the sessions for personal use! So, no crappy streaming with 14-day expiry date, you can download the video MP4 files straight to your computer or tablet and keep for your use forever!

I won’t be doing any other classes this year, but there will be some more (pleasant) surprises coming next year ;-)

In the previous article I explained that the main requirement for high-speed in-memory data scanning is column-oriented storage format for in-memory data. SIMD instruction processing is just icing on the cake. Let’s dig deeper. This is a long post, you’ve been warned.

Test Environment

I will cover full test results in the next article in this series. First, let’s look into the test setup, environment and what tools I used for peeking inside CPU hardware.

I was running the tests on a relatively old machine with 2 CPU sockets, with 6-core CPUs in each socket (2s12c24t):

Even though the /proc/cpuinfo above shows the CPU clock frequency as 2.93GHz, these CPUs have Intel Turboboost feature that allows some cores run at up to 3.33GHz frequency when not all cores are fully busy and the CPUs aren’t too hot.

I will come back to this CPU frequency turbo-boosting later when explaining some performance metrics.

I ran the experiments in Oct/Nov 2014, so used a relatively early Oracle 12.1.0.2.1 version with a bundle patch (19189240) for in-memory stuff.

The test was deliberately very simple as I was researching raw in-memory scanning and filtering speed and was not looking into join/aggregation performance. I was running the query below with different hints and parameters to change access path options:

Gluent – where I’m a cofounder & CEO – is hiring awesome developers and (big data) infrastructure specialists in US and UK!

We are still in stealth mode, so won’t be detailing publicly what exactly we are doing ;-)

However, it is evident that the modern data platforms (for example Hadoop) with their scalability, affordability-at-scaleand freedom to use many different processing engines on open data formats are turning enterprise IT upside down.

This shift has already been going on for years in large internet & e-commerce companies and small startups, but now the shockwave is arriving to all traditional enterprises too. And every single one of them must accept it, in order to stay afloat and win in the new world.