controllernomics - is that even a real word?

and risk reward ratios with big memory "flash as RAM"

A recent conversation I had with Kevin Wagner at Diablo Technologies began with the benchmarks they have recently been sharing for their Memory1 (128GB flash as RAM DIMMs) running large scale analytics software. But it finished somewhere unexpected.

I'll start
with the benchmarks.

Kevin said that some of the results (for Spark SQL
performance) came from a real financial customer who had run these tests
themselves using data from their own production environments.

They were able to achieve a 4 to 1 server reduction using Memory1 enhanced servers (3 Memory1 nodes versus 12 DRAM-only nodes with 384GB each) and still get 24% faster performance.

Diablo also has other customer-based benchmarks which show useful acceleration when using identical numbers of server nodes and comparing the results to alternative SSD implementations (NVMe and arrays of SATA SSDs).

risk reward ratios

As in the early phases of the flash array market 10 years ago - you have to filter yourself in or out of following up interest in this depending on whether you think you have the right type of problem.

The risk reward proposition of a DIMM based flash as RAM system like Memory1 is that users with big memory apps will be able to choose whether they prefer the idea of getting faster results (using a similar number of servers) or using fewer servers to save costs (or some combination).

Small jobs which would have fitted comfortably into DRAM anyway could run up to 30% slower in a Memory1 server cluster - although big jobs do run faster.
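
To make that tradeoff concrete - here's a back-of-envelope sketch in Python. The figures are illustrative assumptions based on the rough numbers above, not vendor-verified data.

```python
# Back-of-envelope sketch of the risk reward tradeoff described above.
# All figures are illustrative assumptions based on the article's rough
# numbers - not vendor-verified data.

def throughput_per_server_dollar(nodes, relative_speed, node_cost=1.0):
    """Cluster throughput normalized by total server spend."""
    return relative_speed / (nodes * node_cost)

# Big memory job: 3 Memory1 nodes ran ~24% faster than 12 DRAM-only nodes.
dram_only = throughput_per_server_dollar(nodes=12, relative_speed=1.00)
memory1 = throughput_per_server_dollar(nodes=3, relative_speed=1.24)
print(f"big job: {memory1 / dram_only:.1f}x throughput per server dollar")   # ~5.0x

# Small job which fits in DRAM anyway: same node count, up to ~30% slower.
dram_only = throughput_per_server_dollar(nodes=3, relative_speed=1.00)
memory1 = throughput_per_server_dollar(nodes=3, relative_speed=0.70)
print(f"small job: {memory1 / dram_only:.1f}x throughput per server dollar")  # ~0.7x
```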

Users
who are evaluating this new tiered memory approach can buy preconfigured
supported servers from a variety of sources and Diablo says that no changes are
required to the OS or applications.

Compared to many of the alternative emerging new semiconductor memory approaches - flash (as RAM) seems like it will be the mainstream safe choice for the next 2 or 3 years - because it will take that long for the newcomers to prove their reliability. And even after that - we have the issue of software support for the tiering and caching.

competition?

There
will of course be alternative competing implementations of "flash as
RAM in the DIMM form factor". (Which is not the only form factor for this
concept - but I'm trying to keep this article as short as I can.)

Companies like Xitore
and Netlist have been
saying they want to get into the "flash as RAM in the DIMM form factor"
market for a while now.

I haven't seen details from these expected competitors but my guess is that - unlike Diablo's product - which leverages the DRAM already present in other DIMM sockets on the same motherboard - some of the later contestants in this market will take the approach of placing everything needed to provide transparent emulation and caching into a single DIMM.

That alternative approach might work better for smaller scale embedded systems which don't have a lot of DIMMs - but it creates difficult design constraints - because the "all in a single DIMM" approach means there will be less flexibility in the RAM to flash cache ratio. The real RAM will be fixed into the design. (Unlike Diablo's solution - where the ratio of flash to DRAM DIMMs is fluid.)

That lack of flexibility is why I predicted that the hybrid storage drive market would never succeed in the enterprise. And so far no one has been silly enough to admit to stuffing JBODs with 2.5" hybrid flash-HDDs - instead the hybrid storage appliance market has picked and chosen components from a wide range of best of breed flash modules.

But I think the "all in a single DIMM" approach to implementing flash as a RAM tier will succeed as a viable market too (in addition to the Memory1 approach). Personally I think the all in one DIMM solutions will work better in small capacity memory systems but be less upwardly scalable for large capacity servers.

I expect that the "flash as RAM in a DIMM form factor" market will fragment into:-

applications which only need a single such DIMM in each server, and

applications which tend to use the maximum number of DIMM slots.

the interesting thing? - dataflow
controllernomics

Let's pause for some perspective...

What's
data? - Now hold on - that's too philosophical. Encode data a different way...
to make it work better - now you're talking engineering. But that's a
discussion for another time.

Right here we don't care what the data
means.

It just comes and goes.

And it's surprising how far
or how little it may have traveled.

From the cloud? Another storage
device? Maybe it was computed just now from an earlier matching of data.
Sometimes the data arrives in a rush only to sadly discover that it's not
needed after all. There's a lot of data shuffling happening around the world.
Most of it isn't even for you.

It's when that data (or lack of it) is
the next thing which the software is going to look at - that the economics of
having data in the right or wrong place suddenly becomes very serious. Because
if we have to wait too long to get the data then we may need a faster processor
(or more processors) to get the next thing done.
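
Here's a minimal sketch of that economics in Python - the latency figures are rough orders of magnitude I've assumed for illustration, not measurements.

```python
# Illustrative sketch: if the CPU must wait for each data item to arrive
# before doing the next piece of work, fetch latency dominates - and you
# need proportionally more processors for the same throughput.
# (Real systems overlap fetches with caching and prefetch - which is the
# whole point of the tiering software discussed here.)

work_ns = 500  # useful CPU work per data item (illustrative assumption)

fetch_ns = {
    "already in local DRAM": 100,
    "local NVMe SSD": 20_000,
    "another box over the fabric": 50_000,
}

for place, latency in fetch_ns.items():
    utilization = work_ns / (work_ns + latency)
    print(f"{place:>28}: CPU usefully busy {utilization:6.1%} "
          f"-> ~{1 / utilization:.0f}x processors for the same throughput")
```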

You may like to think
that data lives in cables, or in storage media or flying around on
electromagnetic waves. But from a memory systems perspective the time when
data really comes alive is when it's in our memory locality.

We
care that data comes when we need it.

And if we haven't got it in our
live place (the memory) then we really care about and want to know where it
lives. (And not just addresses in memory spaces - but locations in between the
memory spaces - in transit.)

Even better if we can tell the data
where to live. And if its comings and goings heed our calls.

(Sadly
other controllers too have a say in this matter. And even when they think
they're trying to be helpful their understanding is based on past customs of
politeness.)

back to my conversation with Diablo

Our
conversation took an interesting diversion when Kevin Wagner said something
about the techniques Diablo uses in the management of its data caching.

We
had discussed its DMX software in depth before and I wrote
something
about it last summer.

The new point which I latched onto is that
Diablo has used machine learning to not only get a better understanding of the
applications it commonly works with - but also to reverse engineer and
understand the behavior of some of the external controllers which it encounters
- in particular memory controllers.

That enables DMX to sometimes
predict the best way it should request and deliver new data.
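
Diablo hasn't published the internals of how DMX does this - so the sketch below is purely hypothetical and is not their algorithm. It just illustrates the general shape of the idea: observe which access tends to follow which, then prefetch the most likely successor.

```python
# Hypothetical sketch of learned access prediction - NOT Diablo's DMX
# algorithm (which is proprietary). It illustrates the general idea:
# learn which block tends to follow which, then prefetch the most
# likely successor before the host asks for it.

from collections import Counter, defaultdict

class MarkovPrefetcher:
    def __init__(self):
        # block -> frequency counts of the blocks seen immediately after it
        self.successors = defaultdict(Counter)
        self.last_block = None

    def observe(self, block):
        """Record an access and update the transition statistics."""
        if self.last_block is not None:
            self.successors[self.last_block][block] += 1
        self.last_block = block

    def prefetch_candidate(self):
        """Most frequently seen successor of the last access (or None)."""
        if self.last_block is None or not self.successors[self.last_block]:
            return None
        return self.successors[self.last_block].most_common(1)[0][0]

# Toy usage: a repeating access pattern soon becomes predictable.
p = MarkovPrefetcher()
for block in [1, 2, 3, 1, 2, 3, 1, 2]:
    p.observe(block)
print(p.prefetch_candidate())  # -> 3: block 3 is worth fetching early
```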

The
behavior of controllers is a very important factor in the modern digital
economy.

Analyzing how to get optimal performance from tiered memory, tiered storage etc will be at the center of future focus for much of our industry - especially in emerging fields like in-situ (SSD / memory) processing, fast elements and software.

Although latencies for raw data media, communications and interfaces have been well understood and managed in their own ways for some time - the science of how to manage large populations of different types of controllers in different localities is fragmented, with differing purposes.

Every controller
company has its own IP which does the best it can with the things it connects
to and can control.

What is becoming more important - when you are
in the memory zone - right in close with the RAM and processor - is getting a
better understanding of the connected controllers in your space. Because
application performance in the data world is limited by the complex
interactions of controller-controller speak (from the cloud right down into
each processor DRAM cache request) to a much greater extent than ever before.

When storage was slower and memories were smaller and the software was older - all the controller designs looked good in comparison to the other devices surrounding them. Now - with faster storage, bigger memories and modern apps software - controllernomics has become the limiting factor.

So it's not
how fast the intrinsic memory cells or blocks work... you never get that
physical - because media controllers sit between you and noisy physics.
(Remember the "memory modem" from
DensBits - that
encapsulated the problem brilliantly).

And if you are that media controller - speed (from the software's point of view) isn't just how well you and the host interface get along together. And it's not just how fast your application's CPU works either - because other CPUs and other tasks are competing in the same data highways.

Datasystems controllernomics
is like figuring out traffic patterns - some of which you can anticipate
(the effects of predicted snow, or the rush hour) but most of which you just
have to react to as best you can (a big truck took the wrong turn). And mixing
up the two things at the same time. And BTW - each time you call it wrong - you
contribute to the next controllernomics snafu.

So you might ask...
hey why doesn't some software manage all this? And what about the role of
operating systems?

Let's look at the OS first. If you've read any histories of computing you'll know about the dotcom era - the last grand ball, when server CPUs, DRAM and hard drives all knew their place and were equally respected, because they had all grown a little bit faster and fatter together. Chronologically that's up to about 1999-2000 - if you prefer a date. Well up to that time - the OSes took many of their responsibilities seriously.

After that we got into the causes of the
great war (I mean the
modern era of SSDs).
I already dealt with most of the decline, fall and abandonment of the OS (in a
useful memory systems context) in my 2012 article -
where are we now
with SSD software? (And how did we get into this mess?).

I won't repeat that analysis here. And to be fair to the OS companies - their traditional systems partners didn't know what was happening either. But in any case the OS companies had other distractions - like trying to be the next search engine destination, the next phone platform, or trying to hang on while pesky open source OS startups were giving enterprise OSes away free to whoever could download them quickly enough.

Anyway that's how the
critical software for SSDs
got to be written by SSD companies themselves - because for a long time - no-one
else was going to do it.

This brings us to the present day. And the SSD
market has grown large enough to merit its own conferences, standards etc -
which is how we got new form factors like
M.2 and new software like
NVMe. So the OS companies and the hypervisor companies are more than happy
enough to gatecrash the SSD party. But...

And this is a big but...
They have no real incentive to improve performance to the next level. And as
their business models depend on remaining as hardware agnostic as possible -
they have every reason to avoid tying themselves too closely to any quick
changing deep piece of single sourced semiconductor trickery. And - even if that
wasn't so - the enterprise OS companies have business models which depend on
supporting hardware platforms which are already shipping in high volume - and
not on creating new platforms.

Give them a problem like tiered memory
- which can be solved with a purely software solution and yeah they'll support
it eventually (or buy little software companies who can show them how to do it).

But
give them a problem where the little pieces involve nanosecond hardware
support in semiconductors and where the analysis comes from learning what they
themselves have been doing wrong for years - and you can see why the OS
companies are not where the best solutions are going to come from.

Diablo
got into big datasystems controllernomics (that's my term for it - not theirs)
because they spent a lot of time analyzing problems from a particular angle (in
the memory close to the processors). And they discovered that even after you've
understood the stacks and the apps and the architecture there's still another
factor of modeling and predicting which it's worth getting to know - but only
if you can do something about it.

And once you've done that - and are
comfortably working in the memory and storage and controller-controller
alternate universe - then just as Google found with search - you're in a better
vantage point to learn more and stay ahead. And if you do - and occupy enough
server boxes - then you might become the controller behavior which
others in the controllernomics universe have to reverse analyze and understand.

And
although this started out as a "flash as RAM" problem - the solution
methodology isn't tied to flash.

As predicted 8 years ago - the
widespread adoption of SSDs signed the death warrant for hardware
RAID controllers.

Sleight
of hand tricks which seemed impressive enough to make hard drive arrays (RAID) seem fast in the
1980s - when viewed in slow motion from an impatient SSD perspective - were
just too inelegant and painfully slow to be of much use in true
new dynasty
SSD designs.

The confidence of "SSDs everywhere"
means that the data processing market is marching swiftly on - without much
pause for reflection - towards memory centric technologies. And many old
ideas which seemed to make sense in 1990s architecture are failing new tests
of questioning sanity.

"We are at a junction
point where we have to evolve the architecture of the last 20-30 years. We can't
design for a workload so huge and diverse. It's not clear what part of it runs
on any one machine. How do you know what to optimize? Past benchmarks are
completely irrelevant."

The memory intensive tests were run on the same cluster of five servers (Inspur NF5180M4: two Intel Xeon E5-2683 v3 processors per server, 28 cores per server, 256GB DRAM, 1TB NVMe drive).

The servers were first configured
to use only the installed DRAM to process multiple datasets. Next, the cluster
was set up to run the tests on the same datasets with 2TB of Memory1 per server.

The k-core algorithm (which is typically used to analyze large amounts of data to detect cross-connectivity patterns and relationships) was run in an Apache Spark environment to analyze three graph datasets of varying sizes - up to a 516GB set of 300 million vertices with 30 billion edges.
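
For readers who haven't met k-core before - here's a minimal single-machine sketch in Python of what the algorithm computes. (The benchmark of course ran this kind of analysis at scale in Spark - the toy below just shows the iterative peeling idea.)

```python
# Minimal sketch of k-core by iterative peeling: repeatedly remove every
# vertex with fewer than k neighbors until none remain - what survives is
# the k-core. The benchmark ran this kind of analysis in Apache Spark on
# graphs of up to 30 billion edges; this toy version is single-machine only.

def k_core(adjacency, k):
    adj = {v: set(nbrs) for v, nbrs in adjacency.items()}
    changed = True
    while changed:
        changed = False
        for v in [v for v, nbrs in adj.items() if len(nbrs) < k]:
            for nbr in adj[v]:       # detach v from its neighbors
                adj[nbr].discard(v)
            del adj[v]
            changed = True
    return adj

# Toy graph: a triangle (1, 2, 3) with a pendant vertex 4 hanging off it.
graph = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
print(k_core(graph, k=2))  # only the triangle survives the 2-core
```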

Completion
times for the smallest sets were comparable. However, the medium-sized sets
using Memory1 completed twice as fast as the traditional DRAM configuration (156
minutes versus 306 minutes). On the large sets, the Memory1 servers completed
the job in 290 minutes, while the DRAM servers were unable to complete due to
lack of memory space.

Editor's comments:- As has been noted in previously published research by others - being able to fit more RAM-emulating flash memory into a single server box can (in big data computing) give similar or better results than implementing the server set with more processors and more DRAM in more boxes.

This is due to the traffic controller and fabric
latencies between server boxes which can negate most of the intrinsic
benefits of the faster raw memory chips - if they are physically located in
another box.

The key takeaway message from this benchmark is that a single Memory1 enhanced server can perform the same workload as 2 to 3 conventional (non NVDIMM enhanced) servers when the size of the working data set is the limiting factor.

More useful however (as you will always find an ideal benchmark which is a good fit to the hardware) is that the Memory1 system places much lower (3x lower) caching demands on the next level up in the storage system (in this case the attached NVMe SSDs). This provides more scalability headroom before the SSDs themselves become the next critical bottleneck.

In their
datasheet
about Memory1 enhanced servers Inspur give another example of the advantages of
this approach - quoting a 3 to 1 reduction in server footprint and faster
job completion for a 500GB SORT.
