FMS 2015: Novachips HLNAND Pushes SSDs Beyond 16TB Per SSD Controller

It turns out Samsung wasn’t the only company to have 16TB SSDs at Flash Memory Summit after all:

Now that I’ve got your attention, Novachips is an SSD company that does not make its own flash, but I would argue that it makes other people’s flash better. It sources flash memory wafers and dies from other companies, but packages them in a unique way that enables very large numbers of flash dies per controller. This is handy for situations where very large capacities per controller are needed (either physically or logically).

Normally there is a limit to the number of dies that can communicate on a common bus (similar limits apply to DRAM, which is why some motherboards are picky with large numbers of DIMMs installed). Novachips gets around this with an innovative flash packaging method:

The 16-die stack in the above picture would normally just connect out the bottom of the package, but in the Novachips parts, those connections are made to a microcontroller die also present within the package. This part acts as an interface back to the main SSD controller, but it does so over a ring bus architecture.

To clarify, those 800 or 1600 MB/sec figures on the above slide are the transfer rates *per ring*, and the Novachips controller has 8 channels, meaning the flash side of the controller can handle massive throughput. Ring buses are not limited by the same fanout requirements seen on parallel addressed devices, which means there is no practical limit to the number of flash packages connected on a single controller channel, making for some outrageous amounts of flash hanging off of a single controller:

That’s a lot of flash on a single card (and yes, the other side was full as well).
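As a rough illustration of what those per-ring figures imply, here is a minimal sketch of the aggregate flash-side bandwidth. It assumes one ring per controller channel running at its full quoted rate, which the slide does not explicitly state; the function name is made up for this example.

```python
# Aggregate flash-side bandwidth, ASSUMING one ring per controller channel.
# The article gives only the per-ring rates and the channel count.
CHANNELS = 8
PER_RING_MB_S = (800, 1600)  # per-ring transfer rates quoted on the slide


def aggregate_mb_s(per_ring_mb_s: int, channels: int = CHANNELS) -> int:
    """Total throughput if every channel drives one ring at its full rate."""
    return per_ring_mb_s * channels


for rate in PER_RING_MB_S:
    print(f"{rate} MB/s per ring x {CHANNELS} channels = {aggregate_mb_s(rate)} MB/s")
```

Under that assumption the flash side tops out at 6,400 or 12,800 MB/sec, far more than a SATA host link can carry.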

The above pic was taken at last year’s Flash Memory Summit. Novachips has been making steady progress on controller development as well. Here is a prototype controller seen last year running on an FPGA test system:

…and this year that same controller had been migrated to an ASIC:

It’s interesting to see the physical differences between those two parts. Note that both new and old platforms were connected to the same banks of flash. The newer photo showed two complete systems – one on ONFi flash (IMFT Intel / Micron) and the other on Toggle Mode (Toshiba). This was done to demonstrate that Novachips HLNAND hardware is compatible with both types.

Novachips also had NVMe PCIe hardware up and running at the show.

Novachips was also showing some impressive packaging in their SATA devices:

At the right was a 2TB SATA SSD, and at the left was a 4TB unit. Both were in the 7mm form factor. 4TB is the largest capacity SSD I have seen in that form factor to date.

Novachips also makes an 8TB variant, though the added PCB requires 15mm packaging.

All of this means that it is not always necessary to have huge capacity per die to achieve a huge capacity SSD. Imagine very high capacity flash arrays using this technology, connecting a single controller to a bank of Toshiba’s new QLC archival flash or Samsung’s new 256Gbit VNAND. Then imagine a server full of those PCIe devices. Things certainly seem to be getting big in the world of flash memory, that’s for sure.

Doubtful. The current cost/GB for SSD versus HDD is still not a compelling reason to buy large (multi-TB) SSDs.

Once SSD $/GB comes down below $0.10/GB for >=1TB SSDs then you'll see a decline in HDD sales and manufacturing.

My prediction is that HDD companies like Seagate and Western Digital will continue to create high capacity HDDs for many more years, continue to drive down the cost/TB ratio, and eventually hit a wall with physics and areal density. In the interim, SSD tech will mature to the point where performance becomes the prime example of diminishing returns, but NAND density will continue to improve greatly. Over the next 2 or 3 years a 1TB SSD will cost <$100.

Lastly, the Enterprise level stuff will get all the good stuff like NVMe, 16TB, ST-MRAM, CBRAM, PCM, XPoint, etc. Some of that tech will trickle down to the consumer space, but realistically, it will only be in "capacity" not speed related attributes.

> Some of that tech will trickle down to the consumer space, but realistically, it will only be in "capacity" not speed related attributes.

RIIIIIGHT!

It's interesting that the SATA link rate is STUCK at 6Gb/s and has been for a long time.

And, NOW with U.2 cables, it seems to me that the industry is reverting to a form of parallel cables, which consume PCIe lanes.

Heck, if I wish to exploit the resiliency of higher RAID modes, I'm up to 16 PCIe lanes if I RAID 4 x 2.5" NVMe drives.

Since all leading Nand Flash SSDs are bumping against that 6G ceiling, it only makes sense to increase the SATA clock rate e.g. perhaps in discrete steps: 8G, 12G, 16G and then 32G on the visible horizon.

The PCIe 3.0 spec calls for an 8G clock and 128b/130b jumbo frame.

(Please note that the USB 3.1 spec did just that!)
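To put rough numbers on the encoding point, here is a small sketch comparing usable per-lane bandwidth. The line rates and encodings used are the published ones for SATA 6Gb/s (8b/10b) and PCIe 3.0/4.0 (128b/130b); the helper name is made up for this example, and protocol overhead above the line encoding is ignored.

```python
def effective_mb_s(line_rate_gbps: float, payload_bits: int, frame_bits: int) -> float:
    """Usable bandwidth in MB/s (decimal) after line-encoding overhead only."""
    return line_rate_gbps * 1000 * payload_bits / frame_bits / 8


links = {
    "SATA 6Gb/s (8b/10b)": (6.0, 8, 10),
    "PCIe 3.0 x1 (128b/130b)": (8.0, 128, 130),
    "PCIe 4.0 x1 (128b/130b)": (16.0, 128, 130),
}

for name, (rate, payload, frame) in links.items():
    print(f"{name}: {effective_mb_s(rate, payload, frame):.1f} MB/s")
```

The 8b/10b encoding costs SATA 20% of its raw rate, while 128b/130b costs about 1.5%, so a single PCIe 3.0 lane carries roughly 985 MB/s against SATA's 600 MB/s.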

I see no reason why the industry should not "sync" SATA-IV with the PCIe 3.0 spec, and "sync" SATA-V (or SATA-IV.2) with the 16G clock being adopted for PCIe 4.0.

The "S" in SATA stands for serial, implying only one PCIe lane is required to transmit data to "S"ATA devices i.e. a serial stream of binary digits.

I hope I am wrong about this, but the industry is starting to look and act like an oligopoly, pricing enterprise solutions beyond the reach of consumers, and leaving consumers stranded below the glass ceiling imposed by the SATA-III standard.

One "trickle down" I would certainly like to see is a DDR3 or DDR4 JEDEC-compatible Non-Volatile RAM mounted on the SODIMM form factor.

I tried contacting Intel to see if they could supply same, using their Optane memory, and I received a rather condescending reply from an "agent under contract to Intel".

That agent replied that "Intel is no longer accepting donation requests" [sic]

Would somebody like to translate that phrase for me, please?

Being an Intel customer for many years, I thought I had a right to complain to Intel's Investor Relations department: they fell silent.

HDD and tape will be around a lot longer than you think for backup and some data staging tasks, especially for HPC/server systems that are already using tiered storage management software. HDDs handle backup as well as staging data sets before they are needed, with the tiered storage management software ready to transfer data from the HDDs to the SSDs before the OS data requests begin arriving and the data is transferred to RAM. Tape handles long-term archival storage of massive amounts of data. Those long HPC/server analytics runs on massive data sets still use thousands of HDDs along with SSDs in order to stay ahead of the data requests for these workloads. Think of the Large Hadron Collider, and even some business tasks like market analysis.

HDDs may not be as profitable in the consumer market, but in the HPC/server market there is still a need, and that need will continue to grow. Even on the consumer side, I still trust HDDs for backup, and most consumers will still have HDD backup with its proven reliability for longer-term storage.

With the new vertical flash chips, the price per GB should be going down significantly. The current vertical flash technology is made on an older process node, but it requires quite a few extra steps in wafer processing due to the large number of layers. Since it just requires more layers, but does not require all of the extreme measures of a cutting edge process, it should be relatively cheap. I think price parity with hard drives is probably closer than 10 years due to the capacity increases allowed by using 3D structures.

10 years is very optimistic. SSDs don't need to be equal to HDDs in price due to the extreme performance boost they provide. Even at their current high prices on the enterprise side, SSDs are quickly overtaking HDDs. Tech like this that octuples capacity will be very quickly adopted. HDDs are already being relegated to cold storage status as they approach their physical storage limits. 10 years from now, even cold storage will be flash-based.

Yes, and you'll have to keep those expensive NAND-based cells in actual cold storage, lest all those faster-than-HDD-or-tape NAND cells quickly lose their state and become slower than HDDs or tape drives, if the data stored in NAND can even be recovered by error correction in the first place. Where is the logic in ever using SSDs for long-term storage and wasting the speed advantage that NAND currently provides? HDDs and tapes will be around in large quantities even after those 10 years, with NAND pretty much replaced by CrossBar and other technologies. HDDs and tapes can last a good while completely powered down, while NAND cells are not very good for long-term storage.

If you think about something like YouTube or another VOD service, they have a lot of storage that needs to be "on-line" but may not get accessed very often. This is the access pattern for many storage systems, although video obviously takes up a massive amount of space. With almost everyone carrying a 1080p capable camera with them, the amount of video and images is just going to go up. I was just watching some videos on YouTube from 2011 that may not have been accessed for months.

I assume that all of the videos are on massive hard drive based storage arrays. Tapes can store a lot of data, but they don't offer quick enough access. If they can make these drives significantly lower power than hard drives, then it could be a compelling product. It would always be powered on, so they could implement a refresh cycle to rewrite blocks that have not been written in a while, if that is a concern. A 16 TB device that offers near instant access and consumes very little power when not being accessed sounds like a perfect product for this market. As far as I know, the power consumption of an inactive flash drive is very low. This has the added benefit of reducing cooling needs. In my experience, hard drives are actually not that reliable in this setting. They fail often, requiring new drives to be swapped in and the array rebuilt. Replacing them with flash devices would probably be significantly more reliable.
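The refresh cycle mentioned above could look something like the following. This is only a toy sketch under stated assumptions: the `Block` type, the function names, and the age threshold are all hypothetical, and real SSD firmware tracks retention in far more detail (temperature, wear, read error rates).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Block:
    block_id: int
    last_written: float  # timestamp in seconds (hypothetical bookkeeping)


def blocks_due_for_refresh(blocks: List[Block], now: float, max_age_s: float) -> List[Block]:
    """Return blocks whose data is older than max_age_s, oldest first."""
    stale = [b for b in blocks if now - b.last_written > max_age_s]
    return sorted(stale, key=lambda b: b.last_written)


def refresh(blocks: List[Block], now: float, max_age_s: float) -> List[int]:
    """Rewrite stale blocks; the rewrite is modelled as resetting the timestamp."""
    due = blocks_due_for_refresh(blocks, now, max_age_s)
    for b in due:
        b.last_written = now
    return [b.block_id for b in due]


blocks = [Block(0, 0.0), Block(1, 50.0), Block(2, 95.0)]
refreshed = refresh(blocks, now=100.0, max_age_s=30.0)
print(refreshed)  # blocks 0 and 1 are older than 30 s; block 2 is not
```

Because the device stays powered, a background pass like this can run during idle time, which is the point being made about always-on flash versus unpowered cold storage.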

Flash NAND will eventually be replaced by CrossBar and other NVM IP, with HDDs used only for long-term storage and tapes for offline backup. Maybe Google will use more NAND, but that will be part of a complex tiered storage management system, with any stale videos residing on HDD and the popular cat-videos-of-the-week getting stored on the SSDs, etc. Those enterprise HDDs will also have their usable life extended when they and the SSDs are used in properly managed tiered hierarchical storage: the HDDs will see mostly large sequential read/write loads, while the SSDs lend themselves to the random workloads. NAND, which needs standby power for backups, will still be bested by the completely unpowered long-term storage abilities of spinning rust or tapes. If anything, NAND will be pushed into a more equal hybrid SSD/HDD usage model after the appearance of CrossBar, with hybrid HDDs sporting equal amounts of NAND and spinning rust capacity. The hybrid drive's NAND would act as a buffer/cache that can be copied over onto the spinning rust portion in the background without any need for OS supervision, or any bus bandwidth outside the hybrid drive's internal controller channels.

So flash NAND and spinning platters would be paired in a one-to-one storage capacity relationship inside hybrid drives, with the platters mostly powered down except for the occasional background NAND write-through/syncing to maintain coherency between the drive's NAND memory and the data on the platters. These hybrid drives will have both long-term unpowered storage longevity and the faster NAND responsiveness in one device. So with the hybrid drive, that SSD long-term storage deficiency will be a non-issue, as the spinning platter portion of the drive will be there with its intrinsic long-term storage strengths, while in the short term the NAND, in equal parts, will provide its intrinsic strengths to the hybrid pairing: faster data access and lower latency. All of the goodness of both technologies in a single package, plus the ability to offload the staging/syncing of data between the SSD and HDD portions onto the device's controller itself. Hard drives will just evolve more built-in SSD capacity, and the hybrid device will be what continues past those 10 years.

Do hard drives in data centers usually spin down? I was under the impression that they usually stayed running continuously. I have heard a lot of stories about continuously spinning drives not coming back up after being powered down due to a system outage of some kind. A flash device is, as far as I know, lower power than a hard drive even when both are in sleep modes. You just get a much higher latency to wake up the hard drive. Also, the flash device can wake up, service the request, and be back in a sleep state before the hard drive can even spin up. Video on demand usage would be all sequential from the storage perspective. It would read the entire video into memory whenever it is requested. Also the video will remain in cache until pushed out. Given how fast capacity is going up, flash will still be more expensive than hard drives for a while, but this doesn't mean that there wouldn't be a market for such devices.

The hard drives in most data centers are now on the back side of large flash arrays, with the tiered management software doing the transferring of data to and from the hard drives and SSDs, as well as the servers. Have you ever even gone to any of the professional (non-enthusiast) server trade magazines' websites and read about the storage systems available to the server industry? They usually consist of dedicated Xeon/other server SKUs with access to TBs of flash, and even more TBs of HDD storage, plus large amounts of RAM to cache the trillions of read/write transactions! Power usage is a big cost in the server room, so if a drive can be spun down and parked, it will be, to save energy. Do you even comprehend the level of complexity in the algorithms that go into just the load balancing of the read/write loads across the hierarchical memory systems and subsystems on modern server systems? Things are very tightly controlled, and power usage is one of the most tightly controlled of the many thousands of operating metrics that are monitored on large HPC/server systems.

Flash (NAND) is going to get replaced by CrossBar memory and other newer technologies, and the displaced NAND technology will be increasingly added to HDDs in larger amounts, because of NAND's inherent advantages, as well as being paired with HDDs because of NAND's inherent disadvantages. The HDD will still be more trusted for long(er) term storage, bolstered by the NAND as a form of inexpensive cache for the HDD to make use of. Even consumer level hybrid drives will begin to get more NAND cache integrated, and the hybrid drive's controllers will become more sophisticated to the point that even consumer hybrid drives will become tiered/hierarchical storage systems in their own right.

P.S. do not forget about Tape, as that will remain there for those offline backups, and Tapes are tops for cold/old storage of archival data with HDDs a level above for the not so cold/old that may be needed more often.

OK, this got me really excited. I'm still in awe, jaw dropped to the floor. Wet dream for now. 4 of these 16TB beauties would send my whole current setup into oblivion. And I don't even need them to be NVMe, old school AHCI will do!

BTW: Allyn what is that cute big monster card with 128 dies of HLNAND? I'm drooling uncontrollably at that picture.... Storage pr0n at its finest! :D

128 dies? You're forgetting that those are not dies. Those are packages (with 16 die stacks in each). They are not mass producing that part of course, but it's a demo to show the capabilities of their technology.
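For a sense of scale, a quick back-of-the-envelope sketch: it takes the 128 count from the comment above as packages and the 16-die stacks from the article. The per-die densities passed in are hypothetical, since the die capacity on that demo card is not stated anywhere here.

```python
PACKAGES = 128         # count quoted in the comment thread, taken as packages
DIES_PER_PACKAGE = 16  # 16-die stacks, per the article


def total_dies(packages: int = PACKAGES, dies_per_package: int = DIES_PER_PACKAGE) -> int:
    """Number of flash dies on the card."""
    return packages * dies_per_package


def raw_capacity_tb(die_gbit: int) -> float:
    """Raw capacity in decimal TB for an ASSUMED per-die density in Gbit."""
    return total_dies() * die_gbit / 8 / 1000


print(total_dies())
for die_gbit in (64, 128):  # hypothetical die densities
    print(f"{die_gbit} Gbit dies -> {raw_capacity_tb(die_gbit):.3f} TB raw")
```

That is 2,048 dies behind a single controller, which is why the distinction between dies and packages matters so much here.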

This is kind of similar to the Hybrid Memory Cube interconnect. I have been wondering if we will get flash and/or XPoint memory connected using the HMC interface. This would result in a similar architecture: it just converts a wide, parallel interface into a narrow, serialized interface.

In the short term, we are probably going to get some large PCIe devices based on standard controllers. High performance will be reachable by using multiple cards. In the long run, I would expect the bigger players are working on similar technology, so Novachips may have a limited window of opportunity here.

Each and every week we see new players rolling into the SSD-manufacturing segment, but prices are still not low enough...for how many months more do we need to suffer? I want my 512GB of quality SSD space for ~100$ RIGHT EFFING NAO!

Actually we're pretty close to the number you are looking for there. That's $0.20/GB, and we're currently hovering at around $0.30/GB. Just remember that's still another one-third reduction (today's prices are 50% above that target), so it may take a bit longer.
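The arithmetic behind those figures, as a quick sketch (the helper names are made up for this example; the prices are the ones quoted in the two comments above):

```python
def price_per_gb(price_usd: float, capacity_gb: float) -> float:
    """Cost per gigabyte in USD."""
    return price_usd / capacity_gb


def reduction_needed(current: float, target: float) -> float:
    """Fractional price drop required to get from current to target."""
    return (current - target) / current


target = price_per_gb(100, 512)  # the $100 / 512GB wish
print(f"target: ${target:.3f}/GB")
print(f"drop needed from $0.30/GB to $0.20/GB: {reduction_needed(0.30, 0.20):.0%}")
print(f"$0.30/GB is {0.30 / 0.20 - 1:.0%} above $0.20/GB")
```

So the gap is a one-third price cut from today's level, or equivalently today's price sits 50% above the target.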

Let's say you were tasked with developing a storage system that was intended to recover from extremely serious outages, such as a massive Electro-Magnetic Pulse that fries at least 50% of the electrical circuits operating between Boston and San Diego. Albert Einstein called this exercise a "thought experiment" -- good for exciting idle neurons in the hemisphere between our ears. I am aware of some niche technologies which can survive such an EMP event, but their capacities are extremely limited when compared to the amount of data (and programs) that will be lost in that event. In my own case, I can quickly identify all prior drive images of the C: partition on my primary workstation, and a 12GB database which mirrors a website that I maintain. Where do I look and what do I purchase that will guarantee the survival of all high-priority data, not only in the event of an EMP catastrophe, but also in the event that marketing hype has vastly over-stated the operational longevities of current memory technologies like Nand Flash SSDs? I don't know, for a fact, whether or not technologies like MRAM will do the job. So, what do I do, and where do I look, given that such a search should not take longer than the warranty periods on my existing storage devices -- as a practical deadline?

p.s. I recently had reason to retrieve a Windows 7 workstation after it sat idle for 11 months.

It was built with that OS hosted on a RAID-0 array of 2 x Corsair Nand Flash SSDs that were working fine at the start of that 11-month period.

At the end of that 11-month period, something had happened to that RAID-0 array which prevented that OS from booting. So, I booted from a backup partition on a conventional HDD, re-formatted that RAID-0 array, and restored a working drive image to those 2 x Corsair SSDs.

After that RESTORE task, I was able to boot again from that RAID-0 array of 2 x Corsair SSDs.

I don't know if this was the problem, but I do remember reading comments by Allyn Malventano concerning the problems that can occur with Nand Flash SSDs when they are left idle for a long period of time.

I can honestly say that there was NEVER any mention of this type of failure in any of the marketing literature I read about Corsair SSDs back when they were being released, nor for that matter about any OTHER SSDs that were being manufactured at that time.