
We haven’t talked with Tom Lyon (@aka_pugs) or Brian Pawlowski before on our show but both Howard and I know Brian from his prior employers. Tom and Brian work for DriveScale, a composable infrastructure software supplier.

There’s been a lot of press lately on NVMeoF and the GreyBeards thought it would be a good time to hear about another way to supply DAS-like performance and functionality. Tom and Brian have been around long enough to qualify as greybeards in their own right.

The GreyBeards have heard of composable infrastructure before but this was based on PCIe switching hardware and limited to a rack or less of hardware. DriveScale is working with large enterprises and their data centers full of hardware.

Composable infrastructure has many definitions but the one DriveScale probably prefers is that it manages resource pools of servers and storage, that can be combined, per request, to create any mix of servers and DAS storage needed by an application running in a data center. DriveScale is targeting organizations that have from 1K to 10K servers with from 10K to 100K disk drives/SSDs.

Composable infrastructure for large enterprises

DriveScale provides large data centers the flexibility to better support workloads and applications that change over time. That is, these customers may, at one moment, be doing big data analytics on PBs of data using Hadoop, and the next, MongoDB or other advanced solution to further process the data generated by Hadoop.

In these environments, having standard servers with embedded DAS infrastructure may be overkill and will cost too much. For example, because one has no way to reconfigure 1000 servers’ storage for each application that comes along without exerting lots of person-power, enterprises typically over-provision storage for those servers, which leads to higher expense.

But if one had some software that could configure 1 logical server or 10,000 logical servers, with the computational resources, DAS disk/SSDs, or NVMe SSDs needed to support a specific application, then enterprises could reduce their server and storage expense while at the same time providing applications with all the necessary hardware resources.

When that application completes, all those hardware resources could be returned back to their respective pools and used to support the next application to be run. It’s probably not that useful when an enterprise only runs one application at a time, but when you have 3 or more running at any instant, then composable infrastructure can reduce hardware expenses considerably.

DriveScale composable infrastructure

DriveScale is a software solution that manages three types of resources: servers, disk drives, and SSDs over high speed Ethernet networking. SAS disk drives and SAS SSDs are managed in an EBoD/EBoF (Ethernet (iSCSI to SAS) bridge box) and NVMe SSDs are managed using JBoFs and NVMeoF/RoCE.

DriveScale’s composer orchestrator self-discovers all hardware resources in a data center that it can manage. It uses an API to compose logical servers from the server, disk and SSD resources under its control, available throughout the data center.

Using Ethernet switching any storage resource (SAS disk, SAS SSD or NVMe SSD) can be connected to any server operating in the data center and be used to run any application.
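
To make the pooling idea concrete, here is a minimal sketch in Python of composing a logical server from shared pools and handing the hardware back afterwards. Everything in it (the `ResourcePool` and `LogicalServer` classes, the `compose` and `release` methods, the server and drive names) is hypothetical and made up for illustration; it is not DriveScale’s API.

```python
from dataclasses import dataclass, field


@dataclass
class LogicalServer:
    name: str
    server: str
    drives: list


@dataclass
class ResourcePool:
    """Hypothetical pool of free servers and drives (not DriveScale's API)."""
    servers: list = field(default_factory=list)
    drives: list = field(default_factory=list)

    def compose(self, name, drive_count):
        # Refuse the request if the pools can't satisfy it.
        if not self.servers or len(self.drives) < drive_count:
            raise RuntimeError("not enough free resources")
        server = self.servers.pop()
        drives = [self.drives.pop() for _ in range(drive_count)]
        return LogicalServer(name, server, drives)

    def release(self, logical):
        # Return the hardware to its pools for the next application.
        self.servers.append(logical.server)
        self.drives.extend(logical.drives)


pool = ResourcePool(
    servers=[f"server-{i}" for i in range(4)],
    drives=[f"drive-{i}" for i in range(16)],
)
hadoop_node = pool.compose("hadoop-1", drive_count=12)
free_while_running = len(pool.drives)   # drives left for other applications
pool.release(hadoop_node)
free_after_release = len(pool.drives)   # everything is back in the pool
```

The point of the sketch is the lifecycle: storage is attached to a server only for as long as an application needs it, so nothing has to be over-provisioned per server.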

There’s a lot more to DriveScale software. They don’t sell hardware, but have a number of system integrators (like Dell) that sell their own hardware and supply DriveScale software to run a data center.

The podcast runs ~44 minutes. The GreyBeards could have talked with Tom and Brian for hours and Brian’s very funny. They were extremely knowledgeable and have been around the IT industry almost since the beginning of time. They certainly changed the definition of composable infrastructure for both of us, which is hard to do. Listen to the podcast to learn more.

Tom Lyon, Co-Founder and Chief Scientist

Tom Lyon is a computing systems architect, a serial entrepreneur and a kernel hacker.

Prior to founding DriveScale, Tom was founder and Chief Scientist of Nuova Systems, a start-up that led a new architectural approach to systems and networking. Nuova was acquired in 2008 by Cisco, whose highly successful UCS servers and Nexus switches are based on Nuova’s technology.

He was also founder and CTO of two other technology companies. Netillion, Inc. was an early promoter of memory-over-network technology. At Ipsilon Networks, Tom invented IP Switching. Ipsilon was acquired by Nokia and provided the IP routing technology for many mobile network backbones.

As employee #8 at Sun Microsystems, Tom was there from the beginning, where he contributed to the UNIX kernel, created the SunLink product family, and was one of the NFS and SPARC architects. He started his Silicon Valley career at Amdahl Corp., where he was a software architect responsible for creating Amdahl’s UNIX for mainframes technology.

Brian Pawlowski, CTO

Brian Pawlowski is a distinguished technologist, with more than 35 years of experience in building technologies and leading teams in high-growth environments at global technology companies such as Sun Microsystems, NetApp and Pure Storage.

Before joining DriveScale as CTO, Brian served as vice president and chief architect at Pure Storage, where he focused on improving the user experience for the all-flash storage platform provider’s rapidly growing customer base. He also was CTO at storage pioneer NetApp, which he joined as employee #18.

Brian began his career as a software engineer for a number of well-known technology companies. Early in his days as a technologist, he worked at Sun, where he drove the technical analysis and discussion on alternate file systems technologies. Brian has also served on the board of trustees for the Anita Borg Institute for Women and Technology as well as a member of the board at the Linux Foundation.

Brian studied computer science at Arizona State University, physics at the University of Texas at Austin, as well as physics at MIT.

We’ve talked with Frederic before (see: Episode #33 on HPC storage) but since then, he has worked for an analyst firm and now he’s back on his own again, at HighFens. Given all the interest of late in AI, machine learning and deep learning, we thought it would be a great time to catch up and have him shed some light on deep learning and what it needs for IT infrastructure.

Frederic has worked for HPC / Big Data / AI / IoT solutions in the speech recognition industry, providing speech recognition services for some of the largest organizations in the world. As I understand it, the last speech recognition AI application he worked on implemented deep learning.

A brief history of AI

Frederic walked the Greybeards through the history of AI from the dawn of computing (1950s) until the recent emergence of deep learning (2010).

He explained that, early on, one could implement a chess playing program using hand coded rules based on a chess expert’s playing technique. Later, when machine learning came out, one could use statistical analysis on multiple games and limited rule creation to teach an AI machine learning system how to play chess. With deep learning (DL), all you have to do now is to feed a DL model all the games you have and it learns how to play chess well all by itself. No rule making needed.

AI DL training and deployment infrastructure

Frederic described some of the infrastructure and data needs for various phases of an industrial scale, AI DL workflow.

Training deep learning models takes data and the more, the better. Gathering/saving large amounts of data used for DL training is a massive write workload and at the end of that process, hopefully you have PB of data to work with.

Selecting DL training data from all those PBs, involves a lot of mixed read and write IO. In the end, one has selected and extracted the data to use to train your DL models.

During DL training, IO needs are all about heavy data read throughput. But there’s more: in the latter half of the talk, Frederic talked about the need to keep expensive GPU cores busy, and that requires sophisticated caching or Tier 0 storage supporting low latency IO.

Ray’s been doing a lot of blogging and other work on AI machine and deep learning (e.g., see Learning machine learning – parts 1, 2, & 3) so it was great to hear from Frederic, a real practitioner of the art. Frederic (with some of Ray’s help) explained the deep learning training process. But it wasn’t detailed enough for Howard, so per Howard’s request, we went deeper into how it really works.

Once you have a DL model trained and working within specifications (e.g., prediction accuracy), Frederic said deploying DL models into production involves creating two separate clusters. One devoted to deep learning model inferencing, which takes in data from the world and performs inferencing (prediction, classification, interpretations, etc.) and the other uses that information for model adaption to fine tune DL models for specific instances.

Adaption and inferencing were both read and write IO workloads and the performance of this IO was dependent on a specific model’s use.

Model adaption would personalize model predictions for each and every person, car, genotype, etc. This would be done periodically (based on SLAs, e.g. every 4 hrs). After that, a new, adapted model could be introduced into production, adapted for that specific person/car/genotype.

If the adaption applied more generally, that data and its human-machine validated/vetted prediction, classification, interpretation, etc. would be added back into the DL model training set to be used the next time a full model training pass was to be done. Frederic said AI DL model training is never done.

Sometime later, all this DL training, production and adaption data needs to be archived for long term access.

We then discussed the recent offerings from NVIDIA and major storage vendors that package up a solution for AI deep learning. It seems we are seeing another iteration of Converged Infrastructure, only this time for AI DL.

Finally, over the course of Ray’s AI DL education, he had come to the belief that AI deep learning could be applied by anyone. Frederic corrected Ray stating that AI deep learning should be applied by anyone.

The podcast runs ~44 minutes. Frederic’s been an old friend of Howard’s and Ray’s, since before the last podcast. He’s one of the few persons in the world that the GreyBeards know that has real world experience in deploying AI DL, at industrial scale. Frederic’s easy to talk with and very knowledgeable about the intersection of AI DL and IT infrastructure. Howard and I had fun talking with him again on this episode. Listen to the podcast to learn more.

Frederic Van Haren

Frederic Van Haren is the Chief Technology Officer @ HighFens. He has over 20 years of experience in high tech and is known for his insights in HPC, Big Data and AI from his hands-on experience leading research and development teams. He has provided technical leadership and strategic direction in the Telecom and Speech markets.

He spent more than a decade at Nuance Communications building large HPC and AI environments from the ground up and is frequently invited to speak at events to provide his vision on the HPC, AI, and storage markets. Frederic has also served as the president of a variety of technology user groups promoting the use of innovative technology.

As an engineer, he enjoys working directly with engineering teams from technology vendors and on challenging customer projects.

Frederic lives in Massachusetts, USA but grew up in the northern part of Belgium where he received his Masters in Electrical Engineering, Electronics and Automation.

In this, our yearend industry wrap up episode, we discuss trends and technology impacting the IT industry in 2018 and what we can see ahead for 2019, and first up is NVMeoF.

NVMeoF has matured

In prior years, NVMeoF was coming from startups, but this past year it was major vendors like IBM FlashSystem, Dell EMC PowerMAX and NetApp AFF releasing new NVMeoF storage systems. Pure Storage was arguably earliest with their NVMeoF JBOF.

Dell EMC, IBM and NetApp were not far behind this curve and no doubt see it as an easy way to reduce response time without having to rip and replace enterprise fabric infrastructure.

In addition, NVMeoF standards have finally started to stabilize. With the gang of startups, standards weren’t as much of an issue as they were more than willing to lead, ahead of standards. But major storage vendors prefer to follow behind standards committees.

As another example, VMware showed off an NVMeoF JBOF for vSAN. A JBOF like this improves vSAN storage efficiency for small clusters. Howard described how this works, but in short, with vSAN having direct access to shared storage, it can reduce data and server protection requirements for storage. This matters especially for the small clusters of servers that are becoming more popular these days to host application clusters.

The other thing about NVMeoF storage is that NVMe SSDs have also become very popular. We are seeing them come out in everyone’s servers and storage systems. Servers (and storage systems) hosting 24 NVMe SSDs are just not that unusual anymore. For the price of a PCIe switch, one can have blazingly fast, direct access to TBs of NVMe SSD storage.

HCI reaches critical mass

HCI has also moved out of the shadows. We recently heard news that HCI is outselling CI. Howard and I attribute this to the advances made in VMware’s vSAN 6.2 and the appliance-ification of HCI. That and we suppose NVMe SSDs (see above).

HCI makes an awful lot of sense for application clusters that VMware is touting these days. CI was easy but an HCI appliance cluster is much simpler to deploy and manage.

For VMware HCI, vSAN Ready Nodes are available from just about any server vendor in existence. With ready nodes, VARs and distributors can offer an HCI appliance in the channel, just like the majors. Yes, it’s not the same as a vendor supplied appliance, doesn’t have the same level of software or service integration, but it’s enough.

[If you want to learn more, Howard is doing a series of deep dive webinars/classes on HCI as part of his friend Ivan’s ipSpace.net. The 1st 2hr session was recorded 11 December, part 2 goes live 22 January, and the final installment on 5 February. The 1st session is available on demand to subscribers. Sign up here]

Computational storage finally makes sense

Howard and I 1st saw computational storage at FMS18 and we did a podcast with Scott Shadley of NGD systems. Computational storage is an SSD with spare ARM cores and DRAM that can be used to run any storage intensive, Linux application or Docker container.

Because it’s running in the SSD, it has lightning-fast (even faster than NVMe) access to all the data on the SSD. Indeed, with 10s to 1000s of computational storage SSDs in a rack, each with multiple ARM cores, you can have many 1000s of cores available to perform your data intensive processing. Almost like GPUs, only for IO access to storage (SPUs?).

We tried this at one vendor in the 90s, executing some database and backup services outboard but it never took off. Then in the last couple of years (Dell) EMC had some VM services that you could run on their midrange systems. But that didn’t seem to take off either.

The computational storage we’ve seen all runs Linux. And with today’s data intensive applications coming from everywhere, and all the spare processing power in SSDs, it might finally make sense.

Futures

Finally, we turned to what we see coming in 2019. Howard was at an Intel Analyst event where they discussed Optane DIMMs. Our last podcast of 2018 was with Brian Bulkowski of Aerospike who discussed what Optane DIMMs will mean for high performance database systems and just about any memory intensive server application. For example, affordable, 6TB memory servers will be coming out shortly. What you can do with 6TB of memory is another question….

In this episode we discuss high performance databases and the storage needed to get there, with Brian Bulkowski, Founder and CTO of Aerospike. Howard met Brian at an Intel Optane event last summer and thought he’d be a good person to talk with. I couldn’t agree more.

Howard and I both thought Aerospike was an in memory database but we were wrong. Aerospike supports in memory, DAS resident and SAN resident distributed databases.

Database performance is all about the storage (or memory)

When Brian first started Aerospike, they discovered that other enterprise database vendors were using fast path SAS SSDs for backend storage and so that’s what Aerospike started with for storage.

As NVMe SSDs came out, Brian expected higher performance but wasn’t too impressed with the real performance he found in NVMe SSDs as compared to SAS SSDs. However, lately the SSD industry has bifurcated into fast, low-capacity (NVMe) SSDs and slow, large capacity (SAS) SSDs. And over time the Linux kernel (4.4 and above) has sped up its NVMe IO stack. So now he has become more of a proponent of NVMe SSDs for high performing database storage.

In addition to SAS and NVMe SSDs, Aerospike supports SAN storage. One recent large customer uses SAN shared storage and loves the performance. Moreover, Aerospike also offers an in memory database option for the ultimate in high performance (low capacity) databases.

Write IO performance

One thing that Aerospike is known for is their high performance under mixed R:W workloads. Brian says just about any database can perform well with an 80:20 R:W IO mix, but at 50:50 R:W, most databases fall over.

Aerospike did detailed studies of SSD performance with high write IO and used SSD native APIs to understand what exactly was going on with SAS SSDs. Today, they understand when SSDs go into garbage collection and can quiesce IO activity to them during these slowdowns. Similar APIs are available for NVMe SSDs.

Optane memory

The talk eventually turned to Optane DIMMs (3D Crosspoint Memory). With Optane DIMMs, server memory address space will increase from 1TB to 6TB. From Brian’s perspective this is still not enough to host a copy of a typical database but it would suffice to cache a database index. Which is exactly how they are going to use Optane DIMMs.

Optane DIMMs are accessed via PMEM (an Intel open source memory access API), which can specify caching (L1-L2-L3) characteristics so that the processor(s) data and instruction caching tiers don’t get flooded with database information. Aerospike has done this for in-memory databases in the past; it just requires a different API.

As a distributed database, they support data protection for DAS and in memory databases through mirroring, dual redundancy. But Aerospike was developed as a distributed database, so data can be sharded, across multiple servers to support higher, parallelized performance.

With Optane DIMMs being 1000X faster than NVMe SSD, the performance bottleneck has now moved back to the network. Given the dual redundancy data protection scheme, any data written on one server would need to be also written (across the network) to another server.
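
A minimal sketch of why sharding plus dual redundancy puts the network on the write path: each key hashes to a primary server, and a second copy goes to a different server. The `placement` function, the node names and the hash-then-next-node scheme are all assumptions made for illustration; this is not Aerospike’s actual partitioning algorithm.

```python
import hashlib


def placement(key, nodes, copies=2):
    """Pick `copies` distinct nodes for a key: a primary plus its mirror(s).

    Assumes copies <= len(nodes). Hypothetical scheme, for illustration only.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    start = int.from_bytes(digest[:8], "big") % len(nodes)
    return [nodes[(start + i) % len(nodes)] for i in range(copies)]


nodes = ["node-a", "node-b", "node-c", "node-d"]
primary, mirror = placement("user:42", nodes)
# A write for "user:42" lands on `primary` and must also cross the
# network to `mirror` before both copies exist.
```

Because the mirror is always a different server, every write involves at least one network hop no matter how fast the local memory or storage is.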

Data consistency in databases

This brought us around to the subject of database consistency. Brian said Aerospike database consistency for reads was completely parameterized, e.g. one can specify linear (database wide) consistency to session level consistency, with some steps in between. Aerospike is always 100% write consistent but read consistency can be relaxed for better performance.

Howard and I took a deep breath and said data has to be 100% consistent. Brian disagreed, and in fact, historically relational databases were not fully read consistent. Somehow this felt like a religious discussion and in the end, we determined that database consistency is just another knob to turn if you want high performance.

The podcast runs ~44 minutes. Brian’s been around databases for a long time and seemingly, most of that time has been figuring out the best ways to use storage to gain better performance. He has a great perspective on NVMe vs. SAS SSD performance as well as (real) memory vs SCM performance, which we all need to understand better as SCM rolls out. Possibly, barring the consistency discussion, Brian was also easy to talk with. Listen to our podcast to learn more.

Brian Bulkowski, Founder and CTO, Aerospike

Brian is a Founder and the CTO of Aerospike. With almost 30 years in Silicon Valley, his motivation for starting Aerospike was the confluence of what he saw as the rapidly advancing flash storage technology with lower costs that weren’t being fully leveraged by database systems as well as the scaling limitations of sharded MySQL systems and the need for a new distributed database.

He was able to see these needs as both a Lead Engineer at Novell and Chief Architect at Cable Solutions at Liberate – where he built a high-performance, embedded networking stack and high scale broadcast server infrastructure.

In this episode we talk indexing old backups, GDPR and CyberSense, a new approach to cyber security, with Jim McGann, VP Marketing and Business Development, Index Engines.

Jim’s an old industry hand that’s been around backups, e-discovery and security almost since the beginning. Index Engines’ solution to cyber security, CyberSense, is also offered by Dell EMC and Jim presented at a TFDx event this past October hosted by Dell EMC (See Dell EMC-Index Engines TFDx session on CyberSense).

It seems Howard’s been using Index Engines for a long time but keeping them a trade secret. In one of his prior consulting engagements he used Index Engines technology to locate a multi-million dollar email for one customer.

Universal backup data scan and indexing tool

Index Engines has a long history as a tool to index and understand old backup tapes and files. Index Engines did all the work to understand the format and content of NetBackup, Dell EMC Networker, IBM TSM (now Spectrum Protect), Microsoft Exchange backups, database vendor backups and other backup files. Using this knowledge they are able to read just about anyone’s backup tapes or files and tell customers what’s on them.

But it’s not just a backup catalog tool, Index Engines can also crack open backup files and index the content of the data. In this way customers can search backup data, with Google like search terms. This is used day in and day out, for E-discovery and the occasional consulting engagement.

Index Engines technology is also useful for companies complying with GDPR and similar legislation. When any user can request that information about them be purged from corporate data, being able to scan, index and search backups is a great feature.

In addition to backup file scanning, Index Engines has a multi-PB indexing solution which can be used to perform the same Google-like searching on a data center’s file storage. Once again, Index Engines has done the development work to implement their own, highly parallelized metadata and content search engine, demonstrably faster than any open source (Lucene) search solution available today.

CyberSense

All that’s old news, what Jim presented at a TFDx event was their new CyberSense solution. CyberSense was designed to help organizations detect and head off ransomware, cyber assaults and other data corruption attacks.

CyberSense computes a data entropy (randomness) score as well as ~39 other characteristics for every file in backups or online in a customer’s data center. It then uses that information to detect when a cyber attack is taking place and determine the extent of the corruption. With current and previous entropy and other characteristics on every data file, CyberSense can flag files that look like they have been corrupted and warn customers that a cyber attack is in process before it corrupts all of the customer’s data files.

One typical corruption is to change file extensions. CyberSense cracks open file contents and can determine if it’s an office or other standard document type and then check to see if its extension matches its content. Another common corruption is to encrypt files. Such files necessarily have an increased entropy and can be automatically detected by CyberSense.
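
The two checks described above can be sketched in a few lines of Python. This is only an illustration of the general techniques (Shannon entropy over a file’s bytes, and comparing well-known file signatures against the extension); the function names and the tiny signature table are made up here, and this is not how CyberSense is actually implemented.

```python
import math
import os
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Entropy in bits per byte: 0.0 for constant data, 8.0 for uniform bytes."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


# A few well-known file signatures; a real tool would know many more.
MAGIC = {
    b"%PDF": ".pdf",
    b"\x89PNG": ".png",
    b"PK\x03\x04": ".zip",  # also the container for .docx/.xlsx/.pptx
}
ZIP_BASED = {".zip", ".docx", ".xlsx", ".pptx"}


def extension_matches(filename: str, head: bytes) -> bool:
    """Return False only when a known signature contradicts the extension."""
    ext = os.path.splitext(filename)[1].lower()
    for magic, expected in MAGIC.items():
        if head.startswith(magic):
            if expected == ".zip":
                return ext in ZIP_BASED
            return ext == expected
    return True  # unrecognized content: no opinion


text = b"the quick brown fox jumps over the lazy dog " * 100
random_like = os.urandom(4096)  # stands in for encrypted content

low = shannon_entropy(text)          # plain text scores well below 8 bits/byte
high = shannon_entropy(random_like)  # encrypted-looking data scores close to 8
renamed_ok = extension_matches("report.pdf.locked", b"%PDF-1.7")  # False
```

A jump in a file’s entropy between two scans, or a signature that no longer matches the extension, is the kind of change that can be flagged for a closer look.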

When CyberSense has detected some anomaly, it can determine who last accessed the file and what executable was used to modify it. In this way CyberSense can be used to provide forensics on the who, what, when and where of a corrupted file, so that IT can shut the corruption activity down before it’s gone too far.

CyberSense can be configured to periodically scan files online as well as just examine backup data (offline) during or after it’s backed up. Their partnership with Dell EMC is to do just that with Data Domain and Dell EMC backup software.

Index Engines’ proprietary indexing functionality has been optimized for parallel execution and for reduced index size. Jim mentioned that their content indexes average about 5% of the full storage capacity and that they can index content at a TB/hour.
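
As a back-of-the-envelope illustration of those two numbers, the sketch below estimates index size and indexing time for a given capacity. The `index_estimate` helper and the 500 TB example are hypothetical, and it assumes the quoted TB/hour is a single aggregate rate, which the podcast didn’t spell out.

```python
def index_estimate(capacity_tb, index_ratio=0.05, tb_per_hour=1.0):
    """Return (index size in TB, hours to index) for a given capacity."""
    return capacity_tb * index_ratio, capacity_tb / tb_per_hour


# A hypothetical 500 TB backup store.
index_tb, hours = index_estimate(500)
days = hours / 24
```

Under those assumptions, 500 TB of content needs roughly 25 TB of index and about 500 hours (around three weeks) of indexing at a single TB/hour stream.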

Index Engines is a software only offering but they also offer services for customers that want a turn key solution. They also are available through a number of partners, Dell EMC being one.

The podcast runs ~44 minutes. Jim’s been around backups, storage and indexing forever, and seems to have good knowledge of data compliance regimes and current security threats impacting customers across the world today. Listen to our podcast to learn more.

Jim McGann, VP Marketing and Business Development, Index Engines

Jim has extensive experience with the eDiscovery and Information Management in the Fortune 2000 sector. Before joining Index Engines in 2004, he worked for leading software firms, including Information Builders and the French based engineering software provider Dassault Systemes.

In recent years he has worked for technology based start-ups that provided financial services and information management solutions. Prior to Index Engines, Jim was responsible for the business development of Scopeware at Mirror Worlds Technologies, the knowledge management software firm founded by Dr. David Gelernter of Yale University. Jim graduated from Villanova University with a degree in Mechanical Engineering.

Jim is a frequent writer and speaker on the topics of big data, backup tape remediation, electronic discovery and records management.