
This is the first time we’ve had Sumit Puri, CEO & GM Co-founder of Liqid on the show but both Greg and I have talked with Liqid in the past. Given that we talked with another composable infrastructure company (see our DriveScale podcast), we thought it would be nice to hear from their competition.

We started with a brief discussion of the differences between Liqid and DriveScale. Sumit mentioned that DriveScale was mainly focused on storage and not as much on the other components of composable infrastructure.

[This was Greg Schulz’s (@storageIO & StorageIO.com) first time as a GreyBeard co-host, and we had some technical problems with his feed. Sorry about that.]

Multi-fabric composable infrastructure

At Dell Tech World (DTW) 2019 last week, Liqid announced a new, multi-fabric composability solution. Originally, Liqid composable infrastructure only supported PCIe switching, but with their new announcement, they also now support Ethernet and InfiniBand infrastructure composability. In their multi-fabric solution, they offer JBoG(PUs) which can attach to Ethernet/InfiniBand as well as other compute accelerators such as FPGAs or AI specific compute engines.

For non-PCIe switch fabrics, Liqid adds an “HBA-like” board in the server side that converts PCIe protocols to Ethernet or InfiniBand and has another HBA-like board sitting in the JBoG.

As such, if you were a Media & Entertainment (M&E) shop, you could be doing 4K real time editing during the day, where GPUs were each assigned to separate servers running editing apps, and at night, move all those GPUs to a central server where they could now be used to do rendering or transcoding. All with the same GPU-server hardware and using Liqid to re-assign those GPUs back and forth during day and night shifts.

Even before the multi-fabric option, Liqid supported composing NVMe SSDs and servers. So with a 1U server, which in the package may support 4 SSDs, with Liqid you could assign 24-48, or whatever number made the most sense, to that 1U server for a specialized IO intensive activity. When that activity/app was done, you could then allocate those NVMe SSDs to other servers to support other apps.
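To make the day/night re-assignment concrete, here is a minimal sketch of the bookkeeping involved. It is a toy model only: the ComposableFabric class, its methods and the server/GPU names are all hypothetical and are not Liqid's software or API.

```python
from collections import defaultdict


class ComposableFabric:
    """Toy model (hypothetical) tracking which server each pooled device is attached to."""

    def __init__(self, devices):
        # Every device starts unassigned (None means "in the free pool").
        self.owner = {dev: None for dev in devices}

    def assign(self, device, server):
        """Attach a device to a server, detaching it from any previous one."""
        if device not in self.owner:
            raise KeyError(f"unknown device: {device}")
        self.owner[device] = server

    def release(self, device):
        """Return a device to the free pool."""
        self.assign(device, None)

    def layout(self):
        """Return {server: sorted list of devices}, omitting free devices."""
        by_server = defaultdict(list)
        for dev, server in self.owner.items():
            if server is not None:
                by_server[server].append(dev)
        return {server: sorted(devs) for server, devs in by_server.items()}


gpus = [f"gpu{i}" for i in range(4)]
fabric = ComposableFabric(gpus)

# Day shift: one GPU per editing server.
for i, gpu in enumerate(gpus):
    fabric.assign(gpu, f"edit-server-{i}")
day_layout = fabric.layout()

# Night shift: move every GPU to a single rendering server.
for gpu in gpus:
    fabric.assign(gpu, "render-server")
night_layout = fabric.layout()

print(day_layout)
print(night_layout)
```

The same bookkeeping would apply to NVMe SSDs: the device list changes, but assign and release stay the same.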

Why compose infrastructure

The promise of composability is no more isolated/siloed/dedicated hardware in your environment. Resources like SSDs, GPUs, FPGAs and really servers can be torn apart and put back together without sending out a service technician and waiting for hours while they power down your system and move hardware around. I asked Sumit how long it took to re-configure (compose) hardware into a new configuration and he said it was a matter of 20 seconds.

Sumit was at an NVIDIA show recently and said that Liqid could non-disruptively swap out GPUs. For this you would just isolate the GPU from any server and then go over to the JBoG and take the GPU out of the cabinet.

How does it work

Sumit mentioned that they have support for Optane SSDs to be used as DRAM memory (not Optane DC PM) using IMDT (Intel Memory Drive Technology). In this way you can extend your DRAM up to 6TB for a server. And with Liqid it could be concentrated on one server one minute and then spread across dozens the next.

I asked Sumit about the overhead of the fabrics that can be used with Liqid. He said that the PCIe switching may add on the order of 100 nanoseconds and the Ethernet/InfiniBand networks on the order of 10-15 microseconds or roughly 2 orders of magnitude difference in overhead between the two fabrics.
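As a quick check on the "2 orders of magnitude" figure, the arithmetic can be sketched as follows. The only inputs are the numbers quoted above; the variable names are ours.

```python
import math

PCIE_OVERHEAD_NS = 100                   # ~100 nanoseconds for PCIe switching, as quoted
NETWORK_OVERHEAD_NS = (10_000, 15_000)   # 10-15 microseconds for Ethernet/InfiniBand, as quoted

# Ratio of network overhead to PCIe overhead at both ends of the quoted range.
ratios = [net / PCIE_OVERHEAD_NS for net in NETWORK_OVERHEAD_NS]
orders = [math.log10(r) for r in ratios]

for net, ratio, order in zip(NETWORK_OVERHEAD_NS, ratios, orders):
    print(f"{net} ns / {PCIE_OVERHEAD_NS} ns = {ratio:.0f}x (about {order:.1f} orders of magnitude)")
```

So the network fabrics add between 100x and 150x the overhead of PCIe switching, which rounds to two orders of magnitude.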

Sumit made a point of saying that Liqid is a software company. Liqid software runs on switch hardware (currently Mellanox Ethernet/InfiniBand switches) or their PCIe switches.

But given their solution can require HBAs, JBoGs and potentially PCIe switches there’s at least some hardware involved. But for Ethernet and InfiniBand their software runs in the Mellanox switch gear. Liqid control software has a CLI, GUI and supports an API.

Liqid supports any style of GPU (NVIDIA, AMD or ?). And as far as they were concerned, anything that could be plugged into a PCIe bus was fair game to be disaggregated and become composable.

Solutions using Liqid

Their solution is available from a number of vendors. And at last week’s DTW 2019, Liqid announced a new OEM partnership with Dell EMC. So now, you can purchase composable infrastructure directly from Dell. Liqid’s route to market is through their partner ecosystem and Dell EMC is only the latest.

Sumit mentioned a number of packaged solutions and one that sticks in my mind was an AI appliance pod solution (sold by Dell) that uses Liqid to compose a training data ingestion environment at one time, a data cleaning/engineering environment at another time, an AI deep learning/model training environment at another time, and then a scalable inferencing engine after that. Something that can conceivably do it all, an almost all in one AI appliance.

Sumit said that these types of solutions would be delivered in 1/4, 1/2, or full racks and with multi-fabric could span racks of data center infrastructure. The customer ultimately gets to configure these systems with whatever hardware they want to deploy, JBoGs, JBoFs, JBoFPGAs, JBoAIengines, etc.

The podcast runs ~42 minutes. Sumit was very knowledgeable about data center infrastructure and how composability could solve many of the problems of today. Some composability use cases he mentioned could apply to just about any data center. Ray and Sumit had a good conversation about the technology. Both Greg and I felt Liqid’s technology represented the next step in data center infrastructure evolution. Listen to the podcast to learn more.

Sumit Puri, CEO & Co-founder, Liqid, Inc.

Sumit Puri is CEO and Co-founder at Liqid. An industry veteran with over 20 years of experience, Sumit has been focused on defining the technology roadmaps for key industry leaders including Avago, SandForce, LSI, and Toshiba.

Sumit has a long history with bringing successful products to market with numerous teams and large-scale organizations.

This is our first time talking with David Friend, (@Wasabi_Dave) Co-founder and CEO, Wasabi Technologies, but he certainly knows his way around storage. He has started a number of successful companies, the last one prior to Wasabi was Carbonite, a cloud backup company.

Before we get to the podcast, Howard Marks has retired from active GreyBeards co-hosting duty and has become Co-host Emeritus. We will all miss him and his astute insight. We wish him well. Howard did volunteer to be a co-host on the occasional podcast. So he will be back, just not as a regular co-host anymore.

In his stead, Ray’s recruited a band of technical wizards that will share co-hosting duties with Howard. This is our first podcast with a new co-host, Matt Leib (@MBLeib). If you want to learn more about Matt, his bio is on our About page and his website is linked in our menu above. Matt’s been a long time friend and chief IT architect for a number of firms in the past and present. Although he might not sport a grey beard, Matt certainly qualifies as an IT GreyBeard from our perspective.

One of the many things that makes Wasabi cloud storage special is that it has no egress charges. Dave spent a lot of time after his last company talking to customers about cloud storage. Their number one complaint was unpredictable expense. Public cloud storage expense is unpredictable because it’s hard to predict data egress. With Wasabi cloud storage, customers get a one line invoice charge, just for the amount of data they are storing.

They also support immediate consistency. David said when customer applications receive an ack, their data has been received and can be read back from anywhere in the world. Most other cloud storage vendors only support eventual consistency, which means “sometime” later the data on your cloud storage will be updated.

Wasabi does not support cloud compute. However, they do have software partners that can provide this. In some cases, these partners share proximity to Wasabi cloud data centers so access latencies can be minimized.

Their storage interface is fully S3 compliant and, as mentioned above, they have a number (>100) of “certified” software partners that can provide application storage access services, rather than having to use the S3 interface directly. Further, Wasabi supports both CommVault and Veeam for data protection cloud storage tiering.

Wasabi is also faster than AWS S3 storage because they’ve taken the time to optimize their writing to understand disk geometry, seeking and head switching. There are upsides and downsides to this level of optimization. Yes, you can write and subsequently read data faster, but every new disk that comes along requires work to optimize to its unique geometry. For an example of their performance, David said that they can support direct surveillance camera video at 4K or 8K resolution to Wasabi cloud storage.

They are also cheaper than AWS S3. Dave mentioned Wasabi cloud storage is 1/5th the cost, on a GB/month basis, of AWS S3. We asked about Glacier support and he said at these prices, why add the complexity of another storage media.
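The effect of egress charges on bill predictability can be sketched with a simple cost model. Every rate and volume below is a hypothetical placeholder in abstract cost units, not an actual price from Wasabi or AWS; the only number taken from the discussion is the 1/5th storage cost ratio.

```python
def monthly_bill(stored_gb, egress_gb, storage_rate, egress_rate=0.0):
    """Bill for one month: a storage charge plus an optional per-GB egress charge."""
    return stored_gb * storage_rate + egress_gb * egress_rate


# Hypothetical placeholder rates (abstract cost units, NOT real prices).
METERED_STORAGE_RATE = 5.0                     # per GB-month, provider that bills egress
METERED_EGRESS_RATE = 10.0                     # per GB read out of the cloud
FLAT_STORAGE_RATE = METERED_STORAGE_RATE / 5   # the "1/5th the cost" ratio quoted above

stored_gb = 1000                  # hypothetical: the same data stored every month
egress_by_month = [0, 200, 50]    # hypothetical: egress varies month to month

metered = [
    monthly_bill(stored_gb, egress, METERED_STORAGE_RATE, METERED_EGRESS_RATE)
    for egress in egress_by_month
]
flat = [monthly_bill(stored_gb, egress, FLAT_STORAGE_RATE) for egress in egress_by_month]

print("with egress charges:", metered)
print("storage only:       ", flat)
```

With egress billed, the invoice moves with however much data was read that month. With storage-only billing it stays constant for a constant amount of stored data, which is the predictability customers were asking for.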

Wasabi has 3 data centers, one in Virginia, one on the west coast of the US and the other in Amsterdam in Europe. The one in Europe is fully GDPR compliant. Wasabi supports data at rest encryption where the customer owns and holds encryption keys. They also support a WORM bucket, where you can supply an expiration date and the data will remain unmodified until that time has expired.
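The WORM bucket behavior described above can be illustrated with a small in-memory sketch. This is a toy model under our own assumptions: the WormBucket class and its methods are hypothetical and are not Wasabi's API or implementation.

```python
from datetime import datetime, timezone


class WormBucket:
    """Toy write-once-read-many store: objects cannot change until they expire."""

    def __init__(self):
        self._objects = {}  # key -> (data, retain_until)

    def _locked(self, key, now):
        entry = self._objects.get(key)
        return entry is not None and now < entry[1]

    def put(self, key, data, retain_until, now=None):
        """Store an object; overwriting is refused while retention is active."""
        now = datetime.now(timezone.utc) if now is None else now
        if self._locked(key, now):
            raise PermissionError(f"{key} is retained until {self._objects[key][1]}")
        self._objects[key] = (data, retain_until)

    def delete(self, key, now=None):
        """Delete an object; refused while retention is active."""
        now = datetime.now(timezone.utc) if now is None else now
        if self._locked(key, now):
            raise PermissionError(f"{key} is retained until {self._objects[key][1]}")
        self._objects.pop(key, None)

    def get(self, key):
        return self._objects[key][0]


expiry = datetime(2030, 1, 1, tzinfo=timezone.utc)
before = datetime(2029, 6, 1, tzinfo=timezone.utc)
after = datetime(2030, 6, 1, tzinfo=timezone.utc)

bucket = WormBucket()
bucket.put("backup.tar", b"v1", retain_until=expiry, now=before)

try:
    bucket.put("backup.tar", b"v2", retain_until=expiry, now=before)
    overwrite_blocked = False
except PermissionError:
    overwrite_blocked = True

data_during_retention = bucket.get("backup.tar")
bucket.delete("backup.tar", now=after)  # allowed once the retention date has passed
```

Passing the clock in as a parameter keeps the example deterministic; a real service would enforce the retention date on the server side.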

Matt asked if Wasabi could be used to replace all the storage at a data center. David said possibly for file data but not for block. However, customers would need to be aware that access latencies may suffer if they are far away from Wasabi data centers.

The podcast runs ~42 minutes. We feel that David qualifies as a GreyBeard. Ray and David could have talked at length about optimizing disk storage performance. Also, this was Matt’s first time as a GreyBeard co-host and we think he did just fine. Listen to the podcast to learn more.

David Friend, Co-Founder and CEO, Wasabi Cloud Systems

David Friend is the co-founder and CEO of Wasabi, a revolutionary cloud storage company. David’s first company, ARP Instruments, developed synthesizers used by Stevie Wonder, David Bowie and Led Zeppelin, and even helped Steven Spielberg communicate with aliens, providing that legendary five-note communication in Close Encounters of the Third Kind.

Friend founded or co-founded five other companies: Computer Pictures Corporation – an early player in computer graphics, Pilot Software – a company that pioneered multidimensional databases for crunching large amounts of customer data, Faxnet – which became the world’s largest provider of fax-to-email services, Sonexis – a VoIP conferencing company, and immediately prior to Wasabi, what is now one of the world’s leading cloud backup companies, Carbonite.

David is a respected philanthropist. He is on the board of Berklee College of Music, where there is a concert hall named in his honor, and serves as president of the board of Boston Baroque, an orchestra and chorus that has received 7 Grammy nominations. An avid mineral and gem collector, he donated the Friend Gem and Mineral Hall at the Yale Peabody Museum of Natural History.

David graduated from Yale and attended the Princeton University Graduate School of Engineering where he was a David Sarnoff Fellow.

We haven’t talked with Tom Lyon (@aka_pugs) or Brian Pawlowski before on our show but both Howard and I know Brian from his prior employers. Tom and Brian work for DriveScale, a composable infrastructure software supplier.

There’s been a lot of press lately on NVMeoF and the GreyBeards thought it would be a good time to hear about another way to supply DAS like performance and functionality. Tom and Brian have been around long enough to qualify as greybeards in their own right.

The GreyBeards have heard of composable infrastructure before but this was based on PCIe switching hardware and limited to a rack or less of hardware. DriveScale is working with large enterprises and their data centers full of hardware.

Composable infrastructure has many definitions but the one DriveScale probably prefers is that it manages resource pools of servers and storage, that can be combined, per request, to create any mix of servers and DAS storage needed by an application running in a data center. DriveScale is targeting organizations that have from 1K to 10K servers with from 10K to 100K disk drives/SSDs.

Composable infrastructure for large enterprises

DriveScale provides large data centers the flexibility to better support workloads and applications that change over time. That is, these customers may, at one moment, be doing big data analytics on PBs of data using Hadoop, and the next, using MongoDB or another advanced solution to further process the data generated by Hadoop.

In these environments, having standard servers with embedded DAS infrastructure may be overkill and will cost too much. For example, because one has no way to reconfigure 1000 servers’ storage for each application that comes along without exerting lots of person-power, enterprises typically over provision storage for those servers, which leads to higher expense.

But if one had some software that could configure 1 logical server or 10,000 logical servers, with the computational resources, DAS disk/SSDs, or NVMe SSDs needed to support a specific application, then enterprises could reduce their server and storage expense while at the same time providing applications with all the necessary hardware resources.

When that application completes, all those hardware resources could be returned back to their respective pools and used to support the next application to be run. It’s probably not that useful when an enterprise only runs one application at a time, but when you have 3 or more running at any instant, then composable infrastructure can reduce hardware expenses considerably.
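The savings argument can be sketched with a little arithmetic. With dedicated hardware, each application must be provisioned for its own peak; with a shared pool, the data center only needs enough to cover the busiest moment across all applications. The demand figures below are hypothetical and exist only to illustrate the comparison.

```python
# Hypothetical drive demand per application over three time slots.
demand = {
    "hadoop":  [800, 200, 100],
    "mongodb": [100, 700, 200],
    "archive": [100, 100, 600],
}

# Dedicated (siloed) hardware: every application is sized for its own peak.
dedicated_drives = sum(max(slots) for slots in demand.values())

# Composable pool: size for the peak of the combined demand in any one slot.
combined_per_slot = [sum(slot) for slot in zip(*demand.values())]
pooled_drives = max(combined_per_slot)

print("dedicated:", dedicated_drives)
print("pooled:   ", pooled_drives)
print("saved:    ", dedicated_drives - pooled_drives)
```

The pooled figure can never exceed the dedicated one, and the gap grows when applications peak at different times. With a single application the two numbers are identical, which matches the point that composability pays off once several applications share the hardware.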

DriveScale composable infrastructure

DriveScale is a software solution that manages three types of resources: servers, disk drives, and SSDs over high speed Ethernet networking. SAS disk drives and SAS SSDs are managed in an EBoD/EBoF (Ethernet (iSCSI to SAS) bridge box) and NVMe SSDs are managed using JBoFs and NVMeoF/RoCE.

DriveScale’s composer orchestrator self-discovers all hardware resources in a data center that it can manage. It uses an API to compose logical servers from the server, disk and SSD resources available under its control throughout the data center.

Using Ethernet switching any storage resource (SAS disk, SAS SSD or NVMe SSD) can be connected to any server operating in the data center and be used to run any application.

There’s a lot more to DriveScale software. They don’t sell hardware, but have a number of system integrators (like Dell) that sell their own hardware and supply DriveScale software to run a data center.

The podcast runs ~44 minutes. The GreyBeards could have talked with Tom and Brian for hours and Brian’s very funny. They were extremely knowledgeable and have been around the IT industry almost since the beginning of time. They certainly changed the definition of composable infrastructure for both of us, which is hard to do. Listen to the podcast to learn more.

Tom Lyon, Co-Founder and Chief Scientist

Tom Lyon is a computing systems architect, a serial entrepreneur and a kernel hacker.

Prior to founding DriveScale, Tom was founder and Chief Scientist of Nuova Systems, a start-up that led a new architectural approach to systems and networking. Nuova was acquired in 2008 by Cisco, whose highly successful UCS servers and Nexus switches are based on Nuova’s technology.

He was also founder and CTO of two other technology companies. Netillion, Inc. was an early promoter of memory-over-network technology. At Ipsilon Networks, Tom invented IP Switching. Ipsilon was acquired by Nokia and provided the IP routing technology for many mobile network backbones.

As employee #8 at Sun Microsystems, Tom was there from the beginning, where he contributed to the UNIX kernel, created the SunLink product family, and was one of the NFS and SPARC architects. He started his Silicon Valley career at Amdahl Corp., where he was a software architect responsible for creating Amdahl’s UNIX for mainframes technology.

Brian Pawlowski, CTO

Brian Pawlowski is a distinguished technologist, with more than 35 years of experience in building technologies and leading teams in high-growth environments at global technology companies such as Sun Microsystems, NetApp and Pure Storage.

Before joining DriveScale as CTO, Brian served as vice president and chief architect at Pure Storage, where he focused on improving the user experience for the all-flash storage platform provider’s rapidly growing customer base. He also was CTO at storage pioneer NetApp, which he joined as employee #18.

Brian began his career as a software engineer for a number of well-known technology companies. Early in his days as a technologist, he worked at Sun, where he drove the technical analysis and discussion on alternate file systems technologies. Brian has also served on the board of trustees for the Anita Borg Institute for Women and Technology as well as on the board of the Linux Foundation.

Brian studied computer science at Arizona State University, physics at the University of Texas at Austin, as well as physics at MIT.

We’ve talked with Frederic before (see: Episode #33 on HPC storage) but since then, he has worked for an analyst firm and now he’s back on his own again, at HighFens. Given all the interest of late in AI, machine learning and deep learning, we thought it would be a great time to catch up and have him shed some light on deep learning and what it needs for IT infrastructure.

Frederic has worked on HPC / Big Data / AI / IoT solutions in the speech recognition industry, providing speech recognition services for some of the largest organizations in the world. As I understand it, the last speech recognition AI application he worked on implemented deep learning.

A brief history of AI

Frederic walked the Greybeards through the history of AI from the dawn of computing (1950s) until the recent emergence of deep learning (2010).

He explained that, early on, one could implement a chess playing program using hand coded rules based on a chess expert’s playing technique. Later, when machine learning came out, one could use statistical analysis on multiple games and limited rule creation to teach an AI machine learning system how to play chess. With deep learning (DL), all you have to do now is to feed a DL model all the games you have and it learns how to play chess well all by itself. No rule making needed.

AI DL training and deployment infrastructure

Frederic described some of the infrastructure and data needs for various phases of an industrial scale, AI DL workflow.

Training deep learning models takes data, and the more, the better. Gathering/saving large amounts of data used for DL training is a massive write workload and at the end of that process, hopefully you have PBs of data to work with.

Selecting DL training data from all those PBs, involves a lot of mixed read and write IO. In the end, one has selected and extracted the data to use to train your DL models.

During DL training, IO needs are all about heavy data read throughput. But there’s more. In the latter half of the talk, Frederic talked about the need to keep expensive GPU cores busy, and that requires sophisticated caching or Tier 0 storage supporting low latency IO.

Ray’s been doing a lot of blogging and other work on AI machine and deep learning (e.g., see Learning machine learning – parts 1, 2, & 3) so it was great to hear from Frederic, a real practitioner of the art. Frederic (with some of Ray’s help) explained the deep learning training process. But it wasn’t detailed enough for Howard, so per Howard’s request, we went deeper into how it really works.

Once you have a DL model trained and working within specifications (e.g., prediction accuracy), Frederic said deploying DL models into production involves creating two separate clusters. One devoted to deep learning model inferencing, which takes in data from the world and performs inferencing (prediction, classification, interpretations, etc.) and the other uses that information for model adaption to fine tune DL models for specific instances.

Adaption and inferencing were both read and write IO workloads and the performance of this IO was dependent on a specific model’s use.

Model adaption would personalize model predictions for each and every person, car, genotype, etc. This would be done periodically (based on SLAs, e.g. every 4 hrs). After that, a new, adapted model could be introduced into production, adapted for that specific person/car/genotype.

If the adaption applied more generally, that data and its human-machine validated/vetted prediction, classification, interpretation, etc. would be added back into the DL model training set to be used the next time a full model training pass was to be done. Frederic said AI DL model training is never done.

Sometime later, all this DL training, production and adaption data needs to be archived for long term access.
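The IO profile of each workflow phase described above can be summarized in a small lookup structure. This is just our own restatement of the discussion as data; the Phase type, the pattern labels and the helper function are hypothetical names, and the archive phase's IO pattern was not characterized in the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Phase:
    name: str
    io_pattern: str    # our shorthand label for the dominant IO mix
    storage_need: str  # what the phase asks of the storage, per the discussion


WORKFLOW = [
    Phase("data gathering", "write-heavy", "capacity to land PBs of training data"),
    Phase("data selection", "mixed read/write", "select and extract the training set"),
    Phase("training", "read-heavy",
          "high read throughput plus caching or Tier 0 to keep GPUs busy"),
    Phase("inferencing", "mixed read/write", "depends on the specific model's use"),
    Phase("adaption", "mixed read/write", "depends on the specific model's use"),
    Phase("archive", "not characterized", "long term access to all of the above data"),
]


def phases_with(pattern):
    """Return the names of the workflow phases that share an IO pattern label."""
    return [phase.name for phase in WORKFLOW if phase.io_pattern == pattern]


print(phases_with("write-heavy"))
print(phases_with("read-heavy"))
print(phases_with("mixed read/write"))
```

Laying the phases out this way shows why a single storage tier rarely fits the whole workflow: the dominant IO mix changes at almost every step.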

We then discussed the recent offerings from NVIDIA and major storage vendors that package up a solution for AI deep learning. It seems we are seeing another iteration of Converged Infrastructure, only this time for AI DL.

Finally, over the course of Ray’s AI DL education, he had come to the belief that AI deep learning could be applied by anyone. Frederic corrected Ray stating that AI deep learning should be applied by anyone.

The podcast runs ~44 minutes. Frederic’s been an old friend of Howard’s and Ray’s, since before the last podcast. He’s one of the few people in the world that the GreyBeards know who has real world experience in deploying AI DL at industrial scale. Frederic’s easy to talk with and very knowledgeable about the intersection of AI DL and IT infrastructure. Howard and I had fun talking with him again on this episode. Listen to the podcast to learn more.

Frederic Van Haren

Frederic Van Haren is the Chief Technology Officer @ HighFens. He has over 20 years of experience in high tech and is known for his insights in HPC, Big Data and AI from his hands-on experience leading research and development teams. He has provided technical leadership and strategic direction in the Telecom and Speech markets.

He spent more than a decade at Nuance Communications building large HPC and AI environments from the ground up and is frequently invited to speak at events to provide his vision on the HPC, AI, and storage markets. Frederic has also served as the president of a variety of technology user groups promoting the use of innovative technology.

As an engineer, he enjoys working directly with engineering teams from technology vendors and on challenging customer projects.

Frederic lives in Massachusetts, USA but grew up in the northern part of Belgium where he received his Masters in Electrical Engineering, Electronics and Automation.

In this, our year-end industry wrap up episode, we discuss trends and technology impacting the IT industry in 2018 and what we can see ahead for 2019, and first up is NVMeoF.

NVMeoF has matured

In prior years, NVMeoF was coming from startups, but last year it was major vendors like IBM FlashSystem, Dell EMC PowerMAX and NetApp AFF releasing new NVMeoF storage systems. Pure Storage was arguably earliest with their NVMeoF JBOF.

Dell EMC, IBM and NetApp were not far behind this curve and no doubt see it as an easy way to reduce response time without having to rip and replace enterprise fabric infrastructure.

In addition, NVMeoF standards have finally started to stabilize. With the gang of startups, standards weren’t as much of an issue as they were more than willing to lead, ahead of standards. But major storage vendors prefer to follow behind standards committees.

As another example, VMware showed off an NVMeoF JBOF for vSAN. A JBoF like this improves vSAN storage efficiency for small clusters. Howard described how this works, but the gist is that with vSAN having direct access to shared storage, it can reduce data and server protection requirements for storage, especially when dealing with the small clusters of servers that are becoming more popular these days to host application clusters.

The other thing about NVMeoF storage is that NVMe SSDs have also become very popular. We are seeing them come out in everyone’s servers and storage systems. Servers (and storage systems) hosting 24 NVMe SSDs are just not that unusual anymore. For the price of a PCIe switch, one can have blazingly fast, direct access to TBs of NVMe SSD storage.

HCI reaches critical mass

HCI has also moved out of the shadows. We recently heard news that HCI is outselling CI. Howard and I attribute this to the advances made in VMware’s vSAN 6.2 and the appliance-ification of HCI. That and we suppose NVMe SSDs (see above).

HCI makes an awful lot of sense for the application clusters that VMware is touting these days. CI was easy but an HCI appliance cluster is much simpler to deploy and manage.

For VMware HCI, vSAN Ready Nodes are available from just about any server vendor in existence. With ready nodes, VARs and distributors can offer an HCI appliance in the channel, just like the majors. Yes, it’s not the same as a vendor supplied appliance, doesn’t have the same level of software or service integration, but it’s enough.

[If you want to learn more, Howard is doing a series of deep dive webinars/classes on HCI as part of his friend Ivan’s ipSpace.net. The 1st 2hr session was recorded 11 December, part 2 goes live 22 January, and the final installment on 5 February. The 1st session is available on demand to subscribers. Sign up here]

Computational storage finally makes sense

Howard and I 1st saw computational storage at FMS18 and we did a podcast with Scott Shadley of NGD systems. Computational storage is an SSD with spare ARM cores and DRAM that can be used to run any storage intensive, Linux application or Docker container.

Because it’s running in the SSD, it has lightning fast (even faster than NVMe) access to all the data on the SSD. Indeed, with 10s to 1000s of computational storage SSDs in a rack, each with multiple ARM cores, you can have many 1000s of cores available to perform your data intensive processing. Almost like GPUs, only for IO access to storage (SPUs?).

We tried this at one vendor in the 90s, executing some database and backup services outboard but it never took off. Then in the last couple of years (Dell) EMC had some VM services that you could run on their midrange systems. But that didn’t seem to take off either.

The computational storage we’ve seen all runs Linux. And with today’s data intensive applications coming from everywhere, and all the spare processing power in SSDs, it might finally make sense.

Futures

Finally, we turned to what we see coming in 2019. Howard was at an Intel Analyst event where they discussed Optane DIMMs. Our last podcast of 2018 was with Brian Bulkowski of Aerospike who discussed what Optane DIMMs will mean for high performance database systems and just about any memory intensive server application. For example, affordable, 6TB memory servers will be coming out shortly. What you can do with 6TB of memory is another question….