VMworld 2016 Big Data Series – Virtualizing Hadoop on vSphere

Virtualizing big data is always a lively topic in our discussions at VMware with our partners and customers. Interest is growing at quite a pace in this area and there have been some great new developments both at VMware and in the big data industry since last year.

This year we are highlighting our Hadoop partners in separate talks and also giving insights into the latest performance numbers on Spark and Machine Learning algorithms running on virtual machines. These types of applications are just about the most contemporary ones that we come across in the Hadoop/Big Data world these days, so this is very exciting information to have. There is a shortlist of talks that we would recommend you attend here. We look forward to seeing you at these talks in Las Vegas in late August.

Below is a short description of each of the Hadoop-related talks, to whet your appetite for the technical content you will see in them.

VIRT8239-QT: Things You Should Know about Big Data

Sunday 28th August, 2-3pm

This shorter talk introduces the topics that will be delved into in more detail in the other big data talks later in the week. It gives an overview of the Hadoop and Spark concepts for those who are new to them – and maps these concepts into virtual machine implementations. The talk briefly explores the reference architectures and the joint technical work on tools going on at VMware with a partner and Hadoop distro vendor. The talk also gives a brief overview of some of the interesting performance tests that have been conducted recently on Spark and other technologies.

VIRT7709: Innovations from Cloudera and VMware for Virtualizing Hadoop

Tuesday 30th August, 2-3pm

This technical talk is given by people from Cloudera and VMware, working together. These two companies have collaborated for some years now on various parts of the Hadoop ecosystem. From reference architecture to performance and tooling, there are common points of reference that the companies continue to work on together. The speaker from Cloudera will highlight some of the technical reference architectures and best practices. The VMware speaker will show some early impressions of collaborative work on infrastructure provisioning that the companies are doing together. This will give you a technical insight into the direction the partners are taking in the big data space.

VIRT8071: Making Virtualized Hadoop Deployments Successful

Tuesday 30th August, 2-3pm

This session describes how to optimize a virtualized Big Data platform. This is a topic that is applicable to cloud deployments as well. The talk is given by Hortonworks technical staff along with real-world Hadoop deployment management staff from VMware. There is emphasis on creating a proper operational partnership between the Hadoop Admins and VMware vSphere Admins, which is essential to your success. These two communities tend to view the infrastructure from different viewpoints. Between the two platforms, i.e. the virtualization and Hadoop layers, there could be many different parameters to be managed to make sure that optimal configurations are applied across the stack. Some parameters will be more critical than others. These are discussed in this talk and recommendations are given.

This talk provides a very useful update on the data VMware has been gathering on Hadoop’s performance on the virtualization platform. Contemporary workloads like Spark, Machine Learning and MapReduce version 2 will be used to show detailed test results that support the virtualization of your Hadoop-based workloads. Best practices for hardware choices, software configuration and parameter tuning will be explored in some depth here. This talk is essential to anyone who is serious about virtualizing Hadoop for production and other workloads. The session will also cover common use cases and benefits for virtualizing Hadoop. The talk concludes with a call for you to suggest further workload types that you consider important to test for performance.

STO8795: Ushering in a New Era of Hyper-converged Big Data – Hadoop on VSAN

Wednesday 31st August, 2:30 PM

This session delves into big data in a hyper-converged environment by running Hadoop on an all-flash VSAN cluster. Intel and VMware present the results of running Hadoop workloads on a standard all-flash VSAN cluster. VMware and Intel embarked on a joint project to show Hadoop workloads on VSAN based on high customer demand for running big data analytics on VSAN nodes. This testing was done on an 8-node all-flash VSAN cluster powered by some of the latest Intel technology. This talk will be useful to anyone who is considering VSAN as a basis for their Hadoop workloads. It compares and contrasts a set of different configurations that can be used, based on the results of the tests executed.

Virtualizing Big Data – Meet the Experts

Wednesday 31st August, 3-4pm

Big Data technical and product management people will be available at a scheduled time to chat with you about your Hadoop experiences.

About the Author

Justin Murray works as a Technical Marketing Manager at VMware and has been at
the company for over six years. Justin creates technical material and gives
guidance to customers and the VMware field organization to promote the
virtualization of big data workloads on VMware’s vSphere platform. Justin has
worked closely with VMware’s partner ISVs (Independent Software Vendors) to
ensure their products work well on vSphere and continues to bring best
practices to the field as the customer base for big data expands.