from VMware's performance team

Monthly Archives: February 2010

Paravirtualized SCSI (PVSCSI) is a high-performance storage adapter available in VMware vSphere 4. The PVSCSI adapter is best suited for virtual machines that run applications which generate heavy I/O.

The vSphere 4 performance study compares the performance of PVSCSI with LSI Logic for the Fibre Channel and software iSCSI protocols. The experiments show that PVSCSI greatly improves CPU efficiency and increases throughput when the workload drives very high I/O rates. For Fibre Channel, the test results show that PVSCSI reduces CPIO (CPU cost per I/O operation) by 10% to 30%. For iSCSI, PVSCSI reduces CPIO by up to 25%.
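CPIO normalizes CPU consumption by the amount of I/O completed, so an adapter can show lower CPIO even while it drives more throughput. The following is a minimal sketch, assuming CPIO is computed as host CPU cycles consumed per second divided by I/O operations per second. The utilization and IOPS figures are hypothetical illustrations, not measurements from the study:

```python
def cpio(cpu_util: float, cores: int, clock_hz: float, iops: float) -> float:
    """CPU cycles spent per I/O operation (assumed definition of CPIO).

    cpu_util: average host CPU utilization across the counted cores (0.0-1.0)
    cores:    number of physical cores included in cpu_util
    clock_hz: core clock rate in Hz
    iops:     I/O operations completed per second
    """
    cycles_per_second = cpu_util * cores * clock_hz
    return cycles_per_second / iops


def reduction(baseline: float, improved: float) -> float:
    """Fractional reduction of `improved` relative to `baseline`."""
    return (baseline - improved) / baseline


# Hypothetical numbers (not from the study): same IOPS, lower CPU use with PVSCSI.
lsi = cpio(cpu_util=0.40, cores=8, clock_hz=2.93e9, iops=100_000)
pvscsi = cpio(cpu_util=0.30, cores=8, clock_hz=2.93e9, iops=100_000)
print(f"LSI Logic: {lsi:,.0f} cycles per I/O")
print(f"PVSCSI:    {pvscsi:,.0f} cycles per I/O")
print(f"CPIO reduction: {reduction(lsi, pvscsi):.0%}")
```

Because CPIO divides by IOPS, the same comparison still holds when PVSCSI completes more I/O per second than LSI Logic.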

Every SAP system includes the SGEN transaction, which is used to recompile all the objects in the system after each system update. SAP Basis administrators run this job after a fresh install or upgrade of SAP. Because it requires users to be offline, the task should complete as quickly as possible to minimize downtime. SGEN uses a parallel processing design that is also found in other SAP business batch jobs. SGEN should not be confused with the standardized and popular SAP SD benchmark, which is representative of OLTP performance and is used as the basis for the standard SAPS rating.

To get a quick measure of how best practices could affect the performance of SAP batch jobs in vSphere VMs, I started with some unoptimized VMs to get a baseline. I then applied a series of best practices to the VMs and measured the effect of each on the performance of the SGEN transaction. The time for the SGEN transaction to complete dropped from about 2500 seconds to about 800 seconds, showing how much of an impact best practices can have on the performance of an intensive batch process.

The server used was a Dell PowerEdge M710 with 2 x quad-core Intel Xeon X5570 2.93 GHz processors and 72 GB of memory, running VMware vSphere 4. Two 4-vCPU VMs were created and configured with SuSE Enterprise Linux 10 64-bit. Oracle 10.2.0.2 was installed in the first VM. The second VM was configured as a SAP application server with NetWeaver 7.0 ABAP. In this two-VM configuration, the SGEN transaction was run on the app server VM, which in turn accessed the DB VM while processing the jobs. The SAP_BW set of objects was used for this set of tests.

The first set of best practices applied was VMware-specific. The second set was specific to SAP and consisted of adjusting the profile settings for the SAP application server instance. The final adjustment was to give the app server VM more resources. The results below show the SGEN completion time after each change was made:

Time to complete the SGEN transaction (seconds), by configuration:

Baseline (4 vCPUs, e1000, 10 dialog work processes): 2491
Install VMware Tools: 2421
Upgrade to VMXNET 3: 1451
Increase dialog work processes to 20: 1097
Increase dialog work processes to 30: 1021
Increase vCPUs to 8 and work processes to 40: 813
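The per-step savings can be checked directly from these completion times. The short Python sketch below uses only the numbers listed above. The helper name `step_reductions` is illustrative:

```python
# SGEN completion times (seconds) from the results above, in the order applied.
results = [
    ("Baseline: 4 vCPUs, e1000, 10 dialog WPs", 2491),
    ("Install VMware Tools", 2421),
    ("Upgrade to VMXNET 3", 1451),
    ("Increase dialog WPs to 20", 1097),
    ("Increase dialog WPs to 30", 1021),
    ("Increase vCPUs to 8 and WPs to 40", 813),
]


def step_reductions(rows):
    """Yield (label, seconds saved, percent saved) for each change vs. the previous step."""
    for (_, prev), (label, cur) in zip(rows, rows[1:]):
        saved = prev - cur
        yield label, saved, 100.0 * saved / prev


for label, saved, pct in step_reductions(results):
    print(f"{label:<36} -{saved:>4} s ({pct:4.1f}%)")

baseline, final = results[0][1], results[-1][1]
print(f"Overall speedup: {baseline / final:.2f}x")
```

The VMXNET 3 upgrade saves the most time (970 seconds, about 40%), followed by the first increase in dialog work processes (354 seconds). Overall, the job runs about 3.06 times faster than the baseline.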

There are several jumps in performance. The first comes from upgrading to the VMXNET 3 virtual adapter instead of the default e1000 adapter. In this network-intensive application, where lots of packets are passed between the app and DB VMs, the better performance of the VMXNET 3 adapter makes a big difference.

The other big jump in performance comes from increasing the number of dialog work processes for the application instance. Monitoring CPU utilization with "top" inside the guest and "esxtop" from the ESX console, along with the SAP dialog queue (with SAP's dpmon tool), made it clear that the app server VM was lightly loaded with its initial configuration of 10 processes. Increasing the number of SAP dialog instance work processes until the app server VM's CPU was nearly saturated greatly reduced the time for the SGEN job to complete.
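The tuning approach described here, raising the work process count until the app server CPU is nearly saturated, can be sketched as a simple loop. This is a hypothetical outline, not an SAP tool. `measure_util` stands in for reconfiguring the instance (dialog work processes are typically set with the `rdisp/wp_no_dia` profile parameter), running the workload, and reading CPU utilization from top or esxtop. The 90% target and the step of 10 are assumptions chosen to mirror the steps used in these tests:

```python
from typing import Callable


def tune_dialog_wps(
    measure_util: Callable[[int], float],
    start: int = 10,
    step: int = 10,
    target: float = 0.90,
    max_wps: int = 100,
) -> int:
    """Raise the dialog work process count until app server CPU nears saturation.

    measure_util is a hypothetical callback: it applies the given work process
    count, runs the workload, and returns average guest CPU utilization (0.0-1.0).
    Returns the first count whose utilization reaches `target`, or `max_wps`
    as a safety cap if the target is never reached.
    """
    wps = start
    while wps < max_wps and measure_util(wps) < target:
        wps += step
    return wps
```

For example, with a toy utilization model that grows linearly with the process count, `tune_dialog_wps(lambda n: min(1.0, n * 0.032))` stops at 30 work processes.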

While the CPU utilization of the app server VM was high, the DB VM was still only about half utilized. To push more work over to the DB VM, the number of vCPUs on the app server VM was increased from 4 to 8, and the number of work processes was increased to 40. This change further reduced the SGEN completion time. The vSphere Performance Best Practices Guide (page 21) states that a VM with a number of vCPUs less than or equal to the number of cores in each NUMA node will be able to take advantage of NUMA optimizations and have the best performance. This SGEN workload is CPU-intensive enough to keep improving even though the number of vCPUs (8) exceeds the number of cores in a NUMA node (4). However, to really determine whether this is the "best performance", another test would need to be run comparing a single 8-vCPU app server VM with two 4-vCPU app server VMs.
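The NUMA guideline above reduces to comparing a VM's vCPU count with the number of physical cores in one NUMA node. The following is a minimal sketch, assuming one NUMA node per socket (as on this two-socket Xeon X5570 host) and counting physical cores only, not hyper-threads. The `Host` class and `fits_in_numa_node` helper are illustrative names:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Host:
    sockets: int
    cores_per_socket: int

    @property
    def cores_per_numa_node(self) -> int:
        # Assumes one NUMA node per socket, counting physical cores only.
        return self.cores_per_socket


def fits_in_numa_node(vcpus: int, host: Host) -> bool:
    """True if a VM's vCPU count does not exceed the cores in a single NUMA node."""
    return vcpus <= host.cores_per_numa_node


m710 = Host(sockets=2, cores_per_socket=4)  # 2 x quad-core Xeon X5570
for vcpus in (4, 8):
    print(f"{vcpus} vCPUs fit in one NUMA node: {fits_in_numa_node(vcpus, m710)}")
```

On the M710, the 4-vCPU VMs fit within one node, while the 8-vCPU app server VM spans both. That is why comparing it against two 4-vCPU app server VMs would be the meaningful follow-up test.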

These tests show that applying both VMware and SAP best practices can make a big difference in performance. In resource-intensive workloads such as the SAP SGEN transaction, using the best-performing virtual NIC and adjusting key application settings based on performance monitoring are essential to achieving the best performance.