You may be asking yourself, how does this address my cluster computing needs? Does the Windows OFED stack released by Mellanox provide the same performance seen on the Linux OFED stack release?

Well, the Windows networking stack is optimized to address the needs of various HPC vertical segments. In our benchmark tests with MPI applications that require low latency and high performance, latency is in the low 1us range with uni-directional bandwidth of 3GByte/sec using Microsoft’s MS-MPI.

Mellanox’s 40Gb/s InfiniBand adapters (ConnectX) and switches (InfiniScale IV), with their proven performance efficiency and scalability, allow data centers to scale up to tens of thousands of nodes with no drop in performance. Our drivers and Upper Level Protocols (ULPs) allow end users to take advantage of the RDMA networking available in Windows® HPC Server 2008.

As previously suggested, in this post I will review a different application, one that is focused on converting protocols. QuickTransit, developed by a company called Transitive (recently acquired by IBM), is a cross-platform virtualization technology which allows applications that have been compiled for one operating system and processor to run on servers that use a different processor and operating system, without requiring any source code or binary changes.

We are using QuickTransit for Solaris/SPARC-to-Linux/x86-64. We tested it for latency with a basic test that reflects how the financial industry operates and that exercises the performance of the interconnect between servers.

The topology we used was 2 servers (the 1st acting as server and the 2nd as client). We measured latency with different object sizes and rates over the following interconnects: GigE, Mellanox ConnectX VPI 10GigE, and Mellanox ConnectX VPI 40Gb/s InfiniBand. I would like to reiterate, for any of you who have not read the first posts, that we’re committed to our guideline of “out-of-the-box”, meaning that neither the application nor any of the drivers are to be changed after downloading them from the web.
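
For readers who want to try a comparable measurement on their own interconnect, here is a minimal sketch of a ping-pong latency test that sweeps over message sizes. It is not the financial test we ran and not part of QuickTransit; the function names, message sizes and round counts are all assumptions chosen for illustration, and it runs both ends on loopback so it works on a single machine. It varies only the message size, not the transmission rate.

```python
import socket
import threading
import time


def recv_exact(sock, n):
    """Read exactly n bytes from sock, or raise if the peer closes early."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        buf.extend(chunk)
    return bytes(buf)


def echo_server(listener, size, rounds):
    """Accept one client and echo back `rounds` messages of `size` bytes."""
    conn, _ = listener.accept()
    with conn:
        for _ in range(rounds):
            conn.sendall(recv_exact(conn, size))


def measure_latency(host, port, size, rounds):
    """Return the mean half round-trip time in microseconds."""
    payload = b"x" * size
    with socket.create_connection((host, port)) as sock:
        # Disable Nagle's algorithm so small messages are sent immediately.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        start = time.perf_counter()
        for _ in range(rounds):
            sock.sendall(payload)
            recv_exact(sock, size)
        elapsed = time.perf_counter() - start
    return elapsed / rounds / 2 * 1e6


def run_sweep(sizes=(64, 1024, 16384), rounds=200):
    """Measure latency for each message size (hypothetical sizes) on loopback."""
    results = {}
    for size in sizes:
        listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
        listener.listen(1)
        port = listener.getsockname()[1]
        server = threading.Thread(target=echo_server, args=(listener, size, rounds))
        server.start()
        results[size] = measure_latency("127.0.0.1", port, size, rounds)
        server.join()
        listener.close()
    return results


if __name__ == "__main__":
    for size, usec in run_sweep().items():
        print(f"{size:>6} bytes: {usec:.1f} us")
```

To compare interconnects, one would run `echo_server` on one host and `measure_latency` on the other, pointing the client at the address bound to the interface under test. Because the sketch uses plain TCP sockets, it needs no changes to run over GigE, 10GigE or IPoIB.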

With InfiniBand we used 3 different Upper Layer Protocols (ULPs), none requiring code intervention: IPoIB connected mode (CM), IPoIB datagram mode (UD), and Sockets Direct Protocol (SDP). The results were stunning, mainly because our assumption was that with all the layers of software, in addition to the software which converts SPARC Solaris code to x86 Linux code, the interconnect would have little impact, if any.

We’ve learned that 40Gb/s InfiniBand performance is significantly better than GigE across a wide range of packet sizes and transmission rates. We saw latency more than 2x lower when using InfiniBand, and 30% faster execution when using 10GigE. Go and beat that…

Let’s look at the results in a couple of different ways. In particular, let’s look at the size of the messages being sent: the advantage above relates to small message sizes (see graph #2), while for larger message sizes the advantage (which is already striking) becomes humongous.

In my next blog I plan to show more results that are closely related to the financial markets. If anyone out there identifies an application they would like our dedicated team to benchmark, please step forward and send me an e-mail.

The performance the Chilean Stock Exchange is seeing is really impressive: 3000 orders per second, with latency reduced by 100x from its current level. Latency is critical for the financial markets, and InfiniBand is certainly showing it is the data center connectivity platform of choice.

You don’t have to ask – vacation was awesome and as always not as long as we would like it to be.

Now that we’ve taken the rust off our fingers, we’ve made progress with a somewhat more complex testbed.

We decided to look at the virtualization space and run our next application on top of VMware ESX 3.5. The application we picked was Dell DVD Store, a complete online e-commerce test application with a backend database component, a web application layer, and driver programs. In order to stay in line with what is being used in the industry, we chose a 2-tier configuration using Microsoft SQL Server (running on VMware). This means we used (as you can see in the picture) 2 hosts running 20 virtual machines, Microsoft SQL Server, and the client driver.
The database was 1GB in size, serving 2,000,000 customers. During the testing we increased the number of virtual machines running the client driver from 2 to 20, and measured the number of orders generated per minute from the database.

The only change we made after the out-of-the-box deployment (which, if you recall, we set as our goal) was to add some scripts we developed for test execution and results analysis, so that we could run the tests more efficiently.

The results of our tests are shown in the graph below:
The results clearly show a greater than 10% benefit when using VPI (both 10GigE and 40Gb/s InfiniBand). We’ve added the “up to 10 VMs” results, but from our results it seems that the valid numbers (when jitter is not a factor) are those up to 8 VMs, and there appears to be a dependency on the number of cores in the systems running VMware.
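
For anyone repeating this kind of comparison, the benefit figure is simply the relative gain in orders per minute at each VM count. The sketch below shows one way to compute it. It is not the analysis script we used; the function names are hypothetical, the baseline is assumed to be GigE, and the numbers in the example are placeholders, not our measurements.

```python
def percent_gain(baseline, candidate):
    """Relative throughput gain of candidate over baseline, in percent."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (candidate - baseline) / baseline * 100.0


def gains_by_vm_count(baseline_opm, candidate_opm):
    """Map each VM count present in both runs to its percent gain.

    Both arguments are dicts of {vm_count: orders_per_minute}.
    """
    return {
        vms: percent_gain(baseline_opm[vms], candidate_opm[vms])
        for vms in sorted(baseline_opm.keys() & candidate_opm.keys())
    }


if __name__ == "__main__":
    # Placeholder numbers for illustration only; NOT measured results.
    gige = {2: 1000.0, 4: 2000.0}
    vpi = {2: 1100.0, 4: 2300.0}
    for vms, gain in gains_by_vm_count(gige, vpi).items():
        print(f"{vms} VMs: {gain:+.1f}%")
```

Computing the gain per VM count, rather than one overall average, makes it easier to spot the point where jitter starts to distort the numbers.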

In my next blog post I plan to review either a new application or another aspect of this application.
Nimrod Gindi
nimrodg@mellanox.com