Secure Web communication is possible through the utilization of the Secure Sockets Layer (SSL) protocol. Using the command "openssl speed rsa" we can measure the number of RSA public key operations (signs) that a system can perform per second.

While "openssl speed rsa" is sufficient to test the Xeons and Opterons, the Sun T1 can speed up the Rivest Shamir Adleman (RSA) and Digital Signal Algorithm (DSA) encryption and decryption operations needed for SSL processing, thanks to a modular arithmetic unit (MAU) that supports modular exponentiation and multiplication. Each T1 core has a MAU, thus one 8 core T1 has 8 MAUs. To make use of those 8 MAUs, you have run the SSL calculations through the Solaris Cryptographic Framework (SCF). To test the T1 with the MAU crunching at full speed we used the command: "openssl speed -engine pkcs11 rsa". The Solaris 10 OS also provides in-kernel SSL termination, offering greater security than SSL termination outside the kernel.

We included the HP DL585 to see whether 8 cores of complex general purpose CPUs (Opteron 880) can keep up with the 8 MAU of the Sun T1. If you want to compare Woodcrest and the Opteron, you should check the 2 and 4 concurrency numbers. You can find our 1024-bit numbers in the graph below. One thread per core is optimal, so we tested the DL585 with a maximum of 16 threads, to show you that the peak is attained at 8 threads. The Xeon Irwindale was tested with 8 threads to show you that 4 threads (4 logical cores) is optimal and so on.

Notice that the 8 MAUs of the Sun T1 can only get in full action if we fire off 32 "SSL RSA signing" threads. Once that happens, the little 1 GHz T1 is able to keep up with the massive 2.4 GHz 8 core DL585. Without MAU, the T1 is as fast as a 1.8 GHz Xeon Irwindale. It is thus very important to check that your favorite web server works with SCF if you want to run your secure web services on the Sun T2000.

It looks like we've discovered the first - but rather insignificant to most people - "weakness" of the new Core architecture: decryption and encryption. The Opteron at 2.4 GHz has no trouble keeping up with the 3 GHz Woodcrest. This might be a result of the fact that the Woodcrest can only perform one rotate per cycle, while the Opteron can do 3. Although the RSA algorithm doesn't really use rotations, the hash algorithms needed to sign or encrypt a key make use of rotations. However, the most important reason is probably that the Opteron can sustain 2 ADC (Add with Carry) instructions per clock cycle, while Woodcrest can only do one. As ADC is good for about 17% of the instruction mix of the RSA algorithm, this might be enough to negate the extra integer power (Memory disambiguation, 4 wide decode ...) that the Woodcrest has.

Also notice that the previous NetBurst architecture, represented by the Xeon Irwindale, does very badly. The reason is that the P4 doesn't have a barrel shifter, a circuit in the chip which can shift or rotate any number in one clock cycle. Without this shifter, rotates and shifts take much longer, resulting in high latency. Most x86 code couldn't care less, but most encrypting code makes heavy use of rotates or shifts or both. We also did a quick test with Hyper-Threading on and off. In this case Hyper-Threading sped up the encryption (signs/s) with 20 to 28%.

To end the RSA sign/s benchmark, we'll make a quick comparison between quad core AMD Opteron 2.4 GHz, quad-core Intel Xeon Woodcrest and Sun's T1 with MAU enabled across different RSA bit lengths.

RSA Encryption (Signs/s)

Opteron 2.4 GHz4 threads

Xeon 5160 3 GHz4 threads

SUN T1 with MAU32 threads

512 bit

19003

21194

35613

1024 bit

6098

6240

10722

2048 bit

1145

1087

1918

4096 bit

185

164

1

Notice that the hardware acceleration of the T1 does not work beyond 2048-bit keys. Considering that most secure applications use 1024-bit and only a few "high security" ones use 2048-bit, this is not an issue.

In case of doing verifies as opposed to signs, the server has to authenticate the identity of the client. This is a lot less intensive, and we'll show you the verifies per second numbers at 2048-bits. At 1024-bits length, both the Woodcrest and Opteron were able to verify more than 50000 keys per core, and that is a hard limit of the OpenSSL benchmark.

Again, the Opteron takes the lead. The Sun T1 even with the 8 MAUs is half as slow as four Opterons or Woodcrests, but this is hardly an issue. Encrypting or signing will slow down a server much quicker than verifying keys.

Both verifies/s and signs/s benchmark are rather synthetic. It is much more realistic to test with a real web server running SSL, and that is what we are currently doing. We followed Sun's instructions to enable RSA hardware acceleration for Apache, but for some reason, the Apache web server is still not making use of the Solaris Cryptographic Framework. So our Web server SSL test is work in progress.

What you're saying in general is irrelevant. Intel calls their integrated graphics "high performance" but that doesn't make it so.

MSI calling that a server board is just marketing, it does not represent what a true, high performance server class mobo is all about. Not that it's a bad piece of hardware, it is good for the price to be sure. But it is NOT a server class product. Reply

One of the motherboards used in this review is a cheap piece that trades performance to keep price low.

Why was that motherboard selected over mainstream server/workstation boards that are proven to offer slightly better performance? Why pick a 250$ MSI board for opteron over $500 boards from Tyan, Iwill, Supermicro or others. The Intel Xeon "Inderwale" gets a $500 board, so price could not have been the issue.

So what's the point in using a Single Channel board for this benchmark, when price was not a limitation?

Single memory channel boards like the one from MSI, are known to offer lower performance than dual / dedicated memory channel boards when used in 2P Opteron configurations. Dual Channel boards are the mainstream boards for 2P Opteron systems. There are plently Server boards available in Dual / dedicated memory lane configuration. There are enough reviews on the net to show the performance diff b/w single memory channel boards and dual memory channel boards

The issue is not about the MSI or its class, the issue is why did Anandtech pick a Single memeory channel board instead of a more mainstream dual memory channel board.