Oracle Blog

Blog for chichang1

Tuesday Nov 14, 2006

Normally you don't have to do this: Jan Pechanec's two excellent blogs explain why, and discuss OpenSSL in Solaris and the PKCS#11 engine patch that Sun contributed to the OpenSSL community. Please read those blogs first and you will probably stop here :-).

If, for some reason, you have to build your own OpenSSL from source and would still like to use the MAU (Modular Arithmetic Unit) in the UltraSPARC T1 on a Sun Fire T2000/T1000 to speed up RSA/DSA operations, here is what I did.

First, download the PKCS#11 engine patch from OpenSSL's Contribution area. There are three PKCS#11 engine patch entries at the bottom of the list: two for openssl-0.9.7d and one for openssl-0.9.7l. Each patch applies only to its matching release, OpenSSL 0.9.7d or 0.9.7l, and not to any other version. For OpenSSL 0.9.7d, use pkcs11_engine-0.9.7d.patch.2006-09-12.gz, as the other one suffers from Bug 6411001.

Before starting, there is one subtle thing to note: the patch command. This PKCS#11 engine patch requires GNU patch and does not work with the Solaris patch command, so we have to use /usr/bin/gpatch.

Assume you want to patch OpenSSL 0.9.7l. In the directory that contains the openssl-0.9.7l directory, unzip the patch:

(At this point, you can take a look at README.pkcs11. Most of the details are there.)

To build OpenSSL, you need to specify a PKCS#11 library on the system. On Solaris 10 or Solaris Express, it's libpkcs11.so under /usr/lib (32-bit) or /usr/lib/sparcv9 (64-bit). This is how I built 64-bit OpenSSL on a T2000 using Sun Studio cc:

Tuesday Dec 06, 2005

You might have heard that UltraSPARC T1 has special hardware circuitry to accelerate certain crypto operations. In this blog I will show you what operations it is good at, and how good.

UltraSPARC T1 comes with a Modular Arithmetic Unit (MAU) per core, which accelerates the expensive modular arithmetic operations found in public-key crypto algorithms such as RSA, DSA and DH. In Solaris, the MAU is reached through the Niagara Cryptographic Provider (NCP) within the Solaris Cryptographic Framework (SCF). Currently NCP supports only RSA (up to 2048 bits) and DSA (up to 1024 bits).

On the Sun Fire T2000/T1000 with the UltraSPARC T1 processor, you can readily get a glimpse of the fast RSA operations performed by the MAU. Here is an example for the popular 1024-bit and 2048-bit RSA on a Sun Fire T2000 with a 1.2 GHz, 8-core UltraSPARC T1:
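A run of this kind can be sketched as follows. This is an illustration, not a transcript of the original session: it assumes that `openssl` on your PATH is the binary bundled with Solaris and that its PKCS#11 engine is registered under the id `pkcs11`. The wrapper name `speed_rsa` is hypothetical, and no output is reproduced here.

```shell
# Sketch: single-process RSA speed test routed through the PKCS#11 engine.
# Assumptions: "openssl" resolves to the Solaris-bundled binary, and the
# engine id is "pkcs11". speed_rsa is a hypothetical wrapper name.
speed_rsa() {
    openssl speed -engine pkcs11 rsa1024 rsa2048
}

# Nothing runs until the function is called explicitly:
#   speed_rsa
```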

This invokes the OpenSSL speed test bundled with Solaris. The OpenSSL bundled with Solaris has the PKCS#11 engine built in, which is necessary to access SCF (and thus the MAU). If you download the OpenSSL package and build it yourself, you will not be able to take advantage of the MAU, because that build does not have the PKCS#11 engine.

Let's examine the performance numbers above. What we just did was test single-threaded RSA performance, with each RSA operation run for 10 seconds. However, because of timing errors in the OpenSSL speed test in the single-threaded case, the throughput numbers at the bottom cannot be trusted when the operations are done in hardware. After some re-calculations we get:

Are these numbers good? They are actually very good. Take the 1024-bit RSA sign figure, 1033.2 signs/s, and compare it with the figure for a 3.6 GHz Xeon Dell box, 843.0 signs/s. UltraSPARC T1 offers about 23% more RSA performance at one third of the clock rate, and uses less power. Note that this is a single-threaded test. As shown below, UltraSPARC T1 really dwarfs the others in the multi-process test.
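The comparison above is plain arithmetic on the two quoted figures and the two clock rates. A minimal sketch, where the helper name `ratio` is just an illustrative choice:

```shell
# ratio A B: print A divided by B, rounded to two decimals.
# (ratio is a hypothetical helper name used only for this illustration.)
ratio() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a / b }'
}

ratio 1033.2 843.0   # T1 vs. Xeon 1024-bit RSA signs/s  -> 1.23
ratio 1.2 3.6        # T1 vs. Xeon clock rate in GHz     -> 0.33
```

In other words, the T1 delivers roughly 1.23 times the single-threaded sign throughput while running at one third of the clock rate.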

Now, let's look at multi-process RSA performance. This is where UltraSPARC T1 really shines. Run the OpenSSL speed test again, this time with the "-multi" option, which starts multiple processes that perform RSA operations concurrently:
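As a hedged sketch of such a run (again assuming the Solaris-bundled `openssl` is on your PATH and the engine id is `pkcs11`), the multi-process variant only adds `-multi` and a process count. The wrapper name `speed_rsa_multi` and its default of 32 are illustrative choices, and no output is reproduced here.

```shell
# Sketch: multi-process RSA speed test through the PKCS#11 engine.
# Assumptions: Solaris-bundled "openssl" on PATH, engine id "pkcs11".
# speed_rsa_multi is a hypothetical wrapper; its argument is the number
# of worker processes (default 32, one per T1 hardware thread).
speed_rsa_multi() {
    nprocs=${1:-32}
    openssl speed -engine pkcs11 -multi "$nprocs" rsa1024 rsa2048
}

# Nothing runs until the function is called explicitly:
#   speed_rsa_multi 32
```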

We used 32 processes to fully saturate the 32 hardware threads of the UltraSPARC T1 and get the maximum throughput. Compare this with the results on the 2-way 3.6 GHz Xeon Dell PowerEdge 2850 (with hyperthreading on):

For the 1024-bit RSA sign operation (as commonly used in web server SSL handshaking), the Sun Fire T2000 outperforms the Dell PowerEdge 2850 by a whopping 6x! UltraSPARC T1 also excels when compared with the Sun Crypto Accelerator 4000, which can do 8000 1024-bit RSA signs/s. And remember, all this comes with just the Sun Fire T2000/T1000 box; no extra crypto accelerator card is needed.

In summary, if RSA/DSA operations consume a significant share of CPU cycles in your application (e.g. HTTPS), a Sun Fire T2000/T1000 with UltraSPARC T1 will offer you the biggest bang for the buck with its per-core MAU and unique 8-core CMT architecture.