Oracle Blog

Blog for allanp

Tuesday Apr 21, 2009

As a followup to my MySQL 5.4 Scalability on 64-way CMT Servers blog, I'm posting MySQL 5.4 Sysbench results on the same platform. The tests were carried out using the same basic approach (i.e. turning off entire cores at a time) - see my previous blog for more details.

The Sysbench version used was 0.4.8, and the read-only runs were invoked with the following command:

Today Sun Microsystems announced MySQL 5.4, a release that focuses on performance and scalability. For a long time it's been possible to escape the confines of a single system with MySQL, thanks to scale-out technologies like replication and sharding. But it ought to be possible to scale-up efficiently as well - to fully utilize the CPU resource on a server with a single instance.

MySQL 5.4 takes a stride in that direction. It features a number of performance and scalability fixes, including the justifiably-famous Google SMP patch along with a range of other fixes. And there's plenty more to come in future releases. For specifics about the MySQL 5.4 fixes, check out Mikael Ronstrom's blog.

So how well does MySQL 5.4 scale? To help answer the question I'm going to take a look at some performance data from one of Sun's CMT systems based on the UltraSPARC T2 chip. This chip is an interesting test case for scalability because it amounts to a large SMP server in a single piece of silicon. The UltraSPARC T2 chip features 8 cores, each with 2 integer pipelines, one floating point unit, and 8 hardware threads. So it presents a total of 64 CPUs (hardware threads) to the Operating System. Since there are only 8 cores and 16 integer pipelines, you can't actually carry out 64 concurrent operations. But the CMT architecture is designed to hide memory latency - while some threads on a core are blocked waiting for memory, another thread on the same core can be executing - so you expect the chip to achieve total throughput that's better than 8x or even 16x the throughput of a single thread.

The data in question is based on an OLTP workload derived from an industry standard benchmark. The driver ran on the system under test as well as the database, and I used enough memory to adequately cache the active data and enough disks to ensure there were no I/O bottlenecks. I applied the scaling method we've used for many years in studies of this kind. Unused CPUs were turned off at each data point (this is easily achieved on Solaris with the psradm command). That ensured that "idle" CPUs did not actually participate in the benchmark (e.g. the OS assigns device drivers across the full set of CPUs, so they routinely handle interrupts if you leave them enabled). I turned off entire cores at a time, so an 8 "CPU" result, for example, is based on 8 hardware threads in a single core with the other seven cores turned off, and the 32-way result is based on 4 cores each with 8 hardware threads. This approach allowed me to simulate a chip with less than 8 cores. Sun does in fact ship 4- and 6-core UltraSPARC T2 systems; the 32- and 48-way results reported here should correspond with those configurations. For completeness, I've also included results with only one hardware thread turned on.

Since the driver kit for this workload is not open source, I'm conscious that community members will be unable to reproduce these results. With that in mind, I'll separately blog Sysbench results on the same platform. So why not just show the Sysbench results and be done with it? The SQL used in the Sysbench oltp test is fairly simple (e.g. it's all single table queries with no joins); this test rounds out the picture by demonstrating scalability with a more complex and more varied SQL mix.

It's worth noting that any hardware threads that are turned on will get access to the entire L2 cache of the T2. So, as the number of active hardware threads increases, you might expect that contention for the L2 would impact scalability. In practice we haven't found it to be a problem.

Enough background - on to the data. Here are the results presented graphically.

And here is some raw data:

MySQL 5.4.0

Throughput

vCPUs

Us

Sy

Id

smtx

icsw

sycl

1.00

1

62

38

0

0

371

4566

4.76

8

77

23

0

220

593

6159

9.24

16

77

23

0

265

571

6456

13.45

24

77

23

0

345

565

6606

17.42

32

76

22

2

380

477

6621

24.29

48

76

23

1

557

496

6731

29.79

64

74

24

2

674

457

6736

Abbreviations: "vCPUS" refers to hardware threads, "Us", "Sy", "Id" are CPU utilization for User, System, and Idle respectively, "smtx" refers to mutex spins, "icsw" to involuntary context switches, and "sycl" to systems calls - the data in each of these last three columns is normalized per transaction.

Some interesting takeaways:

The throughput increases 71% from 32 to 64 vCPUs. It's encouraging to see a significant increase in throughput beyond 32 vCPUs.

The scaleup from 1 to 64 vCPUs is almost 30x. As I noted earlier, the UltraSPARC T2 chip does not have 64 cores - it only has 8 cores and 16 integer pipelines, so this is a good outcome.

To put these results into perspective, there are still many high volume high concurrency environments that will still require replication and sharding to scale acceptably. And while MySQL 5.4 scales better than previous releases, we're not quite seeing scalability equivalent to that of proprietary databases. It goes without saying, of course, that we'll be working hard to further improve scalability and performance in the weeks and months to come.

But the clear message today is that MySQL 5.4 single-server scalability is now good enough for many commercial transaction-based application environments. If your organization hasn't yet taken the plunge with MySQL, now is definitely the time. And you certainly won't find yourself complaining about the price-performance.

Monday Oct 13, 2008

Find out about Sun's new 4-chip UltraSPARC T2 Plus system direct from the source: Sun's engineers.

Sun today announced the 4-chip variant of its UltraSPARC T2 Plus system, the Sun SPARC Enterprise T5440. This new system is the big brother of the 2-chip Sun SPARC Enterprise T5140 and T5240 systems released in April 2008. Each UltraSPARC T2 Plus chip offers 8 hardware strands in each of 8 cores. With up to four UltraSPARC T2 Plus chips delivering a total of 32 cores and 256 hardware threads and up to 512Gbytes of memory in a compact 4U package, the T5440 raises the bar for server performance, price-performance, energy efficiency, and compactness. And with Logical Domains (LDoms) and Solaris Containers, the potential for server consolidation is compelling.

Standard configurations of the Sun SPARC Enterprise T5440 include 2- and 4-chip systems at 1.2 GHz, and a 4-chip system at 1.4 GHz. All of these configurations come with 8 cores per chip.

The blogs posted today by various Sun engineers offer a broad perspective on the new system. The system design, the various hardware subsystems, the performance characteristics, the application experiences - it's all here! And if you'd like some background on how we arrived at this point, check out the earlier UltraSPARC T2 blogs (CMT Comes of Age) and the first release of the UltraSPARC T2 Plus (Sun's CMT goes multi-chip).

Let's see what the engineers have to say (and more will be posted throughout the day):

Database Performance. Cherry Shu offers tips on running IBM DB2 UDB optimally on the T5440. Note that the results referenced below by bmseer were based on the Oracle Database: the SPECjAppServer2004 result used the Oracle 11g database, and both the Siebel benchmark and the SAP-SD result used an Oracle 10g database. The SugarCRM result referred to above used MySQL 5.0.67.

World Record Benchmarks. Once again, the mysterious bmseer delivers the goods with some World Record Benchmark results on the new server, including noteworthy results with SAP-SD, Siebel, and SPECjAppServer2004.

Sunday Oct 12, 2008

Today Sun released the Sun Enterprise UltraSPARC T5440 server, a wild beast caged in a tiny 4U package. Putting it into perspective, this system has roughly the same performance potential as four Enterprise 10000 (Starfire) systems. Compared to the T5440, the floor space, energy consumption, and cooling required by the four older Starfire systems doesn't bear thinking about, either.

In more modern terms, the T5440 will handily outperform two Sun Fire E2900 systems with 12 dual-core UltraSPARC IV+ chips. Not bad for just four UltraSPARC T2 Plus chips. And when you add in up to 512Gbytes of memory and plenty of I/O connectivity, that's a lot of system.

Using up all that horsepower gets to be an interesting challenge. There are some applications that can consume the entire system, as demonstrated by some of the benchmarks published today. But for the most part, end users will be expecting to find other ways of deploying the considerable resources delivered by the system. Let's take a brief look at some of the issues to consider.

The first important factor is that the available resource is delivered in the form of many hardware threads. A four-chip T5440 delivers 32 cores with a whopping 256 hardware threads, and the Operating System in turn sees 256 CPUs. Each "CPU" has lower single-thread performance than many other current CPU chip implementations. But the capacity to get work done is enormous. For a simple analogy, consider water. One drop of water can't do a lot of damage. But that same drop together with a bunch of its friends carved out the Grand Canyon. (The UltraSPARC T1 chip was not codenamed "Niagara" for nothing!)

For applications that are multi-threaded, multi-process, or multi-user, you can spread the work across the available threads/CPUs. But don't expect an application to show off the strengths of this platform if it has heavy single-threaded dependencies. This is true for all systems based on the UltraSPARC T1, T2, and T2 Plus chip.

The good news is that people are starting to understand this message. When Sun first released the UltraSPARC T1 chip back in December 2005, it was a bit of a shock to many people. The Sun Fire T1000 and T2000 systems were the first wave of a new trend in CPU design, and some took a while to get their heads around the implications. Now, with Intel, AMD, and others jumping on the bandwagon, the stakes have become higher. And the rewards will flow quickest to those application developers who had the foresight to get their act together earliest.

The second factor is virtualization. Once again, people today understand the benefits of consolidating a larger number of smaller systems onto a fewer number of larger systems using virtualization technologies. The T5440 is an ideal platform for such consolidations. With its Logical Domain (LDom) capabilities and with Solaris Containers, there are many effective ways to carve the system up into smaller logical pieces.

And then there's the even simpler strategy of just throwing a bunch of different applications onto the system and letting Solaris handle the resource management. Solaris actually does an excellent job of managing such complex environments.

Summing it up, the T5440 is made to handle large workloads. As long as you don't expect great throughput if you're running a single thread, you should find it has a lot to offer in a small package.

Tuesday Apr 08, 2008

Today Sun is announcing new CMT-based systems, hard on the heels of the UltraSPARC T2 systems launched in October 2007 (the Sun SPARC Enterprise T5120 and T5220 systems). Whereas previous Sun CMT systems were based around a single-socket UltraSPARC T1 or T2 processor, the new systems incorporate two processors, doubling the number of cores and the number of hardware threads compared to UltraSPARC T2-based systems. Each UltraSPARC T2 Plus chip includes 8 hardware strands in each of 8 cores, so the Operating System sees a total of 128 CPUs. The new systems deliver an unprecedented amount of CPU capacity in a package this size, as evidenced by the very impressive benchmark results published today.

Systems come in both 1U and 2U packaging: the 1U Sun SPARC Enterprise T5140 ships with two UltraSPARC T2 Plus chips, each with 4, 6, or 8 cores at 1.2 GHz, and the 2U Sun SPARC Enterprise T5240 ships with two UltraSPARC T2 Plus chips, each with 6 or 8 cores at 1.2 GHz, or 8 cores at 1.4 GHz. For more information about the systems, a whitepaper is available which provides details on the processors and systems.

Once again, some of the engineers who have worked on these new systems have shared their experiences and insights in a series of wide-ranging blogs (for engineers' perspectives on the UltraSPARC T2 systems, check out the CMT Comes of Age blog). These blogs will be cross referenced here as they are posted. You should expect to see more appear in the next day or two, so plan on visiting again later to see what's new.

Here's what the engineers have to say:

UltraSPARC T2 Plus Server Technology. Tim Cook offers insights into what drove processor design toward CMT. Marc Hamilton serves up a brief overview of CMT for those less familiar with the technology. Dwayne Lee touches on the UltraSPARC T2 Plus chip. Josh Simons offers us a look under the hood of the new servers. Denis Sheahan provides an overview of the hardware components of the UltraSPARC T2 Plus sytems, then follows it up with details of the memory and coherency of the UltraSPARC T2 Plus processor. Lawrence Spracklen introduces the crypto acceleration on the chip. Richard Elling talks about RAS (Reliability, Availability, and Serviceability) in the systems, and Scott Davenport describes their predictive self-healing features.

Virtualization. Honglin Su announces the availability of the Logical Domains 1.0.2 release, which supports the UltraSPARC T2 Plus platforms. Eric Sharakan offers further observations on LDoms on T5140 and T5240. Ning Sun discusses a study designed to show how LDoms with CMT can improve scalability and system utilization, and points to a Blueprint on the issue.

Solaris Features. Steve Sistare outlines some of the changes made to Solaris to support scaling on large CMT systems.

Application Performance. What happens when you run Batch Workloads on a Sun CMT server? Giri Mandalika's blog shares experiences running the Oracle E-Business Suite Payroll 11i workload. You might also find Satish Vanga's blog interesting - it focusses on SugarCRM running on MySQL (on a single-socket T5220). Josh Simons reveals the credentials of the new systems for HPC applications and backs it up by pointing to a new SPEComp2001 world record. Joerg Schwarz considers the applicability of the UltraSPARC T2 Plus servers for Health Care applications.

Java Performance. Dave Dagastine announces a World Record SPECjbb2005 result.

Benchmark Performance. The irrepressible bmseer details a number of world record results, including SPECjAppServer2004, SAP-SD 2 tier, and SPECjbb2005.

Open Source Community. Josh Berkus explores the implications for PostgreSQL using virtualization on the platform. Jignesh Shah discusses the possibilities with Glassfish V2 and PostgreSQL 8.3.1 on the T5140 and T5240 systems.

Sun engineers give the inside scoop on the new UltraSPARC T2 systems

Sun launched the Chip-Level MultiThreading (CMT) era back in December 2005 with the release of the highly successful UltraSPARC T1 (Niagara) chip, featured in the Sun Fire T2000 and T1000 systems. With 8 cores, each with 4 hardware strands (or threads), these systems presented 32 CPUs and delivered an unprecedented amount of processing power in compact, eco-friendly packaging. The systems were referred to as CoolThreads servers because of their low power and cooling requirements.

The new systems can probably be best described by some of the engineers who have developed them, tested them, and pushed them to their limits. Their blogs will be cross-referenced here, so if you're interested to learn more, come back from time to time. New blogs should appear in the next 24 hours, and more over the next few weeks.

Performance.
The inimitable bmseer gives us a bunch of good news about benchmark performance on the new systems - no shortage of world records, apparently!
Peter Yakutis offers detailed PCI-E I/O performance data.
Ganesh Ramamurthy muses on the implications of UltraSPARC T2 servers from the perspective of a senior performance engineering director.

System Management.
Find out about Lights Out Management (ILOM) from Tushar Katarki's blog.

Networking.
Alan Chiu gives us some insights into 10 Gigabit Ethernet performance and tuning on the UltraSPARC T2 systems.

RAS.
Richard Elling carries out a performability analysis of the T5120 and T5220 servers.

Virtualization.
Learn about Logical Domains (LDoms) and the release of LDoms 1.0.1 from Honglin Su.
Eric Sharakan has some more to say about LDoms and the UltraSPARC T2.
Ashley Saulsbury presents a flash demo of 64 Logical Domains booting on an UltraSPARC T2 system.
Find out why Sun xVM and Niagara 2 are the ultimate virtualization combo from Marc Hamilton.

Security Performance.
Ning Sun discusses Cryptography Acceleration on T2 systems.
Glenn Brunette offers us a Security Geek's point of view on the T5x20 systems.
Lawrence Spracklen has several posts on UltraSPARC T2 cryptographic acceleration.
Martin Mueller proposes a UltraSPARC T2 system deployment designed to deliver a high performance, high security environment.

Solaris features.
Scott Davenport blogs on Predictive Self-Healing on the T5120.
Steve Sistare gives us a lot of insight into features in Solaris to optimize the UltraSPARC T2 platforms.
Walter Bays salutes the folks who reliably deliver consistent interfaces on the new systems.