Oracle Blog

From My Brain to Your Browser

Monday Dec 08, 2014

Oracle Solaris 11 introduced the Image Packaging System (IPS), a new software packaging system in which Solaris software components are delivered. It replaces the System V Release 4 ("SVR4") packaging system used by Solaris 2.0 through Solaris 10. If you are learning Solaris 11, learning about IPS is a must! The links below will take you to the documents, videos, blog entries, and other artifacts that I think are the most important ones to begin with.

Wednesday Nov 12, 2014

Oracle offers the ability to license much of its software, including the Oracle Database, based on the quantity of CPUs that will run the software. When performance goals can be met with only a subset of the computer's CPUs, it may be appropriate to limit licensing costs by using a processor-based licensing metric and using one of the hardware partitioning technologies included with the computer.

In general, systems running the Oracle Solaris OS can use a variety of resource management features to control CPU, memory, and I/O resources. Because these Solaris features are highly integrated, using resource management does not preclude the use of other features. In particular, the use of resources by workloads running in Solaris Zones can be constrained.

The document Hard Partitioning With Oracle Solaris Zones explains the different Solaris features that can be used to limit software licensing costs when a processor-based metric is used. It also demonstrates the use of those features. The approved methods include the ability to limit a Solaris Zone to a specific quantity of CPUs, or the ability to limit a set of Solaris Zones to a specific quantity of shared CPUs.
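As a sketch of one approved method (the zone name and CPU count here are illustrative, not taken from the document), zonecfg can limit a zone to a fixed quantity of CPUs:

```shell
# Limit the hypothetical zone "dbzone" to 8 dedicated CPUs.
zonecfg -z dbzone
zonecfg:dbzone> add dedicated-cpu
zonecfg:dbzone:dedicated-cpu> set ncpus=8
zonecfg:dbzone:dedicated-cpu> end
zonecfg:dbzone> exit
```

A capped-cpu resource, or a resource pool shared by a set of zones, are the other approaches the document describes.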

Tuesday Aug 26, 2014

The Introduction

Oracle Solaris 11.2 introduced Oracle Solaris Kernel Zones. Kernel Zones (KZs)
offer a midpoint between traditional operating system virtualization
and virtual machines. They exhibit the low overhead and low management
effort of Solaris Zones, and add the best parts of the independence of
virtual machines.

A Kernel Zone is a type of Solaris Zone that runs its own Solaris kernel. This gives each Kernel Zone complete independence of software packages, as well as other benefits.

One of the more interesting new abilities that Kernel Zones bring to
Solaris Zones is the ability to "pause" a running KZ and "resume" it
on a different computer - or on the same computer, if you prefer.

Of what value is the ability to "pause" a zone? One potential use is moving a workload from a smaller computer (with too few CPUs, or insufficient RAM) to a larger one. Some workloads do not maintain much state, and can restart quickly, and so they wouldn't benefit from suspend/resume. Others, such as static (read-only) databases, may take 30 minutes to start and obtain good performance. The ability to suspend, but not stop, the workload and its operating system can be very valuable.

Another possible use of this ability is the staging of multiple KZs which have already booted and, perhaps, have started to run a workload. Instead of booting in a few minutes, the workload can continue from a known state in just a few seconds. Further, the suspended zone can be "unpaused" on the computer of your choice. Suspended kernel zones are like a nest of dozing ants, waiting to take action at a moment's notice.

This blog entry shows the steps to create and move a KZ, highlighting both
the Solaris iSCSI implementation as well as kernel zones and their suspend/resume feature.
Briefly, the steps are:

Create shared storage

Make shared storage available to both computers - the one that will run the zone, at first, as well as the computer on which the zone will be resumed.

Configure the zone on each system.

Install the zone on one system.

"Warm migrate" the zone by pausing it, and then, on the other computer, resuming it.

Links to relevant documentation and blogs are provided at the bottom.

The Method

The Kernel Zones suspend/resume feature requires the use of storage accessible by multiple computers. However, neither Kernel Zones nor suspend/resume requires a specific type of shared storage. In Solaris 11.2, the only types of shared storage that support zones are iSCSI and Fibre Channel. This blog entry uses iSCSI.

The example below uses three computers. One is the iSCSI target, i.e. the storage server. The other two run the KZ, one at a time. All three systems run Solaris 11.2, although the iSCSI features shown below also work on earlier updates of Solaris 11, on a ZFS Storage Appliance (the current family shares the brand name ZS3), or on another type of iSCSI target.

In the commands shown below, the prompt "storage1#" indicates commands that would be entered into the iSCSI target. Similarly, "node1#" indicates commands that you would enter into the first computer that will run the kernel zone. The few commands preceded by the prompt "bothnodes#" must be run on both node1 and node2. The name of the kernel zone is "ant1".

Finally, note that these commands should be run by a non-root user who prefaces each command with the pseudo-command "sudo".

Step 1. Provide shared storage for the kernel zone. The zone only needs one device for its zpool. Redundancy is provided by the zpool in the iSCSI target.
(For a more detailed explanation, see the link to the COMSTAR documentation in the section "The Links" below.)
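A minimal target-side sketch, assuming a pool named rpool and a 20 GB volume (the names, the size, and the LU GUID placeholder are illustrative; the COMSTAR documentation covers the details):

```shell
# Create a ZFS volume to back the LUN; the zpool provides redundancy.
storage1# zfs create -V 20g rpool/ant1vol
# Enable the STMF framework and the iSCSI target service.
storage1# svcadm enable -r svc:/system/stmf:default
storage1# svcadm enable -r svc:/network/iscsi/target:default
# Create a logical unit backed by the volume, and export it to all hosts.
storage1# stmfadm create-lu /dev/zvol/rdsk/rpool/ant1vol
storage1# stmfadm add-view <LU-GUID-printed-by-create-lu>
# Create the iSCSI target itself.
storage1# itadm create-target
```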

Step 2B. On each of the two computers that will host the zone, identify the Storage Uniform Resource Identifier ("SURI") - see suri(5) for more information. This command tells you the SURI of that LUN, in each of multiple formats. We'll need this SURI to specify the storage for the kernel zone.
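On the nodes, the discovery and lookup might look like this (the discovery address and the device name are illustrative):

```shell
# Point each node's iSCSI initiator at the storage server.
bothnodes# iscsiadm add discovery-address 192.168.1.10
bothnodes# iscsiadm modify discovery --sendtargets enable
bothnodes# devfsadm -i iscsi
# Ask for the SURI of the discovered LUN, in each available format.
bothnodes# suriadm lookup-uri /dev/dsk/c0t600144F0DBF8AF190000533ED6380001d0
```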

Step 2C. When you suspend a kernel zone, its RAM pages must be stored temporarily in a file. In order to resume the zone on a different computer, the "suspend file" must be on storage that both computers can access. For this example, we'll use an NFS share. (Another iSCSI LUN could be used instead.) The method shown below is not particularly secure, although the suspended image is first encrypted. Secure methods would require the use of other Solaris features, but they are not the topic of this blog entry.
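A sketch of that NFS share (the dataset and path names are illustrative):

```shell
# On the storage server: create and share a file system for the suspend file.
storage1# zfs create -o mountpoint=/export/suspend rpool/suspend
storage1# zfs set share.nfs=on rpool/suspend
# On both nodes: mount the share at the same path.
bothnodes# mkdir -p /mnt/suspend
bothnodes# mount -F nfs storage1:/export/suspend /mnt/suspend
```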

Step 3. Configure a kernel zone, using the two iSCSI LUNs and a system profile.
You can configure a kernel zone very easily. The only required settings are the name and the use of the kernel zone template. The name of the latter is SYSsolaris-kz. That template specifies a VNIC, 2GB of dedicated RAM, 1 virtual CPU, and local storage that will be configured automatically when the zone is installed.
We need shared storage instead of local storage, so one of the first steps will be deleting the local storage resource. That device will have an ID number of zero. After deleting that resource, we add the LUN, using the SURI determined earlier.
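A sketch of that configuration (the SURI shown is a placeholder for the one found in Step 2B):

```shell
node1# zonecfg -z ant1
zonecfg:ant1> create -t SYSsolaris-kz
# Delete the automatically configured local storage resource (ID 0)...
zonecfg:ant1> remove device id=0
# ...and add the shared LUN instead, using its SURI.
zonecfg:ant1> add device
zonecfg:ant1:device> set storage=iscsi://storage1/luname.naa.600144f0dbf8af19
zonecfg:ant1:device> set bootpri=0
zonecfg:ant1:device> end
# Keep the suspend image on the NFS share, so both nodes can reach it.
# (If the template does not define a suspend resource, use "add suspend".)
zonecfg:ant1> select suspend
zonecfg:ant1:suspend> set path=/mnt/suspend/ant1.sus
zonecfg:ant1:suspend> end
zonecfg:ant1> exit
```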

Step 5. With all of the hard work behind us, we can "warm migrate" the zone. The first step is preparation of the destination system - "node2" in our example - by applying the zone's configuration to the destination.

node1# zonecfg -z ant1 export -f /mnt/suspend/ant1.cfg

node2# zonecfg -z ant1 -f /mnt/suspend/ant1.cfg

The "detach" operation does not delete anything. It merely tells node1 to cease considering the zone to be usable.

node1# zoneadm -z ant1 suspend
node1# zoneadm -z ant1 detach

A separate "resume" sub-command for zoneadm was not necessary. The "boot" sub-command fulfills that purpose.

node2# zoneadm -z ant1 attach
node2# zoneadm -z ant1 boot

Of course, "warm migration" is different from "live migration" in one important respect: the duration of the service outage. Live migration achieves a service outage that lasts a small fraction of a second. In one experiment, warm migration of a kernel zone created a service outage that lasted 30 seconds. It's not live migration, but is an important step forward, compared to other types of Solaris Zones.

The Notes

This example used a zpool as back-end storage. That zpool provided data redundancy, so additional redundancy was not needed within the kernel zone. If unmirrored devices (e.g. physical disks) were specified in zonecfg, then data redundancy should be achieved within the zone. Fortunately, you can specify two devices in zonecfg, and "zoneadm ... install" will automatically mirror them.
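For example (the SURIs are placeholders), specifying a second device in zonecfg yields a mirrored zpool at install time:

```shell
node1# zonecfg -z ant1
zonecfg:ant1> add device
zonecfg:ant1:device> set storage=iscsi://storage1/luname.naa.<first-LUN>
zonecfg:ant1:device> set bootpri=0
zonecfg:ant1:device> end
zonecfg:ant1> add device
zonecfg:ant1:device> set storage=iscsi://storage1/luname.naa.<second-LUN>
zonecfg:ant1:device> set bootpri=1
zonecfg:ant1:device> end
zonecfg:ant1> exit
# "zoneadm -z ant1 install" will now mirror the two devices automatically.
```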

In a simple network configuration, the steps above create a kernel zone that has normal network access. More complicated networks may require additional steps, such as VLAN configuration, etc.

Some steps regarding file permissions on the NFS mount were omitted for clarity. This is one of the security weaknesses of the steps shown above. All of the weaknesses can be addressed by using additional Solaris features. These include, but are not limited to, iSCSI features (iSNS, CHAP authentication, RADIUS, etc.), NFS security features (e.g. NFS ACLs, Kerberos, etc.), RBAC, etc.

Wednesday Aug 28, 2013

Recently, I contributed to a new white paper that
addresses the question
"ok... now I have a computer with
over 1,000 hardware threads... what do I do
with all of those threads?" The topics include details of the newest SPARC S3
core, workload consolidation and server virtualization, multi-threaded programming,
and more.

Thursday Jul 11, 2013

Curious about Oracle Solaris, Oracle Linux or Oracle VM?
Or are you beyond curious, and in need of hands-on experience?

Oracle is proud to host "Virtual Sysadmin Day!" During this event,
you will learn how to build a secure, multi-level application deployed using the virtualization capabilities of Oracle Solaris 11, among many other hands-on activities.

Thursday Jun 27, 2013

Boot Environments for Solaris 10 Branded Zones

Until recently, Solaris 10 Branded Zones on Solaris 11 suffered one notable regression:
Live Upgrade did not work. The individual packaging and patching tools worked correctly, but the ability to upgrade Solaris while the production workload continued running did not exist. A recent Solaris 11 SRU (Solaris 11.1 SRU 6.4) restored most of that functionality, although with a slightly different concept, different commands, and without all of the feature details. This new method gives you the ability to create and manage multiple boot environments (BEs) for a Solaris 10 Branded Zone, to modify the active BE or any inactive BE, and to do so while the production workload continues to run.

Background

In case you are new to Solaris: Solaris includes a set of features that enables you to create a bootable Solaris image, called a Boot Environment (BE). This newly created image can be modified while the original BE is still running your workload(s). There are many benefits, including improved uptime and the ability to reboot into (or downgrade to) an older BE if a newer one has a problem.

In Solaris 10 this set of features was named Live Upgrade. Solaris 11 applies the same basic concepts to the new packaging system (IPS) but there isn't a specific name for the feature set. The features are simply part of IPS. Solaris 11 Boot Environments are not discussed in this blog entry.

Although a Solaris 10 system can have multiple BEs, until recently a Solaris 10 Branded Zone (BZ) in
a Solaris 11 system did not have this ability. This limitation was addressed recently, and that enhancement is the subject of this blog entry.

This new implementation uses two concepts. The first is the use of a ZFS clone for each BE. This makes it very easy to create a BE, or many BEs. This is a distinct advantage over the Live Upgrade feature set in Solaris 10, which had a practical limitation of two BEs on a system, when using UFS. The second new concept is a very simple mechanism to indicate the BE that should be booted: a ZFS property. The new ZFS property is named com.oracle.zones.solaris10:activebe (isn't that creative?).

It's important to note that the property is inherited from the original BE's file system to any BEs you create. In other words, all BEs in one zone have the same value for that property. When the (Solaris 11) global zone boots the Solaris 10 BZ, it boots the BE that has the name that is stored in the activebe property.

Here is a quick summary of the actions you can use to manage these BEs:

To create a BE:

Create a ZFS clone of the zone's root dataset

To activate a BE:

Set the ZFS property of the root dataset to indicate the BE

To add a package or patch to an inactive BE:

Mount the inactive BE

Add packages or patches to it

Unmount the inactive BE

To list the available BEs:

Use the "zfs list" command.

To destroy a BE:

Use the "zfs destroy" command.
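Sketched as commands (run inside the zone; the dataset and BE names are illustrative):

```shell
# Create a BE: snapshot the current root dataset and clone it.
s10z# zfs snapshot rpool/ROOT/zbe-0@snap
s10z# zfs clone -o mountpoint=/ -o canmount=noauto rpool/ROOT/zbe-0@snap rpool/ROOT/newBE
# Activate it: store the BE's name in the inherited ZFS property.
s10z# zfs set com.oracle.zones.solaris10:activebe=newBE rpool/ROOT
# List the available BEs.
s10z# zfs list -r rpool/ROOT
```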

Preparation

Before you can use the new features, you will need a Solaris 10 BZ on a Solaris 11 system. You can use these three steps - on a real Solaris 11.1 server or in a VirtualBox guest running Solaris 11.1 - to create a Solaris 10 BZ. The Solaris 11.1 environment must be at SRU 6.4 or newer.

The rest of this blog entry demonstrates the commands you can use to accomplish the aforementioned actions related to BEs.

New features in action

Note that the demonstration of the commands occurs in the Solaris 10 BZ, as indicated by the shell prompt "s10z# ". Many of these commands can be performed in the global zone instead, if you prefer. If you perform them in the global zone, you must change the ZFS file system names.

The output shows that two BEs exist. Their names are "zbe-0" and "newBE".

You can tell Solaris that one particular BE should be used when the zone next boots by using a ZFS property. Its name is com.oracle.zones.solaris10:activebe. The value of that property is the name of the clone that contains the BE that should be booted.

Patch an inactive BE

At this point, you can modify the original BE. If you would prefer to modify the new BE, you can restore the original value to the activebe property and reboot, and then mount the new BE to /mnt (or another empty directory) and modify it.

Let's mount the original BE so we can modify it. (The first command is only needed if you haven't already mounted that BE.)
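For example (the patch ID is a placeholder):

```shell
# Mount the inactive BE (only needed if it isn't already mounted).
s10z# mount -F zfs rpool/ROOT/zbe-0 /mnt
# Apply a patch to the mounted BE instead of the running one.
s10z# patchadd -R /mnt <patch-id>
s10z# umount /mnt
```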

Delete an inactive BE

ZFS clones are children of their parent file systems. In order to destroy the parent, you must first "promote" the child. This reverses the parent-child relationship. (For more information on this, see the documentation.)

The original rpool/ROOT file system is the parent of the clones that you create as BEs. In order to destroy an earlier BE that is the parent of other BEs, you must first promote one of the child BEs to be the ZFS parent. Only then can you destroy the original BE.
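A sketch, assuming the original BE "zbe-0" is the parent of "newBE" (dataset names are illustrative):

```shell
# Promote the child, reversing the parent-child relationship...
s10z# zfs promote rpool/ROOT/newBE
# ...so that the original BE can now be destroyed.
s10z# zfs destroy rpool/ROOT/zbe-0
```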

Documentation

This feature is so new, it is not yet described in the Solaris 11 documentation. However, MOS note 1558773.1 offers some details.

Conclusion

With this new feature, you can add and patch packages to boot environments of a Solaris 10 Branded Zone. This ability improves the manageability of these zones, and makes their use more practical. It also means that you can use the existing P2V tools with earlier Solaris 10 updates, and modify the environments after they become Solaris 10 Branded Zones.

Wednesday Jun 12, 2013

Many people have asked whether Oracle Solaris 11 uses sparse-root zones or whole-root zones. I think the best answer is "both and neither, and more" - but that's a wee bit confusing. This blog entry attempts to explain that answer.

First a recap: Solaris 10 introduced the Solaris Zones feature set, way back in 2005. Zones are a form of
server virtualization called "OS (Operating System) Virtualization." They improve consolidation ratios by isolating processes from each other so that they cannot interact. Each zone has its own set of users, naming services, and other software components. One of the many advantages is that there is no need for a hypervisor, so there is no performance overhead. Many data centers run tens to hundreds of zones per server!

In Solaris 10, there are two models of package deployment for Solaris Zones. One model is called "sparse-root" and the other "whole-root." Each form has specific characteristics, abilities, and limitations.

A whole-root zone has its own copy of the Solaris packages. This allows the inclusion of other software in system directories - even though that practice has been discouraged for many years. Although it is also possible to modify the Solaris content in such a zone, e.g. patching a zone separately from the rest, this was highly frowned on. (More importantly, modifying the Solaris content in a whole-root zone may lead to an unsupported configuration.)

The other model is called "sparse-root." In that form, instead of copying all of the Solaris packages into the zone, the directories containing Solaris binaries are re-mounted into the zone. This allows the zone's users to access them at their normal places in the directory tree. Those are read-only mounts, so a zone's root user cannot modify them. This improves security, and also reduces the amount of disk space used by the zone - 200MB instead of the usual 3-5 GB per zone. These loopback mounts also reduce the amount of RAM used by zones because Solaris only stores in RAM one copy of a program that is in use by several zones. This model also has disadvantages. One disadvantage is the inability to add software into system directories such as /usr. Also, although a sparse-root can be migrated to another Solaris 10 system, it cannot be moved to a Solaris 11 system as a "Solaris 10 Zone."

In addition to those contrasting characteristics, here are some characteristics of zones in Solaris 10 that are shared by both packaging models:

A zone can modify its own configuration files in /etc.

A zone can be configured so that it manages its own networking, or so that it cannot modify its network configuration.

It is difficult to give a non-root user in the global zone the ability to boot and stop a zone, without giving that user other abilities.

In a zone that can manage its own networking, the root user can do harmful things like spoof other IP addresses and MAC addresses.

It is difficult to assign network packet processing to the same CPUs that a zone uses. This could lead to unpredictable performance and performance troubleshooting challenges.

You cannot run a large number of zones (e.g. 50) in one system if each manages its own networking, because that would require more physical NICs than are available (e.g. 50).

Except when managed by Ops Center, zones cannot be safely stored on NAS.

Solaris 10 Zones cannot be NFS servers.

The fsstat command does not report statistics per zone.

Solaris 11 Zones use the new packaging system of Solaris 11. Their configuration does not offer a choice
of packaging models, as Solaris 10 does. Instead, two (well, four) different models of "immutability"
(changeability) are offered. The default model allows a privileged zone user to modify the zone's content.
The other (three) limit the content which can be changed: none, or two overlapping sets of configuration files.
(See "Configuring and Administering Immutable Zones".)

Solaris 11 addresses many of those limitations. With the characteristics listed above in mind, the
following table shows the similarities and differences between zones in Solaris 10 and in Solaris 11.
(Cells in a row that are similar have the same background color.)

As you can see, the statement "Solaris 11 Zones are whole-root zones" is only true using the narrowest definition of whole-root zones: those zones which have their own copy of Solaris packaging content. But there are other valuable characteristics of sparse-root zones that are still available in Solaris 11 Zones. Also, some Solaris 11 Zones do not have some characteristics of whole-root zones.

For example, the table above shows that you can configure a Solaris 11 zone that has
read-only Solaris content. And Solaris 11 takes that concept further, offering the
ability to tailor that immutability. It also shows that Solaris 10 sparse-root and
whole-root zones are more similar to each other than to Solaris 11 Zones.

Conclusion

Solaris 11 Zones are slightly different from Solaris 10 Zones. The former can achieve
the goals of the latter, and they also offer features not found in Solaris 10 Zones.
Solaris 11 Zones offer the best of Solaris 10 whole-root zones and sparse-root zones,
and offer an array of new features that make Zones even more flexible and powerful.

Tuesday Apr 23, 2013

Last week, SPEC published the most recent result for the SPECjbb2013-MultiJVM benchmark.
This benchmark "is relevant to all audiences who are interested in Java server performance, including JVM vendors, hardware developers, Java application developers, researchers and members of the academic community" according to SPEC.

For the first table below, I selected all of the max-JOPS results greater than 50,000 JOPS using the most recent Java version, for the
SPARC T5-2
and for competing systems. From the SPECjbb2013 data, I derived two new values, max-JOPS/chip and max-JOPS/core. The latter value compensates for the different quantity of cores used in one of the tests. Finally, the "Advantage of T5" column shows the portion by which the T5-2 cores perform better than the other systems' cores. For example, on this benchmark a 32-core T5-2 computer demonstrated 15% better per-core performance than an HP DL560p with the same number of cores.
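The derived columns are simple ratios. As a quick sketch of the arithmetic, comparing the T5-2 row to the DL560p/RHEL row:

```shell
# Per-core max-JOPS for the T5-2 (75658 JOPS, 32 cores) and the
# DL560p on RHEL 6.3 (66007 JOPS, 32 cores), plus the T5 advantage.
awk 'BEGIN {
  t5 = 75658 / 32; hp = 66007 / 32
  printf "%.0f %.0f %.0f%%\n", t5, hp, (t5 / hp - 1) * 100
}'
```

This prints "2364 2063 15%", matching the per-core values derived for those two systems.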

As you can see, on this benchmark a SPARC T5 core is faster than an Intel Xeon core in competing systems with 32 or more cores.

| Model | CPU | Chips | Cores | OS | max-JOPS | Date Published | max-JOPS per chip | max-JOPS per core | Advantage of T5 |
|---|---|---|---|---|---|---|---|---|---|
| SPARC T5-2 | SPARC T5 | 2 | 32 | Solaris 11.1 | 75658 | April 2013 | 37829 | 2364 | - |
| HP ProLiant DL560p Gen8 | Intel E5-4650 | 4 | 32 | Windows Server 2008 R2 Enterprise | 67850 | April 2013 | 16963 | 2120 | 12% |
| HP ProLiant DL560p Gen8 | Intel E5-4650 | 4 | 32 | RHEL 6.3 | 66007 | April 2013 | 16502 | 2063 | 15% |
| HP ProLiant DL980 G7 | Intel E7-4870 | 8 | 80 | RHEL 6.3 | 106141 | April 2013 | 13268 | 1327 | 78% |

The SPECjbb2013 benchmark also includes a performance measure called "critical-JOPS." This measurement represents the ability of a system to achieve high levels of throughput while still maintaining a short response time. The performance advantage of the T5 cores is even more pronounced.

| Model | CPU | Chips | Cores | OS | critical-JOPS | Date Published | critical-JOPS per chip | critical-JOPS per core | Advantage of T5 |
|---|---|---|---|---|---|---|---|---|---|
| SPARC T5-2 | SPARC T5 | 2 | 32 | Solaris 11.1 | 23334 | April 2013 | 11667 | 729 | - |
| HP ProLiant DL560p Gen8 | Intel E5-4650 | 4 | 32 | Windows Server 2008 R2 Enterprise | 16199 | April 2013 | 4050 | 506 | 44% |
| HP ProLiant DL560p Gen8 | Intel E5-4650 | 4 | 32 | RHEL 6.3 | 18049 | April 2013 | 4512 | 564 | 29% |
| HP ProLiant DL980 G7 | Intel E7-4870 | 8 | 80 | RHEL 6.3 | 23268 | April 2013 | 2909 | 291 | 151% |

As always, care should be taken in choosing a benchmark that is similar to the workload that you will run on a computer. For example, if you plan to implement a database server, using the SPECint benchmark will not help you, because that benchmark merely measures the performance of the CPU cores and the speed and size of the memory caches (and perhaps the memory system). It does not measure the performance of network or disk I/O, and both of those are important factors in database performance - especially storage I/O.

According to the SPECjbb2013 design document, this benchmark "exercises the CPU, memory and network I/O, but not disk I/O."
Because of this, it can be used as a simple method to estimate relative Java processing performance. From the data shown in the tables above, it is clear that the newest SPARC cores deliver Java performance that is competitive with the most recent Intel Xeon CPU cores.

Wednesday Apr 17, 2013

IDC's conclusion: "Oracle has invested deeply in improving the performance of the T-series processors it developed following its acquisition of Sun Microsystems in 2010. It has pushed its engineering efforts to release new SPARC processor technology — providing a much more competitive general-purpose server platform. This will provide an immediate improvement for its large installed base, even as it lends momentum to a new round of competition in the Unix server marketplace."

IDC also noted the "dramatic performance gains for SPARC, with 16-core microprocessor technology based on three years of IP (intellectual property) development at Oracle, following Oracle's acquisition of Sun Microsystems Inc. in January, 2010."

The new T5 servers use SPARC T5 processor chips that offer more than double the performance of SPARC T4 chips, which were released just over a year ago. And the T4 chips, in turn, were a significant departure from all previous SPARC CMT CPUs, in that the T4 chips offered excellent performance for single-threaded workloads.

The new M5 servers use up to 32 SPARC M5 processors, each using the same "S3" SPARC cores as the T5 chips.

The new SPARC T5 chip uses the "S3" core which has been in the SPARC T4 generation for over a year.
That core offers, among other things, 8 hardware threads, two simultaneous integer pipelines,
and some other compute units as well (FP, crypto, etc.). The S3 core also includes the instructions
necessary to work with the SPARC hypervisor that implements SPARC virtual machines
(Oracle VM Server for SPARC, previously called Logical Domains or LDoms) including Live Migration of VMs.

However, four significant improvements have been made in the new systems:

Each T5 chip has 16 cores, instead of the T4's 8 cores per chip, made possible
by a die shrink (to 28 nm).

An increase in clock rate to 3.6 GHz yields an immediate 20% improvement in processing over T4 systems.

Increased chip scalability allows up to 8 CPU chips per system, doubling the maximum
number of cores in the mid-range systems.

In addition to the mid-range servers, now the high-end M5-32 also supports OVM Server for SPARC (LDoms),
while maintaining the ability to use hard partitions (Dynamic Domains) in that system.
(The T5-based servers (PDF) also have LDoms, just like the T4-based systems.)

The result of those improvements is the "world's fastest microprocessor." Between the four T5 (mid-range) systems and
the M5-32, this new generation of systems has already achieved
17 performance world records.

A single, 2-socket T5-2 has three times the performance, at 13% of the cost, of two Power 770's - on a JD Edwards performance test.

Two T5-2 servers have almost double the Siebel performance of two Power 750 servers - at one-fourth the price.

One 8-processor T5-8 outperforms an 8-processor Power 780 - at one-seventh the cost - on the common SPECint_rate 2006 benchmark.

The new high-end SPARC system - the M5-32 - sports 192 cores (1,536 hardware threads) of compute power. It can also be packed with 32 TB (yes, terabytes!) of RAM. Put your largest DB entirely in RAM, and see how fast it goes!

Oracle has refreshed its entire SPARC server line all at once, greatly improving performance - not only compared to the previous SPARC generation, but also compared to the current generation of servers from other manufacturers.

Monday Mar 25, 2013

On Tuesday, Oracle will announce new SPARC servers with the world's fastest
microprocessor. Considering that the current SPARC processors
already have performance comparable with the newest from competing architectures,
the performance of these new processors should
give you the best real-world performance for your enterprise workloads.