Oracle Blog

Jeff's blog

Logical Domains 1.3 Released

While this may be overlooked with all the excitement over the impending Oracle acquisition
(I'll wait till the ink is dry to comment, if I do - though I share the excitement
and enthusiasm expressed by many of my colleagues for the opportunity this represents),
innovation continues and
Sun just announced the latest enhancements to Logical Domains, with version 1.3.

Super-fast review of Logical Domains (LDoms)

LDoms is a virtual machine capability for Sun's Chip Multithreading (CMT) servers.
It permits as many as 128 domains (virtual machines) on a single server at no extra cost.
LDoms exploits the "many CPUs" nature of CMT servers for efficient implementation of virtual machines,
without the overhead commonly seen in VM systems. Instead of timeslicing CPUs among many "guests" (which creates overhead),
each domain gets its own CPUs, which it can use at native speed. Domains can also use advanced features
like the hardware cryptographic accelerators that are standard with the CMT servers.

New Features

Among the new features are:

Domain Migration enhancements:

You can now migrate domains that have a cryptographic accelerator (a restriction removed).

Multi-threaded memory compression speeds migration. Memory contents are compressed before being encrypted
and transmitted to the target system - an 80% speedup compared to the prior release.
Processing is multi-threaded and takes advantage of
the CPU threads in the control domain, and exploits the cryptographic accelerator.
You wouldn't want memory contents of a guest domain (with passwords and other private data) to be transmitted in clear, would you? As Liam mentions in his comment below, memory contents are always encrypted, but it's much faster with the hardware accelerator.

Automated migration for non-interactive migration. Passwords are stored in root-access-only files so migration can
be done without interactive prompts.
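As a hedged sketch of what this looks like in practice: per the LDoms 1.3 documentation, the stored password is supplied via a file on the migrate command line. The hostname, domain name, and file path below are hypothetical; check the ldm man page on your release for the exact option syntax.

```shell
# Non-interactive migration sketch (names and paths are hypothetical).
# The password file should be owned by root with mode 400.
ldm migrate-domain -p /var/ldm/target-password mydomain target-host
```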

Link-based IPMP
Previously, you couldn't do link-based failure detection for IP Multipathing (IPMP) in a guest domain using a virtual switch
(probe-based failure detection worked, however).
If the physical NIC's connection failed, that status wasn't passed
to the virtual network devices connected to the virtual switch associated with the NIC:
the link from the virtual NIC to the virtual switch was still up, so the guest had no way to know that the downstream physical link was down.
You can now specify the linkprop=phys-state option on the virtual device to pass the physical NIC's link state through to the virtual device for failover.
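As a sketch, enabling this on an existing virtual network device looks something like the following; the device and domain names are hypothetical, and the linkprop property name is per the LDoms 1.3 documentation (verify against the ldm man page on your release).

```shell
# Pass the physical NIC's link state through to the virtual device
# (device and domain names are hypothetical).
ldm set-vnet linkprop=phys-state vnet0 mydomain
```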

Crypto Dynamic Reconfiguration (DR)
Guests with crypto accelerators can now have CPUs dynamically added and removed, provided
they are running Solaris 10 10/09 or later (a restriction removed).

Boot a domain from a disk larger than 1 TB

Ability to change guest hostid
You can use the hostid and mac-addr properties of the ldm set-domain command to change a guest domain's hostid and MAC address.
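For example, a change might look like the following sketch; the values and domain name are hypothetical, and the property names should be checked against the ldm man page for your release.

```shell
# Set a new hostid and MAC address on a guest domain
# (values and domain name are hypothetical).
ldm set-domain hostid=0x84aabbcc mac-addr=00:14:4f:aa:bb:cc mydomain
```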

There are other changes, plus bug fixes and performance improvements, but the above are the highlights.
There is one important restriction: LDoms 1.3 is for T2 and T2+ based systems: the T5x40 and T5x20 servers and blades.
Older, T1-based systems such as T1000 and T2000 can continue to use LDoms 1.2.

Dynamic Resource Management

This is my favorite addition...
Before explaining it, a little more review of logical domains. Instead of assigning CPU shares or weights
as is done with traditional hypervisors or the Solaris Fair Share Scheduler, you adjust the CPU capacity of
a domain by assigning it more or fewer CPUs.
This is consistent with the CPU-rich design of CMT servers - with so many addressable CPUs, you simply don't have
to timeslice CPUs to share the physical processor. You can assign them directly to the guest domain.
Each domain is given some number of CPU threads that belong to it, and to it alone.

Logical Domains has supported dynamic reconfiguration from the outset: you adjust CPU capacity for a domain by
adding and removing CPUs, using commands like:

# ldm set-vcpu 16 mydomain # set the number of CPUs for 'mydomain'
# ldm add-vcpu 8 mydomain # give it some more CPUs for a spike in load
# ldm rm-vcpu 8 mydomain # take them back - set-vcpu would have worked too

It is easy to put these commands in a script, perhaps initiated by cron.
It also has always been possible to parse the output of ldm list -p to see the CPU utilization for each domain
and adjust CPU counts accordingly.
A mere "SMOP" (Small Matter Of Programming), eh? But it takes a fair bit of work to do this properly!
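To illustrate the kind of logic such a script would need, here is a minimal sketch of the resizing decision, with the thresholds chosen arbitrarily for the example. A real script would read the current CPU count and utilization from ldm list -p (the exact parseable field layout is documented in the ldm man page) before applying the decision.

```shell
#!/bin/sh
# Sketch of a hand-rolled, cron-driven CPU resizing policy.
# Thresholds and bounds below are illustrative values.
UTIL_UPPER=75   # add a CPU above this busy percentage
UTIL_LOWER=25   # remove a CPU below this busy percentage
VCPU_MIN=2      # never shrink below this
VCPU_MAX=16     # never grow beyond this

# Decide the next vCPU count from the current count and utilization.
next_vcpus() {
    cur=$1 util=$2
    if [ "$util" -gt "$UTIL_UPPER" ] && [ "$cur" -lt "$VCPU_MAX" ]; then
        echo $((cur + 1))
    elif [ "$util" -lt "$UTIL_LOWER" ] && [ "$cur" -gt "$VCPU_MIN" ]; then
        echo $((cur - 1))
    else
        echo "$cur"
    fi
}

# A real script would obtain cur and util from `ldm list -p` output,
# then apply the decision with something like:
#   ldm set-vcpu $(next_vcpus "$cur" "$util") mydomain
next_vcpus 8 90    # prints 9
```

Even this toy version ignores hysteresis, priorities among domains, and time-of-day schedules, which is exactly the work the new resource manager takes off your hands.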

LDoms Dynamic Resource Management

LDoms 1.3 provides a policy-based resource manager that automatically adds
or removes CPUs from a running domain based on its utilization and relative priority.
Policies can be prioritized to ensure important domains get preferential access to resources.
Policies can also be enabled or disabled manually or based on time of day for different prime shift and off-hours policies.
For example, one domain may have the highest resource needs and priority during the day time, while a domain running
batch work may be more resource-intensive at night.

Policy rules specify the number of CPUs that a domain has, bounded by minimum and maximum values and based on its utilization:

The number of CPUs is adjusted between vcpu-min and vcpu-max based on util-upper and util-lower CPU busy percentages (all of these variables are property values associated with the policy)

If CPU utilization exceeds the value of util-upper, virtual CPUs are added to the domain, up to the vcpu-max limit

If the utilization drops below util-lower, virtual CPUs are removed from the domain, down to the vcpu-min limit

If vcpu-min is reached, no more virtual CPUs can be dynamically removed. If vcpu-max is reached, no more virtual CPUs can be dynamically added (manual changes to the number of CPUs can still be done using the ldm commands shown above)

Multiple policies can be in effect, and are optionally controlled by tod-begin and tod-end (Time Of Day) values

The resource manager includes
ramp-up (attack) and ramp-down (decay) controls to adjust response to workload changes, specifying the
number of CPUs to add or remove based on changes in utilization, and how quickly the resource manager responds.

Resource management is disabled in elastic power management mode, in which CPUs are powered down when unused to reduce power consumption.
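As a sketch of what defining such a policy looks like, the following uses the settings described in the next paragraph; the property names follow the LDoms 1.3 documentation and should be verified against the ldm man page for your release.

```shell
# Hedged example: create a DRM policy named high-usage for domain ldom1.
# Property names are per the LDoms 1.3 docs; verify on your system.
ldm add-policy tod-begin=09:00 tod-end=18:00 \
    util-lower=25 util-upper=75 \
    vcpu-min=2 vcpu-max=16 \
    attack=1 decay=1 \
    name=high-usage ldom1
```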

This policy controls the number of CPUs for domain ldom1, is named high-usage and is in effect between 9am and 6pm.
The lower and upper CPU utilization settings are 25% and 75% CPU busy.
The number of CPUs is adjusted between 2 and 16: one CPU is added or removed at a time (the attack and decay values).
For example, if the CPU utilization is above 75%, a CPU is added unless ldom1 already has 16 CPUs.

This provides flexible and powerful dynamic CPU resource management for Logical Domains. I expect there will be future
enhancements, possibly for other resource categories.

Summary

Logical Domains 1.3 provides a new level of functional capability, representing the continued investment and enhancement
of this flexible and powerful virtualization capability.
For further information on the latest update, see
www.sun.com/ldoms