Oracle Blog

Blog for scottdickson

Wednesday Jul 09, 2014

This is the summer of Oracle Solaris 11.2 and OpenStack workshops. I'm on the road covering some of them, along with my teammates, Pavel Anni and Bob Netherton.

The Solaris workshops are full-day, hands-on workshops that will give you not only an introduction to Solaris 11, but a view into the new features and capabilities in Solaris 11.2. Primarily, these will use Pavel Anni's fantastic hands-on lab and VirtualBox.

OpenStack workshops are a shorter, 2-3 hour event that will let you see what we are up to with OpenStack in Solaris and how OpenStack can help you move into a world of modern cloud computing.

These are only the events that I am participating in. Take a look at http://oracle.com/events to see the rest of the events. Also take a look at Bob's blog for more info on where he will be.

Really, the overall process is exactly the same, at least for Solaris 11, with only minor updates. We will focus on Solaris 11 for this blog. Once I verify that the same approach works for Solaris 10, I will provide another update.

Booting Solaris 11 on x86

Just as before, in order to configure the server for network boot across a card-based NIC, it is necessary to declare the asset to associate the additional MACs with the server. You likely will need to access the server console via the ILOM to figure out the MAC and to get a good idea of the network instance number.

The simplest way to find both of these is to start a network boot using the desired NIC and see where it appears in the list of network interfaces and what MAC is used when it tries to boot. Go to the ILOM for the server. Reset the server and start the console. When the BIOS loads, select the boot menu, usually with Ctrl-P. This will give you a menu of devices to boot from, including all of the NICs. Select the NIC you want to boot from. Its position in the list is a good indication of what network number Solaris will give the device.

In this case, we want to boot from the 5th interface (GB_4, net4). Pick it and start the boot process. When it starts to boot, you will see the MAC address for the interface.

Once you have the network instance and the MAC, go through the same process of declaring the asset as in the SPARC case. This associates the additional network interface with the server.

Creating an OS Provisioning Plan

The simplest way to do the boot via an alternate interface on an x86 system is to do a manual boot. Update the OS provisioning profile as in the SPARC case to reflect the fact that we are booting from a different interface; here, set the network boot device to GB_4/net4, or whatever device corresponds to your network instance number. Finally, configure the profile to support manual network boot by checking the manual-boot box in the OS Provisioning profile.

Booting the System

Once you have created a profile and plan to support booting from the additional NIC, we are ready to install the server. Again, from the ILOM, reset the system and start the console. When the BIOS loads, select boot from the Boot Menu as above. Select the network interface from the list as before and start the boot process. When the grub bootloader loads, the default boot image is the Solaris Text Installer. On the grub menu, select Automated Installer and Ops Center takes over from there.

Lessons

The key lesson from all of this is that Ops Center is a valuable tool for provisioning servers whether they are connected via built-in network interfaces or via high-speed NICs on cards. This is great news for modern datacenters using converged network infrastructures. The process works for both SPARC and x86 Solaris installations. And it's easy and repeatable.

Tuesday Oct 16, 2012

It's been a long time since last I added something here, but having some conversations this last week, I got inspired to update things.

I've been spending a lot of time with Ops Center for managing and installing systems these days. So, I suspect a number of my upcoming posts will be in that area.

Today, I want to look at how to provision Solaris using Ops Center when your network is not connected to one of the built-in NICs. We'll talk about how this can work for both Solaris 10 and Solaris 11, since they are pretty similar. In both cases, WANboot is a key piece of the story.

Here's what I want to do: I have a Sun Fire T2000 server with a Quad-GbE nxge card installed. The only network is connected to port 2 on that card rather than the built-in network interfaces. I want to install Solaris on it across the network, either Solaris 10 or Solaris 11. I have met with a lot of customers lately who have a similar architecture. Usually, they have T4-4 servers with the network connected via 10GbE connections.

Add to this mix the fact that I use Ops Center to manage the systems in my lab, so I really would like to add this to Ops Center. If possible, I would like this to be completely hands free. I can't quite do that yet. Close, but not quite.

WANBoot or Old-Style NetBoot?

When a system is installed from the network, it needs some help getting the process rolling. It has to figure out what its network configuration (IP address, gateway, etc.) ought to be. It needs to figure out what server is going to help it boot and install, and it needs the instructions for the installation. There are two different ways to bootstrap an installation of Solaris on SPARC across the network. The old way uses a broadcast of RARP or, more recently, DHCP to obtain the IP configuration and the rest of the information needed. The second is to explicitly configure this information in the OBP and use WANBoot for installation.

WANBoot has a number of benefits over broadcast-based installation: it is not restricted to a single subnet; it does not require special DHCP configuration or DHCP helpers; it uses standard HTTP and HTTPS protocols, which traverse firewalls much more easily than NFS-based package installation. But WANBoot is not available on really old hardware, and WANBoot requires the use of Flash Archives in Solaris 10. Still, for many people, this is a great approach.

As it turns out, WANBoot is necessary if you plan to install using a NIC on a card rather than a built-in NIC.
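For context, WANBoot's server-side behavior is driven by a wanboot.conf file. The sketch below shows its general shape with hypothetical host names and paths; when Ops Center drives the installation, it manages this file for you.

```
# Hypothetical /etc/netboot/wanboot.conf -- illustrative values only
boot_file=/wanboot/wanboot.s10
root_server=http://10.1.1.10/cgi-bin/wanboot-cgi
root_file=/miniroot/miniroot.s10
signature_type=sha1
encryption_type=3des
server_authentication=yes
client_authentication=no
system_conf=system.conf
```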

Identifying Which Network Interface to Use

One of the trickiest aspects to this process, and the one that actually requires manual intervention to set up, is identifying how the OBP and Solaris refer to the NIC that we want to use to boot. The OBP already has device aliases configured for the built-in NICs called net, net0, net1, net2, net3. The device alias net typically points to net0 so that when you issue the command "boot net -v install", it uses net0 for the boot. Our task is to figure out the network instance for the NIC we want to use.

We will need to get to the OBP console of the system we want to install in order to figure out what the network should be called. I will presume you know how to get to the ok prompt. Once there, we have to see what networks the OBP sees and identify which one is associated with our NIC using the OBP command show-nets.

By looking at the devalias and the show-nets output, we can see that our Quad-GbE card must be the device nodes starting with /pci@780/pci@0/pci@8/network@0. The cable for our network is plugged into the 3rd slot, so the device address for our network must be /pci@780/pci@0/pci@8/network@0,2.

With that, we can create a device alias for our network interface. Naming the device alias may take a little bit of trial and error, especially in Solaris 11 where the device alias seems to matter more with the new virtualized network stack. So far in my testing, since this is the "next" network interface to be used, I have found success in naming it net4, even though it's a NIC in the middle of a card that might, by rights, be called net6 (assuming the 0th interface on the card is the next interface identified by Solaris and this is the 3rd interface on the card). So, we will call it net4. We need to assign a device alias to it:

{4} ok nvalias net4 /pci@780/pci@0/pci@8/network@0,2

{4} ok devalias
net4 /pci@780/pci@0/pci@8/network@0,2
...

We also may need to have the MAC for this particular interface, so let's get it, too. To do this, we go to the device and interrogate its properties.
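At the ok prompt, that interrogation looks something like this (a sketch using the device path from above; the property of interest is local-mac-address):

```
{4} ok cd /pci@780/pci@0/pci@8/network@0,2
{4} ok .properties
...
local-mac-address        00 21 28 20 42 92
...
{4} ok device-end
```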

From this, we can see that the MAC for this interface is 00:21:28:20:42:92. We will need this later.

This is all we need to do at the OBP. Now, we can configure Ops Center to use this interface.

Network Boot in Solaris 10

Solaris 10 turns out to be a little simpler than Solaris 11 for this sort of a network boot. Since WANBoot in Solaris 10 fetches a specified Flash Archive rather than selecting an install image based on the client's MAC address, no extra asset declaration is needed beyond the device alias.

In order to install the system using Ops Center, it is necessary to create an OS Provisioning profile and its corresponding plan. I am going to presume that you already know how to do this within Ops Center 12c, and I will just cover the differences between a regular profile and a profile that can use an alternate interface.

Create an OS Provisioning profile for Solaris 10 as usual. However, when you specify the network resources for the primary network, click on the name of the NIC, probably GB_0, and rename it to GB_N/netN, where N is the instance number you used previously in creating the device alias. This is where the trial and error may come into play. You may need to try a few instance numbers before you, the OBP, and Solaris all agree on the instance number. Mark this as the boot network.

For Solaris 10, you ought to be able to then apply the OS Provisioning profile to the server and it should install using that interface. And if you put your cards in the same slots and plug the networks into the same NICs, this profile is reusable across multiple servers.

Why This Works

If you watch the console as Solaris boots during the OSP process, Ops Center is going to look for the device alias netN. Since WANBoot requires a device alias called just net, Ops Center uses the value of your netN device alias and assigns that device to the net alias. That means that boot net will automatically use this device. Very cool! Here's a trace from the console as Ops Center provisions a server:

See what happened? Ops Center looked for the network device alias called net4 that we specified in the profile, took the value from it, and made it the net device alias for the boot. Pretty cool!
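In OBP terms, the effect is as if the following had been typed at the ok prompt (a sketch using this example's device path and alias):

```
{4} ok devalias net4
net4                     /pci@780/pci@0/pci@8/network@0,2
{4} ok nvalias net /pci@780/pci@0/pci@8/network@0,2
{4} ok boot net - install
```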

WANBoot and Solaris 11

Solaris 11 requires an additional step since the Automated Installer in Solaris 11 uses the MAC address of the network to figure out which manifest to use for system installation. In order to make sure this is available, we have to take an extra step to associate the MAC of the NIC on the card with the host. So, in addition to creating the device alias like we did above, we also have to declare to Ops Center that the host has this new MAC.

Declaring the NIC

Start out by discovering the hardware as usual. Once you have discovered it, take a look under the Connectivity tab to see what networks it has discovered. In the case of this system, it shows the 4 built-in networks, but not the networks on the additional cards. These are not directly visible to the system controller.

In order to add the additional network interface to the hardware asset, it is necessary to Declare it. We will declare that we have a server with this additional NIC, but we will also specify the existing GB_0 network so that Ops Center can associate the right resources together. The GB_0 acts as sort of a key to tie our new declaration to the old system already discovered. Go to the Assets tab, select All Assets, and then in the Actions tab, select Add Asset. Rather than going through a discovery this time, we will manually declare a new asset.

When we declare it, we will give the hostname, IP address, system model that match those that have already been discovered. Then, we will declare both GB_0 with its existing MAC and the new GB_4 with its MAC. Remember that we collected the MAC for GB_4 when we created its device alias.

After you declare the asset, you will see the new NIC in the connectivity tab for the asset. You will notice that only the NICs you listed when you declared it are seen now. If you want Ops Center to see all of the existing NICs as well as the additional one, declare them as well. Add the other GB_1, GB_2, GB_3 links and their MACs just as you did GB_0 and GB_4.

Installing the OS

Once you have declared the asset, you can create an OS Provisioning profile for Solaris 11 in the same way that you did for Solaris 10. The only difference from any other provisioning profile you might have created already is the network to use for installation. Again, use GB_N/netN where N is the interface number you used for your device alias and in your declaration.

And away you go. When the system boots from the network, the automated installer (AI) is able to see which system manifest to use, based on the new MAC that was associated, and the system gets installed.

Conclusion

So, why go to all of this trouble? More and more, I find that customers are wiring their data center to only use higher speed networks - 10GbE only to the hosts. Some customers are moving aggressively toward consolidated networks combining storage and network on CNA NICs. All of this means that network-based provisioning cannot rely exclusively on the built-in network interfaces. So, it's important to be able to provision a system using interfaces other than the built-in networks. It turns out that this is pretty straightforward for both Solaris 10 and Solaris 11 and fits into the Ops Center deployment process quite nicely.

Hopefully, you will be able to use this as you build out your own private cloud solutions with Ops Center.

Saturday Nov 05, 2011

Wish I could be in NYC this week! After being a part of the Solaris team for a long, long time, it's great to see this finally almost here.

But, if you can be in NYC or catch the webcast, I hope you will.

Join Oracle executives Mark Hurd and John Fowler and key Oracle Solaris engineers and execs at the Oracle Solaris 11 launch event in New York, Gotham Hall on Broadway, November 9th, and learn how you can build your infrastructure with Oracle Solaris 11 to:

The launch event will also feature exclusive content for our in-person audience, including a session led by the VP of core Solaris development and his leads on Solaris 11 and a customer insights panel during lunch. We will also have a technology showcase featuring our latest systems and Solaris technologies. The Solaris executive team will also be there throughout the day to answer questions and give insights into future developments in Solaris.

Don't miss the Oracle Solaris 11 launch in New York on November 9. REGISTER TODAY!

Just a note that the slides are now available from our ZFS in the cloud session at OpenWorld. Tom Shafron, CEO of Viewbiquity, and I presented on the new features in ZFS and how Viewbiquity is using them to provide a cloud-based data storage system.

From our abstract: Oracle Solaris ZFS, a key feature of Oracle Solaris, integrates the concepts of a file system and volume management capabilities to deliver simple, fast data management. This session provides a case study of Viewbiquity, a provider of market-leading, innovative M2M platforms. Its cloud-based platform integrates command and control, video, VoIP, data logging and management, asset tracking, automated responses, and advanced process automation. Viewbiquity relies on Oracle Solaris ZFS and Oracle Solaris for fast write with integrated hybrid traditional and solid-state storage capabilities, snapshots for backups, built-in deduplication, and compression for data storage efficiency. Learn what Oracle Solaris ZFS is and how you can deploy it in high-performance, high-availability environments.

Friday Jul 29, 2011

For a long time, I have advocated that Solaris users adopt ZFS for root, storing the operating system in ZFS. I've also strongly advocated for using Live Upgrade as a patching tool in this case. The benefits are intuitive and striking, but are they actual and quantifiable?

Background

You can find a number of bloggers on BOC talking about the hows and whys of ZFS root. Suffice it to say that ZFS has a number of great qualities that make management of the operating system simpler, especially when combined with other tools like Live Upgrade. ZFS allows for the immediate creation of as many snapshots as you might want, simply by preserving the view of the filesystem meta-data and taking advantage of the fact that all writes in ZFS use copy-on-write, completely writing the new data before releasing the old. This gives us snapshots for free.

Like chocolate and peanut butter, ZFS and Live Upgrade are two great tastes that taste great together. Live Upgrade traditionally was used just to upgrade systems from one release of Solaris (or update release) to another. However, in Solaris 10, it now becomes a tremendous tool for patching. With Live Upgrade (LU), the operating system is replicated in an Alternate Boot Environment (ABE) and all of the changes (patches, upgrades, whatever) are done to the copy of the OS while the OS is running, rather than taking the system down to apply maintenance. Then, when the time is right, during a maintenance window, the new changes are activated by rebooting using the ABE.

With this approach, downtime is minimized since changes are applied while the system is running. Moreover, there is a fall-back procedure since the original boot environment is still there. Rebooting again into the original environment effectively cancels out the changes exactly.

The Problem with Patching

Patching, generally speaking, is something that everyone knows they need to do, but few are really happy with how they do it. It's not much fun. It takes time. It keeps coming around like clockwork. You have to do it to every system. You have to work around the schedules of the applications on the system. But, it needs to be done. Sort of like mowing the grass every week in the summer.

Live Upgrade can take a lot of the pain out of patching, since the actual application of the patches no longer has to be done while the system is shut down. A typical non-LU approach for patches is to shut the system down to single-user prior to applying the patches. In this way, you are certain that nothing else is going on on the system and you can change anything that might need to be updated safely. But, the application is also down during this entire period. And that is the crux of the problem. Patching takes too long; we expect systems to be always available these days.

How long does patching take? That all depends. It depends on the number of changes and patches being applied to the system. If you have not patched in a very long time, then a large number of patches are required to bring the system current. The more patches you apply, the longer it takes.

It depends on the complexity of the system. If, for example, there are Solaris zones on the system for virtualization, patches applied to the system are automatically applied to each of the zones as well. This takes extra time. If patches are being applied to a shut-down system, that just extends the outage.

It's hard to get outage windows in any case, and long outage windows are especially hard to schedule, especially when the perceived benefit is small. Patches are like a flu shot. They can vaccinate you against problems that have been found, but they won't help if this year has a new strain of flu that we've not seen before. So, long outages across lots of systems are hard to justify.

So, How Long Does It Really Take?

I have long heard people talk about how patching takes too long, but I've not measured it in some time. So, I decided to do a bit of an experiment. Using a couple of different systems, neither one very fast nor very new, I applied the Solaris 10 Recommended patch set from mid-July 2011. I applied the patches to systems running different update releases of Solaris 10. This gives different numbers of patches that have to be applied to bring the system current. As far as procedure goes, for each test, I shut the system down to single-user (init S), applied the patches, and rebooted. The times listed are just the time for the patching, although the actual maintenance window in real life would include time to shut down, time to reboot, and time to validate system operation. The two systems I used for my tests were an X4100 server with 2 dual-core Opteron processors and 16GB of memory and a Sun Fire V480 server with 4 UltraSPARC III+ processors. Clearly, these are not new systems, but they will show what we need to see.

System   Operating System    Patches Applied   Elapsed Time (hh:mm:ss)
X4100    Solaris 10 9/10     105               00:17:00
X4100    Solaris 10 10/09    166               00:26:00
X4100    Solaris 10 10/08    216               00:36:06
V480     Solaris 10 9/10      99               00:47:29

For each of these tests, the server is installed with root on ZFS and patches are applied from the Recommended Patchset via the command "./installpatchset -d --<pw>" for whatever passcode this patchset has. All of this is done while the system was in single-user rather than while running multi-user.

It appears that clock speed is important when applying patches. The older V480 took three times as long as the X4100 for the same patchset.

And this is the crux of the problem. Even to apply patches to a pretty current system requires an extended outage. This does not even take into account the time required for whatever internal validation of the work done, reboot time, application restart time, etc. How can we make this better? Let's make it worse first.

More Complicated Systems Take Longer to Patch

Nearly a quarter of all production systems running Solaris 10 are deployed using Solaris Zones. Many more non-production systems may also use zones. Zones allow me to consolidate the administrative overhead of only having to patch the global zone rather than each virtualized environment. But, when applying patches to the global zone, patches are automatically applied to each zone in turn. So, the time to patch a system can be significantly increased by having multiple zones. Let's first see how much longer this might take, and then we will show two solutions.

System   Operating System    Number of Zones   Patches Applied   Elapsed Time (hh:mm:ss)
X4100    Solaris 10 9/10      2                105               00:46:51
X4100    Solaris 10 9/10     20                105               03:03:59
X4100    Solaris 10 10/09     2                166               01:17:17
X4100    Solaris 10 10/08     2                216               01:37:17
V480     Solaris 10 9/10      2                 99               01:53:59

Again, all of these patches were applied to systems in single-user in the same way as the previous set. Just having two (sparse-root) zones defined took nearly three times as long as just the global zone alone. Having 20 zones installed took the patch time from 17 minutes to over three hours for even the smallest tested patchset.

How Can We Improve This? Live Upgrade is Your Friend

There are two main ways that this patch time can be improved. One applies to systems with or without zones, while the second improves on the first for systems with zones installed.

I mentioned before that Live Upgrade is very much your friend. Rather than go into all the details of LU, I would refer you to the many other blogs and documents on LU. Check out especially Bob Netherton's Blog for lots of LU articles.

When we use LU, rather than taking the system down to single-user, we are able to create a new alternate boot environment, using ZFS snapshot and clone capability, while the system is up, running in production. Then, we apply the patches to that new boot environment, still using the installpatchset command. For example, "./installpatchset -d -B NewABE --<pw>" applies the patches into NewABE rather than the current boot environment. When we use this approach, the patch times that we saw before don't change very much, since the same work is being done. However, all of this is time that the system is not out of service. The outage is only the time required to reboot into the new boot environment.
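Put together, the LU patch cycle looks roughly like this (a sketch with hypothetical boot environment names, using the installpatchset invocation shown above):

```
# While the system is up and running in production:
lucreate -n NewABE                       # clone the current ZFS root BE
./installpatchset -d -B NewABE --<pw>    # patch the inactive BE
luactivate NewABE                        # boot from it next time

# During the maintenance window, the only downtime is the reboot:
init 6

# Fallback, if needed: re-activate the original BE and reboot again
luactivate OldABE
init 6
```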

So, Live Upgrade saves us all of that outage time. Customers who have older servers and are fairly out of date on patches say that applying a patch bundle can take more than four or five hours, an outage window that is completely unworkable. With Live Upgrade, the outage is reduced to the time for a reboot, scheduled when it can be most convenient.

Live Upgrade Plus Parallel Patching

Recently, another enhancement was made to patching so that multiple zones are patched in parallel. Check out Jeff Victor's blog where he explains how this all works. As it turns out, this parallel patching works whether you are patching zones in single-user or via Live Upgrade. So, just to get an idea of how this might help I tried to do some simple measurement with 2 and 20 sparse-root zones created on a system running Solaris 10 9/10.

System   Operating System    Number of Zones   Patches Applied   num_procs   Elapsed Time (hh:mm:ss)
X4100    Solaris 10 9/10      2                105               1           00:46:51
X4100    Solaris 10 9/10      2                105               2           00:36:04
X4100    Solaris 10 9/10     20                105               1           03:03:59
X4100    Solaris 10 9/10     20                105               2           01:55:58
X4100    Solaris 10 9/10     20                105               4           01:25:53

num_procs is used as a guide for the number of threads to be engaged in parallel patching. Jeff Victor's blog (above) and the man page for pdo.conf talk about how this relates to the actual number of processes that are used for patching.

With only two zones, doubling the number of threads has an effect, but not a huge effect, since the amount of parallelism is limited. However, with 20 zones on a system, boosting the number of zones patched in parallel can significantly reduce the time taken for patching.
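The knob itself is just a line in pdo.conf. Since the real file lives at /etc/patch/pdo.conf on Solaris 10, the sketch below edits a scratch copy so the change is easy to see (the parameter name follows this post's usage; check the pdo.conf man page for the exact spelling on your release):

```shell
# Sketch: bump the zone-patching parallelism from 1 to 4.
# On a real Solaris 10 system you would edit /etc/patch/pdo.conf;
# here we work on a scratch copy.
conf=$(mktemp)
echo 'num_procs=1' > "$conf"

# rewrite the num_procs line in place
sed 's/^num_procs=.*/num_procs=4/' "$conf" > "$conf.tmp" && mv "$conf.tmp" "$conf"

cat "$conf"
```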

Recall that all of this is done within the application of patches with Live Upgrade. Used alone, outside of Live Upgrade, this can help reduce the time required to patch a system during a maintenance window. Used with Live Upgrade, it reduces the time required to apply patches to the alternate boot environment.

So, what should you do to speed up patching and reduce the outage required for patching?

Use ZFS root and Live Upgrade so that you can apply your patches to an alternate boot environment while the system is up and running. Then, use parallel patching to reduce the time required to apply the patches to the alternate boot environment where you have zones deployed.

Monday May 25, 2009

Sun's Executive Briefing Center is on the road this week. We are visiting with customers in Cleveland, Columbus, and Detroit. Looks like a busy schedule and I am looking forward to the trip. I was asked to fill in as the Solaris Virtualization speaker for this trip.

We fly to Cleveland and fly home from Detroit. Kate has arranged a bus to get us from Cleveland to Columbus to Detroit. My wife calls it Geeks on a Bus and thought it sounded too scary to contemplate!

We'll be talking about Sun's Vision, Systems, Software, OpenStorage, Solaris, Virtualization of Systems, Desktop Virtualization, and Services to support all of these. Hope to see many of you there.

Saturday May 09, 2009

Last week, I blogged about a Jumpstart Survey. I've gotten good comments and some responses to the survey. It's been a week, but I want to collect some more responses before posting an analysis. Take a look at my previous blog and fill out the survey or comment on the blog. I will summarize and report in another week or so.

I'm doing briefings on DTrace and Solaris Performance Tools this week in Atlanta, Ft. Lauderdale, and Tampa. Click the links below to register if this is of interest and you can attend. Each is a 2 1/2 to 3 hour briefing that stays pretty technical with lots of examples.

Jumpstart makes use of rules to decide how to install a particular system, based on its architecture, network connectivity, hostname, disk and memory capacity, or any of a number of other parameters. The rules select a profile that determines what will be installed on that system and where it will come from. Scripts can be inserted before and after the installation for further customization. To help manage the profiles and post-installation customization, Mike Ramchand has produced a fabulous tool, the Jumpstart Enterprise Toolkit (JET).
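As a small illustration, a rules entry and the profile it selects might look like this (the hostname, file names, and cluster choice are hypothetical; JET generates and manages the real versions for you):

```
# rules: match a client, then name its begin script, profile, and finish script
hostname webserver01   -   profiles/web.profile   finish/web-finish.sh

# profiles/web.profile: what to install and how to lay out the disk
install_type    initial_install
system_type     standalone
partitioning    default
cluster         SUNWCreq
```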

My Questions for You

As a long time Solaris admin, I have been a fan of Jumpstart for years and years. As an SE visiting many cool companies, I have seen people do really interesting things with Jumpstart. I want to capture how people use Jumpstart in the real world - not just the world of those who create the product. I know that people come up with new and unique ways of using the tools that we create in ways we would never imagine.

For example, I once installed 600 systems with SunOS 4.1.4 in less than a week using Jumpstart - remember that Jumpstart never supported SunOS 4.1.4.

But, I am not just looking for the weird stories. I want to know what Jumpstart features you use. I'll follow this up with extra, detailed questions around Jumpstart Flash, WAN Boot, DHCP vs. RARP. But I want to start with just some basics about Jumpstart.

Tuesday Dec 23, 2008

A Different Approach

A week or so ago, I wrote about a way to get around the current limitation of mixing flash and ZFS root in Solaris 10 10/08. Well, here's a much better approach.

I was visiting with a customer last week and they were very excited to move forward quickly with ZFS boot in their Solaris 10 environment, even to the point of using this as a reason to encourage people to upgrade. However, when they realized that it was impossible to use Flash with Jumpstart and ZFS boot, they were disappointed. Their entire deployment infrastructure is built around using not just Flash, but Secure WANboot. This means that they have no alternative to Flash; the images deployed via Secure WANBoot are always flash archives. So, what to do?

It occurred to me that in general, the upgrade procedure from a pre-10/08 update of Solaris 10 to Solaris 10 10/08 with a ZFS root disk is a two-step process. First, you have to upgrade to Solaris 10 10/08 on UFS and then use lucreate to copy that environment to a new ZFS ABE. Why not use this approach in Jumpstart?

Turns out that it works quite nicely. This is a framework for how to do that. You likely will want to expand on it, since one thing this does not do is give you any indication of progress once it starts the conversion. Here's the general approach:

Create your flash archive for Solaris 10 10/08 as you usually would. Make sure you include all the appropriate LiveUpgrade patches in the flash archive.

Use Jumpstart to deploy this flash archive to one disk in the target system.

Use a finish script to add a conversion program to run when the system reboots for the first time. It is necessary to make this script run once the system has rebooted so that the LU commands run within the context of the fully built new system.

Details of this approach

Our goal when complete is to have the flash archive installed as it always has been, but running from a ZFS root pool, preferably a mirrored ZFS pool. The conversion script requires two phases to complete this conversion. The first phase creates the ZFS boot environment and the second phase mirrors the root pool. In the following example, our flash archive is called s10u6s.flar. We will install the initial flash archive onto the disk c0t1d0 and build our initial root pool on c0t0d0.

We specify a simple finish script for this system to copy our conversion script into place:

cp ${SI_CONFIG_DIR}/S99xlu-phase1 /a/etc/rc2.d/S99xlu-phase1

You see what we have done: We put a new script into place to run at the end of rc2 during the first boot. We name the script so that it is the last thing to run. The x in the name makes sure that this will run after other S99 scripts that might be in place. As it turns out, the luactivate that we will do puts its own S99 script in place, and we want to come after that. Naming ours S99x makes it happen later in the boot sequence.

So, what does this magic conversion script do? Let me outline it for you:

Create a new ZFS pool that will become our root pool

Create a new boot environment in that pool using lucreate

Activate the new boot environment

Add the script to be run during the second phase of the conversion

Clean up a bit and reboot

That's Phase 1. Phase 2 has its own script to be run at the same time that finishes the mirroring of the root pool.
If you are satisfied with a non-mirrored pool, you can stop here and leave phase 2 out. Or you might prefer to make
this step a manual process once the system is built. But, here's what happens in Phase 2:

Delete the old boot environment

Add a boot block to the disk we just freed. This example is SPARC, so use installboot. For x86, you
would do something similar with installgrub.

Attach the disk we freed from the old boot environment as a mirror of the device used to build the new
root zpool.

Clean up and reboot.
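Sketched as commands, the phase-2 steps above might look something like this. The boot-environment name, pool name, and devices are assumptions carried over from the example disks (c0t0d0 holding the new root pool, c0t1d0 freed by deleting the old boot environment), not the author's exact script:

```shell
#!/bin/sh
# Hypothetical phase-2 sketch (SPARC). Runs once at the end of rc2
# on the boot after phase 1 has activated the ZFS boot environment.
ludelete s10u6_ufs                 # delete the old UFS boot environment
# Put a ZFS boot block on the freed disk (on x86, use installgrub
# with /boot/grub/stage1 and stage2 instead)
installboot -F zfs /usr/platform/`uname -i`/lib/fs/zfs/bootblk \
    /dev/rdsk/c0t1d0s0
# Attach the freed disk as a mirror of the root pool's device
zpool attach -f rpool c0t0d0s0 c0t1d0s0
# Clean up and reboot
rm /etc/rc2.d/S99xlu-phase2
init 6
```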

I have been thinking it might be worthwhile to add a third phase that starts a zpool scrub, forcing
the newly attached drive to be resilvered right away. Since ZFS will notice on its own that the drive has
not been synced to the master drive and resilver it the first time the drive is used, this is sort of optional.
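If you did add that third phase, it would amount to little more than a single command (pool name taken from the example above):

```shell
zpool scrub rpool   # read and verify every block, resilvering the new mirror as needed
```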

The reason we add bootability explicitly to this drive is because currently, when a mirror is attached to a root zpool,
a boot block is not automatically installed. If the master drive were to fail and you were left with only the mirror,
this would leave the system unbootable. By adding a boot block to it, you can boot from either drive.

So, here's my simple little script that got installed as /etc/rc2.d/S99xlu-phase1. Just to make the code a
little easier for me to follow, I first create the script for phase 2, then do the work of phase 1.
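The script itself is not reproduced here, but following the outline above it would look roughly like this. Everything in it, the boot-environment names included, is an assumption reconstructed from the steps described, not the author's exact code:

```shell
#!/bin/sh
# Hypothetical sketch of /etc/rc2.d/S99xlu-phase1; disk names follow the
# example (flash archive installed on c0t1d0, new root pool on c0t0d0).

# Create the phase-2 script first, so it runs at the end of rc2 on the
# boot after this one (see the phase-2 outline above).
cat > /etc/rc2.d/S99xlu-phase2 <<'EOF'
#!/bin/sh
ludelete s10u6_ufs
installboot -F zfs /usr/platform/`uname -i`/lib/fs/zfs/bootblk /dev/rdsk/c0t1d0s0
zpool attach -f rpool c0t0d0s0 c0t1d0s0
rm /etc/rc2.d/S99xlu-phase2
init 6
EOF
chmod 755 /etc/rc2.d/S99xlu-phase2

# Phase 1 proper: build the root pool and migrate the running UFS
# boot environment into it with Live Upgrade.
zpool create -f rpool c0t0d0s0
lucreate -c s10u6_ufs -n s10u6_zfs -p rpool
luactivate s10u6_zfs

# Clean up and reboot into the new ZFS boot environment.
rm /etc/rc2.d/S99xlu-phase1
init 6
```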

I think that this is a much better approach than the one I offered before, using ZFS send. This approach
uses standard tools to create the new environment and it allows you to continue to use Flash as a way to
deploy archives. The dependency is that you must have two drives on the target system. I think that's
not going to be a hardship, since most folks will use two drives anyway. You will have to keep them as separate
drives rather than using hardware mirroring. The underlying assumption is that you previously used SVM or VxVM
to mirror those drives.

So, what do you think? Better? Is this helpful? Hopefully, this is a little Christmas present for
someone! Merry Christmas and Happy New Year!

Friday Dec 05, 2008

Ancient History

Gather round, kiddies, and let Grandpa tell you a tale of how we used to clone systems before we had Jumpstart and Flash,
when we had to carry water in leaky buckets 3 miles through snow up to our knees, uphill both ways.

Long ago, a customer of mine needed to deploy 600(!) SPARCstation 5 desktops all running SunOS 4.1.4. Even then, this was
an old operating system, since Solaris 2.6 had recently been released. But it was what their application required.
And we only had a few days to build and deploy these systems.

Remember that Jumpstart did not exist for SunOS 4.1.4, and Flash did not exist for Solaris 2.6. So, our approach was to
build a system, a golden image, the way we wanted it deployed, and then use ufsdump to save the contents of the filesystems.
Then, we were able to use Jumpstart from a Solaris 2.6 server to boot each of these workstations. Instead of having a
Jumpstart profile, we used only a finish script that partitioned the disks and restored the ufsdump images.
So Jumpstart just provided us a clean way to boot these systems and apply the scripts we wanted to them.

Solaris 10 10/08, ZFS, Jumpstart and Flash

Now, we have a bit of a similar situation. Solaris 10 10/08 introduces ZFS boot to Solaris, something that many of my
customers have been anxiously awaiting for some time. A system can be deployed using Jumpstart and the ZFS boot environment
created as a part of the Jumpstart process.

But. There's always a but, isn't there?

But, at present, Flash archives are not supported (and in fact do not work) as a way to install into a ZFS boot environment,
either via Jumpstart or via Live Upgrade. Turns out, they use the same mechanism under the covers for this. This is CR 6690473.

So, how can I continue to use Jumpstart to deploy systems, and continue to use something akin to Flash archives to speed
and simplify the process?

Build a "Golden Image" System

The first step, as with Flash, is to construct a system that you want to replicate. The caveat here is that you use ZFS for
the root of this system. For this example, I have left /var as part of the root filesystem rather than a separate
dataset, though this process could certainly be tweaked to accommodate a separate /var.

Once the system to be cloned has been built, you save an image of the system. Rather than using flarcreate, you will create a
ZFS send stream and capture this in a file. Then move that file to the jumpstart server, just as you would with a flash archive.

In this example, the ZFS bootfs has the default name - rpool/ROOT/s10s_u6wos_07.
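Capturing that stream on the golden-image system amounts to a snapshot and a send. The snapshot name @flar and the output file name below are chosen to match the finish script that follows; the NFS path to the Jumpstart server is a placeholder:

```shell
# On the golden-image system: snapshot the root dataset and capture a
# send stream in a file, then copy it to the Jumpstart server's share.
zfs snapshot rpool/ROOT/s10s_u6wos_07@flar
zfs send rpool/ROOT/s10s_u6wos_07@flar > /var/tmp/s10s_u6wos_07_flar.zfs
cp /var/tmp/s10s_u6wos_07_flar.zfs /net/jumpserver/export/solaris/Solaris10/flash/
```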

How do I get this on my new server?

Now, we have to figure out how to have this ZFS send stream restored on the new clone systems. We would like to take advantage
of the fact that Jumpstart will create the root pool for us, along with the dump and swap volumes, and will set up all of the needed
bits for booting from ZFS. So, let's install the minimal Solaris package set just to get these side effects.

Then, we will use Jumpstart finish scripts to create a fresh ZFS dataset and restore our saved image into it.
Since this new dataset will contain the old identity of the original system, we have to reset our system identity.
But once we do that, we are good to go.

So, set up the cloned system as you would for a hands-free jumpstart. Be sure to specify the sysid_config and install_config
bits in the /etc/bootparams. The manual Solaris 10 10/08 Installation Guide: Custom JumpStart and Advanced Installations
covers how to do this. We add to the rules file a finish script (I called mine loadzfs in this case) that will do the
heavy lifting. Once Jumpstart installs Solaris according to the profile provided, it then runs the finish script to finish up
the installation.

Here is the Jumpstart profile I used. This is a basic profile that installs the base, required Solaris packages into a ZFS pool
mirrored across two drives.
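The profile is not reproduced here, but a minimal Jumpstart profile matching that description would look something like the following; the slice names and boot-environment name are assumptions:

```
install_type    initial_install
cluster         SUNWCreq
pool            rpool auto auto auto mirror c0t0d0s0 c0t1d0s0
bootenv         installbe bename s10u6
```

The `pool` keyword here asks Jumpstart to create the root pool itself (with automatic pool, swap, and dump sizing), which is exactly the side effect we want before the finish script takes over.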

The finish script is a little more interesting since it has to create the new ZFS dataset, set the right properties, fill it up,
reset the identity, etc. Below is the finish script that I used.

#!/bin/sh -x
# TBOOTFS is a temporary dataset used to receive the stream
TBOOTFS=rpool/ROOT/s10u6_rcv
# NBOOTFS is the final name for the new ZFS dataset
NBOOTFS=rpool/ROOT/s10u6f
MNT=/tmp/mntz
FLAR=s10s_u6wos_07_flar.zfs
NFS=serverIP:/export/solaris/Solaris10/flash
# Mount directory where archive (send stream) exists
mkdir ${MNT}
mount -o ro -F nfs ${NFS} ${MNT}
# Create file system to receive ZFS send stream &
# receive it. This creates a new ZFS snapshot that
# needs to be promoted into a new filesystem
zfs create ${TBOOTFS}
zfs set canmount=noauto ${TBOOTFS}
zfs set compression=on ${TBOOTFS}
zfs receive -vF ${TBOOTFS} < ${MNT}/${FLAR}
# Create a writeable filesystem from the received snapshot
zfs clone ${TBOOTFS}@flar ${NBOOTFS}
# Make the new filesystem the top of the stack so it is not dependent
# on other filesystems or snapshots
zfs promote ${NBOOTFS}
# Don't automatically mount this new dataset, but allow it to be mounted
# so we can finalize our changes.
zfs set canmount=noauto ${NBOOTFS}
zfs set mountpoint=${MNT} ${NBOOTFS}
# Mount newly created replica filesystem and set up for
# sysidtool. Remove old identity and provide new identity
umount ${MNT}
zfs mount ${NBOOTFS}
# This section essentially forces sysidtool to reset system identity at
# the next boot.
touch /a/${MNT}/reconfigure
touch /a/${MNT}/etc/.UNCONFIGURED
rm /a/${MNT}/etc/nodename
rm /a/${MNT}/etc/.sysIDtool.state
cp ${SI_CONFIG_DIR}/sysidcfg /a/${MNT}/etc/sysidcfg
# Now that we have finished tweaking things, unmount the new filesystem
# and make it ready to become the new root.
zfs umount ${NBOOTFS}
zfs set mountpoint=/ ${NBOOTFS}
zpool set bootfs=${NBOOTFS} rpool
# Get rid of the leftovers
zfs destroy ${TBOOTFS}
zfs destroy ${NBOOTFS}@flar

When we jumpstart the system, Solaris is installed, but it really isn't used. Then, we load from the send stream
a whole new OS dataset, make it bootable, set our identity in it, and use it. When the system is booted, Jumpstart
still takes care of updating the boot archives in the new bootfs.

On the whole, this is a lot more work than Flash, and is really not as flexible or as complete. But hopefully, until
Flash is supported with a ZFS root and Jumpstart, this might at least give you an idea of how you can replicate systems
and do installations that do not have to revert back to package-based installation.

Many people use Flash as a form of disaster recovery. I think that this same approach might be used there as well. Still
not as clean or complete as Flash, but it might work in a pinch.

So, what do you think? I would love to hear comments on this as a stop-gap approach.