IPS changes in Solaris 11.1, brought to you by the letter ‘P’

I thought it might be a good idea to put together a post about some of the IPS changes that appear in Solaris 11.1. To make it more of a challenge, everything I’m going to talk about here begins with the letter ‘P’.

Performance

We’ve made great progress in speeding up IPS. I think performance bugs tend to come in a few different flavours: difficult to solve or subtle bugs, huge and obvious ones, bugs that can be solved by doing tasks in parallel and bugs that are really all about the perception of performance, rather than actual performance. We’ve come across at least one of each of those flavours during the course of our work on 11.1.

Shawn and Brock spent time digging into general packaging performance, carefully analyzing the existing code and testing changes to improve performance and reduce memory usage. Ultimately, their combined efforts resulted in a 30% boost to pkg(5) performance across the board, which I think was pretty impressive.

Other performance bugs were much easier to spot and fix. For example, 'pkg history' performance on systems with lots of boot environments was atrocious: my laptop, with 1796 pkg history entries, took 3 minutes to run 'pkg history' with the S11 IPS bits. After the fix, the command runs in 11 seconds: another good performance improvement, albeit one of lesser significance.

I’ll mention some other performance fixes in the next two sections.

Parallel zones

Apart from trying to perform operations more quickly, a typical way to address performance problems is to make the system faster by doing things in parallel. In this case, in the previous release, 'pkg update' in a global zone that contains many non-global zones was quite slow because we worked on one zone at a time. For S11.1, Ed did some excellent work to add the ‘-C‘ flag to several pkg(1) subcommands, allowing multiple zones to be updated at once.

Ed’s work wasn’t simply to perform multiple operations in parallel, but also to improve what was being done along the way – it was a lot of change, and it was well worth it.

With the work we’ve done in the past on the system-repository, these parallel updates are network-efficient, with caching of packaged content for zones being provided by the system repository.

Progress-tracking

Sometimes you can make a system appear faster by making the user interface provide more feedback on what is being performed. Dan added some wonderful new progress tracking code to all of the pkg(5) tools, changing the tools to use that API.

So, if the older "Planning /-|-\ " spinner was frustrating you, then you’ll definitely enjoy the changes here. It’s hard to show an example of the curses-terminal-twiddling in this blog post, so here’s what you’d see when piping the output (the progress tracking code can tell when it’s talking to a terminal, and it adjusts the output accordingly):
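To make the terminal detection concrete, here is a minimal Python sketch of the idea. It is not the pkg(5) progress tracking API and does not reproduce real pkg output: the function name and the output format are made up for illustration. It checks whether the output stream is a terminal, then either redraws one line in place or prints one plain line per update.

```python
import io
import sys


def make_progress_printer(stream=None):
    """Return a reporting function suited to `stream` (hypothetical helper)."""
    stream = stream if stream is not None else sys.stdout
    # isatty() is False for pipes, regular files and in-memory buffers.
    interactive = hasattr(stream, "isatty") and stream.isatty()

    def report(phase, done, total):
        if interactive:
            # Redraw a single line in place using a carriage return.
            stream.write(f"\r{phase}: {done}/{total}")
            if done == total:
                stream.write("\n")
        else:
            # One plain line per update, so logs and pipes stay readable.
            stream.write(f"{phase}: {done}/{total}\n")
        stream.flush()

    return report


if __name__ == "__main__":
    report = make_progress_printer()
    for step in range(1, 4):
        report("Planning", step, 3)
```

Run directly in a terminal, the sketch overwrites a single line; piped through another command, it emits three separate lines.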

Proxy configuration

I suppose this could also be seen as a performance bug (though the link is tenuous, I admit).

Behind the scenes, pkg(5) tools use libcurl to provide HTTP and HTTPS transport facilities, and we inherit the support that libcurl provides for web proxies. Typically a user would set a $http_proxy environment variable before running their IPS command.

At home, I run a custom web-proxy, through which I update all of my Solaris development machines (most of my systems reside in NZ, but many of my repositories are in California, so using a local caching proxy is a big performance win for me)

Now, I could use pkgrecv(1) to pull updates to a local repository every build. While this is great for users who want to maintain a “golden master” repository, it’s not an ideal solution for a user like me who updates their systems every two weeks: the upstream repository tends to have a bunch of packages that I will never care about, I’m unlikely to ever need to worry about sparc binaries at home, and I’m never sure which packages I’ll want to install. So I prefer the idea of a transparent repository cache to having to populate and maintain a complete local repository.

Unfortunately, quite often I’d find myself forgetting to set $http_proxy before running ‘pkg update‘, and I’d end up using more bandwidth than I needed to, and when using repositories that were only accessible with different proxies, things tended to get a bit messy.

So, to scratch that itch, we came up with the "--proxy" argument to "pkg set-publisher", which allows us to associate proxies with origins on your system. The support is provided at the individual origin level, so you can use different proxies for different URLs (handy if you have some publishers that live on the internet, and others that live on your intranet)

To make things easier for zones administrators, the system-repository inherits that configuration automatically, so there’s no need to set the ‘config/http_proxy‘ option in the SMF service anymore (however, if you do set it, the service will use that value to override all --proxy settings on individual origins)
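The precedence described above (per-origin proxies, with a service-wide setting overriding them when present) can be sketched in a few lines of Python. This is only an illustration of the rule, not the system-repository's code: the function name, the dictionary layout and the example URLs are all hypothetical.

```python
from typing import Dict, Optional


def effective_proxy(origin: str,
                    origin_proxies: Dict[str, Optional[str]],
                    service_proxy: Optional[str] = None) -> Optional[str]:
    """Pick the proxy to use for `origin` (illustrative only).

    origin_proxies: per-origin proxies, as one would set with --proxy
                    (None means "no proxy for this origin").
    service_proxy:  stand-in for a service-wide value such as
                    config/http_proxy; when set, it wins over every
                    per-origin value.
    """
    if service_proxy:
        return service_proxy
    return origin_proxies.get(origin)


if __name__ == "__main__":
    proxies = {
        "http://pkg.example.com/": "http://cache.example.net:3128",
        "http://repo.intranet.example/": None,
    }
    for url in proxies:
        print(url, "->", effective_proxy(url, proxies))
```

Keeping the proxy next to the origin, rather than in the environment, is what lets internet-facing and intranet publishers use different proxies on the same system.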

As part of this work, we also changed the output of "pkg publisher", removing those slightly confusing "proxy://http://foobar" URIs. Now, in a non-global zone, we show something like this:

This particular zone is one that’s running on a system which has an HTTP origin and a file-based origin in the global zone, and an HTTP origin that has been manually added to the non-global zone. The “P” column indicates whether a proxy is being used for each origin: “T” stands for “true”, indicating HTTP access going through the system repository, and “F” stands for “false”, shown both for the file-based publisher being served directly from the system-repository itself and for the zone-specific repository running on port 8080 in that zone.

We print more details about the configuration using the "pkg publisher <publisher>" command:

P5p archive support and zones

This isn’t related to performance (unless you count a completely missing piece of functionality as being a particularly severe form of performance bug!) When implementing the system-repository for S11, we ran out of runway and had to impose a restriction on the use of “p5p” archives when the system had zones configured. This work lifts those restrictions.

The job of the system-repository is to allow the zone to access all of the pkg(5) repositories that are configured in the global zone, and to ensure that any changes in the publisher configuration in the global zone are reflected in every non-global zone automatically.

To do this, it uses a basic caching proxy for HTTP and HTTPS-based publishers, and a series of Apache RewriteRule directives to provide access to the file-based repositories configured in the global zone.

P5p files were more problematic: these are essentially archives of pkg(5) repositories that can be configured directly using ‘pkg set-publisher’. The problem was that no amount of clever RewriteRules would be able to crack open a p5p archive and serve its contents to the non-global zone.

We considered a few different options on how to provide this support, but ended up with a solution that uses mod_wsgi (which is now in Solaris, as a result) to serve the contents directly. See /etc/pkg/sysrepo/sysrepo_p5p.py if you’re interested in how that works, but there’s no administrator interaction needed when using p5p archives: everything is taken care of by the system-repository service itself.
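For readers who haven't seen WSGI before, the general shape of "serve files out of an archive from a web server" looks roughly like the Python sketch below. It is not the contents of sysrepo_p5p.py. It assumes, purely for illustration, an archive that Python's tarfile module can read, and the member path used in the demo is invented rather than being the real p5p layout.

```python
import io
import tarfile


def make_archive_app(archive_bytes):
    """Build a WSGI app that serves regular files stored in a tar archive."""

    def app(environ, start_response):
        # Map the request path onto a member name inside the archive.
        member_name = environ.get("PATH_INFO", "").lstrip("/")
        with tarfile.open(fileobj=io.BytesIO(archive_bytes)) as archive:
            try:
                member = archive.getmember(member_name)
            except KeyError:
                member = None
            if member is None or not member.isfile():
                start_response("404 Not Found",
                               [("Content-Type", "text/plain")])
                return [b"not found\n"]
            data = archive.extractfile(member).read()
        start_response("200 OK", [
            ("Content-Type", "application/octet-stream"),
            ("Content-Length", str(len(data))),
        ])
        return [data]

    return app


def build_demo_archive():
    """Create a tiny in-memory archive so the sketch can be tried out."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as archive:
        payload = b"hello from the archive\n"
        # Hypothetical member name, not the real p5p layout.
        info = tarfile.TarInfo(name="publisher/solaris/file/demo")
        info.size = len(payload)
        archive.addfile(info, io.BytesIO(payload))
    return buf.getvalue()
```

A callable like `app` is what a WSGI host such as mod_wsgi invokes for each request. A production version would keep an index of the archive instead of reopening it every time.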

Pruning and general care-taking

According to hg(1), we’ve made 209 putbacks containing 276 bug fixes and RFEs to the pkg-gate since S11. So aside from all of the performance and feature work mentioned here, Solaris 11.1 comes with a lot of other IPS improvements – definitely a good reason to update to this release.

If you’re running on an Illumos-based distribution and you don’t have these bits in your distribution, I think now would be an excellent time to sync your hg repositories and pull these new changes. Feel free to ping us on #pkg5 on irc.freenode.net if you’ve any questions about porting, or anything else really – we’re a friendly bunch.

Per-BE /var subdirectories (/var/share)

OK, that’s a slightly contrived name for this feature (only used here so it could begin with ‘P’) We’ve been calling this “separate /var/share” while it was under development.

Technically, this isn’t an IPS change, it’s a change in the way we package the operating system, but it’s a concrete example of one of the items in the IPS developer guide on how to migrate data across directories during package operations using the ‘salvage-from‘ attribute for ‘dir‘ actions.
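As a rough mental model of what salvaging achieves, the Python sketch below moves whatever is found in an old directory into its new home. It is not how IPS implements ‘salvage-from’: the function name is hypothetical, the paths in the demo live in a throwaway temporary directory, and skipping names that already exist in the new location is a design choice made for this example only.

```python
import os
import shutil
import tempfile


def salvage_into(old_dir, new_dir):
    """Move everything found in old_dir into new_dir; return moved names."""
    os.makedirs(new_dir, exist_ok=True)
    moved = []
    if not os.path.isdir(old_dir):
        return moved
    for name in sorted(os.listdir(old_dir)):
        target = os.path.join(new_dir, name)
        if os.path.exists(target):
            # Never overwrite content that is already in the new location.
            continue
        shutil.move(os.path.join(old_dir, name), target)
        moved.append(name)
    return moved


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        old = os.path.join(root, "var", "cores")
        new = os.path.join(root, "var", "share", "cores")
        os.makedirs(old)
        open(os.path.join(old, "core.1234"), "w").close()
        print(salvage_into(old, new))
```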

This change moves several directories previously delivered under /var onto a new dataset, rpool/VARSHARE, allowing boot environments to carry less baggage around as part of each BE clone, sharing data where that makes sense. Bart came up with the mechanism and prototype to perform the migration of data that should be shared, and I finished it off and managed the putback.

For this release, the following directories are shared:

/var/audit

/var/cores

/var/crash (previously unpackaged!)

/var/mail

/var/nfs

/var/statmon

Have a look at /lib/svc/method/fs-minimal to see how this migration was performed. Here’s what pkg:/system/core-os looks like when delivering actions that salvage content:

As part of this work, we also wrote a new section 5 man page, datasets(5) which is well worth reading. It describes the default ZFS datasets that are created during installation, and explains how they interact with system utilities such as swap(1M), beadm(1M), useradd(1M), etc.

Putting the dev guide on docs.oracle.com

Finally, it’s worth talking a bit about the devguide. We wrote the IPS Developer Guide in time for the initial release of Solaris 11, but didn’t quite make the deadline for the official docs.oracle.com documentation release, leading us to publish it ourselves on OTN and opensolaris.org. Since then, we’ve had complaints about the perceived lack of developer documentation for IPS, which was unfortunate.

So, for Solaris 11.1, Alta has converted the guide into Docbook, and done some cleanup on the text (the content is largely the same) and it will be available on docs.oracle.com in all its monochrome glory.

I think that’s all of the Solaris 11.1 improvements I’ll talk about for now – if you’ve questions on any of these, feel free to add comments below, mail us on pkg-discuss or pop in to #pkg5 to say hello. I’ll update this post with links to the official Solaris 11.1 documentation once it becomes available.

I just realized I forgot to credit Bart for coming up with the mechanics and prototype of the /var/share work, so I’ve updated the post above. And given that I’ve mentioned everyone else on the team, I should also give props to Danek for his advice and thorough code reviews, and to Erik for keeping ipkg on its little rubber feet :-)

You will need to update to the SRU bits before being able to update to 11.1 since there’s a boot-loader change which needs certain updated binaries on the live-image performing the update.

However the 11.1 bits are not yet available in the /release repository, so there’s nothing to update to at the time of writing. It’s possible that when the bits are made available, the latest SRU bits will also be placed in the /release repository, but I’m not sure if that has been decided yet.

Sure: this ‘pkg publisher’ output is what you’d see in a non-global zone.

The first origin here, marked as “system-repository”, is one that is being proxied through the system repository running in the global zone, accessing a repository elsewhere (you can see a “T” in the “proxy” column)

The second one is also going through the system-repository, but in this case, it’s being served directly by the system repository itself – likely a file-based repository in the GZ itself. (there’s an “F” in the proxy column, indicating this isn’t being proxied)

The last origin is one that was added locally to the non-global zone. It doesn’t have “system-repository” as the URI of its origin, so it’s going direct to a repository that’s configured on port 8080 in the non-global zone.

The example shows the different ways non-global zones can access pkg(5) repositories. Does that make sense?

They’re all marked as “(syspub)” to denote that the publisher, “solaris”, that each of these origins provides packages for [remember, one publisher can be served by many URIs] is one that’s been configured in the global zone, and so has restrictions on the pkg(5) operations that can be performed on it by a non-global zone administrator.

I am having a really hard time trying to get my non-global zone to get the Oracle support updates: I keep getting the internal 500 error. I have a zone that is configured to use a local file-based repo, and in the global zone I used --proxy to access our proxy, along with the setup for pkg.oracle.com/solaris/support, but it’s just not getting to the NGZ.

Have a look in the logs for the system-repository Apache instance in the GZ, /var/log/pkg/sysrepo/*, to see what errors are being reported by the service when it’s returning HTTP 500 errors. I assume the GZ is able to reach the publisher correctly (‘pkg refresh’ returning successfully in the GZ is enough to demonstrate that it is).

Check that the zones-proxy-client SMF service is running in the zone, and that zones-proxyd and system-repository are running in the global zone.

Finally, check that the system-repository ‘config/*_proxy’ SMF properties are not set, since those will override any --proxy options set in the GZ publisher configuration.

As a last resort, you can dig around the Apache sysrepo_httpd.conf configuration generated on behalf of the system-repository at /system/volatile/pkg/sysrepo to see whether anything’s amiss there.

Is there a way to populate the ‘Alias’ item which is so tantalisingly displayed by pkg publisher ? I’d like all the publishers (local, network) to co-exist and to be able to manage each one by name, without having to remove or replace them. This seems to be difficult to do at the moment. For instance, if I have a local repo and the default one, how do I even set which one is preferred since they’re both called ‘solaris’…at the moment all I can see to do is remove the default one. Alternatively is there a way to rename my local one to ‘solaris-local’ and not have pkg server barf?

So publishers with multiple origins can peacefully co-exist on Solaris clients. The packaging system tries to reach all origins, then builds a combined catalog of all the packages available from all origins for a given publisher. The transport system will optimize for the fastest origin if all packages are available from all origins.

Does that answer your question? You can’t disable specific origins, though you can remove ones you don’t want to use (pkg set-publisher -G oldorigin publishername)
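The combined-catalog behaviour can be pictured with a small Python sketch. This is a toy model under stated assumptions, not the pkg(5) transport code: catalogs are plain sets of package names, the latency figures are invented, and "fastest" simply means the lowest measured latency among the origins that offer the package.

```python
from typing import Dict, Optional, Set


def combined_catalog(origin_catalogs: Dict[str, Set[str]]) -> Set[str]:
    """Union of the package names offered by every reachable origin."""
    combined: Set[str] = set()
    for packages in origin_catalogs.values():
        combined |= packages
    return combined


def pick_origin(package: str,
                origin_catalogs: Dict[str, Set[str]],
                latency_ms: Dict[str, float]) -> Optional[str]:
    """Fastest origin (lowest latency) that actually offers `package`."""
    candidates = [origin for origin, packages in origin_catalogs.items()
                  if package in packages]
    if not candidates:
        return None
    return min(candidates, key=lambda origin: latency_ms[origin])


if __name__ == "__main__":
    catalogs = {
        "http://pkg.example.com/": {"editor/vim", "shell/zsh"},
        "file:///export/repo/": {"editor/vim"},
    }
    latency = {"http://pkg.example.com/": 180.0, "file:///export/repo/": 2.0}
    print(sorted(combined_catalog(catalogs)))
    print(pick_origin("editor/vim", catalogs, latency))
```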

While I’m on it, what are best practices for providing package services for different release and SRU levels of S11? In production environments I’m not going to be automatically updating anything without signoff. I can’t see how to do this easily with IPS. Presumably it can, but the documentation seems to be predicated on latest and greatest everything at all times. I don’t offhand see how to provide lifecycle rollout from dev, through test and then to production, where all three could be on slightly different releases.

1. Maintain a local “golden master” repository inside your network, which you periodically populate using pkgrecv(1) from pkg.oracle.com (probably two repositories, one staging, one production). Point all of your clients at your local master repository; then they can only upgrade when you make updated bits available within your network.

2. Keep using pkg.oracle.com, but restrict clients to the versions that they’re allowed to access, either by running ‘pkg freeze’ on each machine, or by publishing and installing your own package which has ‘incorporate’ dependencies on packages at specific versions (presumably allowing some leeway for updated versions of packages within the same version-stream for security fixes, etc.)

In the Old Days (TM) you would download Recommended every three months or so and chuck it at a test server for regression (unless of course you had a situation which required something more drastic). When all was well, which it usually was, you’d unleash it on the unsuspecting users. How do we emulate this happy situation, which was admittedly quite braindead but very, very easy to get your head around, and also required little or no supporting apparatus? No customer I know wants automatic patching–this is not linux or god forbid windows.

Right, that’s what I’m saying: you can maintain an internal staging repository, which you mirror from pkg.oracle.com as/when you want to. You point your test systems at that. Then, when you want your production systems to be able to update to new releases, pkgrecv bits from that staging repository into your internal golden-master repository, which those production clients are already pointed at.

There’s no requirement to have all your production systems pointing at the repositories at pkg.oracle.com: it’s entirely your choice.
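The staging-then-production flow could be scripted along these lines. Treat this Python sketch as an outline only: the repository paths are hypothetical, the destination repositories are assumed to exist already, any credentials the source repository needs are left out, and by default the helper only returns the pkgrecv(1) command it would run instead of executing it.

```python
import subprocess


def pkgrecv_command(source, destination, patterns=("*",)):
    """Build a pkgrecv(1) argv copying `patterns` from source to destination."""
    return ["pkgrecv", "-s", source, "-d", destination, *patterns]


def promote(source, destination, dry_run=True):
    """Copy packages between repositories; with dry_run, only build the command."""
    cmd = pkgrecv_command(source, destination)
    if not dry_run:
        subprocess.run(cmd, check=True)
    return cmd


if __name__ == "__main__":
    # Step 1: mirror upstream into staging, whenever you choose to.
    print(promote("http://pkg.oracle.com/solaris/release/",
                  "/export/repo/staging"))
    # Step 2: after testing, promote staging into the golden master.
    print(promote("file:///export/repo/staging",
                  "/export/repo/production"))
```

Production clients never see new bits until the second step has been run, which is the sign-off point asked about above.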