The IPS System Repository

I’m excited about today’s launch of Solaris 11 – I’ve been contributing to Solaris for quite a while now, pretty much since 1996, but my involvement in S11 has been the most fun I’ve had in all releases so far.

Today, I’m going to talk about the system repository and how I helped.

How zones differ from earlier releases

Zones that use IPS differ from those in Solaris 10 in that they are always full-root: every zone contains its own local copy of each package, rather than inheriting packaged content from the global zone as "sparse" zones did in Solaris 10.

This simplifies a lot of zone-related functionality: for the most part, administrators can treat a zone as if it were a full Solaris instance, albeit a very small one. By default, new zones in S11 are tiny. However, packaging with zones is a little more complex, and the system aims to hide that complexity from users.

Some packages in the zone always need to be kept in sync with those packages in the global zone. For example, anything which delivers a kernel module and a userland application that interfaces with it must be kept in sync between the global zone and any non-global zones on the system.

In earlier OpenSolaris releases, after each global-zone update, each non-global zone had to be updated by hand, attaching and detaching each zone. During that detach/attach the ipkg brand scripts determined which packages were now in the global zone, and updated the non-global zone accordingly.

In addition, in OpenSolaris, the packaging system itself didn’t have any way of ensuring that every publisher in the global zone was also available in the non-global zone, making updates difficult if switching publishers.

Zones in Solaris 11

In Solaris 11, zones are now first-class citizens of the packaging system. Each zone is installed as a linked image, connected to the parent image, which is the global zone.

During packaging operations in the global zone, IPS recurses into any non-global zones to ensure that packages which need to be kept in sync between the global and non-global zones are kept in sync.
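As a rough illustration of that recursion (a toy model, not the actual IPS code), the sketch below treats an image as a dictionary of package versions and brings a chosen set of "synced" packages in each child image up to the parent's version. The Image class, the sync_children function and the package names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Image:
    """Toy stand-in for an IPS image: package name -> installed version."""
    name: str
    packages: dict = field(default_factory=dict)
    children: list = field(default_factory=list)


def sync_children(parent, synced_names):
    """Bring each child's synced packages to the parent's versions.

    Returns a list of (child, package, old_version, new_version) changes.
    """
    changes = []
    for child in parent.children:
        for pkg in sorted(synced_names):
            wanted = parent.packages.get(pkg)
            current = child.packages.get(pkg)
            # Only touch packages the child already has installed and that
            # differ from the parent; everything else is left to the zone.
            if wanted is not None and current is not None and current != wanted:
                child.packages[pkg] = wanted
                changes.append((child.name, pkg, current, wanted))
    return changes


# Hypothetical package names and versions, for illustration only.
global_zone = Image("global", {"driver/foo": "1.1", "editor/vim": "8.0"})
web_zone = Image("webzone", {"driver/foo": "1.0", "editor/vim": "7.4"})
global_zone.children.append(web_zone)

changes = sync_children(global_zone, {"driver/foo"})
```

Here only driver/foo is pulled forward in the zone; editor/vim keeps the version the zone administrator chose.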

For this to happen, it’s important for the zone to have access to all of the IPS repositories that are available from the global zone.

This is problematic for a few reasons:

the zone might not be on the same subnet as the global zone

the global-zone administrator might not want to distribute SSL keys/certs for the repos to all zone administrators

The System Repository

The system repository and its accompanying zones-proxy services were our solution to the problems listed above.

The SMF Services responsible are:

svc:/application/pkg/system-repository:default

svc:/application/pkg/zones-proxyd:default

svc:/application/pkg/zones-proxy-client:default

The first two services run in the global zone; the last one runs in each non-global zone.

With these services, the system repository shares publisher configuration to all non-global zones on the system, and also acts as a conduit to the publishers configured in the global zone. Inside the non-global zone, these proxied global-zone publishers are called system publishers.

When a packaging operation inside a zone accesses those publishers, Solaris proxies the access through the system repository. While proxying, the system repository also caches any file content that was downloaded. If lots of zones all download the same packaged content, repeat requests can be served from the cache rather than fetched again from the publisher.
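The benefit of caching at the proxy can be sketched in a few lines. This is a simplified in-memory model, not how the Apache-based system repository stores content; the CachingProxy class, the fake_publisher function and the URL are hypothetical.

```python
class CachingProxy:
    """Toy caching proxy: fetch each URL upstream once, then reuse it."""

    def __init__(self, fetch_upstream):
        self._fetch = fetch_upstream
        self._cache = {}
        self.upstream_hits = 0

    def get(self, url):
        if url not in self._cache:
            # Cache miss: go to the publisher and remember the result.
            self.upstream_hits += 1
            self._cache[url] = self._fetch(url)
        return self._cache[url]


def fake_publisher(url):
    """Hypothetical upstream; a real one would perform an HTTP(S) request."""
    return ("content of " + url).encode()


proxy = CachingProxy(fake_publisher)
# Three zones ask for the same file through the same proxy.
results = [proxy.get("http://pkg.example.com/file/0/abc123") for _zone in range(3)]
```

After the three requests, proxy.upstream_hits is 1: only the first zone's request reached the publisher.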

Implementation

If you don’t care about how all this works behind the scenes, then you can stop reading now.

There are three parts to making all of the above work, apart from the initial linked image functionality that Ed worked on, which was fundamental to all of the system repository work.

IPS client/repository support

Zones proxy

System repository

IPS client/repository support

Brock managed the heavy lifting here. This work involved:

defining an interchange format that IPS could use to pass publisher configuration between the global and non-global zones

refreshing the system repository service on every parent image publisher change

allowing local publisher configuration to merge with system publisher configuration

ensuring that system-provided publishers could not have their order changed

allowing an image to be created that has no publishers

toggling use of the system publisher
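Two of the items above, merging local publishers with system publishers and refusing to reorder system publishers, can be sketched as follows. The post does not spell out the real merge rules, so the choices here are assumptions made to keep the example small: system publishers are listed first, and a local publisher with the same name as a system one is folded into it. The function names are hypothetical.

```python
def merge_publishers(system_pubs, local_pubs):
    """Return [(name, kind)] with system publishers first (an assumption)."""
    merged = [(name, "system") for name in system_pubs]
    seen = set(system_pubs)
    for name in local_pubs:
        # Assumption: a same-named local publisher is folded into the
        # system entry rather than listed twice.
        if name not in seen:
            merged.append((name, "local"))
            seen.add(name)
    return merged


def move_to_front(merged, name):
    """Move a local publisher ahead of the other local publishers.

    System publishers keep their position; trying to move one is an error.
    """
    kinds = dict(merged)
    if name not in kinds:
        raise KeyError(name)
    if kinds[name] == "system":
        raise ValueError("cannot reorder system publisher: " + name)
    system = [p for p in merged if p[1] == "system"]
    local = [p for p in merged if p[1] == "local" and p[0] != name]
    return system + [(name, "local")] + local


merged = merge_publishers(["solaris"], ["extras", "solaris", "mypub"])
reordered = move_to_front(merged, "mypub")
```

Calling move_to_front(merged, "solaris") raises ValueError, mirroring the rule that a zone cannot change the order of system publishers.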

Zones proxy

The zones proxy client, when started in the non-global zone, creates a socket which listens on an inet port on 127.0.0.1. It passes the file descriptor for this socket to the zones proxy daemon via a door call.

The zones proxy daemon then listens for connections on the file descriptor. When the zones proxy daemon receives a connection, it proxies the connection to the system repository.

This allows the zone to access the system repository without any additional networking configuration needed (which I think is pretty neat – nicely done Krister!)
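The shape of that arrangement (one party creates the loopback listener, another accepts connections on it and relays them upstream) can be sketched with plain sockets and threads. This is only an analogy: Solaris doors and cross-zone file descriptor passing are not modelled, the hand-off here is just passing a Python object to a thread, and the upstream is a stand-in that upper-cases its input rather than a real system repository. All function names are hypothetical.

```python
import socket
import threading


def start_fake_upstream():
    """Stand-in for the system repository: replies with upper-cased input."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("127.0.0.1", 0))
    srv.listen(1)
    addr = srv.getsockname()

    def serve():
        conn, _ = srv.accept()
        with conn:
            conn.sendall(conn.recv(1024).upper())
        srv.close()

    threading.Thread(target=serve, daemon=True).start()
    return addr


def proxy_one_connection(listener, upstream_addr):
    """Stand-in for the proxy daemon: accept on a listener it did not create."""
    conn, _ = listener.accept()
    with conn, socket.create_connection(upstream_addr) as up:
        up.sendall(conn.recv(1024))   # request: zone side -> upstream
        conn.sendall(up.recv(1024))   # response: upstream -> zone side


# "Client" role: create the loopback listener, then hand it over.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(1)

upstream = start_fake_upstream()
daemon = threading.Thread(
    target=proxy_one_connection, args=(listener, upstream), daemon=True
)
daemon.start()

# Something inside the "zone" connects to 127.0.0.1 and is relayed upstream.
with socket.create_connection(listener.getsockname()) as zone_side:
    zone_side.sendall(b"get catalog")
    reply = zone_side.recv(1024)

daemon.join(timeout=5)
listener.close()
```

The code inside the "zone" only ever talks to 127.0.0.1, which is the property the real design relies on.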

System repository

The system repository itself consists of two components:

A Python program, /usr/lib/pkg.sysrepo

A custom Apache 2.2 instance

Brock initially prototyped some httpd.conf configurations, and I worked on the code to write them automatically, produce the response that the system repository would use to inform zones of the configured publishers, and also worked out how to proxy access to file-based publishers in the global zone, which was an interesting problem to solve.

When you start the system-repository service in the global zone, pkg.sysrepo(1) determines the enabled, configured publishers then creates a response file served to non-global zones that want to discover the publishers configured in the global zone. It then uses a Mako template from /etc/pkg/sysrepo/sysrepo_httpd.conf.mako to generate an Apache configuration file.

The configuration file describes a basic caching proxy, providing limited access to the URLs of each publisher, as well as allowing URL rewrites to serve any file-based repositories. It uses the SSL keys and certificates from the global zone, and allows proxied access to those publishers from the non-global zone over http.
(Remember, data served by the system repository between the global zone and non-global zone goes over the zones proxy socket, so http is fine here; access from the proxy to the publisher still goes over https.)
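The idea of generating a web server configuration from the list of configured publishers can be sketched as below. This is not the shipped template: it uses Python's string.Template instead of Mako, and plain ProxyPass and Alias directives instead of the RewriteRule-based configuration the real service writes. The publisher names, origins and URL layout are hypothetical.

```python
from string import Template

# Hypothetical fragments; the real sysrepo_httpd.conf.mako is not reproduced.
HTTP_FRAGMENT = Template("# $name (network origin)\nProxyPass /$name/ $origin\n")
FILE_FRAGMENT = Template("# $name (file-based origin)\nAlias /$name/ $path\n")


def render_config(publishers):
    """Render one fragment per (name, origin) pair and join them."""
    parts = []
    for name, origin in publishers:
        if origin.startswith("file://"):
            # File-based repositories are served from the local filesystem.
            path = origin[len("file://"):]
            parts.append(FILE_FRAGMENT.substitute(name=name, path=path))
        else:
            # Network repositories are reached by proxying to the origin.
            parts.append(HTTP_FRAGMENT.substitute(name=name, origin=origin))
    return "\n".join(parts)


config = render_config([
    ("solaris", "https://pkg.example.com/solaris/"),
    ("localpub", "file:///export/repo/"),
])
print(config)
```

Regenerating the file whenever the publisher list changes is what lets a publisher change in the global zone show up in every zone.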

The system repository service then starts an Apache instance, and a daemon to keep the proxy cache down to its configured maximum size. More detail on the options available to tune the system repository is in the pkg.sysrepo(1) man page.
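Keeping a cache under a size limit amounts to choosing which entries to delete. The sketch below picks the least recently used entries first, which is an assumption; the post does not say how the cache-cleaning daemon chooses. The function name and the sample entries are hypothetical.

```python
def entries_to_evict(entries, max_bytes):
    """Pick cache entries to delete so the total fits within max_bytes.

    entries is a list of (name, size_bytes, last_used) tuples. Entries with
    the smallest last_used value are chosen first (an assumed policy).
    """
    total = sum(size for _name, size, _used in entries)
    evict = []
    for name, size, _used in sorted(entries, key=lambda e: e[2]):
        if total <= max_bytes:
            break
        evict.append(name)
        total -= size
    return evict


# Hypothetical cache contents: (name, size in bytes, last-used timestamp).
cache = [("a", 40, 100), ("b", 30, 300), ("c", 50, 200)]
evicted_80 = entries_to_evict(cache, 80)
evicted_50 = entries_to_evict(cache, 50)
```

With a limit of 80 bytes only "a" has to go; with a limit of 50 bytes both "a" and "c" are removed, leaving the most recently used entry.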

Result?

The practical upshot of all this is that all zones can access all publishers configured in the global zone, and if that configuration changes, the zones’ publishers automatically change too. Of course, non-global zones can add their own publishers, but aren’t allowed to change the order of, or disable, any system publishers.

Personally, I’ve found this capability to be incredibly useful. I work from home, and have a system with an internet-facing non-global zone, and a global zone accessing our corporate VPN. My non-global zone is able to securely access new packages when it needs to (and I get to test my own code at the same time!)

Performing a pkg update from the global zone ensures that all zones are kept in sync, and will update all zones automatically (though, as mentioned in the Zones administration guide, pkg update <list of packages> will simply update the global zone, and ensure that during that update only the packages that cross the kernel/userland boundary are updated in each zone.)

Working on zones and the system repository was a lot of fun – hope you find it useful.

19 thoughts on “The IPS System Repository”

After registering the Solaris support repository with the SSL cert and key files generated at https://pkg-register.oracle.com/register/status/, the system-repository service no longer starts. Without this service running, I am not able to create any new zones.

It looks like the Apache SSL Proxy config (SSLProxyMachineCertificateFile) does not like the cert and/or key files that I generated.

That’s interesting. Do those same keys/certs work to install content from the support repository into the global zone using pkg(1)? The system repository worked fine using the SSL keys/certificates from our internal repo, but I’ll see if I can generate support keys/certs and try to reproduce.

In the meantime, a workaround would be to use pkgrecv(1) to pull down a local copy of the repository, then configure the global zone to use that instead, which will then allow zones to be created.

I was able to reproduce this with newly-generated certificates for the support repository. Interestingly, certificates I generated earlier this year for the public support repository worked just fine. I’m investigating to see if anything has changed in the certificate generation process that could have triggered this.

I am experiencing a problem similar to Jorge; however, I’m not trying to create new zones. What I’ve noticed is that when the system-repository service isn’t running, an existing zone’s connection to the pkg service in the global zone is lost, preventing updates in the zone. Also, if you detach the zone and try to re-attach, you’ll get an error. I managed to recover by rolling back to my old keys, but the recovery path isn’t obvious.

I got into this because my old Solaris Express keys expire in 24 days and I need to update my keys. If this is happening to me, it will probably happen to others, so you’re likely to need a fix sooner rather than later.

With that fix in place, I’ve been able to install a zone from the support repository. Hacking the key like this is generally not a good idea, and while it appears to work for both pkg(1) and the system-repository service, openssl ‘-check’ does report errors on the key.

We’re investigating how to fix this, with the likely solution being to make the pkg-register webapp produce keys in a format that both pkg(1) and the system-repository service can cope with, which will mean others won’t run into this problem in future.

The webapp has now been fixed. You should be able to re-download the same keys that you generated before from https://pkg-register.oracle.com, and they’ll be converted on the fly to the format that Apache expects. That is a better solution than the workaround I posted above.

Tim – Thank you for resolving the issue with the keys. I downloaded the updated key and managed to get my system with soon-to-expire keys updated. I had to do a little experimenting to determine what needs to be done inside the zone. It appears that the key doesn’t get automatically updated inside the zone and therefore you have to go through the key update process both in the global zone and in the non-global zones. I wasn’t able to find documentation explaining the process for updating the keys when you have zones.

If you’re dealing with an S11 system (that is, snv_175b) then you don’t need to do anything in the zone: pkg(1) operations in the zone will go through the system repository, using its key to access the support repository. Your zones won’t need to be configured to access the support repository separately, they’ll get all of their publisher configuration automatically. For example, in your zone, you would see:

On Solaris 11 Express (snv_151), you’re correct – the system repository didn’t exist back then, so keys needed to be manually propagated across zones and publisher or key changes had to be done by hand across all zones.

Tim,
I was using 175b (Solaris 11 11/11). After I updated the keys in the global zone, a “pkg refresh” there no longer produced the messages stating that the key is expiring, but I continued to get that message in the non-global zones. So it appears that at least some pkg operations were continuing to use the old key in the non-global zone. I fixed that by applying the key change command in the non-global zone, which cleared the message. Hopefully that did no harm and the zone will still get updates via the linkage from the global zone.

Should I only have the one publisher line? My zones have two, one with the proxy: prefix and one without:

Nice post about the system repository (IPS) associated with local & global zones. The main question which comes to my mind: are these designed with the idea that zones can move (manually, or automatically via clustering software) between servers (i.e. between different global zones)?
How will the zones proxy react to a different system repository when the local zone is attached to a different global zone or another server? (What will happen to the SSL certs?)
Are these IPS and system repositories designed (and tested) with the concept that zones can move between global zones?

So yes, zones can be moved between systems using the “zoneadm attach/detach” commands, and the design of the system repository doesn’t hamper this functionality in any way.

Providing the zones are being moved between systems that have the same publishers configured in the respective global zones and those global zones have the correct packages installed to support each non-global zone, then this will work just fine. On attaching a zone, the packaging system will ensure that the zone is capable of running on that global zone, and will report an error otherwise.

The SSL certificates used by the system repository are resident in each global zone, so there’s nothing to do there when moving zones.

Any locally configured https-based origins in the non-global zone (ie. ones not going through the system repository) will have their own copies of SSL keys/certs, and those will be moved along with the zone.

It is really annoying/bad design that the proxy stuff has no notion of lazy init. E.g. it delays shutting down zones (usually because the server in the GZ has already been shut down or went into maintenance mode – e.g. because it couldn’t reach its repos in time). In the same way, [machine|zone] startup gets delayed considerably, e.g. if the repos are not yet reachable. Here the pkg proxy server svc goes into maintenance, and even worse, the proxy-client svcs of all zones go into maintenance mode as well (because the GZ server is not responsive). What a nightmare: when the connection to the repos is possible again (who knows when …), one needs to clear the server in the GZ and travel through all zones and clear the proxy-client svcs – what a pain!

So the proxy idea is good, but the implementation is really bad.
I guess, if a web browser took 5 min to start only because a web site in its bookmark list isn’t reachable, most people would throw it away immediately …

We’ve addressed some of the startup performance problems in 11.1 by caching known-good system-repository configuration once we get it. (The problem is that we have to follow all HTTP 302 redirects from the pkg(5) origins in order to set up the RewriteRules we rely on to get the correct proxy configuration. If we have sufficient numbers of publishers configured, this was enough to run us into the SMF timeout, which we can hit before the in-method timeout code fires.)

That said, if the system-repository isn’t available for zones on startup, we’ll get cascading problems anyway, such as SMF services in zones that rely on pkg(5) operations failing mysteriously. By having zones-proxy-client fail, at least we’re pin-pointing the problem early on – a lazy startup might result in race conditions or spurious errors from those method scripts (sometimes they work, sometimes they don’t), which would be even worse imho.

I’m not dodging the comment though, I know we’ve more work to do here, and the user experience could definitely be better. Thanks for bugging me about it :-)

I don’t know all the details inside. However, a normal human being assumes, that the install/setup/update makes sure, that the system is properly installed and thus there should be no need at all, to contact any external repository at startup/shutdown at all. All possible questions about whether a package is installed, where, what facets etc. should be answerable using the local install db. If there is a strange service, which relies on external repo[s] for whatever strange reason (any real examples?) it should be able to deal with errors, i.e. e.g. going into sleep and try again later.

So to avoid the really annoying startup delays and svc maintenance mode chaos, why not start the sysrepo server immediately with a config, which just answers all “unsatisfiable” requests with 503 (Service Unavailable)? Meanwhile a 2nd process may try to create the required final config and if done, it just needs to HUP the repo server to pick up the new config and exit …

OK, the real question is, why the rewrites are required at all, but that’s probably above my head ;-)

I understand. For any pkg(5) queries connected to packages installed on the system, and for queries for a list of packages available to be installed at a given time, all answers can be obtained without network access, but for queries like “which package delivers this file?” we need network access, so we go to some lengths to make sure it’s available.

As I say, we’re looking at ways to improve resilience of the service.

Rewrites are needed at the moment because we don’t know what the upstream network looks like, in terms of how the HTTP protocol is being used to connect us to the repository. (The system-repository is implemented at present using an Apache proxy server and a series of fun and exciting RewriteRule directives – see /system/volatile/pkg/sysrepo/* and /etc/pkg/sysrepo/* for the gory details.)