Upgrading Solaris 10 with Zones on ZFS
From Docupedia
Date: 9/30/2008
Introduction
Right away, I'd like to point out that my intended audience is experienced sysadmins. The problem addressed in this article is one you should NEVER find yourself in on a system that is in revenue-generating production, and the workaround is NOT suitable for a machine that might hurt your cashflow if it goes up in flames. On the other hand, it's an enlightening exercise with Zones. If you, like me, are trying to fix machines for a department where the worst case is that already-committed customers are somewhat unimpressed when their internal machines have to be rebuilt from scratch, this might be the thing you've been looking for.
Required background knowledge:
- ZFS
- Solaris Zones
- Installing/configuring Solaris 10
The Problem
Way back in ancient history (June 2006), Sun Microsystems rolled out this great new volume management / filesystem / software RAID technology, ZFS, with the latest official release of Solaris 10. I've lost the link to the original post, but at the time one of the developers, ecstatic about ZFS, wrote a post on his public blog enumerating a bunch of GREAT IDEAS for how we could all take advantage of its power. One of his examples was placing a Solaris Zone's root on a ZFS filesystem and then using that zone as a template for deploying more zones rapidly. The advantage of ZFS here was that you could clone a ZFS filesystem almost instantly while consuming almost no extra disk space.
Whoops! That wasn't a recommendation, it was just an observation! When the next update of Solaris 10 was released (November 2006), we found that it was impossible to upgrade a system whose zones had their roots on ZFS (this is still the case as of September 2008[1]). The reasons are a little complex, but the gist is that the Solaris upgrade procedure won't declare a zone properly upgraded unless its patch and package sets are 'exactly' the same as the global zone's, and having a zone root on ZFS makes that impossible.
There's some work being done in OpenSolaris to make it possible to back out all patches on a zone. If my understanding is correct, that will make this possible in the future, but it's not here yet (link?). As of this writing (September 2008), support for upgrading zones created this way is not planned for the next release of Solaris 10 (scheduled for October 2008[2]), so we're left with only a few options.
Our Current Options
- Detach the zones, copy them into a UFS area, re-attach them, and perform the upgrade. I haven't tried this, but I expect it would work well if you have enough free disk space left for a UFS filesystem large enough to hold the zones.
- Don't upgrade, and hope that Solaris 10 Update 7 will support upgrading normally (Update 6, aka 10/08, will *not*).
- Serious hackery. This is what this article is about.
A Hackish Workaround
On with the serious hackery!
Caveats
First, this workaround does not actually upgrade the zones; it upgrades the underlying system. If your zones inherit most of their libraries and binaries from the global zone (sparse-root zones that inherit /lib, /platform, /sbin and /usr, as in the example files below), they'll still gain most of the benefit of the upgrade. The truth, however, is that your zones are going to be left in an inconsistent state. If you can't live with this, you need to rebuild the system and the zones from scratch.
The Workaround
- For each zone do:
- Save the zone's configuration:
zonecfg -z <zone name> export > <zone name>.cfg
- Save this .cfg file somewhere safe (i.e. not on this machine)
- Halt the zone (if it is running) and detach it:
zoneadm -z <zone name> halt
zoneadm -z <zone name> detach
- Export the entire ZPool that the zones were living in:
zpool export <zpool name>
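If you have more than a couple of zones, the steps above can be scripted. This is a minimal sketch under stated assumptions, not a tested tool. `detach_zones_and_export_pool` and its config-directory argument are hypothetical names. The sketch assumes `zoneadm list -cp` prints colon-separated `zoneid:zonename:state:zonepath:...` lines. It halts running or ready zones first, because `zoneadm detach` only works on a halted (installed) zone.

```shell
#!/bin/bash
# Sketch: save and detach every non-global zone, then export the pool.
# detach_zones_and_export_pool is a hypothetical helper, not a Solaris tool.
# Usage: detach_zones_and_export_pool <zpool name> <config dir>
detach_zones_and_export_pool() {
    pool="$1"
    cfgdir="$2"
    mkdir -p "$cfgdir" || return 1

    # Assumes 'zoneadm list -cp' prints zoneid:zonename:state:zonepath:...
    # Keep "name:state" for every zone except the global zone.
    zones=$(zoneadm list -cp | awk -F: '$2 != "global" { print $2 ":" $3 }')

    for entry in $zones; do
        zone=${entry%%:*}
        state=${entry#*:}

        # Save the configuration as <zone name>.cfg.
        zonecfg -z "$zone" export > "$cfgdir/$zone.cfg" || return 1

        # A zone has to be halted (back in the installed state) to detach it.
        case "$state" in
            running|ready) zoneadm -z "$zone" halt || return 1 ;;
        esac
        zoneadm -z "$zone" detach || return 1
    done

    # Remember to copy $cfgdir off this machine before rebuilding it!
    zpool export "$pool"
}
```

For example, `detach_zones_and_export_pool <zpool name> /var/tmp/zonecfgs` (the directory is made up). Then copy /var/tmp/zonecfgs somewhere other than this machine, since the rebuild will wipe it.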
- Rebuild the host OS, being careful not to destroy your ZPool (leave the disks it lives on alone during the install)
- Build a minimal sample zone on the new system
- Detach the sample minimal zone
- In the sample zone's zonepath directory you'll now find a file named "SUNWdetached.xml"; save a copy of it somewhere safe. This is our template.
- Delete the sample minimal zone (or keep it, whatever, but we're done with it for now).
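The sample-zone steps can be sketched the same way. The function name and its arguments are hypothetical. The sketch assumes two things: that `zonecfg` accepts several `;`-separated subcommands in one argument, and that `detach` leaves SUNWdetached.xml directly in the zonepath.

```shell
#!/bin/bash
# Sketch: create, install and detach a throw-away sparse-root zone, then keep
# its SUNWdetached.xml as the template for the hack below.
# build_sample_template and its arguments are hypothetical names.
# Usage: build_sample_template <zone name> <zonepath> <template file>
build_sample_template() {
    zone="$1"
    zonepath="$2"
    template="$3"

    # zoneadm install expects a zonepath that only root can access (mode 700).
    mkdir -p "$zonepath" && chmod 700 "$zonepath" || return 1

    # A plain 'create' gives the default sparse-root zone, which inherits
    # /lib, /platform, /sbin and /usr from the global zone.
    # Assumes zonecfg accepts ';'-separated subcommands in one argument.
    zonecfg -z "$zone" "create; set zonepath=$zonepath; set autoboot=false; commit" || return 1
    zoneadm -z "$zone" install || return 1
    zoneadm -z "$zone" detach || return 1

    # Assumes detach leaves SUNWdetached.xml directly in the zonepath.
    cp "$zonepath/SUNWdetached.xml" "$template"
}
```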
- Import the ZPool back into the system:
zpool import <zpool name>
- Re-attach the zones from the old system to the new system (here comes the hackery). For each zone:
- Create a new SUNWdetached.xml for the zone from the sample zone's template that we saved earlier. This is the real hack:
- From the zone's existing SUNWdetached.xml (the one written when it was detached on the old system), copy aside the lines from the <zone ...> line (inclusive) up to the first <package ...> line (not inclusive)
- Overwrite the SUNWdetached.xml file in the zone's zonepath directory with the template we saved from the sample zone
- Edit the new SUNWdetached.xml file, replacing its lines from <zone ...> up to the first <package ...> with the ones we saved from the original file
- Leave all other lines identical to those in the sample zone's SUNWdetached.xml
- Here's an example of our template file:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE zone PUBLIC "-//Sun Microsystems Inc//DTD Zones//EN" "file:///usr/share/lib/xml/dtd/zonecfg.dtd.1">
<!-- DO NOT EDIT THIS FILE. Use zonecfg(1M) and zoneadm(1M) attach. -->
<zone name="testzone" zonepath="/data/zones/testzone" autoboot="false">
  <inherited-pkg-dir directory="/lib"/>
  <inherited-pkg-dir directory="/platform"/>
  <inherited-pkg-dir directory="/sbin"/>
  <inherited-pkg-dir directory="/usr"/>
  <package name="SUNWocfd" version="11.10.0,REV=2005.01.21.15.53"/>
  <patch id="125095-15"/>
  <patch id="128010-10"/>
  <package name="SUNWlucfg" version="11.10,REV=2007.03.09.13.13"/>
(snipped...)
- Here's an example of our hacked-up file based on the template:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE zone PUBLIC "-//Sun Microsystems Inc//DTD Zones//EN" "file:///usr/share/lib/xml/dtd/zonecfg.dtd.1">
<!-- DO NOT EDIT THIS FILE. Use zonecfg(1M) and zoneadm(1M) attach. -->
<zone name="nobby" zonepath="/data/zones/nobby" autoboot="false">
  <inherited-pkg-dir directory="/lib"/>
  <inherited-pkg-dir directory="/platform"/>
  <inherited-pkg-dir directory="/sbin"/>
  <inherited-pkg-dir directory="/usr"/>
  <dataset name="zonepool/nobby_pool"/>
  <network address="10.88.3.183/24" physical="bge0"/>
  <device match="/dev/lofictl"/>
  <device match="/dev/lofi/*"/>
  <device match="/dev/rlofi/*"/>
  <package name="SUNWocfd" version="11.10.0,REV=2005.01.21.15.53"/>
  <patch id="125095-15"/>
  <patch id="128010-10"/>
  <package name="SUNWlucfg" version="11.10,REV=2007.03.09.13.13"/>
(snipped...)
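The copy-and-paste splice is easy to get wrong by hand, so here is a small sketch that does it with awk. It assumes each XML element sits on its own line, as in the example files. `splice_detached_xml` is a hypothetical name, not a Solaris tool. On Solaris 10, run it with AWK=nawk, because /usr/bin/awk there is the old awk. nawk and /usr/xpg4/bin/awk understand the `-v` and `getline` used here.

```shell
#!/bin/bash
# Sketch of the SUNWdetached.xml splice. splice_detached_xml is a
# hypothetical helper name. It writes the new file to stdout:
#   template lines before <zone ...>
#   + old file's lines from <zone ...> up to (not incl.) its first <package ...>
#   + template lines from its first <package ...> to the end.
# Assumes one XML element per line, as in the example files.
# On Solaris 10 run it with AWK=nawk; /usr/bin/awk there is the old awk.
AWK=${AWK:-awk}

splice_detached_xml() {
    old="$1"
    template="$2"
    "$AWK" -v old="$old" '
        # state 0: template header, 1: template zone block (skipped), 2: rest
        state == 0 && /<zone / {
            # Print the zone-specific block from the old file instead.
            while ((getline line < old) > 0) {
                if (line ~ /<package /) break
                if (line ~ /<zone /) copying = 1
                if (copying) print line
            }
            close(old)
            state = 1
            next
        }
        state == 1 && /<package / { state = 2 }
        state != 1 { print }
    ' "$template"
}
```

For the zone from the example, you might run `splice_detached_xml /data/zones/nobby/SUNWdetached.xml /var/tmp/template.xml > /var/tmp/nobby.xml`. This assumes the file sits directly in the zonepath; /var/tmp/template.xml is just wherever you saved the sample zone's file. Look the result over, then copy it onto /data/zones/nobby/SUNWdetached.xml. Don't redirect straight onto the file being read: the shell truncates it before awk gets to read it.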
- Re-create the zone's configuration from the .cfg file saved earlier:
# zonecfg -z <zone name> -f <zone name>.cfg
- Attach the zone:
# zoneadm -z <zone name> attach
- Boot the zone into single-user mode and connect to its console:
# zoneadm -z <zone name> boot -s && zlogin -C <zone name>
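The three commands above, per zone, as a small sketch. `reattach_zone` is a hypothetical name. It assumes the saved file is the `zonecfg export` output from earlier and that the SUNWdetached.xml hack has already been done. It stops at the first failing step and leaves the interactive `zlogin -C` to you.

```shell
#!/bin/bash
# Sketch: restore one zone's configuration, attach it, and boot it to
# single-user mode. reattach_zone is a hypothetical helper name.
# Usage: reattach_zone <zone name> <saved .cfg file>
reattach_zone() {
    zone="$1"
    cfg="$2"

    # Re-create the zone configuration from the 'zonecfg export' output.
    zonecfg -z "$zone" -f "$cfg" || return 1

    # attach validates the zone's packages/patches against the global zone,
    # which is why SUNWdetached.xml had to be hacked first.
    zoneadm -z "$zone" attach || return 1

    # Single-user boot, so inetd can be fixed before a full boot.
    zoneadm -z "$zone" boot -s || return 1

    echo "Now run: zlogin -C $zone"
}
```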
- Fix the inetd connection_backlog default:
In one of the updates between 6/06 and 5/08, a default value named "connection_backlog" was added to the SMF configuration for inetd. Since our zones haven't been properly upgraded, they won't have this default set, so we need to set it manually. If it isn't set, the rpc/gss inetd service fails and, consequently, the zone will not boot normally.
- We can see the problem as follows:
-bash-3.00# inetadm -l rpc/gss
SCOPE    NAME=VALUE
         name="100234"
         endpoint_type="tli"
         proto="ticotsord"
         isrpc=TRUE
         rpc_low_version=1
         rpc_high_version=1
         wait=TRUE
         exec="/usr/lib/gss/gssd"
         user="root"
default  bind_addr=""
default  bind_fail_max=-1
default  bind_fail_interval=-1
default  max_con_rate=-1
default  max_copies=-1
default  con_rate_offline=-1
default  failrate_cnt=40
default  failrate_interval=60
default  inherit_env=TRUE
default  tcp_trace=FALSE
default  tcp_wrappers=FALSE
Error: Property connection_backlog is missing and has no defined default value.
- And we can fix the problem as follows:
# inetadm -M connection_backlog=10
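If you'd rather check before changing anything, here is a small sketch to run inside the zone at its console. It assumes `inetadm -p` lists inetd's default property values as name=value lines. `fix_connection_backlog` is a hypothetical name, and 10 is just the value used above.

```shell
#!/bin/bash
# Sketch, meant to be run inside the zone (e.g. at its zlogin -C console).
# fix_connection_backlog is a hypothetical helper name.
# Usage: fix_connection_backlog [backlog]   (defaults to 10, as above)
fix_connection_backlog() {
    backlog="${1:-10}"

    # Assumes 'inetadm -p' lists inetd's default values as name=value lines.
    if inetadm -p | grep -q 'connection_backlog='; then
        echo "connection_backlog default is already set; nothing to do"
        return 0
    fi

    # -M changes inetd's default, which services without their own value use.
    inetadm -M "connection_backlog=$backlog" || return 1
    echo "connection_backlog default set to $backlog"
}
```

Afterwards, `inetadm -l rpc/gss` should no longer end with the error shown above.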
- Reboot the zone:
# zoneadm -z <zone name> reboot
- ???
- PROFIT!!!
