[comp.sys.sun] Moving from 4.0.3c to 4.1.1 without doing an install

hedrick@athos.rutgers.edu (Charles Hedrick) (03/28/91)

A few people have asked whether it's really necessary to do a full install
to bring up 4.1.1.  The answer is no, sort of.  I thought I'd outline the
procedure we're following at Rutgers to move from 4.0.3c to 4.1.1.

Rutgers' situation is a bit unusual, because we use an automated software
distribution system to keep software up to date on several hundred Suns.
If at all possible, we try to avoid doing installs, because this requires
a staff member to take each system down and hack on it, whereas if we can
get our software distribution system to install things, it happens at 4am
without interfering with anybody.  So we've come up with a plan to move
incrementally to 4.1.1 without doing an install.

First, we found that you can run a 4.1.1 kernel on a system that has
4.0.3c software, with very few exceptions.  Once you adjust a few pieces
of software, you will have a set of software that allows you to use either
a 4.0.3c or 4.1.1 kernel, simply by changing kernels (and /usr/kvm, if you
care about ps, etc.).  Here's the minimum set of things we found we had to
adjust:

/usr/bin:

  sh  -  also /sbin/sh.  The 4.0.3c version of sh will not run
	scripts under 4.1.1, which means that /etc/rc doesn't
	run, etc.  The 4.1.1 version of sh works fine under 4.0.3c,
	so we just moved to that on all systems.

  mt - "mt status" uses an ioctl that was changed incompatibly in
	4.1.1.  Most sites could probably live with a non-functional
	mt status while they are doing the transition.  It happens
	that our backup scripts need it.  We produced a version of
	mt that tries the 4.1.1 method and falls back to the 4.0.3c
	method if that fails.  This is really a kernel bug.  The
	4.1.1 ioctl simply has a longer argument block.  There's
	no reason it couldn't accept the 4.0.3c size block as well
	and just not fill in the extra information.
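
The fallback idea in the mt entry can be sketched roughly as follows.
This is not the Rutgers mt patch (which would be C against the tape
ioctl); it is only a minimal Python illustration of the
try-new-then-old pattern.  The block sizes and the names tape_status,
do_ioctl and fake_kernel are all made up for the example.

```python
import errno

# Hypothetical argument-block sizes, for illustration only; the real
# values come from the struct definitions in each release's headers.
NEW_BLOCK_SIZE = 24   # stands in for the 4.1.1 layout
OLD_BLOCK_SIZE = 16   # stands in for the 4.0.3c layout


def tape_status(do_ioctl):
    """Ask for tape status with the new block, then fall back to the old one.

    do_ioctl(size) is a caller-supplied function that issues the status
    request with an argument block of `size` bytes and returns the raw
    bytes, raising OSError if the kernel rejects the request.
    Returns a (format_name, raw_bytes) pair.
    """
    try:
        return "4.1.1", do_ioctl(NEW_BLOCK_SIZE)
    except OSError:
        # The kernel did not accept the longer block, so try the
        # 4.0.3c-sized one.  If this also fails, the error propagates.
        return "4.0.3c", do_ioctl(OLD_BLOCK_SIZE)


def fake_kernel(accepted_size):
    """Build a stand-in for do_ioctl that accepts exactly one block size."""
    def do_ioctl(size):
        if size != accepted_size:
            raise OSError(errno.EINVAL, "argument block size not accepted")
        return bytes(size)
    return do_ioctl
```

The same binary then works on either kernel: whichever request the
running kernel accepts determines how the returned block is decoded.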

/usr/lib:

  libc.so.*  - In order to run software built under a 4.1.1 system,
	and from the 4.1.1 distribution, we installed the 4.1.1
	version of libc, including libc.so.  They work fine under
	4.0.3c.  Note that the distributed version of libc does
	not have encryption by default.  If you use "des", etc.,
	make sure you get the additional encryption option.

  ld.so ldconfig - these have moved from /usr/kvm to /usr/lib in
	4.1.1.  The 4.0.3c versions do not work under 4.1.1 on
	all architectures.  (I believe the problem was with
	sun3 only.)  The 4.1.1 versions work fine under 4.0.3c, so we
	just moved to them everywhere.

/usr/etc:

  in.telnetd and in.rlogind must be upgraded to the 4.1.1 version.
	This is because of a slight change in the tty code,
	which requires a setpgrp(0,0) in places where you could
	get away with not having it before.  The 4.1.1 version
	works fine under 4.0.3c (though according to CERT you
	should make sure to get a new version of in.telnetd that fixes a
	security problem).  [We have tried the 4.1.1 version of
	in.telnetd and it does work.  However the telnetd we are
	actually using is from Berkeley.  We had to make a couple
	of patches to get it to work on both 4.0.3c and 4.1.1.
	As Berkeley distributes it, you must decide at compile
	time which release you are going to run it on.  We want
	the same image to work on both versions.]

  ping - timeouts don't work if you use the 4.0.3c version
	under 4.1.1.  The 4.0.3c version depends upon
	software interrupts interrupting a system call in
	circumstances where it doesn't happen under 4.1.1.
	The 4.1.1 version uses a new facility to explicitly 
	request that behavior.  It works fine under both 4.0.3c 
	and 4.1.1.

/etc:

  fstab - if you have your default swap partition listed, remove
	it or comment it out.  /etc/rc does swapon -a.  This
	will attempt to add all swap partitions listed in
	/etc/fstab.  In theory it's OK to list the default
	swap partition.  swapon -a should say "partition already
	in use" and ignore it.  A lot of sites put it in fstab
	simply as documentation.  Under 4.1.1, however, something goes
	wrong that typically doesn't show up until you try
	to back the system up or do something else that uses
	a lot of memory.  At that point the system may crash
	with a fairly obscure panic.  This bug was documented
	by Columbia for 4.1.  It appears to happen still under
	4.1.1.  At least we were seeing daily crashes, which
	went away when we commented out that /etc/fstab entry
	for our default swap partition.  It's fine to remove
	the entry for 4.0.3c systems as well.  It was never
	needed.
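
Commenting out the default swap entry is easy to script.  Here is a
minimal Python sketch, assuming whitespace-separated fstab fields with
the device first and the type third, and assuming /dev/sd0b is the
default swap partition.  The device names, the sample lines and the
function name are illustrative, not taken from a real system.

```python
def comment_out_default_swap(fstab_text, default_swap_device):
    """Return fstab_text with the default swap partition's entry commented out.

    Only a line whose first field is default_swap_device and whose
    third field is "swap" is touched; other swap partitions are left
    alone so that swapon -a still adds them.
    """
    out = []
    for line in fstab_text.splitlines(keepends=True):
        fields = line.split()
        is_default_swap = (
            len(fields) >= 3
            and fields[0] == default_swap_device
            and fields[2] == "swap"
        )
        # An already commented line has "#" glued to its first field,
        # so it never matches and the function is safe to rerun.
        out.append("#" + line if is_default_swap else line)
    return "".join(out)


# Illustrative input only; check the layout against your own /etc/fstab.
SAMPLE_FSTAB = (
    "/dev/sd0a / 4.2 rw 1 1\n"
    "/dev/sd0b swap swap rw 0 0\n"
    "/dev/sd1b swap swap rw 0 0\n"
)

if __name__ == "__main__":
    print(comment_out_default_swap(SAMPLE_FSTAB, "/dev/sd0b"), end="")
```

The function works on text rather than on the file itself, so the
result can be inspected before anything is written back.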

Of course in addition to this, you'll need to change /usr/kvm and the
programs that depend upon it, such as ps.  However we can live without ps
for a few days.  Thus we make just the changes described above to all
systems.  This gets us into a position where we can bring up 4.1.1 just by
changing kernels (and doing MAKEDEV or mknod if the system has any non-Sun
devices whose major numbers have changed).
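
To see which device nodes need to be remade after switching kernels,
one approach is to compare the driver-to-major-number tables of the
two kernels.  This is a sketch under the assumption that you have
already extracted those tables by hand from each kernel's
configuration; the driver names and numbers below are invented.

```python
def devices_needing_mknod(old_majors, new_majors):
    """Return {driver: (old_major, new_major)} for drivers whose major changed.

    Drivers missing from the new table are skipped here; they need
    attention of a different kind than a simple mknod.
    """
    changed = {}
    for name, old in old_majors.items():
        new = new_majors.get(name)
        if new is not None and new != old:
            changed[name] = (old, new)
    return changed


# Invented example tables for two third-party drivers.
OLD_TABLE = {"xyzdisk": 40, "abctape": 41}
NEW_TABLE = {"xyzdisk": 40, "abctape": 57}

if __name__ == "__main__":
    for name, (old, new) in devices_needing_mknod(OLD_TABLE, NEW_TABLE).items():
        print(f"{name}: major {old} -> {new}")
```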

Once we are happy with the way 4.1.1 is running on a system, we change
/usr/kvm and related programs.  Note that which programs are in /usr/kvm
has changed between 4.0.3c and 4.1.1.  For the moment we've merged them.
I.e. anything that is in /usr/kvm in either version is in /usr/kvm for us,
so the symlinks are the same for 4.0.3c and 4.1.1.  Then we just have to
exchange /usr/kvm to go between them.
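
The merge described above is just a set union of the program names
found in the two releases' /usr/kvm directories.  A small Python
sketch follows; the two lists are placeholders rather than the real
directory contents, and "newtool" is a hypothetical name.

```python
def merged_kvm_programs(kvm_403c, kvm_411):
    """Return the sorted union of program names from both /usr/kvm trees."""
    return sorted(set(kvm_403c) | set(kvm_411))


# Placeholder listings; substitute the output of ls on each tree.
KVM_403C = ["ps", "ld.so", "ldconfig"]
KVM_411 = ["ps", "newtool"]

if __name__ == "__main__":
    for name in merged_kvm_programs(KVM_403C, KVM_411):
        print(name)
```

Every name in the merged list gets a symlink into /usr/kvm on all
systems, so only the contents of /usr/kvm differ between releases.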

4.1.1 has changed the way terminal I/O is done in init.  The 4.0.3c init
will still work with 4.1.1, as long as you are using your old /etc/rc*.
(There's some reason to think, though, that you may not be able to type ^C
while the system is booting to abort individual commands.)  Eventually,
however, you'll want to replace your /etc/rc, /etc/rc.local, etc., with the new
ones.  They've reorganized them in a fairly nice way.  When you change
/etc/rc* (or if you have a SS2 where they are preinstalled), you'll need
to move to the 4.1.1 version of init.

The 4.0.3c versions of fsck and other file system utilities appear to work
fine under 4.1.1, as long as you keep your old file system.  Eventually
you'll want to dump your files to tape, do a newfs, and bring them back.
When you do a newfs under 4.1.1 you get file systems in a new format which
will have better performance.  By the time you do this, you'll need the
4.1.1 version of fsck, newfs, mkfs, etc.  But you can put this off until
you're ready to commit to 4.1.1 permanently.  4.1.1 can handle old file
systems fine, and as long as you are using an old file system, the old
fsck will work.

One comment about the new fsck.  It's got a handy option, -c, for
converting between 4.0.3c and 4.1.1 file system formats.  (However as the
installation manual explains, it is not always possible to go back from
the new to the old format, depending upon the file system parameters.)  We
found one unexpected thing about fsck -c.  I'm accustomed to having fsck
scan the disk, but not actually do anything until the end (or when it
finds an error).  fsck -c changes the superblock immediately, but changes
the free list at the end.  So if you ^C in the middle of the operation,
you get a file system that is very confused.  It will work, but you tend
to get crashes.  Running fsck -c again will unconfuse it, fortunately.  It
is safe to run fsck -c just to see whether the disk is in new or old
format.  It starts by asking you whether you want to convert it, and the
way it asks the question tells you what the current format is.  (If it
asks whether you want to convert to the new format, you know it's
currently in the old format.)  As long as you ^C when it asks that
question, it hasn't made any changes.  But once you tell it to go ahead,
it changes the superblock.