hedrick@athos.rutgers.edu (Charles Hedrick) (03/28/91)
A few people have asked whether it's really necessary to do a full install to bring up 4.1.1. The answer is no, sort of. I thought I'd outline the procedure we're following at Rutgers to move from 4.0.3c to 4.1.1.

Rutgers' situation is a bit unusual, because we use an automated software distribution system to keep software up to date on several hundred Suns. If at all possible, we try to avoid doing installs, because this requires a staff member to take each system down and hack on it, whereas if we can get our software distribution system to install things, it happens at 4am without interfering with anybody. So we've come up with a plan to move incrementally to 4.1.1 without doing an install.

First, we found that you can run a 4.1.1 kernel on a system that has 4.0.3c software, with a very few exceptions. Once you adjust a few pieces of software, you will have a set of software that allows you to use either a 4.0.3c or 4.1.1 kernel, simply by changing kernels (and /usr/kvm, if you care about ps, etc.). Here's the minimum set of things we found we had to adjust:

/usr/bin:

   sh - also /sbin/sh. The 4.0.3c version of sh will not run scripts under 4.1.1, which means that /etc/rc doesn't run, etc. The 4.1.1 version of sh works fine under 4.0.3c, so we just moved to that on all systems.

   mt - "mt status" uses an ioctl that was changed incompatibly in 4.1.1. Most sites could probably live with a non-functional mt status while they are doing the transition. It happens that our backup scripts need it. We produced a version of mt that tries the 4.1.1 method and falls back to the 4.0.3c method if that fails. This is really a kernel bug. The 4.1.1 ioctl simply has a longer argument block. There's no reason it couldn't accept the 4.0.3c size block as well and just not fill in the extra information.

/usr/lib:

   libc.so.* - In order to run software built under a 4.1.1 system, and from the 4.1.1 distribution, we installed the 4.1.1 version of libc, including libc.so.
   The 4.1.1 libraries work fine under 4.0.3c. Note that the distributed version of libc does not have encryption by default. If you use "des", etc., make sure you get the additional encryption option.

   ld.so, ldconfig - these have moved from /usr/kvm to /usr/lib in 4.1.1. The 4.0.3c versions do not work under 4.1.1 on all architectures. (I believe the problem was with sun3 only.) The 4.1.1 versions work fine under 4.0.3c, so we just moved to them everywhere.

/usr/etc:

   in.telnetd and in.rlogind must be upgraded to the 4.1.1 version. This is because of a slight change in the tty code, which requires a setpgrp(0,0) in places where you could get away with not having it before. The 4.1.1 version works fine under 4.0.3c (though according to CERT you should make sure to get a new version of in.telnetd that fixes a security problem). [We have tried the 4.1.1 version of in.telnetd and it does work. However, the telnetd we are actually using is from Berkeley. We had to make a couple of patches to get it to work on both 4.0.3c and 4.1.1. As Berkeley distributes it, you must decide at compile time which release you are going to run it on. We want the same image to work on both versions.]

   ping - timeouts don't work if you use the 4.0.3c version under 4.1.1. The 4.0.3c version depends upon software interrupts interrupting a system call in circumstances where it doesn't happen under 4.1.1. The 4.1.1 version uses a new facility to explicitly request that behavior. It works fine under both 4.0.3c and 4.1.1.

/etc:

   fstab - if you have your default swap partition listed, remove it or comment it out. /etc/rc does swapon -a. This will attempt to add all swap partitions listed in /etc/fstab. In theory it's OK to list the default swap partition. swapon -a should say "partition already in use" and ignore it. A lot of sites put it in fstab simply as documentation. Under 4.1.1 something obscure happens that typically doesn't show up until you try to back the system up or do something else that uses a lot of memory.
   At that point the system may crash with a fairly obscure panic. This bug was documented by Columbia for 4.1. It appears to happen still under 4.1.1. At least we were seeing daily crashes, which went away when we commented out the /etc/fstab entry for our default swap partition. It's fine to remove the entry for 4.0.3c systems as well. It was never needed.

Of course, in addition to this, you'll need to change /usr/kvm and the programs that depend upon it, such as ps. However, we can live without ps for a few days. Thus we make just the changes described above to all systems. This gets us into a position where we can bring up 4.1.1 just by changing kernels (and doing MAKEDEV or mknod if the system has any non-Sun devices whose major numbers have changed).

Once we are happy with the way 4.1.1 is running on a system, we change /usr/kvm and related programs. Note that which programs are in /usr/kvm has changed between 4.0.3c and 4.1.1. For the moment we've merged them. I.e. anything that is in /usr/kvm in either version is in /usr/kvm for us, so the symlinks are the same for 4.0.3c and 4.1.1. Then we just have to exchange /usr/kvm to go between them.

4.1.1 has changed the way terminal I/O is done in init. The 4.0.3c init will still work with 4.1.1, as long as you are using your old /etc/rc*. (However, there's some reason to think you may not be able to type ^C while the system is booting to abort individual commands.) Eventually, though, you'll want to replace your /etc/rc, /etc/rc.local, etc., with the new ones. They've reorganized them in a fairly nice way. When you change /etc/rc* (or if you have an SS2 where they are preinstalled), you'll need to move to the 4.1.1 version of init.

The 4.0.3c versions of fsck and other file system utilities appear to work fine under 4.1.1, as long as you keep your old file system. Eventually you'll want to dump your files to tape, do a newfs, and bring them back.
When you do a newfs under 4.1.1 you get file systems in a new format which will have better performance. By the time you do this, you'll need the 4.1.1 version of fsck, newfs, mkfs, etc. But you can put this off until you're ready to commit to 4.1.1 permanently. 4.1.1 can handle old file systems fine, and as long as you are using an old file system, the old fsck will work.

One comment about the new fsck. It's got a handy option, -c, for converting between 4.0.3c and 4.1.1 file system formats. (However as the installation manual explains, it is not always possible to go back from the new to the old format, depending upon the file system parameters.) We found one unexpected thing about fsck -c. I'm accustomed to having fsck scan the disk, but not actually do anything until the end (or when it finds an error). fsck -c changes the superblock immediately, but changes the free list at the end. So if you ^C in the middle of the operation, you get a file system that is very confused. It will work, but you tend to get crashes. Running fsck -c again will unconfuse it, fortunately.

It is safe to run fsck -c just to see whether the disk is in new or old format. It starts by asking you whether you want to convert it, and the way it asks the question tells you what the current format is. (If it asks whether you want to convert to the new format, you know it's currently in the old format.) As long as you ^C when it asks that question, it hasn't made any changes. But once you tell it to go ahead, it changes the superblock.
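
For readers who want to see the shape of the mt workaround mentioned under /usr/bin (try the 4.1.1 ioctl first, fall back to the 4.0.3c one if it fails), here is a minimal Python sketch of that pattern. It is not the Rutgers mt, whose source is not shown in this post. The function name tape_status is made up for illustration, the request codes and buffer sizes are placeholders the caller must supply, and the ioctl parameter exists only so the logic can be exercised without a tape drive.

```python
import fcntl


def tape_status(fd, variants, ioctl=fcntl.ioctl):
    """Try each (request_code, buffer_size) pair in order.

    variants is a list such as [(NEW_REQUEST, NEW_SIZE), (OLD_REQUEST, OLD_SIZE)].
    Both the codes and the sizes are placeholders here; the real values
    would come from the tape driver headers of each release.
    Returns (request_code_that_worked, raw_status_bytes).
    """
    last_error = None
    for request, size in variants:
        try:
            # Passing an immutable buffer makes ioctl return the filled-in bytes.
            return request, ioctl(fd, request, bytes(size))
        except OSError as err:
            # This kernel rejected that form of the call; try the next one.
            last_error = err
    if last_error is None:
        raise ValueError("no ioctl variants given")
    raise last_error
```

Listing the newer variant first means a 4.1.1-style kernel answers on the first attempt, while an older kernel rejects it and is served by the second.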
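
The /etc/fstab change described earlier (comment out the entry for the default swap partition) is easy to script if you are pushing it to many machines. The following is a rough Python sketch, not the procedure used at Rutgers. It assumes whitespace-separated fstab fields with the device first and the file system type third, assumes swap entries carry the type "swap", and uses /dev/sd0b in the usage example purely as a hypothetical default swap device. Check your own fstab layout before trusting it.

```python
def comment_out_default_swap(fstab_text, default_swap_dev):
    """Return fstab_text with the default swap partition's entry commented out.

    Assumes fields are: device, mount point, type, options, ...
    Entries for other swap partitions and already-commented lines are
    left untouched.
    """
    out = []
    for line in fstab_text.splitlines():
        fields = line.split()
        is_default_swap = (
            len(fields) >= 3
            and fields[0] == default_swap_dev  # a leading "#" makes this false
            and fields[2] == "swap"
        )
        out.append("#" + line if is_default_swap else line)
    result = "\n".join(out)
    # Preserve the trailing newline if the input had one.
    if fstab_text.endswith("\n"):
        result += "\n"
    return result
```

A caller would read /etc/fstab, pass the text through with the machine's own default swap device (for example comment_out_default_swap(text, "/dev/sd0b")), and write the result back. Running it a second time leaves the file unchanged, since the commented line no longer matches.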