mf@ircam.fr (Michel Fingerhut) (11/05/90)
Summary:

The chpt utility does not check for blatantly incorrect partitioning of a disk. Neither do any of the file system checking utilities, nor, apparently, the disk driver (and/or the error log mechanism). Moral: see below.

Description:

I happened to set the top of the last partition on an ra90 connected to a 5820 (really a 5810) under ultrix 4.0 *beyond* the physical size of the disk by a couple of hundred sectors. All sorts of bad things then happened.

a) chpt did not complain (it could have warned me, e.g. by looking at the info in /etc/disktab or otherwise). Well, it was happy.

b) newfs didn't either. It made a file system on that partition, with a map comprising the nonexistent blocks. Funny: it takes the disk type as an argument, so it could have looked in disktab, found the size of the disk, and then said: last partition too big, do you really mean this, Michael? Well, it did not.

c) A file was apparently created with blocks taken from that nonexistent pool (sounds like an Italo Calvino title, if you ask me). Every time an access was made to those (nonexistent) blocks, an error occurred and an attempt was made to replace them (and that failed too). The bad block table had also been overwritten.

d) The uerf messages were cryptic -- so much so that they led DEC to believe the disk was physically damaged (rather than a stupid software problem), and they replaced it. Messages were:

    DISK TRANSFER ERROR DATA ERROR INVALID HEADER BAD BLK REPL ATTMPT REPLACEMENT FAILURE, INCONSISTENT RCT MEDIA FORMAT ERROR RCT CORRUPTED

e) elcsd got so many messages that it ate all the cpu time, barely finding the time to announce on the console, every second, the loss of about 2000 messages to the ErrorLog. I had to abort the machine and boot single user to gain any sort of control.

f) Using ncheck and iclr I removed all inodes pointing to these nonexistent blocks, repartitioned, and ran fsck several times. After it announced it was happy, the system crashed the first time I created a directory.
Moral:

Don't trust any program to have safeguards. Read the man pages and decide, among contradictory info, which is the one you trust, believe or understand. E.g., ra(4) says about the c partition of an ra90:

    disk    start   length
    ...
    ra?c    0       2409680
    ...

while /etc/disktab says:

    :pc#2376153:bc#8192:fc#1024:\

This is a significant difference. The difference between a happy file system and nights of crashes, fscks, backups and restores.
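The check that chpt and newfs are described as skipping can be sketched in a few lines. This is an illustration only, not how either utility works: the helper names and the two-partition layout are hypothetical, and it assumes the disktab pc# value gives the size of the whole disk in sectors. Only the two sector counts come from the post.

```python
import re

def disktab_partition_size(entry, letter):
    """Return the sector count from a 'p<letter>#N' capability, or None."""
    match = re.search(r":p%s#(\d+)" % re.escape(letter), entry)
    return int(match.group(1)) if match else None

def partitions_past_end(disk_sectors, partitions):
    """Return (name, overshoot) for each partition ending beyond the disk."""
    problems = []
    for name, (start, length) in sorted(partitions.items()):
        end = start + length
        if end > disk_sectors:
            problems.append((name, end - disk_sectors))
    return problems

# The disktab fragment and the ra(4) figure quoted in the post.
entry = ":pc#2376153:bc#8192:fc#1024:\\"
disktab_size = disktab_partition_size(entry, "c")
man_page_size = 2409680

# Hypothetical layout: the last partition is sized from the ra(4) figure.
layout = {"a": (0, 32768), "h": (32768, man_page_size - 32768)}

for name, extra in partitions_past_end(disktab_size, layout):
    print("partition %s runs %d sectors past the end of the disk" % (name, extra))
```

Checked against the ra(4) figure instead, the same layout raises no warning, which is exactly why the disagreement between the two sources matters.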
mjr@hussar.dco.dec.com (Marcus J. Ranum) (11/05/90)
In article <1990Nov4.163016.3492@ircam.ircam.fr> mf@ircam.fr (Michel Fingerhut) writes:
>
>The chpt utility does not check for blatantly incorrect partitioning of a disk.
>Neither do any of the file system checking utilities, nor, apparently, the
>disk driver (and/or the error log mechanism). Moral: see below.

I agree that it would be nice if they'd complain, or at least politely point out: "you say this is an RA81, are you sure it has 43,553,433 blocks?" but I don't think that it should be *enforced* - have you ever tried to hook a foreign disk up to a machine that rigidly enforces drive parameters? Ick.

chpt -q shows you the overlaps of various partitions - watch it compulsively when partitioning.

I believe the idea is to use the default system tools for "ordinary" installs, and if you need to tweak the partitions you're assumed to be a guru-type. I don't think that approach is "wrong", really, since most people are probably perfectly happy with the default partitioning. The UNIX systems administrator's toolset is traditionally pretty bad at catching typos (I type "chpt -d /dev/rra0a" instead of "-q" and...) but I don't think ULTRIX is the only guilty party here.

I was really impressed with the graphical disk partition/filesystem editor in the SunOS 2.X-3.X install procedure, but it's been discontinued. I don't know for sure why, but I wonder if it might be because they wound up with lots of users calling: people had so much fun laying out their file systems in some bizarre layout that they had to redo them 2 months later (and of course that's always the vendor's fault, since they provided the tool). The first time I fooled with Sun's install, the result was one seriously messed-up system. I had a good time playing with it, though.

mjr.
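The "point it out, don't enforce it" approach suggested above might look like the sketch below: a report of overlapping partitions that warns and leaves the decision to the administrator. It does not reproduce the output or logic of chpt -q. The function name and the sample layout are hypothetical, and partition c is assumed to span the whole disk and is therefore skipped.

```python
def overlapping_pairs(partitions, ignore=("c",)):
    """Return pairs of partition names whose sector ranges intersect."""
    names = sorted(name for name in partitions if name not in ignore)
    pairs = []
    for i, first in enumerate(names):
        start_a, len_a = partitions[first]
        for second in names[i + 1:]:
            start_b, len_b = partitions[second]
            # Half-open ranges [start, start + length) intersect when
            # each one starts before the other one ends.
            if start_a < start_b + len_b and start_b < start_a + len_a:
                pairs.append((first, second))
    return pairs

# Hypothetical layout as (start, length) in sectors; g starts inside b.
layout = {
    "a": (0, 32768),
    "b": (32768, 65536),
    "c": (0, 2376153),
    "g": (90000, 100000),
}

for first, second in overlapping_pairs(layout):
    # Warn only: nothing here refuses to write the partition table.
    print("warning: partitions %s and %s overlap, is that intended?"
          % (first, second))
```

Returning the pairs rather than raising an error keeps the foreign-disk case workable: the tool reports what looks odd and carries on.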