[comp.sys.dec] Partitioning disks -- warning...

mf@ircam.fr (Michel Fingerhut) (11/05/90)

Summary:

The chpt utility does not check for blatantly incorrect partitioning of a disk.
Neither do any of the file system checking utilities, nor, apparently, the
disk driver (and/or the error log mechanism).  Moral: see below.

Description:

I happened to set the top of the last partition on an ra90 connected to
a 5820 (really a 5810) under ULTRIX 4.0 *beyond* the physical size of
the disk by a couple of hundred sectors.  All sorts of bad things then happened:

    a  chpt did not complain (it could have warned me, e.g., by looking at
       the info in /etc/disktab or otherwise).  Well, it was happy.

    b  newfs didn't either.  It made a file system on that partition, with
       a map comprising the nonexistent blocks.  Funny: it takes the disk
       type as an argument, so it could have looked in disktab, found the
       size of the disk, and then said: last partition too big, do you really
       mean this, Michael?  Well, it did not.

    c  a file was apparently created with blocks taken from that nonexistent
       pool (sounds like an Italo Calvino title, if you ask me).  Every
       time an access was made to those (nonexistent) blocks, an error occurred
       and an attempt was made to replace them (and that failed too).  The
       bad block table had also been overwritten.

    d  The uerf messages were cryptic -- so much so that they led DEC to
       believe the disk was physically damaged (rather than a stupid software
       problem) and they replaced it.  Messages were:

		
			DISK TRANSFER ERROR
			DATA ERROR
			INVALID HEADER

			BAD BLK REPL ATTMPT
			REPLACEMENT FAILURE, INCONSISTENT RCT

			MEDIA FORMAT ERROR
			RCT CORRUPTED

    e  elcsd got so many messages that it ate all the cpu time, barely finding
       the time to announce every second on the console the loss of about
       2000 messages to the ErrorLog.  I had to abort the machine and boot
       single user so as to gain any sort of control.

    f  Using ncheck and iclr I removed all inodes pointing to these nonexistent
       blocks, repartitioned, ran fsck several times, and after it announced it
       was happy, the system crashed when I first created a directory.
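
The check that neither chpt nor newfs made in (a) and (b) amounts to a single
comparison.  Here is a minimal sketch in Python, not anything ULTRIX ships:
partition_overrun is a hypothetical helper, the offset and length are made-up
illustrative values (not the ones from my disk), and taking the pc# figure
from /etc/disktab as the size of the whole disk is an assumption.

```python
def partition_overrun(disk_sectors, offset, length):
    """Return how many sectors a partition extends past the end of the disk.

    Returns 0 when the partition fits.  All values are in sectors.
    """
    if offset < 0 or length < 0:
        raise ValueError("offset and length must be non-negative")
    return max(0, offset + length - disk_sectors)


# Assumption: the pc# value for the ra90 in /etc/disktab is the disk size.
RA90_SECTORS = 2376153

# Hypothetical last partition whose top lies 200 sectors past the disk's end.
overrun = partition_overrun(RA90_SECTORS, offset=2000000, length=376353)
if overrun:
    print("warning: last partition runs %d sectors past the end of the disk"
          % overrun)
```

A tool built this way would only warn, leaving the final decision to whoever
is doing the partitioning.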

Moral:

Don't trust any program to have safeguards.  Read the man pages and decide
which of the contradictory info you trust, believe or understand.
E.g., ra(4) says about the c partition of an ra90:

	   disk    start   length
	   ...
	   ra?c    0       2409680
	   ...

while /etc/disktab says:

	   :pc#2376153:bc#8192:fc#1024:\

This is a significant difference.  The difference between a happy file system
and nights of crashes, fscks, backups and restores.
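
To put a number on that difference, here is a small Python sketch that pulls
the numeric fields out of the disktab fragment quoted above and compares pc#
with the length ra(4) gives for ra?c.  numeric_caps is a hypothetical helper
that handles only the name#value form shown here; it is not a full disktab
parser.

```python
def numeric_caps(entry):
    """Collect name#value capabilities from a disktab-style line into a dict."""
    caps = {}
    # Drop the line-continuation backslash, then split on the ':' separators.
    for field in entry.replace("\\", "").split(":"):
        name, sep, value = field.strip().partition("#")
        if sep and value.isdigit():
            caps[name] = int(value)
    return caps


disktab_line = ":pc#2376153:bc#8192:fc#1024:\\"   # as quoted from /etc/disktab
RA4_C_LENGTH = 2409680                            # ra?c length as quoted from ra(4)

caps = numeric_caps(disktab_line)
difference = RA4_C_LENGTH - caps["pc"]
print("ra(4) exceeds disktab by %d sectors" % difference)
```

The two sources disagree by 33527 sectors, so a partition sized from the
ra(4) figure would reach well past the size given in disktab.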

mjr@hussar.dco.dec.com (Marcus J. Ranum) (11/05/90)

In article <1990Nov4.163016.3492@ircam.ircam.fr> mf@ircam.fr (Michel Fingerhut) writes:
>
>The chpt utility does not check for blatantly incorrect partitioning of a disk.
>Neither do any of the file system checking utilities, nor, apparently, the
>disk driver (and/or the error log mechanism).  Moral: see below.

	I agree that it would be nice if they'd complain, or at least politely
point out: "you say this is an RA81, are you sure it has 43,553,433 blocks?"
but I don't think that it should be *enforced* - have you ever tried to hook
a foreign disk up to a machine that rigidly enforces drive parameters? Ick.
chpt -q shows you the overlaps of various partitions - watch it compulsively
when partitioning.
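
	For anyone who would rather not eyeball the overlaps, the check is easy
to sketch. The Python below is only an illustration: overlapping_pairs is a
hypothetical helper, the partition table is made up, and it does not read or
reproduce the output of chpt -q.

```python
def overlapping_pairs(partitions):
    """Return pairs of partition names whose sector ranges intersect.

    partitions maps a name to (offset, length), both in sectors.
    Zero-length partitions are treated as unused and never overlap.
    """
    items = sorted(partitions.items(), key=lambda kv: kv[1][0])
    pairs = []
    for i, (name_a, (off_a, len_a)) in enumerate(items):
        for name_b, (off_b, len_b) in items[i + 1:]:
            # Sorted by offset, so b starts at or after a; they overlap
            # when b starts before a ends.
            if len_a > 0 and len_b > 0 and off_b < off_a + len_a:
                pairs.append((name_a, name_b))
    return pairs


# Hypothetical layout: "b" ends at sector 98304 but "g" starts at 90000.
table = {"a": (0, 32768), "b": (32768, 65536), "g": (90000, 100000)}
for first, second in overlapping_pairs(table):
    print("partitions %s and %s overlap" % (first, second))
```

	Bear in mind that the c partition conventionally spans the whole disk,
so it will show up as overlapping everything; that is by design.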

	I believe the idea is to use the default system tools for "ordinary"
installs, and if you need to tweak the partitions you're assumed to be a
guru-type. I don't think that approach is "wrong", really, since most people
are probably perfectly happy with the default partitioning.

	The UNIX systems administrator's toolset is traditionally pretty bad
at catching typos (I type "chpt -d /dev/rra0a" instead of "-q" and...) but
I don't think ULTRIX is the only guilty party here. I was really impressed
with the graphical disk partition/filesystem editor in the SunOS 2.X-3.X
install procedure, but it's been discontinued. I don't know for sure why, but
I wonder if it might be because they wound up with lots of users calling
after having so much fun laying out their file systems in some bizarre
layout that they had to redo them in 2 months (and of course that's always
the vendor's fault, since they provided the tool).
First time I fooled with Sun's install, the result was one seriously messed-up
system. I had a good time playing with it, though.

mjr.