[net.unix-wizards] Disk partitioning -- some thoughts

dave@murphy.UUCP (09/29/86)

Summary: Special partitions for bad-block areas and partitioning info, etc.
Line eater: enabled

A couple of ideas for improving the disk-partitioning scheme: in addition to
the regular data partitions, why not have a partition which would consist
of the reserved area at the beginning of the disk where the partition table
(and maybe the first-level bootstrap) lives?  This would allow a user-level
program to be written which could be used to alter the partitioning of disks
(unmounted, of course) while the system runs.  We have some machines here
on which downtime allowed for such things is scarce, and it would be nice
to be able to do this function, newfs each of the new partitions, and then
bring the disk online while other people are working -- especially when
formatting new packs for removable disks.  In addition, another special
partition could include the bad-block remapping area, although I'm not sure
why anyone would need this.  I do agree that none of the standard a-h
partitions should include reserved areas like partition tables or
alternate cylinders for remapping.
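
To make the last point concrete, here is a rough sketch of the check a
repartitioning program might run before writing a new table: no data
partition may share a sector with a reserved area.  The struct and
function names are made up for illustration and do not come from any
existing driver.

```c
#include <stddef.h>

/* Hypothetical layout record: a run of sectors on the disk. */
struct extent {
    unsigned long start;   /* first sector */
    unsigned long length;  /* number of sectors; 0 means empty */
};

/* Return 1 if the two extents share at least one sector, else 0. */
static int extents_overlap(const struct extent *a, const struct extent *b)
{
    if (a->length == 0 || b->length == 0)
        return 0;
    return a->start < b->start + b->length &&
           b->start < a->start + a->length;
}

/*
 * Return the index of the first data partition that overlaps any
 * reserved area (partition table, bad-block area, ...), or -1 if
 * the layout is clean.
 */
static int first_bad_partition(const struct extent *data, size_t ndata,
                               const struct extent *reserved, size_t nres)
{
    size_t i, j;

    for (i = 0; i < ndata; i++)
        for (j = 0; j < nres; j++)
            if (extents_overlap(&data[i], &reserved[j]))
                return (int)i;
    return -1;
}
```

The reserved areas would themselves be listed as the special partitions
described above, so the same table drives both the check and the access.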

Also, in regard to the partition-table scheme that Chris Torek is working
on: I can think of one thing that would be nice to have in a partition
table that I haven't seen in any implementation yet.  This is a flag bit
that indicates whether or not a particular partition is a swap partition.
This way, the system could search for swap partitions at boot time, instead
of having it configured in statically.  Also, the swapon system call could
check the bit and refuse to start swapping/paging on a partition if the
bit isn't set; this would help eliminate accidents where a data partition
is accidentally used for swap, particularly on removable packs.
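
A minimal sketch of how such a flag might be used, assuming an
eight-entry table and a flag bit called PF_SWAP (both names are
invented here, not taken from any real partition-table format):

```c
#include <stddef.h>

#define NPART    8      /* the traditional a-h partitions */
#define PF_SWAP  0x01   /* hypothetical flag: partition may be used for swap */

struct part_entry {
    unsigned long start;   /* first sector */
    unsigned long size;    /* sectors; 0 means the slot is unused */
    unsigned int  flags;
};

/* Boot-time search: store the indices of swap partitions in out[]. */
static size_t find_swap_parts(const struct part_entry tab[NPART],
                              int out[NPART])
{
    size_t n = 0;
    int i;

    for (i = 0; i < NPART; i++)
        if (tab[i].size != 0 && (tab[i].flags & PF_SWAP))
            out[n++] = i;
    return n;
}

/* swapon-style guard: 0 if swapping may start here, -1 if refused. */
static int swap_check(const struct part_entry tab[NPART], int part)
{
    if (part < 0 || part >= NPART)
        return -1;
    if (tab[part].size == 0)
        return -1;
    if (!(tab[part].flags & PF_SWAP))
        return -1;            /* data partition: refuse */
    return 0;
}
```

With this in place, a request to swap on a partition without the bit
set is refused before any paging I/O can touch it.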
---
It's been said by many a wise philosopher that when you die and your soul
goes to its final resting place, it has to make a connection in Atlanta.

Dave Cornutt, Gould Computer Systems, Ft. Lauderdale, FL
UUCP:  ...{sun,pur-ee,brl-bmd}!gould!dcornutt
 or ...!ucf-cs!novavax!houligan!dcornutt
ARPA: wait a minute, I've almost got it...

"The opinions expressed herein are not necessarily those of my employer,
not necessarily mine, and probably not necessary."

gershon@ccicpg.UUCP ( Gershon Shamay) (10/15/86)

Just to add some spice to the discussion. At CCI we implemented what we
call a 'dynamic partitioning' scheme (on CCI's 6/32 machines).
The implementation is simple. The partition info is kept on the
drive itself, in the first block (at least 1k) which is not used
for any bootstrap program. The table contains information for each
partition, like where it begins, how big it is and even type (this way
the same drive can have BSD or AT&T or swap or... type of partitions,
mixed and independent of each other). There's also drive-specific
information like size (# of cylinders, # of heads, etc.). Plus data like
RPM, sector size and so on - the delight of newfs and the like.
Most of this table is put on the drive by the formatter program.
Now for the fun part of it.
1) A special user-level program allows root to change any data, thus
	effectively re-partitioning the drive. Anyone can view the table.
2) A special ioctl to the disk driver allows root to WRITE this table.
	Any other writes to this area are quietly ignored.
	That's how dd's and the like CAN'T ruin the drive info.
	You even get an error message if something is currently
	mounted on this drive.
3) The special program is the only one that uses this ioctl.
4) When Unix comes up, the disk driver reads the table off the drive.
	From here on, it knows everything it needs about it. No need
	to re-compile the kernel or define ANYTHING about drives. Any
	new drive type is automatically supported by the same kernel.
5) The same ioctl will happily read the table for you. Programs
	like newfs (or mkfs) etc just use the library routines to read
	disk info. We changed the library to ask the driver first and then
	go to /etc/disktab. So the on-drive table has precedence over
	any nonsense one may have in disktab. If there's no table
	on the drive (can happen, backward compatibility, etc) THEN
	/etc/disktab is used.
6) The on-drive table keeps a nickname you assign to the drive
	(a string of chars). At bootstrap time, the OS prints on
	the console all the drive types it finds connected, including
	the nicknames (if any). Handy for people who forget.
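
Points 2 and 5 above can be sketched roughly as follows.  This is not
the actual CCI code: the struct layout, the magic number, and every name
are assumptions made for illustration, and the mounted-drive check is
shown on the table-write path, which is one reading of point 2.

```c
#include <stddef.h>
#include <errno.h>

#define DL_MAGIC  0x44544142UL   /* hypothetical "valid table" marker */
#define DL_NPART  8

struct dl_part {
    unsigned long start, size;   /* in sectors */
    unsigned int  type;          /* e.g. BSD, AT&T, swap (values assumed) */
};

struct drive_label {
    unsigned long  magic;
    char           nickname[16];
    unsigned int   ncyl, nheads, nsect;   /* geometry */
    unsigned int   secsize, rpm;
    struct dl_part part[DL_NPART];
};

/*
 * Point 5: the on-drive table wins; the disktab entry is used only
 * when the drive has no valid table.  Either pointer may be NULL.
 */
static const struct drive_label *
choose_label(const struct drive_label *on_drive,
             const struct drive_label *from_disktab)
{
    if (on_drive != NULL && on_drive->magic == DL_MAGIC)
        return on_drive;
    return from_disktab;         /* may be NULL: nothing known */
}

/* Total sectors implied by the geometry fields. */
static unsigned long label_sectors(const struct drive_label *l)
{
    return (unsigned long)l->ncyl * l->nheads * l->nsect;
}

/* Point 2: only root may write the table, and not while mounted. */
static int label_write_check(int is_root, int mounted_count)
{
    if (!is_root)
        return EPERM;
    if (mounted_count > 0)
        return EBUSY;
    return 0;
}
```

A library routine used by newfs or mkfs would call something like
choose_label() once and never need to know where the answer came from.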

So far, this scheme is working great. We got rid once and for all of the
hard-wired tables in the kernel and the need to send out a new version
of Uni*x to support new drive types. The same scheme should work fine
for removable packs, too.

The only 'pure' problem we had to consider with this scheme was point
#2 above (ignoring writes to the first block). AHA ! There goes the
semantics of the raw disk device. True to a point. In real life, we
considered it not to be a problem. All the writes to the raw disk at
block #0 were found to be 'dd' types where one wants to keep a copy
of an 'a' partition for disasters. Now if one does that, he had
better have both partitions the same size. On top of which,
he usually doesn't want to ruin the rest of the destination
drive anyway. Bottom line - even in this case there's no reason
to wipe out the on-drive table. And writes to the block device
never go to block #0 (reserved for bootstrap, courtesy of Uni*x 
history).
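
For the curious, the raw-device behaviour described above amounts to
clipping each write against the table area.  A sketch, assuming the 1k
table occupies two 512-byte sectors (the sector size, the names, and the
alternative "reject" policy are assumptions added for comparison, not
part of our driver):

```c
#define LABEL_SECTORS 2UL   /* assumed: 1k table, 512-byte sectors */

enum label_policy { QUIET_IGNORE, NOISY_REJECT };

struct write_plan {
    unsigned long start;   /* first sector actually written */
    unsigned long count;   /* sectors actually written */
    int           error;   /* 0 on success, -1 if the write is refused */
};

/*
 * Decide what a raw write of `count` sectors at `start` really does.
 * QUIET_IGNORE drops the sectors that fall inside the table area and
 * reports success; NOISY_REJECT refuses the whole request instead.
 */
static struct write_plan
plan_raw_write(unsigned long start, unsigned long count,
               enum label_policy policy)
{
    struct write_plan p = { start, count, 0 };
    unsigned long skip;

    if (count == 0 || start >= LABEL_SECTORS)
        return p;                  /* does not touch the table */

    if (policy == NOISY_REJECT) {
        p.count = 0;
        p.error = -1;
        return p;
    }

    skip = LABEL_SECTORS - start;  /* sectors inside the table area */
    if (skip > count)
        skip = count;
    p.start = start + skip;
    p.count = count - skip;
    return p;
}
```

Under QUIET_IGNORE a dd of a whole 'a' partition onto the raw device
lands everything except the table sectors and still reports success.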

					Gershon Shamay
						CCI
					Computer Products Group

chris@umcp-cs.UUCP (Chris Torek) (11/02/86)

In article <132@ccicpg.UUCP> gershon@ccicpg.UUCP ( Gershon Shamay) writes:
>2) A special ioctl to the disk driver allows root to WRITE [the
>partition] table.  Any other writes to this area are quietly ignored.

I am not sure I like that.  Noisily ignored, or quietly overwritten,
but quietly ignored . . . ?  It does not feel right.

>	That's how dd's and the like CAN'T ruin the drive info.
>	You even get an error message if something is currently
>	mounted on this drive.

Then it is not quiet.

4.4BSD (or 9BSD or whatever the next release is called---assuming
there *is* a next release) is likely to have a very similar scheme.
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7690)
UUCP:	seismo!umcp-cs!chris
CSNet:	chris@umcp-cs		ARPA:	chris@mimsy.umd.edu