[net.unix-wizards] disk partitioning

smb@mimsy.umd.EDU (Steve M. Burinsky) (09/03/86)

I have two questions regarding disk partitioning.  Although I think
the questions are generic, I am dealing with a RA60 and a RA81.  First,
some background for my questions.  (I hope I'm not boring anyone).

From "diskpart (8)":
     The disk partition sizes are based on the total amount of
     space on the disk as given in the table below (all values are
     supplied in units of 512 byte sectors).  The `c' partition
     is, by convention, used to access the entire physical disk,
     including the space reserved for the bad sector forwarding
     table.  In normal operation, either the `g' partition is
     used, or the `d', `e', and `f' partitions are used.  The `g'
     and `f' partitions are variable sized, occupying whatever
     space remains after allocation of the fixed sized parti-
     tions.

From "bad144 (8)":
     Replacement sectors are allocated starting with the first
     sector before the bad sector information and working back-
     wards towards the beginning of the disk.  A maximum of 126
     bad sectors are supported.  The position of the bad sector
     in the bad sector table determines which replacement sector
     it corresponds to.  The bad sectors must be listed in
     ascending order.

     The bad sector information and replacement sectors are con-
     ventionally only accessible through the ``c'' file system
     partition of the disk.  If that partition is used for a file
     system, the user is responsible for making sure that it does
     not overlap the bad sector information or any replacement
     sectors.

I want to use the "c" partitions of my disks to form one large file
system per disk.  If this is not acceptable, I will be resizing the
default partition sizes to better fit my needs.  Here are my questions:

1.  Under 4.2, are there advantages/disadvantages to one large file system
versus many smaller file systems?  What about quotas and file system
efficiency?  My understanding is that small file systems are a relic
of 16-bit machines which could only handle 64k inodes per file system.

2.  If I use the "c" partition, how do I account for/leave enough space for
the bad sector information and replacement sectors?

3.  If I resize the partitions, how do I account for/leave enough space for
the bad sector information and replacement sectors?

4.  This may be a silly question, but I can't figure it out.  The background
info above says that you can access the bad sector information and 
replacement sectors only through the "c" partition.  Well, if I do a disk-
to-disk copy using the "c" partition, am I copying one disk's bad sectors and
bad sector information to the other disk?!

I would appreciate any help I could get on this matter.  Thanks in advance.
You are to be congratulated for reading this rather lengthy message.

Steve Burinsky
smb@mimsy

alan@ecrcvax.UUCP (Alan P. Sexton) (09/10/86)

My own setup is 4 eagles and an rm05 on a SI 9900 on a vax 785 running
4.2BSD. We have sources but the following can be done without if you
are not afraid to use adb on the kernel (I've done it on a uvaxII without
sources but it is somewhat different as the MSCP stuff changes things
(especially some of the following arguments)).

Steve Burinsky writes <3496@brl-smoke.ARPA>:
> 1.  Under 4.2, are there advantages/disadvantages to one large file system
> versus many smaller file systems?  What about quotas and file system
> efficiency?  My understanding is that small file systems are a relic
> of 16-bit machines which could only handle 64k inodes per file system.

The 4.2 fast file system attempts to keep individual files in single cylinder
groups, so in a large file system the file blocks shouldn't spread very
much over the whole system (thus large file systems (LFS) are not a
disadvantage). LFS take longer to back up, and I prefer to do backups in
smaller stages (if something goes wrong in backing up 2 small file systems
there is a 50% probability that you only have to redo the backup for one of
them). The quota system is an efficiency overhead itself. With smaller file
systems you may be able to put logical groups of users onto their own
partitions and so do away with quotas. (Group A has 100 Mb to share between
themselves; they can work without their space being affected by any other
groups and can manage their own space among themselves.) This has turned out
to be much more flexible than quotas for us while still remaining reasonably
fair. A disadvantage of many small file systems is that there
is a maximum of 16 FS's mountable at any one time. (This, however, is easily
fixed if you have sources: look in the Building Systems with Config paper
in the section on Data Structure Sizing Rules.) Naturally the feasibility
of the above (and following) suggestions depends very much on your environment.

> 2.  If I use the "c" partition, how do I account for/leave enough space for
> the bad sector information and replacement sectors?

On my system (admittedly a System Industries driver but I believe it is
the same for standard 4.2) the bad block table is on the last track. To
be precise there is a copy of the table on every even numbered sector on the
last track of the last cylinder of each disk. The replacement blocks then
stretch backwards (on an eagle taking up another 2 and a bit tracks, on an
rm05 taking up an extra 4 tracks). I suggest rounding up to the nearest
track. Thus total bad block overhead on an eagle is the last 4 tracks
and on an rm05 is the last 5 tracks on the last cylinder of the disk.
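
The arithmetic behind those track counts can be sketched roughly as follows.
This assumes one track for the copies of the bad block table plus room for
the 126 replacement sectors that bad144(8) allows, rounded up to whole
tracks. The sectors-per-track figures used in the note after the code (32
for the rm05, 48 for the eagle) are assumptions about the drive geometry,
and the function names are made up for illustration.

```c
/* Sketch only: size of the bad144 area at the end of a disk. */

#define BAD144_MAX_BAD 126	/* maximum bad sectors, per bad144(8) */

/* Tracks needed to hold the replacement sectors, rounded up. */
int replacement_tracks(int sectors_per_track)
{
	return (BAD144_MAX_BAD + sectors_per_track - 1) / sectors_per_track;
}

/* Total tracks to reserve: one table track plus the replacement tracks. */
int bad144_reserved_tracks(int sectors_per_track)
{
	return 1 + replacement_tracks(sectors_per_track);
}

/* The same reservation expressed in 512-byte sectors. */
long bad144_reserved_sectors(int sectors_per_track)
{
	return (long)bad144_reserved_tracks(sectors_per_track)
	    * sectors_per_track;
}
```

With those geometry assumptions, bad144_reserved_tracks(32) gives 5 and
bad144_reserved_tracks(48) gives 4, which agrees with the track counts above.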

> 3.  If I resize the partitions, how do I account for/leave enough space for
> the bad sector information and replacement sectors?

The c partition should cover the whole disk INCLUDING the bad block data
at the end of the disk. As for the rest: in your driver there is a table
called disktype_sizes (where disktype is the name of your type of disk).
This contains the length of each disk partition in sectors and the number
of the first cylinder of that disk partition on the disk (all disk partitions
start on a cylinder boundary). Simply change these values to whatever is
desired, but make sure that no partition (except c) covers the bad block data.
Note 1. /etc/disktab should be changed accordingly.
Note 2. Remember that your new kernel may not be able to read your old
	data if you change the current partition setup.
Note 3. Don't decrease the size of the a partition as you might not
	then be able to boot new 4.n unices from tape without zapping
	all your data.
Note 4. Don't forget to use the new tables in the boot stuff (/sys/stand)
	as your old standalone programs will still only recognise
	the old disk partitions.
> 
> 4.  This may be a silly question, but I can't figure it out.  The background
> info above says that you can access the bad sector information and 
> replacement sectors only through the "c" partition.  Well, if I do a disk-
> to-disk copy using the "c" partition, am I copying one disk's bad sectors and
> bad sector information to the other disk?!
> 
Not at all a silly question!!! When doing a c partition to c partition copy
you must specify the size of the transfer (I presume you are using dd) so that
it DOES NOT include the bad block data. To be safe you could define some
partition to be the whole disk EXCEPT the bad block data and use that for disk
to disk transfers. In particular, aside from the data loss, if you zap the bad
block table it is not easy to get it back unless you have a printout of what
blocks were in it. You may have to use an extensive disk tester (painful) with
(in my case) the error recovery mechanism on the disk controller switched
off (call in the technician) and still not find a quarter of the bad blocks.

On my system I have made major disk reorganisation changes and have been
running with them for several months now without problems. The relevant
tables from the disk driver are as follows:

	/* alan's changes: 30 cyl on rm05 == 19 cyl on eagle = 1 segment
	so root(a) = 2 segs, data = 12 seg (e&f) or 24 seg (d). g partition is
	what is left of last cylinder when bb table ( = 1 track) and replacement
	blocks ( = 4 tracks) are removed. This partition should hold *RAW* info
	for backups. swap part (b) = 13Mb = 1.4 segs to take up leftover space
	h is the whole disk except for the bb stuff. */
   rm05_sizes[8] = {
	36480,    0,            /* A=cyl   0 thru 59  =  60 cyl */
	25536,   60,            /* B=cyl  60 thru 101 =  42 cyl */
	500384,   0,            /* C=cyl   0 thru 822 = 823 cyl */
	437760, 102,            /* D=cyl 102 thru 821 = 720 cyl */
	218880, 102,            /* E=cyl 102 thru 461 = 360 cyl */
	218880, 462,            /* F=cyl 462 thru 821 = 360 cyl */
	448,    822,            /* G=cyl 822 thru 822 =   1 cyl - BB stuff */
	499776,   0,            /* H=cyl   0 thru 821 = 822 cyl */
},

	/* alan's changes: 30 cyl on rm05 == 19 cyl on eagle = 1 segment
	so root(a) = 2 segs, data = 12 seg (e&f) or 24 seg (d) plus 12 seg (h). 
	g partition is what is left of last cylinder when bb table ( = 1 track)
	and replacement blocks ( = 3 tracks) are removed, plus 5 leftover
	cylinders. Swap part (b) = 6 segs (== 56Mb) (NB this is half a data
	partition so if a special filesystem is required this can be used
	and backed up easily)						*/
   eagle_sizes[8] = {
	36480,  0,              /* A=cyl   0 thru  37 =  38 cyl */
	109440, 38,             /* B=cyl  38 thru 151 = 114 cyl */
	808320, 0,              /* C=cyl   0 thru 841 = 842 cyl */
	437760, 152,            /* D=cyl 152 thru 607 = 456 cyl */
	218880, 152,            /* E=cyl 152 thru 379 = 228 cyl */
	218880, 380,            /* F=cyl 380 thru 607 = 228 cyl */
	5584,   836,            /* G=cyl 836 thru 841 =   6 cyl - BB stuff */
	218880, 608,            /* H=cyl 608 thru 835 = 228 cyl */
},

Note that I increased the a partition size as we do some work on kernels
and needed more space in root (we also needed a larger /tmp).
Note also that I increased the partition sizes to an integral number of tracks;
I see no reason to waste that space.
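
A table like the ones above can be checked mechanically before it goes into
a kernel. The following is only a sketch: struct part and first_overlap() are
hypothetical names (the real driver declaration is not shown here), the
entries are copied from the listing above, and the sectors-per-cylinder
figures in the note below (608 for the rm05, 960 for the eagle) are inferred
from the cylinder counts in the table comments.

```c
#include <stddef.h>

/* Hypothetical mirror of one entry in a driver sizes table. */
struct part {
	long nblocks;	/* partition length in 512-byte sectors */
	int  cyloff;	/* first cylinder of the partition */
};

/* Entries copied from the tables above, in order a..h. */
const struct part rm05_parts[8] = {
	{ 36480, 0 }, { 25536, 60 }, { 500384, 0 }, { 437760, 102 },
	{ 218880, 102 }, { 218880, 462 }, { 448, 822 }, { 499776, 0 },
};
const struct part eagle_parts[8] = {
	{ 36480, 0 }, { 109440, 38 }, { 808320, 0 }, { 437760, 152 },
	{ 218880, 152 }, { 218880, 380 }, { 5584, 836 }, { 218880, 608 },
};

/* First sector past the end of partition p. */
long part_end(const struct part *p, long sectors_per_cyl)
{
	return (long)p->cyloff * sectors_per_cyl + p->nblocks;
}

/*
 * Return the index of the first partition, other than skip, that reaches
 * into the last `reserved' sectors of the disk; return -1 if none does.
 * Pass skip = 2 to exempt the c partition.
 */
int first_overlap(const struct part *tab, size_t n, int skip,
    long sectors_per_cyl, long disk_sectors, long reserved)
{
	long limit = disk_sectors - reserved;
	size_t i;

	for (i = 0; i < n; i++) {
		if ((int)i == skip)
			continue;
		if (part_end(&tab[i], sectors_per_cyl) > limit)
			return (int)i;
	}
	return -1;
}
```

Under those assumptions the rm05 table passes with 160 sectors (5 tracks)
reserved; G ends exactly at sector 500224. The eagle table passes if the
reserve is taken as 174 sectors (one 48-sector track plus 126 replacement
sectors), but G is reported if the reserve is rounded up to 4 full tracks
(192 sectors), because G as listed ends at sector 808144.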

If anyone has any questions, comments or criticisms or knows of something I
missed or forgot to mention I would be delighted to hear about it. In particular
if anyone knows of a good reason to keep the sizes at the defaults I'd like
to hear it (NB so that people can swap disks is NOT, in my opinion, a good
reason as this can easily be done by creating a filesystem (excluding the
BB data) on a full c partition. Unix will understand the filesystem on
another machine and won't overwrite the bad block data unless you specifically
try to do so).

> You are to be congratulated for reading this rather lengthy message.
ditto re the reply

Alan P. Sexton				Systems Programmer
alan%ecrcvax.UUCP@Germany.CSNET		European Computer-Industry
..!mcvax!unido!ecrcvax!alan		Research Centre GmbH

andrew@ee.brunel.ac.uk (Andrew Findlay) (09/11/86)

In article <3496@brl-smoke.ARPA> smb@mimsy.umd.EDU (Steve M. Burinsky) writes:
>I have two questions regarding disk partitioning.
...
>
>I want to use the "c" partitions of my disks to form one large file
>system per disk.
>
>1.  Under 4.2, are there advantages/disadvantages to one large file system
>versus many smaller file systems?  What about quotas and file system
>efficiency?

If you have plenty of disks, it makes sense to have some very big file
systems. The 4.2 file system will lay them out efficiently, so that is
not a worry. There are two main advantages to this:

(A)	The fewer filesystems you have, the simpler everything is to keep
	track of - especially quotas and dumps.

(B)	There is a limit to the number of filesystems that 4.2 will let
	you mount at any one time. With large filesystems you get more
	storage before hitting this limit (15 in the standard distribution;
	see page 33 of 'Building Systems with Config').

The possible disadvantage of using partition C is that the disk cannot
then be used for swapping. If you have enough disks to get interleaved
swapping as well as big filesystems, so much the better. You seem to be
building an immense system...

>
>2.  If I use the "c" partition, how do I account for/leave enough space for
>the bad sector information and replacement sectors?

You can give arguments to newfs(8) - just tell it to use
(size_of_part_C - size_of_bad_block_area).

>
>3.  If I resize the partitions, how do I account for/leave enough space for
>the bad sector information and replacement sectors?

Simply make sure that the total of partition sizes other than C is less
than the disk size by the right amount. (And remember you have to change
tables in the driver as well as in /etc/disktab). The problem here is that
all disks of a given type share a single partition table, so you cannot
have wildly different layouts on each of two RA81s.

>The background info says that you can access the bad sector information and 
>replacement sectors only through the "c" partition.  Well, if I do a disk-
>to-disk copy using the "c" partition, am I copying one disk's bad sectors and
>bad sector information to the other disk?!

Sort of.. The disk driver (/sys/vaxuba/uda.c in your case) will map any
bad sectors to the appropriate replacement sectors on both the source and
destination disks. Thus, the disks appear to be perfect until you reach
the bad sector table itself. This is on the last track, which is usually
defect-free anyway. Once you start copying this, the destination disk's
bad sector info gets overwritten with that from the source disk. Even now,
things will be OK - UNTIL YOU REBOOT THE SYSTEM. The duff bad-sector info
will then be picked up and all hell will break loose.

In general, I would avoid disk-to-disk copying. If you must, use dd(1) and
set a block count so that it does not touch the bad-sector info.
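
Working out that block count is simple arithmetic, sketched here under
stated assumptions: the disk size and the reserved bad-sector area are given
in 512-byte sectors, and dd is run once with a large block size and once
more for whatever is left over. The struct and function names are invented
for the example.

```c
/* Sketch only: how many dd blocks fit in front of the bad-sector area. */

struct dd_plan {
	long count;	/* whole blocks to copy at the large block size */
	long leftover;	/* sectors remaining after those blocks */
};

/*
 * disk_sectors:      size of the c partition in sectors
 * reserved:          sectors at the end holding the bad-sector data
 * sectors_per_block: dd block size in sectors (must be > 0)
 */
struct dd_plan plan_copy(long disk_sectors, long reserved,
    long sectors_per_block)
{
	struct dd_plan p;
	long usable = disk_sectors - reserved;

	p.count = usable / sectors_per_block;
	p.leftover = usable % sectors_per_block;
	return p;
}
```

For example, a 500384-sector disk with 160 sectors reserved, copied one
608-sector cylinder at a time, gives count 822 with 448 sectors left over.
The leftover sectors would need a second dd with a one-sector block size
and matching skip= and seek= offsets.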

Andrew

-- 

-------------------------------------------------------------------------
| From Andrew Findlay at Brunel University, Uxbridge, UB8 3PH, UK       |
| JANET: andrew@uk.ac.brunel.me    ARPA:  andrew%me.brunel.ac.uk@ucl-cs |
| UUCP:  ...ukc!me.brunel!andrew   PHONE: +44 895 74000 x2512           |
-------------------------------------------------------------------------

grandi@noao.UUCP (Steve Grandi) (09/13/86)

In article <385@brueer.ee.brunel.ac.uk> andrew@me.brunel.ac.uk (Andrew Findlay) writes:
>Sort of.. The disk driver (/sys/vaxuba/uda.c in your case) will map any
>bad sectors to the appropriate replacement sectors on both the source and
>destination disks. Thus, the disks appear to be perfect until you reach
>the bad sector table itself. This is on the last track, which is usually
>defect-free anyway. Once you start copying this, the destination disk's
>bad sector info gets overwritten with that from the source disk. Even now,
>things will be OK - UNTIL YOU REBOOT THE SYSTEM. The duff bad-sector info
>will then be picked up and all hell will break loose.
>
For UDA-50 supported disks (i.e. RA81s) this is INCORRECT.  They DO NOT
have bad-block tables that can be accessed by normal disk read/writes.
Their bad block forwarding is done by the hardware using spare sectors
thoughtfully provided in each track.  Other tables used in the revectoring
process itself are also inaccessible by normal reads and writes.  
In short, it is safe to use the entire RA81 disk (all 891072 sectors) without 
worrying about bad block tables.
-- 
Steve Grandi, National Optical Astronomy Observatories, Tucson, AZ, 602-325-9228
{arizona,decvax,hao,ihnp4,seismo}!noao!grandi  grandi%draco@Hamlet.Caltech.Edu

chris@umcp-cs.UUCP (Chris Torek) (09/14/86)

In article <497@carina.noao.UUCP> grandi@noao.UUCP (Steve Grandi) writes:

>For UDA-50 supported disks (i.e. RA81s) ...  it is safe to use the
>entire RA81 disk (all 891072 sectors) without worrying about bad
>block tables.

This is true . . .

>They DO NOT have bad-block tables that can be accessed by normal
>disk read/writes.  Other tables used in the revectoring process
>itself are also inaccessible by normal reads and writes.

but this is not.  The bad block tables and the spare sectors are
easily read or written with normal file system writes IF you make
a file system extending past sector 891071.  This is somewhat hard
to do, as the default driver tables prevent this, but if you alter
these tables, or write special kernel-munging programs, it can be
done.
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 1516)
UUCP:	seismo!umcp-cs!chris
CSNet:	chris@umcp-cs		ARPA:	chris@mimsy.umd.edu

mojo@sun.uucp (Joseph Moran) (09/15/86)

In article <497@carina.noao.UUCP> grandi@noao.UUCP (Steve Grandi) writes:
>In article <385@brueer.ee.brunel.ac.uk> andrew@me.brunel.ac.uk (Andrew Findlay) writes:
>>	[lots of stuff about /sys/vaxuba/uda.c having mapped-to sectors in]
>>	[the `c' partition and thus you have to be careful when copying]
>>	[the entire partition]
>For UDA-50 supported disks (i.e. RA81s) this is INCORRECT.

There are lots more cases where the bad sector information is not visible
to the user when copying `c' partitions.  At Sun, this is true on all of
our disk subsystems.

`SCSI' disks (ST506 and ESDI) let the controller handle forwarding bad
blocks.  The good side of this is that the UNIX driver should always
see a perfect disk.  The down side of this is that a bad sector which
shows up after the original format of the disk can only be "mapped out"
by using a disk utility to reformat the entire disk.  Another
consequence is that the 4.2 fast file system, which tries to lay out
file systems to take advantage of the geometry of the disk, can lose
because a particular block might not physically be in the cylinder
group that the file system expects.  Thus smarter disk
subsystems can sometimes work against you.

`SMD' disks (e.g. A "normal" Fujitsu eagle) *CAN* let the controller
handle the bad block forwarding, but you have more control.  The
controller can avoid a bad block by "slipping" a sector.  When this
happens normally a logical sector is slipped to a spare sector on the
same track.  This way the data is on the same cylinder and track that
the file system expected.  Sun has been shipping SMD disks set up for
slip sectoring for a while now.
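
As a rough illustration of slipping, assume each track has one spare slot at
its end and at most one defective slot. Logical sectors before the defect
stay where they are, and everything from the defect onwards moves up by one
slot. This is a simplified model for illustration, not the on-disk format of
any particular controller, and the function name is hypothetical.

```c
/*
 * Sketch only: map a logical sector number within a track to a physical
 * slot on that track.  bad_slot is the defective slot, or -1 if the
 * track has no defect.  The track is assumed to have one more physical
 * slot than it has logical sectors.
 */
int slip_physical_slot(int logical, int bad_slot)
{
	if (bad_slot < 0 || logical < bad_slot)
		return logical;		/* before the defect: unchanged */
	return logical + 1;		/* at or after it: slipped by one */
}
```

The data never leaves its track, which is why the file system's idea of the
geometry still holds.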

The other way to avoid a bad block is to do traditional bad block
mapping.  This can be done by making the header on the bad sector
invalid.  Using a bad sector map which is read per disk, the UNIX
driver can figure out where to really get the data when it needs to.
Of course a mapped sector isn't great for a smart file system, but it
happens in such a small percentage of the cases, it normally isn't a
big deal.  But on all Sun disks, the `c' partition does not include the
bad sector map or the mapped-to blocks.  Thus on a Sun SMD disk,
copying the entire disk via dd of the `c' partition should never
be a problem.
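
For comparison, here is a sketch of the kind of lookup a driver does with a
per-disk bad sector map. It follows the rule in the bad144(8) excerpt quoted
at the top of this thread (replacements start at the first sector before the
bad sector information and work backwards, in table order), not Sun's
format. It assumes the bad sector information occupies the last track, and
the table here is a hypothetical, simplified list of sector numbers.

```c
#define NBAD 126		/* maximum entries, per bad144(8) */

/* Hypothetical, simplified table: bad sector numbers, ascending. */
struct badtab {
	int  nbad;
	long bad[NBAD];
};

/* Binary search for bn; return its table index, or -1 if not listed. */
int bad_index(const struct badtab *t, long bn)
{
	int lo = 0, hi = t->nbad - 1;

	while (lo <= hi) {
		int mid = lo + (hi - lo) / 2;

		if (t->bad[mid] == bn)
			return mid;
		if (t->bad[mid] < bn)
			lo = mid + 1;
		else
			hi = mid - 1;
	}
	return -1;
}

/* Replacement for table entry `index': count back from the last track. */
long replacement_sector(long disk_sectors, int sectors_per_track, int index)
{
	return disk_sectors - sectors_per_track - 1 - index;
}

/* Sector to actually use for bn: itself, or its replacement if bad. */
long remap_sector(const struct badtab *t, long bn, long disk_sectors,
    int sectors_per_track)
{
	int i = bad_index(t, bn);

	return i < 0 ? bn : replacement_sector(disk_sectors,
	    sectors_per_track, i);
}
```

On a 500384-sector disk with 32 sectors per track, the first table entry
maps to sector 500351 and the 126th to sector 500226.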

Another important thing about Sun disks is that file partitions are
never hard wired in the driver (as they are on the Vax), they are read
in per disk.  The `c' partition is always a bit smaller than the actual
physical disk to allow for bad blocks, spare labels, and mapping info.
Having soft partitions avoids lots of disk partitioning problems that
you hear from users of systems which have the partition information
wired in the driver.  I'm sure there are lots of gurus out there that
know what I'm talking about.

Thus for all Sun disks, the `c' partition as seen by a user of UNIX
does not contain mapping information (other than the geometry of the
physical disk and partitioning information), and copying the default
`c' partition to another disk of the same type never destroys mapping
information.  In general it is a good idea to understand what is
going on before trying to copy the entire disk, or let someone
who does (like a support person from the manufacturer) do it for you.

I hope that this information helps shed some more light on disk
issues.  My use of Sun disks as an example was just because I'm
familiar with them and I can use them as a counter example to
Vax UNIX usage of disks.

	Joseph Moran
	{ihnp4, decvax, seismo, decwrl, ...}!sun!mojo
	mojo@sun.COM (or mojo@sun.ARPA)

jgh@gec-mi-at.co.uk (Jeremy Harris) (09/22/86)

In article <7247@sun.uucp> mojo@sun.UUCP (Joseph Moran) writes:
>`SCSI' disks (ST506 and ESDI) let the controller handle forwarding bad
>blocks.  The good side of this is that the UNIX driver should always
>see a perfect disk.  The down side of this is that a bad sector which
>shows up after the original format of the disk can only be "mapped out"
>by using a disk utility to reformat the entire disk.

	The SCSI spec defines an optional command 'reassign blocks' for
normal and WORM disks.

>`SMD' disks (e.g. A "normal" Fujitsu eagle) *CAN* let the controller
>handle the bad block forwarding, but you have more control.  The
>controller can avoid a bad block by "slipping" a sector.  When this
>happens normally a logical sector is slipped to a spare sector on the
>same track.  This way the data is on the same cylinder and track that
>the file system expected.

	Some SCSI controllers do forwarding this way too. There's nothing
in the SCSI spec which says where the replacement blocks have to be.

Jeremy Harris	jgh@gec-mi-at.co.uk	...!mcvax!ukc!hrc63!miduet!jgh