[comp.unix.internals] setting ipg > MAXIPG on a filesystem?

haberman@s1.msi.umn.edu (Joe Habermann) (12/27/90)

Please excuse me if this question has been asked many times 
already.  It seems common enough.

We have a Sun running SunOS 4.0.3 that is acting as a net news
server.  Since news invariably makes many small files, we wanted
to make our news filesystem with many inodes.  

The obvious thing to try is to run mkfs with a smaller nbpi 
than the default.  We have tried this, but are now running up
against MAXIPG.  That is, no matter how small nbpi is made,
SunOS mkfs can only allocate <= MAXIPG inodes per cylinder group
as defined in /usr/include/ufs/fs.h.  

Is there any reasonable way around this problem?

Thanks,

Joe Habermann
haberman@msi.umn.edu

dpz@action.rutgers.edu (David Paul Zimmerman) (12/27/90)

What we generally do here is to skirt the problem by mkfs'ing a smaller number
of cylinders per cylinder group (-c flag to mkfs on SunOS).  For example, if
the standard is 16 cpg, we use 12 or 8 or even 4.  (My own news spool system
uses 12.)

						David
-- 
David Paul Zimmerman                                     dpz@dimacs.rutgers.edu
Systems Programmer						    rutgers!dpz
Rutgers Univ Center for Discrete Math and Theoretical Computer Science (DIMACS)

chris@mimsy.umd.edu (Chris Torek) (01/07/91)

In article <1990Dec26.182035.6868@s1.msi.umn.edu> haberman@s1.msi.umn.edu
(Joe Habermann) asks:
>We have a Sun running SunOS 4.0.3 that is acting as a net news server. ...
>We ... are now running up against MAXIPG.  That is, no matter how small
>nbpi is made, SunOS mkfs can only allocate <= MAXIPG inodes per cylinder
>group as defined in /usr/include/ufs/fs.h.  

>Is there any reasonable way around this problem?

In article <Dec.26.20.35.45.1990.2094@action.rutgers.edu>
dpz@action.rutgers.edu (David Paul Zimmerman) answers:
>What we generally do here is to skirt the problem by mkfs'ing a smaller number
>of cylinders per cylinder group (-c flag to mkfs on SunOS).  For example, if
>the standard is 16 cpg, we use 12 or 8 or even 4.  (My own news spool system
>uses 12.)

This is usually the solution.  I have, however, a longer article
explaining what is going on (and why you should complain to your
vendor about their not picking up some changes from 4.3BSD-tahoe).

From: chris@mimsy.umd.edu (Chris Torek)
Newsgroups: comp.unix.questions
Subject: Re: How to increase number of inodes in a BSD file system?
Message-ID: <26329@mimsy.umd.edu>
Date: 1 Sep 90 21:40:27 GMT

First, avoid mkfs.  Use newfs.  Even if you still have a separate mkfs
program (4.3-tahoe and 4.3-reno no longer do), newfs is easier.

Next, newfs: there are a number of options that together affect the total
number of inodes.  These are:

	-b, -f, -i, -c, -s, -u, -t, -p, -x

Many of these have effects that are so indirect (and complicated) that
they are best ignored.  Some dictate the physical geometry of the disk
and should therefore not be changed anyway.  The important ones are `-i'
and `-c', and if you have a sufficiently recent BSD (4.3-tahoe or later)
`-i' will (almost) always work and you can ignore the rest anyway.

The -i option takes an argument giving the `number of bytes per inode':
that is, the expected average size of a file.  It is best to underestimate
slightly.  The average file size on many of our file systems is around
7 or 8 kB; on these file systems `-i 6144' or `-i 6656' gives us good
results.  On a few file systems (e.g., /news) the average file size is
much smaller.

Now, what are all the rest of those doing there?  -i `ought' to do the
trick, but. . . .

There are various constraints imposed by the original file system code
that crop up in certain common situations on certain machines from a
certain vendor that, despite significant progress on certain networking
and VM architecture fronts :-) , seems remarkably slow at picking up
such important and useful fixes as the `fat fast file system' code in
4.3BSD-tahoe.  In particular, the 4.2BSD code required that there be no
more than 2048 (MAXIPG) inodes per `cylinder group'.  (A `cylinder
group' is just that: a group (collection) of (contiguous) disk cylinders.)

There are normally 16 cylinders per cylinder group (except in the last
group).  On an RK07, for instance, this gives

	22 sect | 3 trk | 1 kB   | 16 cyl
	--------+-------+--------+------- = 528 kB per cyl group
	 1 trk  | 1 cyl | 2 sect |  1 cg

Now, `-i' defaults to 2048 bytes per inode.  This means that there would
be (528 * 1024 / 2048) = 264 inodes in a default RK07 cylinder group.
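
[Editor's sketch, not part of the original article.]  The arithmetic in
the table above can be reproduced in a few lines.  The function and
constant names below are made up for illustration; the 512-byte sector
size is what the `1 kB / 2 sect' column implies, and 16 cylinders per
group and 2048 bytes per inode are the defaults quoted in the text.

```python
SECTOR_BYTES = 512  # implied by the "1 kB per 2 sectors" column of the table


def cg_kbytes(sect_per_trk, trk_per_cyl, cyl_per_group=16):
    """Size of one cylinder group in kB (hypothetical helper)."""
    return sect_per_trk * trk_per_cyl * cyl_per_group * SECTOR_BYTES // 1024


def inodes_wanted(cg_kb, bytes_per_inode=2048):
    """Inodes needed in one group to get one inode per bytes_per_inode."""
    return cg_kb * 1024 // bytes_per_inode


rk07_kb = cg_kbytes(22, 3)                # RK07: 22 sect/trk, 3 trk/cyl
print(rk07_kb, inodes_wanted(rk07_kb))    # prints: 528 264
```

The same two calls with other geometries give the per-group sizes used
in the rest of the article.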

But consider a slightly more modern disk: a Fujitsu Eagle:

	48 sect | 20 trk | 1 kB   | 16 cyl
	--------+--------+--------+------- = 7680 kB per cyl group
	 1 trk  |  1 cyl | 2 sect |  1 cg

This requires 3840 inodes per cylinder group to allot one inode for every
2048 bytes.  Already we have overrun MAXIPG.  On a still-higher density
drive once popular from the fore(not)mentioned vendor we find:

	67 sect | 20 trk | 1 kB   | 16 cyl
	--------+--------+--------+------- = 10720 kB / cg
	 1 trk  |  1 cyl | 2 sect |  1 cg

To get 2048 bytes per inode here, we would have to have 5360 inodes per
cylinder group.  Since we can only have 2048 inodes per group, newfs
(silently, on said vendor's systems, as they had not yet picked up the
fixed version that was more verbose) raised the -i value from 2048 to
5360 (10720 kB spread over 2048 inodes): that is, it assumed that the
average file was about 5400 bytes long.
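
[Editor's sketch, not part of the original article.]  The silent
adjustment described above amounts to a clamp.  This is only a model of
the arithmetic in the text, not the actual newfs code (which presumably
also rounds to whole inode blocks); the function name is hypothetical
and MAXIPG = 2048 is the value the article gives.

```python
MAXIPG = 2048  # inodes-per-group limit quoted in the article


def effective_bytes_per_inode(cg_kb, requested=2048, maxipg=MAXIPG):
    """Bytes per inode actually obtained for a group of cg_kb kilobytes.

    Hypothetical model: if the requested density needs more than maxipg
    inodes, the group gets exactly maxipg and the density drops.
    """
    cg_bytes = cg_kb * 1024
    if cg_bytes // requested <= maxipg:
        return requested            # the request fits under the limit
    return cg_bytes // maxipg       # clamped: fewer inodes, more bytes each


print(effective_bytes_per_inode(528))     # RK07:            2048
print(effective_bytes_per_inode(7680))    # Eagle:           3840
print(effective_bytes_per_inode(10720))   # 67-sector drive: 5360
```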

Hence the -c option.  If we change the number of cylinders in a cg, we
will change the total space in each cg, while retaining the 2048 inode
limit.  The obvious choice for someone needing more inodes is to use a
smaller value for -c, such as 8:

	48 sect | 20 trk | 1 kB   | 8 cyl
	--------+--------+--------+------ = 3840 kB per cyl group
	 1 trk  |  1 cyl | 2 sect | 1 cg

which needs only 1920 inodes to get 2048 bytes per inode, comfortably
under the MAXIPG limit of 2048.
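
[Editor's sketch, not part of the original article.]  Picking -c by
hand is a small search: take the largest number of cylinders per group
whose inode count still fits under the limit.  The helper below is
hypothetical and looks only at the inode limit; it knows nothing about
the rotational-table constraint, so the value it returns may not be one
newfs will accept.

```python
def largest_cpg(sect_per_trk, trk_per_cyl, bytes_per_inode=2048,
                maxipg=2048, sector_bytes=512, max_cpg=16):
    """Largest cylinders-per-group that keeps inodes per group <= maxipg.

    Hypothetical helper: considers only the inode limit, not the
    rotational cycle.  Returns None if even one cylinder is too large.
    """
    cyl_bytes = sect_per_trk * trk_per_cyl * sector_bytes
    for cpg in range(max_cpg, 0, -1):
        if cyl_bytes * cpg // bytes_per_inode <= maxipg:
            return cpg
    return None


print(largest_cpg(48, 20))   # Eagle:           8  (1920 inodes per group)
print(largest_cpg(67, 20))   # 67-sector drive: 6  (2010 inodes per group)
```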

But wait, there is more.

Each superblock also contains rotational tables, used when allocating
disk blocks so that sequentially reading a file does not require long
waits for the disk to spin around.  These tables eventually repeat, or
`cycle'.  The period for this cycle sets a *lower* bound on the size
of the cylinder group.  This lower bound is aggravated (gets larger)
by values of `sectors per track' that have few factors of two.  On
RK07s the number of sectors per track is 22, or 2*11; on Eagles
the number of sectors per track is 2^4*3; but on these other disks,
the list of prime factors is: 67.  The track size is a prime number.

All this means that in order to reduce the size of a cylinder group,
you may wind up having to reduce the block size (a smaller block size
shortens the cycle period, the opposite of what an awkward sector
count does).
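
[Editor's sketch, not part of the original article.]  The remark about
factors of two is easy to check for the three sectors-per-track values
used above.  This does not compute the actual cycle period (the article
does not give that formula); it only lists the prime factors and counts
the twos.  The function name is made up for illustration.

```python
def prime_factors(n):
    """Prime factors of n in ascending order, by trial division."""
    factors = []
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:
        factors.append(n)   # whatever is left is itself prime
    return factors


for name, spt in [("RK07", 22), ("Eagle", 48), ("67-sector drive", 67)]:
    f = prime_factors(spt)
    print(name, spt, f, "factors of two:", f.count(2))
```

The RK07 has one factor of two, the Eagle four, and the 67-sector
drive none.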

Of course, you can also simply lie to newfs, describe a disk geometry
that is not quite the actual geometry, and get whatever results you like,
at some cost in performance.  Without measuring it I would not care to
guess what that cost might be.
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@cs.umd.edu	Path:	uunet!mimsy!chris