[comp.unix.wizards] mkfs/newfs: not enough inodes

dupuy@douglass.columbia.edu (Alexander Dupuy) (08/02/88)

We're bringing up a Sun-4 fileserver, which will take over much of the news and
mail duties here.  We have two 898M Hitachi drives, with those biiiig cylinders
(1005 sectors ~= 500K each).  I wanted to create a nice little 50M filesystem
with lots of inodes for news.  This doesn't seem to be possible.

The most inodes/cylinder group which mkfs or newfs allows is 2048.  With the
default 16 cylinders/group, this works out to 1 inode / 4K, which is a bit low
for news.  I went and pulled the old Sun-Spots issue off the archive server to
see what the workaround was.  The solution which someone came up with then was
to use 8 cylinders/group - fairly obvious in retrospect.  Unfortunately, under
SunOS 4.0, mkfs/newfs say:

	"cylinder groups must have a multiple of 16 cylinders"

which does not warm my heart.  On a Sun-3 running SunOS 3.5, the minimum seems
to be 4 cylinders/group.  Is there a reason for this?  What would happen if I
compile my old 3.5 mkfs source under 4.0 and build an 8 cyl/grp filesystem
anyhow?  I suspect the kernel will panic and die at some point, and don't care
to find out the hard way.

So, does anyone out there have any useful suggestions?

@alex

-- 
inet: dupuy@columbia.edu
uucp: ...!rutgers!columbia!dupuy

chris@mimsy.UUCP (Chris Torek) (08/03/88)

In article <5794@columbia.edu> dupuy@douglass.columbia.edu
(Alexander Dupuy) writes:
>...  under SunOS 4.0, mkfs/newfs say:
>
>	"cylinder groups must have a multiple of 16 cylinders"

From: chris@mimsy.UUCP (Chris Torek)
Newsgroups: comp.unix.questions
Subject: Re: mkfs problem
Date: 8 Jul 88 23:15:48 GMT

In article <699@natinst.UUCP> brian@natinst.UUCP (Brian H. Powell) writes:
>I'm having trouble getting the bytes/inode parameter to newfs/mkfs to work
>like I want it to.  Normally, it uses 2048 bytes/inode.

Sometimes.  You are getting about 7K/inode, for reasons to be explained
in a moment.

>I want four times that many, so I want 512 bytes/inode.

This is rather excessive.  2K/inode usually provides more than twice as
many inodes as you really need.  On file systems with many tiny files,
you might average as low as 1.5K/inode, or even 1K/inode.  512 bytes
per inode, though, would mean not only that the average file would have
to be <= 512 bytes long, but that the average directory would also have
to be <= 512 bytes long.  Four times your current allocation is just a
bit under 2K/inode.

>natinst# /etc/newfs -n -v -i 512 /dev/rxl0e
>/etc/mkfs /dev/rxl0e 390744 67 27 8192 1024 16 10 60 512 t 0
>/dev/rxl0e:     390744 sectors in 216 cylinders of 27 tracks, 67 sectors
>        200.1Mb in 14 cyl groups (16 c/g, 14.82Mb/g, 2048 i/g)

Look at the numbers in the last line: 16 c/g, 14.82MB/g (Mb is just
wrong; the sizes are bytes, not bits! :-) ), 2048 i/g.  Translation:
16 cylinders per cylinder group, 14.82 megabytes each, with 2048
inodes each.  That is 2048 inodes per 14.82 MB of inode+data space,
or just under 7 KB of data space per inode (the inodes take part of
that 14.82 MB, as each inode consumes 128 bytes).  Given that there
are 16 cylinders per group, and that 16 cylinders is 14.82 MB, to get
512 bytes per inode, you should see:

	512 bytes * #i + 128 bytes * #i = 14.82 MB [1]
	640 bytes * #i = 14.82 MB
	#i = 14.82*1024*1024 / 640

or around 24000 inodes per group!  Where did they all go?

>What's going on?
-----
[1] these calculations are somewhat off; there is also a block map
in each cylinder group.  Still, they are good enough for demonstration
purposes.
-----

What is going on is that there is (was) a hard limit in the way:

% egrep MAXIPG /sys/ufs/fs.h		(on a Sun)
 * MAXIPG bounds the number of inodes per cylinder group, and
 * N.B.: MAXIPG must be a multiple of INOPB(fs).
#define MAXIPG		2048	/* max number inodes/cyl group */
	char	cg_iused[MAXIPG/NBBY];	/* used inode map */
%

Now, you cannot just raise MAXIPG wantonly; indeed, if you do not
have source, you cannot raise it at all.  So what *can* you do?

There is a `-c' parameter to newfs, described as

     -c #cylinders/group
	       The number of cylinders per cylinder group in a
	       file system.  The default value used is 16.

If you lower c/g, you will lower MB/g.  A smaller MB/g will give
a smaller MB/inode ratio if i/g remains fixed.  Hence

	newfs -c 4 /dev/rxl0e

should give you `around' 2K/inode.  Of course, your cylinder groups
will be very small, which is not terribly advantageous.

But there is another problem.  Newfs cannot lower c/g below 16 when s/t
and t/c are 67 and 27 and the blocksize is 8K [2].  So now what?  It
might work to claim that the device has only 66 sectors per track,
which would let you use a c/g of 8; this loses 27 sectors, or 13.5KB,
per cylinder, and goofs up the allocation policies, unfortunately.  Or
you could use a blocksize of 4K, but that prevents paging on a Sun 3.

-----
[2] The problem has to do with the fact that 67*27 = 1809, which is
odd, or more precisely, has no 2s in its prime factorisation.  The
magic calculations, from the 4.3BSD-tahoe newfs, are:

	sblock.fs_spc = secpercyl;
	for (sblock.fs_cpc = NSPB(&sblock), i = sblock.fs_spc;
	     sblock.fs_cpc > 1 && (i & 1) == 0;
	     sblock.fs_cpc >>= 1, i >>= 1)
		/* void */;
	mincpc = sblock.fs_cpc;
	bpcg = sblock.fs_spc * sectorsize;
	inospercg = roundup(bpcg / sizeof(struct dinode), INOPB(&sblock));
	if (inospercg > MAXIPG(&sblock))
		inospercg = MAXIPG(&sblock);
	used = (sblock.fs_iblkno + inospercg / INOPF(&sblock)) * NSPF(&sblock);
	mincpgcnt = howmany(sblock.fs_cgoffset * (~sblock.fs_cgmask) + used,
	    sblock.fs_spc);
	mincpg = roundup(mincpgcnt, mincpc);

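secpercyl is 67*27 = 1809.  fs_cpc (cylinders per rotational position
cycle) starts at NSPB, (8192 bytes/block) / (512 bytes/sector) = 16
sectors/block, and the loop halves it once for each factor of 2 in
secpercyl.  Since 1809 is odd, the loop never runs, giving a mincpc of
16, which carries on down into mincpg via the roundup.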
-----

In short, there are really no good solutions.  4.3BSD-tahoe has
eliminated the 2048 MAXIPG limit; MAXIPG is now computed as one third
of the space in a cylinder group (via the MAXIPG(&sblock) macro
above).  We can hope that Sun will pick up Kirk's new code quickly.
Until then, well, you may just have to create bigger files. . . .
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@mimsy.umd.edu	Path:	uunet!mimsy!chris