[comp.os.minix] Disk performance under Minix

HELMER%SDNET.BITNET@vm1.nodak.edu (Guy Helmer) (08/08/89)

Bruce Evans' work on improving floppy disk performance got my curiosity
up, so I did a little unscientific test of i/o to my hard disk under
Minix and PC-DOS.  I wrote a little program that wrote 256 1024-byte
blocks to a file using fopen/fwrite.  Under PC-DOS, this program
took 5 seconds to write the file.  Under Minix, this program took
18 seconds to run.  This is on a 20MHz 80386 machine with standard
AT-style controller and a 40Mb ST-251 disk with 1:2 interleave.
The 1:2 interleave is optimum for DOS; I suspect that it is too
tight for Minix.  Has anyone tried fiddling with the interleave (or
the AT disk driver) to improve disk i/o?  Do Bruce's protected mode
fixes (with the associated improved interrupt code) improve the disk i/o
situation by reducing the time the kernel spends working (thereby improving
the optimum interleave factor)?
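
For concreteness, the test amounts to something like this (a sketch of the
idea, not necessarily the exact program I ran):

/* sketch: write 256 blocks of 1024 bytes with fopen/fwrite */
#include <stdio.h>

char buf[1024];

main()
{
        FILE *fp;
        int i;

        fp = fopen("testfile", "w");
        if (fp == NULL) {
                perror("testfile");
                exit(1);
        }
        for (i = 0; i < 256; i++)
                fwrite(buf, 1, 1024, fp);       /* 256 x 1K blocks */
        fclose(fp);
        exit(0);
}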

-- Guy
   BITNET: HELMER@SDNET

nfs@notecnirp.Princeton.EDU (Norbert Schlenker) (08/14/89)

In article <21290@louie.udel.EDU> HELMER%SDNET.BITNET@vm1.nodak.edu (Guy Helmer) writes:
>Bruce Evans' work on improving floppy disk performance got my curiosity
>up, so I did a little unscientific test of i/o to my hard disk under
>Minix and PC-DOS.  I wrote a little program that wrote 256 1024-byte
>blocks to a file using fopen/fwrite.  Under PC-DOS, this program
>took 5 seconds to write the file.  Under Minix, this program took
>18 seconds to run.  This is on a 20Mhz 80386 machine with standard
>AT-style controller and a 40Mb ST-251 disk with 1:2 interleave.
>The 1:2 interleave is optimum for DOS; I suspect that it is too
>tight for Minix.  Has anyone tried fiddling with the interleave (or
>the AT disk driver) to improve disk i/o?  Does Bruce's protected mode
>fixes (with associated improved interrupt code) improve the disk i/o
>situation by reducing the time the kernel spends working (thereby improving
>the optimum interleave factor)?
>
>-- Guy
>   BITNET: HELMER@SDNET

This seemed a bit curious to me, so I tried it on my system too
(16MHz 80386 with a Conner drive and RLL controller that appears to
cache tracks).  A DOS program finishes in about 6 seconds; a Minix
program takes 20 seconds (elapsed) / 5.8 (user) / 0.6 (system).  If
I use my own stdio package (ANSI standard, as Posix standard as I
can make it, with debugging if you want it and none if you don't,
and numerous other improvements --- but not quite ready to post)
with no debugging, I can reduce the Minix time to 17/4.0/0.6.

The numbers certainly indicate that the single threaded file system
has got to be replaced!  But doing that won't affect these numbers,
as the test was run as (essentially) the only process in the system.
Bruce Evans' protected mode fixes don't affect the times substantially
(the numbers above are in protected mode; vanilla 1.3 is similar).
Interleave isn't a factor on my system.  The copy through the buffer
cache struck me as another possible culprit, but shouldn't the system
times be higher if so (not that I have confidence in time accounting)?

So where is the slop?  I've looked at at_wini() - it's not a terribly
complex piece of code.  What could it be doing that would make fixed
disk performance so lousy?  Ideas, anyone?

Norbert

HELMER%SDNET.BITNET@vm1.nodak.edu (Guy Helmer) (08/15/89)

In a recent article, nfs@notecnirp.princeton.edu (Norbert Schlenker)
writes:
>In article <21290@louie.udel.EDU> HELMER%SDNET.BITNET@vm1.nodak.edu (Guy
>Helmer)
> writes:
>>Bruce Evans' work on improving floppy disk performance got my curiosity
>>up, so I did a little unscientific test of i/o to my hard disk under
>>Minix and PC-DOS.  I wrote a little program that wrote 256 1024-byte
>>blocks to a file using fopen/fwrite.  Under PC-DOS, this program
>>took 5 seconds to write the file.  Under Minix, this program took
>>18 seconds to run.  This is on a 20Mhz 80386 machine with standard
>>AT-style controller and a 40Mb ST-251 disk with 1:2 interleave.
>> ... machine description deleted ...
>>
>>-- Guy
>>   BITNET: HELMER@SDNET
>
>This seemed a bit curious to me, so I tried it on my system too
>(16MHz 80386 with a Conner drive and RLL controller that appears to
>cache tracks).  A DOS program finishes in about 6 seconds; a Minix
>program takes 20 seconds (elapsed) / 5.8 (user) / 0.6 (system).
> ... mention of improved stdio code, etc. ...
>
>Norbert

I have been discussing this with Bruce Evans.  It appears that
the small buffer cache will cause Minix hard disk i/o to slow to a crawl.
From just observing the behavior of the program I described above, it
seems that after the buffer cache gets full the disk begins seeking
intensely, as though it is going back to the superblock / inode table /
free block table between every few physical writes.

From The Book, lines 8161-8167 (documentation for put_block() in fs/cache.c)
say that "Blocks whose loss can hurt the integrity of the file system (e.g.
inode blocks) are written to the disk immediately if they are dirty."  Hmmm...
A file's inode would get modified quite often during the kind of i/o
that we are doing here, and put_block() might be called on the cached inode
block every time another block was allocated for a file.  I'll do a little
research and see if I'm on the right track.
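
For reference, the tail of put_block() does something along these lines
(paraphrased from memory, not the exact source):

        /* paraphrase of the end of put_block() in fs/cache.c: blocks marked
         * WRITE_IMMED (inode blocks, map blocks, etc.) go straight to disk
         * whenever they are released dirty.
         */
        if ((block_type & WRITE_IMMED) && bp->b_dirt == DIRTY)
                rw_block(bp, WRITING);

If that is what happens, every block allocated to the growing file costs an
extra write (and seek) back to the inode area.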

-- Guy Helmer
   BITNET: HELMER@SDNET

kirkenda@psueea.uucp (Steve Kirkendall) (08/15/89)

In article <18613@princeton.Princeton.EDU> nfs@notecnirp.UUCP (Norbert Schlenker) writes:
>In article <21290@louie.udel.EDU> HELMER%SDNET.BITNET@vm1.nodak.edu (Guy Helmer) writes:
>>The 1:2 interleave is optimum for DOS; I suspect that it is too
>>tight for Minix.  Has anyone tried fiddling with the interleave (or
>>the AT disk driver) to improve disk i/o?  Does Bruce's protected mode
>>fixes (with associated improved interrupt code) improve the disk i/o
>>situation by reducing the time the kernel spends working (thereby improving
>>the optimum interleave factor)?
>
>Interleave isn't a factor on my system.  The copy through the buffer
>cache struck me as another possible culprit, but shouldn't the system
>times be higher if so (not that I have confidence in time accounting)?
>
>So where is the slop?  I've looked at at_wini() - it's not a terribly
>complex piece of code.  What could it be doing that would make fixed
>disk performance so lousy?  Ideas, anyone?
>
>Norbert

I ran some speed tests on my ST, equipped with a Supra hard disk.  My tests all
concerned reading, not writing.  The results of the test are presented below,
followed by the program I used to perform the tests.

As you read the following test results, keep in mind that my disk hardware
can transfer data at 189 kb/sec, and that other brands of disks can achieve
up to 1000 kb/sec on the ST.

The test performs N calls to read(), giving a buffer size of (2meg / N).
The speed of the read() call is then computed by dividing the 2meg by the
elapsed realtime of the test.

The test is repeated with several buffer sizes.  If the speed remains fairly
constant regardless of the buffer size, then speed is limited primarily by
hardware or the caching strategy (i.e. either the device is slow, or the
driver is slow, or the cache is too complex or has a low hit/miss ratio).  If
the speed increases as the buffer size increases, then the speed is limited
largely by the overhead involved in the system call.

-----------------------------------------------------------------------------
TEST 1: reading from a data file

Blk Size  Test Time    Speed
  512        101         20 kb/sec
 1024         91         23 kb/sec
 2048         67         31 kb/sec
 8192         57         36 kb/sec
16384         56         37 kb/sec

Speed increases as the buffer size increases, so the system call overhead
contributes to the low speed.  However, when the block size was increased by
a factor of 32, the speed increased by a factor of only 1.85 -- so the
system call overhead can accept only a relatively small part of the blame.

The results of this test are directly comparable to results published in
UNIX Review Magazine...
						Compaq 386/20
Blk Size	Sun 3/50	Sun 3/260	  ISC UNIX
  512		  232		   485		    124
 1024		  219		   672		    143
 2048		  232		   642		    142
 8192		  232		   620		    146
-----------------------------------------------------------------------------
TEST 2: reading from /dev/null

This test eliminates the hard disk hardware & driver from the test.  The cache
is also of no consequence, since /dev/null is a character device.  Since it
is a device rather than a regular file, FS doesn't have much to do. Also,
every read() reads 0 bytes, so no memory-to-memory copies are needed.

We are left with the system call overhead and the ramdisk driver.

Blk Size  Test Time    Speed
  512         32         64 kb/sec
 1024         16        128 kb/sec
 2048          8        256 kb/sec
 8192          2       1024 kb/sec
16384          1       2048 kb/sec

This tells us that the kernel can handle about 128 read() calls per second.
Think of these speeds as a theoretical maximum.
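(At 512 bytes per read, for example, that is 2 megabytes / 512 bytes = 4096
calls in 32 seconds, or roughly 128 calls per second.)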
-----------------------------------------------------------------------------
TEST 3: /dev/bnull

/dev/bnull is simply a block-device version of /dev/null.  By comparing this
test to the /dev/null test, we can get an idea of the overhead required to
maintain the cache.  (Keep in mind, though, that no data blocks are actually
being cached here, since /dev/bnull is 0 blocks long.)

Blk Size  Test Time    Speed
  512         56         37 kb/sec
 1024         29         71 kb/sec
 2048         14        146 kb/sec
 8192          3        683 kb/sec
16384          2       1024 kb/sec

So, with caching, the speed is about 60% of what we got without caching.
-----------------------------------------------------------------------------
TEST 4: /dev/rhd2

This test is similar to the test on /dev/null, except that real hardware is
involved, and data bytes are actually being moved around.

Blk Size  Test Time    Speed
  512         99         21 kb/sec
 1024         53         39 kb/sec
 2048         32         64 kb/sec
 8192         14        146 kb/sec
16384         11        186 kb/sec

This test is really amazing, since my hard disk is only capable of 189 kb/sec.
With a large buffer, hardware is the limiting factor.  With small buffers,
overhead in the kernel, FS, or device driver becomes a severe problem.
-----------------------------------------------------------------------------
TEST 5: /dev/hd2

This test is similar to the test on /dev/rhd2, except that the cache is used
because /dev/hd2 is a block device.

Blk Size  Test Time    Speed
  512         98         21 kb/sec
 1024         87         24 kb/sec
 2048         67         31 kb/sec
 8192         58         35 kb/sec
16384         56         37 kb/sec

Suddenly, the software overhead is killing us.  I suspect that FS divides
each request into 1K chunks, and then reads each chunk separately.  So, we
wind up with a speed that is slightly lower than the 1K uncached speed,
no matter how large our cached read is.

It is interesting to compare the results of this test with the results of
the test on /dev/bnull, in which no data was actually cached or copied.

Also, note that these speeds are almost identical to the speeds for a regular
file in a filesystem on the harddisk.  So, there seems to be little overhead
involved in translating a file offset into a block number within a filesystem.
-----------------------------------------------------------------------------
SUMMARY

Basically, the slow speed seems to be a product of the way FS handles blocks.
We could speed up I/O tremendously if we could modify FS so that it lets the
driver read more than one block at a time.

Or, if that is too ambitious, then we could modify the device driver so that
it performs read-ahead.  A simple way to do this would be to always read 4k
when FS requests 1K; the extra 3k would be used to satisfy later requests,
if appropriate.  This would probably double the speed of reading.
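
In outline, that driver-level read-ahead might look like this (a sketch with
invented names -- do_physical_read() and friends are not real driver
routines):

/* when FS asks for one 1K block, read a 4K chunk and satisfy later
 * sequential requests from the leftover blocks
 */
#define RA_BLOCKS       4

static char ra_buf[RA_BLOCKS * 1024];   /* read-ahead buffer */
static long ra_first = -1;              /* first block number held in ra_buf */

int read_block(blk, dest)
long blk;               /* block number requested by FS */
char *dest;             /* where the 1K block should go */
{
        if (ra_first < 0 || blk < ra_first || blk >= ra_first + RA_BLOCKS) {
                /* miss: do one physical read of RA_BLOCKS consecutive blocks */
                if (do_physical_read(blk, ra_buf, RA_BLOCKS) != 0)
                        return(-1);
                ra_first = blk;
        }
        memcpy(dest, &ra_buf[(int)(blk - ra_first) * 1024], 1024);
        return(0);
}

The interesting design question is when to invalidate ra_buf -- certainly on
any write to a block that it holds.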

One word of caution: sequential reading of a large file is exactly the sort
of test that makes a cache look bad.  This test was biased against caches.
-----------------------------------------------------------------------------
*** NEWSFLASH ***

I just added read-ahead to the device driver, and reran the test on /dev/hd2,
with the following results:

Blk Size  Test Time    Speed
  512         68         30 kb/sec	(was 21 kb/sec)
 1024         55         37 kb/sec	(was 24 kb/sec)
 2048         43         48 kb/sec	(was 31 kb/sec)
 8192         43         48 kb/sec	(was 35 kb/sec)
16384         43         48 kb/sec	(was 37 kb/sec)

So we get a 30%-50% improvement with read-ahead in the driver.  That's nice,
but I expected more.  The modified driver always does physical reads of 4k
or more, so I expected a speed just slightly less than what a 4096 byte block
would get you on the raw disk -- about 90 kb/sec.

We could probably do better with read-ahead implemented in the cache, since
that way we could reduce the number of messages passed to/from the device
driver, and also eliminate the chore of copying from the driver's buffer
to FS's buffer.

     +------------------------------------------------------------------+
     |                                                                  |
     | Hey, by golly, I sure am learning a lot about operating systems! |
     |                                                                  |
     +------------------------------------------------------------------+


Here is the program I used to perform the tests.  When run with no arguments,
it creates a 2meg file to use for the testing.  If you give an argument, then
it reads from the named file without writing to it.
----- cut here --------- cut here ---------- cut here ---------- cut here -----
/* seqread.c */

/* This program tests the speed at which sequential files are read.
 * There must be enough disk space for a 2 megabyte temp file.
 */

#include <stdio.h>
#include <fcntl.h>

#define TESTFILE	"twomegs"
#define FILESIZE	2097152L
char *testfile = TESTFILE;
char buf[16384];

main(argc, argv)
	int	argc;
	char	**argv;
{
	if (argc > 1)
	{
		testfile = argv[1];
	}
	else
	{
		/* create the test file */
		writefile();
	}

	/* test for various block sizes */
	printf("Blk Size  Test Time    Speed\n");
	readfile(512);
	readfile(1024);
	readfile(2048);
	readfile(8192);
	readfile(16384);

	if (argc == 1)
	{
		/* delete the test file we created */
		unlink(TESTFILE);
	}
}


writefile()
{
	long	offset;
	int	fd;

	/* create the file */
	fd = creat(TESTFILE, 0666);
	if (fd < 0)
	{
		perror(TESTFILE);
		exit(2);
	}

	/* put two megabytes of data in it */
	for (offset = 0L; offset < FILESIZE; offset += 16384)
	{
		if (write(fd, buf, 16384) < 16384)
		{
			perror("while writing");
			unlink(TESTFILE);
			exit(3);
		}
	}

	/* close the file */
	close(fd);
}


readfile(size)
	int	size;	/* size of buffer to use */
{
	long	before;	/* time at start of test */
	long	after;	/* time at end of test */
	int	blks;	/* number of buffers-full of data to read */
	int	fd;	/* used while reading the file */

	/* open the test file */
	fd = open(testfile, O_RDONLY);
	if (fd < 0)
	{
		perror("while reopening");
		exit(4);
	}

	/* read the file */
	for (blks = FILESIZE / size, time(&before); blks > 0; blks--)
	{
		read(fd, buf, size);
	}
	time(&after);

	/* close the file */
	close(fd);

	/* present statistics */
	printf("%5d    %7ld    %7ld kb/sec\n",
		size,
		after - before,
		(512 + FILESIZE / (after - before)) / 1024);
}
----- cut here --------- cut here ---------- cut here ---------- cut here -----
	-- Steve Kirkendall
	      ...uunet!tektronix!psueea!jove!kirkenda
	or    kirkenda@cs.pdx.edu

evans@ditsyda.oz (Bruce Evans) (08/16/89)

In article <21693@louie.udel.EDU> HELMER%SDNET.BITNET@vm1.nodak.edu (Guy Helmer) writes:
>say that "Blocks whose loss can hurt the integrity of the file system (e.g.
>inode blocks) are written to the disk immediately if they are dirty."  Hmmm...
>A file's inode would get modified quite often during the kind of i/o
>that we are doing here, and put_block() might be called on the cached inode
>block every time another block was allocated for a file.  I'll do a little

One of the ingredients in my speedups was to remove this. (Remove the
WRITE_IMMED's from all but the super block and map blocks in fs/buf.h.)
It had an immediate huge effect on the time for "rm *" in a big directory.
Extending files is another bad case for the high-integrity method. It
doubles the i/o, and extra seeks may cost much more.

I also changed the cache flushing method so all dirty blocks on a device
are flushed whenever one needs to be flushed. This helps keep the integrity.
It may actually be safer - less wear.
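In outline the flush-all change is just a sweep over the cache (not the exact
code; field names follow fs/buf.h only loosely):

        /* whenever one dirty block must go out, write every dirty block
         * that belongs to the same device while we are at it
         */
        for (bp = &buf[0]; bp < &buf[NR_BUFS]; bp++)
                if (bp->b_dev == dev && bp->b_dirt == DIRTY)
                        rw_block(bp, WRITING);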
-- 
Bruce Evans		evans@ditsyda.oz.au

HELMER%SDNET.BITNET@vm1.nodak.edu (Guy Helmer) (08/16/89)

In a recent article, evans@ditsyda.oz (Bruce Evans) writes:
>In article <21693@louie.udel.EDU> HELMER%SDNET.BITNET@vm1.nodak.edu (Guy
>Helmer)
> writes:
>>say that "Blocks whose loss can hurt the integrity of the file system (e.g.
>>inode blocks) are written to the disk immediately if they are dirty."  Hmmm...
>>A file's inode would get modified quite often during the kind of i/o
>>that we are doing here, and put_block() might be called on the cached inode
>>block every time another block was allocated for a file.  I'll do a little
>
>One of the ingredients in my speedups was to remove this. (Remove the
>WRITE_IMMED's from all but the super block and map blocks in fs/buf.h.)
>It had an immediate huge effect on the time for "rm *" in a bug directory.
>Extending files is another bad case for the high-integrity method. It
>doubles the i/o, and extra seeks may cost much more.

I hesitated to suggest this speedup since I believe it diverges
from the AT&T un*x design, according to Bach in _The Design of the Un*x
Operating System_, chapter 4 - algorithm iput (page 66).  I do believe
the speed increase in write operations justifies the change, though.

>I also changed the cache flushing method so all dirty blocks on a device
>are flushed whenever one needs to be flushed. This helps keep the integrity.
>It may actually be safer - less wear.
>--
>Bruce Evans		evans@ditsyda.oz.au

Thanks for the info.

-- Guy Helmer
   BITNET: HELMER@SDNET

jnall%FSU.BITNET@cornellc.cit.cornell.edu (John Nall 904-644-5241) (08/17/89)

In article <21693@louie.udel.edu> HELMER%SDNET.BITNET@vm1.nodak.edu
(Guy Helmer) writes:
> A file's inode would get modified quite often....and put_block()
> might be called . . .

ast points out in the Book (pages 272-3 of my edition) that some
performance improvement might be made by scattering the inodes
around the disk.  Since both the super-block and the inodes are
at the very start of the disk, the problem of writing every time
would seem to be magnified by having to seek so far.  (Just thinking
out loud...if we write critical blocks often, but only for safety,
so we don't need them if we don't crash, could they be written
somewhere else........)

John Nall

evans@ditsyda.oz (Bruce Evans) (08/18/89)

In article <1599@psueea.UUCP> kirkenda@jove.cs.pdx.edu (Steve Kirkendall) writes:
>...
>I ran some speed tests on my ST, equipped witha Supra hard disk.  My tests all
>concerned reading, not writing.  The results of the test are presented below,
>...
>The test performs N calls to read(), giving a buffer size of (2meg / N).
>...
>[Good explanation of how to interpret the tests.]
>...
>I just added read-ahead to the device driver, and reran the test on /dev/hd2,
>with the following results:
>
>Blk Size  Test Time    Speed
>  512         68         30 kb/sec	(was 21 kb/sec)
> 1024         55         37 kb/sec	(was 24 kb/sec)
> 2048         43         48 kb/sec	(was 31 kb/sec)
> 8192         43         48 kb/sec	(was 35 kb/sec)
>16384         43         48 kb/sec	(was 37 kb/sec)
>
>So we get a 30%-50% improvement with read-ahead in the driver.  That's nice,
>but I expected more.  The modified driver always does physical reads of 4k
>or more, so I expected a speed just slightly less than what a 4096 byte block
>would get you on the raw disk -- about 90 kb/sec.

Fragmentation is the main obstacle. I tried full-track read-ahead. The hit
rate was dismal.

>We could probably do better with read-ahead implemented in the cache, since
>that way we could reduce the number of messages passed to/from the device
>driver, and also eliminate the chore of copying from the driver's buffer
>to FS's buffer.

My method is to always read ahead up to 15 blocks (the size of a track on a
1.2M floppy).  FS guesses the track boundaries and rounds up read requests
to match.  It also communicates with the block device drivers in chunks of
up to NR_BUFS blocks at a time.  FS sorts the block numbers in increasing order.  All the
drivers except FLOPPY just loop over the blocks. But now the loop is local
to the driver and the precious time between blocks is not wasted talking to
FS and copying data between FS and users.
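
The driver side of that reduces to a local loop over the sorted list, roughly
like this (names invented for illustration, not the actual code):

        /* FS passes a sorted array of block numbers in one request; the
         * driver loops over them itself instead of exchanging one message
         * with FS per block
         */
        for (i = 0; i < count; i++)
                if (read_one_block(blocknums[i],
                                   bufp + (long) i * BLOCK_SIZE) != 0)
                        break;          /* report i blocks transferred */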

For a 20MHz 386, a drive limited to 255 kb/sec, and a 320K cache:

Blk Size  Test Time    Speed
  512         15        137 kb/sec  (was 48 kb/sec)
 1024         14        146 kb/sec  (was 49 kb/sec)
 2048         13        158 kb/sec  (was 49 kb/sec)
 8192         11        186 kb/sec  (was 48 kb/sec)
16384         12        171 kb/sec  (was 48 kb/sec)

The extra CPU power makes the old FS (286 protected mode) give the same
times for all sizes. The large cache is not particularly important for a
sequential read like this.

It will be difficult to achieve such rates on slower machines. I found that
an AT kept up with 3:1 interleave only in its Turbo mode. 8088's will have
difficulty with even 20K/sec from a floppy. I only see how to do it using
extra blocking and unblocking code.
-- 
Bruce Evans		evans@ditsyda.oz.au

hinton@netcom.UUCP (Greg Hinton) (08/18/89)

In article <18613@princeton.Princeton.EDU> nfs@notecnirp.UUCP (Norbert Schlenker) writes:
>In article <21290@louie.udel.EDU> HELMER%SDNET.BITNET@vm1.nodak.edu (Guy Helmer) writes:
>>I wrote a little program that wrote 256 1024-byte
>>blocks to a file using fopen/fwrite.  Under PC-DOS, this program
>>took 5 seconds to write the file.  Under Minix, this program took
>>18 seconds to run.
   . . . .
>The copy through the buffer
>cache struck me as another possible culprit

Considering that DOS uses a write-through cache -- i.e. writes are NEVER
delayed -- I don't see how MINIX's delayed-write cache could possibly
contribute to a slowdown in output.  All other things being equal, doesn't
a delayed-write cache guarantee higher throughput?

In article <2131@ditsyda.oz> evans@ditsyda.oz (Bruce Evans) writes:
>(Remove the
>WRITE_IMMED's from all but the super block and map blocks in fs/buf.h.)
>It had an immediate huge effect on the time for "rm *" in a bug directory.
   . . . .
>I also changed the cache flushing method so all dirty blocks on a device
>are flushed whenever one needs to be flushed.

Bruce, do you have actual timings to show how much performance is increased?
Does it approach DOS' performance?

One other major difference between the two operating systems is that I/O
in DOS is generally not interrupt driven.  I haven't looked at the actual
disk I/O code in the ROM BIOS, but I wouldn't be surprised if it sits in a
tight little loop, polling the disk controller, until I/O completes.  This
is much faster than responding to interrupts.  But, of course, preemptive
multitasking operating systems don't have this luxury.

-- 
Greg Hinton
INET: hinton@netcom.uucp
UUCP: ...!uunet!apple!netcom!hinton

HELMER%SDNET.BITNET@vm1.nodak.edu (Guy Helmer) (08/18/89)

In a recent article, hinton@netcom.UUCP (Greg Hinton) writes:
>In article <18613@princeton.Princeton.EDU> nfs@notecnirp.UUCP (Norbert
> Schlenker) writes:
>>In article <21290@louie.udel.EDU> HELMER%SDNET.BITNET@vm1.nodak.edu (Guy
> Helmer) writes:
>>> ... ancient remarks deleted ...
>   . . . .
>>The copy through the buffer
>>cache struck me as another possible culprit
>
>Considering that DOS uses a write-through cache -- i.e. writes are NEVER
>delayed -- I don't see how MINIX's delayed-write cache could possibly
>contribute to a slowdown in output.  All other things being equal, doesn't
>a delayed-write cache guarantee higher throughput?

The delayed-write cache normally results in higher output rates, but when
it is combined with standard Minix's attempts at maintaining file system
reliability (by writing Very Important Blocks to disk immediately after
modification), things get very slow.

>In article <2131@ditsyda.oz> evans@ditsyda.oz (Bruce Evans) writes:
>>(Remove the
>>WRITE_IMMED's from all but the super block and map blocks in fs/buf.h.)
>>It had an immediate huge effect on the time for "rm *" in a bug directory.
>   . . . .
>>I also changed the cache flushing method so all dirty blocks on a device
>>are flushed whenever one needs to be flushed.
>
>Bruce, do you have actual timings to show how much performance is increased?
>Does it approach DOS' performance?

I removed the WRITE_IMMED's from buf.h and then re-tried my tests.  For
the 256 1k block fopen/fwrite test, timings improved from
~18 sec real time (so awful that I didn't even bother to get an accurate
average) to 9.60 sec real time (mean of 5 tries) on my
20MHz 80386 machine with a 28ms hard disk.  Here's the complete info:

               fwrite                 write
Test      Real  User  Sys        Real  User  Sys
  1        9.0   5.3  0.5         5.0   0.0  0.4
  2       10.0   5.4  0.4         5.0   0.0  0.5
  3       10.0   5.2  0.4         5.0   0.0  0.3
  4       10.0   5.4  0.4         5.0   0.0  0.5
  5        9.0   5.3  0.4         5.0   0.0  0.4

Note that with this mildly modified FS, Minix is very close to being as
fast as good old DOS.  As with any other O/S, a little testing would reveal
areas of the FS code where more performance could be gained.

>Greg Hinton
>INET: hinton@netcom.uucp
>UUCP: ...!uunet!apple!netcom!hinton

-- Guy Helmer
   BITNET: HELMER@SDNET

chasm@attctc.Dallas.TX.US (Charles Marslett) (08/19/89)

In article <2150@netcom.UUCP>, hinton@netcom.UUCP (Greg Hinton) writes:
> Considering that DOS uses a write-through cache -- i.e. writes are NEVER
> delayed -- I don't see how MINIX's delayed-write cache could possibly
> contribute to a slowdown in output.  All other things being equal, doesn't
> a delayed-write cache guarantee higher throughput?

Note that DOS uses a write-through cache for the data -- not for the FAT
(which is its equivalent of inodes).  As a result, the FAT is normally
accessed only when the internal cache buffer is otherwise needed (for a
write), when the current block of the FAT contains no more unused clusters
(for a read), when the "other" floppy is accessed (in a single-floppy
system), or when the file is closed (the most common case).  So, if BUFFERS
is not set too low, output of a sequential file to the disk will be very
nearly as fast as the hardware allows.

This is equivalent to caching all inode writes in a delayed-write buffer
and writing all the data blocks directly through the cache -- just about
the inverse of the Minix philosophy.  It is rather dangerous except for the
"intelligence" built into it: a file close operation "fixes" the disk, so
the only data at risk should the system crash is in the files currently
open for output, and in most cases the effect of a crash is simply that
no clusters are allocated and the data written to the disk is lost.  Some
additional code in the MSDOS fs tries not to reuse clusters that were
recently freed, so even some of the potentially dangerous mixes of operations
can still be defused.  [At the expense of a bit of allocation complexity ;^]

> In article <2131@ditsyda.oz> evans@ditsyda.oz (Bruce Evans) writes:
> >(Remove the
> >WRITE_IMMED's from all but the super block and map blocks in fs/buf.h.)
> >It had an immediate huge effect on the time for "rm *" in a bug directory.
>    . . . .
> >I also changed the cache flushing method so all dirty blocks on a device
> >are flushed whenever one needs to be flushed.
> 
> Bruce, do you have actual timings to show how much performance is increased?
> Does it approach DOS' performance?

It actually should, except for the possible increased fragmentation of a
freelist-based file system as opposed to a map-based one.

> -- 
> Greg Hinton
> INET: hinton@netcom.uucp
> UUCP: ...!uunet!apple!netcom!hinton


Charles Marslett
STB Systems, Inc.   <-- apply all standard disclaimers
chasm@attctc.dallas.tx.us

evans@ditsyda.oz (Bruce Evans) (08/25/89)

In article <21978@louie.udel.EDU> HELMER%SDNET.BITNET@vm1.nodak.edu (Guy Helmer) writes:
>I removed the WRITE_IMMED's from buf.h and then re-tried my tests.  For
>the 256 1k block fopen/fwrite test, timings improved from
>...
>               fwrite                 write
>Test      Real  User  Sys        Real  User  Sys
>...
>  2       10.0   5.4  0.4         5.0   0.0  0.5	[slowest of 5]
>...

Some more reference points for fwriting 256K on Minix
(open, fwrite, close; sync; for 386, do 10 times and divide later by 10):

cpu		disk		O/S	stdio	cache	real	user	sys

8088/4.77	3:1/80ms	1.3	1.3	 30K	117.0	66.7	 7.2
8088/4.77	3:1/80ms	1.4b-	1.3	 30K#	 81.0	58.5	 9.5
8088/4.77	3:1/80ms	1.3	mine	 30K	 56.0	 0.5	 4.9
8088/4.77	3:1/80ms	1.3+	mine	 30K	 43.0	 0.3	 5.1
8088/4.77	3:1/80ms	1.4b-	mine	 30K#	 22.0	 0.2	10.8
386/20/16 bits	2:1/28ms	1.4b-	mine	 30K#	  6.6	  .02	  .79
386/20/16 bits	2:1/28ms	1.4b-	mine	 50K#	  6.2	  .02	  .77
386/20/32 bits	2:1/28ms	1.4b-	mine	 30K#	  3.2	  .02	  .59
386/20/32 bits	2:1/28ms	1.4b-	mine	 50K#	  2.7	  .02	  .59
386/20/32 bits	2:1/28ms	1.4b-	mine	320K#	  2.3	  .02	  .63

1.3+ is my old version of 1.3 (best parts are in the 286 posting).
'#' means my modified cache and drivers. The driver got NR_BUFS - 5 blocks
in most requests in these tests.

The 16-bit 386 times are consistent with Guy's times for write. A little
slower. The grouped writes are *not* helping. Yet they do help with 32-bits
and the same cache size, and read times are identical in 16 and 32-bit modes!
The reason is that there is little time to spare in keeping up with the
interleave (50 microsec for write and 150 microsec for read), and 16-bit
mode is a little slower (mainly from 32-bit divisions in 16-bit software).
This highlights the problem of keeping up on slower machines. 3:1 interleave
should be OK for AT's with only a little work on the driver.

I'll be happy with times of 4 sec for the 8088 and 1.5 sec for the 386.
-- 
Bruce Evans		evans@ditsyda.oz.au

ast@cs.vu.nl (Andy Tanenbaum) (08/26/89)

In article <2141@ditsyda.oz> evans@ditsyda.mq.oz (Bruce Evans) writes:
>Some more reference points for fwriting 256K on Minix
>
>cpu		disk		O/S	stdio	cache	real	user	sys
>8088/4.77	3:1/80ms	1.4b-	1.3	 30K#	 81.0	58.5	 9.5
>8088/4.77	3:1/80ms	1.4b-	mine	 30K#	 22.0	 0.2	10.8

These two measurements seem to suggest that a factor of four can be won
by improving stdio.  That package was written by one of my students who
was enthusiastic but not very experienced.  What did you do to stdio to get
this big gain?

I would MUCH prefer a solution that gives good but not optimal performance
by fixing stdio to one that adds a lot of complexity to the kernel
(multiple read-aheads, etc.).

Andy Tanenbaum (ast@cs.vu.nl)

henry@utzoo.uucp (Henry Spencer) (08/28/89)

In article <3076@ast.cs.vu.nl> ast@cs.vu.nl (Andy Tanenbaum) writes:
>These two measurements seem to suggest that a factor of four can be won
>by improving stdio.  That package was written by one of my students who
>was enthusiastic but not very experienced.  What did you do to stdio to get
>this big gain?

You might want to look at the souped-up stdio stuff we ship with C News.
It's the fastest implementation of fread, fwrite, fgets, and fputs we
know of.  It is somewhat AT&T-Unix-stdio-specific in its details, but the
methods should apply widely.
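
Roughly, the trick that pays off is copying whole runs into the buffer with
memcpy and flushing full buffers with a single write(), instead of pushing
bytes through putc() one at a time.  A sketch of the general technique (not
the C News code, and the names here are made up):

/* minimal buffered-output sketch; owrite()/oflush() are illustrative only */
#include <string.h>

#define BUFSZ   1024

static char obuf[BUFSZ];
static int  ocount;             /* bytes currently buffered */
static int  ofd = 1;            /* descriptor to flush to */

static void oflush()
{
        if (ocount > 0) {
                write(ofd, obuf, ocount);
                ocount = 0;
        }
}

void owrite(p, n)
char *p;
int n;
{
        while (n > 0) {
                int room = BUFSZ - ocount;
                int chunk = (n < room) ? n : room;

                memcpy(obuf + ocount, p, chunk);
                ocount += chunk;
                p += chunk;
                n -= chunk;
                if (ocount == BUFSZ)
                        oflush();       /* one write() per full buffer */
        }
}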
-- 
V7 /bin/mail source: 554 lines.|     Henry Spencer at U of Toronto Zoology
1989 X.400 specs: 2200+ pages. | uunet!attcan!utzoo!henry henry@zoo.toronto.edu

ast@cs.vu.nl (Andy Tanenbaum) (08/29/89)

In article <1989Aug28.153612.8689@utzoo.uucp> henry@utzoo.uucp (Henry Spencer) writes:
>You might want to look at the souped-up stdio stuff we ship with C News.
>It's the fastest implementation of fread, fwrite, fgets, and fputs we
>know of.  It is somewhat AT&T-Unix-stdio-specific in its details, but the
>methods should apply widely.
As you no doubt know, the MINIX stdio internal data structures are not the
same as AT&T's (if they were, we'd have lawsuit problems).  This probably
means that it will be difficult to take over some of your stdio and leave
the rest the old way.  If anybody wants to volunteer for redoing stdio
using Henry's stuff, by all means do so.  I'll happily include it if it
works, is robust, is clean code, and is fast.

I don't have the time to look at it myself now.

Classes start next week and I have to finish off the third edition of the
architecture book.

Andy Tanenbaum (ast@cs.vu.nl)

HELMER%SDNET.BITNET@vm1.nodak.edu (Guy Helmer) (08/29/89)

ast@cs.vu.nl (Andy Tanenbaum) writes:
>In article <1989Aug28.153612.8689@utzoo.uucp> henry@utzoo.uucp (Henry Spencer)
> writes:
>As you no doubt know, the MINIX stdio internal data structures are not the
>same as AT&Ts (If they were, we'd have lawsuit problems).  This probably
>means that if will be difficult to take over some of your stdio and leave
>the rest the old way.  If anybody wants to volunteer for redoing stdio
>using Henry's stuff, by all means do so.  I'll happily include it if it
>it works, is robust, is clean code and is fast.
>
>Andy Tanenbaum (ast@cs.vu.nl)

While I haven't looked at Henry's stdio, I have begun designing a new
stdio for Minix.  It should be real fast, but I don't have anything in
code yet.

-- Guy Helmer
   BITNET: HELMER@SDNET

henry@utzoo.uucp (Henry Spencer) (08/30/89)

In article <22744@louie.udel.EDU> HELMER%SDNET.BITNET@vm1.nodak.edu (Guy Helmer) writes:
>While I haven't looked at Henry's stdio...

Actually, I should set the record straight on this:  Geoff did the stdio
speedup stuff.

>... I have begun designing a new
>stdio for Minix...

If you haven't put a lot of effort into this yet, might I suggest picking
a different project?  There are at least three freely-redistributable stdio
projects already well advanced towards release.
-- 
V7 /bin/mail source: 554 lines.|     Henry Spencer at U of Toronto Zoology
1989 X.400 specs: 2200+ pages. | uunet!attcan!utzoo!henry henry@zoo.toronto.edu

HELMER%SDNET.BITNET@vm1.nodak.edu (Guy Helmer) (08/30/89)

henry@UTZOO.UUCP (Henry Spencer) writes:
>In article <22744@louie.udel.EDU> HELMER%SDNET.BITNET@vm1.nodak.edu (Guy
>Helmer)
> writes:
>>While I haven't looked at Henry's stdio...
>
>Actually, I should set the record straight on this:  Geoff did the stdio
>speedup stuff.
>
>>... I have begun designing a new
>>stdio for Minix...
>
>If you haven't put a lot of effort into this yet, might I suggest picking
>a different project?  There are at least three freely-redistributable-stdio
>projects already well advanced towards release.

Oh.  Okay, but pulling stuff off the net isn't nearly as much fun as
writing my own :-).

-- Guy Helmer
   BITNET: HELMER@SDNET