[comp.unix.aux] Some questions about A/UX

alexis@ccnysci.UUCP (Alexis Rosen) (10/22/88)

I have a few questions about A/UX. I realize that there may be many sides to
some of these questions, but I'll take whatever I can get...
 
1) Can the Mac reboot into A/UX automatically? If the machine dies,
   what happens? How often will it crash from high loads? What is a high
   load on a Mac IIx?
 
2) I know that the distribution comes with lots of source code. *What* source
   code? Can I get source for the stuff that's shipped only as binary?
 
3) Does A/UX have system accounting, and quotas, in the kernel, just like
   BSD? I know that "A/UX is all of SVR2 plus all of BSD 4.2 except ioctls",
   but what does that mean? Can it support the fast file system? What about
   stuff that comes with a standard BSD, but isn't part of the kernel? Am I
   even asking the right questions?
 
4) If the Mac crashes, the console will presumably go haywire. All diagnostic
   output will be lost. Can I make the console a serial device on one of the
   built-in serial ports? If not, can I stick a printer on one of those ports
   and redirect or copy ALL console output to the printer?
 
5) How many interrupts per second can the Mac deal with before getting
   hopelessly bogged down? The obvious answer is "it depends", but can
   someone out there give a rough approximation?
 
6) "Always balance your disk load." How much difference will there be between
   a system with one 620MB Wren and a system with a 620MB and a 90MB Wren?
   What is your favorite strategy with disks? Is it better to get a 90MB Wren
   or a 140MB Rodime as a second disk?
 
7) There is a well-known problem with Hayes-compatible modems on sysV machines.
   They often fail to reset properly after a session, so the next caller can't
   connect properly. I believe there is an answer, but I'm not sure. With A/UX,
   can I set up eight modems and be confident that they will work right? If
   Hayes-compatible modems are out, what do you recommend? The system would
   be used primarily for news and mail, so this is critically important to us.
 
8) To repeat an earlier query, is there ANYONE out there who runs more than
   three simultaneous users off of a Mac II or IIx? Any information about
   this would be deeply appreciated.
 
9) Lastly, I have asked in the past if anyone knew of a SCSI DMA board for
   the Mac II/IIx. Phil Ronzone was kind enough to spend a few moments with
   me explaining that DMA wouldn't be a big win. Nevertheless, all of my
   calculations indicate otherwise. What am I missing, if anything? Here are
   my numbers:

Assuming a Wren IV with 16ms access time, capable of transmitting data at a
sustained rate of 1 MByte/sec., and assuming the Mac II can do about 300 KB/sec
sustained, then a hard page fault (4KBytes) will take:

	  DMA: 16 ms + 4KB/(1MB/s)   =  16 ms +  ~4 ms = ~20 ms
 Mac w/no DMA: 16 ms + 4KB/(300KB/s) =  16 ms + ~13 ms = ~29 ms

So a page fault will take 50% more time without DMA.

   Of course, when transferring larger amounts of data, DMA wins by a lot more.
   The more you transfer, the closer DMA gets to about 3.3 times the non-DMA
   speed (1 MB/s vs. 300 KB/s). For loading a 500KB program, DMA would be a
   little over 3 times as fast.

   Are there other factors I have overlooked? With a very fragmented drive,
   non-DMA loses a little less, but not a lot less. Anything else?
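
The arithmetic in point 9 can be written out as a small model. This is only a sketch under the post's own assumptions (16 ms access time, 1 MByte/sec with DMA, 300 KB/sec without). The function names `request_ms` and `dma_advantage` are illustrative, and 1 MByte/sec is treated as 1000 KB/sec for simplicity.

```c
/* Service time in milliseconds for one disk request: a fixed access
 * time plus the time to move size_kb at rate_kb_per_s.
 * Assumption: rates are in KB/s, with 1 MB/s taken as 1000 KB/s. */
double request_ms(double access_ms, double size_kb, double rate_kb_per_s)
{
    return access_ms + size_kb / rate_kb_per_s * 1000.0;
}

/* Ratio of non-DMA service time to DMA service time for the same
 * request. A value of 1.5 means the non-DMA path takes 50% longer. */
double dma_advantage(double access_ms, double size_kb,
                     double dma_rate_kb_per_s, double pio_rate_kb_per_s)
{
    return request_ms(access_ms, size_kb, pio_rate_kb_per_s)
         / request_ms(access_ms, size_kb, dma_rate_kb_per_s);
}
```

With these inputs, `request_ms(16.0, 4.0, 300.0)` comes to about 29.3 ms against 20 ms for the DMA case, a ratio of about 1.47. For a 500 KB load, `dma_advantage(16.0, 500.0, 1000.0, 300.0)` is about 3.26.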

Thanks

----
Alexis Rosen                       alexis@dasys1.UUCP  or  alexis@ccnysci.UUCP
Writing from                       {allegra,philabs,cmcl2}!phri\
The Big Electric Cat                                       uunet!dasys1!alexis
Public UNIX                           {portal,well,sun}!hoptoad/

phil@Apple.COM (Phil Ronzone) (10/27/88)

In article <941@ccnysci.UUCP> alexis@ccnysci.UUCP (Alexis Rosen) writes:
>9) Lastly, I have asked in the past if anyone knew of a SCSI DMA board for
>   the Mac II/IIx. Phil Ronzone was kind enough to spend a few moments with
>   me explaining that DMA wouldn't be a big win. Nevertheless, all of my
>   calculations indicate otherwise. What am I missing, if anything?

Killing this misinformation on Mac II and SCSI hopefully for the last time ...

Assume you are running a typical one user I/O load of 40 to 80 1K blocks
a second. When the Apple HD80 presents the data requested, then A/UX
"yanks" the 1K, 4 bytes at a time, in a very tight loop. There is hardware
assist to make for very quick "yanking". How quick? 3.657 bytes per
microsecond. Or, 280 microseconds to "yank" the 1K block.

NOTE THAT: A DMA chip and the 68020 would both begin the "yanking"
of bytes at the same time. They would both finish around the same time.

A DMA CHIP CAN DO ABSOLUTELY NOTHING (REPEAT NOTHING NOTHING NOTHING ...)
TO MAKE THIS HAPPEN FASTER THAN HAVING THE 68020 TRANSFER THE BYTES. IS
THE UNIVERSE NOW AWARE OF THIS!!! :-) :-) <- tongue-in-cheek screaming.

So DMA can NOT make I/O performance improve. What it CAN do is free up
more cycles. It can save that 280 microseconds per block, or, if you are
doing 80 1K blocks per second, it can save you 22.4 milliseconds every
second. That is 2.2% of your total CPU cycles every second.

If you are doing swapping of a 160K "chunk" every second, then it will
save you 4.9% of your total CPU cycles every second.
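
The CPU-cost figures above follow from the quoted copy rate of 3.657 bytes per microsecond. A minimal sketch of that calculation, with hypothetical function names and 1K taken as 1024 bytes:

```c
/* Microseconds the CPU spends copying `bytes` at `bytes_per_us`.
 * The 3.657 bytes/us rate used in the examples is the figure quoted
 * in the post, not something measured here. */
double copy_us(double bytes, double bytes_per_us)
{
    return bytes / bytes_per_us;
}

/* Percentage of each second the CPU spends copying when
 * `bytes_per_sec` bytes are moved every second. */
double cpu_percent(double bytes_per_sec, double bytes_per_us)
{
    return copy_us(bytes_per_sec, bytes_per_us) / 1000000.0 * 100.0;
}
```

`copy_us(1024.0, 3.657)` is about 280 microseconds, and `cpu_percent(80 * 1024.0, 3.657)` is about 2.2%, matching the post. For a 160K chunk per second the same formula gives roughly 4.5%, a little under the 4.9% quoted, so that figure presumably includes some overhead not stated here.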

At this point, a common objection is that "I/O can happen faster because
the 68020 can start it quicker because the DMA is taking off the load ...".
No - that ain't true either. Until either the DMA or 68020 is DONE transferring
the data, you CAN'T start more I/O.

Does this mean we are against DMA? NO NO NO! When you start transferring
large amounts of data (multi-megabyte images to a LaserWriter SC) you
don't want to burn up your processor that way. Also unlikable is that
interrupts get locked out for large transfers to slower devices. A/UX
doesn't support LocalTalk on the onboard SCC's because a DataGram takes
21 milliseconds of LOCK-OUT-ALL-INTERRUPTS time. With 8 DataGrams per
ATP transaction, 170 milliseconds loses too many interrupts (Ethernet,
keystrokes, mouse movements, incoming serial data etc.).

SUMMARY - A DMA chip on the Mac II can NOT increase I/O throughput. It can
free up more I/O cycles, although only 4% (predicted) / 8% (measured) for
typical heavy UNIX I/O loads. DMA buys the most in reducing interrupt
timing sensitivity, and in supporting the "very large data transfers" peripherals
such as LaserWriter II SC.

O.K.?

P.S. My very first reaction after examining the Macintosh II prototype
design was "What!? No DMA!!??". Just the facts, Ma'am.
+------------------------+-----------------------+----------------------------+
| Philip K. Ronzone      | A/UX System Architect | APPLELINK: RONZONE1        |
| Apple Computer MS 27AJ +-----------------------+----------------------------+
| 10500 N. DeAnza Blvd.  | If you post a bug to the net, and the manufacturer |
| Cupertino CA 95014     | doesn't read it,does that mean it doesn't exist?   |
+------------------------+----------------------------------------------------+
|{amdahl,decwrl,sun,voder,nsc,mtxinu,dual,unisoft}!apple!phil                 |
+-----------------------------------------------------------------------------+

alexis@ccnysci.UUCP (Alexis Rosen) (10/30/88)

In article <19528@apple.Apple.COM> phil@Apple.COM (Phil Ronzone) writes:
>In article <941@ccnysci.UUCP> alexis@ccnysci.UUCP (Alexis Rosen) writes:
>>9) Lastly, I have asked in the past if anyone knew of a SCSI DMA board for
>>   the Mac II/IIx. Phil Ronzone was kind enough to spend a few moments with
>>   me explaining that DMA wouldn't be a big win. Nevertheless, all of my
>>   calculations indicate otherwise. What am I missing, if anything?
>
>Killing this misinformation on Mac II and SCSI hopefully for the last time ...
>
>Assume you are running a typical one user I/O load of 40 to 80 1K blocks
>a second. When the Apple HD80 presents the data requested, then A/UX
>"yanks" the 1K, 4 bytes at a time, in a very tight loop. There is hardware
>assist to make for very quick "yanking". How quick? 3.657 bytes per
>microsecond. Or, 280 microseconds to "yank" the 1K block.
>
>NOTE THAT: A DMA chip and the 68020 would both begin the "yanking"
>of bytes at the same time. They would both finish around the same time.
>
>A DMA CHIP CAN DO ABSOLUTELY NOTHING (REPEAT NOTHING NOTHING NOTHING ...)
>TO MAKE THIS HAPPEN FASTER THAN HAVING THE 68020 TRANSFER THE BYTES. IS
>THE UNIVERSE NOW AWARE OF THIS!!! :-) :-) <- tongue-in-cheek screaming.
>[etc.]
>SUMMARY - A DMA chip on the Mac II can NOT increase I/O throughput. It can
>free up more I/O cycles, although only 4% (predicted) / 8% (measured) for
>typical heavy UNIX I/O loads. DMA buys the most in reducing interrupt
>timing sensitivity, and in supporting the "very large data transfers" peripherals
>such as LaserWriter II SC.

First of all, thanks to Phil for speaking out on this. My previous comment
about him was sincere; I do appreciate the time he's willing to spend answering
these questions.

That said, I still have some questions. The Mac can transfer a 1KByte block in
280 usecs. That's fine, but it's not the whole story. If it were, it could do
about 3.5 MBytes/sec. In fact, it can do less than one tenth of that speed. So
what causes that discrepancy? My guess (uneducated, so please correct me if I'm
wrong) is that it's the overhead for transferring that block. Can the Mac
transfer 10 or 100 blocks in 2800 or 28000 microseconds? I don't think so.
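
The gap described here can be put in numbers. This sketch assumes the 280 microsecond copy time from Phil's post and the 300 KB/sec sustained rate assumed in the original posting. Both function names are illustrative only.

```c
/* Peak rate in bytes per second implied by moving `block_bytes`
 * in `copy_us` microseconds with no other cost per block. */
double peak_bytes_per_sec(double block_bytes, double copy_us)
{
    return block_bytes / copy_us * 1000000.0;
}

/* Microseconds per block that the copy loop does NOT account for,
 * given an observed sustained rate in bytes per second. */
double unexplained_us_per_block(double block_bytes, double copy_us,
                                double observed_bytes_per_sec)
{
    double total_us = block_bytes / observed_bytes_per_sec * 1000000.0;
    return total_us - copy_us;
}
```

`peak_bytes_per_sec(1024.0, 280.0)` is about 3.66 million bytes per second (roughly 3.5 MBytes/sec at 1048576 bytes per MByte). At an observed 300 KB/sec, `unexplained_us_per_block(1024.0, 280.0, 300 * 1024.0)` is about 3053 microseconds, so under these assumptions the copy itself is under a tenth of the time spent on each block.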

I don't know what the overhead for DMA is, but it seems to be a lot less. The
Golden Triangle folks say they are getting about 1 MByte/sec from their board,
using CDC Wrens. Since the Wrens are capable of just over 1 MByte/sec, G.T.
might be able to do even better with a faster controller (then again, maybe
not- I didn't ask).

As I think my numbers show, (read my original posting for them) there is a BIG
difference between 300KB/sec and 1MB/sec, even for single-user stuff. Maybe the
question is whether there is something about the Mac that makes these faster
transfers difficult (or impossible). I can't think of anything though. The very
thought seems silly.

So again, what am I missing?

p.s.- I am leaving for a week Monday, that's why I won't be answering anything
that needs a response until around Election Day. Sorry.

phil@Apple.COM (Phil Ronzone) (11/02/88)

In article <964@ccnysci.UUCP> alexis@ccnysci.UUCP (Alexis Rosen) writes:
>That said, I still have some questions. The Mac can transfer a 1KByte block in
>280 usecs. That's fine, but it's not the whole story. If it were, it could do
>about 3.5 MBytes/sec. In fact, it can do less than one tenth of that speed. So
>what causes that discrepancy? My guess (uneducated, so please correct me if I'm
>wrong) is that it's the overhead for transferring that block. Can the Mac
>transfer 10 or 100 blocks in 2800 or 28000 microseconds? I don't think so.
>So again, what am I missing?

DMA is merely a hardware assist for transferring data from the (typically and
hopefully) buffered SCSI device, such as a hard disk, into the Mac system
memory.

To do this, the hard disk is instructed to read/write "N" blocks of
data starting at a certain address. For example, opening a thousand
1 block files, each of which is located near the end of a slower hard disk,
could, in the ABSOLUTE worst case, take as follows:
     1000 seeks to front of the disk to read the inode
     and then go back to near the end of the disk to read one 1K block
     = 60ms * 1000 = 60 secs = about 16KB per second!!!!
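
The worst case above can be sketched as a short calculation. It takes the post's 60 ms as the cost of the whole seek-to-inode-and-back cycle for each file. The function names are hypothetical.

```c
/* Total seconds to open and read `files` files when each one costs
 * `ms_per_file` milliseconds of seeking. */
double total_seconds(int files, double ms_per_file)
{
    return files * ms_per_file / 1000.0;
}

/* Effective throughput in KB/s when every file of `kb_per_file`
 * kilobytes costs a full seek cycle. */
double worst_case_kb_per_sec(int files, double kb_per_file,
                             double ms_per_file)
{
    return files * kb_per_file / total_seconds(files, ms_per_file);
}
```

`total_seconds(1000, 60.0)` is 60 seconds, and `worst_case_kb_per_sec(1000, 1.0, 60.0)` is about 16.7 KB per second, in line with the "about 16KB per second" figure.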

Remember that the hard disk is getting more or less 40 - 100 requests for
blocks per second under load - and each block has a more or less random
block address in UNIX (to oversimplify it).

On the other hand, a SCSI device that is actually 16M (16 megabytes!) of
cache in the SCSI adapter (to ESDI) gave up to 400K/s data transfer rates.
You could write at even higher rates, but every 1 or 2 seconds it hit
an internal count of "writes pending" and accepted no more I/O until ALL
the writes were flushed. So you had 2-3 seconds of immense data transfer,
no matter how much "seeking" was implied, then 2 - 3 seconds of NO data
transfer while the device flushed.

So DMA (or "PIO") is but a small part of the equation for disk throughput.
Consider interleave, disk seek, average bytes per transfer, "randomness"
and so on.

dave@onfcanim.UUCP (Dave Martindale) (11/03/88)

In article <19528@apple.Apple.COM> phil@Apple.COM (Phil Ronzone) writes:
>
>Assume you are running a typical one user I/O load of 40 to 80 1K blocks
>a second. When the Apple HD80 presents the data requested, then A/UX
>"yanks" the 1K, 4 bytes at a time, in a very tight loop. There is hardware
>assist to make for very quick "yanking". How quick? 3.657 bytes per
>microsecond. Or, 280 microseconds to "yank" the 1K block.
>
>If you are doing swapping of a 160K "chunk" every second, then it will
>save you 4.9% of your total CPU cycles every second.

From my perspective, this is an argument that using the CPU to copy
data is fine as long as you are using the System V 1K-block filesystem,
since the filesystem so thoroughly throttles the disk.  But if you
ever switch to a filesystem with more throughput, you'll be in trouble.

For comparison, our old, slow, vax 780 running 4.3BSD always reads 8K
blocks on the filesystems that store images, and it manages to get about
60 blocks per second through the filesystem.  That's about 500 Kb/sec,
an order of magnitude larger than Phil's figures.  And this is on
old Eagle disks, where the average user data rate coming off the head
is about 1.6 Mb/sec.  More recent disks, even small Winchesters, are
considerably faster.

Our old Silicon Graphics workstation with a 70 Mb Vertex disk still
manages about 200 Kb/sec, using SGI's proprietary "extent filesystem".

So, if A/UX ever switches to a filesystem that allows access to some
reasonable fraction of the disk's real bandwidth (say 500 Kb/sec to
1 Mb/sec), like other workstation manufacturers provide,
having a DMA controller will suddenly become essential.
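
To give a rough sense of scale, the sketch below combines the block sizes mentioned here with the 3.657 bytes per microsecond copy rate quoted earlier in the thread. It assumes that rate would hold at higher throughput and treats "Kb" as 1024 bytes. The function names are illustrative.

```c
/* Filesystem throughput in KB/s: block size times blocks per second. */
double fs_kb_per_sec(double block_kb, double blocks_per_sec)
{
    return block_kb * blocks_per_sec;
}

/* Percent of CPU time a programmed-I/O copy would take at a given
 * throughput. Assumption: the copy loop keeps running at
 * `bytes_per_us` regardless of transfer size. */
double pio_cpu_percent(double kb_per_sec, double bytes_per_us)
{
    double us_per_sec = kb_per_sec * 1024.0 / bytes_per_us;
    return us_per_sec / 1000000.0 * 100.0;
}
```

`fs_kb_per_sec(8.0, 60.0)` is 480 KB/s, the VAX figure rounded to "about 500" above. At 500 KB/s the copy would take about 14% of the CPU (`pio_cpu_percent(500.0, 3.657)`), and at 1024 KB/s about 29%, before counting the kernel-to-user copy.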

Remember that once the data is in kernel memory, UNIX has to copy it to
user memory, so the 68020 is going to be really busy just handling that.

I hope Apple switches to a better filesystem soon....

	Dave Martindale

alexis@ccnysci.UUCP (Alexis Rosen) (11/20/88)

Sorry I took so long to respond to this...

In article <19816@apple.Apple.COM> phil@Apple.COM (Phil Ronzone) writes:
"In article <964@ccnysci.UUCP> alexis@ccnysci.UUCP (Alexis Rosen) writes:
">That said, I still have some questions. The Mac can transfer a 1KByte block
">in 280 usecs. That's fine, but it's not the whole story. If it were, it could
">do about 3.5 MBytes/sec. In fact, it can do less than one tenth of that
">speed. So what causes that discrepancy? My guess (uneducated, so please
">correct me if I'm wrong) is that it's the overhead for transferring that
">block. Can the Mac transfer 10 or 100 blocks in 2800 or 28000 microseconds? I
">don't think so. So again, what am I missing?
"
"DMA is merely a hardware assist for transferring data from the (typically and
"hopefully) buffered SCSI device, such as a hard disk, into the Mac system
"memory.
"
"To do this, the hard disk is instructed to read/write "N" blocks of
"data starting at a certain address. For example, opening a thousand
"1 block files, each of which is located near the end of a slower hard disk,
"could, in the ABSOLUTE worst case, take as follows:
"     1000 seeks to front of the disk to read the inode
"     and then go back to near the end of the disk to read one 1K block
"     = 60ms * 1000 = 60 secs = about 16KB per second!!!!
"
"Remember that the hard disk is getting more or less 40 - 100 requests for
"blocks per second under load - and each block has a more or less random
"block address in UNIX (to oversimplify it).
"
"On the other hand, a SCSI device that is actually 16M (16 megabytes!) of
"cache in the SCSI adapter (to ESDI) gave up to 400K/s data transfer rates.
"You could write at even higher rates, but every 1 or 2 seconds it hit
"an internal count of "writes pending" and accepted no more I/O until ALL
"the writes were flushed. So you had 2-3 seconds of immense data transfer,
"no matter how much "seeking" was implied, then 2 - 3 seconds of NO data
"transfer while the device flushed.
"
"So DMA (or "PIO") is but a small part of the equation for disk throughput.
"Consider interleave, disk seek, average bytes per transfer, "randomness"
"and so on.

This is useful information to have, but it really doesn't answer the question
at all. Writing to a buffered SCSI disk can go up to 400K/s. That's great...
but not too great. The Wrens can sustain a throughput of 1MB/s. That's really
great. So why is the Mac's "great" != the Wren's "great"?

In other words, if the lack of DMA isn't slowing down the Mac, what is???
It is clearly not going as fast as it should.

phil@Apple.COM (Phil Ronzone) (11/23/88)

In article <1011@ccnysci.UUCP> alexis@ccnysci.UUCP (Alexis Rosen) writes:
>.... Writing to a buffered SCSI disk can go up to 400K/s. That's great...
>but not too great. The Wrens can sustain a throughput of 1MB/s. That's really
>great. So why is the Mac's "great" != the Wren's "great"?
>
>In other words, if the lack of DMA isn't slowing down the Mac, what is???
>It is clearly not going as fast as it should.

One last last shot -- then I'm forgetting the matter. Typical UNIX SV
filesystem, 50 reads a second, each read to essentially a random block.
Assuming fast disks with 16MS average seek times. Each read/write
takes maybe 200 to 900 microseconds, depending on hardware. This gives
us a time to ACQUIRE IN MEMORY, for each block, 16MS + ~500 microseconds.

Now if we had an INFINITELY FAST DMA/transfer mechanism, we could cut
that figure down to 16MS + ~0 microseconds. Notice the blazing increase
in throughput!!! :-)
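
The point can be checked with a simple seek-bound model. This is a sketch only: it assumes requests are strictly serialized, each block costs one 16 ms seek plus the transfer, and the transfer takes the ~500 microseconds used above. The function names are hypothetical.

```c
/* Blocks per second when each block costs `seek_ms` milliseconds of
 * seek plus `xfer_us` microseconds of transfer, one block at a time. */
double blocks_per_sec(double seek_ms, double xfer_us)
{
    return 1000.0 / (seek_ms + xfer_us / 1000.0);
}

/* Percent throughput gained if the transfer time dropped to zero. */
double gain_percent_if_free_transfer(double seek_ms, double xfer_us)
{
    return (blocks_per_sec(seek_ms, 0.0)
            / blocks_per_sec(seek_ms, xfer_us) - 1.0) * 100.0;
}
```

`blocks_per_sec(16.0, 500.0)` is about 60.6 blocks per second and `blocks_per_sec(16.0, 0.0)` is 62.5, so in this model an infinitely fast transfer buys roughly 3% more throughput.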

OPEN QUESTION - why do you think the "Mac is not going as fast as it should"?
If this is on a comparison basis, tell me the equivalent machine that runs
faster. Equivalent means 68020, ~16MHz, memory management, SCSI disk I/O,
SVR2, etc. We really do want to know of equivalent hardware that runs better
because of software. When we find it, we want to make ours run faster too.

WARNING -- "fast & faster" can be exceptionally subjective.

dave@onfcanim.UUCP (Dave Martindale) (11/24/88)

In article <21057@apple.Apple.COM> phil@Apple.COM (Phil Ronzone) writes:
>
>OPEN QUESTION - why do you think the "Mac is not going as fast as it should"?
>If this is on a comparison basis, tell me the equivalent machine that runs
>faster. Equivalent means 68020, ~16MHz, memory management, SCSI disk I/O,
>SVR2, etc. We really do want to know of equivalent hardware that runs better
>because of software. When we find it, we want to make ours run faster too.

The Silicon Graphics IRIS 3000 series uses a 16 MHz 68020, with memory
management.  The old 70 Mb disks use an ST506 interface - SCSI should
do better.  The kernel is basically system V release something, and they
get several times the disk throughput of A/UX.

Why?  Basically because they don't use the system V filesystem - they
replaced it with an extent-based filesystem that reads and writes
much larger data blocks at a time.  I believe that the only way A/UX
will get decent performance out of the disk is by switching to a
different filesystem.

Note that using the Bell filesystem cripples NFS performance too,
since all reads and writes are done 1 Kb at a time, instead of the
4 or 8 Kb that other workstations use.  So it matters even when you
aren't using the local disk.

If you change filesystems and quadruple disk throughput, DMA may become
important for disk I/O.  Or it might not.  But for the moment, the filesystem
software seems to be the problem.

	Dave Martindale

fnf@fishpond.UUCP (Fred Fish) (11/26/88)

In article <16775@onfcanim.UUCP> dave@onfcanim.UUCP (Dave Martindale) writes:
>The Silicon Graphics IRIS 3000 series uses a 16 MHz 68020, with memory
>management.  The old 70 Mb disks use an ST506 interface - SCSI should
>do better.  The kernel is basically system V release something, and they
>get several times the disk throughput of A/UX.
>
>Why?  Basically because they don't use the system V filesystem - they
>replaced it with an extent-based filesystem that reads and writes
>much larger data blocks at a time.  I believe that the only way A/UX
>will get decent performance out of the disk is by switching to a
>different filesystem.

Maybe it's just a rumor, but I once heard from someone close to the
A/UX project that the BSD filesystem was tried with A/UX, and it turned
out to be even slower than the System V filesystem on the Mac-II hardware.
There was an explanation, but I confess I didn't listen too closely to
it.  I hope that this is wrong, and that we will someday see a BSD 
filesystem with A/UX, because there are lots of things about A/UX that
I like.

I decided to retry the disk performance benchmark that I ran in Feb '88 and
posted the results for.  This posting contains a copy of that benchmark
at the end.  Here are the current results for an Amiga 2000 and the Mac-II,
along with some old results for a Sun-3/50.

Performance timings using Rick Spanbauer's diskperf.c program.

					Amiga	Sun	A/UX	A/UX
					2000	3/50	Mac-II	Mac-II
					ST277N		HD80SC	HD80SC
					Nov 88	?????	Feb 88	Nov 88

File creations           (files/sec)	14	6	6	6
File deletions           (files/sec)	41	11	8	8
Directory scan         (entries/sec)	92	350	371	397
Seek+read            (seek+read/sec)	85	298	110	93
Read speed,    512 buffer (byte/sec)	67216	240499	55168	25593
Read speed,   4096 buffer (byte/sec)	109226	234057	53708	25323
Read speed,   8192 buffer (byte/sec)	187245	233189	54013	25183
Read speed,  32768 buffer (byte/sec)	374491	236343	53644	25123
Write speed,   512 buffer (byte/sec)	28187	215166	44181	43855
Write speed,  4096 buffer (byte/sec)	137970	182466	47211	46287
Write speed,  8192 buffer (byte/sec)	154202	179755	46832	46445
Write speed, 32768 buffer (byte/sec)	218453	187580	46930	46707

Notes:
	(1)	Sun-3/50 timings by Rick Spanbauer.
	(2)	All Amiga and Mac-II timings done by Fred Fish.
	(3)	The Amiga 2000 uses an A2090 DMA controller, Workbench 1.3,
		and a Seagate ST277N (40 ms average access time).
	(4)	The Mac-II uses an HD80SC (30 ms average access time)

Comments:

	(1)	I included both the Feb 88 and current Mac-II timings because
		the read figures were significantly different.  I have no
		explanation for the discrepancy other than to note that
		the disk is probably now significantly fragmented, and
		I have since increased the number of I/O buffers to about
		1000.

	(2)	The Amiga timings I get for the relatively slow ST277N are
		about half of what has been reported by other people for
		faster drives (about 400-800 Kb per second maximum transfer
		rates).

	(3)	Considering that the Amiga is a stock 68000 running
		at less than 8 MHz, using a 30% slower drive than the
		Mac-II, it seems obvious that disk I/O is not the Mac's
		strongest feature...  :-)
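
For readers who want the ratios behind these comments, here is a minimal sketch using figures taken from the table and notes above. The helper names are illustrative, and the comparison says nothing about why the numbers differ.

```c
/* How many times larger rate_a is than rate_b. */
double times_faster(double rate_a, double rate_b)
{
    return rate_a / rate_b;
}

/* Percent by which access time a_ms exceeds access time b_ms. */
double percent_longer(double a_ms, double b_ms)
{
    return (a_ms - b_ms) / b_ms * 100.0;
}
```

For the 32768-byte buffer rows, `times_faster(374491.0, 25123.0)` is about 14.9 for reads (Amiga vs. Nov 88 Mac-II) and `times_faster(218453.0, 46707.0)` is about 4.7 for writes. `percent_longer(40.0, 30.0)` is about 33, which is the basis for calling the ST277N roughly 30% slower.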

======================================================================

/*
** Disk performance benchmark.  If your Amiga configuration is substantially
** different from the ones mentioned here, please run the benchmark and
** report the results to either: ..!philabs!sbcs!rick or posting to
** comp.sys.amiga.  Thanks!
**
** To compile benchmark for Unix 4.2/4.3 SUN 3.0/3.2:
**
**	cc -o diskperf -O diskperf.c
**
** Amiga version was cross compiled from a SUN, so you'll have to figure out
** how to compile diskperf under your favorite compiler system.  A uuencoded
** Amiga binary version of diskperfa is included with the shar file that 
** contained this source listing.
**
** To run diskperf, simply type:
**
**	diskperf [location], e.g. (on Amiga) diskperf ram:
**
** On the Amiga, you will need at least 256K bytes of "disk" wherever you
** choose to run.  Unix systems will need about 3 mBytes free (larger size
** test files to defeat buffer caching effect).
**
** Disclaimer:
**
**	This benchmark is provided only for the purpose of seeing how fast
**	_your_ system runs the program.  No claims are made on my part
**	as to what conclusions may be drawn from the statistics gathered.
**	Just consider this program the "Sieve of Eratosthenes" of disk
**	benchmarks - haggle over the numbers with friends, etc, but
**	don't base purchasing decisions solely on the numbers produced
**	by this program.
**
** Amiga timings gathered thus far:
**
-----------------------------------------------------------------------------
Amiga A-1000, ~7mHz 68000, RAM:

File create/delete:	create 5 files/sec, delete 10 files/sec
Directory scan:		5 entries/sec
Seek/read test:		51 seek/reads per second
r/w speed:		buf 512 bytes, rd 201469 byte/sec, wr 154202 byte/sec
r/w speed:		buf 4096 bytes, rd 655360 byte/sec, wr 374491 byte/sec
r/w speed:		buf 8192 bytes, rd 873813 byte/sec, wr 374491 byte/sec
r/w speed:		buf 32768 bytes, rd 873813 byte/sec, wr 436906 byte/sec
-----------------------------------------------------------------------------
Amiga A-1000, ~7mHz 68000, DF1:

File create/delete:	create [0..1] files/sec, delete 1 files/sec
Directory scan:		43 entries/sec
Seek/read test:		18 seek/reads per second
r/w speed:		buf 512 bytes, rd 11861 byte/sec, wr 5050 byte/sec
r/w speed:		buf 4096 bytes, rd 12542 byte/sec, wr 5180 byte/sec
r/w speed:		buf 8192 bytes, rd 12542 byte/sec, wr 5130 byte/sec
r/w speed:		buf 32768 bytes, rd 12542 byte/sec, wr 5160 byte/sec
-----------------------------------------------------------------------------
Amiga A-1000/CSA Turbo board, ~14 mHz 68020, no 32 bit ram installed, RAM:

File create/delete:	create 7 files/sec, delete 15 files/sec
Directory scan:		8 entries/sec
Seek/read test:		84 seek/reads per second
r/w speed:		buf 512 bytes, rd 187245 byte/sec, wr 145625 byte/sec
r/w speed:		buf 4096 bytes, rd 655360 byte/sec, wr 327680 byte/sec
r/w speed:		buf 8192 bytes, rd 873813 byte/sec, wr 374491 byte/sec
r/w speed:		buf 32768 bytes, rd 873813 byte/sec, wr 436906 byte/sec
-----------------------------------------------------------------------------
Amiga A-1000, ~7 mHz 68000, Ameristar NFS  -> SUN-3/50, Micropolis 1325 disk:

File create/delete:	create 3 files/sec, delete 7 files/sec
Directory scan:		10 entries/sec
Seek/read test:		35 seek/reads per second
r/w speed:		buf 512 bytes, rd 30481 byte/sec, wr 3481 byte/sec
r/w speed:		buf 4096 bytes, rd 113975 byte/sec, wr 21664 byte/sec
r/w speed:		buf 8192 bytes, rd 145635 byte/sec, wr 38550 byte/sec
r/w speed:		buf 32768 bytes, rd 145365 byte/sec, wr 37449 byte/sec
-----------------------------------------------------------------------------
SUN-3/50, Adaptec SCSI<->ST-506, Micropolis 1325 drive (5.25", 5 mBit/sec):

File create/delete:	create 6 files/sec, delete 11 files/sec
Directory scan:		350 entries/sec
Seek/read test:		298 seek/reads per second
r/w speed:		buf 512 bytes, rd 240499 byte/sec, wr 215166 byte/sec
r/w speed:		buf 4096 bytes, rd 234057 byte/sec, wr 182466 byte/sec
r/w speed:		buf 8192 bytes, rd 233189 byte/sec, wr 179755 byte/sec
r/w speed:		buf 32768 bytes, rd 236343 byte/sec, wr 187580 byte/sec
-----------------------------------------------------------------------------

**
** Some sample figures from "large" systems: 
**

-----------------------------------------------------------------------------
SUN-3/160, Fujitsu SuperEagle, Interphase VSMD-3200 controller:

File create/delete:	create 15 files/sec, delete 18 files/sec
Directory scan:		722 entries/sec
Seek/read test:		465 seek/reads per second
r/w speed:		buf 512 bytes, rd 361162 byte/sec, wr 307200 byte/sec
r/w speed:		buf 4096 bytes, rd 419430 byte/sec, wr 315519 byte/sec
r/w speed:		buf 8192 bytes, rd 409067 byte/sec, wr 314887 byte/sec
r/w speed:		buf 32768 bytes, rd 409600 byte/sec, wr 328021 byte/sec
-----------------------------------------------------------------------------
SUN-3/75, NFS filesystem, full 8192 byte transactions:

File create/delete:	create 9 files/sec, delete 12 files/sec
Directory scan:		88 entries/sec
Seek/read test:		282 seek/reads per second
r/w speed:		buf 512 bytes, rd 238674 byte/sec, wr 52012 byte/sec
r/w speed:		buf 4096 bytes, rd 259334 byte/sec, wr 54956 byte/sec
r/w speed:		buf 8192 bytes, rd 228116 byte/sec, wr 26483 byte/sec
r/w speed:		buf 32768 bytes, rd 243477 byte/sec, wr 36174 byte/sec
-----------------------------------------------------------------------------
DEC VAX 780, RP07:

File create/delete:	create 12 files/sec, delete 12 files/sec
Directory scan:		509 entries/sec
Seek/read test:		245 seek/reads per second
r/w speed:		buf 512 bytes, rd 168041 byte/sec, wr 141064 byte/sec
r/w speed:		buf 4096 bytes, rd 210135 byte/sec, wr 239765 byte/sec
r/w speed:		buf 8192 bytes, rd 206277 byte/sec, wr 239948 byte/sec
r/w speed:		buf 32768 bytes, rd 199222 byte/sec, wr 232328 byte/sec
-----------------------------------------------------------------------------
DEC VAX 750, RA81:

File create/delete:	create 12 files/sec, delete 15 files/sec
Directory scan:		208 entries/sec
Seek/read test:		153 seek/reads per second
r/w speed:		buf 512 bytes, rd 99864 byte/sec, wr 72549 byte/sec
r/w speed:		buf 4096 bytes, rd 142663 byte/sec, wr 166882 byte/sec
r/w speed:		buf 8192 bytes, rd 147340 byte/sec, wr 153525 byte/sec
r/w speed:		buf 32768 bytes, rd 142340 byte/sec, wr 141571 byte/sec
-----------------------------------------------------------------------------

*/ 

#ifdef unix

#include <sys/types.h>
#include <sys/file.h>
#include <sys/time.h>
#include <sys/stat.h>

#define SCAN_ITER       10
#define RW_ITER         3
#define RW_SIZE         (3*1024*1024)
#define SEEK_TEST_FSIZE (1024*1024)
#define OPEN_TEST_FILES 200
#define TIMER_RATE      100

/*
** Amiga compatibility library for Unix.  These are NOT full or correct 
** emulations of the Amiga I/F routines - they are intended only to 
** run this benchmark.
*/

#define MODE_OLDFILE    1005
#define MODE_NEWFILE    1006
#define ERROR_NO_MORE_ENTRIES 
#define OFFSET_BEGINNING -1
#define OFFSET_CURRENT 0

Open(name, accessMode)
        char    *name;
        long    accessMode;
{
        int     flags, file;

        flags = O_RDWR;
        if(accessMode == MODE_NEWFILE)
                flags |= O_TRUNC|O_CREAT;
        if((file = open(name, flags, 0644)) < 0)
                file = 0;
        return(file);
}

/*
** To be fair, write should be followed by fsync(file) to flush cache.  But
** since when are benchmarks fair??
*/
#define Write(file, buffer, length) write(file, buffer, length)
#define Read(file, buffer, length) read(file, buffer, length)
#define Close(file) close(file)
#define CreateDir(name) mkdir(name, 0755)
#define Seek(file, position, mode) lseek(file, position, \
                (mode==OFFSET_BEGINNING ? 0 : (mode==OFFSET_CURRENT?1:2)))
#define AllocMem(size, constraints) malloc(size)
#define FreeMem(p, size) free(p)
#define DeleteFile(filename) unlink(filename)

timer_init()
{
        return(1);
}

timer_quit()
{
}

timer(valp)
        long    *valp;
{
        static struct timeval ref;
        struct timeval current;
        
        if(valp == (long *)0){
                gettimeofday(&ref, 0);
                return;
        } 
        gettimeofday(&current, 0);
        *valp = (current.tv_usec - ref.tv_usec)/(1000000/TIMER_RATE);
        if(*valp < 0){
                current.tv_sec--;
                *valp += TIMER_RATE;
        }
        *valp += (current.tv_sec - ref.tv_sec)*TIMER_RATE;
}

OpenStat(filename)
        char    *filename;
{
        int     fd, result;
        struct stat statb;

        if((fd = open(filename, 0)) < 0)
                return(0);
        result = fstat(fd, &statb);
        close(fd);
        return(result == 0);
}

#else

/*
** Iteration/size definitions smaller for Amiga so benchmark doesn't take
** as long and fits on empty floppy.
*/

#include <exec/types.h>
#include <libraries/dos.h>
#include <devices/timer.h>

#ifdef MANX
#include <functions.h>		/* For Manx only */
#endif

#define SCAN_ITER       5
#define RW_ITER         3
#define RW_SIZE         (256*1024)
#define SEEK_TEST_FSIZE (256*1024)
#define OPEN_TEST_FILES 100
#define TIMER_RATE      10              /* misnomer, should be resolution */

struct MsgPort *timerport, *CreatePort();
struct timerequest *timermsg, *CreateExtIO();
long    TimerBase;

timer_init()
{
        timerport = CreatePort(0, 0);
        if(timerport == (struct MsgPort *)0)
                return(0);
        timermsg = CreateExtIO(timerport, sizeof(struct timerequest));
        if(timermsg == (struct timerequest *)0){
                DeletePort(timerport);
                return(0);
        }
        if(OpenDevice(TIMERNAME, UNIT_VBLANK, timermsg, 0) != 0){
                DeletePort(timerport);
                DeleteExtIO(timermsg, sizeof(struct timerequest));
                return(0);
        }
        TimerBase = (long)timermsg->tr_node.io_Device;  /* Hack */
        return(1);
}

timer_quit()
{
        CloseDevice(timermsg);
        DeleteExtIO(timermsg, sizeof(struct timerequest));
        DeletePort(timerport);
}

timer(valp)
        long    *valp;
{
        static struct timeval ref;
        long    t;

        timermsg->tr_node.io_Command = TR_GETSYSTIME;
        DoIO(timermsg);
        t = timermsg->tr_time.tv_secs;
        if(valp == (long *)0)
                ref = timermsg->tr_time;
        else {
                SubTime(&timermsg->tr_time, &ref);
                *valp = timermsg->tr_time.tv_secs*TIMER_RATE +
                        (timermsg->tr_time.tv_micro/(1000000/TIMER_RATE));
        }
}

OpenStat(filename)
        char    *filename;
{
        long    lock, result;
        static struct FileInfoBlock fib;  /* must be on &fib mod 4 == 0 */

        if((lock = Lock(filename, MODE_OLDFILE)) == 0)
                return(0); 
        result = Examine(lock, &fib);
        UnLock(lock);
        return(result);
}

#endif

/*
** Benchmarks performed:
**
**      1)  Raw file read/write rates.  Tested for operation sizes of
**          512/4096/8192/32768 bytes.  Return read/write figures for each
**          transfer size in bytes/sec.
**
**      2)  Directory create/delete rates.  Return create/delete entries
**          per second.
**
**      3)  Directory lookup rate.  Create files in directory, and
**          then measure time to lookup, open & stat entire directory contents.
**          Return entries/second.
**
**      4)  Seek speed test - create large file, then seek to various
**          positions in file & read one byte.  Seek distances intentionally
**          chosen large to reduce caching effectiveness - want basic
**          speed of disk format here.  Return seeks/second.
*/

char    *prepend = "";  /* prepend this path to all filenames created   */
char    scratch[8192];  /* scratch buffer used in various tests         */

/*
** Our `C' library for the Amiga is a bit different from Unix's, so this
** routine will look a bit obscure to most of you.  It avoids using
** sprintf().
*/
maketemp(buf, pref)
        char    *buf;
{
        char    *p, *q;
        int     fnum;
        static  int cnt;
        
        fnum = cnt++;
        q = buf;
        if(pref)
                for(p = prepend; *p; )
                        *q++ = *p++;
        for(p = "diskperf"; *p; )
                *q++ = *p++;
        *q++ = 'A' + ((fnum>>8)&0xf);
        *q++ = 'A' + ((fnum>>4)&0xf);
        *q++ = 'A' + (fnum&0xf);
        *q++ = 0;
}

long    sptest[] = {512, 4096,  8192, 32768, 0};

void rw_test()
{
        long    i, j, k, maxsize, file, RDaccTime, WRaccTime, Dt;
        struct timeval t0, t1;
        char    *p, filename[64];

        maxsize = -1;
        for(k = 0; sptest[k] != 0; k++)
                if(sptest[k] > maxsize)
                        maxsize = sptest[k];
        if((p = (char *)AllocMem(maxsize, 0)) == (char *)0){
                printf("Could not get %ld bytes of memory\n", maxsize);
                return;
        }
        for(k = 0; sptest[k] != 0; k++){
                RDaccTime = WRaccTime = 0;
                for(j = 0; j < RW_ITER; j++){

                        maketemp(filename, 1);

                        if((file = (long) Open(filename, MODE_NEWFILE)) == 0){
                                printf("Could not create %s\n", filename);
                                return;
                        }
                        timer(0);
                        for(i = RW_SIZE/sptest[k]; i > 0; i--)
                                Write(file, p, sptest[k]);
                        timer(&Dt);
                        WRaccTime += Dt;
                        Close(file);

                        if((file = (long) Open(filename, MODE_OLDFILE)) == 0){
                                printf("Could not open %s\n", filename);
                                return;
                        }
                        timer(0);
                        for(i = RW_SIZE/sptest[k]; i > 0; i--)
                                Read(file, p, sptest[k]);
                        timer(&Dt);
                        RDaccTime += Dt;
                        Close(file);

                        DeleteFile(filename);
                }
                printf("r/w speed:\t\tbuf %ld bytes, rd %ld byte/sec, wr %ld byte/sec\n",
                                sptest[k],
                                (TIMER_RATE*RW_SIZE)/(RDaccTime/RW_ITER),
                                (TIMER_RATE*RW_SIZE)/(WRaccTime/RW_ITER));

        }
        FreeMem(p, maxsize);
}

seek_test()
{
        char    fname[64];
        long    i, fd, Dt, cnt, pos, dist;

        maketemp(fname, 1);
        if((fd = (long) Open(fname, MODE_NEWFILE)) == 0){
                printf("Could not create %s\n", fname);
                return;
        }
        for(i = SEEK_TEST_FSIZE/sizeof(scratch); i > 0; i--)
                if(Write(fd, scratch, sizeof(scratch)) != sizeof(scratch))
                        break;
        if(i == 0){
                cnt = 0;
                timer(0);
                for(dist = 256; dist <= 65536; dist <<= 2)
                        for(pos = 0; pos < SEEK_TEST_FSIZE; pos += dist){
                                cnt++;
                                Seek(fd, pos, OFFSET_BEGINNING);
                                Read(fd, scratch, 1);
                        }
                timer(&Dt);
                printf("Seek/read test:\t\t%ld seek/reads per second\n",
                                (TIMER_RATE*cnt)/Dt);
        }
        Close(fd);
        DeleteFile(fname);
}

char    tempname[OPEN_TEST_FILES][16];

open_scan_test()
{
        char    dirname[64];
        long    lock, oldlock, cDt, dDt, sDt, i, j, fd, numRead;
        struct FileInfoBlock *fib;

        maketemp(dirname, 1);
        lock = CreateDir(dirname);
#ifdef unix
        chdir(dirname);
#else
        oldlock = CurrentDir(lock);
#endif
        for(i = 0; i < OPEN_TEST_FILES; i++)
                maketemp(tempname[i], 0);
        
        /*
        ** Time Open of files.
        */
        timer(0);
        for(i = 0; i < OPEN_TEST_FILES; i++){
                if((fd = Open(tempname[i], MODE_NEWFILE)) == 0){
                        printf("Could not open %s/%s\n", dirname, tempname[i]);
                        break;
                }
                Close(fd);
        }
        timer(&cDt);

        /*
        ** Time open scan of directory.
        */
        timer(0);
        numRead = 1;
        for(i = 0; i < SCAN_ITER; i++)
                for(j = 0; j < OPEN_TEST_FILES; j++)
                        if(OpenStat(tempname[j]) != 0)
                                numRead++;
        timer(&sDt);

        /*
        ** Time deletion of files.
        */
        timer(0);
        for(i = 0; i < OPEN_TEST_FILES; i++)
                DeleteFile(tempname[i]);
        timer(&dDt);

        printf("File create/delete:\tcreate %ld files/sec, delete %ld files/sec\n",
                                (TIMER_RATE*OPEN_TEST_FILES)/cDt,
                                (TIMER_RATE*OPEN_TEST_FILES)/dDt);
        printf("Directory scan:\t\t%ld entries/sec\n",
                                (TIMER_RATE*numRead)/sDt);
#ifdef unix
        chdir("..");
        rmdir(dirname);
#else
        CurrentDir(oldlock);
        DeleteFile(dirname);
#endif
}

main(argc, argv)
        int     argc;
        char    **argv;
{
        if(!timer_init()){
                printf("Could not init timer\n");
                return(0);      /* Exit in most systems, but not ours! */
        }
        if(argc > 1)
                prepend = argv[1];
        open_scan_test();
        seek_test();
        rw_test();
}
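
The file names that maketemp() builds can be hard to picture from the code
alone. The following is a minimal stand-alone sketch of just the suffix logic;
temp_name() is a hypothetical name, not part of diskperf.c, and it uses
strcpy() where the original copies by hand. Each of the three letters encodes
one 4-bit nibble of the call counter, so the letters run from A to P.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical stand-alone version of the suffix logic in maketemp():
 * append three letters, one per 4-bit nibble of `fnum`, to "diskperf".
 * `buf` must hold at least 12 bytes (8 + 3 letters + NUL).
 */
void temp_name(char *buf, int fnum)
{
        char *q;

        assert(buf != NULL);
        strcpy(buf, "diskperf");
        q = buf + strlen(buf);
        *q++ = 'A' + ((fnum >> 8) & 0xf);   /* high nibble */
        *q++ = 'A' + ((fnum >> 4) & 0xf);   /* middle nibble */
        *q++ = 'A' + (fnum & 0xf);          /* low nibble */
        *q = '\0';
}
```

Counter values 0, 1 and 16 give diskperfAAA, diskperfAAB and diskperfABA.
Only 12 bits are used, so names repeat after 4096 calls; one run of the
benchmark makes far fewer than that. Each name is 11 characters plus the
terminator, which fits the 16-byte entries of tempname[].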

-- 
# Fred Fish, 1346 West 10th Place, Tempe, AZ 85281,  USA
# noao!nud!fishpond!fnf                   (602) 921-1113

lalonde@nicmad.UUCP (John Lalonde) (11/28/88)

In article <165@fishpond.UUCP> fnf@fishpond.UUCP (Fred Fish) writes:
>I decided to retry the disk performance benchmark that I ran in Feb '88 and
>posted the results for.
>Performance timings using Rick Spanbauer's diskperf.c program.
>
>					Amiga	Sun	A/UX	A/UX
>					2000	3/50	Mac-II	Mac-II
>					ST277N		HD80SC	HD80SC
>					Nov 88	?????	Feb 88	Nov 88
>
>File creations           (files/sec)	14	6	6	6
>File deletions           (files/sec)	41	11	8	8
>Directory scan         (entries/sec)	92	350	371	397
>Seek+read            (seek+read/sec)	85	298	110	93
>Read speed,    512 buffer (byte/sec)	67216	240499	55168	25593
>Read speed,   4096 buffer (byte/sec)	109226	234057	53708	25323
>Read speed,   8192 buffer (byte/sec)	187245	233189	54013	25183
>Read speed,  32768 buffer (byte/sec)	374491	236343	53644	25123
>Write speed,   512 buffer (byte/sec)	28187	215166	44181	43855
>Write speed,  4096 buffer (byte/sec)	137970	182466	47211	46287
>Write speed,  8192 buffer (byte/sec)	154202	179755	46832	46445
>Write speed, 32768 buffer (byte/sec)	218453	187580	46930	46707

I compiled diskperf.c and ran it on two different Sun 386i workstations.
I made no modifications to the benchmark and ran it on a stock version of
SunOS 4.0.0.

					Sun 386i (model 150/8x)
File creations           (files/sec)	18
File deletions           (files/sec)	54
Directory scan         (entries/sec)	976
Seek+read            (seek+read/sec)	1976
Read speed,    512 buffer (byte/sec)	776722
Read speed,   4096 buffer (byte/sec)	2995931
Read speed,   8192 buffer (byte/sec)	3534525
Read speed,  32768 buffer (byte/sec)	3700856
Write speed,   512 buffer (byte/sec)	280618
Write speed,  4096 buffer (byte/sec)	1728421
Write speed,  8192 buffer (byte/sec)	2069557
Write speed, 32768 buffer (byte/sec)	1787345

Notes:
	(1)	The Sun 386i model 150/8x uses a 20 MHz 80386 and has 8 MB of
		memory and 32 KB of cache.
	(2)	The Sun 386i was configured with a CDC Wren IV (94171-300) disk.
	(3)	The Sun 386i uses the Intel 82380 DMA controller.
	(4)	The Sun 386i uses the Western Digital 33C93 SCSI controller IC.
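
A caution when reading the write figures: diskperf.c's own comment concedes
that each write should be followed by fsync() to flush the cache, and the Unix
build only writes 3 MB per pass, which fits in 8 MB of memory. Some of the
numbers may therefore reflect the buffer cache more than the disk. The
following is a minimal sketch of a write timer that can include the flush,
assuming POSIX open(), fsync() and gettimeofday(); timed_write() is a
hypothetical name and is not part of the benchmark.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/time.h>
#include <unistd.h>

/*
 * Hypothetical helper: write `total` bytes to `path` in `bufsize` chunks
 * and return the elapsed time in microseconds, or -1 on error.  When
 * `do_sync` is nonzero, fsync() runs before the clock is stopped, so the
 * time includes pushing the data out of the buffer cache.
 */
long timed_write(const char *path, long total, long bufsize, int do_sync)
{
        static char buf[32768];
        struct timeval t0, t1;
        long left;
        int fd;

        assert(bufsize > 0 && bufsize <= (long)sizeof(buf));
        memset(buf, 0, sizeof(buf));
        if ((fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644)) < 0)
                return -1;
        gettimeofday(&t0, NULL);
        for (left = total; left >= bufsize; left -= bufsize)
                if (write(fd, buf, (size_t)bufsize) != bufsize) {
                        close(fd);
                        return -1;
                }
        if (do_sync && fsync(fd) != 0) {
                close(fd);
                return -1;
        }
        gettimeofday(&t1, NULL);
        close(fd);
        return (long)(t1.tv_sec - t0.tv_sec) * 1000000L
                + (long)(t1.tv_usec - t0.tv_usec);
}
```

Calling it twice on the same file, once with do_sync set to 0 and once with 1,
gives a rough idea of how much of a reported write rate is cache. It says
nothing about the read figures, which would need a file larger than memory.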



-- 
John LaLonde
Nicolet Instrument Corporation
uucp: {ucbvax,rutgers,harvard}!uwvax!astroatc!nicmad!lalonde

ggere@csi.3Com.Com (Gary M. Gere) (12/01/88)

I thought it would be interesting to throw in the disk performance
measurements from a DEC VAX-11/750 (roughly 0.75 MIPS) with 3 MB of memory,
an Emulex controller with DMA, and a Fujitsu M2351A "Eagle" disk (474 MB,
18 ms seek).


                               Amiga  Sun   A/UX   A/UX   4.3BSD  Sun
                               2000   3/50  Mac-II Mac-II VAX750  386i-150/8
                               ST277N       HD80SC HD80SC Fujitsu CDC Wren-IV
                               Nov 88 ????? Feb 88 Nov 88 "EAGLE"

File creations    (files/s)     14       6       6      6      13      18
File deletions    (files/s)     41      11       8      8      14      54
Directory scan  (entries/s)     92     350     371    397     306     976
Seek+read     (seek+read/s)     85     298     110     93     333    1976
Read    512 buffer (byte/s)  67216  240499   55168  25593  180997  776722
Read   4096 buffer (byte/s) 109226  234057   53708  25323  400729 2995931
Read   8192 buffer (byte/s) 187245  233189   54013  25183  440578 3534525
Read  32768 buffer (byte/s) 374491  236343   53644  25123  499321 3700856
Write   512 buffer (byte/s)  28187  215166   44181  43855  123847  280618
Write  4096 buffer (byte/s) 137970  182466   47211  46287  402782 1728421
Write  8192 buffer (byte/s) 154202  179755   46832  46445  489226 2069557
Write 32768 buffer (byte/s) 218453  187580   46930  46707  503316 1787345
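
Every figure in this table comes out of the same arithmetic in diskperf.c:
work done, times TIMER_RATE, divided by the elapsed ticks (averaged over
RW_ITER passes for the read/write rows). That division has no guard, so a run
that finishes in less than one averaged tick would divide by zero. The
following is a minimal sketch of the calculation with that case handled;
rate_per_sec() is a hypothetical helper, not something the benchmark contains.

```c
#include <assert.h>

/*
 * Hypothetical helper: convert a tick count into a per-second rate.
 * `units` is the work done per pass (bytes, files, seeks), `ticks` is the
 * time accumulated over `iters` passes, and `timer_rate` is ticks per
 * second.  Returns -1 when the averaged time is too short to measure.
 */
long rate_per_sec(long units, long ticks, long iters, long timer_rate)
{
        long avg;

        assert(iters > 0 && timer_rate > 0);
        avg = ticks / iters;            /* same averaging as rw_test() */
        if (avg <= 0)
                return -1;              /* too fast for this timer */
        return (timer_rate * units) / avg;
}
```

With the Unix settings (RW_SIZE of 3 MB, RW_ITER of 3, TIMER_RATE of 100),
300 accumulated ticks average out to one second per pass, so the helper
returns 3145728 bytes/sec. The integer division also means that short runs
are quantized coarsely, which is worth keeping in mind for the fastest
machines in the table.
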
-- 

===============================================================================
CSI Division, 3Com Corporation, 2125 Hamilton Avenue San Jose, Calif 95125-5905
Gary M. Gere {3comvax,epimass}!csi!ggere or ggere@csi.3Com.Com  +1 408/559-1118