[comp.unix.microport] *nix performance

blom@cs.vu.nl (Blom M L) (09/28/88)

Hi netland!

In due time I will be involved in the purchase of a computer which should run
some kind of UNIX.
This computer will be used by about 7 people simultaneously, where only one
will run really heavy jobs.

The point is, I don't know anything about the performances of 386-machines
with some kind of *IX compared to mini-computers like VAX-es etc.

Alas, my question cannot be put very precisely, but could anyone of you:

	-give me info/pointers to magazines about relative performances
	 of different UNIX-es
	-give me info/pointers to magazines about relative performances
	 of machines running some *IX.

One of my main interests is whether a fast (25MHz with cache etc) 386 with
some users attached to it can compare itself with a VAX/750 or other minis.

Furthermore, which UNIX-es can work with MessDos? I know Xenix does, and
Unisys advertises some *IX which should be able to do it, I believe.
Does anyone know whether this really works?

I would be very grateful for any data.

And eeeer..... if my English is bad it is because I come from this country
where everybody believes they're speaking English without any accent
.............. if my questions are bad/already answered/provoking, just
forgive me my dumbness! :-)

Please don't let this grow into a this-is-much-better-than-what-you-said-war,
like vi-emacs.
So response by email seems best.

		Thancz!

			Lennert Blom.

[Yes, the Netherlands, in case you wondered.]

davidsen@steinmetz.ge.com (William E. Davidsen Jr) (09/30/88)

In article <1428@draak.cs.vu.nl> blom@cs.vu.nl (Blom M L) writes:

| One of my main interests is whether a fast (25MHz with cache etc) 386 with
| some users attached to it can compare itself with a VAX/750 or other minis.

  Benchmarks (both mine and others) seem to show that a 20MHz 386 is
about 3:1 faster than an 11/780. I guess that compares to a 750
somehow... the F.P. performance is more like 5:4 faster, but very few
programs are as F.P. intensive as they seem.

  You will probably want to use a "smart" serial card to take the load
off the CPU when interrupts come in, and Xenix comes with device drivers
for several.

  When I measured the performance of programs compiled on Xenix386 and
IX/386, I found no cases where Xenix was slower. I found a few cases in
which the Xenix compiler got into a loop and required hand
simplification of an expression (out of hundreds of programs) and about
15 cases where the ix/386 compiler either core dumped or tried to talk
the assembler into using "register 25". Both have been upgraded since
then, so I don't know if this is still representative.
-- 
	bill davidsen		(wedu@ge-crd.arpa)
  {uunet | philabs}!steinmetz!crdos1!davidsen
"Stupidity, like virtue, is its own reward" -me

rcd@ico.ISC.COM (Dick Dunn) (10/01/88)

> | One of my main interests is whether a fast (25MHz with cache etc) 386 with
> | some users attached to it can compare itself with a VAX/750 or other minis.
> 
>   Benchmarks (both mine and others) seem to show that a 20MHz 386 is
> about 3:1 faster than an 11/780...

Davidsen (>) mentioned a smart serial card to reduce the interrupt-per-char 
cost--a good idea, since that's a real CPU-waster.

Also, think very carefully about what sort of disk(s) you're going to put
on the 386 box.  The CPU itself has plenty of horsepower, but if you're
sitting around waiting for a single slow disk waving its head here, there,
and everywhere, all the CPU performance in Silicon Valley won't help.  The
importance of disk latency is more pronounced in a multi-user system
because you're more often servicing multiple requests at very different
places on the disk.
-- 
Dick Dunn      UUCP: {ncar,nbires}!ico!rcd           (303)449-2870
   ...A friend of the devil is a friend of mine.

cdold@starfish.Convergent.COM (Clarence Dold) (10/03/88)

From article <9902@ico.ISC.COM>, by rcd@ico.ISC.COM (Dick Dunn):
>> | One of my main interests is whether a fast (25MHz with cache etc) 386 with
>> | some users attached to it can compare itself with a VAX/750 or other minis.
> 
> Also, think very carefully about what sort of disk(s) you're going to put
This also harks back to the MAC - vs - PC discussion of a particular DMA
not being as fast as CPU data transfer.  A DMA activity does take some setup
time.  If the system (single-tasking) is going to be idle until the disk
transfer is complete, -and- the CPU is fast enough to handle disk data on the
fly, -then- CPU will be faster than DMA.
As soon as you talk multi-user, that argument goes away, because the CPU could
be working on a different process, while the DMA occurs offline.
Multi-user performance is distinctly different from single user.
Do buy the fastest disk you can.  Don't forget that with the proper controller,
two slower disks might actually be faster in a multi-user setup.
-- 
---
Clarence A Dold - cdold@starfish.Convergent.COM		(408) 435-5274
		...pyramid!ctnews!mitisft!professo!dold
		P.O.Box 6685, San Jose, CA 95150-6685

sl@van-bc.UUCP (pri=-10 Stuart Lynne) (10/03/88)

In article <736@starfish.Convergent.COM> cdold@starfish.Convergent.COM (Clarence Dold) writes:
>From article <9902@ico.ISC.COM>, by rcd@ico.ISC.COM (Dick Dunn):
>>> | One of my main interests is whether a fast (25MHz with cache etc) 386 with
>>> | some users attached to it can compare itself with a VAX/750 or other minis.
>> 
>> Also, think very carefully about what sort of disk(s) you're going to put
>This also harks back to the MAC - vs - PC discussion of a particular DMA
>not being as fast as CPU data transfer.  A DMA activity does take some setup
>time.  If the system (single-tasking) is going to be idle until the disk
>transfer is complete, -and- the CPU is fast enough to handle disk data on the
>fly, -then- CPU will be faster than DMA.
>As soon as you talk multi-user, that argument goes away, because the CPU could
>be working on a different process, while the DMA occurs offline.
>Multi-user performance is distinctly different from single user.
>Do buy the fastest disk you can.  Don't forget that with the proper controller,
>two slower disks might actually be faster in a multi-user setup.

Not necessarily. If the DMA channel takes over the bus for the duration of
the transfer, or if each word transferred takes a larger number of cycles
than the CPU would and the system can't interleave processor cycles, then
CPU is still a win.

In both of those situations the CPU can't perform as much work during the
DMA operation, so it *may* be more efficient to allow the CPU to do it.

Well designed DMA systems get around this by allowing the DMA to operate in
an interleaved fashion with the CPU. In these systems you can sometimes beat
DMA with CPU but at the expense of burning CPU cycles and you may wish to
use DMA simply to allow more processing at the expense of increased data
transfer time.


-- 
Stuart.Lynne@wimsey.bc.ca {ubc-cs,uunet}!van-bc!sl     Vancouver,BC,604-937-7532

johnl@ima.ima.isc.com (John R. Levine) (10/04/88)

In article <1901@van-bc.UUCP> sl@van-bc.UUCP (pri=-10 Stuart Lynne) writes:
>In article <736@starfish.Convergent.COM> cdold@starfish.Convergent.COM (Clarence Dold) writes:
>>This also harks back to the MAC - vs - PC discussion of a particular DMA
>>not being as fast as CPU data transfer.  ...
>>As soon as you talk multi-user, that argument goes away, because the CPU could
>>be working on a different process, while the DMA occurs offline.
>Not necessarily. If the DMA channel takes over the bus for the duration of
>the transfer, or if each word transferred takes a larger number of cycles
>than the CPU would and the system can't interleave processor cycles, then
>CPU is still a win.

On machines with a PC AT bus DMA is rarely a win. It takes so long to get and
release the bus that it's faster to buffer a chunk of disk in the controller,
then use a processor INS or OUTS instruction to blat the data at full speed.
The IBM hard disk controller does just this. Besides, in many cases the DMA
design is so marginal that multiple DMA devices plain don't work. A controller
with a full track buffer would probably be your best bet.

Or I suppose you could get a microchannel computer; my PS/2 has no trouble
DMA-ing full disk tracks in one revolution.
-- 
John R. Levine, IECC, PO Box 349, Cambridge MA 02238-0349, +1 617 492 3869
{ bbn | think | decvax | harvard | yale }!ima!johnl, Levine@YALE.something
Rome fell, Babylon fell, Scarsdale will have its turn.  -G. B. Shaw

jfh@rpp386.Dallas.TX.US (The Beach Bum) (10/04/88)

In article <1901@van-bc.UUCP> sl@van-bc.UUCP (pri=-10 Stuart Lynne) writes:
>Not necessarily. If the DMA channel takes over the bus for the duration of
>the transfer, or if each word transferred takes a larger number of cycles
>than the CPU would and the system can't interleave processor cycles, then
>CPU is still a win.

To put it very simply, if there are NO free bus cycles, then DMA may be
a loss.  Each DMA cycle is gained at the expense of a CPU cycle.
Presumably in this situation the CPU is executing fewer cycles doing
useful work than it is accessing the bus - in other words, the system
is bus limited.  [ Consider the case where a CPU requires two cycles to
execute an instruction which required four cycles to fetch. ]
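
The bracketed case can be put in a toy model (the 4-cycle fetch and
2-cycle execute figures come from the paragraph above; the assumption
that execution fully overlaps fetch, so throughput is set by fetch
bandwidth alone, is mine):

```python
# Toy model of a bus-limited CPU: instructions need 4 bus cycles to
# fetch but only 2 cycles to execute, so the bus is the bottleneck.
FETCH_BUS_CYCLES = 4

def instructions_completed(total_cycles, dma_cycles_stolen):
    # Every cycle the DMA controller steals comes straight out of
    # instruction-fetch bandwidth.
    bus_cycles = total_cycles - dma_cycles_stolen
    return bus_cycles // FETCH_BUS_CYCLES

# With no DMA, 1000 cycles buy 250 instructions.
assert instructions_completed(1000, 0) == 250
# Stealing 200 cycles for DMA costs 50 instructions outright.
assert instructions_completed(1000, 200) == 200
```

In other words, when the system is already bus limited, DMA cycles are
not "free" -- they trade one-for-one against instruction fetches.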

>In both of those situations the CPU can't perform as much work during the
>DMA operation, so it *may* be more efficient to allow the CPU to do it.

My guess would be that it is at least no less efficient, modulo interrupt
and context switch overhead.  If the cost of fielding the presumed interrupt
and context switch is significant, then DMA is still a big win.

>Well designed DMA systems get around this by allowing the DMA to operate in
>an interleaved fashion with the CPU. In these systems you can sometimes beat
>DMA with CPU but at the expense of burning CPU cycles and you may wish to
>use DMA simply to allow more processing at the expense of increased data
>transfer time.

Well designed systems dual port their memories and have separate busses
for their I/O subsystem ;-)  Failing that, a dedicated region of I/O
memory which is seldom accessed by the CPU is another win.  Failing THAT
clever alternative, interleaved memory will do in a pinch.  I suppose the
point of all this is that a PC, no matter whether it is a PC, PC/XT, or
even a 386/AT, is not designed to handle large doses of DMA.

- John.
-- 
John F. Haugh II (jfh@rpp386.Dallas.TX.US)                   HASA, "S" Division

      "Why waste negative entropy on comments, when you could use the same
                   entropy to create bugs instead?" -- Steve Elias

keithe@tekgvs.GVS.TEK.COM (Keith Ericson) (10/05/88)

In article <2733@ima.ima.isc.com> johnl@ima.UUCP (John R. Levine) writes:
>
>On machines with a PC AT bus DMA is rarely a win. It takes so long to get and
>release the bus that it's faster to...

Corroboration:

We use Micom-Interlan NI5010 network interface boards around here.
Their instructions indicate NOT to use DMA on an AT-class machine
because a tight loop doing the data transfers is faster than the
AT's DMA...

keith

fyl@ssc.UUCP (Phil Hughes) (10/07/88)

In article <4032@tekgvs.GVS.TEK.COM>, keithe@tekgvs.GVS.TEK.COM (Keith Ericson) writes:
> In article <2733@ima.ima.isc.com> johnl@ima.UUCP (John R. Levine) writes:

> >On machines with a PC AT bus DMA is rarely a win. It takes so long to get and
> >release the bus that it's faster to...

> We use Micom-Interlan NI5010 network interface boards around here.
> Their instructions indicate NOT to use DMA on an AT-class machine
> because a tight loop doing the data transfers is faster than the
> AT's DMA...

I can't argue with this but personally I like to think I could use
the CPU to do something other than bus reads and writes (even though
some consider this about the limit for an Intel chip :-) ).

I have worked on fast mainframes.  Yes, the CPU can do things fast but DMA
I/O was used to free up the CPU.

Anyway, is DMA just slower when I have an idle CPU to do the transfer or
is there something magic that makes the CPU useless while DMA is running?
-- 
Phil Hughes, SSC, Inc. P.O. Box 55549, Seattle, WA 98155  (206)FOR-UNIX
    uw-beaver!tikal!ssc!fyl or uunet!pilchuck!ssc!fyl or attmail!ssc!fyl

scotty@l5comp.UUCP (Scott Turner) (10/09/88)

In article <7498@rpp386.Dallas.TX.US> jfh@rpp386.Dallas.TX.US (The Beach Bum) writes:
>In article <1901@van-bc.UUCP> sl@van-bc.UUCP (pri=-10 Stuart Lynne) writes:

In the above articles both authors raise several valid issues in the seemingly
endless debate over CPU vs DMA hard disk I/O.

But since we are supposedly discussing Unix systems there are a few factors
not discussed so far that play into this discussion.

1. The discussion so far has focused on the impact of CPU/DMA on a single
task. Depending on system factors several cases have been made for the
impact on the single task, but not about impact on other tasks.

2. Most "serious" unix computers now have local SRAM caches to enable
them to run at clock rates their main silicon memory can't attain.

Under item 1 I submit that even if DMA slows down the CPU, AND the I/O
operation, the CPU is still free (assuming a properly written kernel) to work
upon another task. If the end result is that the task awaiting I/O completion
has to wait an extra period of time but some other task(s) get to execute,
the task that got to wait may lose but the system as a whole is going to win.

The formula "The needs of the many outweigh the needs of the one" summarizes
this decision to use DMA quite nicely.
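
The trade can be put in rough numbers (a sketch with made-up slowdown
factors, not measurements):

```python
# Even if DMA slows both the CPU and the transfer itself, total useful
# work across tasks goes up, because another task runs during the I/O.

def cpu_copy(transfer_cycles):
    # CPU does the copy itself: the transfer is fast, but no other
    # task gets to run meanwhile.
    elapsed, other_work = transfer_cycles, 0
    return elapsed, other_work

def dma_copy(transfer_cycles, dma_slowdown=1.5, cpu_share=0.75):
    # DMA stretches the transfer and steals some CPU cycles, but the
    # remaining cycles go to other runnable tasks.
    elapsed = transfer_cycles * dma_slowdown
    other_work = elapsed * cpu_share
    return elapsed, other_work

e1, w1 = cpu_copy(1000)
e2, w2 = dma_copy(1000)
assert e2 > e1    # the I/O task waits longer...
assert w2 > w1    # ...but the system as a whole gets more done
```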

Under item 2 we may find that an increasing number of systems can allow DMA
operations to proceed without causing the CPU to hold for them very often.
In which case the many will benefit even more from DMA driven I/O.

I just hope AT386 designers keep this in mind when designing their caches.
(Compaq claims to have done so.)

Scott Turner
scotty@l5comp -or- uunet!l5comp!scotty

buck@siswat.UUCP (A. Lester Buck) (10/11/88)

In article <1488@ssc.UUCP>, fyl@ssc.UUCP (Phil Hughes) writes:
> In article <4032@tekgvs.GVS.TEK.COM>, keithe@tekgvs.GVS.TEK.COM (Keith Ericson) writes:
> > In article <2733@ima.ima.isc.com> johnl@ima.UUCP (John R. Levine) writes:
> 
> > >On machines with a PC AT bus DMA is rarely a win. It takes so long to get and
> > >release the bus that it's faster to...
> 
> > We use Micom-Interlan NI5010 network interface boards around here.
> > Their instructions indicate NOT to use DMA on an AT-class machine
> > because a tight loop doing the data transfers is faster than the
> > AT's DMA...
> 
> Anyway, is DMA just slower when I have an idle CPU to do the transfer or
> is there something magic that makes the CPU useless while DMA is running?

The PC BIOS uses DMA to do disk I/O because the DMA chip can do it
faster than the minimum loop of 8088 instructions.

The AT BIOS uses INS/OUTS instructions because they are (slightly) faster
than the equivalent DMA transfers.  As the 80186/88 hardware reference
states "The INS and OUTS instructions move a string of bytes or words
at bus bandwidth speed between memory and an I/O port.  This is
essentially a DMA transfer in one in-line instruction."
Of course, the AT BIOS has nothing better to be doing during the
transfer.

As far as the DMA subsystem capabilities, they are described in
the Intel data sheet for the 8237A (Microprocessor and Peripheral
Handbook, Vol. 1) and a few minor implementation details in the
AT Technical Reference.  From the data sheet, there are four
modes of DMA service: (1) single transfer, (2) block transfer,
(3) demand transfer, and (4) cascade mode.  On the AT,
channel 4 on the second 8237A cascades from the first chip.
In single transfer mode, the DMA chip transfers one byte, then
gives up the bus and must be restarted.  "In 8080A, 8085H, 8088,
or 8086 systems, this will ensure one full machine cycle
execution between DMA transfers."  The block mode and demand mode
differ only in that block mode needs the DREQ request line only
active at the beginning, while demand mode will save its state
and give up the bus when DREQ goes off, then pick up where it
left off when it is reasserted.  Both of these modes continue
transferring until an internal programmed count is exhausted.
For the demand mode, "Thus transfers may continue until the I/O
device has exhausted its data capacity.  After the I/O device
has had a chance to catch up, the DMA service is re-established
by means of a DREQ."
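
The modes from the data sheet can be sketched in a simplified model (my
own abstraction, not real 8237A register programming; cascade mode is
omitted):

```python
# Each function returns a list of bytes moved per bus grant, modeling
# how often the controller gives the bus back to the CPU.

def single_mode(count):
    # One byte per grant; the chip releases the bus after each byte,
    # guaranteeing the CPU a machine cycle between transfers.
    return [1] * count

def block_mode(count):
    # DREQ needed only at the start; the whole programmed count moves
    # in one bus grant.
    return [count]

def demand_mode(count, dreq_bursts):
    # Transfers while DREQ stays asserted; saves state and gives up
    # the bus when the device deasserts DREQ, resuming on the next one.
    grants, moved = [], 0
    for burst in dreq_bursts:
        n = min(burst, count - moved)
        grants.append(n)
        moved += n
        if moved == count:
            break
    return grants

assert single_mode(4) == [1, 1, 1, 1]
assert block_mode(4) == [4]
# Device absorbs 3 bytes, catches up, and the transfer resumes.
assert demand_mode(8, [3, 3, 3]) == [3, 3, 2]
```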

There are several ways to implement the DMA hardware on the I/O
device, and I don't have a broad experience in this area.  One
interesting card I own is an Emulex IB02 SCSI Host Adapter
for the PC (8-bit) bus.  (This card is not very popular and
a bit old and expensive, but it uses a good SCSI interface chip and is
very flexible with switches for DMA channels, DMA modes, IRQ line,
etc.)  The possible selections for DMA modes are: single byte,
4-byte demand, and 8-byte demand.  Hardware on the card gives
up the bus after that many bytes in the demand modes.  Of course,
the driver has to program the 8237A for the matching mode.
Recommended is single-byte for all PC's, 8-byte demand for AT's
running DOS, and 4-byte demand for AT multitasking OS's.  This allows
more CPU processing during DMA transfers and keeps interrupt latency
lower.

-- 
A. Lester Buck		...!uunet!nuchat!moray!siswat!buck

koll@ernie.NECAM.COM (Michael Goldman) (10/19/88)

As I was saying (I'm just getting used to posting) Any manufacturer
trying to run the DMA chip above 5 MHz risks frying the chip, and
part of the motherboard.  With all these cpus going along at 25 MHz
it is faster to use the cpu.  Before dumping on IBM for using such
a dumb chip, recall that the original PC came with a cassette port
and only 64K on the mother board.  Who needs DMA in that environment ?

This is one more reason to go to the new Microchannel architecture which
has good DMA support and very nice chips.  There are some other problems
with DMA on the PC. One is that DOS is not re-entrant and so you have
to VERRRY Carefully save the state with any program that uses interrupts
which is implicit in any reasonable application with DMA.  With all the
yo-yos trying to be the next Mitch Kapor, IBM wisely left out helping
anyone write DMA programs, for fear of having every one try to save a
few usecs and crashing DOS.  The string transfer assembly instructions
on the 80x86 are as fast as DMA anyway at comparable clock speeds.  In
a no wait-state system there's no real advantage to DMA for single-
threaded OS's like DOS, which is probably why IBM waited to have the
386 in a new bus with a new multi-threaded OS and new DMA chips.
So now one process can wait for a file transfer using DMA while another
process can execute.  This implies that the developers can intelligently
use the DMA chips (don't hold your breath - the operant philosophy
seems to be " If the PC is cheap then I don't have to pay the
programmers much either. " and we get what they pay for (I'm not
bitter, not ME !)).  Finally, recall that the 8088 was still
trying to maintain some compatibility with 8080s and a lot of the
support chips out there at the time hadn't caught up.  The 80386
is what Intel should have designed long ago if they had seen the
future, and now it has good support chips. (Not dumping on Intel,
hindsight is 20-20, and densities didn't allow much earlier.)

Regards,
Michael Goldman

brian@umbc3.UMD.EDU (Brian Cuthie) (10/21/88)

In article <168@ernie.NECAM.COM> koll@ernie.NECAM.COM (Michael Goldman) writes:
>
>As I was saying (I'm just getting used to posting) Any manufacturer
>trying to run the DMA chip above 5 MHz risks frying the chip, and
>part of the motherboard.  With all these cpus going along at 25 MHz
                                              ^^^^^^^^^^^^^^^ what !?

>it is faster to use the cpu.  Before dumping on IBM for using such
>a dumb chip, recall that the original PC came with a cassette port
>and only 64K on the mother board.  Who needs DMA in that environment ?
>
>This is one more reason to go to the new Microchannel architecture which
>has good DMA support and very nice chips.  There are some other problems
>with DMA on the PC. One is that DOS is not re-entrant and so you have
>to VERRRY Carefully save the state with any program that uses interrupts
>which is implicit in any reasonable application with DMA.  With all the

WHAT !?  DMA and interrupts are COMPLETELY UNRELATED.  DMA places the 
processor in a HOLD state while the transfer takes place.  This locks
out even interrupts.  There is ABSOLUTELY no necessity to save any context
while doing DMA.  Besides, I know what re-entrant instructions are (and
besides, they're "restartable instructions", but that's a different point), but
what the !%^%@ is a re-entrant operating system.  Can you name one ?? I bet
not.

>yo-yos trying to be the next Mitch Kapor, IBM wisely left out helping
>anyone write DMA programs, for fear of having every one try to save a
>few usecs and crashing DOS.  The string transfer assembly instructions
>on the 80x86 are as fast as DMA anyway at comparable clock speeds.  IN
>a no wait-state system there's no real advantage to DMA for single
>threaded OS's like DOS, which is probably why IBM waited to have the
>386 in a new bus with a new multi-threaded OS an new DMA chips.

The problem with DMA on the PC is simple.  DMA channel 0 is programmed to
periodically paw through RAM to effect a refresh.  Since NOTHING can
interrupt a DMA in progress (including another higher priority DMA request)
burst mode DMA transfers, which would be significantly faster than CPU
transfers could EVER be, would lock out the channel 0 refresh for too long.
Thus DMAs are limited to single byte transfers.  Since each byte then has to 
place the processor into a HOLD state, and this takes some time, it turns
out to be faster to do processor string moves.

Keep in mind that DMA is significantly faster than CPU transfers, even with
caching, because the DMA chip places the memory address on the bus and then 
asserts the READ or WRITE line while simultaneously asserting the DMA ACK line.
Since the peripheral requesting DMA is well aware of who he/she is and knows 
that if the memory WRITE line is asserted it must be a peripheral READ (and
vice versa) the transfer takes place in exactly ONE memory cycle.  Observe
that this would be twice as fast as the CPU since it requires, at best, one
cycle to read the byte from the peripheral and one cycle to write it
to memory.  Of course the above argument holds for 16 or 32 bit words also,
so long as the memory, peripheral and DMA controller are all willing
to participate.

>So now one process can wait for a file transfer using DMA while another
>process can execute.  This implies that the developers can intelligently

Well this sounds better than it often is, since the CPU must sit by and
wait for the DMA to complete anyway.

>use the DMA chips (don't hold your breath - the operant philosophy
>seems to be " If the PC is cheap then I don't have to pay the
>programmers much either. " and we get what they pay for (I'm not
>bitter, not ME !)).  Finally, recall that the 8088 was still
>trying to maintain some compatibility with 8080s and a lot of the
>support chips out there at the time hadn't caught up.  The 80386
>is what Intel should have designed long ago if they had seen the
>future, and now it has good support chips. (Not dumping on Intel,

Intel would have been more than happy to have designed the 80386 years
ago (and in fact that's when they started the design) had the technology
been affordable.  What do you think has kept the 80486 so long?  It hasn't
been a lack of market demand.

>hindsight is 20-20, and densities didn't allow much earlier.)

Bingo

>
>Regards,
>Michael Goldman


Brian Cuthie
Consultant
Columbia, MD 21046
(301) 381 - 1718

Internet:	brian@umbc3.umd.edu
Usenet:		...uunet!umbc3!cbw1!brian

koll@ernie.NECAM.COM (Michael Goldman) (10/21/88)

This is a reply to points Brian Cuthie made on my reply to some
questions about DMA.   

We seem to have a different background - I'm more software and you seem
to be more hardware-oriented.  I yield (in general) to your hardware
knowledge so perhaps you can educate me out of some ideas I have picked
up over the years.

Points in order:
   1.  I have seen 80386's advertised at 25 MHz, so I don't know why
you underscore that with a question mark.  Since they are all non-IBM
clones, I have presumed they used the basic AT motherboard with the
same DMA chip from 1981, which I have been told by someone who designs
and builds PC I/O boards for a living has never been built to run over
5 MHz, since the design isn't worth continuing.
   2.  My understanding of why I as a software engineer use DMA is
that they are useful to make I/O a non-blocking process.  Eg, I want
to dump some stuff to tape but I don't want to halt my program to wait
for it to complete.  So, I set up the DMA to some memory-mapped I/O
board and then go on with other processing.  The DMA process takes
EVERY OTHER CYCLE ( which the cpu often can't use) while the cpu
continues on with other work.  When the I/O board's buffer is full, it
refuses further DMA transfers (or else I've set up the DMA to
transfer only N bytes, depending on the architecture).  The DMA
chip generates an interrupt informing me that the transfer is completed,
and I either restart it if I want more transfers, or do something based
on my knowledge that my I/O is done.  In practice, one program usually
has to block on I/O anyway so that is a good time for the OS (like UNIX)
to suspend it and give the cpu to another program while the DMA runs on
EVERY OTHER CYCLE.  The way one does this is usually based on interrupts
from the DMA or I/O board.
   3.  My understanding of the reason DMA isn't used much on the PC is
that to get pass-through mode going on the PC's DMA requires that it
do one read, transfer the byte to its internal write buffer, and then
write it to memory.  Thus two cycles, which is why I said the string moves
were as fast, since the 8088 cpu has internal look-ahead cache of
several bytes and so can overlap read and write somewhat.  Since the
I/O pins are multiplexed  this may not mean much in practice.
   4.  We have some confusion about the word re-entrancy.
My definition of it applies only to programs, not to individual hardware
instructions.  Here's an example of what I mean.  A utility (say a
terminal I/O routine) is used by a lot of other programs.  Rather than
make a copy of it for each user, or block all users but one from using
it at a time, the utility is made re-entrant.  That is program A enters
it, writes a byte to a terminal and while it is waiting for the byte
to go out the port, the OS allows program B to enter the utility.
Obviously, B can't be allowed to trash the registers, I/O address, etc.,
of A, so the utility (or OS) saves the state of A in some buffer, allows
B to process long enough to do something useful (based on time, or I/O)
and then saves B's registers, etc., restoring A's registers etc., to
allow A to continue sending, or receiving.  The reason interrupts
figure in all this is that usually (but not necessarily) this
re-entrancy is provided at intervals signaled by a clock interrupt,
or an I/O interrupt.  DOS was not written to provide re-entrancy, so to
provide it oneself, one must save the current segment registers in a
very precise and careful way, reset the stack and other segment
registers in an equally careful way (the DOS stack will only work
for you 95% of the time) and then reverse the process upon exiting.
The first time I did this, it took me many late night weeks, but after
the first time it takes a day at worst.
   5. The 8086 came out the same year as the 68000.  Both companies had
the same technologies to work with but Motorola chose a linear address
space, and general purpose address and data registers.  Programs written
for the 68000 can run on a 68030 without change or impinging on a
program designed for a 68030.  To run an 8086 program on an 80286 (which
does not have a linear address space) requires major effort and you
can only run one 8086 program with 80286 programs running in protected
mode.  Motorola had to implement some hardware instructions in software
traps for a little while, but a software engineer didn't have to worry
about it and life was good and stayed good.  The 80286 was Intel's idea
of what to do next which shows that they hadn't picked up on the linear
address space and general purpose registers until later.  There's
usually a trade-off in microcode between time and space, so they
might have traded more cycles for a cleaner instruction set.
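
Point 2 above (DMA taking every other cycle while the CPU continues) can
be sketched as a toy timeline (the every-other-cycle claim is from the
point above; the byte and cycle counts are assumed):

```python
# Odd cycles go to the CPU, even cycles to the DMA channel until the
# transfer is done, so I/O completes while the CPU keeps computing at
# half rate at worst.

def interleave(transfer_bytes, total_cycles):
    dma_done = cpu_work = 0
    for cycle in range(total_cycles):
        if cycle % 2 == 0 and dma_done < transfer_bytes:
            dma_done += 1            # DMA steals this cycle
        else:
            cpu_work += 1            # CPU keeps running
    return dma_done, cpu_work

done, work = interleave(transfer_bytes=100, total_cycles=300)
assert done == 100    # the transfer finishes...
assert work == 200    # ...and the CPU still got 2/3 of the cycles
```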

Regards,
Michael Goldman

wes@obie.UUCP (Barnacle Wes) (10/24/88)

In article <1279@umbc3.UMD.EDU>, brian@umbc3.UMD.EDU (Brian Cuthie) writes:
> ....  Besides, I know what re-entrant instructions are (and
> besides, they're "restartable instructions", but that's a different point),

OK, what are re-entrant instructions?  I know what re-startable
instructions are, and they have NOTHING to do with re-entrant CODE!

> but what the !%^%@ is a re-entrant operating system.

A re-entrant operating system would be one that is implemented with all
of the system calls being re-entrant.  This means, for instance, that
while one task has blocked on a write() call that must wait for the i/o
operation to complete, another task may call write() without crashing
the system.  MS-DOS is not re-entrant; if you have two calls to a DOS
system call active at once, the system will not be able to return to the
first program that made the call.  This is why networks for the IBM PC
have to replace most of the operating system (on the server, at least) -
the server must be able to open files, etc., for more than one program
at a time.

> Can you name one??  I bet not.

Sure: Unix.  You lose.  Should I name more?  Minix.  VAX/VMS.  RT-11,
RSTS/E, RSX-11 for the PDP-11.  Turbodos and MP/M, to go back a ways in
the micro world.  OS-9.  OS/2, for that matter.  Had enough?

> The problem with DMA on the PC is simple.  DMA channel 0 is programmed to
> periodically paw through RAM to effect a refresh.

This, of course, is not true on the PC/AT, which has dedicated memory
refresh hardware.

> Keep in mind that DMA is significantly faster than CPU transfers, even with
> caching, because the DMA chip places the memory address on the bus and then 
> asserts the READ or WRITE line while simultaneously asserting the DMA ACK
> line.

This also is not true on the PC/AT.  Quoting from "The IBM PC From the
Inside Out," Sargent & Shoemaker, p. 247:

	"The AT has dedicated DRAM refresh circuitry, which frees up DMA
	channel 0 for general use.  In fact, you can use channels 0 and
	1 to block move data within one 64-kilobyte RAM area while the
	80286 does something else.  However, since the 80286 moves blocks
	faster and can work with 16 bits at a time, this is not
	particularly useful.  In fact, the 80286 has string I/O
	instructions (rep insw and rep outsw) that transfer data between
	RAM and I/O faster than the 8237A's can, and the AT uses this
	feature to transfer 512-byte sectors to and from the hard disk
	controller."

> Since the peripheral requesting DMA is well aware of who he/she is and knows 
> that if the memory WRITE line is asserted it must be a peripheral READ (and
> vice versa) the transfer takes place in exactly ONE memory cycle.

Wrong again.  Again quoting from Sargent & Shoemaker, p. 244:

	"The initial byte transfer takes place in five clock periods,
	but subsequent transfers occur in three periods (630 nanoseconds
	on the PC)."

> Observe
> that this would be twice as fast as the CPU since it requires, at best, one
> cycle to read the byte from the peripheral and one cycle to write it
> to memory.  Of course the above argument holds for 16 or 32 bit words also,
> so long as the memory, peripheral and DMA controller are all willing
> to participate.

This, of course, does not apply to the '286 and '386 INS and OUTS
instructions.  These instructions move {bytes,words} between memory and
i/o ports in 5 cycles/{byte,word}.  The big performance win with this is
that the INS and OUTS instructions run at the processor speed, while the
8237 is restricted to a 5 MHz clock speed.
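
Back-of-the-envelope, using the figures above (the 8 MHz AT clock is an
assumption; the 8237A's three clocks per byte is from the quoted book):

```python
# Compare REP INSW throughput on a 286 against the 8237A DMA chip.

def insw_bytes_per_sec(cpu_hz, cycles_per_word=5):
    # 5 cycles per 16-bit word, running at full processor speed.
    return cpu_hz / cycles_per_word * 2

def dma_bytes_per_sec(dma_hz=5_000_000, clocks_per_byte=3):
    # The 8237A moves one byte per three of its own (5 MHz max) clocks.
    return dma_hz / clocks_per_byte

insw = insw_bytes_per_sec(8_000_000)   # assumed 8 MHz AT
dma = dma_bytes_per_sec()
assert insw == 3_200_000.0             # 3.2 MB/s for string I/O
assert dma < insw                      # the 8237A can't keep up
```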

Mr. Cuthie, you do not seem to be very knowledgeable about the PC
architecture.  Of course DMA sounds more "high performance" but the DMA
controller on the 286 and 286-based AT clones on the market right now is
pretty much useless, due to the incredibly slow speed it operates at.
Perhaps you should study more before taking somebody to task in a public
posting?!?

	Wes Peters

-- 
Copyright 1988 Wesley R. Peters.  Permission is granted to distribute this work
in its entirety as long as it is not modified in any way, and this copyright
remains intact.  No rights other than those expressed here are granted.
	"How do you make the boat go when there's no wind?"  -- Me