[net.arch] Timing loops

gnu@hoptoad.uucp (John Gilmore) (02/16/86)

In article <156@motatl.UUCP>, wayne@motatl.UUCP (R.W.McGee) writes:
> The use of software timing loops on an asynchronous
> microprocessor should be discouraged...Public floggings would provide
> a cure, but would be hard to implement.

People who design microprocessors, who don't want software to depend
on the timings of individual instructions in particular systems,
should provide a system-independent way to delay for a specified
amount of time.  We use whatever you give us, guys!

E.g. in meeting the recovery time of a particularly good USART chip
with a horrible bus interface, the Z8530, you need to wait 2.2us
between writes to it.  Give me a good way to wait 2.2us *without*
depending on instruction timing, and I'll consider your request.

PS:  if your answer is "add more chips", a lot of people will cheap
out and use "free" software timing loops.
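
For reference, the kind of loop people "cheap out" with looks like this in C. A minimal sketch; the LOOPS_PER_US constant is invented, and is exactly the system-dependent number (CPU, clock rate, compiler) that makes such loops fragile:

/* A classic software timing loop.  LOOPS_PER_US is a made-up constant,
 * valid only for one CPU, clock rate, and compiler; "volatile" keeps an
 * optimizer from deleting the loop, but the delay still rests entirely
 * on instruction timing. */
#define LOOPS_PER_US 4          /* assumption: measured for one system */

void delay_us(int us)
{
    volatile int i;

    for (i = us * LOOPS_PER_US; i > 0; i--)
        ;                       /* burn cycles */
}
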
-- 
John Gilmore  {sun,ptsfa,lll-crg,ihnp4}!hoptoad!gnu   jgilmore@lll-crg.arpa

mat@amdahl.UUCP (Mike Taylor) (02/17/86)

In article <530@hoptoad.uucp>, gnu@hoptoad.uucp (John Gilmore) writes:
> E.g. in meeting the recovery time of a particularly good USART chip
> with a horrible bus interface, the Z8530, you need to wait 2.2us
> between writes to it.  Give me a good way to wait 2.2us *without*
> depending on instruction timing, and I'll consider your request.

Well, on a *real* computer, you just set the TOD clock comparator for
now+2.2 us. and go do something useful while you wait.  Sorry, couldn't
resist.

-- 
Mike Taylor                        ...!{ihnp4,hplabs,amd,sun}!amdahl!mat

[ This may not reflect my opinion, let alone anyone else's.  ]

phil@amdcad.UUCP (Phil Ngai) (02/17/86)

In article <530@hoptoad.uucp> gnu@hoptoad.uucp (John Gilmore) writes:
>E.g. in meeting the recovery time of a particularly good USART chip
>with a horrible bus interface, the Z8530, you need to wait 2.2us
>between writes to it.  Give me a good way to wait 2.2us *without*
>depending on instruction timing, and I'll consider your request.

In a design I did with the 8530, the device selection logic made all
8530 cycles about 3 uS long with wait states. For the first 2.2 uS of
the cycle, the 8530 was actually not being accessed.  This guaranteed
the cycle recovery time needed. I had to use a PAL state machine to
assure another parameter (address set up time) and so this cycle
recovery time didn't cost anything extra, except the time it took me
to think it up.

I must admit part of my motivation for doing this was nightmares I had
of obscure bugs showing up because the programmer didn't bother to
read the specs carefully and violated the cycle recovery time (or
didn't even understand what cycle recovery time was).  In this example,
it was possible to idiot proof the hardware at no incremental cost.  I
imagine it is possible to come up with cases where it does cost more
but in my experience a sufficiently innovative design engineer can do
it at no or very low cost (extra pin on PAL).
-- 
 Real men don't have answering machines.

 Phil Ngai +1 408 749 5720
 UUCP: {ucbvax,decwrl,ihnp4,allegra}!amdcad!phil
 ARPA: amdcad!phil@decwrl.dec.com

davet@oakhill.UUCP (Dave Trissel) (02/17/86)

In article <530@hoptoad.uucp> gnu@hoptoad.uucp (John Gilmore) writes:
>
>People who design microprocessors, who don't want software to depend
>on the timings of individual instructions in particular systems,
>should provide a system-independent way to delay for a specified
>amount of time.  We use whatever you give us, guys!
>

I grew up with the early IBM 360 and its built-in interval timer.  Later
models and the 370's had a time of day clock as well.  I have often
yearned to have the same common system-independent timing facilities
in micros as well.  But how do you accomplish that without forcing
every system designer to hook up a constant frequency clock to every
microprocessor in the family?

Of course, the problem is that the basic clock frequency driving the chip
is variable depending on the system.  If we implemented an on-chip clock
or timer register, from where would it derive its frequency?  Having an
"adjust divisor" register set up by the system to factor the system clock
would just push the problem right back into the hands of the O.S. coders,
where it is now, since code somewhere would then have to set up the proper
divisor.
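
As a sketch of that "adjust divisor" scheme (the register name and crystal frequency below are invented), the O.S. side would boil down to one board-specific line, which is exactly the knowledge that merely changes hands:

#define CPU_CLOCK_HZ 16666666UL                 /* assumption: board-specific */

extern volatile unsigned long TIMER_DIVISOR;    /* hypothetical on-chip register */

void timer_init(void)
{
    /* make the on-chip timer tick once per microsecond */
    TIMER_DIVISOR = CPU_CLOCK_HZ / 1000000UL;
}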

We currently have one customer running the MC68020 at 15 megahertz, so we can't
assume that the "standard" test frequencies of 12.5 and 16.6666 will be used.
Customers are already preparing for the 20 and 25 megahertz versions, but
there is no way to know now what their exact frequencies will end up being.

If you or anyone else has any suggestions on how to do this, give a yell.

  --  Dave Trissel  Motorola Semiconductor, Austin, Texas
	{ihnp4,seismo}!ut-sally!oakhill!davet
[Sorry, BITNET, ARPANET etc. will not work as destinations from our mailer.]

bmw@aesat.UUCP (Bruce Walker) (02/17/86)

>.... in meeting the recovery time of a particularly good USART chip
>with a horrible bus interface, the Z8530, you need to wait 2.2us
>between writes to it.  Give me a good way to wait 2.2us *without*
>depending on instruction timing, and I'll consider your request.
>
>PS:  if your answer is "add more chips", a lot of people will cheap
>out and use "free" software timing loops.
>-- 
>John Gilmore  {sun,ptsfa,lll-crg,ihnp4}!hoptoad!gnu   jgilmore@lll-crg.arpa

You must be clocking your 8530 at 3 MHz.  The spec for Valid Access
Recovery Time is 6TcPC + 200 nS (130 nS for the 'A' part), where TcPC is
the bus clock cycle time.  At 4MHz you should wait a minimum of 1.7uS
and at 6MHz you only need to wait 1.2uS.
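
Spelling the formula out (a minimal sketch; the figures match those above):

/* Z8530 recovery time from the data-sheet formula: 6 x TcPC + 200 nS
 * (non-'A' part), where TcPC is one PCLK period. */
long recovery_ns(long pclk_hz)
{
    long tcpc_ns = 1000000000L / pclk_hz;
    return 6 * tcpc_ns + 200;
}
/* recovery_ns(3000000) = 2198 (~2.2uS), recovery_ns(4000000) = 1700,
 * recovery_ns(6000000) = 1196 (~1.2uS) */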

The kind of people that "cheap out" are the kind of people that cripple
their machines in a multitude of other subtle ways which are only
appropriate for closed-architecture "games machines".  Designers who
are creating machines with a future growth path would put in the extra
hardware (which only amounts to a small, registered PAL anyway).

Bruce Walker     {allegra,ihnp4,linus,decvax}!utzoo!aesat!bmw

"I'd feel a lot worse if I wasn't so heavily sedated." -- Spinal Tap

jack@boring.UUCP (02/17/86)

>
>E.g. in meeting the recovery time of a particularly good USART chip
>with a horrible bus interface, the Z8530, you need to wait 2.2us
>between writes to it.  Give me a good way to wait 2.2us *without*
>depending on instruction timing, and I'll consider your request.

Sorry, but this makes it a particularly *bad* USART chip, regardless
of any other features.
Imagine writing a device driver for it, finding out that the C compiler
generates such code that there's far more than 2.2us between writes, and
leaving the place. Then, two years later, the site gets a new C compiler
with a much better optimizer..........
-- 
	Jack Jansen, jack@mcvax.UUCP
	The shell is my oyster.

nather@utastro.UUCP (Ed Nather) (02/17/86)

In article <647@oakhill.UUCP>, davet@oakhill.UUCP (Dave Trissel) writes:
> Of course, the problem is that the basic clock frequency driving the chip
> is variable depending on the system.  If we implemented an on-chip clock
> or timer register, from where would it derive its frequency?  Having an
> "adjust divisor" register set up by the system to factor the system clock
> would just push the problem right back into the hands of the O.S. coders,
> where it is now, since code somewhere would then have to set up the proper
> divisor.
> 
> We currently have one customer running the MC68020 at 15 megahertz, so we can't
> assume that the "standard" test frequencies of 12.5 and 16.6666 will be used.
> Customers are already preparing for the 20 and 25 megahertz versions, but
> there is no way to know now what their exact frequencies will end up being.
> 
> If you or anyone else has any suggestions on how to do this give a yell.

Some years ago I was faced with the problem of "upgrading" to a faster mini
and wanted to use the same program for the "old" and "new" ones.  They were
enough different internally to require code to identify which was which, and
adapt accordingly.

I used a counting loop (once, on program start-up) to see whether the program
was running in the fast or slow machine, by checking to see how far it got in
a known amount of time.  In that case, I used an attached teletype machine
as a timer, since it took about 0.1 sec to print a character, and I watched
its "busy" flag in the counting loop.

I'm not proposing to put a TTY on a chip alongside the CPU (I doubt you can
do that ...) but rather a simple, independent (and not very accurate) timer
whose sole job would be to find out how fast the CPU clock is running.
Simple software could then set the proper value into an adjustable count-
down divider so a built-in timer, running off the divided CPU frequency,
would be practical.  The built-in timer need only be accurate enough to
choose among a set of (quantized) clock frequencies.
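
In C, with invented tty_putc()/tty_busy() primitives standing in for the teletype and its busy flag, the start-up test might look like:

extern void tty_putc(int c);    /* hypothetical: start printing one char */
extern int tty_busy(void);      /* hypothetical: 1 while the char prints */

long machine_speed(void)
{
    long count = 0;

    tty_putc('x');              /* takes ~0.1 second on the Teletype */
    while (tty_busy())
        count++;                /* how far do we get in that time? */
    return count;               /* large count => fast machine */
}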


-- 
Ed Nather
Astronomy Dept, U of Texas @ Austin
{allegra,ihnp4}!{noao,ut-sally}!utastro!nather
nather@astro.UTEXAS.EDU

campbell@sauron.UUCP (Mark Campbell) (02/17/86)

In article <530@hoptoad.uucp> gnu@hoptoad.uucp (John Gilmore) writes:
>In article <156@motatl.UUCP>, wayne@motatl.UUCP (R.W.McGee) writes:
>> The use of software timing loops on an asynchronous
>> microprocessor should be discouraged...Public floggings would provide
>> a cure, but would be hard to implement.
>
>People who design microprocessors, who don't want software to depend
>on the timings of individual instructions in particular systems,
>should provide a system-independent way to delay for a specified
>amount of time.  We use whatever you give us, guys!
>
>E.g. in meeting the recovery time of a particularly good USART chip
>with a horrible bus interface, the Z8530, you need to wait 2.2us
>between writes to it.  Give me a good way to wait 2.2us *without*
>depending on instruction timing, and I'll consider your request.
>
>PS:  if your answer is "add more chips", a lot of people will cheap
>out and use "free" software timing loops.
>-- 
>John Gilmore  {sun,ptsfa,lll-crg,ihnp4}!hoptoad!gnu   jgilmore@lll-crg.arpa

I was going to leave this one alone until I heard the H/W developers on the
other side of the wall giggling about it.

NCR has some damned good H/W engineers; and some of the best of these work in
my division.  These guys are so good that they actually let the software drive
the architecture of a machine (what I consider the theoretical ideal, which is
seldom obtained in the "real-world").  Of course, the term "drive" implies a
high level of cooperation; however these guys really listen and make an
effort to support those hardware features that we want.  Unfortunately, there
are often constraints that cause us to miss seeing eye to eye.  As an example...

Recently, we began work on a new machine.  The very first thing the H/W guys
did was obtain a copy of "All the Chips that Fit" (by Lyon and Skudlarek, of Sun)
and proclaim that we wouldn't make the mistakes that Sun made.  The major premise
of the paper was that there were many chips that were on unfriendly terms with
Unix; and that these chips caused a great deal of pain to a Unix implementation.

Unfortunately, management then stepped in and gave us a unit price that was
terrifyingly low.  At the next review of the H/W, we suddenly found that we
were getting many of those chips, or clones of those chips, that were specifically
mentioned in the paper.  We screamed, they screamed, etc.  After digging through
the manuals, however, we found that there was very little that could be done given
their stringent constraints.  A great example was our specification of what we
now call "the mythical 32-bit, low-powered CMOS, battery-backed binary counter".
I keep getting told that a certain BCD TOD chip is really fast.  That doesn't do
me a whole hell of a lot of good when I have to use 2 or 3 pages of conversion code
to support it.

The one thing we did insist upon, though, was glue logic to support those chips
that had timing problems.  Mr. Gilmore stated that microprocessor designers should
design system-independent ways of dealing with delays.  What should really happen
is that the chip manufacturers incorporate the delay logic within those chips.
The use of the term "particularly good" when referring to a device with a major
flaw such as this is a non sequitur.  Software delay loops not only cause poor
performance (due to race conditions, interrupt latency, etc.) during porting but
usually come back to haunt you a year or two later, when you switch to cheaper
alternative sources for the devices.
-- 

Mark Campbell    Phone: (803)-791-6697     E-Mail: !ncsu!ncrcae!sauron!campbell

cmt@myrias.UUCP (Chris Thomson) (02/18/86)

In article <2795@amdahl.uucp> Mike Taylor writes:
> Well, on a *real* computer, you just set the TOD clock comparator for
> now+2.2 us. and go do something useful while you wait.  Sorry, couldn't
> resist.

C'mon Mike!  Even a 5860 takes >5 us to context switch (twice).

jimb@amdcad.UUCP (Jim Budler) (02/19/86)

In article <6780@boring.UUCP> jack@mcvax.UUCP (Jack Jansen) writes:
>...
>Sorry, but this makes it a particularly *bad* USART chip, regardless
>of any other features.
>Imagine writing a device driver for it, finding out that the C compiler
>generates such code that there's far more than 2.2us between writes, and
>leaving the place. Then, two years later, the site gets a new C compiler
>with a much better optimizer..........

Sorry, but this sounds like a bad device driver to me, not a bad device.
Depending on a poor C compiler for timing is just as bad as depending
on any other non-portable 'feature' of a C compiler.  The device driver 
could be broken by a new C compiler in any of a few thousand other
ways.
-- 
 Jim Budler
 Advanced Micro Devices, Inc.
 (408) 749-5806
 Usenet: {ucbvax,decwrl,ihnp4,allegra,intelca}!amdcad!jimb
 Compuserve:	72415,1200

gnu@hoptoad.uucp (John Gilmore) (02/19/86)

In article <2795@amdahl.UUCP>, mat@amdahl.UUCP (Mike Taylor) writes:
> In article <530@hoptoad.uucp>, gnu@hoptoad.uucp (John Gilmore) writes:
> >                        Give me a good way to wait 2.2us *without*
> > depending on instruction timing, and I'll consider your request.
>
> Well, on a *real* computer, you just set the TOD clock comparator for
> now+2.2 us. and go do something useful while you wait.  Sorry, couldn't
> resist.

I didn't think you could do anything useful in 100 System/370
instructions anyway.  In fact, it probably takes more than that to set
the clock comparator (timer queues ya know).  Sorry, didn't resist.

Something like the System/370 TOD clock and comparator is the kind of
facility I was talking about, though: a standard, high precision clock
that doesn't change regardless of what system model you have.
-- 
John Gilmore  {sun,ptsfa,lll-crg,ihnp4}!hoptoad!gnu   jgilmore@lll-crg.arpa

rshepherd@euroies.UUCP (Roger Shepherd INMOS) (02/20/86)

Have a look at the transputer.  Apart from a bug in REV A devices, all
transputers (no matter what speed selection) run off a standard (5 or
25 MHz) clock frequency.  This is used to derive the standard comms link
speed and the processor clock.  The transputer's real time
clock/alarm runs at 1 tick per microsecond at high priority;
this means that the occam program below will look at (e.g.) a uart
once every 5 uS.  The accuracy achievable is quite good, as the interrupt
latency (time to go from low to high priority) is about 58 cycles worst
case (2.9 uS for a T414-20 - currently available parts are -12, so they
have 4.64 uS latency).

PRI PAR
  SEQ 
    ... initialisation
    WHILE polling
      SEQ
        TIME ? AFTER nextinstance
        ... poll uart or whatever
        nextinstance := nextinstance + 5 -- 1 tick per uS
  ... -- rest of system at low priority
-- 
Roger Shepherd, INMOS Ltd, Whitefriars, Lewins Mead, Bristol, BS1 2NP, UK
Tel: +44 272 290861
UUCP: ...!mcvax!euroies!rshepherd

mat@amdahl.UUCP (Mike Taylor) (02/20/86)

In article <221@myrias.UUCP>, cmt@myrias.UUCP (Chris Thomson) writes:
> In article <2795@amdahl.uucp> Mike Taylor writes:
> > Well, on a *real* computer, you just set the TOD clock comparator for
> > now+2.2 us. and go do something useful while you wait.  Sorry, couldn't
> > resist.
> 
> C'mon Mike!  Even a 5860 takes >5 us to context switch (twice).

Yes, but context switching isn't useful!  Actually, I was just trying to point
out that for timing like 2.2 us., regardless of how good your timing
facility is (S/370 architecturally has 244 picosecond resolution), you can't
ignore instruction timing.  Fielding the external interrupt when the
clock comparator "hits," even without a full context switch, will take quite
a few cycles to save registers, etc. before being able to do any useful
work.  What is even worse is the more or less unpredictable timing
delays due to cache effects (consistency, misses), to say nothing of
EC level changes (a cycle here or there...)  It is probably true to
say that you can't usefully time anything to a resolution better
than plus or minus 20 cycles (300 ns.) even on a machine which has good
timing facilities.
-- 
Mike Taylor                        ...!{ihnp4,hplabs,amd,sun}!amdahl!mat

[ This may not reflect my opinion, let alone anyone else's.  ]

andrew@aimmi.UUCP (Andrew Stewart) (02/21/86)

In article <2795@amdahl.UUCP> mat@amdahl.UUCP writes:

>In article <530@hoptoad.uucp>, gnu@hoptoad.uucp (John Gilmore) writes:
>> E.g. in meeting the recovery time of a particularly good USART chip
>> with a horrible bus interface, the Z8530, you need to wait 2.2us
>> between writes to it.  Give me a good way to wait 2.2us *without*
>> depending on instruction timing, and I'll consider your request.
>
>Well, on a *real* computer, you just set the TOD clock comparator for
>now+2.2 us. and go do something useful while you wait.  Sorry, couldn't
>resist.
>

On a *real* computer, you use the front panel keys. USARTS??? TOD clock???
Ha! (Remember the PDP-8, only bigger and better...)

	Andrew Stewart
-- 
-------------------------------------------
Andrew Stewart		 USENET:   ...!mcvax!ukc!aimmi!andrew

"My axioms just fell into a Klein bottle"

farren@well.UUCP (Mike Farren) (02/22/86)

In article <2817@amdahl.UUCP> mat@amdahl.UUCP (Mike Taylor) writes:
>(S/370 architecturally has 244 picosecond resolution)

   I admit to knowing little about the S/370, but a 4 GHz clock rate?
Can someone verify this, please?   I don't remember seeing any microwave
plumbing in a 370... :-)

-- 
           Mike Farren
           uucp: {your favorite backbone site}!hplabs!well!farren
           Fido: Sci-Fido, Fidonode 125/84, (415)655-0667

ka@hropus.UUCP (Kenneth Almquist) (02/22/86)

>> Sorry, but this makes it a particularly *bad* USART chip, regardless
>> of any other features.
>> Imagine writing a device driver for it, finding out that the C compiler
>> generates such code that there's far more than 2.2us between writes, and
>> leaving the place. Then, two years later, the site gets a new C compiler
>> with a much better optimizer..........	[JACK JANSEN]
>
> Sorry, but this sounds like a bad device driver to me, not a bad device.
> Depending on a poor C compiler for timing is just as bad as depending
> on any other non-portable 'feature' of a C compiler.  The device driver 
> could be broken by a new C compiler in any of a few thousand other
> ways.						[JIM BUDLER]

I don't like the idea of depending upon the C compiler for timing, but
what is the alternative?  Write the specific parts of the device driver
in assembly language?  This introduces maintenance problems of its own.

You frequently have to depend upon non-portable features of C when writing
device driver code, but of course device drivers are non-portable anyway.
				Kenneth Almquist
				ihnp4!houxm!hropus!ka	(official name)
				ihnp4!opus!ka		(shorter path)

jack@boring.uucp (Jack Jansen) (02/22/86)

In article <9645@amdcad.UUCP> jimb@amdcad.UUCP (Jim Budler) writes:
>In article <6780@boring.UUCP> I wrote:
>>...
>>Sorry, but this makes it a particularly *bad* USART chip, regardless
>>of any other features.
>>Imagine writing a device driver for it, finding out that the C compiler
>>generates such code that there's far more than 2.2us between writes, and
>>leaving the place. Then, two years later, the site gets a new C compiler
>>with a much better optimizer..........
>
>Sorry, but this sounds like a bad device driver to me, not a bad device.
>Depending on a poor C compiler for timing is just as bad as depending
>on any other non-portable 'feature' of a C compiler.  The device driver 
>could be broken by a new C compiler in any of a few thousand other
>ways.

The point here is that, even if you notice the small print in
the datasheet, you look at your driver and say "oh, there's
more than enough time in between writes", and forget about the
whole timing constraint in five minutes.

You're right that the driver could be broken in thousands of ways
by a new compiler, but this is *not* due to the driver, it is
due to the *device*.

There is no way you'll write a device driver that is guaranteed
to work with any C compiler, if you have to take care of timing
considerations (unless you're willing to pay the penalty of using
a *real* timer, of course).
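
For comparison, the "*real* timer" version, sketched against a hypothetical memory-mapped free-running counter: the delay now depends only on the counter's rate, not on whatever code the compiler generates.

#define TICKS_PER_US 1UL                    /* assumption: a 1 MHz counter */

extern volatile unsigned long *timer_reg;   /* hypothetical free-running counter */

void hw_delay_us(unsigned long us)
{
    unsigned long start = *timer_reg;

    while (*timer_reg - start < us * TICKS_PER_US)
        ;                                   /* unsigned math survives wraparound */
}
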
-- 
	Jack Jansen, jack@mcvax.UUCP
	The shell is my oyster.

jimb@amdcad.UUCP (Jim Budler) (02/22/86)

In article <295@hropus.UUCP> ka@hropus.UUCP (Kenneth Almquist) writes:
>>> Sorry, but this makes it a particularly *bad* USART chip, regardless
>>> of any other features.
>>> Imagine writing a device driver for it, finding out that the C compiler
>>> generates such code that there's far more than 2.2us between writes, and
>>> leaving the place. Then, two years later, the site gets a new C compiler
>>> with a much better optimizer..........	[JACK JANSEN]
>>
>> Sorry, but this sounds like a bad device driver to me, not a bad device.
>> Depending on a poor C compiler for timing is just as bad as depending
>> on any other non-portable 'feature' of a C compiler.  The device driver 
>> could be broken by a new C compiler in any of a few thousand other
>> ways.						[JIM BUDLER]
>
>I don't like the idea of depending upon the C compiler for timing, but
>what is the alternative?  Write the specific parts of the device driver
>in assembly language?  This introduces maintenance problems of its own.
>
>You frequently have to depend upon non-portable features of C when writing
>device driver code, but of course device drivers are non-portable anyway.

I guess I wasn't quite clear.  If whatever code you generated cannot
guarantee 2.2uS when run through the optimizer then you cannot say that
you have written a good device driver. I got another flame from somewhere
asking me how I thought it should be done without timing loops or
additional hardware.  I didn't say not to use a timing loop, but if
you are going to do software timing, DO software timing, i.e. put some real
code in there to GUARANTEE whatever time you want.

And yes, in a situation like this, trying to guarantee 2.2uS, I think
a short piece of assembly code to wait 3uS IS the answer.  And how much
of a maintenance problem can 5 or 6 lines of assembly code be?  I've
seen many cases like the Vax _doprnt in the Berkeley code and a few other
pieces of code with a couple of lines of in-line assembly code in them.
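
A sketch of what that might look like (the NOP count assumes the 120nS-per-NOP figure for a 16.67MHz 68020 quoted later in this thread, and the inline-assembly syntax is a modern GCC-ism, not anything from a 1986 compiler):

void wait_3us(void)
{
    int i;

    /* 25 x ~120nS NOPs >= 3uS at 16.67MHz; loop overhead only adds margin */
    for (i = 0; i < 25; i++)
        __asm__ __volatile__("nop");
}
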
-- 
 Jim Budler
 Advanced Micro Devices, Inc.
 (408) 749-5806
 Usenet: {ucbvax,decwrl,ihnp4,allegra,intelca}!amdcad!jimb
 Compuserve:	72415,1200

davet@oakhill.UUCP (Dave Trissel) (02/23/86)

In article <689@well.UUCP> farren@well.UUCP (Mike Farren) writes:

>>(S/370 architecturally has 244 picosecond resolution)
>
>   I admit to knowing little about the S/370, but a 4 GHz clock rate?
>Can someone verify this, please?   I don't remember seeing any microwave
>plumbing in a 370... :-)

Back when I was working with 370's as a systems programmer, the time of day
(TOD) clock was guaranteed to have a resolution finer than the
shortest possible instruction time.  In other words, you would always
get a unique value from the TOD clock even if you read it with back-to-back
Store Clock instructions.

What this indicated was that the clock resolution depended on the machine
model of 370.  The bottom-of-the-line 370s (370/25 if I remember correctly) were
so slow that a clock period of several microseconds would have sufficed.

  --  Dave Trissel  Motorola Austin
  {seismo,ihnp4}!ut-sally!im4u!oakhill!davet

cmt@myrias.UUCP (Chris Thomson) (02/24/86)

> >(S/370 architecturally has 244 picosecond resolution)
>    I admit to knowing little about the S/370, but a 4 GHz clock rate?

The 370 architecture has timers that are 64 bits wide, with the bit 12
positions from the right (low-order) end being 1 microsecond; the 12 bits
below it give the architectural resolution of 2^-12 microsecond, i.e. the
244 picoseconds quoted above.  It is model-dependent
how many of the low-order 12 bits actually count, as opposed to holding zero
values.  However, the timer resolution should be similar to instruction
execution time, since the Store Clock instruction is required to give
a different answer each time it is used, even on a multiple-CPU
configuration.  Current high-end 370 models have resolutions of a few
nanoseconds.

aglew@ccvaxa.UUCP (02/24/86)

>/* Written 11:14 pm  Feb 22, 1986 by mjs@sfsup.UUCP in ccvaxa:net.arch */
>As a kernel hacker, I would maintain that a device that requires a
>certain latency and neither rejects further commands nor signals an
>interrupt until it's ready is a botch.  Why patch software when the
>hardware CAN do it right?  Software is not the answer to hardware
>designer ineptitude.  Even if it has to be done at the board level,
>the proper choice is to add the hardware to disable access to the
>device until its latency period is over.

As an apprentice kernel hacker (well, not quite apprentice - I'm learning by 
doing) and an aspiring hardware designer, I respond that those nice features
you want in your devices are probably provided by firmware, which is a lot
cheaper than extra hardware, and that this firmware has to be programmed in
some language. You don't want to condemn firmware programmers to always 
working in assembly, do you?

I agree that devices interfacing to a large, multiuser, UNIX system should
be well behaved, but you don't necessarily want to pay that price in small
systems - hell, on small systems you can't talk so blithely about the board
level, boards are damned expensive. And even on large systems, there are 
devices that respond quickly enough, and for which you cannot afford the
extra delay provided by hardware lockouts, that direct control is necessary.

Andy "Krazy" Glew. Gould CSD-Urbana. 
USEnet: ...!ihnp4!uiucdcs!ccvaxa!aglew
ARPAnet: aglew@gswd-vms

jer@peora.UUCP (J. Eric Roskos) (02/24/86)

>    I admit to knowing little about the S/370, but a 4 GHz clock rate?
> Can someone verify this, please?   I don't remember seeing any microwave
> plumbing in a 370... :-)

Actually this involves a really interesting aspect of the nature of "time"
on a computer.  Suppose you have (hopefully without loss of generality :-))
a machine all of whose instructions take the same amount of time to execute.
Suppose it can execute 32,768 instructions per second.  Now, suppose you
have a clock that counts in 1/65536ths of a second.  Then, as far as you
are concerned, it's impossible to tell the clock is running that fast...
depending on when you began execution relative to the counter in the clock,
the low-order bit will always read the same (always zero or always one) every
time you look at it.

Although it's possibly not as obvious, the same thing happens if you don't
have such "round" numbers... if the timer is counting faster than your
CPU's basic cycle time (and if the CPU runs with a fixed-rate clock) then
the timer's counter will appear to be incremented by some constant
value, and there's no way to tell it's going faster than that.

So, you can replace the faster timer with a much slower one that increments
its counter by this integer, and no one will be able to tell the difference.
Of course, this assumes that the timer doesn't do anything else, e.g.,
control some external devices which rely on the faster clock rate.
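
A toy simulation of the point, with nothing machine-specific about it: a counter ticking twice per instruction and a counter bumped by two once per instruction are indistinguishable to code that can only sample once per instruction.

#include <stdio.h>

int main(void)
{
    long fast = 0, slow = 0;
    int insn;

    for (insn = 1; insn <= 5; insn++) {
        fast += 2 * 1;          /* two ticks of +1 between samples */
        slow += 1 * 2;          /* one tick of +2 between samples */
        printf("insn %d: fast=%ld slow=%ld\n", insn, fast, slow);
    }
    return 0;                   /* the two columns never differ */
}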

Actually this generalizes to an even more interesting idea, viz., that if
a CPU doesn't have any kind of reliable external clock to measure time
against, then if you stop the CPU's clock occasionally, or make it run
irregularly, the CPU's "idea" of time will be such that events external to
it that are happening at a constant rate will appear to the CPU to be
occurring irregularly.  So you get to experiment a little with the
relativistic nature of time this way.
-- 
UUCP: Ofc:  jer@peora.UUCP  Home: jer@jerpc.CCUR.UUCP  CCUR DNS: peora, pesnta
  US Mail:  MS 795; CONCURRENT Computer Corp. SDC; (A Perkin-Elmer Company)
	    2486 Sand Lake Road, Orlando, FL 32809-7642

mat@amdahl.UUCP (Mike Taylor) (02/24/86)

In article <689@well.UUCP>, farren@well.UUCP (Mike Farren) writes:
> In article <2817@amdahl.UUCP> mat@amdahl.UUCP (Mike Taylor) writes:
> >(S/370 architecturally has 244 picosecond resolution)
> 
>    I admit to knowing little about the S/370, but a 4 GHz clock rate?
> Can someone verify this, please?   I don't remember seeing any microwave
> plumbing in a 370... :-)
> 

Architecture, not necessarily implemented.  See S/370-XA Principles of
Operation, IBM pub# SA22-7085, pp. 4-20, 4-21.
-- 
Mike Taylor                        ...!{ihnp4,hplabs,amd,sun}!amdahl!mat

[ This may not reflect my opinion, let alone anyone else's.  ]

sher@rochester.UUCP (David Sher) (02/24/86)

Just to introduce a theoretical note:
Wouldn't an entirely self timed architecture avoid the issue of
software timing loops?  I would think that this would put the problem
where it belongs, in the hardware.  Of course you pay a certain factor
for self timing (I think it depends on the size of the chunks of hardware
that are self timed).  

Probably this has no relevance to current real machines but I'm an academic
anyway.
-- 
-David Sher
sher@rochester
seismo!rochester!sher

dick@ucsfcca.UUCP (Dick Karpinski) (02/26/86)

In article <6790@boring.UUCP> jack@mcvax.UUCP (Jack Jansen) writes:
>
>There is no way you'll write a device driver that is guaranteed
>to work with any C compiler, if you have to take care of timing
>considerations (unless you're willing to pay the penalty of using
>a *real* timer, of course).

I thought someone suggested a solution:  Build a timing loop, but
set its constant (how many cycles) using some other, possibly low
precision, timer to see how fast the loop is _today_ with this
compiler/clock/cpu-chip/whatever.  I _think_ that one can usually
count on those things remaining constant _during_ this run of the
program, i.e. between reboots of the OS.  I have heard of systems
which change their cpu clock on the fly, but you probably know
that when you write the device driver.  Is that enough?
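
Against a hypothetical coarse tick counter (bumped by, say, a line-frequency interrupt), that boot-time calibration might look like:

extern volatile long ticks;     /* hypothetical: incremented by a coarse timer */

long loops_per_tick;

void calibrate_delay(void)
{
    long t, count = 0;

    t = ticks;
    while (ticks == t)          /* wait for a tick boundary */
        ;
    t = ticks;
    while (ticks == t)          /* count loop passes through one full tick */
        count++;
    loops_per_tick = count;     /* holds for this run, per the caveat above */
}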

Dick
-- 

Dick Karpinski    Manager of Unix Services, UCSF Computer Center
UUCP: ...!ucbvax!ucsfcgl!cca.ucsf!dick   (415) 476-4529 (12-7)
BITNET: dick@ucsfcca   Compuserve: 70215,1277  Telemail: RKarpinski
USPS: U-76 UCSF, San Francisco, CA 94143

rb@ccivax.UUCP (rex ballard) (02/27/86)

What is really needed is a standard "time base" signal that is
independent of "Chip Clock" speed.  When Chip makers create a "wait X"
instruction where X is the time in microseconds, and put the necessary
loop/wait in the micro-code, I will be GLAD to stop using loops.  I have
always hated them (bursty DMA can blow timing too).  Something is
needed for that little window that is smaller than the RTC interrupt,
and bigger than a NOP or wait state.  Even a constant period NOP would
be nice.

With the 68020, even the old trick of running a big loop and timing
it against the RTC chip doesn't work (cache misses in shorter loops).

There are still a few places where very short timing intervals are
still needed, and have to be reasonably accurate.  Things like
hand-shaking, line-turnaround, device locking (where TAS won't do,
or needs to be done more than once).

The original point about using them for copy-protection schemes and
long trivial loops is valid.  Don't do it.

ron@brl-smoke.ARPA (Ron Natalie <ron>) (02/27/86)

> In article <2817@amdahl.UUCP> mat@amdahl.UUCP (Mike Taylor) writes:
> >(S/370 architecturally has 244 picosecond resolution)
> 
>    I admit to knowing little about the S/370, but a 4 GHz clock rate?
> Can someone verify this, please?   I don't remember seeing any microwave
> plumbing in a 370... :-)
> 

Who said anything about a 4GHz clock rate?  All he said was that the architecture
supports that resolution.  If the 370 time register were incremented by
one, it would be updated every 244 picoseconds.  However, each machine
in the line increments it by a somewhat larger number corresponding to the
speed of the clock on that processor.  Hence, the higher-order bits have
been consistent across 10 years of processors and will continue to be
so for quite a few years to come, I would expect.

-Ron

gnu@hoptoad.uucp (John Gilmore) (02/28/86)

In article <613@sauron.UUCP>, campbell@sauron.UUCP (Mark Campbell) writes:
> Recently, we began work on a new machine.  The very first thing the H/W
> guys did was obtain a copy of "All the Chips that Fit" (by Lyon and
> Skudlarek, of Sun) and proclaim that we wouldn't make the mistakes that
> Sun made.
> 
> Unfortunately, management then stepped in...
> 
> The one thing we did insist upon, though, was glue logic to support
> those chips that had timing problems.  Mr. Gilmore stated that
> microprocessor designers should design system-independent ways of
> dealing with delays.  What should really happen is that the chip
> manufacturers incorporate the delay logic within those chips.

Peripheral chips are driven by strobes from the outside world.  In
almost all cases (except tightly coupled coprocessor style chips), the
peripheral chip does not tell the CPU when it is done with a request;
the system designer is expected to have read the data sheet and set up
the right number of wait states and such.  This makes the peripheral
chip usable in many systems no matter whose CPU or bus you are using.

Given that piece of reality, how should chip manufacturers "incorporate
the delay logic within those chips"?  What should the chip do if it
gets a strobe and isn't ready for one?

I currently prefer the approach of having a status pin saying "ready"
or "not ready", which external hardware (or software) can use to avoid
strobing the chip when it is not ready.  However, pins are expensive,
so this only happens when there's a spare pin around.  If they're going
to add a few pins to make the chip nicer, I vote for a few other uses
first (like multiple address lines to avoid the write-the-address-then-
write-the-data approach, which falls down if you take an interrupt in
the middle).
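
In software terms the status-pin approach is simply poll-before-strobe; a minimal sketch, with invented port and bit names:

#define CHIP_READY 0x01                         /* hypothetical status bit */

extern volatile unsigned char *chip_status;     /* hypothetical status port */
extern volatile unsigned char *chip_data;       /* hypothetical data port */

void chip_write(unsigned char c)
{
    while (!(*chip_status & CHIP_READY))
        ;                                       /* never strobe an unready chip */
    *chip_data = c;
}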

System designers can nowadays use a PAL to generate the right number of
wait states, when a few years ago there was just a decoder and you were
stuck with whatever timing it could provide.  This does cost quite a bit
more than a few extra bytes of software, though.

> The use of the term "particularly good" when referring to a device with
> a major flaw such as this is a non sequitur.

The Z8530 *is* a particularly good chip.  I would rate it as one of the
best chips Zilog has designed.  Its bus interface has clearly shown up
as the weakest part of the chip though -- especially the vectored
interrupt "support".  If one of the seventy-leven companies which are
second sourcing it and putting their own part numbers on it would
instead fix the bus interface, I bet they'd get some sales.
-- 
John Gilmore  {sun,ptsfa,lll-crg,ihnp4}!hoptoad!gnu   jgilmore@lll-crg.arpa

marty@fritz.UUCP (Marty McFadden) (03/02/86)

I must agree that even well-documented code is still burdensome if
timing loops are needed.  I recently ported Unix* System V from the
68000 to the 68020 (16 2/3 MHz); unfortunately, there were a few timing
loops in the kernel that had to be changed.  (Talk
about finding a needle in a haystack!!)

*Unix is a trademark of Bell Laboratories

					Martin J. McFadden
					FileNet Corp
					trwrb!fritz!marty

campbell@sauron.UUCP (Mark Campbell) (03/03/86)

In article <566@hoptoad.uucp> gnu@hoptoad.uucp (John Gilmore) writes:
>In article <613@sauron.UUCP>, campbell@sauron.UUCP (Mark Campbell) writes:
>> ...              What should really happen is that the chip
>> manufacturers incorporate the delay logic within those chips.
>
> [Fundamental H/W realities...]
>
>Given that piece of reality, how should chip manufacturers "incorporate
>the delay logic within those chips"?  What should the chip do if it
>gets a strobe and isn't ready for one?

Obviously you don't let the chip get a strobe for which it's not ready;
you delay DSACK on the first access to the chip until the delay time has
expired (i.e., until the device may be accessed legally again).  What
you've done is implement the registered PAL approach (state machine)
at the device level.

>I currently prefer the approach of having a status pin saying "ready"
>or "not ready", which external hardware (or software) can use to avoid
>strobing the chip when it is not ready.

Just use the status pin to indicate when it is safe to continue, and
you've solved the problem.  In essence, you've constructed the same
delay mechanism which you proposed to the uP guys (more on that later).

>                                        However, pins are expensive,
>so this only happens when there's a spare pin around.  If they're going
>to add a few pins to make the chip nicer, I vote for a few other uses
>first (like multiple address lines to avoid the write-the-address-then-
>write-the-data approach, which falls down if you take an interrupt in
>the middle).

Spare pins aren't usually an all or nothing proposition...a single delay
status pin is a lot cheaper than demultiplexing the address/data lines.
Besides, I'd rather perform a few extra instructions (raising the
priority of the processor, writing the address/data, and lowering the
priority) than worry about ALL of the problems associated with delays.

>System designers can nowadays use a PAL to generate the right number of
>wait states, when a few years ago there was just a decoder and you were
>stuck with whatever timing it could provide.  This does cost quite a bit
>more than a few extra bytes of software, though.

The cost is three to five dollars per registered PAL...and you're right,
it ain't cheap.  I realize that this translates to at least nine to
fifteen dollars to the customer.  But how much do you think it will cost
the customer when he gets an intermittent character lost somewhere
down the road? And how much S/W development time do you think it will cost
each time the fault is "discovered"?  You can't use a single point model
of cost in this situation; it just doesn't apply.

>> The use of the term "particularly good" when referring to a device with
>> a major flaw such as this is a non sequitur.
>
>The Z8530 *is* a particularly good chip.  [...]

And the 432 *was* a particularly good chip (set); it was just a little
slow. (:-)  Seriously, it very well might be an *excellent* chip, just
like the TOD chip I previously mentioned.  However, both are *extremely
bad* chips in the context of supporting Unix, because both contain major
flaws with respect to Unix.  H/W exists only to execute S/W (just like an
OS exists only to execute applications), and if the H/W does it poorly,
then it just isn't good H/W...regardless of the cost.

I believe the last sentence will cause a lot of flamage.  Before you H/W
designers go on a crazed rampage, let me say that I consider this subject
somewhat moot.  The original posting came from a guy at Motorola warning
against timing loops.  This was followed by John Gilmore suggesting uP
modifications that would solve this problem.  Well, I believe that John,
in his last posting (not the message that this is in response to) gave a
solution which is adequate.  I hope it is implemented; however, I also
hope that peripheral chip manufacturers will clean up their bus interface
problems.
-- 

Mark Campbell    Phone: (803)-791-6697     E-Mail: !ncsu!ncrcae!sauron!campbell

jer@peora.UUCP (J. Eric Roskos) (03/04/86)

John Gilmore (gnu@hoptoad.UUCP) writes:

> Peripheral chips are driven by strobes from the outside world.  In
> almost all cases (except tightly coupled coprocessor style chips), the
> peripheral chip does not tell the CPU when it is done with a request;
> the system designer is expected to have read the data sheet and set up
> the right number of wait states and such.

I must disagree with this!  (Though my confusion, and possibly part of
the debate itself, may arise because we are thinking of different types
of "peripheral chips".)

Peripheral chips that perform I/O operations -- UARTs, disk controllers,
DMA controllers, etc. -- should certainly tell the CPU when they are done
with a request.  That is what interrupts are for!  I think that is what
the original poster was referring to.  It would be bad to convince people
who may design new peripheral parts that they should do away with the
"ready" pins on their devices; if they did that, the ability of I/O
software to work in a reasonable manner would rapidly diminish, especially
for "multitasking" systems.

On the other hand, wait states for slow parts are a different matter, and
I think maybe that was what the above poster was referring to?  Still, it
would be far better to handle this in hardware -- for example, suspending
the processor's execution if it tries to access the part again before it
has completed the previous operation -- than expecting a software timing
loop to do it.  Of course, this may have an adverse effect in the case of
real time applications, where you need to know how long it will take to
perform each operation; you could suspend execution as soon as the original
access is done, until it is completed, which would give a constant delay
regardless of the time between accesses.  But for many applications, you
could do something useful during that time.
-- 
UUCP: Ofc:  jer@peora.UUCP  Home: jer@jerpc.CCUR.UUCP  CCUR DNS: peora, pesnta
  US Mail:  MS 795; CONCURRENT Computer Corp. SDC; (A Perkin-Elmer Company)
	    2486 Sand Lake Road, Orlando, FL 32809-7642   LOTD(5)=O
----------------------
Amusing error message explaining reason for some returned mail recently:
> 554 xxxxxx.xxxxxx.ATT.UUCP!xxx... Unknown domain address: Not a typewriter
(The above message is true... only the names have been changed...)

grr@cbm.UUCP (George Robbins) (03/04/86)

In article <566@hoptoad.uucp> gnu@hoptoad.uucp (John Gilmore) writes:
>In article <613@sauron.UUCP>, campbell@sauron.UUCP (Mark Campbell) writes:
>
> [Fundamental H/W realities...]
>

>> The use of the term "particularly good" when referring to a device with
>> a major flaw such as this is a non sequitur.
>
>The Z8530 *is* a particularly good chip.  [...]

Speaking of moot problems, the recovery time specification for the Z8530 is
a non-problem in most applications.  The data sheet basically specifies that
so many PCLK cycles must elapse between accesses.  Unless you are using an
unusually slow PCLK, the overhead of the C style inb()/outb() subroutine calls
will eat up the requisite cycles.  Assembly code may need a nop or two to
guarantee cycles.

To avoid interrupt hassles, you can define C routines outoutb() and outinb()
that save the interrupt status, turn off interrupts, write the pointer, nop,
read/write the data, and restore interrupts.
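
One plausible shape for outoutb() (the port address and the spl-style interrupt primitives are assumptions, and the single nop is whatever padding the CPU in question needs): make the pointer write and the data write atomic with respect to interrupts.

extern int spl7(void);                      /* assumption: mask interrupts, return old level */
extern void splx(int s);                    /* assumption: restore saved level */

extern volatile unsigned char *scc_ctrl;    /* hypothetical 8530 control port */

void outoutb(unsigned char reg, unsigned char val)
{
    int s = spl7();                 /* save interrupt status, interrupts off */

    *scc_ctrl = reg;                /* write the register pointer */
    __asm__ __volatile__("nop");    /* settling nop, per the recipe above */
    *scc_ctrl = val;                /* write the data */
    splx(s);                        /* restore interrupts */
}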

Also, the DMA status pins can be used to generate hardware wait states or be
sensed by software (through some other chip) to indicate when the chip is ready
to accept another operation.

The real hardware world is full of these little kludges in much the same sense
as unix is blessed with features and warts.  The 8530 does just about anything
you would want a serial interface to do, short of ethernet, and does it for two
channels.  Neither is perfect, but they let you get the job done.
-- 
George Robbins - now working with,	uucp: {ihnp4|seismo|caip}!cbm!grr
but no way officially representing	arpa: cbm!grr@seismo.css.GOV
Commodore, Engineering Department	fone: 215-431-9255 (only by moonlite)

franka@mmintl.UUCP (Frank Adams) (03/04/86)

In article <6780@boring.UUCP> jack@mcvax.UUCP (Jack Jansen) writes:
>>E.g. in meeting the recovery time of a particularly good USART chip
>>with a horrible bus interface, the Z8530, you need to wait 2.2us
>>between writes to it.
>
>Sorry, but this makes it a particularly *bad* USART chip, regardless
>of any other features.

It seems to me that from theoretical considerations, there will always be
*some* time dependencies in any device.  If you run it with a fast enough
processor, it will stop working.

Frank Adams                           ihnp4!philabs!pwa-b!mmintl!franka
Multimate International    52 Oakland Ave North    E. Hartford, CT 06108

rb@ccivax.UUCP (rex ballard) (03/06/86)

In article <443@ucsfcca.UUCP> dick@ucsfcca.UUCP (Dick Karpinski) writes:
>I thought someone suggested a solution:  Build a timing loop, but
>set its constant (how many cycles) using some other, possibly low
>precision, timer to see how fast the loop is _today_ with this
>compiler/clock/cpu-chip/whatever.  I _think_ that one can usually
>count on those things remaining constant _during_ this run of the
>program, i.e. between reboots of the OS.  I have heard of systems
>which change their cpu clock on the fly, but you probably know
>that when you write the device driver.  Is that enough?
This works ok UNLESS you have a high speed cache (like the 68020)
AND are servicing non-maskable interrupts.  The timing in a tight
loop can wander all over the place, depending on how often the
interrupts scramble your cache.  Problems can ALSO occur when
co-processors or multiple DMA devices are sharing the bus and your
processor has low priority.

greg@utcsri.UUCP (Gregory Smith) (03/10/86)

In article <1162@mmintl.UUCP> franka@mmintl.UUCP (Frank Adams) writes:
>It seems to me that from theoretical considerations, there will always be
>*some* time dependencies in any device.  If you run it with a fast enough
>processor, it will stop working.

False. A processor-to-device interface can be designed in such a way that
an access to a slow device will cause the processor to be 'stopped' until
the device is ready. This can be done in a port-dependent way, i.e. if
there is only one slow device on the bus, the processor will only be
slowed when that device is accessed. The 'stopped' state of the processor
is sometimes called a 'wait' state.
On many systems, this technique is the rule rather than the exception -
I think UNIBUS is an example.

-- 
"So this is it. We're going to die."	- Arthur Dent
----------------------------------------------------------------------
Greg Smith     University of Toronto       ..!decvax!utzoo!utcsri!greg

campbell@sauron.UUCP (Mark Campbell) (03/14/86)

In article <25@cbm.UUCP> grr@cbm.UUCP (George Robbins) writes:
>
>Speaking of moot problems, the recovery time specification for the Z8530 is
>a non-problem in most applications.  The data sheet basically specifies that
>so many PCLK cycles must elapse between accesses.  Unless you are using an
>unusually slow PCLK, the overhead of the C style inb()/outb() subroutine calls
>will eat up the requisite cycles.  Assembly code may need a nop or two to
>guarantee cycles.

Absolute last word on the subject (we clock at 4MHz, but I'll assume 6MHz):

	8530 Set-Up Delay = 6 x TcPC + 200ns        (TcPC = one PCLK period)
			  = 6 x 165ns + 200ns = 1.19us (6MHz part, fastest PCLK)
			  = 6 x 2us + 200ns   = 12.2us (6MHz part, slowest PCLK)

	MC68020 NOP Time  = 2 x 60ns = 120ns           (16.67MHz, loop)
			  = 2 x 80ns = 160ns           (12.5MHz, loop)

	1.19us / 120ns = 10 NOP's;  1.19us / 160ns = 8 NOP's  (best case)
	12.2us / 120ns = 102 NOP's; 12.2us / 160ns = 77 NOP's (worst case)

I'm not familiar with the "C style inb()/outb()" routines you mention.  However,
I would respectfully suggest that if these routines guarantee a 12.2us
delay through normal code execution, you should fire the guy who wrote them.
You'll probably notice that the number of NOP's required is a bit more than
the "nop or two" you predicted.  This is compounded by the fact that interrupts
are disabled at this time, increasing latency.  *This* is compounded by the
fact that we have to assume worst case with no H/W chip level support.  *This*
is compounded by the fact that this software solution still does not solve
the problems associated with 8530-related DMA accesses.

I included the best case timings to illustrate what the proper H/W can do.
I suggest that anyone out there having problems with the 8530 see the April 4
issue of EDN, pages 274-275, for a nice H/W solution for the 8530's recovery
time problems.
-- 

Mark Campbell    Phone: (803)-791-6697     E-Mail: !ncsu!ncrcae!sauron!campbell

gnu@hoptoad.uucp (John Gilmore) (03/16/86)

In article <25@cbm.UUCP>, grr@cbm.UUCP (George Robbins) writes:
> Speaking of moot problems, the recovery time specification for the Z8530 is
> a non-problem in most applications.  The data sheet basically specifies that
> so many PCLK cycles must elapse between accesses.  Unless you are using an
> unusually slow PCLK, the overhead of the C style inb()/outb() subroutine calls
> will eat up the requisite cycles.  Assembly code may need a nop or two to
> guarantee cycles.

This is only true if you have an unusually slow CPU.  Ours overruns the
chip without trouble.  Maybe Commodore's doesn't.
-- 
John Gilmore  {sun,ptsfa,lll-crg,ihnp4}!hoptoad!gnu   jgilmore@lll-crg.arpa