[comp.arch] RISC != real-time control

koopman@A.GP.CS.CMU.EDU (Philip Koopman) (04/26/88)

One aspect of RISC processors for real time control that I
have not seen discussed is the conflict between
deadline scheduling and the statistical nature of
RISC performance figures.

Real-time control programs often have a situation where only
X microseconds are available to perform a task.  Therefore,
the code to perform the task must be GUARANTEED to complete
within X microseconds.  In real-time control, a late answer
is a wrong answer.

The problem with RISC designs is that they promise a performance
of Y MIPS in the average case over large sections of code and
relatively long periods of time.  It seems to me that this
is not an applicable performance measure for real-time control.
What is more important is worst-case performance (maximum
possible cache misses for that program, branch-target buffer
misses, etc.).  It may be the case that a slower processor
with uniform performance can be rated at a higher usable
MIPS rate than a RISC processor with inconsistent
instantaneous performance.

So, what is a real-time control designer to do?

-- De-rate the RISC MIPS ratings to assume 100% cache misses?

-- Use (probably) non-existent tools to compute worst-case
   program execution time under all possible conditions?

-- Not use RISC in an environment with short deadline events?


~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~  Phil Koopman             5551 Beacon St.             ~
~                           Pittsburgh, PA  15217       ~
~  koopman@faraday.ece.cmu.edu   (preferred address)    ~ 
~  koopman@a.gp.cs.cmu.edu                              ~
~                                                       ~
~  Disclaimer: I'm a PhD student at CMU, and I do some  ~
~              work for WISC Technologies.              ~
~              My opinions are my own, etc.             ~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

larry@mips.COM (Larry Weber) (04/26/88)

In article <1521@pt.cs.cmu.edu> koopman@A.GP.CS.CMU.EDU (Philip Koopman) writes:
>
>One aspect of RISC processors for real time control that I
>have not seen discussed is the conflict between
>deadline scheduling and the statistical nature of
>RISC performance figures.
>
> ...
>So, what is a real-time control designer to do?
>
>-- De-rate the RISC MIPS ratings to assume 100% cache misses?
>
>-- Use (probably) non-existent tools to compute worst-case
>   program execution time under all possible conditions?
>
>-- Not use RISC in an environment with short deadline events?
>
Cache effects can be present in any machine that has a cache: CISC or RISC.  

Answer 1 will provide a general guideline of the effect only if you
know how YOUR application maps onto the MIPS rating.  Even if your
program followed the MIPS rating in a number of trials, you still have to
know how the time is allocated between memory references and other operations
that do not have a statistical nature.

Answer 2 will give a worst-case bound on the performance.  The MIPS compilers
have tools that will inform you of the number of cycles, instructions, and
memory references for a given run of the program.  Computing worst-case
times is really a matter of multiplication.  This answer is
really overkill, because not all applications require worst-case times to be
used for every part of the problem.  For example, assume
you had to accept a piece of data and queue it for processing while
interrupts were disabled.  The critical time is how long interrupts are
disabled, because data could be lost in that period.
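
To make the "matter of multiplication" concrete, here is a toy worst-case
calculation in C; every number in it is invented for illustration, not
output from the MIPS tools:

/* Toy worst-case bound for a critical section, assuming a core that
 * retires one instruction per cycle plus a fixed worst-case penalty
 * for every memory reference.  All figures below are made up. */
#include <stdio.h>

int main(void)
{
    long instructions = 1200;   /* instruction count from the compiler's tools */
    long memory_refs  = 300;    /* loads/stores on the critical path           */
    long miss_penalty = 4;      /* extra cycles per reference if it misses     */
    double cycle_ns   = 60.0;   /* roughly a 16.67 MHz part                    */

    long worst_cycles = instructions + memory_refs * miss_penalty;
    printf("worst case: %ld cycles = %.1f microseconds\n",
           worst_cycles, worst_cycles * cycle_ns / 1000.0);
    return 0;
}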

Answer 3 is like throwing out the baby with the bath water - this solution
should be generalized to any hardware that has a statistical nature.
This leaves out the 68020 and 030 too.
-- 
-Larry Weber  DISCLAIMER: I speak only for myself, and I even deny that.
UUCP: 	{decvax,ucbvax,ihnp4}!decwrl!mips!larry,   DDD:408-720-1700, x214
USPS: 	MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086

henry@utzoo.uucp (Henry Spencer) (04/26/88)

> So, what is a real-time control designer to do?

The same thing he does with a high-powered CISC:  swear loudly, try to
estimate worst-case performance, and contemplate going back to the Z80.
At least RISC instruction times are more or less predictable, unlike those
of, say, the 68020.

More generally, there is a fundamental clash between trying to make the
performance simple and predictable and trying to maximize it by exploiting
regularities in the workload.  If you want absolutely predictable speed,
then (for example) you will either have to live without caches or else
manage them very carefully so you know what they're doing.  The same applies
to optimizing compilers, buffered I/O devices, asynchronous buses, etc etc.
-- 
"Noalias must go.  This is           |  Henry Spencer @ U of Toronto Zoology
non-negotiable."  --DMR              | {ihnp4,decvax,uunet!mnetor}!utzoo!henry

koopman@A.GP.CS.CMU.EDU (Philip Koopman) (04/26/88)

In article <1521@pt.cs.cmu.edu>, koopman@A.GP.CS.CMU.EDU (Philip Koopman) writes:
> One aspect of RISC processors for real time control that I
> have not seen discussed is the conflict between
> deadline scheduling and the statistical nature of
> RISC performance figures.
> [stuff deleted]

Thanks for the response so far.

I have received several replies of the form that any machine with
cache has problems with predictability of performance.
I agree, but that isn't the whole question/answer.  I thought
that RISCs had a higher cache miss rate (in misses per second,
not miss ratio) since they need more instructions, or is this
solved with increased line size/prefetching?

A better question is: is it appropriate to be using a RISC
on embedded applications?  What if you can't afford off-chip cache
memory -- doesn't the increased instruction bandwidth required
for a RISC cause problems?  I get the feeling that cache helps a CISC
somewhat, but that a RISC simply dies without a lot of cache -- is
that really the case?

Another concern has to do with program size.  Everything I've seen
says that RISCs have programs about twice as big as CISCs.  What
does that do in an embedded environment -- NO, Memory is NOT cheap
when it costs power/weight/cooling/volume/dollars/chip count in a highly
constrained application!

Thanks for the feedback,

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~  Phil Koopman             5551 Beacon St.             ~
~                           Pittsburgh, PA  15217       ~
~  koopman@faraday.ece.cmu.edu   (preferred address)    ~ 
~  koopman@a.gp.cs.cmu.edu                              ~
~                                                       ~
~  Disclaimer: I'm a PhD student at CMU, and I do some  ~
~              work for WISC Technologies.              ~
~              (No one listens to me anyway!)           ~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

aglew@urbsdc.Urbana.Gould.COM (04/26/88)

>So, what is a real-time control designer to do?
>
>-- De-rate the RISC MIPS ratings to assume 100% cache misses?

You have to do this for CISCs with caches, not just RISCs.

>-- Use (probably) non-existent tools to compute worst-case
>   program execution time under all possible conditions?

In a hard real-time environment you have to do this for CISCs
as well as RISCs. I don't know of any tools to do this *well*
in either camp, but building them should be considerably easier
for a RISC than a CISC, given the preponderance of short,
single cycle instructions, and explicitness of timing constraints.
On a CISC you never know what interlock is going to bite you.
    In fact, wasn't this one of the original reasons for RISC -
simple instructions make performance of code sequences easier
to calculate, and hence easier to choose between in optimization?

>-- Not use RISC in an environment with short deadline events?

I rather think that the GE RPM-40 guys will disagree with you about
that...

aglew@gould.com

schmitz@FAS.RI.CMU.EDU (Donald Schmitz) (04/26/88)

In article <1521@pt.cs.cmu.edu> koopman@A.GP.CS.CMU.EDU (Philip Koopman) writes:

>Real-time control programs often have a situation where only
>X microseconds are available to perform a task.  Therefore,
>the code to perform the task must be GUARANTEED to complete
>within X microseconds.  In real-time control, a late answer
>is a wrong answer.

This may be straying somewhat from the original point, but what sort of
applications really have such exact timing deadlines?  I have done a little
real-time motion control, using a CPU to implement a discrete position
control law for robot axes, and in general a few percent deviation in cycle
time has next to no effect.  As long as the deviation is small and well
distributed, i.e. delays of no more than 20% and occurring for less than 10
sample periods in a row, I can't imagine a mechanical system reacting to the
error.

Don Schmitz (schmitz@fas.ri.cmu.edu)

petolino%joe@Sun.COM (Joe Petolino) (04/26/88)

>One aspect of RISC processors for real time control that I
>have not seen discussed is the conflict between
>deadline scheduling and the statistical nature of
>RISC performance figures.
>
   .  .  .
>So, what is a real-time control designer to do?

First (as others have pointed out) this problem has more to do with having a
cache than with using any particular type of processor.  RISC processors
complicate this a little by providing opportunities for varying levels of
optimization for a given piece of code.  However, once it's cast into machine
code, execution time (barring memory system effects) is quite predictable
for most processors (either CISC or RISC), and could be determined with a
good simulator.

You could attack the cache problem by clever system design.  A former
employer of mine at one point contemplated building a RISC-based system aimed
at real-time applications.  Our plan was to use a set-associative instruction 
cache, and include a control bit in each cache set (writable by the operating
system) which could 'lock' one of the elements of the set into the cache:  if
the bit was set, that cache block would never get swapped out of the cache
(the rest of the set was still available for 'non-critical' stuff, which
would suffer a higher miss rate due to the reduced cache size).  If you
loaded your response-critical code into the cache, then locked it in, one big
variable went away.  Unfortunately, this system never was built.  Has anyone
else done something like this?
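
For what it's worth, a toy software model of the lock-bit idea might look
like the sketch below; the set count, block size, and interface are my
inventions, not the actual design:

/* Toy model of a 2-way set-associative I-cache in which way 0 of a set
 * can be locked by the OS, so a block loaded and then locked is never
 * chosen as the victim; only way 1 is recycled for non-critical code. */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define SETS        64
#define BLOCK_WORDS 4

struct cache_set {
    uint32_t tag[2];
    bool     valid[2];
    bool     locked;                    /* OS-writable lock bit for way 0 */
};

static struct cache_set icache[SETS];

static unsigned set_of(uint32_t addr) { return (addr / BLOCK_WORDS) % SETS; }
static uint32_t tag_of(uint32_t addr) { return addr / (BLOCK_WORDS * SETS); }

/* Ordinary miss handling: never evict the pinned way. */
static void fetch_block(uint32_t addr)
{
    struct cache_set *s = &icache[set_of(addr)];
    int w = s->locked ? 1 : 0;          /* victim choice for the sketch   */
    s->tag[w]   = tag_of(addr);
    s->valid[w] = true;
}

/* Load a block of response-critical code into way 0 and pin it there. */
static void lock_critical_block(uint32_t addr)
{
    struct cache_set *s = &icache[set_of(addr)];
    s->tag[0]   = tag_of(addr);
    s->valid[0] = true;
    s->locked   = true;
}

int main(void)
{
    lock_critical_block(0x1000);        /* critical ISR code              */
    fetch_block(0x2000);                /* conflicting non-critical block */
    printf("critical block still resident: %s\n",
           icache[set_of(0x1000)].tag[0] == tag_of(0x1000) ? "yes" : "no");
    return 0;
}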

-Joe

bcase@Apple.COM (Brian Case) (04/27/88)

In article <1521@pt.cs.cmu.edu> koopman@A.GP.CS.CMU.EDU (Philip Koopman) writes:
>One aspect of RISC processors for real time control that I
>have not seen discussed is the conflict between
>deadline scheduling and the statistical nature of
>RISC performance figures.

??????  And CISC (or whatever you consider an alternative to RISC) doesn't
have the so-called "statistical nature" of performance?!?!

>The problem with RISC designs is that they promise a performance
>of Y MIPS in the average case over large sections of code and
>relatively long periods of time.

??????  How do alternatives to RISC differ?

>What is more important is worst-case performance (maximum
>possible cache misses for that program, branch-target buffer
>misses, etc.)

Worst-case performance is always *most* important for real-time systems.
Because of fundamental limitations of technology (big DRAMs are slower
than small SRAMs), any processor that runs as fast as the technology will
allow will rely on caching to some degree (I claim).  To the extent that
your real-time code can't depend on the cache(s) containing your working
set (probably can't depend on it at all), you may be better off, in terms
of cost, designing the hardware without caches.  If the caches are on-chip,
then you have no choice of course.  Now, it *is* possible that, in an
environment where the cache(s) is(are) always missing, cache(s) will actually
make the system run slower.  However, it will be more and more difficult
to find any fast processor, CISC, RISC, or whatever-ISC, without on-chip
caches.  In fact, many CISCs will soon be implemented with a very RISC-
like core.  Oops, I guess I could have summarized this whole spiel by
simply saying "your problem isn't RISC, it's statistical techniques in
general.  These techniques are universally used."  Maybe a good-old 68000
is your best bet?

bob@pedsga.UUCP (04/27/88)

In article <1521@pt.cs.cmu.edu> koopman@A.GP.CS.CMU.EDU.UUCP writes:
> { questioning the suitability of RISC processors for Real-Time use }
> ...

It seems to me that it is much *easier*  to predict worst
case performance for RISC processors because

1) Most execute one instruction/clock. You don't have to figure
	out how many cycles each instruction actually takes.
2) Most don't have interruptible instructions.  Who knows how
	long those take?

If you are really concerned about cache misses, you would design
your system so that all the memory was fast enough for the processor.
And you wouldn't do demand paging either.

Just my opinion.
Bob Weiler.

fdr@joy.ksr.com (Franklin Reynolds) (04/27/88)

Another similar question about RISC vs. realtime is whether the philosophy
of optimising for the general case instead of the exception is appropriate.

As I understand it, optimising for the general case is fundamental to most
RISC designs. Modern, sophisticated realtime systems that have to deal with
hard time constraints and overload conditions might be better served by
architectures that are optimized for various exceptional conditions.

You could imagine an architecture optimized for speedy interrupt handling,
context switching, process ordering, IPC, etc. This architecture might have
advantages for certain types of realtime applications over designs that
optimized for throughput in the general case.

   Franklin Reynolds 			Kendall Square Research Corporation
   fdr@ksr.uucp				Building 300 / Hampshire Street
   ksr!fdr@harvard.harvard.edu 		One Kendall Square
   harvard!ksr!fdr 			Cambridge, Ma 02139

jmd@granite.dec.com (John Danskin) (04/28/88)

We have a leetle teeny ucode engine (read risc by Weitek) that needs
some things locked into cache (a real time constraint that involves
the bus hanging if we slip by even one cycle (our fault, not
weitek's)).  Fortunately, our system uses direct mapped caches, so we
changed the linker so that modules which should be locked into
cache get unique addresses (modulo the cache size). This works
just fine, and since we have hardly any of this critical code, caused
only a 2% overall code growth (because of all of the little holes).
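
To illustrate the address arithmetic (the cache size, module sizes, and
base address below are made up, and the real change lived inside the
linker):

/* Sketch of the layout trick: the locked modules own cache offsets
 * [0, LOCKED_SIZE), and every other module is placed so it never maps
 * onto those offsets -- the skipped bytes are the "little holes" that
 * cost the small code growth.  All numbers here are invented. */
#include <stdio.h>

#define CACHE_SIZE  0x2000u    /* 8 KB direct-mapped cache (assumed)       */
#define LOCKED_SIZE 0x0600u    /* total size of the cache-resident modules */

static unsigned place_ordinary(unsigned addr)
{
    if (addr % CACHE_SIZE < LOCKED_SIZE)              /* would collide     */
        addr += LOCKED_SIZE - addr % CACHE_SIZE;      /* hop over the hole */
    return addr;
}

int main(void)
{
    unsigned addr = 0x10000;                /* base of the code segment    */
    printf("locked modules at 0x%x..0x%x\n", addr, addr + LOCKED_SIZE - 1);

    addr += LOCKED_SIZE;
    for (int i = 0; i < 5; i++) {           /* a few ordinary modules      */
        addr = place_ordinary(addr);
        printf("module %d at 0x%x (cache offset 0x%x)\n",
               i, addr, addr % CACHE_SIZE);
        addr += 0x700;                      /* hypothetical module size    */
    }
    return 0;
}
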
-- 
John Danskin				| decwrl!jmd
DEC Technology Development		| (415) 853-6724 
100 Hamilton Avenue			| My comments are my own.
Palo Alto, CA  94306			| I do not speak for DEC.

david@daisy.UUCP (David Schachter) (04/28/88)

In article <1534@pt.cs.cmu.edu> schmitz@FAS.RI.CMU.EDU (Donald Schmitz) writes:
>In article <1521@pt.cs.cmu.edu> koopman@A.GP.CS.CMU.EDU (Philip Koopman) writes:
>>Real-time control programs often have a situation where only
>>X microseconds are available to perform a task.  Therefore,
>>the code to perform the task must be GUARANTEED to complete
>>within X microseconds.  In real-time control, a late answer
>>is a wrong answer.
>
>This may be straying somewhat from the original point, but what sort of
>applications really have such exact timing deadlines?...
>[I]n general a few percent deviation in cycle
>time has next to no effect.  As long as the deviation is small and well
>distributed, ie.  delays of no more than 20% and occuring less than 10
>sample periods in a row, I can't imagine a mechanical system reacting to the
>error.

Not all real-time systems control mechanical objects.

I wrote code for a radio-controlled clock.  The microcontroller takes a non-
maskable interrupt every millisecond.  If the interrupt service routine ever
takes more than a millisecond to execute, the results are:

1) The stack may get trashed, or it may not.
2) The clock will lose a millisecond.
3) Certain I/O ports may not be completely updated.
4) The clock may lose an output character (sending time to the host)
5) The clock may lose input characters (receiving commands from the host.)

Depending on the customer's usage of the clock, the result could be as simple
as a traffic light "slipping" a millisecond, or as bad as a wide-area network
losing packets and not being able to restart after a network crash.

I put in code to reset the clock if nested NMI's occur, and I spent a lot of
time counting clocks and doing measurements with an oscilloscope, to ensure
the interrupt service routine will always take less than a millisecond.
Worst case time: 900 microseconds.  Usual case: 100 microseconds.
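
A guard of that general shape might look something like this hypothetical
C sketch (the real routine was presumably hand-counted assembler, and all
of these names are mine):

/* Hypothetical NMI overrun guard: count nesting depth on entry; if the
 * previous 1 ms invocation hasn't finished, the budget has been blown,
 * so force a clean reset rather than nest and corrupt state. */
#include <stdint.h>
#include <stdio.h>

static volatile uint8_t nmi_depth;
static int simulate_long_tick = 1;

void nmi_handler(void);

static void reset_clock_hardware(void) { puts("overrun detected: resetting clock"); }

static void service_tick(void)
{
    if (simulate_long_tick) {           /* pretend this tick overruns 1 ms   */
        simulate_long_tick = 0;
        nmi_handler();                  /* ...so the next NMI lands mid-tick */
    }
}

void nmi_handler(void)                  /* entered every millisecond         */
{
    if (++nmi_depth > 1)                /* previous tick still executing     */
        reset_clock_hardware();         /* recover instead of nesting        */
    else
        service_tick();
    --nmi_depth;
}

int main(void)
{
    nmi_handler();
    return 0;
}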

Before the work, the clock would often crash for no apparent reason.  Turned
out the previous programmer (this was two years ago) was allowing the ISR to
take more than ten milliseconds (i.e. nesting NMI's ten levels deep!)

Disclaimer: this article was written by Schroedinger's cat, Bill.

peter@athena.mit.edu (Peter J Desnoyers) (04/28/88)

Problems like this have already cropped up in the modem field, where
you have RISC-like processors (e.g. TMS32020) which require very fast
memory running code which has to run every sample time, and then a lot
of random code to control the front panel, RS232, MNP, and other
random piddling stuff. The solution until now was to use an 8 bit
micro (sometimes a 68000) to do the piddling stuff that took up 80-90%
of the code volume, and a signal processing micro to do the fast
stuff, and give them each their own slow and fast memory,
respectively.

Things have changed. It is now possible to get at least one of these
chips (I think it's the 32020) to do wait states on memory, and
someone (I don't remember who) has now put their MNP implementation
and a few other things on this processor, in slow ROM, while their
signal processing code runs in fast (20ns?) RAM.  It takes a lot more
ROM space than an eight-bit micro (simple, fixed-length (32-bit?)
instructions, poor handling of anything but integer multiplies and
accumulates), but you still end up with fewer chips, lower cost, and
a negligible load added to the signal processor. 

The interesting thing to notice is that there is no need for fast
memory to be used as a cache in an embedded application. Just load
your time-critical code into fast memory, and your random stuff into
slow memory. If the time-critical part of the code is huge, then a
cache wouldn't help anyway. 
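
With a modern GNU-style toolchain, that partition can be sketched roughly
as below; the ".fastram" section name is invented, and the linker script
that maps it onto the fast RAM is assumed rather than shown:

/* Rough sketch of splitting an embedded image between fast and slow memory.
 * The section attribute assumes a GNU-style toolchain; a linker script
 * (not shown) would place ".fastram" in fast SRAM and leave the rest in
 * slow ROM.  All names here are illustrative. */
#include <stdio.h>

/* Time-critical per-sample code: placed in fast memory. */
__attribute__((section(".fastram")))
void process_sample(int sample)
{
    /* ... the DSP-ish work that must finish every sample time ... */
    (void)sample;
}

/* The "piddling stuff" -- front panel, RS232, protocol -- stays in slow ROM. */
void update_front_panel(void)
{
    puts("front panel updated");   /* no deadline tighter than human reflexes */
}

int main(void)
{
    process_sample(42);
    update_front_panel();
    return 0;
}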


				Peter Desnoyers
				peter@athena.mit.edu

pardo@june.cs.washington.edu (David Keppel) (04/29/88)

I talked with our local real-time guru, Alan Shaw, who said something
to the effect of (not an exact quote, but I'll try to get the message
across):

    Doing any kind of timing analysis is very hard.  You can't
    assume in your analysis that there's going to be bus contention 
    every memory cycle, or your estimated performance is going to
    look much worse than it ever will in practice.  What people
    really do is come up with reasonable figures based on the
    probability of there being N consecutive bus contention cycles,
    and make your timing analysis based on some number of contention
    cycles that will happen with a probability that is smaller than
    the chance of other catastrophic failure.

Note that this analysis is independent of RISC/CISC or almost
anything else.  The key point here is that you can measure and
estimate probabilistically, and in practice the failure rate
from other sources (e.g., hardware failures) will be the
dominant mode of failure.
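
To illustrate that style of budgeting (with made-up numbers and a naive
assumption that contention cycles are independent, which real bus traffic
need not satisfy):

/* Illustrative calculation only: if each memory cycle independently sees
 * bus contention with probability p, find the smallest run length N whose
 * probability p^N falls below an "ignore it" threshold, and budget for N
 * contention cycles instead of assuming contention on every cycle. */
#include <math.h>
#include <stdio.h>

int main(void)
{
    double p = 0.10;            /* chance a given cycle is a contention cycle */
    double threshold = 1e-12;   /* smaller than other catastrophic failures   */

    int n = 1;
    while (pow(p, n) >= threshold)
        n++;

    printf("budget for %d consecutive contention cycles "
           "(probability of worse: %.1e)\n", n, pow(p, n));
    return 0;
}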

	;-D on  ( Well it looked good when I closed my eyes )  Pardo

rick@pcrat.UUCP (Rick Richardson) (05/01/88)

In article <1532@pt.cs.cmu.edu> koopman@A.GP.CS.CMU.EDU (Philip Koopman) writes:
>
>A better question is: is it appropriate to be using a RISC
>on embedded applications?  What if you can't afford off-chip cache
>memory -- doesn't the increased instruction bandwidth required
>for a RISC cause problems?  I get the feeling that cache helps a CISC
>somewhat, but that a RISC simply dies without a lot of cache -- is
>that really the case?
>

I'm still looking for the RISC that does ~4K (C language) Dhrystones,
has no cache, clocks around 4 Mhz, has a 16 bit bus, can address maybe 1MB,
is a power miser, can't do floating point, and costs no more than $15.

In HUGE quantities.  Just think of the millions and millions of next
generation consumer products that could use the extra performance,
while still meeting EMI, power consumption, and cost requirements.

Come on guys, I know that there's a lot of prestige in
having the fastest micro-* around, but there's a LOT of HIGH VOLUME
applications out there that just can't use all that power.

You might sell 10K-100K of these super high performance chips.
Wouldn't you rather sell *tens of millions*?
-- 
		Rick Richardson, President, PC Research, Inc.

(201) 542-3734 (voice, nights)   OR     (201) 834-1378 (voice, days)
uunet!pcrat!rick (UUCP)			rick%pcrat.uucp@uunet.uu.net (INTERNET)

aglew@urbsdc.Urbana.Gould.COM (05/01/88)

>As far as I know, no one has solved the virtual cache coherency
>problem yet...

There sure are a lot of folk who think they have, though not
commercially (yet). The virtual cache consistency problem is just
like the physical cache consistency problem, except that you
need a physical index for bus snooping.

[Knowing I'm gonna get flamed :-) ]: of course, Alliant doesn't
have too much to do with cache consistency - after all, the CEs
talk to the same cache, don't they, so don't have any consistency
problems? But how far can this scale? I suppose that the IPs
have to be kept coherent, and I believe that's writeback, but
the duty cycle doesn't have to be very high.

aglew@gould.com

bcase@Apple.COM (Brian Case) (05/03/88)

In article <476@pcrat.UUCP> rick@pcrat.UUCP (Rick Richardson) writes:
>I'm still looking for the RISC that does ~4K (C language) Dhrystones,
>has no cache, clocks around 4 Mhz, has a 16 bit bus, can address maybe 1MB,
>is a power miser, can't do floating point, and costs no more than $15.

Oh, that's easy!  The Acorn RISC Machine (ARM).  Yes, I know it has a
32-bit bus now, but just talk to VTI (they have the ARM and use it as a
cell, I think):  if you are right about volumes, they'll make a mod to
give it a 16-bit bus.  On every other account, the ARM is what you want.
I think you could even get it for around $10 instead of $15 (I think that
price is currently available for large quantities).

On second thought, with a 16-bit bus, it might slow down a lot.  It seems
worth looking into though.

jesup@pawl18.pawl.rpi.edu (Randell E. Jesup) (05/03/88)

In article <476@pcrat.UUCP> rick@pcrat.UUCP (Rick Richardson) writes:
>I'm still looking for the RISC that does ~4K (C language) Dhrystones,
>has no cache, clocks around 4 Mhz, has a 16 bit bus, can address maybe 1MB,
>is a power miser, can't do floating point, and costs no more than $15.

	Yeah, and what technology is this wonder-chip implemented in???
Whatever it is, I can think of dozens of Si companies that would give away
all their current facilities for that process.  Oh, and I'm not even worrying
about cost.

	Back to reality, it just can't be done, except MAYBE with a state of
the art chip optimized to NOTHING but fast dhrystones (which, by the way,
are a pretty poor predictor for most applications, due to string handling.)
4 Mhz is REAL slow.  A 4Mhz rpm-40 would be equivalent to maybe a 14Mhz
68000 (note: not '020).  At such slow speeds, CISC chips may well show
superiority due to wanting to maximize the usefulness of every bus cycle.

     //	Randell Jesup			      Lunge Software Development
    //	Dedicated Amiga Programmer            13 Frear Ave, Troy, NY 12180
 \\//	beowulf!lunge!jesup@steinmetz.UUCP    (518) 272-2942
  \/    (uunet!steinmetz!beowulf!lunge!jesup) BIX: rjesup
(-: The Few, The Proud, The Architects of the RPM40 40MIPS CMOS Micro :-)

bcase@Apple.COM (Brian Case) (05/04/88)

In article <833@imagine.PAWL.RPI.EDU> jesup@pawl18.pawl.rpi.edu (Randell E. Jesup) writes:
=	Yeah, and what technology is this wonder-chip implemented in???
=Whatever it is, I can think of dozens of Si companies that would give away
=all their current facilities for that process.  Oh, and I'm not even worrying
=about cost.
=
=	Back to reality, it just can't be done, except MAYBE with a state of
=the art chip optimized to NOTHING but fast dhrystones (which, by the way,
=are a pretty poor predictor for most applications, due to string handling.)
=4 Mhz is REAL slow.  A 4Mhz rpm-40 would be equivalent to maybe a 14Mhz
=68000 (note: not '020).  At such slow speeds, CISC chips may well show
=superiority due to wanting to maximize the usefulness of every bus cycle.

On the contrary.  Let me say it again:  the ARM from VTI and ACORN.  At low
clock rates (so that memory access time isn't an issue), the ARM gets about
1K dhrystones per MHz (using the rather decent ACORN C compiler).  The
process is (was) junky 2 or 3 micron CMOS.  Current price for the ARM
(VTI 86000 I think is the part number) is very low in quantity, < $15 I
think.  The only problem for meeting the original poster's requirements is
the 32-bit bus of the ARM.

baum@apple.UUCP (Allen J. Baum) (05/04/88)

--------
[]
>In article <476@pcrat.UUCP> rick@pcrat.UUCP (Rick Richardson) writes:
>
>I'm still looking for the RISC that does ~4K (C language) Dhrystones,
>has no cache, clocks around 4 Mhz, has a 16 bit bus, can address maybe 1MB,
>is a power miser, can't do floating point, and costs no more than $15.
>

Except for the 16bit bus, the ARM chip seems to meet your qualifications.
It looks very good for controller kinds of applications.  It's simple, small
(die size), and therefore cheap.  It does not require a cache, and knows how
to talk to DRAMs with page mode access cycles to get good performance with
no cache.

--
{decwrl,hplabs,ihnp4}!nsc!apple!baum		(408)973-3385

steckel@Alliant.COM (Geoff Steckel) (05/04/88)

In article <4921@bloom-beacon.MIT.EDU> peter@athena.mit.edu (Peter J Desnoyers) writes:
>Things have changed. It is now possible to get at least one of these
>chips (I think it's the 32020) to do wait states on memory, and
>someone (I don't remember who) has now put their MNP implementation
>and a few other things on this processor, in slow ROM, while their
>signal processing code runs in fast (20ns?) RAM.

The scheme mentioned is very close to one with which I am currently working.
I recently surveyed all the DSP chips for which I could get documentation.
Only the TI 320xxx series have a 'memory access done' pin.  All the other
chips (Moto, AD, NEC, OKI, ...) either have a programmable # wait states or
assume external program or data memory is sufficiently fast to work
synchronously.

This makes ganging of DSP chips using shared (peer-to-peer) global memory
difficult, and makes using mixed slow and fast program memory impossible.
The designers seem to assume:
  1) All parts of the application must run equally fast.
  2) Programs will be small.
  3) Data will be small or only accessed a little at a time.
  4) The DSP chip will own all resources to which it is connected.
  5) Any resources the DSP chip does not own are:
     a) connected via a serial port (a la Transputer, etc), or
     b) sufficiently unimportant that polling a ready line is good enough, or
     c) very fast, or
     d) nonexistent

Can any of the DSP mavens comment on DSP architectures which
  1) Can be connected to large (> 64K) shared memories, which the DSP may
     use, but does not own (i.e. must request and be granted access)
     and whose access time has an upper bound but is not deterministic
     below that bound.
  2) Can run 'background' tasks (servicing panels, SCSI, etc., etc.)
     which require serious processing but much less than the 'foreground'
     task does, preferably with the code in slow (> 70 ns, cheap!) memory.
while doing 'foreground' classic DSP?

Right now only TI's 320xx chips seem to have some of the hardware support, with
the large advantage of an extremely narrow program memory path (16 bits!).
The corresponding disadvantage is an extremely baroque and asymmetrical
instruction set.

The chip described is very close to a general purpose RISC chip, but with
the following differences:
  1) Onboard multiply must be very very fast (for convolutions, etc).
  2) sub-wordsize (byte, etc.) performance not very important
     DSP almost (ha) never does divides, but 1000000s of multiplies.
  3) barrel shifter anywhere from very useful to required
  4) extended precision adder for multiply and accumulate vital
     (e.g. if a * b yields 32 bits, at least 34 bits in the sum, preferably
     more like 40!).  You don't have time to check for overflow.
     (A small illustration of this follows the list.)
  5) Floating point is **really** nice, but many applications can be
     bludgeoned into fixed point.  Painfully.
     If you do put in floating point, make it FAST.  Like 2-3 cycles.
  6) Cheaper than the RISC chips are running.  $100/ea in moderate quantity.
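
Item 4 in particular is easy to show numerically; in the hedged C sketch
below, the tap count and worst-case data are invented:

/* Accumulate 16x16-bit products into a wider accumulator so a long
 * convolution cannot overflow mid-loop and never needs a per-step check. */
#include <stdint.h>
#include <stdio.h>

#define TAPS 1024

int main(void)
{
    int16_t x[TAPS], h[TAPS];
    for (int i = 0; i < TAPS; i++) { x[i] = 32767; h[i] = 32767; }  /* worst case */

    int64_t acc = 0;                  /* the "40-bits-or-more" accumulator      */
    for (int i = 0; i < TAPS; i++)
        acc += (int32_t)x[i] * h[i];  /* each product is ~2^30, fits in 32 bits */

    /* 1024 such products sum to ~2^40, so a 32-bit accumulator would have
     * overflowed long before the loop finished -- with no time to check.  */
    printf("sum = %lld\n", (long long)acc);
    return 0;
}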

     geoff steckel (steckel@alliant.COM)

sedwards@esunix.UUCP (Scott Edwards) (05/05/88)

From article <1534@pt.cs.cmu.edu>, by schmitz@FAS.RI.CMU.EDU (Donald Schmitz):
> In article <1521@pt.cs.cmu.edu> koopman@A.GP.CS.CMU.EDU (Philip Koopman) writes:
> 
>>Real-time control programs often have a situation where only
>>X microseconds are available to perform a task.  .....
> 
> This may be straying somewhat from the original point, but what sort of
> applications really have such exact timing deadlines?  I have done a little
> real-time motion control, ....

I worked on a project a while back that implemented a motion control servo
loop with a microprocessor, and every time the uP didn't make the deadline
the loop would go unstable and lose all control.

It was fun to watch!   We finally had to change the time period so that the
processor always completed its job on time, even though in other modes it
was idle 60% of the time.

-- Scott

peter@sugar.UUCP (Peter da Silva) (05/08/88)

In article <1534@pt.cs.cmu.edu>, schmitz@FAS.RI.CMU.EDU.UUCP writes:
> In article <1521@pt.cs.cmu.edu> koopman@A.GP.CS.CMU.EDU (Philip Koopman)
  talks about hard realtime when he writes:

> >Real-time control programs often have a situation where only
> >X microseconds are available to perform a task.

> This may be straying somewhat from the original point, but what sort of
> applications really have such exact timing deadlines?

How about jet engine control systems in fighters? Or the software that
lands the space shuttle?
-- 
-- Peter da Silva      `-_-'      ...!hoptoad!academ!uhnix1!sugar!peter
-- "Have you hugged your U wolf today?" ...!bellcore!tness1!sugar!peter
-- Disclaimer: These aren't mere opinions, these are *values*.

jack@swlabs.UUCP (Jack Bonn) (05/08/88)

From article <1534@pt.cs.cmu.edu>, by schmitz@FAS.RI.CMU.EDU (Donald Schmitz):
> In article <1521@pt.cs.cmu.edu> koopman@A.GP.CS.CMU.EDU (Philip Koopman) writes:
> 
>>Real-time control programs often have a situation where only
>>X microseconds are available to perform a task.  .....
> 
> This may be straying somewhat from the original point, but what sort of
> applications really have such exact timing deadlines?  I have done a little
> real-time motion control, ....

The worst system for real time deadlines I ever worked on was one
that implemented the control functions for a bottle making machine.
This wasn't a bottler; it took molten glass and formed it into bottles.

We had a 2.5 MHz Z-80 and a periodic interrupt whose period was 1 msec.
Doesn't leave much time for background processing.

The worst case was if an output to the scoop was delayed.  Rather than
catching the molten gob of glass in flight, it would fling it across the 
plant floor.  If it hit anyone, it would stick to their skin and most likely
result in an amputation.

Since I had previously worked on central office software, this gave me
a much clearer view of real time.  I used to worry about what would
happen if a dial tone or compelled signaling tone was delayed.  Ah, the
good old days.

-Jack
-- 
Jack Bonn, <> Software Labs, Ltd, Box 451, Easton CT  06612
uunet!swlabs!jack

nather@ut-sally.UUCP (Ed Nather) (05/09/88)

In article <832@swlabs.UUCP>, jack@swlabs.UUCP (Jack Bonn) writes:
> From article <1534@pt.cs.cmu.edu>, by schmitz@FAS.RI.CMU.EDU (Donald Schmitz):
> > In article <1521@pt.cs.cmu.edu> koopman@A.GP.CS.CMU.EDU (Philip Koopman) writes:
> > 
> > This may be straying somewhat from the original point, but what sort of
> > applications really have such exact timing deadlines?  
> 
> We had a 2.5 MHz Z-80 and a periodic interrupt whose period was 1 msec.
> Doesn't leave much time for background processing.
> 

Our data acquisition system for time-series analysis of variable stars also had
1 msec interrupts, imposed on a Nova minicomputer, ca. 5 usec add time reg to 
reg.  If your interrupt routine chews up 100 usec, you still have 90% of the
CPU left to do "background" processing (I always thought of it as "foreground,"
because it's what the user sees -- keyboard response, display, etc.)  That
meant keeping the interrupt routine short in the worst case, and allowing ONLY
the timing interrupt -- all other I/O was polled or DMA.  That allowed us to
specify the worst case condition -- when everything was active all at once --
and verify we'd never lose an interrupt. It was a disaster if we did: we'd get
data that looked fine but was actually wrong.  Not as dramatic as slinging
molten glass at someone, of course, but still awful.

I suspect time-critical software design will become more and more common as
computers get faster, just because you can consider software control where
only hardware was fast enough before.


-- 
Ed Nather
Astronomy Dept, U of Texas @ Austin
{allegra,ihnp4}!{noao,ut-sally}!utastro!nather
nather@astro.AS.UTEXAS.EDU

mcdonald@uxe.cso.uiuc.edu (05/10/88)

>In article <1521@pt.cs.cmu.edu> koopman@A.GP.CS.CMU.EDU (Philip Koopman) writes:
>>Real-time control programs often have a situation where only
>>X microseconds are available to perform a task.  Therefore,
>>the code to perform the task must be GUARANTEED to complete
>>within X microseconds.  In real-time control, a late answer
>>is a wrong answer.
>
>This may be straying somewhat from the original point, but what sort of
>applications really have such exact timing deadlines?...
>[I]n general a few percent deviation in cycle
>time has next to no effect.  As long as the deviation is small and well
>distributed, ie.  delays of no more than 20% and occuring less than 10
>sample periods in a row, I can't imagine a mechanical system reacting to the
>error.

Sometimes microseconds can matter. Our most complicated real-time
system runs a scanning interferometer and a laser. The interferometer
is a mechanical plunger riding in a sleeve 0.00025 inch larger in
diameter than the moving part, at a temperature of -196 degrees Celsius, on
a cushion of pressurized helium.  The "wiggle" tolerance on the motion
is +- 0.000005 inch. This can only be achieved if the motion is smooth;
this part is taken care of by servo hardware. This hardware detects
the position of the mirror mounted on the plunger by counting 
interference fringes of a laser. It sends signals to the computer every
100 microseconds. The computer converts several error signals from
the hardware and decides if they are within tolerance. If not, it skips
a data point. If they are OK it starts the complicated process of
firing the various parts of the laser so that the sixth anticipated
trigger signal will occur just at the time the laser is really ready to 
go; the actual firing is by hardware. The computer again checks to see
if the collected data is OK or garbage. Then it can start over again.
The computer also checks on the "quality" of the servo loop inputs;
if they get weak the moving parts have been known to self-destruct
($5000) - there are hardware "stops" to prevent destruction, but 
using them ruins the alignment and we have to warm up to room 
temperature to fix it, a three day process. We are using a PDP-11/73,
with ALL interrupts disabled. The program was written in assembler,
checking the timing of every instruction -- we can see by its outputs
on a scope how much time we have to spare, and of course there are
variations due to the cache hit/not hit probability, but we know
FOR SURE that it won't overrun, as we give it 25% to spare, in the worst
case. The code was an absolute nightmare to write, but it is actually
rather simple, in fact only about 3000 lines.
        I would consider this to be "real-time".
Doug McDonald

phil@osiris.UUCP (Philip Kos) (05/13/88)

>In article <1521@pt.cs.cmu.edu> koopman@A.GP.CS.CMU.EDU (Philip Koopman) writes:
>This may be straying somewhat from the original point, but what sort of
>applications really have such exact timing deadlines?...

I worked on some real-time data acquisition applications at the University
of Illinois between 1980 and 1984, and if my program wasn't ready to read
that data word and put it someplace appropriate when it was ready to be
read (affectionately known as "overrun"), we had to throw out the whole trial
and do it over again.  Some of the experiments I assisted were simple
enough, but most were not easily reproducible (particularly the ones
dealing with muscle fatigue) and I never again want to suffer the wrath of
a grad student facing a grant or thesis deadline.  Like the original
article said, if it's late, it might as well be wrong.

                                                                 Phil Kos
                                                      Information Systems
...!uunet!pyrdc!osiris!phil                    The Johns Hopkins Hospital
                                                            Baltimore, MD

mark@hubcap.UUCP (Mark Smotherman) (05/14/88)

What type of work has been done on benchmarks for real-time systems?
The applications seem so specialized as to make most comparisons into
apples versus oranges.  Are there any standard, "representative" tasks
that could be used to indicate the relative merit of a machine/OS?
In evaluating a machine, do you rely mainly on interrupt latency measures,
or on what?

Please email responses and I will post a summary.  Thanks.

-- 
Mark Smotherman, Comp. Sci. Dept., Clemson University, Clemson, SC 29634
INTERNET: mark@hubcap.clemson.edu    UUCP: gatech!hubcap!mark

aronsson@sics.se (Lars Aronsson) (05/16/88)

>>>the code to perform the task must be GUARANTEED to complete
>>>within X microseconds.  In real-time control, a late answer
>>>is a wrong answer.
>>
>>This may be straying somewhat from the original point, but what sort of
>>applications really have such exact timing deadlines?...
>
>Sometimes microseconds can matter. Our most complicated real-time
>system runs a scanning interferometer and a laser. The interferometer

Enough! Obviously, real-time applications do exist. No more
interferometers in this news group, please.

A few years ago, there was a discussion on why you wouldn't use UNIX
for real-time applications. This was because of the virtual memory
system. Today, we have UNIX clones which allow you to lock a process
in main memory, just like the UNIX kernel. Since a virtual memory
system is but a cache mechanism for the disk, the following thoughts
come naturally to me:

Before I start: This might turn out to be Todays Dumb Suggestion.
Maybe my ideas are already implemented on lots of systems or totally
useless. Please, let me know!

As far as I know, RISC instruction caches are a gain only when the
processor runs through loops. What about the ability to declare
cache-resident functions (procedures/subroutines)? This might not be
the solution to real-time applications, but seems potentially useful
in many other cases.

Things normally managed by super-CISC instructions (decimal
arithmetic, string instructions, and the like) in such machines would
then be done with neat library functions declared as "register".  The
CISC equivalent to this would be to allow users to define new machine
instructions at run-time.

Of course, you would have to decide on what to do on a context switch.
Maybe the register functions should belong to a shared library and
be more or less permanently in the cache.

Perhaps, this kind of register functions would make the RISC vs CISC
debate fade a little.

billo@cmx.npac.syr.edu (Bill O) (05/18/88)

In article <1924@sics.se> aronsson@sics.se (Lars Aronsson) writes:

>Before I start: This might turn out to be Todays Dumb Suggestion.
>Maybe my ideas are already implemented on lots of systems or totally
>useless. Please, let me know!

Yes, I think they have been to a certain extent. More in a bit...

>
>As far as I know, RISC instruction caches are a gain only when the
>processor runs through loops. What about the ability to declare
>cache-resident functions (procedures/subroutines)? This might not be
>the solution to real-time applications, but seems potentially useful
>in many other cases.
>
>Things normally managed by super-CISC instructions (decimal
>arithmetics, string instructions and the like) in such machines, would
>then be done with neat library functions declared as "register". The
>CISC equivalent to this would be to allow users to define new machine
>instructions at run-time.
>
>Of course, you would have to decide on what to do on a context switch.
>Maybe the the register functions should belong to a shared library and
>be more or less permanently in the cache.

Actually, there is no need to use *associative* cache for this
purpose, because the "associative" part is really just a mechanism to
enable the computer to keep in fast memory a portion of the code which
it predicts will be referenced in the near future (the prediction is
usually based on past use). For functions declared as being "fast"
or, as suggested, "register", all you really need is good old
fashioned fast memory.

What follows are excerpts from a couple of recent (past few months)
postings relating to the way this sort of thing was done on the pdp 10
and 11 (the second excerpt gives new meaning to the declaration
"register")

[Dean W. Anneser, Pratt & Whitney Aircraft]
-We have 7 of these beasties [pdp-11/55], and they're still running
-strong.  The memory configuration is 0-32kw bipolar, and 32-124kw MOS.
-We keep the time- critical code in the bipolar.  DEC has never
-produced a faster PDP-11.  We have benchmarked and are currently using
-the 11/73, 11/83, and 11/84, and the 11/55 will still run circles
-around them...

[Brian Utterback, Cray Research Inc.]
-Another advantage the PDP-10 had by mapping the registers to the
-memory space, other than indexing, was in execution.  You could load a
-short loop into the registers and jump to them!  The loop would run
-much faster, executing out of the registers.

Bill O'Farrell, Northeast Parallel Architectures Center at Syracuse University
(billo@cmx.npac.syr.edu)