[comp.realtime] real-time multicomputer systems

mcculley@alien.enet.dec.com (05/26/90)

In article <9005241526.AA12356@fsucs.cs.fsu.edu>, groh@fsucs.cs.fsu.edu (Jim Groh) writes...
>In article <4228@hcx1.SSD.CSD.HARRIS.COM>, steve@SSD.CSD.HARRIS.COM (Steve Brosky) writes:
>>
>>* contiguous disk files --
>>	allows faster disc I/O because seeks are eliminated
> 
>          Isn't that also dependent on file size and disk configuration?
> 

Not only that, it makes a lot of assumptions about the nature of the 
disk i/o (IMHO questionable ones at that).  Basically, seeks can be
eliminated only for the file blocks contained in the track(s)/cylinder(s)
presently under the heads (and maybe the next in sequence).  So the elimination
of seeks will be true only for one limited portion of one (possibly large?)
file.  Access to other files, and other parts of that file, will require seeks.
The overhead involved in accomplishing those seeks may or may not be increased
due to the simplistic contiguous file structure (in my experience Murphy will
insist that it is).

There is a related assumption that there will be only a single access stream
to a disk at any time, or else seeks will still be required.  So that
multiprocessor configuration either needs a disk per processor, or some of them
will have to wait for that single access stream to become available in order to
hit the disk without causing those dreaded seeks.

And I won't ask about extending files...

Contiguous files are a very simple structure; as such, they minimize overhead at
the expense of capabilities.  They might be the best thing for a system requiring
only one disk access stream; on the other hand, if you have multiple processors,
wouldn't multiple disk access streams be nice too?  Like any tradeoff, you pays
your money, you takes your choice.


- Bruce McCulley
  RSX-11 Software Development
  Digital Equipment Corp.
  Nashua, NH USA

sccowan@watmsg.uwaterloo.ca (S. Crispin Cowan) (05/26/90)

In article <4228@hcx1.SSD.CSD.HARRIS.COM> steve@SSD.CSD.HARRIS.COM (Steve Brosky) writes:
>CX/RT, from Harris Computer Systems, is a Unix compatible operating system that
>runs on the Night Hawk platform: a tightly-coupled, shared-bus symmetric 
>multiprocessor containing up to 8 MC68030 CPUs, per-processor external caches
>and local memories, and a large global memory.  All times listed below 
>are for a 20MHz 68030.
[stuff]
>* fast context switch times -- 40-60 microseconds

This is obviously crucial to real-time response, and seems much faster
than conventional Unix context switching time.  How did you get Unix to
context switch so quickly?

Crispin
----------------------------------------------------------------------
(S.) Crispin Cowan, CS grad student, University of Waterloo
Office:		DC3548	x3934		Home phone: 570-2517
Post Awful:	60 Overlea Drive, Kitchener, N2M 1T1
UUCP:		watmath!watmsg!sccowan
Domain:		sccowan@watmsg.waterloo.edu

"You can have peace.  Or you can have freedom.  Don't ever count on
having both at once."
	-Lazarus Long
"You can't separate peace from freedom because no one can be at peace
unless he has his freedom."
	-Malcolm X

uemura@isvax.isl.melco.co.jp (Joe Uemura) (05/29/90)

In article <4228@hcx1.SSD.CSD.HARRIS.COM>, steve@SSD.CSD.HARRIS.COM (Steve Brosky) writes:
<lots of stuff deleted>
> 
> 
> The operating system supports real-time features like:
	.
	.
> * real-time process synchronization --
> 	very fast synchronization primitives to coordinate access to shared data
> 	between cooperating processes (this is not AT&T system V IPC semaphores,
>     they are too slow!).  The synchronization primitives which block, will
> 	also place a bound on priority inversion.
> 
Could you please elaborate more on what you mean by placing "a bound on
priority inversion"? Is this some form of avoidance protocol?

I would also like to hear if anyone knows of other
realtime Unix adaptations which deal with priority inversion.


Joe Uemura
Mitsubishi Electric Co
ISED Lab
Parallel Computing Group
Kamakura, Japan

buck@siswat.UUCP (A. Lester Buck) (05/30/90)

In article <17536@isvax.isl.melco.co.jp>, uemura@isvax.isl.melco.co.jp (Joe Uemura) writes:
> > * real-time process synchronization --
> >     The synchronization primitives which block, will
> > 	also place a bound on priority inversion.
> > 
> Could you please elaborate more on what you mean by placing "a bound on
> priority inversion"? Is this some form of avoidance protocol?
> 
> I would also like to hear if anyone knows of other
> realtime Unix adaptations which deal with priority inversion.

I am not sure this is what you want, but an excellent tutorial on
realtime scheduling theory and priority inversion problems in Ada
is in the following article:

"Real-Time Scheduling Theory and Ada", by Lui Sha and John B. Goodenough
IEEE Computer, April 1990.

This is a very readable tutorial on the rate-monotonic theory of realtime
scheduling.  The rate-monotonic algorithm is simple: it assigns
higher priorities to tasks with shorter periods.  It is analogous
to linear system theory - most everything is exactly solvable, but the
processor utilization is somewhat lower than with a scheduler that does not
follow the rate-monotonic algorithm.  References in the article extend
the theory to multiprocessors.

Then the authors show how the Ada tasking model matches the rate-monotonic
algorithm well, in allowing designers to ignore time line type scheduling
and depend on the Ada runtime to schedule tasks successfully according to
the algorithm.  But they also point out how realtime programming  is
somewhat of a joke in Ada.  "Of course, telling programmers to assign
'Rate_Monotonic_Priorities' to tasks but not to use pragma PRIORITY
surely says we are fighting the language rather than taking advantage of
it.  But, the important point is that no official revisions to the
language are needed."  And all this from a language _designed_ to
be used for realtime, embedded systems!  That's what happens when
the language is designed by a committee of academics and finalized
by a commercial company that didn't do its homework, years before
a working compiler existed and the people who would be using the
language got a chance to code something.  "Priority inversion, what's
that?"

Various Ada runtime environments are adding rate-monotonic support, and
from a phone conversation with Lui Sha I learned that Lynx Realtime
Systems wants to add the rate-monotonic algorithm to its scheduler.
It is no small coincidence that Lynx OS was chosen for the space
station, since the rate-monotonic algorithm is an excellent fit to the
open realtime system planned for that project.  As long as all vendors
follow the rate-monotonic discipline with their realtime tasks, they
can code and combine their tasks independently and know that the
tasks will meet their deadlines.

As an aside, the space station leads to some unique realtime problems, with
its open environment and the requirement that once it starts up, it should
never shut down, even during software upgrades.  By contrast, the shuttle
on-board software was written by exactly one company (IBM) and can be
upgraded between missions.  The only advantage the space station has is that
if a system fails for a few minutes, (usually) nothing really bad happens.
Life support, ventilation, experiments, telemetry, all can stand a few
minutes while systems reboot or whatever (unless the system has done
something _really_ stupid, of course...).


-- 
A. Lester Buck     buck@siswat.lonestar.org  ...!texbell!moray!siswat!buck

monty@SSD.CSD.HARRIS.COM (Monty Norwood) (05/30/90)

The benefit of contiguous files is indeed the elimination of seek times.
As noted by many posters to this group, the success of this is dependent
on many things, not just the contiguous nature of the file.

The primary benefit is in cases where a disk, in a real time application,
is dedicated as an archive: essentially a large (possibly circular) buffer
saving the last x seconds, minutes, or hours of the raw data gathered.
Telemetry applications often need this.  Missile-range applications, where
raw data comes in fast and furious but only for a short duration, can
benefit from this use.

On the Night Hawk system, contiguous disk files are typically used with the
direct disk I/O feature (bypassing the buffer cache) so that large chunks of
data can be written to the disk on a periodic basis as quickly as possible.
These features are intended to be used in stringent real time environments
where the environment is controlled. 

Clearly, if there is other activity to the same disk or the accesses to the
file are random (as opposed to sequential) then all the noted problems 
in other postings make this feature a moot point. Expanding the file is
indeed a problem.

It is not perfect, but very useful in some applications, particularly real
time.

steve@SSD.CSD.HARRIS.COM (Steve Brosky) (05/30/90)

>         What are spin locks??
>         (hope this isn't a frequently asked question!! )
Spin locks, or busy-waiting locks, are locks on which a user never blocks.
When the user tries to lock a spin lock which is already locked, he spins
in a tight loop, waiting for the lock to be freed.  Spin locks are only
used to protect critical sections which hold the lock for a very short period
of time, so it is not worth the overhead of blocking the process.

Steve Brosky                        sabrosky@ssd.csd.harris.com
Harris Computer Systems Division
2101 W Cypress Creek Rd
Fort Lauderdale, Fla 333122

steve@SSD.CSD.HARRIS.COM (Steve Brosky) (05/30/90)

>>* fast context switch times -- 40-60 microseconds

>This is obviously crucial to real-time response, and seems much faster
>than conventional Unix context switching time.  How did you get Unix to
>context switch so quickly?

A lot of effort was put into optimizing context switches.
Sorry, I can't tell you about some of the more subtle tricks we've
used.  However, I should explain what is meant by this ambiguous term
"context switch time".  The 40-60 microseconds is the time to:

- save the state (registers) of the old process
  (note that this machine has an enhanced floating point unit so this also
   includes 8 extra floating point registers)
- select the new process
- restore the state (VM, registers) of the new process
- invalidate the caches


This context switch time does not include the time used to decide that 
a context switch is required. For example, in the case where a process 
expires its quantum (determined in a clock interrupt routine),
the processing time consumed by the clock interrupt is not included 
in the quoted context switch time.

Certain people are more interested in the time it takes to transition from
process A to process B when process A takes some action that causes process
B to run. There are a number of different primitives available for this on
our system that take the form of a system call. The "real-time" primitives
take from 170-200 microseconds for this to happen. This time is measured
from immediately before process A makes the "wake-up" system call (actually
the call to the library interface to the syscall) to immediately after process
B returns from the "blocking" system call.  This is actual wall time
from the last useful instruction in user process A to the first useful
instruction in user process B. The 40-60 microsecond context switch is part of
this time.

Depending on the synchronization method used, these times vary considerably.
Signals, for example, are far less efficient than the mechanisms alluded to
above. There are other cases where the times are better and approach 100
microseconds for a transition from last user mode instruction to first user
mode instruction.

Steve Brosky                        sabrosky@ssd.csd.harris.com
Harris Computer Systems Division
2101 W Cypress Creek Rd
Fort Lauderdale, Fla 333122

drk@athena.mit.edu (David R Kohr) (06/04/90)

Monty Norwood's discussion of the contiguous disk file facility in 
Harris' Nighthawk multiprocessor system got me wondering about how other
people out there handle the recording of massive amounts of instrumentation
data which occurs in short but dense bursts.  

The project I'm on here at Lincoln involves recording radar data at a rate 
of 13 Mb./sec. in the current version of the system, for a period of a 
minute or two; we may ultimately produce a system which records up to
40 Mb./sec.  We had to buy a very expensive digital tape recorder from
AMPEX Corporation to handle this rate, and have lots of specialized
hardware built and tied together with a custom high-speed data bus, to
feed the AMPEX recorder quickly enough.  But this system was essentially
designed three years ago, and I have been wondering if there are other
alternatives available nowadays for handling this kind of data rate.

--
David R. Kohr     M.I.T. Lincoln Laboratory      Group 45 ("Radars 'R' Us")
    email:    DRK@ATHENA.MIT.EDU (preferred)  or  KOHR@LL.LL.MIT.EDU
    phone:    (617)981-0775 (work)	      or  (617)527-3908 (home)

steve@SSD.CSD.HARRIS.COM (Steve Brosky) (06/05/90)

>> * real-time process synchronization --
>> 	very fast synchronization primitives to coordinate access to shared data
>> 	between cooperating processes (this is not AT&T system V IPC semaphores,
>>  they are too slow!).  The synchronization primitives which block, will
>> 	also place a bound on priority inversion.
>> 

> Could you please elaborate more on what you mean by placing "a bound on
> priority inversion"? Is this some form of avoidance protocol?

These blocking primitives implement basic priority inheritance.
This means that when process A attempts to lock one of these locks, and
it is already locked by the lower priority process B, we guarantee that
process B will get an effective priority at least as high as process A's.
This allows process B to run until it releases the lock, at which point it
loses its higher effective priority.  The highest priority process that was
waiting for the lock will then be run.

This scheme does not eliminate priority inversion; note that while
process B was running inside the critical section above, a priority inversion
was in effect.  However, the length of the priority inversion is bounded by
the longest hold time of the lock.  Basic priority inheritance is also not
the only approach to priority inversion.  The priority ceiling protocol
provides better worst-case bounds but is more difficult to implement.

For more information see the upcoming summer USENIX proceedings for
the article "An Implementation of Real-Time Thread Synchronization"
by Mark Heuser.

Steve Brosky   sabrosky@ssd.csd.harris.com
Harris Computer Systems Division
Fort Lauderdale, Fla.

geoff@modcomp.UUCP (06/07/90)

The discussion on contiguous disk files has concentrated on the elimination 
of seek times as being the major benefit.  Another important benefit is 
the ability to pass more data in each disk i/o operation, reducing the 
total number of such operations.  An additional benefit on reads is that
modern disks often have considerable amounts of RAM and autonomously read ahead
into this local cache.  Contiguous files benefit from this feature.

Here at Modcomp we have produced REAL/IX, a Unix System V based realtime
operating system which I'll use as the basis for discussion.  In addition
to the standard System V file system,  we have yet another fast file system.  
The allocator for the latter is designed to place file blocks contiguously 
wherever possible, making a "mostly contiguous file system".  Files are 
normally accessed via the buffer cache.  If a sequential read is detected, 
REAL/IX takes advantage of any contiguity by reading several blocks from disk 
into several buffers in the cache in a single I/O operation.  Similarly, 
writes to contiguous disk blocks can be combined into a single operation.

We find that for applications that make intensive use of sequential I/O 
to large files, throughput can be approximately tripled.  In a development
environment, the number of disk I/O operations is reduced by roughly 30%.
Your mileage may vary.

The trade-off for using mostly contiguous files is the increased overhead 
of the block allocator.  For very small files, or for temporary files that 
will never touch the disk, the standard allocator will be marginally better.
The use of larger block sizes is an alternative method of reducing the 
number of I/O operations, involving a different set of trade offs.

There is nothing revolutionary about these techniques, variants of which have
been adopted by some other vendors.  Their absence from the Unix porting base 
has led to their absence from most Unix implementations, leading to the urban 
myth that Unix filesystems have intrinsically poor performance.  

Explicitly preallocated contiguous disk blocks are the current empirical
method for obtaining maximum disk I/O throughput.  If disk and controller 
are used for no other purpose, guaranteed and small access times result
(with the caveat that file accesses cannot be made via the buffer cache).
REAL/IX also provides prioritisation of disk I/O requests from realtime 
processes to allow shared use while providing access time guarantees.  
The use of contiguous files with asynchronous I/O operations is a natural 
combination when attempting to obtain the maximum possible throughput.  
Here we find another benefit of contiguity: simpler asynchronous file I/O.

As has been noted, there are problems with explicitly preallocated contiguous 
files, particularly what to do when you come to the end.  There is probably
no single right way to deal with this, so we allow the user to specify 
what action to take.

Finally, I'd like to follow up on Steve Brosky's interesting figures for the
Harris Night Hawk system. I appreciated the care used in explaining just what 
the numbers referred to.  One of the benefits of the Posix 1003.4 standard, 
when it is eventually finished, should be that a set of common performance 
metrics will come into use, allowing users to directly compare systems.

Geoff Hall                                    uunet!modcomp!geoff
Modcomp                                       modcomp!geoffi@uunet.UU.NET
1650 West McNab Rd
Ft Lauderdale, FL 33340