[comp.realtime] Realtime and fault-tolerance together?

jessea@dynasys.UUCP (Jesse W. Asher) (11/03/90)

I was wondering if any realtime operating systems also incorporated
fault-tolerant considerations.  If so, how are they implemented software-wise?
How do hardware based fault-tolerance systems interact with software based
realtime?  It seems to me that they go hand in hand in many cases and that
one would want to implement both in a system.

As a side note, are there any people from Modcomp reading this newsgroup?


---*---*---*---*---*---*---*---*---*---*---*---*---*---*---*---*---*---*---*---
      Jesse W. Asher                             Phone: (901)382-1609 
               6196-1 Macon Rd., Suite 200, Memphis, TN 38134
                UUCP: {fedeva,chromc,rutgers}!dynasys!jessea
 -> GIVE:  Support the helpless victims of computer error.

dwells@fits.cx.nrao.edu (Don Wells) (11/05/90)

In article <720@dynasys.UUCP> jessea@dynasys.UUCP (Jesse W. Asher) writes:

   From: jessea@dynasys.UUCP (Jesse W. Asher)
   Newsgroups: comp.realtime
   Date: 2 Nov 90 16:35:30 GMT

   I was wondering if any realtime operating systems also incorporated
   fault-tolerant considerations.  If so, how are they implemented software-wise?
   How do hardware based fault-tolerance systems interact with software based
   realtime?  It seems to me that they go hand in hand in many cases and that
   one would want to implement both in a system.
...

I consider a hardware-based watchdog timer with automatic reboot to be
a form of fault-tolerance. In many cases a software-based watchdog
timer is almost as effective for this purpose.
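
A minimal sketch of the software variant, assuming the monitored task calls
kick() regularly and some periodic tick calls poll(). The class name, the
method names, and the injectable clock are hypothetical choices made for
illustration; on_timeout stands in for whatever recovery action (such as a
reboot) the system would take.

```python
import time


class SoftwareWatchdog:
    """Fires on_timeout once if kick() is not called within timeout_s seconds."""

    def __init__(self, timeout_s, on_timeout, clock=time.monotonic):
        self.timeout_s = timeout_s
        self.on_timeout = on_timeout   # assumed recovery action, e.g. a reboot
        self.clock = clock             # injectable so the logic can be tested
        self.last_kick = clock()
        self.expired = False

    def kick(self):
        """Called by the monitored task to show it is still making progress."""
        self.last_kick = self.clock()

    def poll(self):
        """Called from a periodic tick; returns True once the timer has expired."""
        if not self.expired and self.clock() - self.last_kick > self.timeout_s:
            self.expired = True
            self.on_timeout()
        return self.expired
```

Unlike a hardware timer, this version only helps while the code that calls
poll() is itself still running, which is one reason it is "almost" rather
than fully as effective.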

--

Donald C. Wells, Assoc. Scientist  |        dwells@nrao.edu
Nat. Radio Astronomy Observatory   |         6654::DWELLS
Edgemont Road                      | +1-804-296-0277      38:02.2N
Charlottesville, VA 22903-2475 USA | +1-804-296-0278(Fax) 78:31.1W

valentin@cbmvax.commodore.com (Valentin Pepelea) (11/05/90)

In article <720@dynasys.UUCP> jessea@dynasys.UUCP (Jesse W. Asher) writes:
>
> I was wondering if any realtime operating systems also incorporated
> fault-tolerant considerations.  If so, how are they implemented
> software-wise?

Realtime operating systems typically provide task exception handlers:
task-specific functions that are called when an error occurs during a
particular task's time slice. The handler then decides on the appropriate
action to take and corrects the fault in the same context (at the same
priority) in which the fault occurred. Of course, some exceptions, such as
those generated by power faults, might require the same corrective action
no matter which task's context they occur in.
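
The dispatch described above might look roughly like the following sketch.
It is not any particular RTOS's API: run_task, PowerFault, and the
common_handlers table are all hypothetical names, and Python exceptions
stand in for hardware or kernel fault signals.

```python
class PowerFault(Exception):
    """Hypothetical fault that needs the same response in every task."""


def run_task(task, task_handler, common_handlers):
    """Run one task; on a fault, call a system-wide or task-specific handler.

    The handler runs in the same thread as the task, which loosely mirrors
    handling the fault in the faulting task's own context.
    """
    try:
        return task()
    except Exception as fault:
        # Faults such as power failure get the same action for every task.
        for fault_type, common_handler in common_handlers.items():
            if isinstance(fault, fault_type):
                return common_handler(fault)
        # Everything else is left to the task's own handler.
        return task_handler(fault)
```

A task that reads a sensor could, for example, register a handler that
substitutes the last good value, while PowerFault maps to a shared
shutdown routine.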

> How do hardware based fault-tolerance systems interact with software based
> realtime?  It seems to me that they go hand in hand in many cases and that
> one would want to implement both in a system.

The typical hardware based fault tolerant system is one that uses several
units, and decides upon an action depending on what the majority of the
units vote to do. Democracy at work.
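
As a rough illustration of the voting step, here is a sketch of a majority
voter over the outputs of redundant units. The function name and the choice
to report dissenting units are assumptions made for the example; real
voters are usually implemented in hardware and may compare values within a
tolerance rather than for exact equality.

```python
from collections import Counter


def majority_vote(outputs):
    """Return (agreed_value, indices_of_dissenting_units).

    Raises RuntimeError when no value is reported by a strict majority.
    Outputs must be hashable and are compared for exact equality.
    """
    if not outputs:
        raise ValueError("need at least one unit output")
    value, count = Counter(outputs).most_common(1)[0]
    if count * 2 <= len(outputs):
        raise RuntimeError("no majority among redundant units")
    dissenters = [i for i, out in enumerate(outputs) if out != value]
    return value, dissenters
```

With three units, a single faulty unit is outvoted and its index is
reported so that it can be flagged for repair or isolation.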

The typical hardware/software combined system is one where special circuitry
is used to detect a hardware fault and initiate a software recovery routine.
Although even in the case above some software might be necessary to control
the selection of a faulty unit, the salient point here is that software is
the backbone of the fault tolerant system, and software will be used to
recover from or circumvent the fault.

I'll let somebody more experienced give us some examples and juicy anecdotes.

Valentin
-- 
The Goddess of democracy? "The tyrants    Name:    Valentin Pepelea
may destroy a statue,  but they cannot    Phone:   (215) 431-9327
kill a god."                              UseNet:  cbmvax!valentin@uunet.uu.net
             - Ancient Chinese Proverb    Claimer: I not Commodore spokesman be

alex@vmars.tuwien.ac.at (Alexander Vrchoticky) (11/06/90)

jessea@dynasys.UUCP (Jesse W. Asher) writes:

>I was wondering if any realtime operating systems also incorporated
>fault-tolerant considerations.  If so, how are they implemented software-wise?
>How do hardware based fault-tolerance systems interact with software based
>realtime?  It seems to me that they go hand in hand in many cases and that
>one would want to implement both in a system.

What is software-based real-time?

:-)

--
Alexander Vrchoticky,  Tech Univ Vienna, CS,  Dept for Real-Time Systems
Voice:  +43/222/58801-8168   Fax: +43/222/569149
Internet: alex@vmars.tuwien.ac.at Path: vmars!alex@relay.eu.net

steved@hrshcx.csd.harris.com (Steve Daukas) (11/06/90)

In article <720@dynasys.UUCP>, jessea@dynasys.UUCP (Jesse W. Asher) writes:
> I was wondering if any realtime operating systems also incorporated
> fault-tolerant considerations.  If so, how are they implemented software-wise?
> How do hardware based fault-tolerance systems interact with software based
> realtime?  It seems to me that they go hand in hand in many cases and that
> one would want to implement both in a system.

What do *you* mean by fault tolerance?  I've seen requirements where
a company would have to pay fines on the order of $20,000 a minute for
down-time.  In this case, you need a duplicate or triplicate system
with shared disks, etc., so that when something fails in one system,
you switch to the other.
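
One common way to decide when to switch is a heartbeat check. The sketch
below assumes each system periodically records a timestamp and that a
monitor picks the first system whose heartbeat is still fresh; the function
name, the dictionary layout, and the timeout are hypothetical, not taken
from any particular product.

```python
def select_active(last_heartbeat, now, timeout_s, preferred_order):
    """Return the first unit in preferred_order with a fresh heartbeat, or None.

    last_heartbeat maps unit name -> time of its most recent heartbeat.
    """
    for unit in preferred_order:
        seen = last_heartbeat.get(unit)
        if seen is not None and now - seen <= timeout_s:
            return unit
    return None
```

If the primary stops reporting, the standby is selected on the next check;
if neither is alive the caller gets None and has to escalate.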

One might define fault tolerance as the ability to be self-correcting
in the sense of exception handling...

In any case, the operating system usually doesn't have direct control
over the fault tolerance.  It's usually a matter of using the proper hooks
in the OS to provide for whatever capabilities make sense for the given
application.

I guess the next question is: what do *you* mean by real-time?  Are we
talking milliseconds or microseconds for response times?

Steve
--

Stephen C. Daukas              | sdaukas@csd.harris.com  
Harris Corporation             | uunet!hcx1!misg!sdaukas  
Computer Systems Division      | (617) 221-1834, (617) 221-1830

     "Old MacDonald had an agricultural real estate tax abatement."

varvel@cs.utexas.edu (Donald A. Varvel) (11/06/90)

One approach to getting realtime and fault-tolerance, within
certain assumptions, is self-stabilization.

The usual assumption for the investigation of self-stabilizing
programs is that the program will not be degraded, but that
any or all data may.  This assumes that ROM can be made not
to degrade, whereas RAM cannot.  Environments like the space
telescope come to mind.

A self-stabilizing system has a subset of states, usually
defined as those reachable from a predefined starting state,
which are acceptable.  The program in any unacceptable state
must eventually reach an acceptable state.

To make any claim to being "real-time", a self-stabilizing
program must reach an acceptable state within defined time
bounds.  I have a hard time visualizing such a system being
able to guarantee "hard" constraints, but there may be some
definition of "real time" that would be satisfied.
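
To make the idea concrete, here is a sketch of a well-known self-stabilizing
algorithm, Dijkstra's K-state token ring. It is chosen here only as an
example and is not mentioned in the post above. Machines sit in a ring, each
holding a counter in 0..k-1, with k assumed larger than the number of
machines. A state is acceptable when exactly one machine is "privileged"
(holds the token). The helper names and the pick scheduler parameter are
hypothetical.

```python
def privileged(state):
    """Indices of machines that are currently allowed to move."""
    n = len(state)
    privs = [0] if state[0] == state[n - 1] else []
    privs += [i for i in range(1, n) if state[i] != state[i - 1]]
    return privs


def move(state, i, k):
    """Apply machine i's rule in place."""
    if i == 0:
        state[0] = (state[0] + 1) % k   # machine 0 increments modulo k
    else:
        state[i] = state[i - 1]         # other machines copy their predecessor


def moves_to_stabilize(start, k, max_moves, pick=max):
    """Count moves until exactly one machine is privileged.

    pick chooses which privileged machine moves next (a stand-in scheduler).
    Returns None if no acceptable state is seen within max_moves moves.
    """
    state = list(start)
    for moves in range(max_moves + 1):
        privs = privileged(state)
        if len(privs) == 1:
            return moves
        move(state, pick(privs), k)
    return None
```

The number returned is a count of moves, not elapsed time. Turning it into
a timing guarantee would also need a bound on how long each move takes and
on how often faults recur, which is where the difficulty with "hard"
constraints shows up.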

-- Don Varvel (varvel@cs.utexas.edu)

srp@modcomp.uucp (Steve Pietrowicz) (11/06/90)

In <720@dynasys.UUCP> jessea@dynasys.UUCP (Jesse W. Asher) writes:

] As a side note, are there any people from Modcomp reading this newsgroup?

You bet!
--
--------------
SR Pietrowicz    UUCP:  ...!uunet!modcomp!srp        CIS:  73047,2313