[comp.dcom.lans] "real-time" over a lan: token ring vs ethernet vs ?

berger@well.sf.ca.us (Robert J. Berger) (07/30/90)

We are looking to make a special purpose dedicated lan for controlling
up to 128 devices. These devices will run a real time os such as PSOS or
VxWorks. There will be up to 16 master devices made up of unix workstations
running Unix System V.4. The workstations will be initiating most traffic.

We will not be using the lan for traditional computer traffic, but just for 
control and status exchanges. Most messages will be short and would easily fit
into one datagram packet. They will probably be RPCs on top of UDP or TCP/IP.

The main time critical response we need is to have a guaranteed worst case of
the workstation sending a message to one or several of the slaves, where the
message must get to the slave within 5 milliseconds.  Most other traffic needs
to get from master to slave within 16 milliseconds.
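
To make the shape of the traffic concrete, the kind of exchange we have in
mind looks roughly like the C sketch below (the port number, message layout,
and sizes are invented for illustration, not part of any actual design):

    /* Master side of one control exchange: send a short command datagram
     * to a slave and time the status reply.  The port number (7500) and
     * the 64-byte messages are invented for illustration only. */
    #include <stdio.h>
    #include <string.h>
    #include <sys/types.h>
    #include <sys/time.h>
    #include <sys/socket.h>
    #include <netinet/in.h>
    #include <arpa/inet.h>

    int main(int argc, char **argv)
    {
        struct sockaddr_in slave;
        struct timeval t0, t1;
        char cmd[64] = "SET PARAM 42";
        char status[64];
        long usec;
        int s;

        if (argc != 2) {
            fprintf(stderr, "usage: %s slave-ip-address\n", argv[0]);
            return 1;
        }
        s = socket(AF_INET, SOCK_DGRAM, 0);
        memset((char *) &slave, 0, sizeof slave);
        slave.sin_family = AF_INET;
        slave.sin_port = htons(7500);
        slave.sin_addr.s_addr = inet_addr(argv[1]);

        gettimeofday(&t0, (struct timezone *) 0);
        sendto(s, cmd, sizeof cmd, 0, (struct sockaddr *) &slave, sizeof slave);
        recv(s, status, sizeof status, 0);      /* wait for the status reply */
        gettimeofday(&t1, (struct timezone *) 0);

        usec = (t1.tv_sec - t0.tv_sec) * 1000000L + (t1.tv_usec - t0.tv_usec);
        printf("command + status round trip: %ld usec\n", usec);
        return 0;
    }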

There will be some background traffic as well, but it is not generally time
critical.

It seems that ethernet would have problems guaranteeing the response time,
particularly if the background traffic got somewhat heavy and collisions
started occurring.

We are strongly considering 16 Mb/s 802.5 token ring for our lan based on our
reading (we have no actual experience with token ring). Theoretically, token
ring should be able to meet our worst-case traffic estimates and give us
the token often enough to meet the 5 millisecond response time, particularly
if we take advantage of the alleged ability of 802.5 to support global
priorities on the network.
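
For what it's worth, the back-of-envelope that makes the priority mechanism
interesting to us looks like the sketch below.  The per-station token holding
time and ring latency are pure assumptions for illustration; the real numbers
would come from however the ring is actually configured:

    /* Worst case for an ordinary (non-prioritized) token pass: we just
     * missed the token and every other station holds it for its full
     * holding time before it comes back to us.  The holding time and
     * ring latency below are assumptions for illustration only. */
    #include <stdio.h>

    int main(void)
    {
        int    stations     = 128 + 16;   /* slaves plus masters          */
        double hold_ms      = 1.0;        /* assumed per-station holding  */
        double ring_latency = 0.1;        /* propagation + repeater delay */
        double worst_ms     = (stations - 1) * hold_ms + ring_latency;

        printf("worst-case wait for the token: %.1f ms\n", worst_ms);
        return 0;
    }

Even at a modest 1 ms per station, that plain round-robin worst case comes
out well over an order of magnitude beyond our 5 ms budget, which is why we
care whether the priority reservation scheme really works in practice.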

My questions for the net are:
1. Are we barking up the wrong tree here? Is it conceivable to get the response
   we need at all?
2. Does token ring really offer the kind of response time that it shows on
   paper?
3. Do token ring's global priorities really work, and are they usable from UDP/
   TCP/IP?
4. Has anyone done something like this before and come away with some hard
   numbers (with ethernet, token ring, or something else)?

Thanks, Bob
PS. Please use the email address in the signature, not the return address of
this posting!

-------------------------
Bob Berger
SONY Advanced Video Technology Center
677 River Oaks Parkway  San Jose, CA 95134  408-944-4964
[uunet,mips]sonyusa!sfcsun!berger  (soon: berger@sfc.sony.com)

hedrick@athos.rutgers.edu (Charles Hedrick) (07/30/90)

You've got to be real careful about this "guaranteed" stuff.  I think
it's mostly marketing propaganda.  The behavior of both Ethernet and
TR has statistical components.  These include instantaneous load on
the network and packets being dropped.  TR propaganda talks about
"guaranteeing" that the bandwidth is evenly split.  But the problem
is, generally there are enough hosts on the network that if they all
decide to transmit at full speed at the same time, you've got an
overload.  So the fact that things work at all is due to the random
nature of the traffic: not all hosts transmit at once.  Generally if
you wanted to make a precise mathematical model of the thing, you'd
have to take the probability distributions for the offered load of
each source, and make sure that the combined load on the network is
within acceptable bounds "almost all" of the time.  Few of us can
afford networks fast enough that the worst case will be within bounds.
Maybe in your specific application you can.  The point is that you've
got to size the network so that you avoid an overload condition.
The fact that TR and Ethernet handle overloads somewhat differently
probably isn't going to affect you.
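
For concreteness, the kind of model I mean is something like the rough
Monte Carlo sketch below (the per-source send probability, message size, and
window are placeholders; you would substitute your own measured or estimated
distributions):

    /* Rough model of "is the combined offered load within bounds almost
     * all of the time?"  144 sources each offer one 1500-bit message in
     * any given 5 ms window with probability 0.05; the wire is 10 Mb/s.
     * All of those numbers are placeholders. */
    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        int    sources  = 144;
        double p_send   = 0.05;            /* P(source sends in a window) */
        double msg_bits = 1500.0;
        double window_s = 0.005;           /* the 5 ms budget             */
        double capacity = 10e6 * window_s; /* bits a 10 Mb/s wire carries */
        int    trials   = 100000, over = 0, t, i;
        double offered;

        srand(1);
        for (t = 0; t < trials; t++) {
            offered = 0.0;
            for (i = 0; i < sources; i++)
                if ((double) rand() / RAND_MAX < p_send)
                    offered += msg_bits;
            if (offered > capacity)
                over++;
        }
        printf("overloaded windows: %d out of %d\n", over, trials);
        return 0;
    }

This says nothing about collisions, token overhead, queueing in the
interfaces, or dropped packets; it only tells you whether the raw offered
load stays under the wire speed, which is the first thing to check.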

There are also queueing issues in the hosts.  Our experience with TR
is that many of the IBM TR cards have fairly small buffers -- smaller
than current Ethernet devices.  So in fact the servers don't handle
peak loads well at all.  They start losing packets.  Again, you may
not be able to afford to get enough buffers on the cards to handle all
slaves sending maximum size messages to the master at the same time.
You're going to have to depend upon reasonable statistics.

Finally, we have the issue of packets dropping.  Lots of things can
cause a packet to drop, including momentary noise somewhere.  On the
TR you have in addition the fact that when you boot an IBM PC (can't
speak for your workstations), its relay clicks in and interrupts the
network for a while.  We think the token gets lost and has to be
regenerated.  At any rate, the delay -- although only a fraction of a
second -- is still probably longer than the longest likely delay due
to collisions.  Ethernet is passive, so turning on and off a device
does not cause any interruption.

The net effect of all of this is that you probably want to set up a
mockup or do a fairly detailed simulation.  Or if the bandwidths
involved are small enough, just rely on massive overkill.  If the
messages are small enough and don't happen that often, you've got
enough extra bandwidth that it's unlikely you'll have any problems
with either technology.  But if you're at all close to the limits,
what actually happens as you near overload is not the simple thing
that TR propaganda implies.
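
By overkill I mean arithmetic on the order of the sketch below; the message
size and rate are invented, so plug in your own numbers:

    /* Sizing check: 144 stations each sending one short message every
     * 100 ms against a 10 Mb/s wire.  Both numbers are invented; the
     * point is only to see how much headroom is left before you get
     * anywhere near the wire speed. */
    #include <stdio.h>

    int main(void)
    {
        double stations   = 128 + 16;
        double msgs_per_s = 1.0 / 0.100;      /* one message per 100 ms */
        double bits_each  = 100.0 * 8.0;      /* 100-byte message       */
        double offered    = stations * msgs_per_s * bits_each;
        double capacity   = 10e6;             /* 10 Mb/s Ethernet       */

        printf("offered %.0f bits/s = %.1f%% of the wire\n",
               offered, 100.0 * offered / capacity);
        return 0;
    }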

kwe@bu-it.bu.edu (Kent England) (07/31/90)

In article <Jul.29.17.57.05.1990.14474@athos.rutgers.edu>,
 hedrick@athos.rutgers.edu (Charles Hedrick) writes:
> You've got to be real careful about this "guaranteed" stuff.

	How right.  Even "guaranteed" raises the question of exactly
what is guaranteed.

	I recall a discussion about FDDI that started with figuring
out effective throughput and ended up talking about the effect on
throughput of setting certain token rotation timers.  I believe
that discussion was right here on comp.dcom.lans two or three months
ago.  I can't recall exactly the other principals in the discussion or
I would give credit for info.

	One thing I came away with from that discussion was that
some of the worst case maximum token rotation times, for given
congestion and timer settings, were positively geologic timeframes.

	So, token ring may be "guaranteed", but how long are you going
to wait for a given target token rotation time?  And what happens to the 
guarantee when the token is lost, or other losses occur?  It does not
hold.

	Seems to me that our profession is beyond making these sorts
of arguments for one technology over another.  It's out of fashion,
like saying Ethernet can't sustain more than 3 Mbps throughput.

	--Kent

andrew@dtg.nsc.com (Lord Snooty @ The Giant Poisoned Electric Head ) (07/31/90)

In article <61624@bu.edu.bu.edu>, kwe@bu-it.bu.edu (Kent England) writes:
> 	[..]So, token ring may be "guaranteed", but how long are you going
> to wait for a given target token rotation time?  And what happens to the 
> guarantee when the token is lost, or other losses occur?  It does not
> hold.

I more or less agree with all that's been said, except to point out that
FDDI uses a "timed token" protocol whereas 802.5 Token Ring does not.
A fellow called Werner Bux (IBM Zurich I believe) has published many papers
analysing loading vs. offered traffic (and vs. many other parameters).

Also, it was mentioned that "the token is believed to be lost" when a new
station inserts and clicks in its MAU relay, thus causing longish delay.
This is not what the protocol specifies; the Initialisation Phase merely
causes the new station to exchange special MAC frames with designated
Server nodes on the ring. At any rate, there should be no loss of token
*theoretically*.
-- 
...........................................................................
Andrew Palfreyman	Incidentally, in English, the name of the planet
andrew@dtg.nsc.com 	is "Earth".		- Henry Spencer

chris@yarra.oz.au (Chris Jankowski) (07/31/90)

In article <19300@well.sf.ca.us> berger@well.sf.ca.us (Robert J. Berger) writes:
>
> We are looking to make a special purpose dedicated lan for controlling
> up to 128 devices. These devices will run a real time os such as PSOS or
> VxWorks. There will be up to 16 master devices made up of unix workstations
> running Unix System V.4. The workstations will be initiating most traffic.
	  ^^^^^^^^^^^^^^^
> The main time critical response we need is to have a guaranteed worst case of
> the workstation sending a message to one or several of the slaves, where the
> message must get to the slave within 5 milliseconds.  Most other traffic needs
                                       ^^^^^^^^^^^^^^^
I am just wondering about the following:
1. You say you will be using UNIX System V Release 4 on your workstations,
		and
2. You require *guaranteed* delivery of a packet sent by those
workstations within 5 milliseconds, thus the workstation has to issue
those packets with a time resolution no worse than 5 milliseconds, I presume.

Now consider the following:
There is an interrupt for your real-time application on the workstation,
but the workstation just happens to be processing some stuff in a critical
section of the kernel, which cannot be interrupted.
So the kernel continues on its merry business and time flies.
I remember reading an HP paper a few years ago saying that it can take
up to a second before the standard UNIX kernel re-enables interrupts.
Or else you need the so-called preemptive kernel.
I know very little about SVR4, but I think that it is not preemptive.
I believe it was to be, and I vaguely remember 10 ms being mentioned sometime
in 1988, but I think that was quietly dropped in 1989.

My conclusion is that it looks as though your 5 ms may be insignificant
compared to the timing variability on the workstations themselves.

Am I right or wrong, or maybe I do not know something important?
Anybody care to comment?
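
A crude way to see the effect on any particular workstation is something
like the sketch below.  It measures only how late a user process wakes up,
nothing about the network; 5 ms is simply the interval we ask for:

    /* Ask to sleep for 5 ms over and over and record the worst oversleep.
     * Clock granularity and kernel non-preemption both show up in the
     * worst case, especially with other load on the machine. */
    #include <stdio.h>
    #include <sys/types.h>
    #include <sys/time.h>
    #include <sys/select.h>

    int main(void)
    {
        struct timeval before, after, delay;
        long elapsed, worst = 0;
        int i;

        for (i = 0; i < 1000; i++) {
            delay.tv_sec = 0;
            delay.tv_usec = 5000;                    /* ask for 5 ms */
            gettimeofday(&before, (struct timezone *) 0);
            select(0, (fd_set *) 0, (fd_set *) 0, (fd_set *) 0, &delay);
            gettimeofday(&after, (struct timezone *) 0);
            elapsed = (after.tv_sec - before.tv_sec) * 1000000L
                    + (after.tv_usec - before.tv_usec);
            if (elapsed > worst)
                worst = elapsed;
        }
        printf("asked for 5000 usec each time, worst case was %ld usec\n",
               worst);
        return 0;
    }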

      -m-------   Chris Jankowski - Senior Systems Engineer chris@yarra.oz{.au}
    ---mmm-----   Pyramid Technology Corporation Pty. Ltd.  fax  +61 3 820 0536
  -----mmmmm---   11th Floor, 14 Queens Road                tel. +61 3 820 0711
-------mmmmmmm-   Melbourne, Victoria, 3004       AUSTRALIA       (03) 820 0711

micron n. - a unit of length of one millionth of a meter,
            worth $2,000,000,000 since the fault in the Hubble space telescope 
            has been identified.

rpw3@rigden.wpd.sgi.com (Rob Warnock) (08/04/90)

In article <61624@bu.edu.bu.edu> kwe@bu-it.bu.edu (Kent England) writes:
+---------------
|  hedrick@athos.rutgers.edu (Charles Hedrick) writes:
| > You've got to be real careful about this "guaranteed" stuff.
|  ... I recall a discussion about FDDI that started with figuring
| out effective throughput and ended up talking about the effect on
| throughput of setting certain token rotation timers.  I believe
| that discussion was right here on comp.dcom.lans two or three months
| ago.  I can't recall exactly the other principals in the discussion or
| I would give credit for info.
+---------------

I was one of them.   ;-}    It was in late May, as I recall.

+---------------
| 	One thing I came away with from that discussion was that
| some of the worst case maximum token rotation times, for given
| congestion and timer settings, were positively geologic timeframes.
+---------------

*Seconds*! Like, 82 of them! Max-config ring: 500 dual-attach single-MAC
stations in a 100 km circumference circle, that is, 200 km of fiber.
[My previous number of 164 seconds was for 1000 single-attach stations
with no concentrators, which isn't really a legal configuration.]

And, quoting further from myself in that discussion:

   "Get real!", you say? O.k., let's say that since we really don't want
   any given file server to be able to bang out more than 90% of the net
   (even if we ask him to), we can run with T_Opr as small as 16ms (or,
   max_bytes/rotation == 200K). That still means 16 *seconds* before you
   get to send, if everybody else suddenly wants to send 200K, and you
   happened to have just missed capturing the token. (That is, the "storm"
   started with the guy immediately downstream of you.)

[Oops! "Only" 8 seconds if you have 500 dual-attach single-MAC stations.]

   "That's still ridiculous!", you say? O.k., so we don't have 1000 nodes in
   the ring, we have 200. And we don't have 200km of fiber, we have 20km. Now
   we're down to something that you could easily see in a single department,
   if all the fiber happens to be "starred" out from a central wiring closet.
   (Dual-attach, so the average station is 25 meters from the closet. Real
   enough?) Idle TRT is down to 320usec, so we set T_Opr to 5.66ms (so our
   expensive file server can at least send a 64K window per token, with 4096
   bytes of user data per packet, and thus get about 93% of FDDI), and it
   *still* can be as much as 1.1 seconds before you get to send if everybody
   else suddenly wants to send only 64K.

I'll still stand by those numbers [as modified above]...
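
For anyone who wants to re-derive them, the arithmetic behind those figures
is essentially just the sketch below; it ignores ring latency and per-frame
overhead, so it only roughly reproduces the numbers:

    /* Worst case sketched above: you just missed the token, and every
     * other station on the ring sends a full T_Opr's worth of data
     * before the token comes back to you.  Ring latency and framing
     * overhead are ignored, so these are only approximate. */
    #include <stdio.h>

    static void worst(int stations, double t_opr_ms)
    {
        printf("%4d stations, T_Opr %6.2f ms -> worst wait about %5.1f s\n",
               stations, t_opr_ms, (stations - 1) * t_opr_ms / 1000.0);
    }

    int main(void)
    {
        worst(500, 165.0);   /* T_Opr at its 165 ms max: the 82 s case */
        worst(500,  16.0);   /* same ring with T_Opr cut to 16 ms      */
        worst(200,  5.66);   /* the departmental example               */
        return 0;
    }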


-Rob

-----
Rob Warnock, MS-9U/510		rpw3@sgi.com		rpw3@pei.com
Silicon Graphics, Inc.		(415)335-1673		Protocol Engines, Inc.
2011 N. Shoreline Blvd.
Mountain View, CA  94039-7311

ken@minster.york.ac.uk (08/09/90)

In article <64442@yarra.oz.au> chris@yarra.oz.au (Chris Jankowski) writes:
>In article <19300@well.sf.ca.us> berger@well.sf.ca.us (Robert J. Berger) writes:
>>
>> We are looking to make a special purpose dedicated lan for controlling
>> up to 128 devices. These devices will run a real time os such as PSOS or
>> VxWorks. There will be up to 16 master devices made up of unix workstations
>> running Unix System V.4. The workstations will be initiating most traffic.
>	  ^^^^^^^^^^^^^^^
>> The main time critical response we need is to have a guaranteed worst case of
>> the workstation sending a message to one or several of the slaves, where the
>> message must get to the slave within 5 milliseconds.  Most other traffic needs
>                                       ^^^^^^^^^^^^^^^
>My conclusion is that it looks as though your 5 ms may be insignificant
>compared to the timing variability on the workstations themselves.
>
>Am I right or wrong, or maybe I do not know something important?
>Anybody care to comment?

Well pointed out. I still laugh at people selling `Real-Time' Unix, with
claims like "We run real-time Unix, and you can use NFS, etc, etc". If you run
NFS then you ain't running in real-time. If you want guaranteed response times,
don't use Unix; use a real-time operating system.

Ken

--
Ken Tindell             UUCP:     ..!mcsun!ukc!minster!ken
Computer Science Dept.  Internet: ken%minster.york.ac.uk@nsfnet-relay.ac.uk
York University,        Tel.:     +44-904-433244
YO1 5DD
UK

mack@wizzle.enet.dec.com (Dick Mack) (08/16/90)

|> 
|> Also, it was mentioned that "the token is believed to be lost" when a new
|> station inserts and clicks in its MAU relay, thus causing longish delay.
|> This is not what the protocol specifies; the Initialisation Phase merely
|> causes the new station to exchange special MAC frames with designated
|> Server nodes on the ring. At any rate, there should be no loss of token
|> *theoretically*.
|> -- 
|> ...........................................................................
|> Andrew Palfreyman	Incidentally, in English, the name of the planet
|> andrew@dtg.nsc.com 	is "Earth".		- Henry Spencer
|> 

If what you are talking about here is 'graceful insertion', one has to be
careful - even a single attachment end station can add enough delay to blow
timers and cause a ring re-initialization. When one takes into consideration
that two operational segments might be joined, then there has to be some
guarantee that there are no multiple tokens and that the insertion has not
caused frames to be concatenated. Ensuring token loss so that the total ring
reconfigures is an easy way to accomplish this.

Dick Mack

pat@hprnd.HP.COM (Pat Thaler) (09/01/90)

|> 
|> Also, it was mentioned that "the token is believed to be lost" when a new
|> station inserts and clicks in its MAU relay, thus causing longish delay.
|> This is not what the protocol specifies; the Initialisation Phase merely
|> causes the new station to exchange special MAC frames with designated
|> Server nodes on the ring. At any rate, there should be no loss of token
|> *theoretically*.
|> -- 
From IEEE 802.5:

  7.4.2 Insertion/Bypass Transfer Timing.  The insertion/bypass mechanism
  shall break the existing circuit before establishing the new circuit.
  The maximum time that the ring trunk circuit is open shall not exceed
  5 ms.

and

  6.4 Symbol Timing ....

    (2) Whenever a station is inserted into the ring or loses phase
  lock with the upstream station, it shall, upon receipt of a signal
  which is within specification from the upstream station (re)acquire
  phase lock within 1.5 ms.
    (3) .....

So it looks to me like station insertion causes at least a 6.5 ms break
in the network, perhaps longer since all the stations between the
inserting station and the master lose and reacquire lock.  During
that time packets or the token can be lost.

Pat Thaler