[comp.dcom.sys.cisco] poor inter-area routed DECnet performance

Leonard@arizona.edu (Aaron Leonard) (06/08/90)

We have experienced what you might call negative synergy
between DECnet traffic being routed thru a cisco and
then between level II routers. 

The case in which the difficulty emerges is somewhat 
complicated to explain, but quite reproducible, so please bear
with me.

                     [ UAZHE0   46.437 ]   [ UAZHE4  46.365 ]
                     [ level II router ]   [    endnode     ]
                          |                  |
[ MAGGIE   50.204 ]--(  large bushy bridged   )--[ CIRRUS  50.140 ]
[ level II router ]  ( ethernet <128.196.128> )  [    endnode     ]
                          |
                     [ PANCHO   50.222 ]
( repeatered enet )--[  cisco AGS II   ]--( large repeatered       )
( <128.196.28>    )  [  level I router ]  ( ethernet <128.196.120> )
        |                                          |
  [ ECEVAX  50.111 ]                        [ DOC   50.231 ]
  [   endnode      ]                        [   endnode    ]

In the above picture, consider any endnode to be representative of a
large number of topologically identical endnodes.  We use the IP
subnet terminology simply as a means of identifying Ethernets.  (All
connections above are ethernet; all DECnet nodes but PANCHO are
VAXen.)

Note that this is an unusual configuration, in that we have nodes
in multiple DECnet areas residing on the same ethernet (on
128.196.128.)  (This is necessitated by the peculiarities of HEPnet
routing.)

We have found that traffic flows quickly (1) amongst all nodes in
this network EXCEPT for the case where DOC or ECEVAX (or any other
node in subnets 128.196.120 and 128.196.28) tries to communicate
with any node in area 46.  In that case, the traffic flow is
consistently an order of magnitude slower (2).  These results
have been verified by many tests using a large number of
pairs of nodes.

The traffic flow in the too-slow case is as follows: area 50 endnode
to PANCHO (AGS), PANCHO to MAGGIE (level II router for area 50), 
MAGGIE to UAZHE0 (lev II router for area 46), UAZHE0 to area 46
endnode.  Note that the lev II routers are NOT the bottleneck;
traffic that flows between e.g. CIRRUS and UAZHE4, which passes
thru MAGGIE and UAZHE0, but not thru PANCHO, is quick.  Note also
that PANCHO by itself is not the bottleneck, as traffic between
e.g. CIRRUS and DOC is quick.  The only case where traffic is slow
is where inter-area traffic is routed thru PANCHO.
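The elimination argument above can be sketched in a few lines of Python. The router sets encode the paths as the author describes them in this post (an assumption, not a packet trace; a later reply in the thread revises the CIRRUS-UAZHE4 case), and both helper names are hypothetical.

```python
# Hypothetical encoding of the tests described above: each node pair maps to
# the set of routers its traffic is believed to cross, plus the observed result.
observations = {
    ("CIRRUS", "UAZHE4"): ({"MAGGIE", "UAZHE0"}, "fast"),
    ("CIRRUS", "DOC"): ({"PANCHO"}, "fast"),
    ("DOC", "UAZHE4"): ({"PANCHO", "MAGGIE", "UAZHE0"}, "slow"),
}

def single_router_culprits(obs):
    """Routers present on every slow path but on no fast path."""
    slow = [routers for routers, speed in obs.values() if speed == "slow"]
    fast = [routers for routers, speed in obs.values() if speed == "fast"]
    return set.intersection(*slow) - set.union(*fast)

def unexplained_slow_pairs(obs):
    """Slow pairs whose router set is not contained in any single fast path."""
    fast = [routers for routers, speed in obs.values() if speed == "fast"]
    return [pair for pair, (routers, speed) in obs.items()
            if speed == "slow" and not any(routers <= f for f in fast)]

print(single_router_culprits(observations))   # set()
print(unexplained_slow_pairs(observations))   # [('DOC', 'UAZHE4')]
```

No single router is implicated; only the combination of PANCHO with the inter-area routers is, which matches the conclusion drawn in the post.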

Notes.

All traffic tests were run using DEC's DTSEND utility, via the following
command sequence:

$ MC DTSEND
Test: DATA/PRINT/STATISTICS/TYPE=ECHO/SIZE=500/SECONDS=60/NODE=node

This utility tests task-to-task NSP throughput between DECnet
phase IV nodes.

(1) "Quick" data flow is in the range of 400Kbps to 1.2Mbps.

(2) "Slow" data flow is in the range of 40Kbps to 100Kbps.
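As a quick check on "an order of magnitude", here is the arithmetic on the two ranges. It reads the upper bound in (1) as 1.2 Mbps, since 1.2 Kbps would fall below the slow range and Mbps is presumably meant.

```python
# Throughput ranges from notes (1) and (2), in Kbps.
quick_kbps = (400, 1200)   # upper bound taken as 1.2 Mbps
slow_kbps = (40, 100)

low_end_ratio = quick_kbps[0] / slow_kbps[0]     # bottom of each range
high_end_ratio = quick_kbps[1] / slow_kbps[1]    # top of each range
worst_case_ratio = quick_kbps[0] / slow_kbps[1]  # slowest "quick" vs fastest "slow"
print(low_end_ratio, high_end_ratio, worst_case_ratio)   # 10.0 12.0 4.0
```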

tinkelman@ccavax.camb.com (06/08/90)

In article <21919@megaron.cs.arizona.edu>, Leonard@arizona.edu (Aaron
Leonard) described a problem involving slow throughput between certain
pairs of nodes (in a picture, partially reproduced below).

I can't offer a coherent explanation of the differences you reported, but I
do want to comment on one of your examples, namely the throughput between
endnodes CIRRUS and UAZHE4:

>                      [ UAZHE0   46.437 ]   [ UAZHE4  46.365 ]
>                      [ level II router ]   [    endnode     ]
>                           |                  |
> [ MAGGIE   50.204 ]--(  large bushy bridged   )--[ CIRRUS  50.140 ]
> [ level II router ]  ( ethernet <128.196.128> )  [    endnode     ]
... 
>             Note that the lev II routers are NOT the bottleneck;
> traffic that flows between e.g. CIRRUS and UAZHE4, which passes
> thru MAGGIE and UAZHE0, but not thru PANCHO, is quick.  

My comment is that DECnet will use the intermediate level II routers only
to help the two end nodes find each other.  Once the circuit between them
is established, CIRRUS and UAZHE4 will communicate directly with each other.
This means there will be no intermediate DECnet routing *and* max size 
Ethernet packets can be used.
-- 
Bob Tinkelman, Cambridge Computer Associates, Inc., 212-425-5830              
bob@camb.com  or ...!{uupsi,uunet}!camb.com!bob

aaron@dragoon.telcom.arizona.edu (Aaron Leonard) (06/09/90)

In article <25934.266f66c4@ccavax.camb.com>, tinkelman@ccavax.camb.com 
(Bob Tinkelman) corrects an assumption I made in my earlier posting
concerning poor inter-area routing performance.  I had implied that
traffic between two endnodes in different areas on the same Ethernet
would pass through the area routers.

|> >             Note that the lev II routers are NOT the bottleneck;
|> > traffic that flows between e.g. CIRRUS and UAZHE4, which passes
|> > thru MAGGIE and UAZHE0, but not thru PANCHO, is quick.  
|> 

Bob:
|> My comment is that DECnet will use the intermediate level II routers only
|> to help the two end nodes find each other.  Once the circuit between them
|> is established, CIRRUS and UAZHE4 will communicate directly with each other.
|> This means there will be no intermediate DECnet routing *and* max size 
|> Ethernet packets can be used.
|> -- 

Excellent!  We're on to something here.  Bob is right - the DECnet traffic
between CIRRUS and UAZHE4 will indeed short-circuit the area routers and
travel directly over the ethernet. 

And this result, in fact explains why the performance is so poor when the
cisco router is entered into the loop: because the short-circuiting of
same-Ethernet circuits between different-area nodes ONLY HAPPENS when the
nodes are END NODES!  When the set-up path is: Area-A-endnode-on-Eth1 ->
Area-A-lvl-I-rtr -> Area-A-lvl-II-rtr-on-Eth2 -> Area-B-lvl-II-rtr-on-Eth2 ->
Area-B-endnode-on-Eth2, the short circuit between "Area-A-lvl-I-rtr" and
"Area-B-endnode" is never made.  Rather, all traffic between the
endnodes will continue to flow thru every single router in the loop.
(Oh, for an icmp redirect!)
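A minimal Python model of the point above, assuming the behaviour described in this thread: only an endnode that shares a cable with the destination bypasses the remaining routers, while routers keep following their routing tables. The `routers_shortcut` flag is hypothetical; it stands in for the ICMP-redirect-like behaviour wished for above, which these routers do not provide. Cable names are simply the subnet labels from the picture.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Node:
    name: str
    cables: frozenset   # Ethernets this node is attached to
    is_router: bool

def steady_state_path(setup_path, routers_shortcut=False):
    """Path used once caches are warm, given the initial set-up path."""
    dst = setup_path[-1]
    for i, hop in enumerate(setup_path[:-1]):
        may_bypass = routers_shortcut or not hop.is_router
        if may_bypass and hop.cables & dst.cables:
            # This hop shares a cable with the destination: send directly.
            return setup_path[: i + 1] + [dst]
    return setup_path

E128, E120, E28 = "128.196.128", "128.196.120", "128.196.28"
CIRRUS = Node("CIRRUS", frozenset({E128}), False)
DOC = Node("DOC", frozenset({E120}), False)
PANCHO = Node("PANCHO", frozenset({E128, E120, E28}), True)
MAGGIE = Node("MAGGIE", frozenset({E128}), True)
UAZHE0 = Node("UAZHE0", frozenset({E128}), True)
UAZHE4 = Node("UAZHE4", frozenset({E128}), False)

def names(path):
    return [n.name for n in path]

print(names(steady_state_path([CIRRUS, MAGGIE, UAZHE0, UAZHE4])))
# ['CIRRUS', 'UAZHE4']
print(names(steady_state_path([DOC, PANCHO, MAGGIE, UAZHE0, UAZHE4])))
# ['DOC', 'PANCHO', 'MAGGIE', 'UAZHE0', 'UAZHE4']
print(names(steady_state_path([DOC, PANCHO, MAGGIE, UAZHE0, UAZHE4], routers_shortcut=True)))
# ['DOC', 'PANCHO', 'UAZHE4']
```

PANCHO sits on the same Ethernet as UAZHE4, but as a router it never takes the shortcut, so DOC's traffic keeps crossing MAGGIE and UAZHE0.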

I verified that this is not a problem with the cisco; rather, an identical
path traced thru a VAX/VMS router produced identical results - so by
definition, the cisco is routing correctly!

This brings up, then, another question: if all my DECnet connections
into my ciscos are to direct-attached ethernets, then is there any
reason at all to run DECnet routing on the ciscos?  (Assume here that
equal-cost path splitting is not a topological possibility.)  Why not
just bridge the whole ball of wax?  In this (admittedly pathological)
case, at least, bridging will produce better throughput and reduce
host load, right?

tinkelman@ccavax.camb.com (06/09/90)

In my prior article <25934.266f66c4@ccavax.camb.com>, I should have made an
additional observation.  I guess I forgot, because it didn't bear on your
_immediate_ performance related question.  But it does bear on the future.

 ***  The pictured configuration will `soon' be illegal.  DECnet/OSI  ***
 ***  (Phase V) will not allow multiple areas on the same Ethernet.   ***

With Phase V you probably will have all the nodes in your picture in the
same area, and therefore avoid the `extra' hops of area routers.  Packets
will flow NodeOnEnet1-Cisco1-Cisco2-NodeOnEnet2 with no `extra' hops at
DECnet area routers.  

I said `probably' in the above.  You could keep the nodes on each physical
Ethernet in separate DECnet areas if the Cisco boxes will be able to act
as DECnet Phase V Level 2 routers.  (Will they be able to do that?)  You
could also maintain separate DECnet areas on the two LANs *and* bridge
them, if you have the bridges filter all the appropriate DECnet level 2 
routing multicasts.  This latter configuration _should_ work, though I'm
not sure if DEC will say it's supported.

Despite the two alternatives in the preceding paragraph, I still think 
that unless there is some very strong (and strange?) *technical* reason 
not to do so, you will find it better to go to a single DECnet area.
(DEC's position is certainly that DECnet areas should reflect network
topology, not administrative responsibilities.)
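The bridged alternative (separate areas on the two LANs, with the bridges filtering DECnet level 2 routing multicasts) needs a per-frame filter. A rough Python sketch of such a predicate follows. The EtherType, the "all routers" multicast address, the padding convention, and the flag layout (control bit in bit 0, message type in bits 1-3, type 4 = level 2 routing) are assumptions about the Phase IV routing layer rather than anything stated here, and `example_frame` is a hypothetical test helper. A real bridge would express this in its own filter syntax.

```python
# All constants are assumptions about DECnet Phase IV; verify against the spec.
DECNET_ROUTING_ETHERTYPE = 0x6003
ALL_ROUTERS_MULTICAST = bytes.fromhex("ab0000030000")
LEVEL2_ROUTING_MSG_TYPE = 4   # control message type carried in flag bits 1-3

def is_level2_routing_frame(frame: bytes) -> bool:
    """True if an Ethernet II frame looks like a Phase IV level 2 routing message."""
    if len(frame) < 17:                      # header + length field + flags byte
        return False
    if frame[0:6] != ALL_ROUTERS_MULTICAST:
        return False
    if int.from_bytes(frame[12:14], "big") != DECNET_ROUTING_ETHERTYPE:
        return False
    payload = frame[14:]
    pos = 2                                  # skip the 2-byte little-endian length
    if payload[pos] & 0x80:                  # optional padding; low 7 bits = pad length
        pos += payload[pos] & 0x7F
    if pos >= len(payload):
        return False
    flags = payload[pos]
    is_control = flags & 0x01
    msg_type = (flags >> 1) & 0x07
    return bool(is_control) and msg_type == LEVEL2_ROUTING_MSG_TYPE

def example_frame(flags: int, pad: int = 0,
                  ethertype: int = DECNET_ROUTING_ETHERTYPE) -> bytes:
    """Build a toy frame for testing (hypothetical helper, not a real message)."""
    padding = bytes([0x80 | pad]) + bytes(pad - 1) if pad else b""
    body = padding + bytes([flags]) + bytes(20)
    return (ALL_ROUTERS_MULTICAST + bytes(6) + ethertype.to_bytes(2, "big")
            + len(body).to_bytes(2, "little") + body)

print(is_level2_routing_frame(example_frame(0x09)))         # True: drop it
print(is_level2_routing_frame(example_frame(0x07)))         # False: level 1 routing
print(is_level2_routing_frame(example_frame(0x09, pad=3)))  # True
```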
-- 
Bob Tinkelman, Cambridge Computer Associates, Inc., 212-425-5830              
bob@camb.com  or ...!{uupsi,uunet}!camb.com!bob

kph@dustbin.cisco.com (Kevin Paul Herbert) (06/12/90)

>And this result, in fact explains why the performance is so poor when the
>cisco router is entered into the loop: because the short-circuiting of
>same-Ethernet circuits between different-area nodes ONLY HAPPENS when the
>nodes are END NODES!
Yes, this is quite correct. Depending on software version, end-nodes
keep either a cache of the nodes on the same cable, or the previous
hop to get to a node they have been in contact with. Old (pre-VMS V5.0)
nodes do the former; new ones do the latter.

When a DEC system originates a packet onto the network, it sets a bit
in the message header which means "originated on this cable". If a router
switches a packet on to a different circuit, it clears this bit - if it
is staying on the same cable, it doesn't touch the bit. 
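The handling of this bit can be modelled in a few lines of Python. The bit values are assumptions (0x20 is, as far as I know, the "intra-Ethernet" flag in the Phase IV long-format data header; check the routing spec), and the circuit names are just labels for the Ethernets in the original picture.

```python
# Assumed bit values (Phase IV long-format data header); verify before use.
LONG_DATA_FORMAT = 0x06
INTRA_LAN = 0x20   # the "originated on this cable" bit

def originate(flags: int = LONG_DATA_FORMAT) -> int:
    """An originating DEC system sets the bit."""
    return flags | INTRA_LAN

def router_forward(flags: int, in_circuit: str, out_circuit: str) -> int:
    """Clear the bit when switching circuits; leave it alone on the same cable."""
    if in_circuit != out_circuit:
        return flags & ~INTRA_LAN
    return flags

def flags_on_arrival(router_hops):
    """router_hops: (in_circuit, out_circuit) for each router on the path."""
    flags = originate()
    for in_circuit, out_circuit in router_hops:
        flags = router_forward(flags, in_circuit, out_circuit)
    return flags

# CIRRUS -> MAGGIE -> UAZHE0 -> UAZHE4: every hop stays on 128.196.128.
cirrus = flags_on_arrival([("eth128", "eth128"), ("eth128", "eth128")])
# DOC -> PANCHO -> MAGGIE -> UAZHE0 -> UAZHE4: PANCHO switches cables.
doc = flags_on_arrival([("eth120", "eth128"), ("eth128", "eth128"), ("eth128", "eth128")])
print(bool(cirrus & INTRA_LAN), bool(doc & INTRA_LAN))   # True False
```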

When a receiving end-system gets a message with the bit set, it creates
a cache entry indicating that messages to that node can be sent directly,
bypassing the designated router for that LAN. If it gets a message without
the bit set, it either (for old software) does nothing, or for new software,
it looks at the source MAC address of the frame (i.e. the previous hop) and
caches that for use as a route back to the source.

Routers do not contain this cache. The DEC philosophy is that routers should
rely only on information learned via routing protocols, and not make any
decisions based on data path optimizations.
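A rough Python model of the two endnode cache behaviours, with routers keeping no cache at all. The class and method names are hypothetical, and the direct-route MAC uses the standard Phase IV mapping (AA-00-04-00 followed by the 16-bit DECnet address, area * 1024 + node, low byte first), which is not spelled out in this post.

```python
def phase4_mac(area: int, node: int) -> str:
    """Phase IV MAC: AA-00-04-00 plus the 16-bit node address, low byte first."""
    addr = area * 1024 + node
    return "AA-00-04-00-%02X-%02X" % (addr & 0xFF, addr >> 8)

class EndnodeCache:
    """Hypothetical endnode cache; routers keep no such cache."""

    def __init__(self, new_software: bool, designated_router_mac: str):
        self.new_software = new_software
        self.designated_router_mac = designated_router_mac
        self.routes = {}   # (area, node) -> MAC address to send to

    def on_receive(self, src, prev_hop_mac: str, intra_lan: bool):
        if intra_lan:
            # Sender is on this cable: reach it directly, bypassing the router.
            self.routes[src] = phase4_mac(*src)
        elif self.new_software:
            # New software remembers whichever station delivered the packet.
            self.routes[src] = prev_hop_mac
        # Old software with the bit clear caches nothing.

    def next_hop_mac(self, dst) -> str:
        return self.routes.get(dst, self.designated_router_mac)

UAZHE0_MAC = phase4_mac(46, 437)
uazhe4 = EndnodeCache(new_software=True, designated_router_mac=UAZHE0_MAC)
uazhe4.on_receive((50, 140), UAZHE0_MAC, intra_lan=True)    # from CIRRUS
uazhe4.on_receive((50, 231), UAZHE0_MAC, intra_lan=False)   # from DOC, via PANCHO
print(uazhe4.next_hop_mac((50, 140)))                 # AA-00-04-00-8C-C8 (direct)
print(uazhe4.next_hop_mac((50, 231)) == UAZHE0_MAC)   # True

old = EndnodeCache(new_software=False, designated_router_mac=UAZHE0_MAC)
old.on_receive((50, 231), UAZHE0_MAC, intra_lan=False)
print(old.routes)                                     # {}
```

Either way, replies to DOC go back through UAZHE0, which is why the slow path persists in both directions.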

>This brings up, then, another question: if all my DECnet connections
>into my ciscos are to direct-attached ethernets, then is there any
>reason at all to run DECnet routing on the ciscos?
DEC end-systems produce a large amount of background traffic (periodic
hello messages). If you run routing, this information is basically collected
into a single control message sent to other routers. If you run bridging,
all of the hellos get bridged. This is considerable traffic if you have
a lot of end-systems.
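To put a rough number on this, here is the arithmetic in Python. The 15-second hello interval and the 64-byte frame size are assumptions (plausible defaults, not figures from this thread), and the function name is hypothetical; the count covers only the endnode hellos that bridging would carry onto every segment.

```python
def bridged_hello_load(endnodes: int, hello_interval_s: float = 15.0,
                       frame_bytes: int = 64):
    """Frames/s and bits/s of endnode hellos a bridge would forward everywhere."""
    frames_per_s = endnodes / hello_interval_s
    bits_per_s = frames_per_s * frame_bytes * 8
    return frames_per_s, bits_per_s

for n in (150, 1500):
    frames, bits = bridged_hello_load(n)
    print(f"{n} endnodes: {frames:.0f} hellos/s, {bits / 1000:.1f} Kbps")
# 150 endnodes: 10 hellos/s, 5.1 Kbps
# 1500 endnodes: 100 hellos/s, 51.2 Kbps
```

With routing, that per-endnode chatter is instead summarized into the routers' own control messages rather than forwarded.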

Kevin