Leonard@arizona.edu (Aaron Leonard) (06/08/90)
We have experienced what you might call negative synergy between DECnet traffic being routed thru a cisco and then between level II routers. The case in which the difficulty emerges is somewhat complicated to explain, but quite reproducible, so please bear with me.

               [ UAZHE0 46.437   ]  [ UAZHE4 46.365 ]
               [ level II router ]  [ endnode       ]
                        |                   |
[ MAGGIE 50.204   ]--( large bushy bridged    )--[ CIRRUS 50.140 ]
[ level II router ]  ( ethernet <128.196.128> )  [ endnode       ]
                                 |
                        [ PANCHO 50.222   ]
   ( repeatered enet )--[ cisco AGS II    ]--( large repeatered       )
   ( <128.196.28>    )  [ level I router  ]  ( ethernet <128.196.120> )
            |                                            |
    [ ECEVAX 50.111 ]                             [ DOC 50.231 ]
    [ endnode       ]                             [ endnode    ]

In the above picture, consider any endnode to be representative of a large number of topologically identical endnodes. We use the IP subnet terminology simply as a means of identifying Ethernets. (All connections above are ethernet; all DECnet nodes but PANCHO are VAXen.)

Note that this is an unusual configuration, in that we have nodes in multiple DECnet areas residing on the same ethernet (on 128.196.128). (This is necessitated by the peculiarities of HEPnet routing.)

We have found that traffic flows quickly (1) amongst all nodes in this network EXCEPT for the case where DOC or ECEVAX (or any other node in subnets 128.196.120 and 128.196.28) tries to communicate with any node in area 46. In that case, the traffic flow is consistently an order of magnitude slower (2). These results have been verified by many tests using a large number of pairs of nodes.

The traffic flow in the too-slow case is as follows:

    area 50 endnode to PANCHO (AGS),
    PANCHO to MAGGIE (level II router for area 50),
    MAGGIE to UAZHE0 (lev II router for area 46),
    UAZHE0 to area 46 endnode.

Note that the lev II routers are NOT the bottleneck; traffic that flows between e.g. CIRRUS and UAZHE4, which passes thru MAGGIE and UAZHE0, but not thru PANCHO, is quick. Note also that PANCHO by itself is not the bottleneck, as traffic between e.g. CIRRUS and DOC is quick. The only case where traffic is slow is where inter-area traffic is routed thru PANCHO.

Notes.

All traffic tests were run using DEC's DTSEND utility, via the following command sequence:

    $ MC DTSEND
    Test: DATA/PRINT/STATISTICS/TYPE=ECHO/SIZE=500/SECONDS=60/NODE=node

This utility tests task-to-task NSP throughput between DECnet phase IV nodes.

(1) "Quick" data flow is in the range of 400Kbps to 1.2Mbps.
(2) "Slow" data flow is in the range of 40Kbps to 100Kbps.
tinkelman@ccavax.camb.com (06/08/90)
In article <21919@megaron.cs.arizona.edu>, Leonard@arizona.edu (Aaron Leonard) described a problem involving slow throughput between certain pairs of nodes (in a picture, partially reproduced below).

I can't offer a coherent explanation of the differences you reported, but I do want to comment on one of your examples, namely the throughput between endnodes CIRRUS and UAZHE4:

>                [ UAZHE0 46.437   ]  [ UAZHE4 46.365 ]
>                [ level II router ]  [ endnode       ]
>                         |                   |
> [ MAGGIE 50.204   ]--( large bushy bridged    )--[ CIRRUS 50.140 ]
> [ level II router ]  ( ethernet <128.196.128> )  [ endnode       ]
...
> Note that the lev II routers are NOT the bottleneck;
> traffic that flows between e.g. CIRRUS and UAZHE4, which passes
> thru MAGGIE and UAZHE0, but not thru PANCHO, is quick.

My comment is that DECnet will use the intermediate level II routers only to help the two end nodes find each other. Once the circuit between them is established, CIRRUS and UAZHE4 will communicate directly with each other. This means there will be no intermediate DECnet routing *and* max size Ethernet packets can be used.

--
Bob Tinkelman, Cambridge Computer Associates, Inc., 212-425-5830
bob@camb.com or ...!{uupsi,uunet}!camb.com!bob
aaron@dragoon.telcom.arizona.edu (Aaron Leonard) (06/09/90)
In article <25934.266f66c4@ccavax.camb.com>, tinkelman@ccavax.camb.com (Bob Tinkelman) corrects an assumption I made in my earlier posting concerning poor inter-area routing performance. I had implied that traffic between two endnodes in different areas on the same Ethernet will pass between the area routers.

|> > Note that the lev II routers are NOT the bottleneck;
|> > traffic that flows between e.g. CIRRUS and UAZHE4, which passes
|> > thru MAGGIE and UAZHE0, but not thru PANCHO, is quick.
|>
|> Bob:
|> My comment is that DECnet will use the intermediate level II routers only
|> to help the two end nodes find each other. Once the circuit between them
|> is established, CIRRUS and UAZHE4 will communicate directly with each other.
|> This means there will be no intermediate DECnet routing *and* max size
|> Ethernet packets can be used.
|> --

Excellent! We're on to something here. Bob is right - the DECnet traffic between CIRRUS and UAZHE4 will indeed short-circuit the area routers and travel directly over the ethernet.

And this result, in fact explains why the performance is so poor when the cisco router is entered into the loop: because the short-circuiting of same-Ethernet circuits between different-area nodes ONLY HAPPENS when the nodes are END NODES! When the set-up path is:

    Area-A-endnode-on-Eth1 ->
    Area-A-lvl-I-rtr ->
    Area-A-lvl-II-rtr-on-Eth2 ->
    Area-B-lvl-II-rtr-on-Eth2 ->
    Area-B-endnode-on-Eth2,

the short circuit between "Area-A-lvl-I-rtr" and "Area-B-endnode" is never made, even though both sit on Eth2. Rather, all traffic between the endnodes will continue to flow thru every single router in the loop. (Oh, for an icmp redirect!)

I verified that this is not a problem with the cisco; rather, an identical path traced thru a VAX/VMS router produced identical results - so by definition, the cisco is routing correctly!

This brings up, then, another question: if all my DECnet connections into my ciscos are to direct-attached ethernets, then is there any reason at all to run DECnet routing on the ciscos? (Assume here that equal-cost path splitting is not a topological possibility.) Why not just bridge the whole ball of wax? In this (admittedly pathological) case, at least, bridging will produce better throughput and reduce host load, right?
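
The two cases discussed above can be put side by side in a small toy model. This is only a sketch, not DECnet code: the `Node` class, the `data_path` function, and the idea of describing a node by the set of Ethernets it is attached to are all assumptions made for illustration. The one rule it encodes is the one stated in the post: the same-Ethernet shortcut is taken only between endnodes, so a path that enters the shared Ethernet through a router keeps every router hop.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Node:
    name: str
    kind: str                  # "endnode" or "router"
    ethernets: FrozenSet[str]  # Ethernets this node is attached to


def data_path(setup_path: List[Node]) -> List[str]:
    """Hops taken by data once the circuit is set up (toy rule, see lead-in)."""
    src, dst = setup_path[0], setup_path[-1]
    both_endnodes = src.kind == "endnode" and dst.kind == "endnode"
    # Only two endnodes sharing an Ethernet talk to each other directly.
    if both_endnodes and src.ethernets & dst.ethernets:
        return [src.name, dst.name]
    # Otherwise every router on the set-up path stays in the data path.
    return [node.name for node in setup_path]


# Node names and subnet labels are taken from the picture in the first post.
BACKBONE, NET28, NET120 = "128.196.128", "128.196.28", "128.196.120"
CIRRUS = Node("CIRRUS", "endnode", frozenset({BACKBONE}))
UAZHE4 = Node("UAZHE4", "endnode", frozenset({BACKBONE}))
DOC = Node("DOC", "endnode", frozenset({NET120}))
MAGGIE = Node("MAGGIE", "router", frozenset({BACKBONE}))
UAZHE0 = Node("UAZHE0", "router", frozenset({BACKBONE}))
PANCHO = Node("PANCHO", "router", frozenset({BACKBONE, NET28, NET120}))

print(data_path([CIRRUS, MAGGIE, UAZHE0, UAZHE4]))
print(data_path([DOC, PANCHO, MAGGIE, UAZHE0, UAZHE4]))
```

In this model the CIRRUS to UAZHE4 path collapses to the two endnodes, while the DOC to UAZHE4 path keeps PANCHO, MAGGIE and UAZHE0, which matches the quick and slow cases reported in the tests.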
tinkelman@ccavax.camb.com (06/09/90)
In my prior article <25934.266f66c4@ccavax.camb.com>, I should have made an additional observation. I guess I forgot, because it didn't bear on your _immediate_ performance related question. But it does bear on the future.

*** The pictured configuration will `soon' be illegal.  DECnet/OSI ***
*** (Phase V) will not allow multiple areas on the same Ethernet.  ***

With Phase V you probably will have all the nodes in your picture in the same area, and therefore avoid the `extra' hops of area routers. Packets will flow NodeOnEnet1-Cisco1-Cisco2-NodeOnEnet2 with no `extra' hops at DECnet area routers.

I said `probably' in the above. You could keep the nodes on each physical Ethernet in separate DECnet areas if the Cisco boxes will be able to act as DECnet Phase V Level 2 routers. (Will they be able to do that?) You could also maintain separate DECnet areas on the two LANs *and* bridge them, if you have the bridges filter all the appropriate DECnet level 2 routing multicasts. This latter configuration _should_ work, though I'm not sure if DEC will say it's supported.

Despite the two alternatives in the preceding paragraph, I still think that unless there is some very strong (and strange?) *technical* reason not to do so, you will find it better to go to a single DECnet area. (DEC's position is certainly that DECnet areas should reflect network topology, not administrative responsibilities.)

--
Bob Tinkelman, Cambridge Computer Associates, Inc., 212-425-5830
bob@camb.com or ...!{uupsi,uunet}!camb.com!bob
kph@dustbin.cisco.com (Kevin Paul Herbert) (06/12/90)
>And this result, in fact explains why the performance is so poor when the
>cisco router is entered into the loop: because the short-circuiting of
>same-Ethernet circuits between different-area nodes ONLY HAPPENS when the
>nodes are END NODES!

Yes, this is quite correct.

Depending on software version, end-nodes keep either a cache of the nodes on the same cable, or the previous hop to get to a node they have been in contact with. Old (pre-VMS V5.0) nodes do the former; new ones do the latter.

When a DEC system originates a packet onto the network, it sets a bit in the message header which means "originated on this cable". If a router switches a packet onto a different circuit, it clears this bit - if it is staying on the same cable, it doesn't touch the bit.

When a receiving end-system gets a message with the bit set, it creates a cache entry indicating that messages to that node can be sent directly, bypassing the designated router for that LAN. If it gets a message without the bit set, it either (for old software) does nothing, or (for new software) looks at the MAC address of the previous hop, and caches that for use as a route to the source.

Routers do not contain this cache. The DEC philosophy is that routers should rely only on information learned via routing protocols, and not make any decisions based on data path optimizations.

>This brings up, then, another question: if all my DECnet connections
>into my ciscos are to direct-attached ethernets, then is there any
>reason at all to run DECnet routing on the ciscos?

DEC end-systems produce a large amount of background traffic (periodic hello messages). If you run routing, this information is basically collected into a single control message sent to other routers. If you run bridging, all of the hellos get bridged. This is considerable traffic if you have a lot of end-systems.

Kevin
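
The cache rules described above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the DECnet implementation: the names `Packet`, `forward`, `EndNode`, `same_cable` and the `"mac-..."` strings are hypothetical, and a direct route is represented simply by the marker string `"direct"`. It only encodes the behaviour given in the post: routers clear the "originated on this cable" bit when a packet changes circuit, end-nodes build their cache from that bit, and routers keep no such cache.

```python
from dataclasses import dataclass, field
from typing import Dict

DIRECT = "direct"  # hypothetical marker: destination is reachable on this cable


@dataclass(frozen=True)
class Packet:
    src_node: str      # node that originated the packet
    prev_hop_mac: str  # station that last transmitted it
    same_cable: bool   # the "originated on this cable" header bit


def forward(pkt: Packet, router_mac: str, changes_circuit: bool) -> Packet:
    """Router forwarding: the bit is cleared only when the circuit changes."""
    still_same_cable = pkt.same_cable and not changes_circuit
    return Packet(pkt.src_node, router_mac, still_same_cable)


@dataclass
class EndNode:
    designated_router_mac: str
    caches_prev_hop: bool  # True: newer software; False: older software
    cache: Dict[str, str] = field(default_factory=dict)

    def receive(self, pkt: Packet) -> None:
        if pkt.same_cable:
            # Source is on our own cable: talk to it directly from now on.
            self.cache[pkt.src_node] = DIRECT
        elif self.caches_prev_hop:
            # Newer software: remember the previous hop as the route back.
            self.cache[pkt.src_node] = pkt.prev_hop_mac
        # Older software does nothing when the bit is clear.

    def next_hop(self, dest_node: str) -> str:
        return self.cache.get(dest_node, self.designated_router_mac)


# CIRRUS -> MAGGIE -> UAZHE0 -> UAZHE4, all on one cable: the bit survives.
p = Packet("CIRRUS", "mac-CIRRUS", True)
p = forward(forward(p, "mac-MAGGIE", False), "mac-UAZHE0", False)
uazhe4 = EndNode("mac-UAZHE0", caches_prev_hop=True)
uazhe4.receive(p)
print(uazhe4.next_hop("CIRRUS"))

# DOC -> PANCHO (changes circuit) -> MAGGIE -> UAZHE0 -> UAZHE4: bit cleared.
q = Packet("DOC", "mac-DOC", True)
q = forward(q, "mac-PANCHO", True)
q = forward(forward(q, "mac-MAGGIE", False), "mac-UAZHE0", False)
uazhe4.receive(q)
print(uazhe4.next_hop("DOC"))
```

In this sketch UAZHE4 ends up sending to CIRRUS directly but still sends to DOC via a router, since the bit was cleared at PANCHO and is never set again by later hops.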