sob@harvisr.harvard.edu (Scott Bradner) (09/08/89)
TCP/IP Router Tests, v2

Well, here we go again. I've just finished another batch of tests on some routers. There is real data later on in this posting, but first: what was tested, how it was tested, and other miscellany.

The info in this posting was first presented at the IBM users' group SHARE last month. (Parenthetical aside: don't give me a hard time about the venue for the talk. Once upon a time IBM & TCP/IP did not quite see eye to eye, and that problem may still be there with some other 3-letter companies & their 3-letter operating systems, but in the last few years IBM has done much to introduce TCP/IP products for its line of computers, and many of these products have evolved into packages that are quite good - now if only the salesmen could learn how to spell TCP/IP :-) ) Copies of the slides for this talk can be ftp'ed from husc6.harvard.edu; look in the pub/rtests directory and retrieve the README to find out what is what. For those of you going to INTEROP, I've been asked to give the SHARE talk again as a BOF (Wed at 2:30). Come give me a hard time if you think it is warranted.

I asked all of the companies that 1) produce routers that support TCP/IP and 2) I could find a phone # for, if I could get a router from them for 2 days of tests (I told them 2 days, but most of the tests took longer than that). I asked for a router and an SE who could set it up for the first day, and just the set-up router for the 2nd day. I asked for the SE so there would be no way that it could be claimed that we had set things up wrong. A number of companies responded "we will get back to you" or "yes, we will get back to you", and when the time came, the router did not. I will admit that I gave some of the vendors very short notice; that was a legit reason for some, an excuse for others.

What I did get were boxes from cisco, Network Systems and Proteon. We beat up on these quite a bit and found problems with all of them. For all of them we reported the problems and got to talk to people who knew what they were talking about and who assured us that the problems were understood and in most cases fixed. We did get follow-up software for cisco & Proteon that fixed many of the problems found.

I'm going to be a bit of a pain here and not tell you all what some of the problems were, since some of them can be used to easily crash a router, sometimes in ways that require someone to manually power cycle it before it will work again. The problems that fit into this category are fixed in software that we have tested or that we are assured will be shipped "any day now". Publishing what these problems are could lead to some crashing of NSF regional nets, and that can get to be a bit of a drag.

What should be tested? All we really tested were the things that were easy to test: max throughput under various router setups and various basic measurements. What we did not test was something that looked like a version of the real world. The real world has many routes, we had one; the real world has many source-destination sets, we had one; etc. These tests were done on the cheap; they will be redone later with additional capabilities.

We tested devices that provided ethernet-to-ethernet TCP/IP routing. We would have liked to test other things, like the PC-based router that has been floating around the net and the use of a minicomputer, like a microvax or a sun, as a router, but did not get to it - next time. Next time we will also include bridges in the tests.
If you people out there in netland have specific suggestions about tests and/or devices, please let me know.

Well, on to the test setup. We (actually, Dan Lanciani) put together software that runs using a Micom Interlan NI5210 in an IBM PC/AT. The software made use of some of the Intel LAN chip's features to produce packets at a reasonable rate. We could not get the interpacket interval (ipi) shorter than 55 usec, far longer than the spec's 9.6 usec minimum, so the max packet rate was not up to what it should have been but was, I hope, still useful. (We are working on new hardware to do better, and will redo these tests about November; by better I mean a 9.6 usec gap.) The LAN chip works from a linked list of packet descriptors; if one points the last one at the first one, one has a loop. (You saw that coming, didn't you?) We just set things up so that one packet in the loop was addressed to the router and the other packet was addressed elsewhere, and then adjusted the size of the "other" packet to get the desired packet rate to the router. (The new hardware will do it right & vary the ipi.) The software that was used to run this is named "hammer"; don't run it on a live network.

The other end of the router was attached to a Tandy PC clone with another NI5210. This one was put into resource exhaustion mode (i.e. it was made to think that there was no place to put incoming packets). When the chip gets a packet that it does not have space for, it tosses it, but it also increments a counter. This counter was displayed both in a cumulative mode and in a packets-per-second mode. Note that the clock used for the packets-per-second display was the one in the Tandy; its accuracy is suspect, but it seemed to be at least repeatable. (If you have not already guessed, the counter program is called "anvil".)

The inputs & outputs of the router were connected to a good 2-trace Tektronix scope so the actual packets could be seen. The packets that were used were captured "ping" packets from the BSD ping program; the packet size was changed with the command-line option.

The tests that were run:

Delay through router: The scope was used to measure the delay from the end of a packet going into the router to the beginning of the packet coming out. Although much concern has been expressed on the net about the delay through routers, we found that the tested routers all had small (<3 ms) and consistent delays under light loading; it is not easy to do this test for heavy loads.

Idle state: Run anvil & count the number of packets over a minute. Again, there has been discussion on the net about the load that a router (or bridge) can place on a net just by being there and being turned on. We found very low loads from the tested routers: 1 packet every 3 to 30 sec. Note that we did not have routing info in the test setup; the addition of this type of thing would add a lot to the traffic generated.

max good throughput: Hammer was run generating packets into the router; anvil was run counting what came out. The input rate was adjusted to find the max rate at which the calculated packets-to-router rate matched the rate shown by anvil. Packets of length 64, 128, 256, 512, 1024 and 1518 were tested. This is a common but flawed test; the actual "good" rate should be the max rate at which no packets are lost, since one lost packet can have quite an impact on the application-to-application throughput of a system. We hope that the new hardware will be able to do this test correctly. The rates varied quite a bit: while the slowest router was still faster than the current average load on the Harvard backbone (since we have segmented things), it is only a small fraction of the throughput of the best router.
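Since most of these tests turn on the rate that hammer offers, here is a rough sketch in C of the descriptor-loop trick and the rate arithmetic. This is illustrative only, NOT the real hammer source; the descriptor structure and names are invented, and the actual programming of the Intel chip on the NI5210 is more involved.

/*
 * Rough sketch of the trick hammer plays; not the real hammer source.
 * The descriptor structure here is invented for illustration.
 */
#include <stdio.h>

struct tx_desc {                 /* hypothetical transmit descriptor */
    unsigned char  *buf;         /* packet data */
    int             len;         /* packet length in bytes */
    struct tx_desc *next;        /* link to the next descriptor */
};

#define IPI_US      55.0         /* best interpacket gap we could get */
#define US_PER_BYTE 0.8          /* 10 Mbit ethernet: 0.8 usec per byte */
#define PREAMBLE    8            /* preamble bytes ahead of each frame */

/* wire time for one packet plus the following gap, in usec */
double slot_us(int len)
{
    return (PREAMBLE + len) * US_PER_BYTE + IPI_US;
}

int main(void)
{
    static unsigned char to_router[1518], elsewhere[1518];
    struct tx_desc ring[2];
    int pad;

    ring[0].buf  = to_router;    /* this packet is aimed at the router */
    ring[0].len  = 64;
    ring[1].buf  = elsewhere;    /* this one just burns wire time */
    ring[0].next = &ring[1];
    ring[1].next = &ring[0];     /* last points at first: the loop */

    /* once handed to the chip, the pair is transmitted forever; the
       rate the router sees is set by the size of the "other" packet */
    for (pad = 64; pad <= 1518; pad += 242) {
        ring[1].len = pad;
        printf("pad %4d bytes -> %4.0f packets/sec to the router\n",
               pad, 1e6 / (slot_us(ring[0].len) + slot_us(ring[1].len)));
    }
    return 0;
}

With a 64-byte test packet and our 55 usec gap this works out to a bit under 4500 packets/sec to the router when the padding packet is also 64 bytes, falling off as the padding packet grows. (The "hammer" column in the data amounts to every slot on the wire carrying a router-bound packet.)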
+25%: To test simple overload conditions, the packet rate found above was raised by 25% and the output rate was measured. See the data for results; some of the routers were too fast for hammer to be able to generate the +25% rate.

flood: Packets were sent to the router as fast as hammer could generate them. This flood condition, something one could see with broadcast storms, caused many problems. The most common problem was that the processor in the router stopped responding to the console port, so that an operator could not get in there and disable things.

filter 1: A single filter condition was added to the router configuration and the max good rate was determined. In most routers, the addition of a filter condition did affect the max throughput.

filter 10: Nine additional filter conditions were added to the configuration and the max rate was determined. This had an effect on some routers, no effect on others.

back to back: A series of packets was sent to the router as fast as hammer could send them, and the point was found where the router started losing packets. In the real world, NFS servers can often generate back-to-back packets. The test equipment was not quite up to doing a good test here, but the results might be useful. In particular, the Proteon router showed much better performance under the episodic load conditions of this test than it showed under constant load.

counter accuracy: The counters in the router were tested by passing a known number of packets through the router and checking the before and after counts. All of the counters were accurate.

errors: Packets with errors were sent to the routers to see 1) that the router would toss the packets and 2) whether stats were kept on the discarded packets. The errors tested were CRC errors and runt & giant packets. All routers discarded the packets. Most routers did not have all of the counters required to keep track of all of the errors.
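One note before the data: the "theor" column below is nothing more than ethernet timing arithmetic. A minimal sketch (again illustrative C, not part of the test software) that reproduces the column:

/*
 * Where the "theor" column comes from: back-to-back packets on
 * 10 Mbit/sec ethernet, counting the 8 bytes of preamble in front of
 * each frame and the spec's minimum 9.6 usec interpacket gap.
 */
#include <stdio.h>

int main(void)
{
    static int sizes[] = { 64, 128, 256, 512, 1024, 1518 };
    int i;

    for (i = 0; i < 6; i++) {
        /* usec on the wire: (preamble + frame) * 0.8, plus the gap */
        double us = (8 + sizes[i]) * 0.8 + 9.6;
        printf("%5d bytes: %5d packets/sec\n", sizes[i], (int)(1e6 / us));
    }
    return 0;
}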
----------------------------------------------------------------------------
Data:

size   - packet size including CRC
theor  - theoretical bandwidth of ethernet if 9.6 usec ipi
hammer - all that hammer could do
max    - max rate that seemed to pass all packets
+25%   - max +25% offered
flood  - offered at hammer's max rate
f1     - one filter condition
f10    - ten filter conditions

cisco between MCI cards
size   theor  hammer  max    +25%   flood  f1     f10
64     14880  8919    5782   5782   5782   4463   3650
128    8445   6121    4320   4320   4320   3578   3023
256    4528   3757    2917   2917   2917   2544   2279
512    2349   2123    1772   1772   1772   1628   1516
1024   1197   1139    990    990    990    943    901
1518   812    782     695    695    695    672    651

cisco within MCI card
size   theor  hammer  max    +25%   flood  f1     f10
64     14880  8919    8919   8919   8919   6808   5338
128    8445   6121    6121   6121   6121   6083   5226
256    4528   3757    3757   3757   3757   3757   3757
512    2349   2123    2123   2123   2123   2123   2123
1024   1197   1139    1139   1139   1139   1139   1139
1518   812    782     782    782    782    782    782

nsc between interface cards
size   theor  hammer  max    +25%   flood  f1     f10
64     14880  8919    5216   6245   6245   4454   4454
128    8445   6121    3759   3980   3980   3526   3526
256    4528   3757    3741   3741   3741   3741   3741
512    2349   2123    2123   2123   2123   2123   2123
1024   1197   1139    1139   1139   1139   1139   1139
1518   812    782     782    782    782    782    782

nsc within interface card
size   theor  hammer  max    +25%   flood  f1     f10
64     14880  8919    6797   6797   6797   4672   4672
128    8445   6121    5572   5572   5572   3816   3816
256    4528   3757    3748   3748   3748   3740   3740
512    2349   2123    2123   2123   2123   2123   2123
1024   1197   1139    1139   1139   1139   1139   1139
1518   812    782     782    782    782    782    782

Proteon p4200
size   theor  hammer  max    +25%   flood  f1     f10
64     14880  8919    994    889    1017   994    994
128    8445   6121    971    995    266    971    971
256    4528   3757    902    738    33     902    902
512    2349   2123    764    704    107    764    764
1024   1197   1139    469    456    4      469    469
1518   812    782     324    318    39     324    324

Wellfleet (data from tests 6 months ago)
size   theor  hammer  max
64     14880  8919    1594
128    8445   6121    1562
256    4528   3757    1555
512    2349   2123    1365
1024   1197   1139    979
1518   812    782     741

-----------------------------------------------------------------------------
CPU under load:
How does the CPU respond while sending 1024-byte packets at the listed rates?

router    max  +25%  flood
cisco b   ok   ok    dead
cisco w   ok   ok    ok
nsc b     ok   ok    ok
nsc w     ok   ok    ok
Proteon   ok   ok    dead

----------------------------------------------------------------------------
back to back:
How many packets can one send to the router before it starts to drop packets?

router        64   256  1024
theoretical   140  59   17
cisco b       90   45   15
cisco w       device is faster than the test setup
nsc b         22   57   17
nsc w         device is faster than the test setup
Proteon       20   15   6

Something does look wrong with the nsc between-interface-cards values, but redoing the test came up with the same results.

-----------------------------------------------------------------------------
errors:

cisco
  bad crc - CRC error counter incremented.
  runt    - Runt error counter incremented.
  giant   - Giant error counter incremented.

nsc
  bad crc - Alignment error counter incremented, CRC error counter was not.
  runt    - No counter.
  giant   - No counter.

Proteon
  bad crc - CRC error counter incremented.
  runt    - No counter.
  giant   - No counter.