sob@harvisr.harvard.edu (Scott Bradner) (09/08/89)
TCP/IP Router Tests, v2
Well, here we go again. I've just finished another batch of tests on
some routers. There is real data later on in this posting, but first:
what was tested, how it was tested, and other miscellany.
The info in this posting was first presented at the IBM users' group
SHARE last month. (Parenthetical aside: don't give me a hard time
about the venue for the talk. Once upon a time IBM & TCP/IP did not
quite see eye to eye, and that problem may still be there with some
other 3-letter companies & their 3-letter operating systems, but
in the last few years IBM has done much to introduce TCP/IP products
for its line of computers, and many of these products have evolved
into packages that are quite good - now if only the salesmen could
learn how to spell TCP/IP :-) ) Copies of the slides for this
talk can be ftp'ed from husc6.harvard.edu; look in the pub/rtests
directory. Retrieve the README to find out what is what.
For those of you going to INTEROP, I've been asked to give the SHARE
talk again as a BOF (Wed at 2:30). Come give me a hard time if you
think it is warranted.
I asked all of the companies that 1) produce routers that support
TCP/IP & 2) I could find a phone # for, if I could get a router from
them for 2 days of tests (I told them 2 days, but most of the tests
took longer than that). I asked for a router & an SE who could set it
up for one day, & just the set-up router for the 2nd day. I asked for
the SE so there would be no way that it could be claimed that we had
set things up wrong.
A number of companies responded "we will get back to you" or "yes,
we will get back to you", and when the time came, the router did not.
I will admit that I gave some of the vendors very short notice, and
that was a legit reason for some, an excuse for others. What I did
get were boxes from cisco, Network Systems and Proteon. We beat up on
these quite a bit and found problems with all of them. In each case
we reported the problems, got to talk to people who knew what they
were talking about, and got assurances that the problems were
understood and in most cases fixed. We did get follow-up software for
cisco & Proteon that did fix many of the problems found.
I'm going to be a bit of a pain here and not tell you all what some
of the problems were, since some of them can be used to easily crash
a router, sometimes in ways that would require someone to manually
power cycle it before it would work again. The problems that fit into
this category are fixed in software that we have tested, or that we
are assured will be shipped "any day now". Publishing what these
problems are could lead to some crashing of NSF regional nets, and
that can get to be a bit of a drag.
What should be tested?
All we really tested were the things that were easy to test: max
throughput under various router setups, plus various basic
measurements. What we did not test was something that looked like a
version of the real world. The real world has many routes, we had
one; the real world has many source-destination sets, we had one;
etc. These tests were done on the cheap; they will be redone later
with additional capabilities.
We tested devices that provided ethernet-to-ethernet TCP/IP routing.
We would have liked to test other things, like the PC-based router
that has been floating around the net and the use of a minicomputer,
like a microvax or a sun, as a router, but did not get to it; next
time.
Next time we will also include bridges in the tests. If you people
out there in netland have specific suggestions about tests and/or
devices, please let me know.
Well, on to the test setup.
We (actually, Dan Lanciani) put together software that runs using
a Micom Interlan NI5210 in an IBM PC/AT. The software made use
of some of the Intel LAN chip's features to produce packets at a
reasonable rate. We could not get the interpacket interval shorter
than 55 usec, far longer than the spec's 9.6 usec minimum, so the max
packet rate was not up to what it should have been, but, I hope,
still useful. (We are working on new hardware to do better, and will
redo these tests about November; by better I mean a 9.6 usec gap.)
The LAN chip works on a linked buffer list of packet descriptors to
use; if one points the last one at the first one, one has a loop.
(You saw that coming, didn't you?) We just set things up so that one
packet in the loop was addressed to the router and the other packet
was addressed elsewhere, and then adjusted the size of the "other"
packet to get the desired packet rate to the router. (The new
hardware will do it right & vary the ipi.) The software that was used
to run this is named "hammer"; don't run it on a live network.
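For the curious, here is a rough C sketch of the descriptor-loop
trick. The tx_desc layout is invented for illustration (the real
Intel chip's descriptors are more involved), and actually handing the
ring to the chip is hardware-specific and omitted; the point is the
two-descriptor loop and the use of the pad packet's size as a
throttle.

    #include <stdio.h>

    #define PREAMBLE      8     /* bytes of preamble + start frame */
    #define USEC_PER_BYTE 0.8   /* 10 Mbit/sec ethernet */
    #define GAP           55.0  /* hammer's min interpacket gap, usec */

    /* Invented layout; the real descriptors are more involved. */
    struct tx_desc {
        struct tx_desc *next;
        int len;                     /* frame length in bytes */
        unsigned char frame[1518];
    };

    struct tx_desc ring[2];

    /* Size of the pad packet needed so that the test packet reaches
     * the router "rate" times per second as the chip goes around the
     * loop.  (The result must still be a legal frame size.) */
    int pad_len(int pkt_len, double rate)
    {
        double loop = 1e6 / rate;    /* usec per trip around the loop */
        double pkt  = (pkt_len + PREAMBLE) * USEC_PER_BYTE + GAP;

        return (int)((loop - pkt - GAP) / USEC_PER_BYTE) - PREAMBLE;
    }

    int main(void)
    {
        ring[0].len  = 64;                   /* packet to the router */
        ring[1].len  = pad_len(64, 4000.0);  /* pad, e.g. 4000 pps */
        ring[0].next = &ring[1];
        ring[1].next = &ring[0];             /* there's the loop */
        printf("pad packet: %d bytes\n", ring[1].len);
        return 0;
    }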
The other end of the router was attached to a Tandy PC clone with
another NI5210. This one was put into resource-exhaustion mode (i.e.
it was made to think that there was no place to put incoming
packets). When the chip gets a packet that it has no space for, it
tosses it, but it also increments a counter. This counter was
displayed both in a cumulative mode and in a packets-per-second mode.
Note that the clock being used for the packets-per-second display was
the one in the Tandy, and its accuracy is suspect, but it seemed to
be at least repeatable. (If you have not already guessed, the counter
program is called "anvil".)
The inputs & outputs of the router were connected to a good 2-trace
Tektronix scope so the actual packets could be seen.
The packets that were used were captured "ping" packets from the
BSD ping program; the packet size was changed with the command line
option.
The tests that were run:
Delay through router:
The scope was used to measure the delay from the end of a
packet going into the router to the beginning of the packet
coming out.
Although much concern has been expressed on the net about
the delay through routers, we found that the tested routers
all had small (<3 ms) and consistent delays under light
loading; it is not easy to do this test for heavy loads.
Idle state:
Run anvil & count the number of packets over a minute.
Again, there has been discussion on the net about the
load that a router (or bridge) can place on a net just
by being there and being turned on. We found very low
loads from the tested routers: 1 packet every 3 to 30 sec.
Note that we did not have routing info in the test setup;
the addition of this type of thing would add a lot to
the traffic generated.
Max good throughput:
Hammer was run generating packets into the router;
anvil was run counting what came out. The input rate
was adjusted to find the max rate where the calculated
packets-to-router rate matched the rate shown by anvil.
Packets of length 64, 128, 256, 512, 1024 and 1518 bytes
were tested.
This is a common but flawed test. The actual "good" rate
should be the max rate at which no packets are lost, since
one lost packet can have quite an impact on the application-
to-application throughput of a system. We hope that the
new hardware will be able to do this test correctly.
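The intended method is a simple search, sketched below; the
send_at_rate() and anvil_count() externs are hypothetical stand-ins
for driving hammer and reading anvil (this is a sketch of the method,
not code we ran).

    /* Hypothetical stand-ins for the test rig. */
    extern long send_at_rate(long rate, int secs);  /* pkts sent */
    extern long anvil_count(void);                  /* pkts seen */

    /* Binary-search the highest rate (pkts/sec) with zero loss. */
    long max_good_rate(long lo, long hi)
    {
        long mid, before, sent;

        while (lo < hi) {
            mid    = (lo + hi + 1) / 2;
            before = anvil_count();
            sent   = send_at_rate(mid, 10);    /* 10-second trial */
            if (anvil_count() - before == sent)
                lo = mid;                      /* no loss: go faster */
            else
                hi = mid - 1;                  /* loss: back off */
        }
        return lo;
    }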
The rates varied quite a bit. While the slowest router was
still faster than the current average load on the Harvard
backbone (since we have segmented things), it is only a
small fraction of the throughput of the best router.
+25%:
To test simple overload conditions, the packet rate found
above was raised by 25% and the output rate was measured.
See the data for results; some of the routers were too fast
for hammer to be able to generate the +25% rate.
Flood:
Packets were sent to the router as fast as hammer could
generate them.
This flood condition, something that one could see with
broadcast storms, caused many problems. The most common
problem was that the processor in the router stopped
responding to the console port, so that an operator could
not get in there and disable things.
Filter 1:
A single filter condition was added to the router
configuration and the max good rate was determined.
In most routers, the addition of a filter condition did
affect the max throughput.
Filter 10:
Nine additional filter conditions were added to the
configuration and the max rate was determined.
This had an effect on some routers, and no effect on others.
Back to back:
A series of packets was sent to the router as fast as hammer
could send them, and the point was found where the router
started losing packets. In the real world, NFS servers
can often generate back-to-back packets.
The test equipment was not quite up to doing a good test
here, but the results might be useful. In particular, the
Proteon router showed much better performance under the
episodic load conditions of this test than it showed under
constant load.
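This measurement is the same sort of search with burst length in
place of rate; again a fragment, with the same hypothetical stand-ins
as the sketch above plus a send_burst():

    /* Hypothetical stand-ins for the test rig. */
    extern void send_burst(int npkts);  /* npkts back to back */
    extern long anvil_count(void);

    /* Find the smallest burst at which the router drops a packet. */
    int first_loss_burst(void)
    {
        int n;
        long before;

        for (n = 1; n <= 1000; n++) {
            before = anvil_count();
            send_burst(n);
            if (anvil_count() - before < n)
                return n;       /* router dropped at least one */
        }
        return -1;              /* faster than the test setup */
    }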
Counter accuracy:
The counters in the router were tested by passing a known
number of packets through the router and checking the before
and after counts.
All of the counters were accurate.
Errors:
Packets with errors were sent to the routers to see 1/ that
the router would toss the packets and 2/ if there were stats
kept on the discarded packets. The errors tested were
CRC errors and runt & giant packets.
All routers discarded the packets.
Most routers did not have all of the counters required to
keep track of all of the errors.
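For reference, "runt" and "giant" just mean frames outside ethernet's
legal size limits; a minimal classifier (lengths include the 4-byte
CRC):

    #define MIN_FRAME   64      /* bytes, including the CRC */
    #define MAX_FRAME 1518

    enum frame_err { FRAME_OK, FRAME_RUNT, FRAME_GIANT, FRAME_BAD_CRC };

    enum frame_err classify(int len, int crc_ok)
    {
        if (len < MIN_FRAME) return FRAME_RUNT;
        if (len > MAX_FRAME) return FRAME_GIANT;
        if (!crc_ok)         return FRAME_BAD_CRC;
        return FRAME_OK;
    }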
----------------------------------------------------------------------------
Data (all rates are in packets per second):
size - packet size in bytes, including CRC
theor - theoretical max for ethernet with a 9.6 usec ipi
hammer - all that hammer could do
max - max rate that seemed to pass all packets
+25% - output rate with the max rate +25% offered
flood - output rate with packets offered at hammer's max rate
f1 - max rate with one filter condition
f10 - max rate with ten filter conditions
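If you want to check the theor column, the arithmetic is 8 bytes of
preamble plus the frame, at 0.8 usec per byte on 10 Mbit/sec
ethernet, plus the interpacket gap. A few lines of C; note that
plugging in hammer's roughly 55 usec gap instead of 9.6 reproduces
the hammer column to within about half a percent.

    #include <stdio.h>

    int main(void)
    {
        int size[] = { 64, 128, 256, 512, 1024, 1518 };
        int i;

        for (i = 0; i < 6; i++) {
            double usec = (size[i] + 8) * 0.8;  /* frame + preamble */
            printf("%5d  theor %5d  hammer ~%4d\n", size[i],
                   (int)(1e6 / (usec + 9.6)),   /* 9.6 usec gap */
                   (int)(1e6 / (usec + 55.0))); /* hammer's gap */
        }
        return 0;
    }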
cisco between MCI cards
size theor hammer max +25% flood f1 f10
64 14880 8919 5782 5782 5782 4463 3650
128 8445 6121 4320 4320 4320 3578 3023
256 4528 3757 2917 2917 2917 2544 2279
512 2349 2123 1772 1772 1772 1628 1516
1024 1197 1139 990 990 990 943 901
1518 812 782 695 695 695 672 651
cisco within MCI card
size theor hammer max +25% flood f1 f10
64 14880 8919 8919 8919 8919 6808 5338
128 8445 6121 6121 6121 6121 6083 5226
256 4528 3757 3757 3757 3757 3757 3757
512 2349 2123 2123 2123 2123 2123 2123
1024 1197 1139 1139 1139 1139 1139 1139
1518 812 782 782 782 782 782 782
nsc between interface cards
size theor hammer max +25% flood f1 f10
64 14880 8919 5216 6245 6245 4454 4454
128 8445 6121 3759 3980 3980 3526 3526
256 4528 3757 3741 3741 3741 3741 3741
512 2349 2123 2123 2123 2123 2123 2123
1024 1197 1139 1139 1139 1139 1139 1139
1518 812 782 782 782 782 782 782
nsc within interface card
size theor hammer max +25% flood f1 f10
64 14880 8919 6797 6797 6797 4672 4672
128 8445 6121 5572 5572 5572 3816 3816
256 4528 3757 3748 3748 3748 3740 3740
512 2349 2123 2123 2123 2123 2123 2123
1024 1197 1139 1139 1139 1139 1139 1139
1518 812 782 782 782 782 782 782
Proteon p4200
size theor hammer max +25% flood f1 f10
64 14880 8919 994 889 1017 994 994
128 8445 6121 971 995 266 971 971
256 4528 3757 902 738 33 902 902
512 2349 2123 764 704 107 764 764
1024 1197 1139 469 456 4 469 469
1518 812 782 324 318 39 324 324
Wellfleet (data from tests 6 months ago)
size theor hammer max +25% flood f1 f10
64 14880 8919 1594
128 8445 6121 1562
256 4528 3757 1555
512 2349 2123 1365
1024 1197 1139 979
1518 812 782 741
-----------------------------------------------------------------------------
CPU under load:
How does the CPU respond while 1024-byte packets are sent
at the listed rates?
router max +25% flood
cisco b ok ok dead
cisco w ok ok ok
nsc b ok ok ok
nsc w ok ok ok
Proteon ok ok dead
----------------------------------------------------------------------------
Back to back:
How many packets can one send to the router before it starts
to drop packets?
router 64 256 1024
theoretical 140 59 17
cisco b 90 45 15
cisco w device is faster than the test setup
nsc b 22 57 17
nsc w device is faster than the test setup
Proteon 20 15 6
Something does look wrong with the nsc between-interface-cards
values, but redoing the test came up with the same results.
-----------------------------------------------------------------------------
Errors:
cisco
bad crc - CRC error counter incremented.
runt - Runt error counter incremented.
giant - Giant error counter incremented.
nsc
bad crc - Alignment error counter incremented; CRC error
counter was not.
runt - No counter.
giant - No counter.
Proteon
bad crc - CRC error counter incremented.
runt - No counter.
giant - No counter.