[comp.dcom.lans] router tests - round II

sob@harvisr.harvard.edu (Scott Bradner) (09/08/89)

TCP/IP Router Tests, v2

Well here we go again.  I've just finished another batch of tests on
some routers.  There is real data later on in this posting but first
what was tested, how it was tested and other miscellany.

The info in this posting was first presented at the IBM user's group
SHARE last month.  (Parenthetical aside: don't give me a hard time
about the venue for the talk.  Once upon a time IBM & TCP/IP did not
quite see eye to eye, and that problem may still be there with some
other 3 letter companies & their 3 letter operating systems, but
in the last few years IBM has done much to introduce TCP/IP products
for its line of computers, and many of these products have evolved
into packages that are quite good - now if only the salesmen could
learn how to spell TCP/IP :-) )  Copies of the slides for this
talk can be ftp'ed from husc6.harvard.edu; look in the pub/rtests
directory.  Retrieve the README to find out what is what.

For those of you going to INTEROP, I've been asked to give the SHARE
talk again as a BOF (Wed at 2:30). Come give me a hard time if you
think it is warranted.

I asked all of the companies that 1) produce routers that support
TCP/IP & 2) I could find a phone # for, if I could get a router from
them for 2 days of tests (I told them 2 days but most of the tests
took longer than that).  I asked for a router & an SE who could set it
up on the first day, & just the set-up router for the 2nd day.  I asked
for the SE so there would be no way that it could be claimed that we had
set things up wrong.

A number of companies responded "we will get back to you" or "yes,
we will get back to you" and when the time came, the router did not.
I will admit that I gave some of the vendors very short notice and
that was a legit reason for some, an excuse for others.  What I did get
were boxes from cisco, Network Systems and Proteon.  We beat up on these
quite a bit and found problems with all of them.  And for all of them
we reported the problems, got to talk to people that knew what they
were talking about and were able to give us assurances that the
problems were understood and in most cases fixed.  We did get follow up
software for cisco & Proteon that did fix many of the problems found.

I'm going to be a bit of a pain here and not tell you all what some of the
problems were since some of them can be used to easily crash a router,
sometimes in ways that would require someone to manually power cycle it
before it would work again.  The problems that fit into this category
are fixed in software that we have tested or that we are assured will
be shipped "any day now".  Publishing what these problems are could 
lead to some crashing of NSF regional nets and that can get to be 
a bit of a drag.

What should be tested?  

All we really tested were the things that were easy to test:
max throughput under various router setups and various basic
measurements.  What we did not test was something that looked
like a version of the real world.  The real world has many routes,
we had one; the real world has many source-destination sets,
we had one; etc.  These tests were done on the cheap; they will
be redone later with additional capabilities.

We tested devices that provided ethernet to ethernet TCP/IP
routing.  We would have liked to test other things, like the
PC based router that has been floating around the net and
the use of a minicomputer, like a microvax or a sun, as a router,
but did not get to them.  Next time.

Next time we will also include bridges in the tests.  If you people
out there in netland have specific suggestions about tests and/or
devices please let me know.

Well, on to the test setup.

We (actually, Dan Lanciani) put together software that runs using
a Micom Interlan NI5210 in an IBM PC/AT.  The software made use
of some of the Intel LAN chip's features to
produce packets at a reasonable rate.  We could not get the
interpacket interval shorter than 55 usec, far longer than the
spec's 9.6 usec minimum, so the max packet rate was not up to what it
should have been but is, I hope, still useful.

(We are working on new hardware to do better, and will redo these tests
about November.  By better I mean a 9.6 usec gap.)

The LAN chip works on a linked buffer list of packet descriptors;
if one points the last one at the first one, one has a loop.
(You saw that coming, didn't you?)  We just set things up so that
one packet in the loop was addressed to the router and the other
packet was addressed elsewhere, and then adjusted the size of the
"other" packet to get the desired packet rate to the router.
(The new hardware will do it right & vary the ipi, the interpacket
interval.)  The software that was used to run this is named "hammer";
don't run it on a live network.

The other end of the router was attached to a Tandy PC clone with another
NI5210.  This one was put into resource exhaustion mode ( i.e. it
was made to think that there was no place to put incoming packets ).
When the chip gets a packet that it does not have space for, it tosses it
but it also increments a counter.  This counter was displayed
both in a cumulative mode and a packets per second mode.  Note that the
clock that was being used for the packets per second display was the
one in the Tandy and its accuracy is suspect, but seemed to be at least
repeatable.  (If you have not already guessed, the counter program is called
"anvil".)
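
The packets per second display described above amounts to differencing a
cumulative counter over a time interval.  A minimal Python sketch of that
arithmetic follows; it is not the anvil source.  The function name, the
sample format and the 16 bit counter width are all assumptions made for
illustration (the post does not give the counter's width).

```python
def packets_per_second(samples, counter_bits=16):
    """Turn cumulative counter readings into per-interval packet rates.

    samples is a list of (seconds, counter_value) pairs in time order.
    counter_bits is an assumed counter width, used only to cope with the
    counter wrapping around between two readings.
    """
    modulus = 1 << counter_bits
    rates = []
    for (t0, c0), (t1, c1) in zip(samples, samples[1:]):
        delta = (c1 - c0) % modulus  # correct if the counter wrapped at most once
        rates.append(delta / (t1 - t0))
    return rates
```

Any error in the clock that supplies the timestamps scales the computed
rate directly, which is why the accuracy of the Tandy's clock matters.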

The inputs & outputs of the router were connected to a good 2 trace
Tektronix scope so the actual packets could be seen.

The packets that were used were captured "ping" packets from the
BSD ping program; the packet size was changed with ping's command line
option.

The tests that were run:
   Delay through router:
	The scope was used to measure the delay from the end of a 
	packet going into the router to the beginning of the packet 
	coming out.
	
	Although much concern has been expressed on the net about
	the delay through routers, we found that the tested routers
	all had small (<3ms) and consistent delays under light
	loading.  It is not easy to do this test for heavy loads.

   Idle state:
	Run anvil & count the number of packets over a minute.

	Again, there has been discussion on the net about the
	load that a router (or bridge) can place on a net just
	by being there and being turned on.  We found very low
	loads from the tested routers. 1 packet every 3 to 30 sec.
	Note that we did not have routing info in the test setup;
	the addition of this type of thing would add a lot to
	the traffic generated.

   max good throughput
	Hammer was run generating packets into the router,
	anvil was run counting what came out.  The input rate
	was adjusted to find the max rate at which the calculated
	packets-to-router rate matched the rate shown by anvil.
	Packets of length 64, 128, 256, 512, 1024 and 1518 bytes were
	tested.

	This is a common but flawed test.  The actual "good" rate
	should be the max rate at which no packets are lost, since
	one lost packet can have quite an impact on the application
	to application throughput of a system.  We hope that the
	new hardware will be able to do this test correctly.

	The rates varied quite a bit.  While the slowest router was
	still faster than the current average load on the Harvard 
	backbone (since we have segmented things) it is only a
	small fraction of the throughput of the best router.

   +25%
	To test simple overload conditions, the packet rate found
	above was raised by 25% and the output rate was measured.

	See the data for results.  Some of the routers were too fast for
	hammer to be able to generate the +25% rate.

   flood 
	Packets were sent to the router as fast as hammer could generate
	them.

	This flood condition, something that one could see with broadcast
	storms, caused many problems.  The most common problem was that
	the processor in the router stopped responding to the console
	port so that an operator could not get in there and disable
	things.

   filter 1
	A single filter condition was added to the router configuration
	and the max good rate was determined.

	In most routers, the addition of a filter condition did affect
	the max throughput.

   filter 10
	Nine additional filter conditions were added to the configuration
	and the max rate was determined.

	This had an effect on some routers, no effect on others.

   back to back
	A series of packets were sent to the router as fast as hammer
	could send them and the point was found where the router
	started losing packets. In the real world, NFS servers
	can often generate back to back packets.

	The test equipment was not quite up to doing a good test here
	but the results might be useful.  In particular the Proteon
	router showed much better performance under the episodic
	load conditions of this test than it showed under constant
	load.

   counter accuracy
	The counters in the router were tested by passing a known
	number of packets through the router and checking the before
	and after counts.

	All of the counters were accurate.

   errors
	Packets with errors were sent to the routers to see 1) that
	the router would toss the packets and 2) whether stats were
	kept on the discarded packets.  The errors tested were
	CRC errors, runt & giant packets.

	All routers discarded the packets.
	Most routers did not have all of the counters required to
	keep track of all of the errors.
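
The "max good throughput" procedure above (adjust the offered rate until
the output rate stops matching it) can be pictured as a search.  The
Python sketch below is one hedged way to do it, not what was actually run:
measure() is a hypothetical callback that performs one trial at a given
offered rate and returns the forwarded rate, and the binary search assumes
that a router which keeps up at one rate also keeps up at every lower
rate.  Like the test described, it compares rates rather than counting
every packet, so it shares the flaw noted above.

```python
def max_good_rate(measure, lo, hi, tolerance=0):
    """Largest offered rate in [lo, hi] (packets/sec) forwarded in full.

    measure(offered) is a hypothetical callback: run one trial at the
    offered rate and return the forwarded rate in packets/sec.
    Returns None if even the lowest rate loses packets.
    """
    if measure(hi) >= hi - tolerance:
        return hi                    # device is faster than the generator
    best = None
    while lo <= hi:
        mid = (lo + hi) // 2
        if measure(mid) >= mid - tolerance:
            best, lo = mid, mid + 1  # kept up: try a higher rate
        else:
            hi = mid - 1             # fell behind: try a lower rate
    return best

# Usage with a simulated router that forwards at most 5782 packets/sec,
# driven by a generator that tops out at 8919 packets/sec.
print(max_good_rate(lambda offered: min(offered, 5782), 1, 8919))
```

A router whose output does not fall off cleanly once it is overloaded
breaks the assumption behind the binary search; a plain sweep over
offered rates would be the safer choice there.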
----------------------------------------------------------------------------
Data:
	
size - packet size in bytes, including CRC; all other columns are packets per second
theor - theoretical max packet rate of ethernet with a 9.6 usec ipi
hammer - all that hammer could do
max - max rate that seemed to pass all packets
+25% - max +25% offered
flood - offered at hammer's max rate
f1 - one filter condition
f10 - ten filter conditions
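
The theor column can be reproduced with a little arithmetic.  The Python
sketch below assumes 10 Mb/s ethernet (one bit time = 0.1 usec), counts
the 8 bytes of preamble and start-of-frame delimiter on top of the listed
packet size, and rounds down to whole packets.  The function and constant
names are made up for this illustration.

```python
BIT_TIMES_PER_SEC = 10_000_000  # 10 Mb/s ethernet: one bit time is 0.1 usec
PREAMBLE_BYTES = 8              # preamble plus start-of-frame delimiter

def max_packet_rate(size, gap_bit_times=96):
    """Whole packets/sec for packets of `size` bytes (CRC included).

    gap_bit_times is the interpacket interval in bit times;
    96 bit times is 9.6 usec.
    """
    bit_times_per_packet = (size + PREAMBLE_BYTES) * 8 + gap_bit_times
    return BIT_TIMES_PER_SEC // bit_times_per_packet

for size in (64, 128, 256, 512, 1024, 1518):
    print(size, max_packet_rate(size))
```

With the default gap this gives 14880, 8445, 4528, 2349, 1197 and 812,
matching the theor column.  With gap_bit_times=550 (the 55 usec interval
mentioned earlier) it gives 8880 for 64 byte packets, close to but not
exactly the 8919 that hammer reached, which would be consistent with the
real gap being slightly under 55 usec.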

cisco between MCI cards
size	theor	hammer	max	+25%	flood	f1	f10
64	14880	8919	5782	5782	5782	4463	3650
128	8445	6121	4320	4320	4320	3578	3023
256	4528	3757	2917	2917	2917	2544	2279
512	2349	2123	1772	1772	1772	1628	1516
1024	1197	1139	990	990	990	943	901
1518	812	782	695	695	695	672	651

cisco within MCI card
size	theor	hammer	max	+25%	flood	f1	f10
64	14880	8919	8919	8919	8919	6808	5338
128	8445	6121	6121	6121	6121	6083	5226
256	4528	3757	3757	3757	3757	3757	3757
512	2349	2123	2123	2123	2123	2123	2123
1024	1197	1139	1139	1139	1139	1139	1139
1518	812	782	782	782	782	782	782

nsc between interface cards
size	theor	hammer	max	+25%	flood	f1	f10
64	14880	8919	5216	6245	6245	4454	4454
128	8445	6121	3759	3980	3980	3526	3526
256	4528	3757	3741	3741	3741	3741	3741
512	2349	2123	2123	2123	2123	2123	2123
1024	1197	1139	1139	1139	1139	1139	1139
1518	812	782	782	782	782	782	782

nsc within interface card
size	theor	hammer	max	+25%	flood	f1	f10
64	14880	8919	6797	6797	6797	4672	4672
128	8445	6121	5572	5572	5572	3816	3816
256	4528	3757	3748	3748	3748	3740	3740
512	2349	2123	2123	2123	2123	2123	2123
1024	1197	1139	1139	1139	1139	1139	1139
1518	812	782	782	782	782	782	782

Proteon p4200
size	theor	hammer	max	+25%	flood	f1	f10
64	14880	8919	994	889	1017	994	994
128	8445	6121	971	995	266	971	971
256	4528	3757	902	738	33	902	902
512	2349	2123	764	704	107	764	764
1024	1197	1139	469	456	4	469	469
1518	812	782	324	318	39	324	324

Wellfleet (data from tests 6 months ago)
size	theor	hammer	max	+25%	flood	f1	f10
64	14880	8919	1594
128	8445	6121	1562
256	4528	3757	1555
512	2349	2123	1365
1024	1197	1139	979
1518	812	782	741
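
To compare the rates in these tables with the 10 Mb/s capacity of the
wire, a rate can be converted to Mb/s or expressed as a fraction of the
theor column.  The Python sketch below does both; the function names are
made up for this illustration, and the Mb/s figure counts only the bytes
in the packet (CRC included), not the preamble or the interpacket gap.

```python
def throughput_mbps(size, pps):
    """Data rate in Mb/s for `pps` packets/sec of `size` bytes (CRC included)."""
    return size * 8 * pps / 1_000_000

def fraction_of(reference_pps, measured_pps):
    """Measured rate as a fraction of a reference rate, e.g. the theor column."""
    return measured_pps / reference_pps

# Example row: cisco between MCI cards, 64 byte packets (theor 14880, max 5782).
print(round(throughput_mbps(64, 5782), 2))  # Mb/s
print(round(fraction_of(14880, 5782), 3))   # share of the theoretical rate
```

For that row the result is about 2.96 Mb/s, or roughly 39% of the
theoretical packet rate.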

-----------------------------------------------------------------------------
Cpu under load:

How does the cpu in the router respond while being sent 1024 byte
packets at the listed rates?

router		max	+25	flood
cisco b		ok	ok	dead
cisco w		ok	ok	ok
nsc b		ok	ok	ok
nsc w		ok	ok	ok
Proteon		ok	ok	dead

----------------------------------------------------------------------------

back to back

How many back to back packets can one send to the router before it
starts to drop packets?

router		64	256	1024

theoretical	140	59	17
cisco b		90	45	15
cisco w		device is faster than the test setup
nsc b		22	57	17
nsc w		device is faster than the test setup
Proteon		20	15	6

Something does look wrong with the nsc between interface cards
values but redoing the test came up with the same results.

-----------------------------------------------------------------------------

errors

cisco
    bad crc	- CRC error counter incremented.
    runt	- Runt error counter incremented.
    giant	- Giant error counter incremented.

nsc
    bad crc	- Alignment error counter incremented, CRC error
		  counter was not.
    runt	- No counter.
    giant	- No counter.

Proteon
    bad crc	- CRC error counter incremented.
    runt	- No counter.
    giant	- No counter.