[comp.unix.i386] Tuning Streams

peter@ficc.uu.net (Peter da Silva) (03/13/90)

We're experiencing some problems with a network implementation on an intel
320 running System V/386. The performance is dog slow, and we suspect that
the fact that it goes through streams may have something to do with it. Has
anyone any suggestions for how to go about tuning the system to optimise
streams throughput?
-- 
 _--_|\  `-_-' Peter da Silva. +1 713 274 5180. <peter@ficc.uu.net>.
/      \  'U`
\_.--._/
      v

pb@idca.tds.PHILIPS.nl (Peter Brouwer) (03/13/90)

 In article <4Y62V32xds13@ficc.uu.net> peter@ficc.uu.net (Peter da Silva) writes:
>We're experiencing some problems with a network implementation on an intel
>320 running System V/386. The performance is dog slow, and we suspect that
>the fact that it goes through streams may have something to do with it. Has
>anyone any suggestions for how to go about tuning the system to optimise
>streams throughput?

Yes, you are right to suspect streams. But actually it's not streams
itself, it's the spl calls that are initiated by the streams library
functions.

In our experience the overhead varies between 8 and 30%, depending on the
stream modules used. For streams tty it's about 10%. In one case we
measured an overhead of 30%: this was a dc test program generating a 100%
CPU load, and 30% of that was due to spl calls. The big spender is the
function that changes the interrupt level in the PIC chip.
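For those who have not looked at how the spl routines work: on AT-class
hardware each spl level corresponds to an interrupt mask in the 8259 PIC,
and changing the level means I/O port writes, which are slow compared to
memory accesses. A minimal sketch of the idea (the names setpicmask and
level_mask, and the single-PIC simplification, are mine, not the actual
V/386 source):

extern void outb();                 /* kernel primitive: outb(port, value) */

static unsigned char level_mask[8]; /* per-level 8259 masks, machine dependent */
static int cur_level;

static void
setpicmask(mask)
unsigned char mask;
{
	outb(0x21, mask);           /* master 8259 interrupt mask register */
	/* a second outb(0xA1, ...) would handle the slave 8259 */
}

int
spl(level)                          /* returns the old level, for splx() */
int level;
{
	int old = cur_level;

	cur_level = level;
	setpicmask(level_mask[level]);
	return old;
}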

There is nothing to tune for this. The thing to do is to check the source
code for its use of streams calls.
-- 
Peter Brouwer,                # Philips Telecommunications and Data Systems,
NET  : pb@idca.tds.philips.nl # Department SSP-P9000 Building V2,
UUCP : ....!mcvax!philapd!pb  # P.O.Box 245, 7300AE Apeldoorn, The Netherlands.
PHONE:ext [+31] [0]55 432523, # 

dbrown@apple.com (David Brown) (03/15/90)

In article <4Y62V32xds13@ficc.uu.net> peter@ficc.uu.net (Peter da Silva) writes:
> We're experiencing some problems with a network implementation on an intel
> 320 running System V/386. The performance is dog slow, and we suspect that
> the fact that it goes through streams may have something to do with it. Has
> anyone any suggestions for how to go about tuning the system to optimise
> streams throughput?

One easy thing to do is run "crash" and type "strstat" to get streams 
statistics, and look for failures and/or maximums near the limits (you do 
not always get any sort of notification of failures - just unusual 
behavior).  If you find anything, then up those streams parameters and 
rebuild your kernel.
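For example (crash on most System V releases defaults to /dev/mem and the
running /unix, and "q" gets you back out):

	# crash
	> strstat
	  [ table of streams, queues and data blocks ]
	> q

If ALLOC or MAX is close to CONFIG, or FAIL is nonzero, that resource is a
candidate for raising.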

David Brown        415-649-4000
Orion Network Systems
(a subsidiary of Apple Computer)
1995 University Ave. Suite 350
Berkeley, CA 94704

thinman@cup.portal.com (Lance C Norskog) (03/16/90)

Ummmmm, I just remembered something.  The reason you see lots of time
charged to the splx() kernel routine is that kernel profiling is
very screwy with regard to interrupts, and all time spent in device
interrupts is 'adjusted' (in a peculiar way) right after the splx()
routine drops interrupts to 0.  It's not really spending all that time
fiddling the PICs.

Lance Norskog
Sales Engineer
Streamlined Networks
408-727-9909

carroll@m.cs.uiuc.edu (03/18/90)

/* Written  6:32 pm  Mar 14, 1990 by dbrown@apple.com in m.cs.uiuc.edu:comp.unix.i386 */
In article <4Y62V32xds13@ficc.uu.net> peter@ficc.uu.net (Peter da Silva) writes:
>> We're experiencing some problems with a network implementation on an intel
>> 320 running System V/386. The performance is dog slow,  [ ... ]

>One easy thing to do is run "crash" and type "strstat" to get streams 
>statistics, and look for failures and/or maximums near the limits [ ... ]
/* End of text from m.cs.uiuc.edu:comp.unix.i386 */

I'm having very slow response from the network under 386/ix 2.0.2. My stats
from crash look like:
ITEM                  CONFIG   ALLOC    FREE         TOTAL     MAX    FAIL
streams                   96       48      48            81      51       0
queues                   300      238      62           216     252       0
message blocks          2150      106    2044        266571     139       0
data block totals       1720      106    1614        238030     139       0
data block size    4     256        0     256         17618       3       0
data block size   16     256       14     242         26723      19       0
data block size   64     256        8     248        152742      39       0
data block size  128     512       84     428         15986      91       0
data block size  256     128        0     128         24795       3       0
data block size  512     128        0     128            27       2       0
data block size 1024      64        0      64            42       1       0
data block size 2048      64        0      64            97       4       0
data block size 4096      56        0      56             0       0       0

Count of scheduled queues:   0

Additionally, the problem often manifests itself under NFS when trying to read
files. If the file is longer than a certain size (small, roughly a few K),
nothing will be read, while small files are read just fine. I get an
"NFS server not responding" message, while telnet/rlogin/ping all report
everything is fine.
P.S. I looked through old notes, but I didn't see anything on this topic. I
thought I remembered such a discussion a while back; if anyone has it, please
email it to me. Thanks.

Alan M. Carroll                "Like the north wind whistling down the sky,
carroll@cs.uiuc.edu             I've got a song, I've got a song.
Conversation Builder:           I carry it with me and I sing it loud
+ Tomorrow's Tools Today +      If it gets me nowhere, I'll go there proud"
Epoch Development Team          
CS Grad / U of Ill @ Urbana    ...{ucbvax,pur-ee,convex}!cs.uiuc.edu!carroll

plocher@sally.Sun.COM (John Plocher) (03/19/90)

+--
| >> We're experiencing some problems with a network implementation on an intel
| >> 320 running System V/386. The performance is dog slow,  [ ... ]
| Additionally, the problem often manifests itself under NFS, when trying to read
| files. If the file is longer than a certain (small, roughly a few K), nothing
| will be read, while small files will be read just fine. I will get a
+--

Aha!

It sounds like you need to set rsize=1024,wsize=1024 in your /etc/fstab file
for all your NFS devices...

Most (all?) 386 TCP/IP implementations on Ethernet have a maximum packet
size of 1K, while most other NFS systems assume 8K transfers.  Files under
1K work OK, but big files fail...

example (this is actually only ONE line in /etc/fstab!):

	sun:/usr/spool/news /usr/spool/news nfs
	    ro,soft,bg,intr,timeo=70,wsize=1024,rsize=1024,retrans=5 0 0
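If you want to test this before editing /etc/fstab, you can mount by hand
with the same options.  The exact mount syntax varies between vendors' NFS
ports (some want "-f NFS", others "-t nfs"), so treat this as a sketch:

	mount -o ro,soft,rsize=1024,wsize=1024 sun:/usr/spool/news /usr/spool/news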

This *is* mentioned in Wollongong's 386 TCP/IP & NFS manuals; I don't know
about LAI, ISC, Everex, or Intel...

   -John Plocher

pb@idca.tds.PHILIPS.nl (Peter Brouwer) (03/19/90)

 In article <27916@cup.portal.com> thinman@cup.portal.com (Lance C Norskog) writes:
>Ummmmm, I just remembered something.  The reason you see lots of time
>charged to the splx() kernel routine is because kernel profiling is
>very screwy in regards to interrupts, and all time spent in device 
>interrupts is 'adjusted' (in a peculiar way) right after the splx()
>routine drops interrupts to 0.  It's not really spending all that time
>fiddling the PIC's.
>
I think this is a reaction to a previous posting of mine, stating that a
lot of time is spent in spl routines (varying from 10-30% depending on the
driver's usage of streams).
I did not measure this with the kernel profiler; you are correct that it
gives inaccurate results.
I measured it with what is called a "soft analyst".  Simply put, this is a
very clever logic analyser.  It looks at the pins of the chip, in this case
a 386, and samples the events there at a 200ns interval.
You load the symbol table of the software to be measured (/unix in this
case) into the analyser's software and specify which functions you want to
measure.
In performance mode it lists the number of times each function is called
and the total time spent in it (average, min and max are options).
This is where I got my info from.  These figures are very accurate.
I also did a test by patching spltty to splhi in the kernel.
I measured response times of an order entry application (16 users)
with the patched kernel.  They went down from an average of 1 second to
0.82 seconds, and the CPU time went down by 8%.
This application does terminal I/O on streams-based terminals.
So you see the influence of the setpicmask function in the spl handling.
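The pattern that makes this add up is that streams drivers bracket every
queue manipulation in spl calls.  A typical fragment (illustrative driver
code, not from any particular source) looks like:

	int s;

	s = spltty();	/* mask tty interrupts around the queue work */
	putq(q, mp);	/* each such bracket costs two trips to the PIC */
	splx(s);

With thousands of such brackets per second, the setpicmask cost dominates.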
-- 
Peter Brouwer,                # Philips Telecommunications and Data Systems,
NET  : pb@idca.tds.philips.nl # Department SSP-P9000 Building V2,
UUCP : ....!mcsun!philapd!pb  # P.O.Box 245, 7300AE Apeldoorn, The Netherlands.
PHONE:ext [+31] [0]55 432523, #