[comp.sys.sun] Socket operation delays

THIER@orcad2.dnet.ge.com (11/21/90)

I am attempting to characterize system response times for IPC operations
on a SPARCStation 1+, running "out-of-the-box" 4.1 (generic kernel,
standard daemons enabled, etc.). The objective is to identify how
deterministic (or not) TCP transfer times can be, given a fixed, small LAN
environment (host-to-host, each running a single user process, no other
network traffic). 

Surrounding the IPC system call with two gettimeofday calls and computing
the difference, I've measured gross variations in socket create, connect
and send durations. The periods are most often only 1 microsecond, but
frequent delays of 10 millisecond intervals are encountered. I assume that
the delays result from the UNIX scheduler taking over during, or between,
the system calls. Is this correct? If so, I'm looking for information
regarding the scheduling mechanism and other OS overhead (frequency of the
scheduler taking control, duration that it holds control, how it handles
interrupted system calls, time allocated to daemons, etc.) and suggestions
for managing these delays. Thanks in advance.
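
For readers who want to reproduce the measurement, here is a minimal sketch of the bracketing technique described above, applied to socket creation only. The helper names elapsed_usec() and time_socket_create() are made up for illustration and are not taken from the original test code, which was not posted.

```c
#include <stddef.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/* Difference (b - a) in microseconds; handles tv_usec wrapping past a second. */
long elapsed_usec(const struct timeval *a, const struct timeval *b)
{
    return (long)(b->tv_sec - a->tv_sec) * 1000000L
         + (long)(b->tv_usec - a->tv_usec);
}

/*
 * Bracket one socket() call with two gettimeofday() calls.
 * Returns the apparent duration in microseconds, or -1 if socket() failed.
 * The result can be no finer than the resolution of the system clock.
 */
long time_socket_create(void)
{
    struct timeval before, after;
    int fd;

    gettimeofday(&before, NULL);
    fd = socket(AF_INET, SOCK_STREAM, 0);
    gettimeofday(&after, NULL);

    if (fd < 0)
        return -1;
    close(fd);
    return elapsed_usec(&before, &after);
}
```

Calling time_socket_create() in a loop and tabulating the results gives the kind of distribution described above. The figure includes the cost of the second gettimeofday() call itself.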

John Thier
GE DSD
Pittsfield, MA
thier@orcad2.dnet.ge.com

hakanson@cse.ogi.edu (Marion Hakanson) (11/30/90)

In article <409@brchh104.bnr.ca> THIER@orcad2.dnet.ge.com writes:
>. . .
>I am attempting to characterize system response times for IPC operations
>on a SPARCStation 1+, running "out-of-the-box" 4.1 (generic kernel,
>. . .
>Surrounding the IPC system call with two gettimeofday calls and computing
>the difference, I've measured gross variations in socket create, connect
>and send durations. The periods are most often only 1 microsecond, but
>frequent delays of 10 millisecond intervals are encountered. I assume that
>the delays result from the UNIX scheduler taking over during, or between,
>the system calls. Is this correct? If so, I'm looking for information

I'm certain that the reason is not what you think.  The clock is broken on
this machine/OS combination.  Even though a SPARC-1 and SPARC-1+ have the
same kind of clock (with 1-microsecond resolution), the SunOS-4.1 kernel
has a bug which causes the SPARC-1+ to run as a normal Sun-4, with a
10-millisecond resolution "soft" clock, while a SPARC-1's gettimeofday()
syscall has access to the hardware clock.

The many 1-microsecond values you're getting come from an artificial
increment added between softclock ticks, which is intended to keep two
successive gettimeofday() calls from returning the same value (those with
kernel source can look in microtime(), I believe).  If the clock were
working, you'd most likely see at least 40-50 usec, which is about the
fastest one can make two gettimeofday() calls without doing much in
between.
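
To make the effect concrete, here is a toy model of a 10 msec soft clock with a 1 usec artificial increment. It is only a sketch of the behavior described above, not the SunOS microtime() source; the struct and function names are hypothetical, and the 10000 usec tick is assumed from the 10-millisecond figure.

```c
#define TICK_USEC 10000L   /* assumed soft-clock tick: 10 msec */

/* Toy model state. Initialize tick_time to -1 so the first read latches a tick. */
struct soft_clock {
    long tick_time;  /* usec value latched at the most recent soft tick */
    long bump;       /* artificial usec added since that tick */
};

/*
 * Return what the modelled clock reads when the true time is real_usec.
 * The clock only advances on tick boundaries; between ticks each read is
 * nudged forward by 1 usec so that successive reads never return equal values.
 */
long soft_clock_read(struct soft_clock *c, long real_usec)
{
    long tick = (real_usec / TICK_USEC) * TICK_USEC;

    if (tick != c->tick_time) {   /* a soft tick occurred since the last read */
        c->tick_time = tick;
        c->bump = 0;
    } else {
        c->bump++;                /* same tick: add the artificial 1 usec */
    }
    return c->tick_time + c->bump;
}
```

In this model, two reads taken 50 usec apart inside the same tick differ by exactly 1 usec, while two reads that straddle a tick boundary differ by nearly 10 msec, however little real time separates them.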

So, the 1 usec values are erroneous, and the 10 msec value comes when two
calls happen to straddle a soft tick.  Note that you'll see these kinds of
results on a SPARC-330, 4/280, 4/110, and just about any 4.3bsd-based
system with a 10 msec soft clock.  I have a generic (BSD) test program:
anonymous FTP to cse.ogi.edu, file pub/clockres.c, which can usually tell
you the effective resolution of your system clock.
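
For those without FTP access, the general idea of such a probe can be sketched as follows. This is NOT the contents of clockres.c, just a minimal illustration under my own assumptions; the names clock_steps and probe_clock_steps() are hypothetical.

```c
#include <stddef.h>
#include <sys/time.h>

/* Smallest and largest positive step seen between back-to-back clock reads. */
struct clock_steps {
    long min_usec;   /* 0 if the clock never advanced */
    long max_usec;   /* 0 if the clock never advanced */
};

/*
 * Call gettimeofday() repeatedly with nothing in between and record the
 * smallest and largest positive differences, in microseconds.
 */
struct clock_steps probe_clock_steps(long samples)
{
    struct clock_steps s = { 0, 0 };
    struct timeval prev, cur;
    long i, delta;

    gettimeofday(&prev, NULL);
    for (i = 0; i < samples; i++) {
        gettimeofday(&cur, NULL);
        delta = (long)(cur.tv_sec - prev.tv_sec) * 1000000L
              + (long)(cur.tv_usec - prev.tv_usec);
        if (delta > 0) {
            if (s.min_usec == 0 || delta < s.min_usec)
                s.min_usec = delta;
            if (delta > s.max_usec)
                s.max_usec = delta;
        }
        prev = cur;
    }
    return s;
}
```

A smallest step of 1 usec combined with a largest step near 10000 usec is the signature of a soft clock with an artificial increment. Note that the largest step can also be inflated by the process being descheduled, so it is worth running the probe several times.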

Reports from folks at Sun say that the bug is fixed in SunOS-4.1.1, which
means that under 4.1.1 all Sun-4c's (SLC, IPC, SS1+) provide access to the
1 usec resolution of the underlying hardware clock.

Too bad the SPARC's clock stability (e.g. for use as an NTP platform) is
still so poor....

Marion Hakanson         Domain: hakanson@cse.ogi.edu
                        UUCP  : {hp-pcd,tektronix}!ogicse!hakanson