[comp.sys.isis] How to measure message overhead

rfinch@caldwr.water.ca.gov (Ralph Finch) (07/10/90)

We have 2 Fortran applications here that we wish to run in a
distributed fashion for faster elapsed times.  The first, a
calibration program, will run from 30 minutes to a few hours for each
server job.  Thus the overhead incurred by running under ISIS and
transferring data across the network in messages is not a problem.  In
the second application, though, each server will probably run perhaps
a few tens of milliseconds, and thus we want to keep overhead as low
as possible.

The overhead model I am assuming is something like

O=f+cB

where O is the total overhead, say in milliseconds
f is a fixed overhead incurred with each message (bcast or reply)
c a coefficient, milliseconds/byte
B the number of bytes transferred in each message

Questions:

1) Is the above model anywhere close to reality?

2) What's a good way of measuring overhead?

3) What have others done to reduce overhead?  I have some ideas, most
from the ISIS manual, but am curious as to what non-obvious things
people have done.
-- 
Ralph Finch			916-445-0088
rfinch@water.ca.gov		...ucbvax!ucdavis!caldwr!rfinch
Any opinions expressed are my own; they do not represent the DWR

ken@gvax.cs.cornell.edu (Ken Birman) (07/11/90)

In article <196@locke.water.ca.gov> rfinch@caldwr.water.ca.gov
           (Ralph Finch) writes:
>
>We have 2 Fortran applications here that we wish to run in a
>distributed fashion for faster elapsed times.  The first, a
>calibration program, will run from 30 minutes to a few hours for each
>server job.  Thus the overhead incurred by running under ISIS and
>transferring data across the network in messages is not a problem.  In
>the second application, though, each server will probably run perhaps
>a few tens of milliseconds, and thus we want to keep overhead as low
>as possible.

  People should be warned that in the V2.0 release the Fortran-ISIS
  interface is broken; we have patches, and Ralph has installed them in his
  copy of ISIS.  V2.1 will be fixed up (if you have an urgent need to
  run Fortran, tclark@cs.cornell.edu can email a short patch script).

>The overhead model I am assuming is something like
>
>O=f+cB
>
>where O is the total overhead, say in milliseconds
>f is a fixed overhead incurred with each message (bcast or reply)
>c a coefficient, milliseconds/byte
>B the number of bytes transferred in each message
>
>Questions:
>
>1) Is the above model anywhere close to reality?

  Yes, this is about right.  The fixed overhead, f, is quite large and is
  mostly spent in UNIX system calls (select, sendmsg, recvfrom, gettimeofday).
  The coefficient c is pretty small -- between one and two milliseconds per
  kbyte.  But the function isn't as smooth as you would expect, because
  the underlying UNIX system calls limit packets to about 8k, INET runs
  in 1400-byte chunks, and ISIS adds headers to the packets.

  Also, this depends a lot on whether you use bcast (which is the same as
  abcast) or cbcast.  I'll assume you are using cbcast; double everything
  for a conservative estimate of the abcast figures (an increase of
  25%-50% is more accurate).

BYPASS mode case: (tends to jam up under V2.0, though)
  For a message sent asynchronously the overhead _f_ is about 6.5ms on a pair
  of SUN 3/60's for a null ISIS message; the UNIX part of this is about 5.2ms
  and the remainder is split between time in my message libraries, task
  code, and the cbcast code.  

Non-BYPASS mode case:
  Here the numbers are about 4-6 times larger but still mostly in UNIX
  system calls.
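
  As a rough worked example of the O = f + cB model with the BYPASS
  figures quoted above (f about 6.5 ms, c between one and two ms per
  kbyte), here is a minimal Python sketch.  The function names, the
  1.5 ms/kbyte midpoint and the choice of 1 kbyte = 1024 bytes are
  assumptions made for illustration.  The sketch also ignores the
  packet-size steps (8k, 1400 bytes) mentioned earlier, so treat its
  results as ballpark only.

```python
def overhead_ms(nbytes, f_ms=6.5, c_ms_per_kbyte=1.5):
    """Estimate per-message overhead O = f + c*B, in milliseconds.

    f_ms defaults to the BYPASS null-message figure from the post;
    c_ms_per_kbyte is an assumed midpoint of the quoted 1-2 ms/kbyte range.
    """
    if nbytes < 0:
        raise ValueError("nbytes must be non-negative")
    # Assumption: 1 kbyte = 1024 bytes.
    return f_ms + c_ms_per_kbyte * (nbytes / 1024.0)


def overhead_fraction(nbytes, compute_ms, f_ms=6.5, c_ms_per_kbyte=1.5):
    """Fraction of elapsed time spent on message overhead for one request."""
    o = overhead_ms(nbytes, f_ms, c_ms_per_kbyte)
    return o / (o + compute_ms)
```

  With these assumptions, a null message to a server that computes for
  20 ms gives overhead_fraction(0, 20.0) of roughly a quarter.  For the
  non-BYPASS case one could pass an f_ms four to six times larger.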

Special cases:
  As noted in previous postings, we are optimizing some cases, such as
  groups with lots of clients and groups with receive-only clients.
  Eventually, these will have special-purpose, high-performance protocols
  associated with them and all the numbers will be different.

>2) What's a good way of measuring overhead?

  We sometimes measure the difference between two versions of an example:
  a loop-back version that generates a message and then hands it to itself
  with a direct subroutine call instead of a broadcast, and a version that
  does the actual broadcast.  This will
  give you some approximate idea of the costs.  Ping-pong measurements
  and the like are hard in ISIS due to flow control and other considerations
  that cause performance to vary quite a bit depending on exactly what
  your code does and when.

  For example, the BYPASS code may send twice as many packets (due to
  acknowledgements) in an application that computes for a long time after
  each received message than in one that responds rapidly.  The quick reply
  gives it a nice chance to send out an ack, piggybacked.  Otherwise it
  sends one all by itself -- perhaps halving the available network bandwidth.
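
  The loop-back comparison described above can be turned into estimates
  of f and c by timing both versions at several message sizes and fitting
  a straight line to the differences.  The following is a minimal Python
  sketch of that idea, not something taken from ISIS itself: the function
  name and the (nbytes, bcast_ms, loopback_ms) input layout are
  hypothetical, and an ordinary least-squares fit is assumed to be good
  enough even though the real curve has steps at packet boundaries.

```python
def fit_overhead_model(timings):
    """Fit O = f + c*B by least squares.

    timings: list of (nbytes, bcast_ms, loopback_ms) tuples, where
    bcast_ms is the measured time with a real broadcast and loopback_ms
    the time for the same work using a direct subroutine call.
    Returns (f_ms, c_ms_per_byte).
    """
    # Overhead at each size is the broadcast time minus the loop-back time.
    samples = [(b, bcast - loop) for b, bcast, loop in timings]
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two measurements")

    mean_b = sum(b for b, _ in samples) / n
    mean_o = sum(o for _, o in samples) / n
    sxx = sum((b - mean_b) ** 2 for b, _ in samples)
    if sxx == 0:
        raise ValueError("need at least two distinct message sizes")
    sxy = sum((b - mean_b) * (o - mean_o) for b, o in samples)

    c = sxy / sxx            # slope: milliseconds per byte
    f = mean_o - c * mean_b  # intercept: fixed cost per message
    return f, c
```

  Since flow control and acknowledgement behavior make individual runs
  vary, it is probably wise to average many repetitions at each message
  size before fitting.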