[comp.periphs] Disk Seek Time Measurements

aglew@crhc.uiuc.edu (Andy Glew) (11/20/90)

I just helped a friend look at a performance problem that was almost
certainly due to excessive disk seeking, and it prompts me to ask: Why
don't disks/disk controllers/computer systems provide more meaningful
measures of disk seeking and other I/O activity?

Disk seeking is such an important bottleneck on many machines that it
sure would be nice to have a way of directly measuring it, instead of
inferring it from other observations.

For example, in our recent performance problem, the workload was both
CPU and I/O intensive (uncompression and analysis of large traces).
It was being run on an Encore with eight processors.  Interactive
response slowed to a crawl, but the applications were not eating much
CPU.  Paging was nil, I/O modest, but system idle time was up around
70%.  Turns out that several large files were being read from disk,
uncompressed, and written back to the same disk => lots of time spent
waiting for disk heads to seek back and forth.

At least, seeking seems the likely culprit --- but it sure would be
nice to have a way of *verifying* this, e.g. by looking at the disk
activity monitor and observing that the disk was 100% busy, spending
10% of time on track, 5% of time transferring, and 85% of the time
seeking...
    It sure would be nice to be able to see this disk activity,
instead of guessing (no matter how informed) --- especially when the
application is going to have to be recoded just to test this
hypothesis.


Why should disks not have performance counters on board?  Something
like a counter that ticks every millisecond the drive is seeking, and
then is sampled by the OS to get a usage profile.
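
Something along these lines, purely as a sketch: the counter layout,
the read routine, and the numbers below are all invented for
illustration, not taken from any real controller.

    #include <stdio.h>

    /* Imaginary on-board counters, each ticking once per millisecond
       spent in the corresponding state. */
    struct disk_counters {
        unsigned long ms_seeking;      /* heads in motion           */
        unsigned long ms_transferring; /* reading/writing data      */
        unsigned long ms_on_track;     /* busy but not seeking      */
    };

    /* Stand-in for reading the (imaginary) counters off the drive;
       here it just fabricates a seek-bound profile. */
    static void read_disk_counters(struct disk_counters *c)
    {
        static unsigned long t;        /* fake elapsed busy time, ms */
        t += 1000;
        c->ms_seeking      = t * 85 / 100;
        c->ms_transferring = t *  5 / 100;
        c->ms_on_track     = t * 10 / 100;
    }

    int main(void)
    {
        struct disk_counters prev = {0, 0, 0}, now;
        int i;

        for (i = 0; i < 3; i++) {
            unsigned long seek, xfer, track, busy;

            /* Sample the counters and report the deltas since the
               last sample as percentages of busy time. */
            read_disk_counters(&now);
            seek  = now.ms_seeking      - prev.ms_seeking;
            xfer  = now.ms_transferring - prev.ms_transferring;
            track = now.ms_on_track     - prev.ms_on_track;
            busy  = seek + xfer + track;

            if (busy > 0)
                printf("disk: %lu%% seek, %lu%% transfer, %lu%% on-track\n",
                       100 * seek / busy, 100 * xfer / busy,
                       100 * track / busy);
            prev = now;
        }
        return 0;
    }

Sampled once a second or so, deltas like these would give the OS
exactly the 85%-seek / 5%-transfer / 10%-on-track breakdown wished
for above, with no guessing.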


On a more primitive level, at Gould I added "Kiviat profiling" to the
kernel.  This took maybe 5 minutes: instead of just %user, %system,
%waitio and %idle, it gave cross products of the system activity:
%(user.io), %(user.!io), %(sys.io), %(sys.!io), %(idle.io), and
%(idle.!io).
    Actually, I took it a bit further - I had two processors, so I
looked at %(user1.user2.io), %((user1.sys2+user2.sys1).io), etc.  ----
any meaningful reduction ---- and similarly for multiple drives.  This
was very useful in diagnosing problems in obtaining I/O overlap, etc.
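    For anyone who wants to try the same thing, here is a rough
single-processor sketch of the idea.  It is not the original Gould
code (which is gone, see below), and all the names are made up.

    #include <stdio.h>

    /* On every clock tick, classify the CPU as user/system/idle and
       note whether any disk I/O is outstanding, then bump the counter
       for that combination. */
    enum cpu_state { CPU_USER, CPU_SYS, CPU_IDLE };

    static unsigned long kiviat[3][2];   /* [cpu state][io busy?] */

    /* Call this from the clock interrupt handler, once per tick. */
    void kiviat_sample(enum cpu_state st, int io_busy)
    {
        kiviat[st][io_busy ? 1 : 0]++;
    }

    /* Print each bucket as a percentage of all ticks sampled. */
    void kiviat_report(void)
    {
        static const char *cname[] = { "user", "sys", "idle" };
        static const char *iname[] = { "!io", "io" };
        unsigned long total = 0;
        int c, i;

        for (c = 0; c < 3; c++)
            for (i = 0; i < 2; i++)
                total += kiviat[c][i];
        if (total == 0)
            return;

        for (c = 0; c < 3; c++)
            for (i = 0; i < 2; i++)
                printf("%%(%s.%s) = %lu%%\n",
                       cname[c], iname[i], 100 * kiviat[c][i] / total);
    }

    int main(void)
    {
        int t;

        /* Fake 100 ticks resembling the workload described above:
           a little user time, mostly idle with the disk busy. */
        for (t = 0; t < 100; t++)
            kiviat_sample(t % 10 < 2 ? CPU_USER : CPU_IDLE, t % 10 != 0);
        kiviat_report();
        return 0;
    }

A fat %(idle.io) bucket is exactly the smoking gun for the Encore
problem above: processors sitting idle while the disk is busy
seeking.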
    Of course, due to the wonders of private enterprise, my old Kiviat
code is inaccessible and I no longer have UNIX source (at least our
vendors won't provide us with the source for OS updates until the
update is obsolete).


Please, systems people: if you're worried about performance, provide
things like Kiviat profiling and disk seek counters!  Provide ways of
measuring the magnitude of real performance problems.

--
Andy Glew, a-glew@uiuc.edu [get ph nameserver from uxc.cso.uiuc.edu:net/qi]