[comp.arch] Computer time measurements

kevinw@portia.Stanford.EDU (Kevin Rudd) (08/22/90)

In article <AGLEW.90Aug21220304@dwarfs.crhc.uiuc.edu> aglew@dwarfs.crhc.uiuc.edu (Andy Glew) writes:
>    Very few machines have a cycle time that is really commensurate in
>the integrals with exact units like nanoseconds...
>    Very few machines have a cycle time that is really perfectly
>regular...
>But, for the people who really need high resolution timers, provide a
>"RAW" timer that ticks in whatever is the most convenient tick rate
>for your machine.  Try to make the ticks as regular as possible. Don't
>play any tricks like warping the tick rate or dropping ticks.
>Characterize the ticks as well as you can...
 -----------------------------------------

It seems to me that this is the crux: it is difficult to implement a system
that has a stable timer (to an appropriate accuracy) as well as a reliable
means of acquiring the time when required.  In a computer system there
are many sources of timing "noise", including processor stalls, bus
conflicts, interrupts, and (not to leave out) the phase of the moon, and all
of these make it difficult to start and stop a timing interval precisely.
If incredibly precise measurements are desired, it seems that dedicated
precision measurement equipment would be used, probably driven by a
high-performance logic analyzer that determines the appropriate machine
states to mark the timing boundaries.

I am unclear as to the self-measuring precision required of a computer
system.  For example, I don't see the relevance of marking file time
stamps to the 1ns increment...

   --Kevin

[If ignorance is bliss then we must all be very happy...]

moss@cs.umass.edu (Eliot Moss) (08/22/90)

I do software performance measurement and would *like* resolution down to the
clock rate of the machine. Personally, I generally want to include the time
taken by pipeline stalls, cache misses, etc., since that is relevant to the
user. I guess what I would really like is elapsed (wall clock) time, cpu time
for the process (split into user and system time), and possibly counters of
other things (instructions executed, memory cycles (maybe split into
reads/writes), cache hits/misses, page translation hits/misses, etc.). I don't
think any of this is necessarily *hard* to do, but it does take chip real
estate. The counters should be readable with ordinary instructions, but maybe
settable only with special ones (though if they are kept on a per-process
basis, a process can only screw up itself).
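
As a concrete picture of that counter set, here is a sketch in C of what a
per-process counter block might look like (the names and the split are
purely hypothetical, just to illustrate):

    #include <stdint.h>

    /* Hypothetical per-process performance counters. */
    struct perf_counters {
        uint64_t elapsed_cycles;   /* wall-clock time, in cycles     */
        uint64_t user_cycles;      /* CPU time in user mode          */
        uint64_t system_cycles;    /* CPU time in system mode        */
        uint64_t instructions;     /* instructions executed          */
        uint64_t mem_reads;        /* memory read cycles             */
        uint64_t mem_writes;       /* memory write cycles            */
        uint64_t cache_hits;
        uint64_t cache_misses;
        uint64_t tlb_hits;         /* page translation hits          */
        uint64_t tlb_misses;       /* page translation misses        */
    };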

The most important items for general use are elapsed and cpu time, with
resolution down to the machine clock cycle time. Except on machines that
stretch clocks (as opposed to inserting "wait states"), this is not
technologically difficult, though the number of bits required may necessitate
an atomic operation to read the counter being sampled into a special read-out
register, that can then be examined at leisure (and similarly for setting).
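
A minimal sketch of that latch-then-read pattern, in C, assuming hypothetical
latch_counter(), read_latched_high(), and read_latched_low() primitives
standing in for the special instructions (none of these names refer to any
real machine):

    #include <stdint.h>

    /* Hypothetical primitives: a single atomic instruction snapshots
       the running 64-bit counter into a frozen read-out register,
       whose halves can then be read with ordinary loads. */
    extern void     latch_counter(void);
    extern uint32_t read_latched_high(void);
    extern uint32_t read_latched_low(void);

    /* Because the snapshot is frozen, the two halves can be examined
       at leisure without racing against the running counter. */
    static uint64_t sample_counter(void)
    {
        latch_counter();
        return ((uint64_t)read_latched_high() << 32) | read_latched_low();
    }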

At current speeds, I can probably live with 1 microsecond or 100 ns
resolution, but it won't be long before we'll need 1 ns resolution or finer.

I should add that all of this is useful to me for measuring the speed of
execution of short blocks of code. I use the numbers to decide on different
ways of implementing things for advanced programming languages. Repeating
operations over and over tends to lead to distorted measurements, since
repeated loops tend to become cache resident more than they might in an actual
program, etc.
--

		J. Eliot B. Moss, Assistant Professor
		Department of Computer and Information Science
		Lederle Graduate Research Center
		University of Massachusetts
		Amherst, MA  01003
		(413) 545-4206; Moss@cs.umass.edu

meissner@osf.org (Michael Meissner) (08/22/90)

In article <1990Aug22.044826.18572@portia.Stanford.EDU>
kevinw@portia.Stanford.EDU (Kevin Rudd) writes:

| I am unclear as to the self-measuring precision required of a computer
| system.  For example, I don't see the relevance of marking file time
| stamps to the 1ns increment...

But I already see the need for at least microsecond resolution in
filestamps.  A second is fairly long these days with the faster
processors, and things like make are forced to use second
granularity.  Unfortunately, the POSIX committee doesn't agree with
me, and outlawed extra fields in the stat/utime structures.
--
Michael Meissner	email: meissner@osf.org		phone: 617-621-8861
Open Software Foundation, 11 Cambridge Center, Cambridge, MA, 02142

Do apple growers tell their kids money doesn't grow on bushes?

aglew@dwarfs.crhc.uiuc.edu (Andy Glew) (08/22/90)

    I do software performance measurement and would *like* resolution down to the
    clock rate of the machine. Personally, I generally want to include the time
    taken by pipeline stalls, cache misses, etc., since that is relevant to the
    user. I guess what I would really like is elapsed (wall clock) time, cpu time
    for the process (split into user and system time), and possibly counters of
    other things (instructions executed, memory cycles (maybe split into
    reads/writes), cache hits/misses, page translation hits/misses, etc.). I don't
    think any of this is necessarily *hard* to do, but it does take chip real
    estate. The counters should be readable with ordinary instructions, but maybe
    settable only with special ones (though if they are kept on a per-process
    basis, a process can only screw up itself).


    


Here's a fairly coherent schema for timers, merged from the best features 
of several machines:
    Provide one accurate real-time timer.
    Provide an offset register settable by the OS.
    Provide an instruction that atomically reads the real-time timer.
    Provide an instruction that atomically reads the sum of the
    	real-time timer and the offset.  This gives you virtual CPU time.
Ideally, you would be able to read both real and virtual time
atomically (and systems like the i860 LOCK operation let you do that),
but you can finesse it as follows:
    Read Real
    Read Virtual
If you get interrupted between the real and virtual readings, your OS
will account for it.
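
A minimal sketch of that read sequence, in C, assuming hypothetical
read_real_timer() and read_virtual_timer() primitives (the second returning
the real timer plus the OS-maintained offset); the names are made up for
illustration:

    #include <stdint.h>

    /* Hypothetical single-instruction reads: raw real-time ticks,
       and real-time ticks plus the per-process offset register. */
    extern uint64_t read_real_timer(void);
    extern uint64_t read_virtual_timer(void);

    struct timestamp {
        uint64_t real;   /* wall-clock ticks            */
        uint64_t virt;   /* virtual (per-process) ticks */
    };

    /* Read both back to back.  If an interrupt lands between the two
       reads, the OS adjusts the offset register at the context
       switch, so the discrepancy is accounted for rather than lost. */
    static struct timestamp take_timestamp(void)
    {
        struct timestamp t;
        t.real = read_real_timer();
        t.virt = read_virtual_timer();
        return t;
    }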

The Gould NP1 provided a machine-cycle-resolution timer with an offset
register, but only one read operation.  So the first version of the OS
set the offset, to read virtual time; the second version of the OS
always left the offset at 0, to read real time.  I think eventually it
was configured on a per-process basis.  But everyone wanted both.

A note: the covert-channel security guys will want you to provide a
mask to remove the low-order bits.  They want to prevent high-precision
timings from being made.
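
A trivial sketch of such masking, in C, assuming the OS chooses how many
low-order bits (k, hypothetical) to hide from untrusted readers:

    #include <stdint.h>

    /* Zero the low k bits so untrusted code cannot resolve intervals
       finer than 2^k ticks. */
    static uint64_t masked_timer(uint64_t raw, unsigned k)
    {
        return raw & ~((UINT64_C(1) << k) - 1);
    }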



Hardware costs:
    Yes, but there are a lot of things that you can do in software to
reduce the hardware costs.
    As I was trying to say in my first post, I don't want all the hardware
that would be necessary to give me an accurate ns timer - i.e., I don't
want the portable interface in hardware.  Give it to me raw.
    The carry chain for fast ticking can be simplified - let me read
the timer in carry-save format!  (If you can do big reads this is fine;
if you cannot read twice the width of the timer, which is what
carry-save requires, then tricks like those below can be applied.)
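
A minimal sketch of the carry-save idea, in C, assuming a hypothetical wide
read that returns both words, and assuming the hardware latches the carry
word already aligned so that the true count is simply sum + carry (a real
design might require a shift, depending on how the carry vector is stored):

    #include <stdint.h>

    /* Hypothetical: one wide read returns both halves of the
       carry-save counter, with no carry propagation in hardware. */
    struct cs_timer {
        uint64_t sum;     /* partial-sum bits        */
        uint64_t carry;   /* unpropagated carry bits */
    };
    extern struct cs_timer read_cs_timer(void);

    /* Software performs the carry propagation the hardware skipped. */
    static uint64_t resolve_cs_timer(void)
    {
        struct cs_timer t = read_cs_timer();
        return t.sum + t.carry;   /* assumes the carry word is pre-aligned */
    }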


    The most important items for general use are elapsed and cpu time, with
    resolution down to the machine clock cycle time. Except on machines that
    stretch clocks (as opposed to inserting "wait states"), this is not
    technologically difficult, though the number of bits required may necessitate
    an atomic operation to read the counter being sampled into a special read-out
    register, that can then be examined at leisure (and similarly for setting).

Ideally, on a 64 bit machine, we will be able to read 64 bit timers
atomically.  (And damn the board designer who puts a timer across an 8
bit interface, so that you have to stop it to read it).

On systems that cannot atomically read the entire timer, though, I've had
good luck with timestamps formed as follows:

    Read HIGH-PART -> timestamp.high1
    Read LOW-PART -> timestamp.low
    Read HIGH-PART -> timestamp.high2

If you can guarantee that there is no process interrupt between these
operations (in the kernel that's easy), then postprocessing can
compare high1 and high2.  If they are the same, no problem.  If they
differ, they usually differ only by one, and with assumptions about how
quickly rollover can occur you can figure out what the true time is.
Of course, the more of this stuff you have to do, the more LSBs you
have to throw out.
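
A minimal sketch of the reconstruction, in C, assuming hypothetical
read_timer_high()/read_timer_low() accessors for the two 32-bit halves; this
retry variant reconciles the readings on the spot rather than in
postprocessing:

    #include <stdint.h>

    /* Hypothetical accessors for the two halves of the hardware
       counter; LOW carries into HIGH as it rolls over. */
    extern uint32_t read_timer_high(void);
    extern uint32_t read_timer_low(void);

    /* HIGH / LOW / HIGH: if the high word changed, the low word may
       belong to either epoch, so try again. */
    static uint64_t read_timer64(void)
    {
        uint32_t high1, low, high2;
        do {
            high1 = read_timer_high();
            low   = read_timer_low();
            high2 = read_timer_high();
        } while (high1 != high2);
        return ((uint64_t)high1 << 32) | low;
    }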
    

    I should add that all of this is useful to me for measuring the speed of
    execution of short blocks of code. I use the numbers to decide on different
    ways of implementing things for advanced programming languages. Repeating
    operations over and over tends to lead to distorted measurements, since
    repeated loops tend to become cache resident more than they might in an actual
    program, etc.

"Measurement of repetition is not repetition of measurement" (Eugene Miya?)
--
Andy Glew, a-glew@uiuc.edu [get ph nameserver from uxc.cso.uiuc.edu:net/qi]

eugene@wilbur.nas.nasa.gov (Eugene N. Miya) (08/23/90)

>| I am unclear as to the self-measuring precision required of a computer
>| system.  For example, I don't see the relevance of marking file time
>| stamps to the 1ns increment...

I did not see the original post.  News has been flakey and I've not been
reading all of c.a.
See a paper I wrote for the Usenix 1988 Supercomputing Workshop.

To quote a remark William Hewlett gave at an MIT graduation:
	"Sometimes noise is significant."
I did not go to MIT, but I read this neat quote.

Basically, there are in my opinion 5 classes of measurement
environments:

A environments: have the best facilities: cycle time clocks,
non-intrusive performance measuring hardware, software to use all this
and other software tools.  Examples: Cray Y-MP, Cray X-MP, some machines
which never saw the light of day.  Other machines I've seen are under
non-disclosure.  A machines can migrate to B environments as better "A"
machines come out.  I can tell you half a dozen Cray HPM limitations
(they know them).

B environments: Cycle time clock and some software tools. Cray-2,
Cray-1, Convex, some IBM hardware.  At least good compilers, etc.

C environments: average.  50/60 Hz clock, average software.  Not great:
IBM PC, early workstations, VAX-11/780.  Lots of variance in cycle time.

D environments: machines without clocks, poorer software, etc.

E environments: components, not complete systems, but something to be
measured.

VLSI was a perfection of linear measurement: micron space realm.  Unless
similar improvements take place in time, you don't get faster machines.
Fortunately people at places like the NBS realize this and they make
things like atomic clocks.  If you are not interested in faster
machines, just ignore this posting.

Andy: it was close, real close ;^).  Not quite those words, but close.

--e. nobuo miya, NASA Ames Research Center, eugene@orville.nas.nasa.gov
  {uunet,mailrus,other gateways}!ames!eugene

sysmgr@KING.ENG.UMD.EDU (Doug Mohney) (08/23/90)

In article <7945@amelia.nas.nasa.gov>, eugene@wilbur.nas.nasa.gov (Eugene N. Miya) writes:

>VLSI was a perfection of linear measurement: micron space realm.  Unless
>similar improvements take place in time, you don't get faster machines.
>Fortunately people at places like the NBS realize this and they make
>things like atomic clocks.  If you are not interested in faster
>machines, just ignore this posting.

Are you saying future supercomputers (and thereby, in 5-7 years, workstations)
will have cesium clocks? ;-) 

henry@zoo.toronto.edu (Henry Spencer) (08/23/90)

In article <MEISSNER.90Aug22114425@osf.osf.org> meissner@osf.org (Michael Meissner) writes:
>... A second is fairly long these days with the faster
>processors, and things like make are forced to use second
>granularity.  Unfortunately, the POSIX committee doesn't agree with
>me, and outlawed extra fields in the stat/utime structures.

My understanding was that only the utime structure is specifically
forbidden to have extras, and for it there wasn't much choice:  since
it is fed to the kernel, not obtained from it, there is an unsolvable
problem of how you (portably!) fill in mysterious non-standard extra
fields so that the kernel won't see garbage in them.
-- 
Committees do harm merely by existing. | Henry Spencer at U of Toronto Zoology
                       -Freeman Dyson  |  henry@zoo.toronto.edu   utzoo!henry

srg@quick.com (Spencer Garrett) (08/26/90)

In article <1990Aug22.044826.18572@portia.Stanford.EDU>, kevinw@portia.Stanford.EDU (Kevin Rudd) writes:
> I am unclear as to the self-measuring precision required of a computer
> system.  For example, I don't see the relevance of marking file time
> stamps to the 1ns increment...

I do.  I would like to be able to handle timestamps as follows:

Express timestamps in nominal nanoseconds, using 64 bit numbers.  This
allows timestamps to be valid for over 290 years from some epoch and
also both positive and negative differences within that span.  The
actual tick interval of the computer in question will be longer, probably
much longer (e.g., 10 ms for a 100 Hz clock), and a tick will cause the
timestamp value to jump ahead to the next appropriate value.  Successive
ticks will not likely advance it by the same amount every time.  Every
time someone needs a timestamp,
the current value of the counter is returned, *and then the counter
is incremented*.  It is necessary that no more than one timestamp be
issued per nanosecond.  This is not likely ever to be a bottleneck.
In this manner, timestamps are both unique and ordered, and are valid
times to within the precision of a normal coarse line clock.  Folks
like Dave Mills and Andy Glew will still need a much more precise
source of time, but the scheme I propose is perfectly adequate for
things like timestamps, and is easily implemented on any computer.
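
A minimal sketch of the scheme, in C, with locking and the exact epoch left
out; clock_tick() is a hypothetical handler driven by the coarse line clock:

    #include <stdint.h>

    /* Nominal nanoseconds since some epoch.  Jumped forward on each
       coarse tick and bumped by one per timestamp issued, so
       timestamps are unique, ordered, and accurate to the line
       clock's precision. */
    static int64_t ns_counter;

    /* Called from the coarse (e.g. 100 Hz) clock interrupt with the
       current nominal time in nanoseconds. */
    void clock_tick(int64_t now_ns)
    {
        if (now_ns > ns_counter)
            ns_counter = now_ns;
    }

    /* Return the current value, *and then increment*, so no value is
       ever handed out twice (a real implementation would make this
       atomic). */
    int64_t get_timestamp(void)
    {
        return ns_counter++;
    }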

throopw@sheol.UUCP (Wayne Throop) (08/27/90)

> From: meissner@osf.org (Michael Meissner)
> But I already see the need for at least microsecond resolution in
> filestamps.  A second is fairly long these days with the faster
> processors, and things like make are forced to use second
> granularity.

Well... I agree that computers can accomplish quite a lot in a second
these days, and recording filesystem events at second granularity is
getting about as useful as recording them with hour (or maybe even day)
granularity.

BUT... the fact that make uses the filesystem's timestamp as an
up-to-date indicator is an abomination.  It simply isn't a reasonable
criterion.  Consider the case of loading (or otherwise retrieving) a
.h file from tape (or other backup, archive, or configuration management
database).  The archiver (or whatever) has every reasonable cause to
preserve the date on that .h file, but if it does, make can become
arbitrarily confused.

Clearly, make should NOT depend on filesystem timestamps alone to
store the state of a build.  It badly needs a "lookaside database"
of additional information.
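
A minimal sketch of what such a lookaside check might look like, in C, with
a hypothetical per-target record and checksum() helper; this reflects no
actual make implementation:

    #include <stdbool.h>
    #include <stdint.h>

    /* Hypothetical content checksum of a file. */
    extern uint32_t checksum(const char *path);

    /* One lookaside record: what a target was last built from. */
    struct build_record {
        const char *source;      /* prerequisite path           */
        uint32_t    source_sum;  /* its checksum at build time  */
    };

    /* Rebuild only if the prerequisite's *contents* changed, not
       merely its timestamp. */
    static bool needs_rebuild(const struct build_record *rec)
    {
        return checksum(rec->source) != rec->source_sum;
    }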

But then... that's an old pet peeve of mine...
--
Wayne Throop <backbone>!mcnc!rti!sheol!throopw or sheol!throopw@rti.rti.org