[comp.arch] 64 bits for times....

mo@messy.bellcore.com (Michael O'Dell) (08/10/90)

One of the nice things about using 64 bits for the time is that
you can then put it in nanoseconds - which you *almost* really
need on really fast machines. (100ns might be ok, but the
difference is still contained in 64 bits, so just do it!!)
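
For scale: 2^64 nanoseconds is roughly 585 years, so the format has
plenty of headroom.  A throwaway C sketch of the arithmetic (purely
illustrative; it isn't tied to any particular OS interface):

#include <stdio.h>

int main(void)
{
    /* One LSB = 1 ns; how long until a 64-bit counter wraps? */
    double ns_per_year = 365.25 * 86400.0 * 1e9;

    printf("unsigned 64-bit ns counter wraps after ~%.0f years\n",
           18446744073709551616.0 / ns_per_year);           /* 2^64 */
    printf("signed 64-bit (63 usable bits) wraps after ~%.0f years\n",
           9223372036854775808.0 / ns_per_year);             /* 2^63 */
    return 0;
}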

	-Mike

dhoyt@vw.acs.umn.edu (08/11/90)

In article <26012@bellcore.bellcore.com>, mo@messy.bellcore.com (Michael O'Dell) writes...
>One of the nice things about using 64 bits for the time is that
>you can then put it in nanoseconds - which you *almost* really
>need on really fast machines. (100ns might be ok, but the
>difference is still contained in 64 bits, so just do it!!)

Actually, you will still want two quantities: a date and an interval.  The
date, measured in milliseconds or microseconds, would handle calendar dates
and universal time.  That would cover most people, with the exception of the
bang/whimper types.  The interval time, on the other hand, should ideally be
in sub-picosecond units (perhaps expressed as a distance, even in this day
and age).  That would allow your OS to run (and report on) the test
equipment for new ssd's, particle accelerators, 50-meter dashes and other
transient experiments in a consistent, even manner.
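
In C terms, the two quantities might look something like this (the names
and units are invented for illustration; the printed ranges just show
that 64 bits is comfortable for both):

#include <stdio.h>
#include <stdint.h>

typedef int64_t abs_date_us;   /* microseconds since an agreed epoch      */
typedef int64_t interval_fs;   /* femtoseconds, for transient experiments */

int main(void)
{
    /* Rough ranges of the two signed 64-bit quantities. */
    double us_range_years = 9.223372036854776e18 / 1e6 / (365.25 * 86400.0);
    double fs_range_hours = 9.223372036854776e18 / 1e15 / 3600.0;

    printf("date range:     +/- %.0f years around the epoch\n", us_range_years);
    printf("interval range: +/- %.1f hours\n", fs_range_hours);
    return 0;
}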

david paul hoyt | dhoyt@vx.acs.umn.edu | dhoyt@umnacvx.bitnet

colin@array.UUCP (Colin Plumb) (08/12/90)

Well, then, it's a good thing that Unix times are unsigned, and will
last until 2106.  For accuracy, use NTP timestamps, which are 32.32-bit
fixed-point numbers, giving about 0.2 ns resolution.  (232830643
attoseconds, if you're fussy)  They will run out shortly after 06:28 GMT
on 7 Feb 2106.  I can't tell you the exact time, because it depends on
the number of leap seconds used until then: it's based on atomic time,
while GMT is astronomical, and the offset from atomic time is
periodically diddled.
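
The arithmetic behind those numbers, as a small C sketch (the epoch and
the way the fields are packed here are purely illustrative, not the NTP
wire format): one LSB of the 32-bit fraction is 2^-32 s.

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    uint64_t ts = ((uint64_t)12345 << 32) | 0x80000000u;    /* 12345.5 s */

    uint32_t secs = (uint32_t)(ts >> 32);
    uint32_t frac = (uint32_t)(ts & 0xffffffffu);

    /* Convert the fractional part to nanoseconds: frac * 1e9 / 2^32. */
    uint64_t nsec = ((uint64_t)frac * 1000000000u) >> 32;

    printf("one LSB = %.9f ns\n", 1e9 / 4294967296.0);      /* ~0.2328 ns */
    printf("timestamp = %lu s + %llu ns\n",
           (unsigned long)secs, (unsigned long long)nsec);
    return 0;
}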
-- 
	-Colin

andrew@alice.UUCP (Andrew Hume) (08/15/90)

In article <26012@bellcore.bellcore.com>, mo@messy.bellcore.com (Michael O'Dell) writes:
> One of the nice things about using 64 bits for the time is that
> you can then put it in nanoseconds - which you *almost* really
> need on really fast machines. (100ns might be ok, but the
> difference is still contained in 64 bits, so just do it!!)
> 
> 	-Mike


	this is nearly true. it is clear that support for high-resolution time
is needed and a quantum of around 1-10 ns is about right. however, the problem
is that all the work around ISO (and there is a LOT of it) on date/time formats
varies considerably in the number of bits required, but is tending towards
128 bits (high resolution + all dates (including BC)). i note in passing
that VMS (i think) has some funny date like 1858 as its epoch, the so-called
smithsonian time.
	i also note that in the third edition of unix, when the time was measured
in clock ticks (and thus wrapped around every 2.? yrs), ken proposed to deal
with wraparound by changing the epoch in the manual and running a special
fsck-like program that subtracted a year from every inode.

aglew@dwarfs.crhc.uiuc.edu (Andy Glew) (08/22/90)

..> Time formats

While my benchmarks chug away, mind if I have my say about time?

Time formats are (1) something that I have strong opinions about, and
(2) something that I believe I know quite a bit about, since I have
spent most of my career making exact time measurements on computers in
one form or another, wrestling with the various time formats.

A 64-bit, one-nanosecond-per-LSB (or 128-bit, one-picosecond-per-LSB??)
time format is great FOR PORTABLE USES OF TIME: timestamping files, maybe
even a little bit of low-accuracy performance measurement.
    Such a fixed format, though, is EVIL and MISLEADING for high-accuracy
time measurement.  Here are some reasons:

    Very few machines have a cycle time that is truly commensurable with
exact units like the nanosecond.  Typically, such machines have cycle times
that are, say, 1.0012 ns long.  Now, very few hardware designers are going
to put a divider in to account for this small deviation; instead, they'll
just use a counter and assume that the deviation is negligible, or can be
handled some other way.
    In other words, a hardware designer who says that he is giving you
an exact nanosecond clock is probably *lying*.  The best he can do is give
you a clock accurate to several ppm (or parts in 10^13; you just move the
decimal point around).  There is a science/engineering discipline, called
metrology (or, for time, horology), that specializes in how to make really
accurate measurements.  Most real designers are not (and do not need to
be) trained metrologists.
    Who cares about parts in 1E13?  The guy who is trying to make
really accurate measurements.  I have been able to make measurements
where I could see the effect of a single cache miss in a long section
of code, but only after I had gone through the process of converting
"nanoseconds according to the processor" into "the closest thing I can
get to real nanoseconds".
    Occasionally one reads papers where it is obvious that the
researchers did not go through this process, i.e., where they assumed
that the hardware-reported nanoseconds were real nanoseconds.  It
would be better if we were just honest and admitted that machines
don't run in nanoseconds; rather, they run in whatever is the fastest
convenient time for the part.

    Even if vendor A says "I can build a nanosecond clock more regular
than any constraint you put on it", vendor B may not be able to, due
to differences in technology or format.

    Very few machines have a cycle time that is really perfectly
regular.  That is, the supposedly one-nanosecond clock might be 0.999999867
now, and 1.0000001234 sometime later, depending on changes in
temperature, humidity, the local EM environment, whatever.
    I have observed such effects on several real systems, both CMOS
and NMOS.
    Such effects are well known, which is why most timer hardware has
a "timer correction" facility.  E.g., when you compare your clock to
another clock and find that you are 0.00000001256 out of sync, you
tell your clock to shorten its cycle by 0.00000000001 every tick, i.e.,
to tick faster.  So, in addition to the physical/environmental
variations in the timer cycle, you have a designed-in time warp.  You
can't make measurements of a higher resolution than this without
standing on your head.
    To use a software example that is probably more familiar to the
readers of this group: the BSD "adjtime()" facility warps the meaning
of time, so that nanoseconds are not constant.
    Instead of warping the clock rate, which is difficult to do, some
timers drop a tick every N ticks or so.  Either way, your timer is
not accurate to its least significant bit: a delta of 2 LSBs might
mean a 1 ns difference instead of 2 ns, if you were unlucky.


    Provide your "high resolution" 1 ns per bit timer if you must.

But, for the people who really need high resolution timers, provide a
"RAW" timer that ticks in whatever is the most convenient tick rate
for your machine.  Try to make the ticks as regular as possible. Don't
play any tricks like warping the tick rate or dropping ticks.
Characterize the ticks as well as you can.  Provide a software library
to convert from this RAW timer format to the portable timer format.
Ensure that the software library does not just linearly scale ticks to
real time, but instead can interpolate (or extrapolate) along a curve
fitted between RAW timer values and calibration points in real time.
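
A minimal sketch of such a conversion routine, with invented names:
calibration points pair RAW tick counts with the best available estimate
of real time, and a reading is interpolated between neighbouring points
(or extrapolated past the last one).

#include <stdint.h>

struct cal_point {
    uint64_t raw_ticks;   /* RAW counter value at calibration time   */
    double   real_ns;     /* corresponding real time, in nanoseconds */
};

/* Convert a RAW reading using a table of calibration points sorted by
 * raw_ticks (n >= 2). */
double raw_to_ns(uint64_t raw, const struct cal_point *cal, int n)
{
    int i = 0;

    /* Find the last segment that starts at or before the reading;
     * readings past the table use the final segment (extrapolation). */
    while (i < n - 2 && raw >= cal[i + 1].raw_ticks)
        i++;

    double rate = (cal[i + 1].real_ns - cal[i].real_ns) /
                  (double)(cal[i + 1].raw_ticks - cal[i].raw_ticks);

    return cal[i].real_ns + ((double)raw - (double)cal[i].raw_ticks) * rate;
}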


Ironically enough, less hardware is required for this RAW timer than
for a canonical one-bit-per-nanosecond timer, and what hardware there is
should all be devoted to making the timer as accurate and regular as
possible, rather than to scaling into some "portable" format.  Let
software do the mapping.

--
Andy Glew, a-glew@uiuc.edu [get ph nameserver from uxc.cso.uiuc.edu:net/qi]

hascall@cs.iastate.edu (John Hascall) (08/22/90)

In article <11187@alice.UUCP> andrew@alice.UUCP (Andrew Hume) writes:
}In article <26012@bellcore.bellcore.com>, mo@messy.bellcore.com (Michael O'Dell) writes:
}> One of the nice things about using 64 bits for the time is that
}> you can then put it in nanoseconds - which you *almost* really
}> need on really fast machines. (100ns might be ok, but the
}> difference is still contained in 64 bits, so just do it!!)

}that VMS (i think) has some funny date like 1858 as its epoch, the so-called
}smithsonian time.

   VMS keeps time in 64 bits (really 63, negative times are "delta times"), 
   in 100 nSec units, since 17 Nov 1858 (when the calendar jumped 11 days?).

}   i also note that in the third edition of unix, when the time was measured
}in clock ticks (and thus wrapped around every 2.? yrs), ken proposed to deal
}with wraparound by changing the epoch in the manual and running a special
}fsck-like program that subtracted a year from every inode.

   Boy, this really loses today now that clocks tick faster than 60 Hz
(with a 32-bit counter: 60 Hz = 828 days, 256 Hz = 194 days, 1000 Hz = 50 days).
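
   Those figures fall straight out of a 32-bit tick counter: the wrap
period is 2^32 ticks divided by the tick rate.  A quick C sketch of the
arithmetic:

#include <stdio.h>

int main(void)
{
    double rates[] = { 60.0, 256.0, 1000.0 };     /* ticks per second */
    double two32   = 4294967296.0;                /* 2^32 */
    int i;

    for (i = 0; i < 3; i++)
        printf("%6.0f Hz -> wraps every %.0f days\n",
               rates[i], two32 / rates[i] / 86400.0);
    return 0;
}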

John Hascall  /  Project Vincent  /  Iowa State University Comp Ctr
john@iastate.edu  /  hascall@atanasoff.cs.iastate.edu

rpw3@rigden.wpd.sgi.com (Rob Warnock) (08/23/90)

In article <2506@dino.cs.iastate.edu> hascall@cs.iastate.edu
(John Hascall) writes:
+---------------
| }that VMS (i think) has some funny date like 1858 as its epoch, the so-called
| }smithsonian time.
|    VMS keeps time in 64 bits (really 63, negative times are "delta times"), 
|    in 100 nSec units, since 17 Nov 1858 (when the calendar jumped 11 days?).
+---------------

Nice try, but no go. 17 Nov 1858 was the date of the first (recorded) high-
quality astronomical photograph. It is used as Day Zero for quite a few
systems. The DEC PDP-10 also used that as Day Zero, b.t.w. (Didn't CDC, too?)

-Rob

-----
Rob Warnock, MS-9U/510		rpw3@sgi.com		rpw3@pei.com
Silicon Graphics, Inc.		(415)335-1673		Protocol Engines, Inc.
2011 N. Shoreline Blvd.
Mountain View, CA  94039-7311

seanf@sco.COM (Sean Fagan) (08/23/90)

In article <26012@bellcore.bellcore.com>, mo@messy.bellcore.com (Michael O'Dell) writes:
> One of the nice things about using 64 bits for the time is that
> you can then put it in nanoseconds - which you *almost* really
> need on really fast machines. (100ns might be ok, but the
> difference is still contained in 64 bits, so just do it!!)

The Elxsi, a rather nice machine (it has no supervisor mode!), has a 50ns
clock, and a 64-bit clock register.  If you want to find out *exactly* how
many clock-ticks an instruction takes, you do something like:

	ld.l	r1, CLOCK
	<instr>
	ld.l	r2, CLOCK
	sub.l	r3, r1, r2

I'm guessing at the syntax, and I'm *sure* it's wrong, but you get the
general idea.

Rather useful, actually.

-- 
Sean Eric Fagan  | "let's face it, finding yourself dead is one 
seanf@sco.COM    |   of life's more difficult moments."
uunet!sco!seanf  |   -- Mark Leeper, reviewing _Ghost_
(408) 458-1422   | Any opinions expressed are my own, not my employers'.

jkenton@pinocchio.encore.com (Jeff Kenton) (08/23/90)

From article <67535@sgi.sgi.com>, by rpw3@rigden.wpd.sgi.com (Rob Warnock):
> In article <2506@dino.cs.iastate.edu> hascall@cs.iastate.edu
> (John Hascall) writes:
> +---------------
> | }that VMS (i think) has some funny date like 1858 as its epoch, the so-called
> | }smithsonian time.
> |    VMS keeps time in 64 bits (really 63, negative times are "delta times"), 
> |    in 100 nSec units, since 17 Nov 1858 (when the calendar jumped 11 days?).
> +---------------
> 
> Nice try, but no go. 17 Nov 1858 was the date of the first (recorded) high-
> quality astronomical photograph. It is used as Day Zero for quite a few
> systems. The DEC PDP-10 also used that as Day Zero, b.t.w. (Didn't CDC, too?)
> 

The magic date John is thinking of is September 1752 (try 'man cal'):

cal 9 1752

   September 1752
 S  M Tu  W Th  F  S
       1  2 14 15 16
17 18 19 20 21 22 23
24 25 26 27 28 29 30

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
      jeff kenton  ---	temporarily at jkenton@pinocchio.encore.com	 
		   ---  always at (617) 894-4508  ---
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

cet1@cl.cam.ac.uk (C.E. Thompson) (08/23/90)

In article <2506@dino.cs.iastate.edu> hascall@cs.iastate.edu (John Hascall) writes:
>In article <11187@alice.UUCP> andrew@alice.UUCP (Andrew Hume) writes:
>}that VMS (i think) has some funny date like 1858 as its epoch, the so-called
>}smithsonian time.
>
>   VMS keeps time in 64 bits (really 63, negative times are "delta times"), 
>   in 100 nSec units, since 17 Nov 1858 (when the calendar jumped 11 days?).
>
The VMS time base is Julian day 2,400,000.5, the zero point of the
Modified Julian Date.  Julian day numbers have been (maybe they still are)
popular with astronomers.

The change from the Julian (different Jules, of course) calendar to the
Gregorian calendar happened in England and the American colonies from 
2-14 September 1752. I thought everyone knew that :-)

Chris Thompson
JANET:    cet1@uk.ac.cam.phx
Internet: cet1%phx.cam.ac.uk@nsfnet-relay.ac.uk

rtrauben@cortex.Eng.Sun.COM (Richard Trauben) (08/24/90)

>seanf@sco.COM writes:
>If you want to find out *exactly* how
>many clock-ticks an instruction takes, you do something like:
>
>       ld.l    r1, CLOCK
>       <instr>
>       ld.l    r2, CLOCK
>       sub.l   r3, r1, r2
>

Nope. (close but no cigar...)

You just measured the execution time sum of TWO instructions:
<instr> PLUS <ld> execution time where the <ld> includes bus 
arbitration and memory access time to the TOD clock resource.  

In most systems the latter term dominates. Unless <instr> is the
kind of instruction you want to drop anyway. -:) 

-Richard

karsh@trifolium.esd.sgi.com (Bruce Karsh) (08/24/90)

seanf@sco.COM writes:
>If you want to find out *exactly* how
>many clock-ticks an instruction takes, you do something like:
>
>       ld.l    r1, CLOCK
>       <instr>
>       ld.l    r2, CLOCK
>       sub.l   r3, r1, r2

In article <703@exodus.Eng.Sun.COM> rtrauben@cortex.Eng.Sun.COM (Richard Trauben) writes:
>Nope. (close but no cigar...)
>You just measured the execution time sum of TWO instructions:
><instr> PLUS <ld> execution time where the <ld> includes bus 
>arbitration and memory access time to the TOD clock resource.  

How about:

       ld.l    r1, CLOCK
       <instr>
       ld.l    r2, CLOCK
       sub.l   r3, r1, r2	; r3 = Tinstr + Tclockfetch
       ld.l    r1, CLOCK
       <instr>
       <instr>
       ld.l    r2, CLOCK
       sub.l   r4, r1, r2	; r4 = 2*Tinstr + Tclockfetch
       sub.l   r5, r3, r4	; r5 = r4 - r3 = Tinstr

Of course, executing <instr> more than once may have some timing side effects
with respect to the cache.  Hence, you should probably ensure that this
code and all the arguments to <instr> are already in the cache.

			Bruce Karsh
			karsh@sgi.com

rminnich@udel.edu (Ronald G Minnich) (08/24/90)

In article <703@exodus.Eng.Sun.COM>, rtrauben@cortex.Eng.Sun.COM
(Richard Trauben) writes:
|> >       ld.l    r1, CLOCK
|> >       <instr>
|> >       ld.l    r2, CLOCK
|> You just measured the execution time sum of TWO instructions:
|> <instr> PLUS <ld> execution time where the <ld> includes bus 
|> arbitration and memory access time to the TOD clock resource.  

huh?
CLOCK is a fast register right there on the processor in most cases.  I
can't imagine anyone in their right mind putting that high-res clock at
the other end of a memory bus if it has any kind of resolution.

Say it ain't so, sean!
ron

1987: We set standards, not Them. Your standard windowing system is NeUWS.
1989: We set standards, not Them. You can have X, but the UI is OpenLock.
1990: Why are you buying all those workstations from Them running Motif?

przemek@liszt.helios.nd.edu (Przemek Klosowski) (08/24/90)

In article <67633@sgi.sgi.com> karsh@trifolium.sgi.com (Bruce Karsh) writes:
>seanf@sco.COM writes:
>>If you want to find out *exactly* how
>>many clock-ticks an instruction takes, you do something like:
>>
>>       ld.l    r1, CLOCK
>>       <instr>
>>       ld.l    r2, CLOCK
>>       sub.l   r3, r1, r2
>
>In article <703@exodus.Eng.Sun.COM> rtrauben@cortex.Eng.Sun.COM (Richard Trauben) writes:
>>Nope. (close but no cigar...)
 < .. Bruce has the idea of executing instr twice ...>
How about:

       ld.l    r1, CLOCK
       <instr>
       ld.l    r2, CLOCK
       ld.l    r3, CLOCK
       sub.l   r4, r2, r1	; r4 = Tinstr + Tclockfetch
       sub.l   r5, r3, r2	; r5 = Tclockfetch
       sub.l   r6, r4, r5	; r6 = Tinstr
No side effects are involved here. Of course CLOCK cannot be cached or else :^)
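
In C terms the same trick looks like this, with a hypothetical
read_clock() standing in for the load from CLOCK:

#include <stdint.h>

extern uint64_t read_clock(void);   /* hypothetical: raw counter read */

/* Estimate the cost of `work', in clock ticks, with the cost of one
 * clock read subtracted out.  Assumes the back-to-back reads see the
 * same access cost as the first pair (see the caveats about bus
 * traffic and caches elsewhere in this thread). */
uint64_t time_once(void (*work)(void))
{
    uint64_t t1 = read_clock();
    work();                            /* the <instr> being measured       */
    uint64_t t2 = read_clock();
    uint64_t t3 = read_clock();        /* back-to-back read: pure overhead */

    uint64_t with_overhead = t2 - t1;  /* Twork + Tclockfetch */
    uint64_t overhead      = t3 - t2;  /* Tclockfetch         */

    return with_overhead - overhead;   /* Twork */
}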


--
			przemek klosowski (przemek@ndcva.cc.nd.edu)
			Physics Dept
			University of Notre Dame IN 46556

danh@halley.UUCP (Dan Hendrickson) (08/24/90)

In article <67633@sgi.sgi.com> karsh@trifolium.sgi.com (Bruce Karsh) writes:
>seanf@sco.COM writes:
>>If you want to find out *exactly* how
>>many clock-ticks an instruction takes, you do something like:
>>
>>       ld.l    r1, CLOCK
>>       <instr>
>>       ld.l    r2, CLOCK
>>       sub.l   r3, r1, r2
>
>In article <703@exodus.Eng.Sun.COM> rtrauben@cortex.Eng.Sun.COM (Richard Trauben) writes:
>>Nope. (close but no cigar...)
>>You just measured the execution time sum of TWO instructions:
>><instr> PLUS <ld> execution time where the <ld> includes bus 
>>arbitration and memory access time to the TOD clock resource.  
[stuff deleted]
>			Bruce Karsh
>			karsh@sgi.com

I believe the point of the discussion was that if you put a cycle timer
"very close" to the CPU, so that reading it takes a small and constant number
of cycles (that is, the access does not cross some bus that other parts of
the machine are contending for at the same time), then you have a method of
accurately measuring the number of cycles it takes to execute an
instruction.  The only caveat on the approach is that all of the code must
be in the instruction cache (instruction buffers, in Cray terminology).  The
key is to have the "ld.l r1,CLOCK" instruction be a register transfer, not a
memory reference.

Dan Hendrickson
Tandem Computers
Austin, TX

hawkes@mips.COM (John Hawkes) (08/25/90)

In article <32015@super.ORG> rminnich@udel.edu (Ronald G Minnich) writes:
>In article <703@exodus.Eng.Sun.COM>, rtrauben@cortex.Eng.Sun.COM
>(Richard Trauben) writes:
>|> >       ld.l    r1, CLOCK
>|> >       <instr>
>|> >       ld.l    r2, CLOCK
>|> You just measured the execution time sum of TWO instructions:
>|> <instr> PLUS <ld> execution time where the <ld> includes bus 
>|> arbitration and memory access time to the TOD clock resource.  
>
>huh? 
>CLOCK is a fast register right there on the processor in most cases. 
>I can't imagine anyone in their right mind putting that high-res 
>clock at the other 
>end of a memory bus if it has any kind of resolution. 

Actually, there is a specific Elxsi instruction to read the clock register, and
the register lives on the ALU board (the principal board of the three boards
comprising the CPU).  I don't recall the latency, but I doubt it requires more
than one or two cycles.  The Elxsi CPI is something on the order of two or
three if instructions and data are cached.
-- 

John Hawkes
{ames,decwrl}!mips!hawkes  OR  hawkes@mips.com

preston@gefion.rice.edu (Preston Briggs) (08/25/90)

In article <957@halley.UUCP> danh@halley.UUCP (Dan Hendrickson) writes:

>>seanf@sco.COM writes:
>>>If you want to find out *exactly* how
>>>many clock-ticks an instruction takes, you do something like:
>>>
>>>       ld.l    r1, CLOCK
>>>       <instr>
>>>       ld.l    r2, CLOCK
>>>       sub.l   r3, r1, r2

>I believe that the point of the discussion was if you put a cycle timer "very
>close" to the CPU, that is if it took a small number of cycles and always took
>the same number of cycles (that is, the access did not go across some bus which
>various parts of the machine were trying to use at the same time), then you
>had a method of accurately measuring the number of cycles to execute an
>instruction.  The only caveat on the approach is that all of the cycles must
>be in the instruction cache (inst. buffers in Cray terminology).  The key

I'd guess that on modern machines, the Heisenberg uncertainty
principle comes into play.  You can't measure the time of a single
instruction usefully because the measurement code interferes
with the cache and various pipelines.  We can certainly measure how
long it takes to issue the instruction, but when does it complete?
In a particular context, it'll sometimes depend on the progress
of earlier instructions, etc.  Inserting measurement code changes the
context.

-- 
Preston Briggs				looking for the great leap forward
preston@titan.rice.edu

karsh@trifolium.esd.sgi.com (Bruce Karsh) (08/25/90)

>How about:
>
>       ld.l    r1, CLOCK
>       <instr>
>       ld.l    r2, CLOCK
>       ld.l    r3, CLOCK
>       sub.l   r4, r2, r1	; r4 = Tinstr + Tclockfetch
>       sub.l   r5, r3, r2	; r5 = Tclockfetch
>       sub.l   r6, r4, r5	; r6 = Tinstr
>No side effects are involved here. Of course CLOCK cannot be cached or else :^)

Very nice!

			Bruce Karsh
			karsh@sgi.com

mash@mips.COM (John Mashey) (08/25/90)

In article <1990Aug24.181208.29581@rice.edu> preston@gefion.rice.edu (Preston Briggs) writes:
...
>I'd guess that on modern machines, the Heisenberg uncertainty
>principle comes into play.  You can't measure the time of a single
>instruction usefully because the measurement code interferes
>with the cache and various pipelines.  We can certainly measure how
>long it takes to issue the instruction, but when does it complete?
>In a particular context, it'll sometimes depend on the progress
>of earlier instructions, etc.  Inserting measurement code changes the
>context.

Preston is right on, and I'd say it even stronger:

Not only does the measurement code change the context, but even if it
didn't, it's ALMOST USELESS to be trying to measure the speed of
individual instructions on current machines, and not just from cache
and pipeline effects.
Let's add (at least):
	conflicts with functional units (such as write ports
		into a register file)
	memory stalls (either from back-to-back stores, or load/store,
		or store/load)
	memory stalls (from running into DRAM refresh)
	memory stalls (write-buffers, or write-back caches)
	cache misses
and maybe
	location of the instruction within the cache line

Amongst the current machines, any one of these can have some effect,
and there are plenty of others as pipelines get more complex.
-- 
-john mashey	DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: 	 mash@mips.com OR {ames,decwrl,prls,pyramid}!mips!mash 
DDD:  	408-524-7015, 524-8253 or (main number) 408-720-1700
USPS: 	MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086

aglew@dwarfs.crhc.uiuc.edu (Andy Glew) (08/26/90)

>[Preston Briggs]
>I'd guess that on modern machines, the Heisenberg uncertainty
>principle comes into play.  You can't measure the time of a single
>instruction usefully because the measurement code interferes
>with the cache and various pipelines.  

>[John Mashey]
>Not only does the measurement code change the context, but even if it
>didn't, it's ALMOST USELESS to be trying to measure the speed of
>individual instructions on current machines.

Please note that the guys above said that it is useless (1) to try to
measure the speed of an individual instruction, not (2) that it is
useless to try to measure the speeds of instruction aggregates to
reveal individual instruction effects.
    Or do you want to extend your statements to cover (2), John and 
Preston?  If so, then I disagree.

Experimental physics hasn't stopped since Heisenberg.  We just know a
bit more about what we can and cannot measure.
    I would never suggest timing individual instructions.  You can,
however, time sequences of instructions - basic blocks may be too
small, but critical paths through a function may be large enough
(functions like syscall() come to mind), and you can time precisely
enough to show the effects of individual instructions on these
aggregates.  Just make sure the ramp-up and ramp-down effects can be
accounted for or averaged out.
    Even then, measurement changes the context, but you account for
that by minimizing the measurement distortion and designing your
measurement code so that the distortion is unidirectional, so that
you get an upper or lower bound from your measurement.  If what you are
trying to do is, e.g., set a strict upper bound on context switch time
(hello hard RT - and, yes, I know about the probabilistic effects of
caches), a bound is all you need.
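
As a sketch of what that looks like in practice (read_clock() is a
hypothetical raw-counter read, and the rest of the names are invented
for illustration): time the whole code path many times and keep both the
minimum and the maximum, so the result is a bound rather than a single
misleading number.

#include <stdint.h>

extern uint64_t read_clock(void);      /* hypothetical raw counter read */

struct bounds { uint64_t min, max; };

struct bounds time_aggregate(void (*path)(void), int runs)
{
    struct bounds b = { (uint64_t)-1, 0 };
    int i;

    /* Run the path once first so the later runs see a warm cache. */
    path();

    for (i = 0; i < runs; i++) {
        uint64_t t0 = read_clock();
        path();                        /* e.g. a syscall path, not one instr */
        uint64_t dt = read_clock() - t0;

        if (dt < b.min) b.min = dt;
        if (dt > b.max) b.max = dt;
    }
    return b;
}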


--
Andy Glew, a-glew@uiuc.edu [get ph nameserver from uxc.cso.uiuc.edu:net/qi]

aglew@dwarfs.crhc.uiuc.edu (Andy Glew) (08/26/90)

>>How about:
>>
>>       ld.l    r1, CLOCK
>>       <instr>
>>       ld.l    r2, CLOCK
>>       ld.l    r3, CLOCK
>>       sub.l   r4, r2, r1	; r4 = Tinstr + Tclockfetch
>>       sub.l   r5, r3, r2	; r5 = Tclockfetch
>>       sub.l   r6, r4, r5	; r6 = Tinstr
>>No side effects are involved here. Of course CLOCK cannot be cached or else :^)
>
>Very nice!

Ummmmmm....... 

If CLOCK is accessed across a bus, then Tclockfetch may be affected by
bus traffic.  At least you are measuring the correction term right there,
so you are likely, but not guaranteed, to see the same bus traffic.

In general, of course, on modern machines you cannot measure
individual instruction times.  But you might be able to measure the
timing effects of individual instructions on larger code sequences,
with care.

--
Andy Glew, a-glew@uiuc.edu [get ph nameserver from uxc.cso.uiuc.edu:net/qi]

mash@mips.COM (John Mashey) (08/26/90)

In article <AGLEW.90Aug25130251@dwarfs.crhc.uiuc.edu> aglew@dwarfs.crhc.uiuc.edu (Andy Glew) writes:
>>[Preston Briggs]
>>I'd guess that on modern machines, the Heisenberg uncertainty
>>principle comes into play.  You can't measure the time of a single
>>instruction usefully because the measurement code interferes
>>with the cache and various pipelines.  

>>[John Mashey]
>>Not only does the measurement code change the context, but even if it
>>didn't, it's ALMOST USELESS to be trying to measure the speed of
>>individual instructions on current machines.

>Please note that the guys above said that it is useless (1) to try to
>measure the speed of an individual instruction, not (2) that it is
>useless to try to measure the speeds of instruction aggregates to
>reveal individual instruction effects.
>    Or do you want to extend your statements to cover (2), John and 
>Preston?  If so, then I disagree.

No, of course not.  It is perfectly reasonable to measure aggregates,
subject to all of the caveats that have been mentioned in this
discussion so far.  The bigger & more realistic the aggregates, the
better.  In addition, it will get worse.
Hopefully, people now understand the uselessness of single-instruction
measurements on current machines.  If you agree to this, for the
kinds of pipelines that most current machines use, consider how much
worse it gets with:
	vector units
	superscalar
	superpipelined
	superscalar-superpipelined
	out-of-order execution
	speculative execution
	multi-level cache hierarchies, with various inter-level buffering
since all of these are either here already, or possibly coming soon,
in microprocessors.
-- 
-john mashey	DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: 	 mash@mips.com OR {ames,decwrl,prls,pyramid}!mips!mash 
DDD:  	408-524-7015, 524-8253 or (main number) 408-720-1700
USPS: 	MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086

seanf@sco.COM (Sean Fagan) (08/27/90)

In article <41090@mips.mips.COM> mash@mips.COM (John Mashey) writes:
>it's ALMOST USELESS to be trying to measure the speed of
>individual instructions on current machines
>	conflicts with functional units (such as write ports
>		into a register file)

Well, the Elxsi doesn't have pipelines or functional units (serial execution
only), so those don't come into play.

Doesn't mean John's points are invalid, I just wanted to point that out 8-).

-- 
Sean Eric Fagan  | "let's face it, finding yourself dead is one 
seanf@sco.COM    |   of life's more difficult moments."
uunet!sco!seanf  |   -- Mark Leeper, reviewing _Ghost_
(408) 458-1422   | Any opinions expressed are my own, not my employers'.

gillies@m.cs.uiuc.edu (08/28/90)

I think it is rather ridiculous for ISO to support timing accuracy
in the nanoseconds for prehistory.  Until very recently, we couldn't
measure time in hundredths of a second -- why would we want to measure
time in nanoseconds back into prehistory?  What an idiotic idea.

Also, tell me when it will be possible to synchronize all the computer
clocks with a nano-second accuracy atomic clock.  How will such a
clock be reset later?

My conclusion:  ISO should specify a nanosecond relative timer, and a
much coarser absolute timer (like milliseconds).


Don W. Gillies, Dept. of Computer Science, University of Illinois
1304 W. Springfield, Urbana, Ill 61801      
ARPA: gillies@cs.uiuc.edu   UUCP: {uunet,harvard}!uiucdcs!gillies

davidsen@crdos1.crd.ge.COM (Wm E Davidsen Jr) (08/29/90)

In article <3300165@m.cs.uiuc.edu> gillies@m.cs.uiuc.edu writes:

  [ strong statement on stupidity of ns dating ]

| Also, tell me when it will be possible to synchronize all the computer
| clocks with a nano-second accuracy atomic clock.  How will such a
| clock be reset later?

  I guess the first question is when will there be a benefit from doing
so? And how long will it stay in sync?
| 
| My conclusion:  ISO should specify a nanosecond relative timer, and a
| much coarser absolute timer (like milliseconds).

  The timer should not be more accurate than the accuracy of the
setting. Unless there's a good way to set such a timer within a ms
*repeatably* then why worry about how accurately you can measure it? The
relative timer is important, the absolute timer leads people to believe
they have accuracy they don't.

  Yes I know about using phone lines and radio to distribute time, with
and without hardware ping and delay compensation. It is still hard to be
sure you're within a ms. Fortunately getting within 50 ms seems to be
adequate for most things, which is easy to do.
-- 
bill davidsen	(davidsen@crdos1.crd.GE.COM -or- uunet!crdgw1!crdos1!davidsen)
    VMS is a text-only adventure game. If you win you can use unix.

davecb@yunexus.YorkU.CA (David Collier-Brown) (08/29/90)

>In article <3300165@m.cs.uiuc.edu> gillies@m.cs.uiuc.edu writes:

>  [ strong statement on stupidity of ns dating ]

>| Also, tell me when it will be possible to synchronize all the computer
>| clocks with a nano-second accuracy atomic clock.  How will such a
>| clock be reset later?

>  I guess the first question is when will there be a benefit from doing
>so? And how long will it stay in sync?
>| 
>| My conclusion:  ISO should specify a nanosecond relative timer, and a
>| much coarser absolute timer (like milliseconds).

davidsen@crdos1.crd.ge.COM (Wm E Davidsen Jr) writes:
>  The timer should not be more accurate than the accuracy of the
>setting. Unless there's a good way to set such a timer within a ms
>*repeatably* then why worry about how accurately you can measure it? The
>relative timer is important, the absolute timer leads people to believe
>they have accuracy they don't.

  Er, this is a solved problem in software engineering... You have an
architecture-specific constant that tells you how many bits are significant
to the ``right'' of the decimal point, and a function that returns only
those bits non-zero.
  The application can use a constant-size time variable, and discover
how much of it is significant when necessary.

  If I were writing this in an object-oriented language (:-)), I'd
define it thusly:
	declare clock_$absolute_machine_time entry() fixed decimal (72,36),
		clock_$accuracy entry() fixed binary (6);

--dave (pardon me if I got the PL/1 wrong, but I couldn't resist
	bringing up the 1970s ``state of practice'' solution) c-b
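
  In plain C the same interface might look something like this (the names
are invented; the point is the significance query, not the exact
declarations):

#include <stdint.h>

extern uint64_t clock_absolute_time(void);     /* hypothetical 64-bit timestamp     */
extern int      clock_significant_bits(void);  /* hypothetical: bits actually valid */

/* Return the timestamp with the insignificant low-order bits forced to
 * zero, so callers cannot mistake noise for precision. */
uint64_t clock_truncated_time(void)
{
    int junk = 64 - clock_significant_bits();

    if (junk <= 0)                     /* everything is significant */
        return clock_absolute_time();
    if (junk >= 64)                    /* nothing is significant    */
        return 0;
    return (clock_absolute_time() >> junk) << junk;
}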
-- 
David Collier-Brown,  | davecb@Nexus.YorkU.CA, ...!yunexus!davecb or
72 Abitibi Ave.,      | {toronto area...}lethe!dave 
Willowdale, Ontario,  | "And the next 8 man-months came up like
CANADA. 416-223-8968  |   thunder across the bay" --david kipling

bdg@tetons.UUCP (Blaine Gaither) (08/29/90)

I must agree with aglew.  You need high frequency timers (= cpu clock).

Even if you are timing small routines the extra precision is needed to
help you determine whether or not what you are observing is indeed
what you wish to observe.

I have seen countless situations where analysts have assumed some
small anomaly was "handling timer interrupts, .."  only to find out
it was an indication of a major source of error.

A final important reason for high precision timers is to help
architects manage the software implementation.  The cavalier way in
which OS types often treat timing facility implementation is only
exacerbated by letting them hide behind a coarse grain clock.

aglew@dwarfs.crhc.uiuc.edu (Andy Glew) (08/30/90)

>A final important reason for high precision timers is to help
>architects manage the software implementation.  The cavalier way in
>which OS types often treat timing facility implementation is only
>exacerbated by letting them hide behind a coarse grain clock.

If only the architects would give us a high precision timer, OSers
would not treat timing (and accounting) cavalierly.
    But, when you create the high precision timer, you also have to
budget for the OS development time necessary to undo years of reliance
on low-precision timers because they were the only thing going.

(Hi, Blaine!  Just ribbing you...)
--
Andy Glew, a-glew@uiuc.edu [get ph nameserver from uxc.cso.uiuc.edu:net/qi]

alex@vmars.tuwien.ac.at (Alexander Vrchoticky) (08/30/90)

davidsen@crdos1.crd.ge.COM (Wm E Davidsen Jr) writes:

[about synchronizing clocks to nanosecond accuracy]
>  I guess the first question is when will there be a benefit from doing
>so? And how long will it stay in sync?

A global sense of time is a powerful concept in distributed real-time systems.
The synchronization accuracy achievable depends on a lot of factors,
most notably the variability of the communication delay and the
drift rates of the local clocks.  On local area networks a synchronization
accuracy on the order of a few microseconds can be achieved with just a
little hardware support and with very reasonable overhead.
Given the advances of computer architecture in the past, I don't
dare say that synchronization accuracy on the order of nanoseconds will
not be achieved.

>  The timer should not be more accurate than the accuracy of the
>setting. Unless there's a good way to set such a timer within a ms
>*repeatably* then why worry about how accurately you can measure it? The
>relative timer is important, the absolute timer leads people to believe
>they have accuracy they don't.

System calls to set timers and clocks to absolute values
are of course nonsensical when the variability of the
execution time of the system call itself is on the order of the
granularity of the clock or timer, or greater.

For clocks there is a solution to the problem:
Adjust the *rate* of the clock until the correct value is reached
and maintain the correct value by corrections of the rate. 
Unfortunately 1003.4 does not specify an interface for this 
(ok, this does not really belong in comp.arch ...). 

Given such a clock the variability of the system call does *not* matter
for absolute timers (Putting aside pathological cases). 
I don't see a satisfactory solution for relative timers.
The variability of the notification is of course a problem for both types
of timers.

Note that the accuracy of *any* timer used for interval measurements
also depends on all corrections of the clock setting being
gradual: otherwise short durations might be measured
as if they had taken a negative amount of time.
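
A sketch of what such gradual correction might look like, with invented
names: the per-tick increment is nudged by at most one nanosecond, so the
reported clock absorbs the offset over many ticks and never runs
backwards.

#include <stdint.h>

static uint64_t clock_ns;       /* reported clock, in nanoseconds */
static int64_t  residual_ns;    /* correction still to be applied */

#define SLEW_PER_TICK 1         /* at most 1 ns of correction per tick;
                                   must be smaller than the tick period */

/* Called when an external reference says we are off by offset_ns
 * (positive: we are behind; negative: we are ahead). */
void clock_adjust(int64_t offset_ns)
{
    residual_ns += offset_ns;
}

/* Called once per hardware tick of (nominally) tick_ns nanoseconds. */
void clock_tick(uint64_t tick_ns)
{
    int64_t step = 0;

    if (residual_ns > 0)
        step = SLEW_PER_TICK;
    else if (residual_ns < 0)
        step = -SLEW_PER_TICK;

    clock_ns    += (uint64_t)((int64_t)tick_ns + step);
    residual_ns -= step;
}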

I agree that for networks of workstations and other non-real-time 
applications a few milliseconds of accuracy are probably plenty. 
But other systems are more demanding: Real real-time systems need Real Time :-)
And they need software mechanisms to access it. 

--
Alexander Vrchoticky  Technical University Vienna, Dept. for Real-Time Systems
Voice:  +43/222/58801-8168   Fax: +43/222/569149
e-mail: alex@vmars.tuwien.ac.at or vmars!alex@relay.eu.net  (Don't use 'r'!)

aglew@dwarfs.crhc.uiuc.edu (Andy Glew) (08/31/90)

>For clocks there is a solution to the problem:
>Adjust the *rate* of the clock until the correct value is reached
>and maintain the correct value by corrections of the rate. 
>Unfortunately 1003.4 does not specify an interface for this 
>(ok, this does not really belong in comp.arch ...). 

Which is exactly the sort of thing I was complaining about.

Adjust the rate of the absolute-time clock that you never measure
intervals on, the clock that you basically only use for timestamping
files.

But leave your hands off the clock that I'm using to time loops and
program execution, i.e., where I am differencing timer values to get an
interval.
    Or at least log all of your rate-adjustment times, so that I can
know that a difference of 10 ticks NOW is 10.2 us, while a difference
of 10 ticks yesterday at 9pm is 9.8 us.  Doing those adjustments is a
pain, but it's doable.
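
A sketch of such a log, with invented names: each entry records the raw
tick at which a new tick period took effect, so an interval is converted
using the period that was actually in force when it was taken (which is
where the 10.2 us vs. 9.8 us difference above would come from).

#include <stdint.h>

struct rate_entry {
    uint64_t from_tick;     /* first raw tick at which this period applied */
    double   ns_per_tick;   /* measured tick period from then on           */
};

/* Log sorted by from_tick, n >= 1: return the period in effect at `tick'. */
static double period_at(uint64_t tick, const struct rate_entry *log, int n)
{
    int i = n - 1;

    while (i > 0 && log[i].from_tick > tick)
        i--;
    return log[i].ns_per_tick;
}

/* Convert an interval of (t1 - t0) raw ticks into nanoseconds, assuming
 * the whole interval fell under a single log entry. */
double interval_ns(uint64_t t0, uint64_t t1,
                   const struct rate_entry *log, int n)
{
    return (double)(t1 - t0) * period_at(t0, log, n);
}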

--
Andy Glew, a-glew@uiuc.edu [get ph nameserver from uxc.cso.uiuc.edu:net/qi]