[comp.arch] Timers

lindsay@MATHOM.GANDALF.CS.CMU.EDU (Donald Lindsay) (09/04/90)

In article <26196@bellcore.bellcore.com> mo@bellcore.com 
	(Michael O'Dell) writes:
  [describing the timer facilities of the Prisma]
>TOE - Time Of Eternity clock
>	64-bit up-counter, zeroed at machine reset but not randomly
>	loadable, incrementing continuously at the machine cycle clock
>	rate, readable with one alternate-address-space-load-double.
>QCT - Quantum Count-down Timer
>	32-bit down-counter which generates an interrupt when the counter
>	goes to zero.

The upcounter idea is just right. The downcounter, however, has
problems.

To explain why this is so, let me tell you about repetitive
scheduling.  Many real time systems have tasks which must be
performed every T ticks, and schedule creep isn't acceptable.  That
is, the 10th event can happen at 10T + epsilon, but it mustn't happen
at 10T + 10 epsilon. [There are several valid reasons for this.
Perhaps the task will sample data, which we want to graph. Perhaps we
want to FFT the data - which also puts bounds on epsilon.  Perhaps we
have to preserve a statically determined relationship between N such
tasks.]

If the clocks are coarse, repetitive scheduling is impossible or else
easy.  The problem introduced by a fine clock is determining the
_exact_ value to put in the downcounter.  Some software has figured
out the time of the next event: but then we have to know what Now is.

On some machines, Now is hard.  The time Now was scheduled to be, has
gone past: the last interupt's latency has intervened.  Worse, this
latency is affected by interrupt lockout - perhaps the kernel didn't
want to be interrupted.  On some machines, latency even depends on
who was interrupted - e.g. the floating point registers are only
saved if the interrupted task was using them.  Plus, there's the
usual hardware stuff like memory contention.  The crude answer is:

	Now = InterruptScheduledTime + FudgeFactor;

where the fudge factor makes the average about right.  The Prisma
design is obviously better - just consult the TOE register.

But are we out of the woods?  No: if we code something like

	NextTime = ...the minimum of some set of times...;
	DownCountRegister = NextTime - TOE_Register;

then we still get schedule creep.  The assignment takes a nonzero
number of clocks, and worse, it may take a variable number, given
cache misses, bus contention, higher priority interrupts, whatever.
So it's back to

	DownCountRegister = NextTime - TOE_Register - small_fudge_factor;

which will (of course) become wrong when you switch compilers...

We could eliminate the whole headache by scrapping downcounters, and
instead having a comparator on the upcounter.  When the upcounter
equals the value in its Compare Register, then there's an interrupt.
Period.  I'm asking for _very_ simple hardware.  There's no need for
greater-than semantics, because the kernel software can postcheck for
that (if TOE is there to check against):

	NextTime = ...the minimum of some set of times...;
	TOE_CompareRegister = NextTime;                /* COWABUNGA */
	if( NextTime <= TOE_Register ){  	/* check versus Now */
		...unset the compare hardware..
		...mark the process as runnable...
	}

(For five extra points: why code a postcheck, not a precheck?)

In summary: I'm recommending some very simple hardware: an upcounter,
a latch, and an interrupt when they're equal.  But the low price
isn't the virtue: the virtue is that the handler can be Right instead
of just fudged. 
-- 
Don		D.C.Lindsay

jgk@osc.COM (Joe Keane) (09/06/90)

In article <10383@pt.cs.cmu.edu> lindsay@MATHOM.GANDALF.CS.CMU.EDU (Donald
Lindsay) writes:
>But are we out of the woods?  No: if we code something like
>
>	NextTime = ...the minimum of some set of times...;
>	DownCountRegister = NextTime - TOE_Register;
>
>then we still get schedule creep.

I don't follow you here.  Whether or not we get schedule creep depends only on
how we compute NextTime, which is an absolute time value.  As long as this is
computed from the last desired NextTime, and not from when the last interrupt
was actually handled, we should be OK.  There is some error in setting
DownCountRegister, but it does not accumulate.

The same issue comes up with the Unix select system call.  It takes a time
interval as a parameter, while i think it should take an absolute time.

lindsay@gandalf.cs.cmu.edu (Donald Lindsay) (09/07/90)

In article <3760@osc.COM> jgk@osc.UUCP (Joe Keane) writes:
>>	NextTime = ...the minimum of some set of times...;
>>	DownCountRegister = NextTime - TOE_Register;
>>
>>then we still get schedule creep.
>I don't follow you here.  Whether or not we get schedule creep
>depends only on how we compute NextTime, which is an absolute time
>value.

You are correct, I misspoke. This case gives jitter, not creep,
because the TOE upcounter is always there to refer to, and in
absolute units.  

(As someone else pointed out, higher level software is the correct
place to keep the mapping of absolute ticks to absolute real-world
time units.)

While I'm at it, I'll point out that the code above is pseudocode.
Those are 64-bit quantities, which one obtains and manipulates
piecemeal.  Also, there's usually some futzing with the control
registers of some faraway chip.  Hence, it's that much more of a pain
to come up with the fudge factors.  Plus, if the code is
interruptible, it's hard to nail down the average case and the upper
bound.  

I'm sure that someone out there is saying "So, lock out interrupts".
That's bad: one should never lock out interrupts, given a choice.
(It makes other handler's worst-cases worse.) (Nor is it good to
design something that _has_ to be the highest priority device.) The
beauty of the upcounter/comparator design, is that the handler
_doesn't_ get more complicated (or impossible) if it has to be
interruptable.  The pseudocode I gave will work regardless.

To be fair to the downcounter design, there are timer chips out there
with a halfway fix.  They have a reload register, and the downcounter
is atomically reloaded from it at the interrupt.  It has been pointed
out to me that this fix is quite good, if you can leave some single
value permanently in the reload register.  However, if the software
has to reload the reload register, then there's a race.

-- 
Don		D.C.Lindsay

aglew@crhc.uiuc.edu (Andy Glew) (09/07/90)

Terminology:

We want regularly scheduled events at times n*Tp

Jitter errors give us events at times n*Tp+epsilon_n,
where epsilon_n is some small random variable.

Drift errors are of the form n*Tp + SUM for i from 1 to n of epsilon_i.

Note that since epsilon_i may not have a symmetric distribution the errors may accumulate.

Some of the previous posters have called drift errors creep errors.
Let's not get too hung up on terminology.

Some jitter is inherent in any timer event (except perhaps hardware
signals caused by the timer) due to software indeterminacy, caches,
bus traffic, etc. Drift, however, can be compensated for in most
conventional timers.

In a count-down timer, you can try to compensate for drift if the
timer continues to count down. You read the count down timer, compare
it to what it should have been, use it to adjust the next timer value
you write. Essentially, this gives you twice the jitter, but you can
adjust for drift long term.

If you have a reload register, it can be set for regular periods of
one cyclic process. But if you have two cyclic processes of
incommensurate cycles, you have to reload the timer register anyway,
which gives the same increased jitter problem.

If you have an atomic exchange with timer operation, you can use this
to adjust for drift - it's equivalent to reading the underflowing
timer register, with reduced indeterminacy between the timer read and
the timer load.

Atomic add to timer can be used in much the same way. All of these
let you compensate for this cycle's error on the next cycle, giving
you essentially twice the jitter.

As Donald Lindsey has written, an absolute timer with comparators can
be used to eliminate drift. It also reduces jitter, in that you only
have to worry about the indeterminacy of one timer interrupt.

A more serious example of drift is between two separate time sources -
like when your "microsecond" timer is not quite in agreement with the
wall clock. Typically, then you have to use the same sort of time
warping methodologies that the fellow from Wien was talking about.
But then you have to worry about distinguishing between people who
really want your best approximation of a rate (I want this done 1000
times a second for all of this satellite's pass-over) and those who
want your best approximation of an exact time.

--
Andy Glew, a-glew@uiuc.edu [get ph nameserver from uxc.cso.uiuc.edu:net/qi]