kevinw@portia.Stanford.EDU (Kevin Rudd) (08/22/90)
In article <AGLEW.90Aug21220304@dwarfs.crhc.uiuc.edu> aglew@dwarfs.crhc.uiuc.edu (Andy Glew) writes:

> Very few machines have a cycle time that is really commensurate in
>the integrals with exact units like nanoseconds...
> Very few machines have a cycle time that is really perfectly
>regular...
>But, for the people who really need high resolution timers, provide a
>"RAW" timer that ticks in whatever is the most convenient tick rate
>for your machine. Try to make the ticks as regular as possible. Don't
>play any tricks like warping the tick rate or dropping ticks.
>Characterize the ticks as well as you can...
-----------------------------------------

It seems to me that this is the crux: it is difficult to implement a
system which has a stable timer (to an appropriate accuracy) as well
as a reliable means of acquiring the time when required. In a computer
system there are many sources of timing "noise", including processor
stalls, bus conflicts, interrupts, and (not to leave out) the phase of
the moon, and all of these make it difficult to precisely start and
stop some timing increment. If incredibly precise measurements are
desired, it seems that dedicated precision measurement equipment would
be used, probably clocked off of a high-performance logic analyzer to
determine the appropriate states to mark the timing boundaries.

I am unclear as to the self-measuring precision required of a computer
system. For example, I don't see the relevance of marking file time
stamps to the 1ns increment...

					--Kevin

[If ignorance is bliss then we must all be very happy...]
moss@cs.umass.edu (Eliot Moss) (08/22/90)
I do software performance measurement and would *like* resolution down
to the clock rate of the machine. Personally, I generally want to
include the time taken by pipeline stalls, cache misses, etc., since
that is relevant to the user. I guess what I would really like is
elapsed (wall clock) time, cpu time for the process (split into user
and system time), and possibly counters of other things: instructions
executed, memory cycles (maybe split into reads/writes), cache
hits/misses, page translation hits/misses, etc. I don't think any of
this is necessarily *hard* to do, but it does take chip real estate.
The counters should be readable with ordinary instructions, but maybe
settable only with special ones (though if kept on a per-process
basis, a process can only screw up itself).

The most important items for general use are elapsed and cpu time,
with resolution down to the machine clock cycle time. Except on
machines that stretch clocks (as opposed to inserting "wait states"),
this is not technologically difficult, though the number of bits
required may necessitate an atomic operation to read the counter being
sampled into a special read-out register, which can then be examined
at leisure (and similarly for setting). At current speeds, I can
probably live with 1 microsecond or 100 ns resolution, but it won't be
long before we'll need 1 ns resolution or finer.

I should add that all of this is useful to me for measuring the speed
of execution of short blocks of code. I use the numbers to decide on
different ways of implementing things for advanced programming
languages. Repeating operations over and over tends to lead to
distorted measurements, since repeated loops tend to become cache
resident more than they might in an actual program, etc.
--
J. Eliot B. Moss, Assistant Professor
Department of Computer and Information Science
Lederle Graduate Research Center
University of Massachusetts
Amherst, MA  01003
(413) 545-4206; Moss@cs.umass.edu
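[A minimal sketch of the single-run measurement Moss describes. The
read_cycle_counter() primitive is hypothetical; a machine of the kind
he wants would supply it as one atomic instruction, and here it is
merely faked with the POSIX realtime clock so the sketch compiles,
with whatever resolution the OS happens to provide.]

    #include <stdio.h>
    #include <stdint.h>
    #include <time.h>

    /* Stand-in for the single-instruction, cycle-resolution counter
     * read Moss wants; faked here with clock_gettime(), so the
     * resolution is the OS's, not one machine cycle. */
    static uint64_t read_cycle_counter(void)
    {
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
    }

    void measure_block(void)
    {
        uint64_t start, stop;

        start = read_cycle_counter();
        /* ... short block of code under test goes here ... */
        stop = read_cycle_counter();

        /* One pass, not a repeated loop, so the block is measured
         * in something closer to its natural cache state. */
        printf("elapsed: %llu ticks\n",
               (unsigned long long)(stop - start));
    }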
meissner@osf.org (Michael Meissner) (08/22/90)
In article <1990Aug22.044826.18572@portia.Stanford.EDU> kevinw@portia.Stanford.EDU (Kevin Rudd) writes:

| I am unclear as to the self-measuring precision required of a computer
| system. For example, I don't see the relevance of marking file time
| stamps to the 1ns increment...

But I already see the need for at least microsecond resolution in file
timestamps. A second is fairly long these days with the faster
processors, and things like make are forced to use second granularity.
Unfortunately, the POSIX committee doesn't agree with me, and outlawed
extra fields in the stat/utime structures.
--
Michael Meissner	email: meissner@osf.org		phone: 617-621-8861
Open Software Foundation, 11 Cambridge Center, Cambridge, MA, 02142

Do apple growers tell their kids money doesn't grow on bushes?
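[A small illustration of the granularity problem Meissner is pointing
at: with stat()'s whole-second st_mtime, a source and its object
modified within the same second compare equal, and a make-like tool
cannot tell which is newer. The helper name is made up.]

    #include <sys/stat.h>

    /* Returns nonzero if obj should be rebuilt from src. */
    int needs_rebuild(const char *src, const char *obj)
    {
        struct stat s, o;

        if (stat(src, &s) != 0)
            return 0;           /* no source: nothing to do */
        if (stat(obj, &o) != 0)
            return 1;           /* no object: rebuild */

        /* st_mtime has one-second resolution, so equality is
         * ambiguous; treating "same second" as out of date errs
         * on the side of rebuilding. */
        return s.st_mtime >= o.st_mtime;
    }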
aglew@dwarfs.crhc.uiuc.edu (Andy Glew) (08/22/90)
    I do software performance measurement and would *like* resolution
    down to the clock rate of the machine. Personally, I generally
    want to include the time taken by pipeline stalls, cache misses,
    etc., since that is relevant to the user. I guess what I would
    really like is elapsed (wall clock) time, cpu time for the process
    (split into user and system time), and possibly counters of other
    things (instructions executed, memory cycles (maybe split into
    reads/writes), cache hits/misses, page translation hits/misses,
    etc.). I don't think any of this is necessarily *hard* to do, but
    it does take chip real estate. The counters should be readable
    with ordinary instructions, but maybe settable only with special
    ones (though if kept on a per-process basis, a process can only
    screw up itself).

Here's a fairly coherent schema for timers, merged from the best
features of several machines:

    Provide one, accurate, real-time timer.
    Provide an offset register settable by the OS.
    Provide an instruction that atomically reads the real-time timer.
    Provide an instruction that atomically reads the sum of the
    real-time timer and the offset. This gives you virtual CPU time.

Ideally, you would be able to read both real and virtual time
atomically (and systems like the i860 LOCK operation let you do that),
but you can finesse it as follows:

    Read Real
    Read Virtual

If you get interrupted between the real and virtual readings, your OS
will account for it.

The Gould NP1 provided a machine-cycle-resolution timer with an offset
register, but only one read operation. So, the first version of the OS
had an offset, to read virtual time. The second version of the OS
always had the offset as 0, to read real time. I think it was
eventually configurable on a per-process basis. But everyone wanted
both.

A note: the covert channel security guys will want you to provide a
mask to remove the low-order bits. They want to prevent high-precision
timings from being made.

Hardware costs: yes, but there are a lot of things that you can do in
software to reduce the hardware costs. As I was trying to say in my
first post, I don't want all the hardware that would be necessary to
give me an accurate ns timer - i.e. I don't want the portable
interface in hardware. Give it to me raw. The carry chain for fast
ticking can be simplified - let me read the timer in carry-save
format! (If you can do big reads this is fine; if you cannot read
twice the width of the timer, as carry-save requires, then tricks like
those below can be applied.)

    The most important items for general use are elapsed and cpu time,
    with resolution down to the machine clock cycle time. Except on
    machines that stretch clocks (as opposed to inserting "wait
    states"), this is not technologically difficult, though the number
    of bits required may necessitate an atomic operation to read the
    counter being sampled into a special read-out register, which can
    then be examined at leisure (and similarly for setting).

Ideally, on a 64-bit machine, we will be able to read 64-bit timers
atomically. (And damn the board designer who puts a timer across an
8-bit interface, so that you have to stop it to read it.)

On systems that cannot atomically read the entire timer, though, I've
had good luck with timestamps formed as follows:

    Read HIGH-PART -> timestamp.high1
    Read LOW-PART  -> timestamp.low
    Read HIGH-PART -> timestamp.high2

If you can guarantee that there is no process interrupt between these
operations (in the kernel that's easy), then postprocessing can
compare high1 and high2. If same, no problem.
If different, they usually differ only by one, and with assumptions
about how quickly rollover can occur you can figure out what the true
time is. Of course, the more of this stuff you have to do, the more
LSBs you have to throw out.

    I should add that all of this is useful to me for measuring the
    speed of execution of short blocks of code. I use the numbers to
    decide on different ways of implementing things for advanced
    programming languages. Repeating operations over and over tends to
    lead to distorted measurements, since repeated loops tend to
    become cache resident more than they might in an actual program,
    etc.

"Measurement of repetition is not repetition of measurement"
(Eugene Miya?)
--
Andy Glew, a-glew@uiuc.edu [get ph nameserver from uxc.cso.uiuc.edu:net/qi]
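[A sketch of Glew's HIGH/LOW/HIGH sampling trick. TIMER_HIGH and
TIMER_LOW are assumed names for memory-mapped halves of a free-running
64-bit counter, and the rollover repair assumes the low half can wrap
at most once between the two high reads - exactly the "assumptions
about how quickly rollover can occur" mentioned above.]

    #include <stdint.h>

    /* Assumed: memory-mapped 32-bit halves of a free-running
     * 64-bit counter. Real code would bind these to actual
     * device-register addresses. */
    extern volatile uint32_t TIMER_HIGH, TIMER_LOW;

    uint64_t read_timestamp(void)
    {
        uint32_t high1, low, high2;

        high1 = TIMER_HIGH;
        low   = TIMER_LOW;
        high2 = TIMER_HIGH;

        if (high1 == high2)         /* no rollover between reads */
            return ((uint64_t)high1 << 32) | low;

        /* The low half wrapped between the two high reads. A small
         * low value was read after the wrap and belongs with the new
         * high word; a large one was read before it. */
        if (low < 0x80000000u)
            return ((uint64_t)high2 << 32) | low;
        return ((uint64_t)high1 << 32) | low;
    }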
eugene@wilbur.nas.nasa.gov (Eugene N. Miya) (08/23/90)
>| I am unclear as to the self-measuring precision required of a computer
>| system. For example, I don't see the relevance of marking file time
>| stamps to the 1ns increment...

I did not see the original post. News has been flaky and I've not been
reading all of c.a. See a paper I wrote for the Usenix 1988
Supercomputing Workshop. To quote what William Hewlett said at an MIT
graduation: "Sometimes noise is significant." I did not go to MIT, but
I read this neat quote.

Basically, there are in my opinion 5 classes of measurement
environments:

A environments: have the best facilities: cycle-time clocks,
	non-intrusive performance measuring hardware, software to use all
	this, and other software tools. Examples: Cray Y-MP, Cray X-MP,
	some machines which never saw the light of day, other machines I've
	seen under non-disclosure. A machines can migrate to B environments
	as better "A" machines come out. I can tell you half a dozen Cray
	HPM limitations (they know them).

B environments: cycle-time clock and some software tools. Cray-2,
	Cray-1, Convex, some IBM hardware. At least good compilers, etc.

C environments: average. 50/60 Hz clock, average software. Not great:
	IBM PC, early workstations, VAX-11/780. Lots of variance in cycle.

D environments: machines without clocks, poorer software, etc.

E environments: components, not complete systems, but something to be
	measured.

VLSI was a perfection of linear measurement: the micron space realm.
Unless similar improvements take place in time, you don't get faster
machines. Fortunately, people at places like the NBS realize this and
they make things like atomic clocks. If you are not interested in
faster machines, just ignore this posting.

Andy: it was close, real close ;^). Not quite those words, but close.

--e. nobuo miya, NASA Ames Research Center, eugene@orville.nas.nasa.gov
  {uunet,mailrus,other gateways}!ames!eugene
sysmgr@KING.ENG.UMD.EDU (Doug Mohney) (08/23/90)
In article <7945@amelia.nas.nasa.gov>, eugene@wilbur.nas.nasa.gov (Eugene N. Miya) writes:

>VLSI was a perfection of linear measurement: the micron space realm.
>Unless similar improvements take place in time, you don't get faster
>machines. Fortunately, people at places like the NBS realize this and
>they make things like atomic clocks. If you are not interested in
>faster machines, just ignore this posting.

Are you saying future supercomputers (and thereby, in 5-7 years,
workstations) will have cesium clocks?   ;-)
henry@zoo.toronto.edu (Henry Spencer) (08/23/90)
In article <MEISSNER.90Aug22114425@osf.osf.org> meissner@osf.org (Michael Meissner) writes:

>... A second is fairly long these days with the faster
>processors, and things like make are forced to use second
>granularity. Unfortunately, the POSIX committee doesn't agree with
>me, and outlawed extra fields in the stat/utime structures.

My understanding was that only the utime structure is specifically
forbidden to have extras, and for it there wasn't much choice: since
it is fed to the kernel, not obtained from it, there is an unsolvable
problem of how you (portably!) fill in mysterious non-standard extra
fields so that the kernel won't see garbage in them.
--
Committees do harm merely by existing.  | Henry Spencer at U of Toronto Zoology
			-Freeman Dyson  |  henry@zoo.toronto.edu   utzoo!henry
srg@quick.com (Spencer Garrett) (08/26/90)
In article <1990Aug22.044826.18572@portia.Stanford.EDU>, kevinw@portia.Stanford.EDU (Kevin Rudd) writes:

> I am unclear as to the self-measuring precision required of a computer
> system. For example, I don't see the relevance of marking file time
> stamps to the 1ns increment...

I do. I would like to be able to handle timestamps as follows:

Express timestamps in nominal nanoseconds, using 64-bit numbers. This
allows timestamps to be valid for over 290 years from some epoch, and
also allows both positive and negative differences within that span.
The actual tick period of the computer in question will be longer,
probably much longer (e.g. 10 ms for a 100 Hz line clock), and a tick
will cause the timestamp value to jump ahead to the next appropriate
value. Ticks will not likely have the same value every time. Every
time someone needs a timestamp, the current value of the counter is
returned, *and then the counter is incremented*. It is necessary only
that no more than one timestamp be issued per nanosecond, and this is
not likely ever to be a bottleneck.

In this manner, timestamps are both unique and ordered, and are valid
times to within the precision of a normal coarse line clock. Folks
like Dave Mills and Andy Glew will still need a much more precise
source of time, but the scheme I propose is perfectly adequate for
things like timestamps, and is easily implemented on any computer.
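[A minimal sketch of Garrett's scheme, under stated assumptions: a
100 Hz line clock, a single-threaded kernel (real code would need the
tick and the increment to be atomic with respect to each other), and
illustrative names throughout.]

    #include <stdint.h>

    #define NS_PER_TICK 10000000ULL  /* 100 Hz clock: 10 ms per tick */

    static uint64_t stamp;      /* nominal nanoseconds since epoch */
    static uint64_t ticks;      /* line-clock ticks since epoch */

    /* Clock interrupt: jump the counter ahead to real time, but
     * never backwards past timestamps already issued. */
    void clock_tick(void)
    {
        uint64_t now = ++ticks * NS_PER_TICK;
        if (now > stamp)
            stamp = now;
    }

    /* Return the current value, *then* increment, so every caller
     * gets a unique, strictly increasing timestamp. Uniqueness only
     * requires issuing fewer than NS_PER_TICK stamps per tick. */
    uint64_t get_timestamp(void)
    {
        return stamp++;
    }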
throopw@sheol.UUCP (Wayne Throop) (08/27/90)
> From: meissner@osf.org (Michael Meissner)
> But I already see the need for at least microsecond resolution in
> file timestamps. A second is fairly long these days with the faster
> processors, and things like make are forced to use second
> granularity.

Well... I agree that computers can accomplish quite a lot in a second
these days, and recording filesystem events at second granularity is
getting about as useful as recording them with hour (or maybe even
day) granularity.

BUT... the fact that make uses the filesystem's timestamp as an
up-to-date indicator is an abomination. It simply isn't a reasonable
criterion. Consider the case of loading (or otherwise retrieving) a
.h file from tape (or other backup, archive, or configuration
management database). The archiver (or whatever) has every reasonable
cause to preserve the date on that .h file, but if it does, make can
become arbitrarily confused.

Clearly, make should NOT depend on filesystem timestamps alone to
store the state of a build. It badly needs a "lookaside database" of
additional information.

But then... that's an old pet peeve of mine...
--
Wayne Throop <backbone>!mcnc!rti!sheol!throopw or sheol!throopw@rti.rti.org
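[One way to read Throop's "lookaside database" suggestion: record a
content checksum per source file at each successful build and consult
that instead of st_mtime, so a file restored from tape with its date
preserved cannot confuse the dependency check. A hedged sketch only -
the checksum and the idea of keying on it are illustrative, not any
real make's mechanism.]

    #include <stdio.h>
    #include <stdint.h>

    /* Trivial byte checksum; a real tool would want something
     * stronger, and would keep the recorded values in a per-
     * directory database file alongside the build state. */
    static uint32_t file_checksum(const char *path)
    {
        FILE *f = fopen(path, "rb");
        uint32_t sum = 0;
        int c;

        if (f == NULL)
            return 0;
        while ((c = getc(f)) != EOF)
            sum = sum * 31u + (uint32_t)c;
        fclose(f);
        return sum;
    }

    /* Out of date iff the source's contents have changed since the
     * checksum recorded at the last successful build - regardless
     * of what the filesystem timestamps claim. */
    int out_of_date(const char *src, uint32_t recorded)
    {
        return file_checksum(src) != recorded;
    }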