mccalpin@masig3.ocean.fsu.edu (John D. McCalpin) (08/10/89)
In the above-referenced message, Dirk Grunwald asks about cycle counters:

>I know something like this exists on the Cray X-MP; do other machines
>have cycle counters as well?

Not that it makes much difference, but the ETA-10 has several extra registers to keep track of cycle counts for the vector and scalar units separately. On the FSU machine, these registers are publicly accessible, and we have a utility which gives the fraction of vector cycles for each process in the system. This makes for entertaining users' group meetings:

Person A: "Why don't you get your !@#$^& scalar code off of our vector computer!"
Person B: "Well, your graduate student ran a 10-hour job yesterday at 0.007% vector utilization!"
etc....

>Being able to set it would mean that you might not care if it was only
>32 bits, since you set it to 0 to time routines. With a 20 nanosecond
>clock, it would only be good for 86 seconds, but that might be enough.

The ETA-10 has 48 bits of usable integer (though it uses 64 bits of storage). At 142 MHz, this allows about 11.4 days before rollover, and the machine tends to be rebooted more frequently than this....
--
John D. McCalpin - mccalpin@masig1.ocean.fsu.edu - mccalpin@nu.cs.fsu.edu
mccalpin@delocn.udel.edu
grunwald@flute.cs.uiuc.edu (Dirk Grunwald) (08/10/89)
Hi,

Another ``how much does this cost'' question.

When doing performance monitoring, benchmarking or profiling, you want a high-resolution timer. Some systems have microsecond timers, and those are considered pretty snazzy; I know I was overjoyed when I found one on the Encore. Normal machines, e.g., a Sun, have about 5 millisecond resolution. That's pathetic.

How much would it cost to add an additional register that would be incremented each cycle? It doesn't need to flow through the ALU; it would just be doing a single count-up. One could conjecture using a mode flag to say ``yeah, count using this register''; if you didn't want to use the counter, you'd have one extra register to play with. I know something like this exists on the Cray X-MP; do other machines have cycle counters as well?

Using a register has some advantages: it's a normal part of the processor state, reducing save/restore cost. Also, processes can have a virtual cycle counter, reflecting the cycle counter for that process alone. Being able to set it would mean that you might not care if it was only 32 bits, since you set it to 0 to time routines. With a 20 nanosecond clock, it would only be good for 86 seconds, but that might be enough.
ttl@astroatc.UUCP (Tony Laundrie) (08/10/89)
In article <GRUNWALD.89Aug9162836@flute.cs.uiuc.edu> grunwald@flute.cs.uiuc.edu writes:
>
>I know something like this exists on the Cray X-MP; do other machines
>have cycle counters as well?
>

The famous Astronautics ZS-1 has a 64-bit register that increments every clock period of 45 ns, making it good for 263 centuries.
-----
tjd@foghorn.mpd.tandem.com (Tom Davidson) (08/10/89)
>Not that it makes much difference, but the ETA-10 has several extra registers
>to keep track of cycle counts for the vector and scalar units.

Actually, for performance analysis, the ETA10 had some rather useful hardware. As John mentions, some "registers" kept such goodies as a clock counter (in whatever periods the particular CPU was running: 7, 10.5, 19 ns, etc.) and a vector-unit-busy count. It also had 5 programmable counters which could be set to track such things as

 . number of in-stack branches
 . number of branches NOT taken
 . number of times opcode xx was executed

and a whole host of other neat things. All this could be accessed from a Fortran program. These counters were kept on a per-process basis in a state area called an "invisible package". Performance analysis and code profiling were made a lot easier with this type of hardware feature. I hope h/w architects are doing the same....

Tom

Tom Davidson                     internet: halley!foghorn!tjd@cs.utexas.edu
Tandem Computers, Inc.           fax: (512) 244-8247
14231 Tandem Boulevard           voice: (512) 244-8375
Austin, TX 78728-6610
lindsay@MATHOM.GANDALF.CS.CMU.EDU (Donald Lindsay) (08/10/89)
In article <559@halley.UUCP> tjd@foghorn.mpd.tandem.com (Tom Davidson) writes:
>>Not that it makes much difference, but the ETA-10 has several extra registers
>>to keep track of cycle counts for the vector and scalar units.
>
>As John mentions, some "registers" kept such goodies as a clock counter (in
>whatever periods the particular cpu was running: 7, 10.5, 19ns etc), vector
>unit busy. It also had 5 programmable counters which could be set to track
>such things as
> . number of in stack branches
> . number of branches NOT taken
> . number of times opcode xx was executed
>and a whole host of other neat things. All this could be accessed from a
>fortran program.

One thing that the ETA lacks is a count of the page table traffic generated by the memory management unit. That's not too surprising, because I don't know of any production machine that has this. But they all should!

When a programmer suspects thrashing, the average OS can help by reporting paging rates, task switch counts, interrupt load, Ethernet packets, and so on. The OS typically is unable to report on cache traffic or on TLB traffic. To the serious performance tuner, this is a flaw. On rare occasions, it's even a serious flaw.
--
Don D.C.Lindsay    Carnegie Mellon School of Computer Science
seanf@sco.COM (Sean Fagan) (08/11/89)
In article <MCCALPIN.89Aug9153545@masig3.ocean.fsu.edu> mccalpin@masig3.ocean.fsu.edu (John D. McCalpin) writes:
>In the above-referenced message, Dirk Grunwald asks about cycle counters:
>>I know something like this exists on the Cray X-MP; do other machines
>>have cycle counters as well?
>Not that it makes much difference, but the ETA-10 has several extra
>registers to keep track of cycle counts for the vector and scalar units
>separately.

I've been told that the Elxsi has a clock register available, with 25 ns resolution. Since the cycle time is 25 ns, this makes it possible to time any instruction (mov <clock>, r1; instruction; sub <clock>, r1; or whatever the syntax would be). Since it's also 64 bits, I believe its epoch is something like 14000 years from now...
--
Sean Eric Fagan  | "Uhm, excuse me..."
seanf@sco.UUCP   |      -- James T. Kirk (William Shatner), ST V: TFF
(408) 458-1422   | Any opinions expressed are my own, not my employer's.
ram@shukra.Sun.COM (Renu Raman) (08/13/89)
In article <GRUNWALD.89Aug9162836@flute.cs.uiuc.edu> grunwald@flute.cs.uiuc.edu writes:
>
>Hi,
>
>Another ``how much does this cost'' question.
>
>When doing performance monitoring, benchmarking or profiling, you want
>a high-resolution timer. Some systems have microsecond timers, and
>those are considered pretty snazzy; I know I was overjoyed when I
>found one on the Encore. Normal machines, e.g., a Sun, have about 5
>millisecond resolution. That's pathetic.

   Depends on what kind of a "normal" Sun you have. Anything since the
   SPARCstation should have a microsecond timer (only 21 bits, tho'), so
   2 seconds is all you have if you want to watch anything.

renu raman
dwc@cbnewsh.ATT.COM (Malaclypse the Elder) (08/13/89)
In article <GRUNWALD.89Aug9162836@flute.cs.uiuc.edu>, grunwald@flute.cs.uiuc.edu (Dirk Grunwald) writes:
>
> When doing performance monitoring, benchmarking or profiling, you want
> a high-resolution timer. Some systems have microsecond timers, and
> those are considered pretty snazzy; I know I was overjoyed when I
> found one on the Encore. Normal machines, e.g., a Sun, have about 5
> millisecond resolution. That's pathetic.
>

in a paper that we presented at the 88 summer usenix, we describe a high resolution timing and tracing package for unix system v (called casper) that takes advantage of the fact that most systems now use programmable interval timers to generate their clock interrupts. these interval timers are usually loaded with an initial value, count down at a rate that is determined by an external clock signal, and generate the clock interrupt when they hit zero. they then reload their initial value and start over again. the nice thing about these things is that they are usually driven at a fairly high rate.

using these interval timers, our package is able to deliver 10 microsecond resolution on at&t's 3b2 computers and 1 microsecond resolution on the at&t 6386s. not too shabby, and cheap too. and yes, when we looked at the suns, we found that they used some hardwired interrupt generator, so we could only get clock interrupt resolution (10 milliseconds).

the other nice thing about doing things this way is that since there is usually a kernel variable keeping count of the number of clock interrupts since boot, we can combine the value of the interrupt counter with the value in the countdown timer and not worry about wrap-around. there are problems introduced by the fact that looking at the kernel variable and the countdown timer is not an atomic operation, but i refer interested parties to the paper for details.

danny chen
att!hocus!dwc
melvin@vangogh.Berkeley.EDU (Steve Melvin) (08/15/89)
In article <GRUNWALD.89Aug9162836@flute.cs.uiuc.edu> grunwald@flute.cs.uiuc.edu writes:
>
>Hi,
>
>Another ``how much does this cost'' question.
>
>When doing performance monitoring, benchmarking or profiling, you want
>a high-resolution timer. Some systems have microsecond timers, and
>those are considered pretty snazzy; I know I was overjoyed when I
>found one on the Encore. Normal machines, e.g., a Sun, have about 5
>millisecond resolution. That's pathetic.

First, some clarification. All Sun 3's and Sun 4's lack a high resolution timer. Sun 2's had them and SPARCstations have them. In the Sun 3's and Sun 4's there is a 10 ms hardware interrupt which SunOS ignores every other time, to generate a 20 ms interrupt for use by the operating system.

Fortunately, however, Sun put sockets in these machines for data encryption chips (the whole encryption chip story is interesting in itself). Another grad student here at Berkeley (Peter Danzig) and I have designed a small board which plugs into these machines and allows a timer chip (clocked at 4 MHz) to look like the encryption chip. Then, with an appropriate device driver installed, the software has access to high resolution time measurements just as though the feature were built in.

Apparently, Sun at one time had intended to sell a data encryption option for these machines. The encryption chip provided for was the AMD Am9518 (which implements the official Data Encryption Standard (DES)). In the 3/50 and 3/60, all that was needed was to plug the chip in (and move a jumper in the case of the 3/50), but in later models the chips needed to drive the 9518 (one or two PALs and a buffer) were not supplied on the motherboard. The idea was apparently dropped, and as far as we know the DES option has never been made available. It probably had a lot to do with the Feds (the DES chip is supposedly not allowed to be exported from the US). Having the socket enabled by a PAL may have had something to do with controlling the use of the DES chip.

Anyway, I think the answer to your question is that these kinds of things cost very little in hardware and don't slow anything down, but since they don't tangibly affect the bottom-line performance, they are often ignored by hardware designers.

-------
Steve Melvin
...!ucbvax!melvin
melvin@polaris.Berkeley.EDU
-------
jmk@alice.UUCP (Jim McKie) (08/17/89)
The Crisp CPU has the following register:

Timer

The timer is a 28-bit internal register which can be incremented every CPU clock cycle or at the completion of every instruction. The timer can also, optionally, interrupt the CPU when the count overflows. When read, the least significant bit of the timer appears on bit 4 of the resultant data, and the low-order four bits are always zero. The timer is both readable and writable, and the counting function is controlled by the low three bits, which are write-only. These timer register bits are used to configure the timer:

 - bit 0. When clear, the timer counts cycles. When set, the timer counts completed instructions (folded branches do not count).
 - bit 1. When clear, the timer is on all the time (with reference to bit 0). When set, the timer only counts when the PSW indicates User execution level.
 - bit 2. When set, the timer will generate a timeout exception, not an interrupt, when it overflows (goes from 0 to non-zero). It has precedence over all exceptions except zero divide. Interrupts have precedence over timeouts.

A similar register was added to the locally developed 68020-based grey-scale bitmap terminal and has been used as the basis for a debugger.

Jim McKie    research!jmk -or- jmk@research.att.com
cbcscmrs@csun.edu (08/18/89)
In article <121192@sun.Eng.Sun.COM> ram@sun.UUCP (Renu Raman) writes:
>In article <GRUNWALD.89Aug9162836@flute.cs.uiuc.edu> grunwald@flute.cs.uiuc.edu writes:
>>When doing performance monitoring, benchmarking or profiling, you want
>>a high-resolution timer. Some systems have microsecond timers, and
>>those are considered pretty snazzy; I know I was overjoyed when I
>>found one on the Encore. Normal machines, e.g., a Sun, have about 5
>>millisecond resolution. That's pathetic.
>
>   Depends on what kind of a "normal" Sun you have. Anything since
>   SPARCstation should have a micro-second timer (only 21 bits tho') - so
>   2 seconds is all you have if you want to watch anything.

I like nanosecond timers, built into the instruction set! You can tell how far the head on the disk moved if you hit a page fault!

The Elxsi has a 25 nanosecond resolution process timer (to measure CPU time) and a CPU-wide real-time clock that also has 25 nanosecond resolution. Oh, unlike the 21-bit SPARCstation timer, on the Elxsi you have to wait a little longer if you want to see the counter overflow. About 7,311 years and 284 days. Yes, that is 63 bits. :-) I think they are signed, and no, I don't know why... (Does it really matter at that point?)

Syncing the thing up to a chimer takes on a new meaning... :-) (NTP time servers...)