[comp.sys.ibm.pc.misc] Timing the CPU and bus size

torkil@Pacesetter.COM (Torkil Hammer) (09/21/90)

Everything can be timed, including CPU speed, memory speed and prefetch
queue.

The trick is to do incremental measurements.  Suppose you have written
the routine that reads the system timer's low bits (in Turbo C it is
outportb(0x43,0x00); low byte = inportb(0x40); high byte = inportb(0x40);
but you have to write it in assembler to really know what you get).
Each count corresponds to about 1 microsecond (actually 1/1.19318 MHz,
or about 0.838 microseconds), but don't be surprised if you never see
an odd count.  It counts down, by the way.
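Here is a sketch of the arithmetic in C, under stated assumptions. Two raw 16-bit readings from a down-counter are subtracted modulo 65536, and counts are converted to microseconds at an assumed 1,193,182 Hz input clock. The names elapsed_counts() and ticks_to_us() are hypothetical. The sketch assumes one count per input clock. The BIOS normally programs counter 0 in square-wave mode (mode 3), where the count drops by 2 per input clock. That would also explain never seeing an odd count. In that mode, halve the difference before converting.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed 8254 input clock: 14.31818 MHz / 12, rounded to whole Hz. */
#define PIT_HZ 1193182UL

/* Counts elapsed between two latched readings of a 16-bit down-counter.
 * The earlier reading is normally the larger one; the unsigned 16-bit
 * subtraction also absorbs one wrap past zero.  In square-wave mode
 * (mode 3) the count drops by 2 per input clock, so halve the result
 * before passing it to ticks_to_us(). */
static uint16_t elapsed_counts(uint16_t first, uint16_t second)
{
    return (uint16_t)(first - second);
}

/* Converts input-clock ticks to microseconds, rounded to nearest. */
static uint32_t ticks_to_us(uint32_t ticks)
{
    return (uint32_t)(((uint64_t)ticks * 1000000u + PIT_HZ / 2) / PIT_HZ);
}
```

For example, readings of 10 and then 65530 give 16 counts across the wrap, and ticks_to_us(16) gives 13.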

Now put 2 of these back to back.  The difference in readings is the
time it took to read the clock, which depends mostly on the CPU speed.
On a 16 MHz machine, that time should be about 16 clicks or 13 usec,
depending on the actual assembler code.  Of course, you should first
wait until clock() has just incremented, then take the 2 readings so
you don't get interrupted in the middle.  Better still, take several
measurements and screen out the bad ones.
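The post does not say how to screen out bad measurements. One common choice, assumed here, is to repeat the measurement and keep the median. The median ignores the occasional reading stretched by a timer interrupt or a refresh cycle. median_sample() is a hypothetical helper, and it sorts its input in place.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* qsort() comparator for unsigned long samples. */
static int cmp_ulong(const void *a, const void *b)
{
    unsigned long x = *(const unsigned long *)a;
    unsigned long y = *(const unsigned long *)b;
    return (x > y) - (x < y);
}

/* Sorts the samples in place and returns the median, so a few readings
 * inflated by an interrupt or refresh do not skew the result. */
static unsigned long median_sample(unsigned long *samples, size_t n)
{
    assert(n > 0);
    qsort(samples, n, sizeof samples[0], cmp_ulong);
    return samples[n / 2];
}
```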

In between you now sandwich something like [mov cx,100; rep nop],
and the difference from before is what these instructions took.  Now
replace 100 with 200, and the incremental difference is the exact time
it takes to do 100 prefetched nops in place, a number that depends only
on the CPU speed (such as 5 CPU clock cycles per rep nop).  Replace 100
with 10100 for greater accuracy.
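The incremental method can be written out as plain arithmetic. The hypothetical cpu_mhz_estimate() below assumes the 1.193182 MHz timer clock. It also takes a caller-supplied cycle count per iteration, looked up in the CPU's timing tables.

One caution on the example instruction: the x86 REP prefix only repeats string instructions (MOVS, STOS, SCAS and so on). A non-string body such as NOP or MOV BX,CX is therefore normally wrapped in an explicit LOOP, and cycles_per_iter then has to cover the whole loop body.

```c
#include <assert.h>

#define PIT_MHZ 1.193182  /* assumed 8254 input clock, in MHz */

/* Estimates the CPU clock in MHz from two timings (in timer ticks) of the
 * same loop run n1 and n2 times, n2 > n1.  Subtracting the two timings
 * cancels the fixed cost of reading the timer; what remains is
 * (n2 - n1) iterations of cycles_per_iter CPU clocks each. */
static double cpu_mhz_estimate(unsigned long ticks1, unsigned long ticks2,
                               unsigned long n1, unsigned long n2,
                               double cycles_per_iter)
{
    double delta_us;

    assert(n2 > n1 && ticks2 > ticks1);
    delta_us = (double)(ticks2 - ticks1) / PIT_MHZ;
    return cycles_per_iter * (double)(n2 - n1) / delta_us;
}
```

Take, for example, a loop timed at 100 and 10100 iterations whose timings differ by 3729 ticks. At 5 clocks per iteration, that gives an estimate just under 16 MHz, whatever the fixed overhead was.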

Also, replace rep nop (5 cycles) with rep mov bx,cx (4 cycles) for an
independent check.

Rotates of varying count are another way to get an incremental time that
depends only on CPU speed (1 incremental CPU clock per incremental rotate
per repeat).  That only works on the 80188 and up, though for the 8088
you can write a loop that switches CX between the loop count and the
rotate count.
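With more than two rotate counts, you can fit a least-squares line through the (count, time) pairs. The slope of that line gives the per-rotate cost, and the intercept gives the fixed overhead. This is a swapped-in alternative to the plain two-point difference in the post. fit_line() is a hypothetical helper, and its units are whatever the caller measured in.

```c
#include <assert.h>
#include <stddef.h>

/* Fits time = intercept + slope * count by ordinary least squares.
 * With several rotate counts instead of two, one noisy point has less
 * influence on the slope than it would on a plain difference. */
static void fit_line(const double *count, const double *time, size_t n,
                     double *slope, double *intercept)
{
    double sx = 0.0, sy = 0.0, sxx = 0.0, sxy = 0.0, denom;
    size_t i;

    assert(n >= 2);
    for (i = 0; i < n; i++) {
        sx += count[i];
        sy += time[i];
        sxx += count[i] * count[i];
        sxy += count[i] * time[i];
    }
    denom = (double)n * sxx - sx * sx;
    assert(denom != 0.0);               /* counts must not all be equal */
    *slope = ((double)n * sxy - sx * sy) / denom;
    *intercept = (sy - *slope * sx) / (double)n;
}
```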

Once you have the exact CPU speed, you get the memory wait cycles from
switching between a register operation (mov bx,ax) and a memory operation
(mov [si],bx) inside the rep and measure the incremental time.
You have to look up the cycles required for each operation.
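The wait-state calculation can be sketched under these assumptions:

- one memory access per iteration;
- the 1.193182 MHz timer clock;
- a CPU clock already known from the steps above;
- a documented_extra_clocks value looked up by hand in the CPU's timing tables.

wait_states_per_access() is a hypothetical name.

```c
#include <assert.h>

#define PIT_MHZ 1.193182  /* assumed 8254 input clock, in MHz */

/* Converts the extra time (in timer ticks) that a memory-operand loop
 * takes over the same loop with a register operand into extra CPU clocks
 * per iteration, then subtracts the extra clocks the CPU timing tables
 * say the memory form should cost.  The remainder is attributed to wait
 * states, assuming one memory access per iteration. */
static double wait_states_per_access(double cpu_mhz,
                                     unsigned long reg_ticks,
                                     unsigned long mem_ticks,
                                     unsigned long iterations,
                                     double documented_extra_clocks)
{
    double extra_us, extra_clocks;

    assert(iterations > 0 && mem_ticks >= reg_ticks);
    extra_us = (double)(mem_ticks - reg_ticks) / PIT_MHZ;
    extra_clocks = extra_us * cpu_mhz / (double)iterations;
    return extra_clocks - documented_extra_clocks;
}
```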

You should be able to get the prefetch queue size by doing a backwards
loop and expanding its body with NOPs.  Once you fall over the edge, the
time should go up by more than the CPU cycles in the NOPs.  (I haven't
tried this.)
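Since the post flags this as untried, what follows is only the detection half, as a sketch. Given loop times measured with 0, 1, 2, ... extra NOPs, it finds the first step that costs noticeably more than one NOP. find_knee() and its slack parameter are hypothetical. Reading the knee position as the queue size is the post's untested idea, not something this code establishes.

```c
#include <assert.h>
#include <stddef.h>

/* times[k] is the measured loop time with k NOPs added to the loop body.
 * Each extra NOP should add about nop_cost.  Returns the first k whose
 * step to k + 1 exceeds nop_cost + slack, or n if no step does. */
static size_t find_knee(const double *times, size_t n,
                        double nop_cost, double slack)
{
    size_t k;

    assert(n >= 2);
    for (k = 0; k + 1 < n; k++) {
        if (times[k + 1] - times[k] > nop_cost + slack)
            return k;
    }
    return n;
}
```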

After this you can play with 8-, 16- and 32-bit instructions to find the
bus time and then the size of the bus.
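Here is one possible reading of the bus-size test, offered as a hypothetical heuristic. A transfer no wider than the data bus should take about one bus cycle. The guess is therefore the widest transfer that is not clearly slower per access than a byte transfer. guess_bus_bits() and its tolerance are assumptions. 32-bit transfers need a 386 or later, and a cache can hide the bus entirely.

```c
#include <assert.h>

/* Hypothetical heuristic.  per_byte, per_word and per_dword are measured
 * times for one 8-, 16- and 32-bit memory transfer.  An access no wider
 * than the bus should take about one bus cycle, so return the widest
 * transfer whose time stays within tolerance of the byte time. */
static int guess_bus_bits(double per_byte, double per_word,
                          double per_dword, double tolerance)
{
    double limit;

    assert(per_byte > 0.0 && tolerance >= 0.0);
    limit = per_byte * (1.0 + tolerance);
    if (per_word > limit)
        return 8;
    if (per_dword > limit)
        return 16;
    return 32;
}
```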

Enjoy

Torkil Hammer

james@bigtex.cactus.org (James Van Artsdalen) (09/25/90)

In <1990Sep20.231706.27009@Pacesetter.COM>, torkil@Pacesetter.COM wrote:

> Everything can be timed, including CPU speed, memory speed and prefetch
> queue.

Well, in theory, yes, but I haven't seen it done in practice yet.  The
Dell 325D, for example, has a cache with a write buffer, plus page-mode
DRAM that is interleaved and refreshed.  All of these are hard to account
for.  A 486 is harder still because it has several write buffers.

Other effects: cache line size, cache line fill order, interleave bit,
page size...

The measurable parameters are: cache hit, cache miss page hit,
cache miss page miss, page size, interleave bit, RAS precharge on
the "other" interleaved SIMM on a page miss, refresh pulse width,
refresh frequency, and a bunch of other things I'm sure - and we
haven't covered any cycle that misses the system board RAM.

> The trick is to do incremental measurements.  Suppose you have written
> the routine that measures the system timer's low bits (in Turbo C it is
> outportb(0x43,0x00); low byte = inportb(0x40); high byte = inportb(0x40); -
> but you have to write it in assembler to really know what you get).
> Each bit increment corresponds to about 1 microsecond (actually 1/1.19318)
> but don't be surprised if you never see any odd bit count.  It counts
> down, by the way.

If you don't see an odd bit count, there is a bug - you're probably
seeing two microseconds per tick or worse.

It is very hard to write a microsecond timer routine that actually works.
Even when you have one, you have to account for external variables
such as refresh.  It is very important that such a routine be
validated, or else you'll get another magazine-quality benchmark that
can't return the same answer twice.

Below is a routine I've derived experimentally.  It returns a 32 bit
value.  It does not work near midnight.  There's also the test
program.  Don't even think of modifying this without running the test
program for a few hours.  There are several different tests: the
desired one is selected by choosing which #elif to enable (yes, I know
it's gross but the file grew that way).  All of the assembly routines
after gettime() are just test helpers.

On my Dell 386/16, this routine is consistent within 1 microtick after
allowing for IRQ 0.  You probably won't get quite this good a result,
since the 386/16 is unique in having true zero-wait-state RAM (no DRAM
in the system).  If you use it, you have to allow for IRQ 0 and midnight,
and make sure that the results are repeatable (nontrivial with CGA).
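The kind of validation described above can be sketched in portable C, assuming the samples were already collected back to back into an array. count_bad_steps() flags any step larger than a threshold (the first test in the program below uses 128 microticks). any_odd() checks that the low bit actually toggles. Both names are hypothetical, and the threshold is machine-dependent.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Counts adjacent pairs in a run of back-to-back timer samples whose
 * difference exceeds max_step.  The subtraction is unsigned, so a sample
 * that goes backwards shows up as a huge step and is counted too. */
static size_t count_bad_steps(const uint32_t *samples, size_t n,
                              uint32_t max_step)
{
    size_t i, bad = 0;

    assert(samples != NULL || n == 0);
    for (i = 1; i < n; i++) {
        if ((uint32_t)(samples[i] - samples[i - 1]) > max_step)
            bad++;
    }
    return bad;
}

/* Returns 1 if any sample has its low bit set.  A timer that never
 * produces an odd value is probably losing a bit of resolution. */
static int any_odd(const uint32_t *samples, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++) {
        if (samples[i] & 1u)
            return 1;
    }
    return 0;
}
```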

	page	59,130
	.386p

DEBUG	equ	1

PIC0	equ	20h			;8259 Interrupt controller 0
PIC0MASK equ	21h			;8259 Interrupt controller 0 mask
TIMER	equ	40h			;8254 timer counter 0 address
MAGIC	equ	80h			;Magic stone - used for 1us delays
PIC1	equ	0a0h			;8259 Interrupt controller 1
PIC1MASK equ	0a1h			;8259 Interrupt controller 1 mask

WAFORIO macro
	out	MAGIC,al
	endm

A_BLOCK segment use16 at 0a000h
	public	_vga
_vga	equ	$
A_BLOCK ends

B_BLOCK segment use16 at 0b000h
	public	_monochrome, _cga
_monochrome	db	80 * 25 * 2 dup (?)
		org	8000h
_cga		db	80 * 25 * 2 dup (?)
B_BLOCK ends

ROMDAT	segment use16 at 40h
	org	06ch
	public	_bios_timer, _bios_timer_low, _bios_timer_high
	public	_bios_timer_overflow
_bios_timer	equ	$
_bios_timer_low 	 dw	 ?
_bios_timer_high	 dw	 ?
_bios_timer_overflow	 db	 ?
ROMDAT	ends

_TEXT	SEGMENT use16 WORD PUBLIC 'CODE'
_TEXT	ENDS

_DATA	SEGMENT use16 WORD PUBLIC 'DATA'
_DATA	ENDS

CONST	SEGMENT use16 WORD PUBLIC 'CONST'
CONST	ENDS

_BSS	SEGMENT use16 WORD PUBLIC 'BSS'
	extrn	_pending_int:word
	extrn	_first_read:word
	extrn	_second_read:word
	extrn	_first_timer:byte
	extrn	_second_timer:byte
_BSS	ENDS

DGROUP	GROUP	CONST, _BSS, _DATA
	ASSUME	CS: _TEXT, DS: DGROUP, SS: DGROUP

_TEXT	segment

;------------------------------------------------------------------------------
; Strategy:
;
;	Read the 8254 timer/counter.  Read the 17th bit also.  Concatenate
;	with the BIOS timer tick count.  Since the 8254 counts down, but the
;	BIOS counts up, invert all 17 bits from the 8254 so that it counts
;	up too.
;
;	There are two special cases.  The first is when the timer ticked over
;	"recently".  The interrupt to update the BIOS count may or may not
;	have occurred.	So if the timer wrapped recently, check to see if
;	there is a pending interrupt.  If so, the BIOS count was not updated,
;	so update it "manually".  If the 8254 didn't tick "recently", don't
;	update the BIOS counter.
;
;	The other special case is when the 8254 returns exactly 0. It is
;	apparently not possible to determine if the interrupt has or hasn't
;	occurred, or if the 17th bit is correct.  So, the 8254 must be read
;	a second time (we know that this second time will not return 0
;	obviously, and so will not be the same special case again).  The
;	second 8254 read returns a "wrapped" or "correct" value.  If the
;	"wrapped" value has the same BIOS count as the "0" read did, then
;	the "0" read didn't get the right 17th bit.
;
;	A couple of special attributes to this code.  The routine always
;	takes exactly the same amount of time to run.  There is no
;	variability in calling gettime():  it is guaranteed to be constant
;	time.  This is important if a benchmark calls it often.  Second, the
;	same read from the 8254 is returned each time.	The second read
;	decides how to update the first read, but the 8254 value read the
;	second time does not replace the first value (the second will be
;	several counts later than the first).  This limits or eliminates any
;	variability in terms of when within gettime() the 8254 is read:  it
;	is guaranteed that if gettime() is called N times, all N calls will
;	measure the same duration, within two microseconds (subject to
;	external interference).
;
;	When using or modifying this, remember that accuracy here isn't the
;	last word.  Memory refresh, DMA cycles & BIOS timer INT overhead are
;	substantial compared to the accuracy of gettime().  gettime() does
;	not enable interrupts if they were previously disabled, so an
;	application may disable interrupts (and take care of the BIOS timer
;	itself).
;
;	Handle the BIOS midnight timer count reset!!!

	public	_gettime
_gettime proc	near

	pushf
	push	bx

	mov	al,0c2h 		; latch counter value
	cli
	out	43h,al

	WAFORIO
	in	al,TIMER		; OUT pin status
ifdef DEBUG
	mov	_first_timer,al
endif
	add	al,al			; Save OUT in carry flag
	WAFORIO
	in	al,TIMER		; low order timer count
	mov	ah,al
	WAFORIO
	in	al,TIMER		; high order timer count
	xchg	ah,al
	not	ax
ifdef DEBUG
	mov	_first_read,ax
endif
	mov	dx,ax			; Save "raw" count read (inverted)

	cmc				; Put !OUT in AX:15
	rcr	ax,1

	mov	bx,ax			; Save the "official" count

	cmp	dx,-1			; CL == 1 iff the timer is ticking
	sete	cl			;    over right now
	rol	bx,cl			; If ticking over, will need BX
					;    rotated later

	WAFORIO 			; These seem important
	WAFORIO
	WAFORIO
	WAFORIO
	mov	al,0c2h 		; latch counter value
	out	43h,al
	WAFORIO
	in	al,TIMER		; OUT pin status
ifdef DEBUG
	mov	_second_timer,al
endif
	add	al,al			; Save OUT in carry flag
	WAFORIO
	in	al,TIMER		; low order timer count
ifdef DEBUG
	mov	ah,al
endif
	WAFORIO
	in	al,TIMER		; high order timer count
ifdef DEBUG
	xchg	ah,al
	not	ax
	mov	_second_read,ax
endif
	cmc
	rcr	bx,cl			; Put carry flag (second !OUT) into
					;    BX:15.  BX already shifted in
					;    this case.

	shl	cl,4			; 16 (if timer ticked over) or 0

ifdef DEBUG
	mov	byte ptr _pending_int+1,cl
endif

	shld	dx,bx,1 		; If the "raw" value read was exactly
	add	bx,bx
	shl	bx,cl			;    ffffh, then clear BX:[0-14]
	shrd	bx,dx,1

; OK after here

	mov	ax,bx			; If BX == ffffh or BX < 3fffh
	inc	ax			; then if there is a pending interrupt
	and	ax,7fffh		;    add it to what the BIOS count is
	cmp	ax,4000h		;    since BIOS will update its count
	setae	cl			;    as soon as we do the POPF
	shl	cl,4			; 0 (BX in range) or 16

	mov	al,0ah
	out	PIC0,al
	in	al,PIC0
	and	ax,1			; See if IRQ 0 is requesting
	mov	byte ptr _pending_int,al
	shl	ax,cl			; 1 (if BX in range) or 0

	mov	dx,ax
	mov	cx,es
	mov	ax,ROMDAT
	mov	es,ax
	assume	es:ROMDAT
	add	dx,_bios_timer_low
	mov	es,cx
	assume	es:nothing

	mov	ax,bx

	pop	bx
	popf				;Potentially enable interrupts
	ret
_gettime endp

;------------------------------------------------------------------------------

	public	_poke_bytes
_poke_bytes proc near

	push	bp
	mov	bp,sp
	push	di
	push	es

	mov	dx,10[bp]		;Count
	mov	al,12[bp]		;Byte to write

poke_bytes_loop:
	les	di,4[bp]
	mov	cx,8[bp]
    rep stosb
	dec	dx
	jnz	poke_bytes_loop

	pop	es
	pop	di
	pop	bp
	ret
_poke_bytes endp

	public	_poke_words
_poke_words proc near

	push	bp
	mov	bp,sp
	push	di
	push	es

	mov	dx,10[bp]		;Count
	mov	ax,12[bp]		;Word to write

poke_words_loop:
	les	di,4[bp]
	mov	cx,8[bp]
    rep stosw
	dec	dx
	jnz	poke_words_loop

	pop	es
	pop	di
	pop	bp
	ret
_poke_words endp

	public	_peek_bytes
_peek_bytes proc near

	push	bp
	mov	bp,sp
	push	di
	push	es

	mov	dx,10[bp]		;Count
	mov	al,12[bp]		;Byte to scan for

peek_bytes_loop:
	les	di,4[bp]
	mov	cx,8[bp]
  repne scasb
	je	peek_bytes_failed
	dec	dx
	jnz	peek_bytes_loop

	mov	ax,0			;No error

peek_bytes_exit:
	pop	es
	pop	di
	pop	bp
	ret

peek_bytes_failed:
	mov	ax,1
	jmp	peek_bytes_exit
_peek_bytes endp

	public	_peek_words
_peek_words proc near

	push	bp
	mov	bp,sp
	push	di
	push	es

	mov	dx,10[bp]		;Count
	mov	ax,12[bp]		;Word to scan for

peek_words_loop:
	les	di,4[bp]
	mov	cx,8[bp]
  repne scasw
	je	peek_words_failed
	dec	dx
	jnz	peek_words_loop

	mov	ax,0			;No error

peek_words_exit:
	pop	es
	pop	di
	pop	bp
	ret

peek_words_failed:
	mov	ax,1
	jmp	peek_words_exit
_peek_words endp

	public	_blit_bytes
_blit_bytes proc near

	push	bp
	mov	bp,sp
	push	si
	push	di
	push	ds
	push	es

	mov	dx,10[bp]		;Count
	mov	al,12[bp]		;Not used by the copy

blit_bytes_loop:
	les	di,4[bp]
	lds	si,4[bp]
	add	si,160			;Source is one 80-column text row below
	mov	cx,8[bp]
    rep movsb
	dec	dx
	jnz	blit_bytes_loop

	pop	es
	pop	ds
	pop	di
	pop	si
	pop	bp
	ret
_blit_bytes endp

	public	_blit_words
_blit_words proc near

	push	bp
	mov	bp,sp
	push	si
	push	di
	push	ds
	push	es

	mov	dx,10[bp]		;Count
	mov	al,12[bp]		;Not used by the copy

blit_words_loop:
	les	di,4[bp]
	lds	si,4[bp]
	add	si,160			;Source is one 80-column text row below
	mov	cx,8[bp]
    rep movsw
	dec	dx
	jnz	blit_words_loop

	pop	es
	pop	ds
	pop	di
	pop	si
	pop	bp
	ret
_blit_words endp

	public	_get_irq_mask
_get_irq_mask proc near
	in	al,PIC1MASK
	mov	ah,al
	WAFORIO
	in	al,PIC0MASK
	ret
_get_irq_mask endp

	public	_set_irq_mask
_set_irq_mask proc near
	push	bp
	mov	bp,sp

	mov	ax,4[bp]
	out	PIC0MASK,al
	mov	al,ah
	WAFORIO
	out	PIC1MASK,al

	pop	bp
	ret
_set_irq_mask endp

	public	_get_vga_mode
_get_vga_mode proc near

	push	bp
	mov	ah,0fh
	int	10h
	pop	bp
	mov	ah,0

	ret
_get_vga_mode endp

	public	_set_vga_mode
_set_vga_mode proc near

	push	bp
	mov	bp,sp

	mov	ax,4[bp]		;Get new mode

	pusha
	int	10h
	popa

	pop	bp
	ret
_set_vga_mode endp

	public	_get_cursor
_get_cursor proc near

	push	bp
	mov	ah,3
	int	10h
	pop	bp
	mov	ax,cx

	ret
_get_cursor endp

	public	_set_cursor
_set_cursor proc near

	push	bp
	mov	bp,sp

	mov	cx,4[bp]
	mov	ah,1
	pusha
	int	10h
	popa

	pop	bp
	ret
_set_cursor endp

_TEXT	ends
	end
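For readers who would rather not trace the register juggling, here is a portable C restatement of the strategy comment. The steps are:

1. Invert the 16-bit count so it counts up, and drop its low bit.
2. Put !OUT in bit 15.
3. Place the BIOS tick count (the word at 0040:006C) in the high half.
4. Credit one tick if IRQ 0 is pending while the count is in the just-wrapped range.

It is a sketch, not a replacement for gettime(). compose_time() and its parameters are hypothetical, and the range test simply mirrors the assembly's ((BX+1) and 7FFFh) < 4000h check. The raw-count-of-zero case and midnight are not handled. The real routine must also latch and read the hardware with interrupts disabled.

```c
#include <assert.h>
#include <stdint.h>

/* count:        raw 16-bit value latched from counter 0 (counts down)
 * out_pin:      OUT status bit from the read-back status byte
 * bios_ticks:   low word of the BIOS tick count at 0040:006C
 * irq0_pending: nonzero if IRQ 0 is requesting but not yet serviced
 * Returns BIOS ticks in the high word and an up-counting 8254 value in
 * the low word, with !OUT as bit 15 (the "17th bit" of the count). */
static uint32_t compose_time(uint16_t count, int out_pin,
                             uint16_t bios_ticks, int irq0_pending)
{
    uint16_t low;

    assert(count != 0);          /* the raw-zero special case is not handled */
    low = (uint16_t)((uint16_t)~count >> 1);   /* invert, drop low bit */
    if (!out_pin)
        low |= 0x8000u;                        /* !OUT becomes bit 15 */
    /* Counter wrapped recently and IRQ 0 not yet serviced: the BIOS count
     * is one behind, so credit the missing tick (mirrors the asm test). */
    if (irq0_pending && (uint16_t)((low + 1u) & 0x7fffu) < 0x4000u)
        bios_ticks++;
    return ((uint32_t)bios_ticks << 16) | low;
}
```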

========================================

#include <stdio.h>
#include <stdlib.h>

extern volatile unsigned long far bios_timer;
extern volatile unsigned int far bios_timer_low;
extern volatile unsigned int far bios_timer_high;

extern unsigned char far cga[25][80][2];
extern unsigned char far monochrome[25][80][2];
extern unsigned char far vga[256][256];

extern unsigned long gettime(void);
extern void poke_bytes(void far *, int, int);
extern unsigned int get_irq_mask(void);
extern void set_irq_mask(unsigned int);

int pending_int = 0;
char first_timer, second_timer;
int first_read, second_read;

main(argc, argv, envp)
   int argc;
   char **argv;
   char **envp;
{
   char buf[256];
   int n;
   unsigned int e, f, g, h, i, j;
   unsigned long start, end;
   unsigned long a, b, c, d;
   unsigned int irq_mask;

#if 1
	/* Make sure that the timer never returns an unreasonable value.
	 * Do this by sampling it twice in quick succession, and making sure
	 * that the returned value is not too much larger than the previous
	 * value.
	 */

   int a1read, a2read, b1read, b2read;
   int a1timer, a2timer, b1timer, b2timer;
   int aint, bint;

   b = gettime();
   bint = pending_int;
   b1timer = (int) first_timer & 0xff;
   b1read = first_read;
   b2timer = (int) second_timer & 0xff;
   b2read = second_read;

   while (1) {
      a = gettime();
      aint = pending_int;
      a1timer = (int) first_timer & 0xff;
      a1read = first_read;
      a2timer = (int) second_timer & 0xff;
      a2read = second_read;

      if (a - b > 128) {
	 printf("N %8lx, diff %ld\n", gettime(), a - b);
	 printf("a %8lx int %4x, 1read %4x, 1t %2x, 2read %4x, 2t %2x\n",
		b, bint, b1read, b1timer, b2read, b2timer);
	 printf("a %8lx int %4x, 1read %4x, 1t %2x, 2read %4x, 2t %2x\n\n",
		a, aint, a1read, a1timer, a2read, a2timer);
	 b = gettime();
	 bint = pending_int;
	 b1timer = (int) first_timer & 0xff;
	 b1read = first_read;
	 b2timer = (int) second_timer & 0xff;
	 b2read = second_read;
      } else {
	 b = a;
	 bint = aint;
	 b1timer = a1timer;
	 b1read = a1read;
	 b2timer = a2timer;
	 b2read = a2read;
      }
   }
#elif 0
	/* Make sure that gettime() is returning the low order bit clear
	 * some of the time.  Might not happen if the 8254 is read wrong.
	 */

	while (1) {
	   a = gettime();
	   if (!(a & 1)) {
	      printf("a %lx\n", a);
	      exit(1);
	   } /* endif */
	} /* endwhile */
#elif 0
	/* Test CGA screen access time.  This won't give reproducible results
	 * until the test is sync'd with the vertical refresh, and even then
	 * the timer tick needs to be sync'd too.
	 */

	irq_mask = get_irq_mask();
	set_irq_mask(0xfffe);

	/* sync with timer tick */
	for (e = bios_timer_low; e == bios_timer_low; )
	   ;

	start = gettime();

	poke_bytes(cga, 256, 1000);

	end = gettime();

	printf("Start %lx\n", start);
	printf("Stop  %lx\n", end);
	printf("Took %lu microticks.\n", end - start);

	set_irq_mask(irq_mask);

	exit(0);
#elif 0
	/* Another test routine to make sure that gettime() never returns an
	 * unreasonable value.	Print what gettime() got if there is a
	 * problem.
	 */

	while (1) {
	   int a1read, a2read, b1read, b2read;
	   int a1timer, a2timer, b1timer, b2timer;
	   int aint, bint;

	   a = gettime();
	   aint = pending_int;
	   a1timer = (int) first_timer & 0xff;
	   a1read = first_read;
	   a2timer = (int) second_timer & 0xff;
	   a2read = second_read;
	   b = gettime();
	   bint = pending_int;
	   b1timer = (int) first_timer & 0xff;
	   b1read = first_read;
	   b2timer = (int) second_timer & 0xff;
	   b2read = second_read;

	   if (b - a > 600
#if 0
	       || ((a1read == 0xffff && a1timer != a2timer)
		   || (b1read == 0xffff && b1timer != b2timer))
#endif
	       ) {
	      printf("N %8lx, diff %ld\n", gettime(), b - a);
	      printf("a %8lx int %4x, 1read %4x, 1t %2x, 2read %4x, 2t %2x\n",
		     a, aint, a1read, a1timer, a2read, a2timer);
	      printf("a %8lx int %4x, 1read %4x, 1t %2x, 2read %4x, 2t %2x\n\n",
		     b, bint, b1read, b1timer, b2read, b2timer);
	      exit(1);
	      }
	}
#elif 0
	/* Get a bunch of results and then print them.	This is a good way
	 * of seeing that the results are reproducible.  It also shows why
	 * you have to account for IRQ 0 timer ticks.
	 */

	while (1) {
	   unsigned long a[100];
	   unsigned long diff;

	   for (n = 0; n < 100; n++)
	      a[n] = gettime();
	   diff = a[1] - a[0];
	   for (n = 1; n < 99; n++) {
	      if (a[n+1] - a[n] > diff + 1)
		 printf("a %lx, b %lx, diff %ld\n", a[n], a[n+1], a[n+1] - a[n]);
	   }
	}
#else
	/* Get some samples.  Only keep a sample if it contains a "tick over"
	 * event, where the 8254 rolled over.  First wait for the high order
	 * part of the 8254 to be FF: that means that a tickover will happen
	 * "soon", and we're likely to capture it.  This loop used so you
	 * can visually see that the gettime() results are monotonic and
	 * regular across tickovers.
	 */

   while (1) {
	unsigned long a[100];

	while ((gettime() & 0xff00) != 0xff00)
		;

	for (n = 99; n >= 0; n--)
		a[n] = gettime();
	if ((a[0] & 0xffff0000) == (a[99] & 0xffff0000))
		continue;
	for (n = 99; n >= 0; n--)
		printf("%lx\n", a[n]);
	printf("\n");
   }
#endif

   return 0;
}
-- 
James R. Van Artsdalen          james@bigtex.cactus.org   "Live Free or Die"
Dell Computer Co    9505 Arboretum Blvd Austin TX 78759         512-338-8789