[comp.arch] compiling bitblt and instruction cache

forsyth@minster.york.ac.uk (06/29/90)

during a discussion of self-modifying code and compiling bitblt,
i described the technique we use to avoid trouble with the instruction cache
on machines like the 68020.  this provoked a response from
gillies@m.cs.uiuc.edu:

> Horrors, your code is not only machine-dependent, it's cache
> dependent.  Who volunteers to port your code?  Some naive grad
> student?

	Lord FINCHLEY tried to mend the Electric Light
	Himself. It struck him dead: And serve him right!
	It is the business of the wealthy man
	To give employment to the artisan.
		(H Belloc)

i think that those of our research students who actually write code,
armed with a copy of Locanthi's article,
are competent to work out this small problem.
indeed, it was a research student who wrote the implementation
in the first place!

i have several serious comments.  first, there is nothing to stop us
using a portable version of bitblt on a new architecture until
we can adapt the more machine-dependent version.  now, bear in mind its size:

$ wc -l *.c
    313 mem_rop.c
    193 templates.c
    506 total

most of the 313 lines do clipping, case analysis, and copying
instructions from template sets into the code buffer;
only a few lines deal with the cache (or the 68000).
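
a minimal sketch of that template-copying step, under loud assumptions:
the names (`rop', `code_tmpl', `emit_row') are hypothetical, and the template
bytes are placeholders, not real 68k opcodes and not the contents of templates.c.
it shows only the shape of the idea: pick a template set by operation,
then copy it once per word into a code buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* hypothetical raster operations; a real bitblt has more cases */
enum rop { ROP_COPY, ROP_OR, ROP_XOR, ROP_COUNT };

struct code_tmpl { const unsigned char *bytes; size_t len; };

/* placeholder bytes standing in for machine instructions (NOT real opcodes) */
static const unsigned char tmpl_copy[] = { 0xA0, 0x01 };
static const unsigned char tmpl_or[]   = { 0xA1, 0x01 };
static const unsigned char tmpl_xor[]  = { 0xA2, 0x01 };
static const unsigned char tmpl_ret[]  = { 0xFF };

static const struct code_tmpl per_word[ROP_COUNT] = {
    { tmpl_copy, sizeof tmpl_copy },
    { tmpl_or,   sizeof tmpl_or },
    { tmpl_xor,  sizeof tmpl_xor },
};

/* emit an unrolled row: one per-word template for each of `words' words,
 * then a return template.  returns bytes written, or 0 if buf is too small. */
size_t emit_row(unsigned char *buf, size_t cap, enum rop op, int words)
{
    const struct code_tmpl *t;
    size_t need, used = 0;
    int i;

    assert((int)op >= 0 && op < ROP_COUNT && words >= 0);
    t = &per_word[op];
    need = (size_t)words * t->len + sizeof tmpl_ret;
    if (need > cap)
        return 0;
    for (i = 0; i < words; i++) {
        memcpy(buf + used, t->bytes, t->len);
        used += t->len;
    }
    memcpy(buf + used, tmpl_ret, sizeof tmpl_ret);
    used += sizeof tmpl_ret;
    /* a real generator must now deal with the instruction cache before
     * jumping to buf; that step is machine-specific and omitted here */
    return used;
}
```

calling emit_row(buf, sizeof buf, ROP_XOR, 3) fills buf with three copies of
the xor template followed by the return template; the clipping and case
analysis that choose `op' and `words' are left out.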

by comparison, just the SunOS 3.2 `.h' files are:

$ wc -l mem_rop*.h
    260 mem_rop_impl_ops.h
    106 mem_rop_impl_util.h
    366 total

(ours is currently monochrome only, but it happens that none of the
parts of the Sun implementation mentioned here includes
anything specific to colour machines.  Sun's code for mem_rop
itself is much larger, of course, owing in part to their need
to handle 1 to n-bit and n-bit to 1 pixel conversions.
nothing here is meant as criticism of their code.)

Sun's code (i gather) originally contained `portable' constructions such as
	register short x;
	do { ... } while (--x != -1);
to cause 68k dbra's to be generated.  with the addition of sparc & 386i
they have apparently removed this sort of thing from the body of
the code and put it into #defines selected by machine type (eg, in pr_impl_util.h):

/* loop macros */
PR_LOOPVP(var, op)
PR_LOOPV(var, op)
PR_LOOPP(count, op)
PR_LOOP(count, op)

these definitions and others include embedded calls to #defines
_STMT, IFLINT, PTR_ADD, PTR_INCR, LOOP_DECR, and so on.
this is a reasonable approach, but however compact or adaptable this makes the C code,
the result is a kind of private language (less clear than Bourne's
algol68 for his shell) which must be understood precisely if one
is to work out whether the code will indeed port to a new architecture.
while it is true that the `register short' code will execute (suboptimally)
on other machines, one must still know the trick to realise
why the code is so oddly written.
in other words, in either case (compiling or `portable') there is
in practice a modest amount of work involved in understanding
the code well enough to check that it will work correctly on a new machine.
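
to make the trick concrete, here is a small sketch of a loop macro selected
by machine type.  MY_LOOP, EXAMPLE_M68K and clear_words are hypothetical names
invented for illustration; they are not Sun's definitions, and the real
PR_LOOP family may differ in every detail.  the point is only that
`--x != -1' on a short runs the body (initial x)+1 times, so the counter
must start at count-1.

```c
#include <assert.h>

/* EXAMPLE_M68K is a hypothetical configuration macro, not a real
 * compiler or SunOS symbol; define it to select the dbra-shaped loop. */
#ifdef EXAMPLE_M68K
/* count down to -1 in a 16-bit register: the shape a 68k compiler
 * can turn into a dbra.  requires 1 <= count <= 32767. */
#define MY_LOOP(count, op) \
    do { \
        register short my_x_ = (short)((count) - 1); \
        do { op; } while (--my_x_ != -1); \
    } while (0)
#else
/* plain form for other machines; same contract (count >= 1) */
#define MY_LOOP(count, op) \
    do { \
        int my_n_ = (count); \
        do { op; } while (--my_n_ != 0); \
    } while (0)
#endif

/* clear `count' 16-bit words starting at dst */
void clear_words(unsigned short *dst, int count)
{
    assert(dst != 0 && count >= 1 && count <= 32767);
    MY_LOOP(count, *dst++ = 0);
}
```

both variants run the body exactly `count' times for count >= 1, which is
the property a porter has to confirm before trusting either form on a new machine.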

but if it is portable C (perhaps configured by #define/#ifdef) will it not
work immediately on a new machine, without further examination?
possibly not.  there are many potential pitfalls:
the frame buffer and memory might have different layouts (eg, byte orders,
interleaving); registers and memory might use different bit orders; and so on.
the C code as written might not yet be adaptable enough to cope
with some bizarre new (no doubt patented) invention of the hardware developer.
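
the byte-order pitfall can be sketched in a few lines.  the layout assumed here
(pixel 0 is the most significant bit of byte 0 of a row) is an assumption made
for illustration, and the function names are hypothetical.  the word-at-a-time
version is the natural one to write on a 68k, yet it touches the wrong byte
on a little-endian host.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* returns 1 if the host stores the most significant byte first */
int host_is_big_endian(void)
{
    uint32_t probe = 0x01020304u;
    unsigned char bytes[4];

    memcpy(bytes, &probe, sizeof probe);
    return bytes[0] == 0x01;
}

/* assumed layout: pixel x lives in byte x/8, leftmost pixel in the top bit.
 * byte addressing makes this independent of host byte order. */
void set_pixel_bytewise(unsigned char *row, unsigned x)
{
    assert(row != 0);
    row[x >> 3] |= (unsigned char)(0x80u >> (x & 7));
}

/* 32 bits at a time: matches the layout above only on a big-endian host */
void set_pixel_wordwise(uint32_t *row, unsigned x)
{
    assert(row != 0);
    row[x >> 5] |= UINT32_C(0x80000000) >> (x & 31);
}
```

setting pixel 0 both ways and comparing the resulting bytes gives the same
memory image exactly when host_is_big_endian() returns 1.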


as to the cache dependence: of course!  on the other hand, by not
using the special SunOS trap, our code is actually more portable:
it will work on any 68020, not just Suns!  i shall have to recheck
it for later 68k machines, and it is possible the current scheme will not work;
we do not change machine architectures often enough, much as Sun
would like us to, for that checking to be onerous.  much more of my time
is spent working out what tinkering has been done by the operating
system suppliers between releases (working on bitblt is more interesting,
i assure you).

the main purpose of our Suns is to provide a good, responsive,
b&w bitmapped graphics environment (on 68020s with 4 Mbytes!).
bitblt is therefore a critical primitive, and it is worthwhile
squeezing as much performance as we can out of it.  yet for maintenance
we know from experience that the C+cpp+asm approach is more readily
understood than a `pure' assembly language version.
to judge from the speed of one of our colour SS1s compared to our colour 3/60,
the SPARC might benefit from compiling bitblt too... but then,
workstation suppliers want to sell whichever flavour of graphics hardware assist
they were supporting last week!

markha@microsoft.UUCP (Mark HAHN) (07/02/90)

there are plenty of applications where it would
be great to generate code on the fly.  the only
thing that really distinguishes bitblt is that users 
can easily see when it's too slow.  but what about
all the other tidy abstractions like hash tables?

the Synthesis Kernel seems to be a variant of this cool idea,
though it apparently only does it for system-level stuff.
they seem to think this kind of 'customization' is something
that only an oracle-like metaclass would want to do.  their example
is a filesystem that provides thunks for your particular FILE*.

any of the code-is-data languages should have it easy,
since they can just provide a compiler in the runtime.

complaining that it's unportable is pretty weak,
since all performance tuning (rather than redesign) is unportable.
what's needed is some adequately expressive portable language,
which should also probably be terse and easy to optimize.
the portability necessary couldn't be achieved in the past,
but most major architectures today are just minor variations.
(exercise: name a machine (please DON'T post it!) 
that has other than 8, 16, 32 bit data types on natural boundaries, 
two's complement math with 32bit addresses and IEEE fp.  
do not consider historic blemishes such as mainframes or the 286.)

besides compiling the language, the OS needs to provide fixups.
doesn't seem much to ask, does it?
 
then, of course, you've got to trust the builtin compiler, 
and it won't help you if you need/prefer the local weirdnesses...

regards,
-- 
Mark Hahn         microsoft!markha@uunet.uu.net         uunet!microsoft!markha
YES, Bill Gates IS my personal savior, and I CHANNEL for him in CLEAR WEATHER.