[net.unix-wizards] Optimize my kernel?

olmstead (02/22/83)

I recently noticed that the compiler never gets a -O flag when
compiling our kernel (well, actually, it does when it compiles vers.c,
but that hardly counts).  Before I go ahead and turn it on, can anyone
tell me whether that's a bad idea?  I've seen examples of the
optimizer's generating bad code, but I think that was all
floating-point stuff.  Will it break our UN*X?  Will it cause all those
funky sed scripts to fail?

[I ask this last one because I once added some code to call spl7() from
a function that declared a register variable and initialized it to
TS_OK (= 0).  The compiler cleverly remembered that it had a zero handy
in the register and called spl7 via "calls reg, _spl7" or some such.
The sed script later tried to change all calls to splX into inline
instructions; it KNEW, of course, that these appeared as
"calls $ 0, _splX" (the space is there so somebody's news doesn't eat
my dollar-zero).  The end result was that the loader told me that spl7
was undefined.]

So, to optimize or not to optimize: that is the question.
				TIA,
				Patrick Olmstead

				...ucbvax!menlo70!sytek!olmstead
				...decvax!sytek!olmstead
				Olmstead.PA@PARC-MAXC.ARPA

ken (02/24/83)

It is all right to optimize the kernel, but it is not recommended to
invoke the C optimizer on the kernel code.

The kernel is very different from other programs because it accesses
devices sitting out there on the bus, which are not necessarily
memory.  The optimizer assumes that every location that it accesses
behaves according to a model that resembles generic memory.  Many of
the control, status, command, and data registers of devices such as
UARTs, disk controllers, graphics boards, and the like DO NOT behave
like memory.  Some registers are not read/write; that is, they may be
writable, but may return garbage when read.  Likewise, they may be
status-type bits, which can be read but not written.  One of the worst
hardware designs (found quite a bit in practice) is to have the same
memory address refer to two totally different registers when reading as
opposed to writing.  There may be a memory subsystem out there on the
bus which responds to every byte address in a certain range, but which
cannot be accessed in word quantities.  Some types of registers modify
the contents of other registers when they are read; e.g. reading the
data register of a UART usually clears the FULL bit in the status
register.
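
The register behavior described above can be modeled in ordinary C.
This is only a sketch of a made-up device: the sim_uart structure, the
STAT_FULL bit, and all the function names are hypothetical, and no real
UART is claimed to be laid out this way.  It shows a data address whose
read and write sides are different registers, and a read that changes
the status register as a side effect.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical UART model; names and bit layout are illustrative only. */
#define STAT_FULL 0x01u            /* receiver holds an unread byte */

struct sim_uart {
    uint8_t rx_data;               /* receive side of the data address */
    uint8_t status;                /* status bits, read-only to software */
    uint8_t tx_last;               /* transmit side of the data address */
};

/* Hardware side: a byte arrives on the line and FULL is raised. */
void uart_receive(struct sim_uart *u, uint8_t byte)
{
    u->rx_data = byte;
    u->status |= STAT_FULL;
}

/* Reading the data address returns the received byte and,
   as a side effect, clears FULL in the status register. */
uint8_t uart_read_data(struct sim_uart *u)
{
    u->status &= (uint8_t)~STAT_FULL;
    return u->rx_data;
}

/* Writing the same address goes to the transmitter, a different register. */
void uart_write_data(struct sim_uart *u, uint8_t byte)
{
    u->tx_last = byte;
}

uint8_t uart_read_status(const struct sim_uart *u)
{
    return u->status;
}

/* Walk through the sequence; returns 0 when every check passes. */
int uart_demo(void)
{
    struct sim_uart u = {0, 0, 0};

    uart_receive(&u, 'A');
    assert(uart_read_status(&u) & STAT_FULL);
    uart_write_data(&u, 'Z');              /* does not disturb the receiver */
    assert(uart_read_data(&u) == 'A');     /* read returns 'A', not 'Z' */
    assert(!(uart_read_status(&u) & STAT_FULL));
    return 0;
}
```

A compiler that treats the data address as plain memory would expect
the read after the write of 'Z' to return 'Z', and would expect the
status register to be unchanged by that read.  Neither holds here.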

What difference does this make, you say?  Well, the optimizer will do
things like not even bother to read a location it has just written,
even if the C code does; it already [thinks] it knows what is contained
in that location because it just wrote it!  Some (probably most)
computers have a special instruction that performs the CLEAR function.
At least one UNIX-supporting minicomputer that I know of accomplishes
this by subtracting the location from itself, generating a
read-modify-write cycle instead of just a write cycle.  This can wreak
havoc with registers that have side effects (such as the UART data
register).  It is better to write the constant 0 than to invoke the
CLEAR instruction.
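
The write-then-read-back case can be sketched as follows.  This assumes
a compiler that supports the "volatile" qualifier, which was
standardized later in ANSI C and may not exist in the compilers under
discussion here.  The function names are hypothetical.  The qualifier
asks the compiler to perform every access the source code spells out,
so the read-back is not replaced by the value just written.

```c
#include <assert.h>
#include <stdint.h>

/* Write a pattern to a register, then read it back and compare.
   Because the pointer is volatile-qualified, the compiler must
   perform the read instead of reusing the value it just stored. */
int readback_ok(volatile uint16_t *reg, uint16_t pattern)
{
    *reg = pattern;
    return *reg == pattern;
}

/* Clear a register by storing the constant 0.  The volatile store
   asks for a write access only; the source contains no read. */
void reg_clear(volatile uint16_t *reg)
{
    *reg = 0;
}

/* Exercise both functions against ordinary memory, where a
   read-back always matches; returns 0 when every check passes. */
int readback_selftest(void)
{
    uint16_t cell = 0xFFFF;

    assert(readback_ok(&cell, 0xA5A5));
    assert(cell == 0xA5A5);
    reg_clear(&cell);
    assert(cell == 0);
    return 0;
}
```

Against real memory the read-back always matches, so the self-test only
shows that the code is well formed.  The interesting case is a faulty
or write-only register, where the read-back differs from the pattern
and an elided read would hide the fault.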

I discovered these problems by writing diagnostic code for imaging &
graphics peripherals.  Bad hardware tested good when I used the
optimizer, and tested bad when I didn't.

It would probably be all right to sic the optimizer on portions of the
kernel that do not deal with devices, but unless you are intimately
familiar with the hardware of a particular device, it is recommended
NOT TO OPTIMIZE ANY DEVICE DRIVER.

			Ken Turkowski
		{ucbvax,decvax}!decwrl!turtlevax!ken

davec (02/26/83)

It has just been pointed out that only relatively recent versions of the
4.1bsd tape from Berkeley have an optimized kernel (say within the last
year, since that's about the age of our system, which has always been
running an optimized kernel.)

dgc