[comp.sys.mac.programmer] Code generation in LSC

mbkennel@phoenix.Princeton.EDU (Matthew B. Kennel) (01/25/89)

In article <11494@haddock.ima.isc.com> suitti@haddock.ima.isc.com (Stephen Uitti) writes:
>
>Some time ago, I ran some C benchmarks on a Mac II & Sun III.  Both
>machines have 68020's, the SUN ran at 20 MHz, the Mac II at
>15.something.  LSC 2.15 vs the standard Sun "cc".  Optimization flags
>were used on the Sun.  The Mac tended to have run times that
>were very close to those of the Sun, with some run times actually
>faster than the Sun.  Compilation speed is also tremendously greater.

Hmm. This is surprising.  What kind of benchmarks did you use?
How did you time them? 

>
>Turbo C on a 7.16 MHz XT clone compiles more than three
>times faster than a VAX.

A VAX isn't everybody's idea of "Speedy Gonzales" :) but the point
is true.

>  "Even on microcomputers" is incorrect.
>The compilers for PCs and Macs are MUCH better than for larger
>machines.  It even makes sense.  There is more money in it.

In terms of compile time and overall convenience, undoubtedly.
Code generation?  No.

Just looking at some of my programs with MacsBug, I can see
many _obvious_ inefficiencies that even a peephole optimizer could remove:

	MOVE.L	-(SP), D0
	MOVE.L	D0, D6
or reloading the same expression which was already in a register.

LSC doesn't do other kinds of optimizations such as turning

for(i=0; i < num; i++)
	d += a[i];

into

register int	*p;
for(p=a; p< a+num; p++)  /* a+num should _not_ be computed in each iteration */
	d += *p;	 /* of the loop! */	 


For these trivial examples, it's no big deal, but in complicated
computationally-intensive programs, all of these types of optimizations
combine and can be very significant.

Note that I generally write scientific programs with lots of loops, &
arrays and such in which good optimization can make a big difference.

I suspect that many Mac applications are of the type

	Sys_Call();
	twaddle structures()
	User_function()
	go back to beginning

and so these kinds of "global" optimizations aren't so important.
Intelligent register usage should always be a win, though!

>
>Would gcc be better than LSC?  Well, gcc would never produce code as
>quickly, gdb would never be as nice as LSC 3.0, you'd need "make",
>etc.  The code *might* be as fast as LSC's code.

If it were _only_ as fast as LSC, I'd think it was broken! :)

Seriously, I've looked at the output from good optimizing compilers,
and in general, they do a better job than I (admittedly a non-expert
assembly programmer) could do without a very large amount of effort.
(This is on "normal" programs: not freaky things like device drivers
and interrupt handlers)

Looking at the output of the MIPSco C compiler, I am completely 
astounded by the transformational magic that it manages to perform
on my programs.  (Then again, BASIC would scream on a MIPS)

>  It probably wouldn't
>be as fast, since one would probably make int's 32 bits.
>

Quite true.  But isn't this true only for a 68000, i.e. shouldn't
a Mac II be as fast w/ 32 bit ints as 16?  (I think that some minis
are _slower_ accessing 16 bits rather than 32!)

NOTE: My direct experience has _ONLY_ been with LSC 2.15.  If things
have changed significantly, please correct me!  Thanks.

>Stephen.

Matt Kennel
mbkennel@phoenix.princeton.edu
"Assume a spherical cow."

suitti@haddock.ima.isc.com (Stephen Uitti) (01/26/89)

In article <5801@phoenix.Princeton.EDU> mbkennel@phoenix.Princeton.EDU (Matthew B. Kennel) writes:
>In article <11494@haddock.ima.isc.com> suitti@haddock.ima.isc.com (Stephen Uitti) writes:
>>
>>Some time ago, I ran some C benchmarks on a Mac II & Sun III.
>>...The Mac tended to have run times that
>>were very close to those of the Sun, with some run times actually
>>faster than the Sun.
>
>Hmm. This is surprising.  What kind of benchmarks did you use?
>How did you time them? 
	The benchmarks were non-floating point.  The sieve, for
example, was faster on the Mac.  Embedded time calls were used.  Wall
clock timing, etc., was used to make sure things were being reported
at least approximately correctly.  All runs were greater than 30
seconds.

>>  "Even on microcomputers" is incorrect.
>>The compilers for PCs and Macs are MUCH better than for larger
>>machines.  It even makes sense.  There is more money in it.
>
>In terms of compile time and overall convenience, undoubtedly.
>Code generation?  No.
	PCC based compilers are still far more common than GNU C for
UNIX.  MSC (for the PC) claims to do all sorts of interesting things.
Global registers for the 8086 are probably a loss, since there aren't
lots of registers...  I find that Turbo C produces code that is
smaller and about the same speed (see below) as MSC's, on the same
machine, and that it compiles at least three times faster.
	Of course, "microcomputer" compilers are also closer to
supporting ANSI C (prototypes, etc.).  I say "microcomputer", but my
Mac II is at least 2.5 times faster than a 780 (though an SE will beat
a Mac II in a foot race down the hall).  I really mean "personal" or
"home" computer.

>Just looking at some of my programs with MacsBug, I can see
>many _obvious_ inefficiencies that even a peephole optimizer could remove:
>
>	MOVE.L	-(SP), D0
>	MOVE.L	D0, D6
>or reloading the same expression which was already in a register.

Can you see labels with MacsBug?  Could it have been:
	MOVE.L	-(SP), D0
foo:	MOVE.L	D0, D6

Anyway, I've seen this type of thing fall through (even without
intervening labels) with the optimizers of PCC-based compilers.  There are
sequences where, in the above example, the compiler really did want
the value in both registers...  Anyway, LSC is not worse than large
systems compilers here.

>LSC doesn't do other kinds of optimizations such as turning
>
>for(i=0; i < num; i++)
>	d += a[i];
>
>into
>
>register int	*p;
>for(p=a; p< a+num; p++)  /* a+num should _not_ be computed in each iteration */
>	d += *p;	 /* of the loop! */	 

	Some compilers will figure out what to stuff into registers
and do loop invariant type stuff (MSC for the PC).  My code specifies
the use of registers, in order of preference, and tends not to have
loop invariants.  Other optimizations are also wasted.  Compilers
which attempt to do this for me tend to be broken (MSC for the PC,
compilers for Cyber 205, IBM RT, etc), meaning that turning off the
optimizer tends to allow my code to work.
	Another odd thing that has come up is that when I have written
code that explicitly removes loop invariants, the optimizers of some
of the "smarter" compilers tend to slow my code down.  It seems that
they allocate additional variables to point into the arrays, and even
update the redundant copies.

>For these trivial examples, it's no big deal, but in complicated
>computationally-intensive programs, all of these types of optimizations
>combine and can be very significant.
	For large programs, using a profiler will allow you to
concentrate on the correct portion of the program.  This will allow
you to use a quick compiler to outperform an optimizing compiler,
generally speaking.

>Note that I generally write scientific programs with lots of loops, &
>arrays and such in which good optimization can make a big difference.
	Floating point?

>I suspect that many Mac applications are of the type
>...
>and so these kind of "global" optimizations aren't so important.
>Intelligent register usage should always be a win, though!

	Mac applications mostly spend their time waiting for the next
event (polling).  However, when they do something, they often want to
do it quickly, as they do not want to appear sluggish.

>>Would gcc be better than LSC?  Well, gcc would never produce code as
>>quickly, gdb would never be as nice as LSC 3.0, you'd need "make",
>>etc.  The code *might* be as fast as LSC's code.
>
>If it were _only_ as fast as LSC, I'd think it was broken! :)
	The main reason it wouldn't be is that a typical LSC
edit/compile/run/debug loop will be an order of magnitude better.
LSC has a (simple) profiler.

>Seriously, I've looked at the output from good optimizing compilers,
>and in general, they do a better job than I (admittedly a non-expert
>assembly programmer) could do without a very large amount of effort.
	I've yet to see an optimizing compiler that did as well as I
would do.  When doing assembler, I count bytes and cycles.  I optimize
register usage.  One gets used to it.  The painful part of assembly
is when you've got code that works & have discovered a neat new way
of doing the problem...  It is often hard enough to get yourself to
redo it when using a higher level language.

>Looking at the output of the MIPSco C compiler, I am completely 
>astounded as to the transformational magic that it manages to perform
>on my programs.  (Then again, BASIC would scream on a MIPS)
	When I look at what PCC does to my code, then the optimizer
undoes, I'm amazed.  Still, optimized code (for me) is easier to
debug... if I'm forced to debug in assembly.

>>  It probably wouldn't
>>be as fast, since one would probably make int's 32 bits.

>Quite true.  But isn't this true only for a 68000, i.e. shouldn't
>a Mac II be as fast w/ 32 bit ints as 16?  (I think that some minis
>are _slower_ accessing 16 bits rather than 32!)
	No.  For whatever reason, 16 bits are considerably quicker.

>NOTE: My direct experience has _ONLY_ been with LSC 2.15.  If things
>have changed significantly, please correct me!  Thanks.

	LSC 3.0 (etc.) supports floating point on the 68881.  When
used, this is a win.  3.0 has a serious debugger.  It is really
awesome when your system has more than one screen (like mine does).

	Summary: I'd prefer a compiler that spits out reasonable code
infinitely fast (such as LSC) to one that spends all day working on
it.  LSC is not as bad as you think.  "Mainframe" optimizing compilers
are not as good as you think.  I believe that the LSC compiler can be
improved, but that they are going about it in the correct manner.

>Matt Kennel
>mbkennel@phoenix.princeton.edu
>"Assume a spherical cow."

	Stephen.
"Everything in moderation.  Including moderation."

mbkennel@phoenix.Princeton.EDU (Matthew B. Kennel) (01/28/89)

In article <11536@haddock.ima.isc.com> suitti@haddock.ima.isc.com (Stephen Uitti) writes:
}
}	PCC based compilers are still far more common than GNU C for
}UNIX.

Alas, this is true.

}  MSC (for the PC) claims to do all sorts of interesting things.
}Global registers for the 8086 are probably a loss, since there aren't
}lots of registers...  I find that Turbo C produces code that is
}smaller and about the same speed (see below) as MSC's, on the same
}machine, and that it compiles at least three times faster.

Yes, I forgot to mention Microsoft C---it appears to perform traditional
"global" optimizations.  Does Turbo C do all that too, and is still 3 times
faster?  If so, I am quite impressed.

This is an argument for my side, though.  If Turbo C can do it, and if
Microsoft (gasp shudder!) can do it, why not LSC?  Macs are better than
PC's anyway! :)

}	Of course, "microcomputer" compilers are also closer to
}supporting ANSI C (prototypes, etc.).  I say "microcomputer", but my
}Mac II is at least 2.5 times faster than a 780.

Certainly!  I don't consider a 780 with PCC to be anywhere near
"state of the art".  I'd much rather use a Mac.

}
}}Just looking at some of my programs with MacsBug, I can see
}}many _obvious_ inefficiencies that even a peephole optimizer could remove:
}}
}}	MOVE.L	-(SP), D0
}}	MOVE.L	D0, D6
}}or reloading the same expression which was already in a register.
}
}Can you see labels with MacsBug?  Could it have been:
}	MOVE.L	-(SP), D0
}foo:	MOVE.L	D0, D6

In some cases, there was a reason for this.  I looked for things like this,
and in 90% of the cases it was sheer uselessness.  I suspect that LSC is being
very conservative in its code generation, making sure that the code ends up
being correct---which, of course, is the first priority.  But this is already
a solved problem, and it is eminently possible to do much better, and still
be right.

}
}Anyway, I've seen this type of thing fall through (even without
}intervening labels) with PCC based compilers optimizers.  There are
}sequences where, in the above example, the compiler really did want
}the value in both registers...  Anyway, LSC is not worse than large
}systems compilers here.

It's not worse than PCC, you mean, which is a ten-year-old compiler.

}
}}LSC doesn't do other kinds of optimizations such as turning
}}
}}for(i=0; i < num; i++)
}}	d += a[i];
}}
}}into
}}
}}register int	*p;
}}for(p=a; p< a+num; p++)  /* a+num should _not_ be computed in each iteration */
}}	d += *p;	 /* of the loop! */	 
}
}	Some compilers will figure out what to stuff into registers
}and do loop invariant type stuff (MSC for the PC).  My code specifies
}the use of registers, in order of preference, and tends not to have
}loop invariants.  Other optimizations are also wasted.

It is my opinion that the compiler should figure out what variables to
place in registers most of the time.  For example, if I specify a variable
as "register", frequently it ought to be kept in a register for only
part of some function.  If it is required to stay in a register for the life
of the function, then elsewhere it ties up a register that a different
variable could profitably use.  Something like this
can't be specified with standard C semantics---only a smart optimizer could
figure this one out.

}  Compilers
}which attempt to do this for me tend to be broken (MSC for the PC,
}compilers for Cyber 205, IBM RT, etc), meaning that turning off the
}optimizer tends to allow my code to work.

This is unfortunate.  I've occasionally had intermittent optimizer problems
(in Fortran, though), and so I end up having to compile some particular
function without the optimizer.  But they do work 99% of the time, and they
seem to be useful, in the market's opinion.

}	Another odd thing that has come up is that when I have written
}code that explicitly removes loop invariants, the optimizers of some
}of the "smarter" compilers tend to slow my code down.  It seems that
}they allocate additional variables to point into the arrays, and even
}update the redundant copies.

I wonder if this is the infamous "aliasing" problem again?  

As I state below, the higher-level the form a program is written in, the
better a job the optimizers can generally do.  This is good from both the
human and the computer's points of view: 1) readable code, 2) fast code.
Isn't this the best way?

}
}}For these trivial examples, it's no big deal, but in complicated
}}computationally-intensive programs, all of these types of optimizations
}}combine and can be very significant.
}	For large programs, using a profiler will allow you to
}concentrate on the correct portion of the program.  This will allow
}you to use a quick compiler to outperform an optimizing compiler,
}generally speaking.

In all cases, the optimizer should be optional.  Then, you get the best
of both worlds, fast compilation during debugging and development, and
fast run times for final runs.

(I have a particular program I've been working on in mind here) I know
exactly where my program is spending its time---a typical inner core
of nested loops, in general.  However, I want to write in the notation
that most clearly reflects the problem at hand.  If I want to step through
arrays, that's what I'll write.  I think it's the compiler's job to optimize
this into whatever idiom happens to be fastest on the particular machine
I'm on: the fastest code for a VAX might be different than for a PC or 
a Mac or a CRAY.  In general, the "higher level" the program is written at
the better the optimizer can recognize your expression as something that it
can play with.  (i.e., it's easy to see how to turn a[i] = b[i]+c[i] into
a pointer-based computation or even some god-awful vectorized expression,
but often not vice versa).  This is, of course, moot if you will never
port your code.  (But never say never!)  

}
}}}Would gcc be better than LSC?  Well, gcc would never produce code as
}}}quickly, gdb would never be as nice as LSC 3.0, you'd need "make",
}}}etc.  The code *might* be as fast as LSC's code.
}}
}}If it were _only_ as fast as LSC, I'd think it was broken! :)
}	The main reason it wouldn't be is that a typical LSC
}edit/compile/run/debug loop will be an order of magnitude better.
}LSC has a (simple) profiler.

If the compiler makes fast code to begin with I don't have to go through
any edit-cycles at all to optimize it!  Except in unlikely cases, I don't
see how I could profile, edit a routine, optimize it, and debug and
recompile the application in less time than it would take to simply recompile
everything with the optimizer on.  (You must be a pretty hot programmer!)

}
}}Seriously, I've looked at the output from good optimizing compilers,
}}and in general, they do a better job than I (admittedly a non-expert
}}assembly programmer) could do without a very large amount of effort.
}	I've yet to see an optimizing compiler that did as well as I
}would do.

You must be a better assembly programmer than I am, and you can't have
seen many really good compilers.  Ok, very infrequently I can find something
that I could do better, but their output is nowhere near as bad as LSC's,
which has glaring problems.

}  When doing assembler, I count bytes and cycles.  I optimize
}register usage.  One gets used to it.

Perhaps, but I'd like an optimizing compiler to do 99.44% (ok 90%) of the work
for me automatically.  One doesn't have time to seriously optimize everything
that one could.

}  The painful part of assembly
}is when you've got code that works & have discovered a neat new way
}of doing the problem...  It is often hard enough to get yourself to
}redo it when using a higher level language.

So, do you agree with me? 

}
}}Looking at the output of the MIPSco C compiler, I am completely 
}}astounded as to the transformational magic that it manages to perform
}}on my programs.  (Then again, BASIC would scream on a MIPS)

}	When I look at what PCC does to my code, then the optimizer
}undoes, I'm amazed.  Still, optimized code (for me) is easier to
}debug... if I'm forced to debug in assembly.

The obvious conclusion: PCC != MIPS, by a large margin.  I agree totally
with your assessment of PCC.

}
}}}  It probably wouldn't
}}}be as fast, since one would probably make int's 32 bits.
}
}}Quite true.  But isn't this true only for a 68000, i.e. shouldn't
}}a Mac II be as fast w/ 32 bit ints as 16?  (I think that some minis
}}are _slower_ accessing 16 bits rather than 32!)
}	No.  For whatever reason, 16 bits are considerably quicker.
}
}}NOTE: My direct experience has _ONLY_ been with LSC 2.15.  If things
}}have changed significantly, please correct me!  Thanks.
}
}	LSC 3.0 (etc.) supports floating point on the 68881.  When
}used, this is a win.  3.0 has a serious debugger.  It is really
}awesome when your system has more than one screen (like mine does).
}
}	Summary: I'd prefer a compiler that spits out reasonable code
}infinitely fast (such as LSC) to one that spends all day working on
}it.

True, but what I want is LSC with much better _optional_ optimization.
(Put it in a different segment to save memory when not needed.)

Also, I don't see any workaround for a bad register allocator.

}  LSC is not as bad as you think.  "Mainframe" optimizing compilers
}are not as good as you think.

Obviously, LSC is better than the competition.  (I bought it!)
But for people doing some kinds of system development, it could be better:
everything I've asked for has been done before.


}  I believe that the LSC compiler can be
}improved, but that they are going about it in the correct manner.
}

True, in part:  Their code is always correct (I believe), which is great.
I can see why they would put their energies into a debugger before writing
a whiz-bang global optimizer.  However, it is my opinion that their lousy
register allocator/whatever should have been improved beyond its present
state a long, long time ago (the ones you see in standard textbooks must
be a lot better), and thus I was disappointed that THINK apparently
didn't put any substantial effort into improving the code generator for V3.

It would also have been nice if they didn't have to spend their time trying
to keep up with Apple's system version n+1.

Ok, in sum:  Better code by 4.0, and soon, please.  They did something for
Pascal; now, why not C?  Of course... I want it all. :-)

}}Matt Kennel
}
}	Stephen.
}"Everything in moderation.  Including moderation."

Matt K.
mbkennel@phoenix.princeton.edu