[comp.sys.amiga.tech] C compilers code generation

andrew@teslab.lab.OZ (Andrew Phillips) (11/08/90)

Over the years I have been intrigued by the code generated by
different C compilers, and have been comparing Lattice C code with
Aztec C.  From the first it always seemed that Lattice performed more
optimizations but that Aztec did better simply because of better code
generation.  Nowadays, they seem to be much closer, producing
reasonable code with simple optimizations - but there is a lot of
room for improvement.

Recently I have been comparing Lattice C 5.04, Aztec C 5.0, DICE 2.02
and PDC 3.34 using several benchmarks.  On disassembling the
innermost loop of the sieve of Eratosthenes I found that the four
compilers had generated the code shown below.

The C code for this loop was:

    register short i, k;
    ...

        for (k = i + i; k <= 8190; k += i)
            flags[k] = 0;

In the assembler code below the first part is the loop initialization
(k = i + i) and the names I and K represent the data registers
corresponding to the variables i and k.  Interestingly Lattice and
Aztec took the same time in the benchmark and generated the same code
for this loop (with all optimizations on).

     LATTICE/AZTEC           DICE                    PDC

     MOVE.W  I,K             MOVE.W  I,D0            EXT.L   I
     ADD.W   I,K             EXT.L   D0              EXT.L   I
                             MOVE.W  K,D1            MOVE.L  I,D0
                             EXT.L   D1              ADD.L   I,D0
                             ADD.L   D0,D1           MOVE.L  D0,K
                             MOVE.W  D1,D3
     BRA.B   IN              BRA.B   IN

LOOP LEA     f(A4),A0   LOOP LEA     f(A4),A0   LOOP CMPI.L  #8190,K
     CLR.B   0(A0,K.W)       ADDA.W  K,A0            BGT.B   OUT
                             MOVE.B  #0,(A0)         LEA     f(A4),A0
     ADD.W   I,K             ADD.W   I,K             ADDA.L  K,A0
IN   CMPI.W  #8190,K    IN   CMPI.W  #8190,K         CLR.B   (A0)
     BLE.B   LOOP            BLE.B   LOOP            ADD.L   I,K
                                                     BRA.B   LOOP
                                                OUT  ...

I calculated the total 68000 clock cycles for the inner loop
(excluding initialization) to be: Lattice 48, Aztec 48, DICE 50 and
PDC 64.  These correspond roughly to the ratios of run times that I
got when timing the whole program.
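
I haven't reproduced my complete test program here, but the harness around
that loop is essentially the standard sieve benchmark, give or take details
(for the timings the whole thing is repeated a number of times):

    #define SIZE 8190

    char flags[SIZE + 1];

    main()
    {
        register short i, k;
        int count = 0;

        for (i = 0; i <= SIZE; i++)
            flags[i] = 1;

        for (i = 2; i <= SIZE; i++) {
            if (flags[i]) {
                for (k = i + i; k <= SIZE; k += i)  /* the loop disassembled above */
                    flags[k] = 0;
                count++;
            }
        }
        return count;   /* keep 'count' live so it isn't optimized away */
    }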

Even with all optimizations on, both Lattice and Aztec left the first
instruction of the loop inside the loop despite the fact that it is
"loop invariant".  They also seem to make poor use of the available
registers.

It is interesting to note that PDC appears to treat shorts as 32 bit
quantities, like ints and longs.  It also seems that BOTH of the
lines with "EXT.L I" are redundant as I is already 32 bits.

So I think Lattice and Aztec still have work to do.  I hope someone
finds this of interest.

Andrew.
-- 
Andrew Phillips (andrew@teslab.lab.oz.au) Phone +61 (Aust) 2 (Sydney) 289 8712

dolfing@cs.utwente.nl (Hans Dolfing) (11/12/90)

In article 2033 of comp.sys.amiga.tech, andrew@teslab.lab.OZ (Andrew Phillips) said:

>Over the years I have been intrigued by the code generated by
>different C compilers, and have been comparing Lattice C code with
>Aztec C.  From the first it always seemed that Lattice performed more
>optimizations but that Aztec did better simply because of better code
>generation.  Nowadays, they seem to be much closer, producing
>reasonable code with simple optimizations - but there is a lot of
>room for improvement.

Hello everybody,

My name is Hans Dolfing.  I am a Computer Science student, currently
graduating.  Of course I own an Amiga.  When I read the former article,
some thoughts and ideas came up which may be interesting for everybody.

Although the C-compilers of Lattice 5.05 and Manx 5.0 work fine, they can
definitely be improved. Maybe, some people on the net are aware of
Borland Turbo-C 2.0 for the Atari ST. If not, you should take a look at it.
It is simply the best C-compiler I have ever seen. Comparing this compiler to
Lattice (now SAS) and Manx for the Amiga, you can see that Turbo-C beats
them on almost everything.

Turbo-C is
 - An integrated package which works fine.
 - It compiles 4 times faster than Lattice and Manx (4000 lines/min).
 - The produced code is better. Especially the register allocation
   strategy is exciting.
 - The library routines are good! Just look at the code of 'memcpy'.

Why isn't there a firm that makes such a compiler for the Amiga?

Maybe, SAS and Manx should think about the following proposals:

- The user-interface (which user-interface?) can be improved. Options
  should be enabled/disabled by clicking on them, just like it is done in
  Turbo-C++/PC.
- I like the project files of Turbo-C/ST and Turbo-C++/PC. Maybe, this
  can be implemented too.
- Compiler and linker should be integrated in a package of around 200/300K
  size. The editor can possibly be integrated using ARexx so that a user
  can decide which editor to use. LSE goes a bit in this direction
  but since  I like to use Cygnus Ed 2.0, this is a problem.
- Why did the size of Manx Aztec increase from 70K to 150K when going
  from version 3.6 to 5.0? This seems an indication that the compiler
  code is growing too large (like the Apple Finder 7.0!). So, please
  reduce the compiler size of lc1, lc2 and cc.
- The compiling process can be done faster (see Turbo-C/ST 2.0).
- The libraries of Lattice and Manx should be reworked. Why do we need
  different libraries for register and stack variables/arguments? Can't the
  compiler keep track of this? The same is true for 16 and 32 bit
  integers and small and large data/code models. Can't the compiler
  keep some marks which are finally written into the generated .o file so
  that the linker knows which size/model to use? Summarizing, it seems
  to me that at least 6 libraries are superfluous!
- Why not put all variables and arguments in registers? If
  stack args are really needed (eg varargs), we can use __stdargs or
  something comparable. Please always use a small data model. If a table
  becomes too large, the compiler should notice this and change the
  addressing of this (and only this) table to a large model.  (A sketch
  of the register/__stdargs idea follows this list.)
- Last but not least, the library routines should be as fast as possible.
  Now I sometimes have to wonder whether the library routine I'm using
  is fast enough (memcpy).
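
To sketch the register/__stdargs idea (the keyword spelling here is only
an example, not the exact syntax of either compiler):

    int add3(int a, int b, int c);            /* arguments in registers by default */
    __stdargs int my_printf(char *fmt, ...);  /* varargs: explicitly on the stack  */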

P.S. I'm not trying to put down the compilers of SAS and Manx. I'm
  just wondering why there are compilers on other machines (Turbo-C 2.0/ST
  and Turbo-C++/PC) which have a really good user interface, work well
  and produce good code. Therefore, I gave some hints which may help the
  'compiler-builders' to improve their products and to produce even more
  professional software packages (especially user interfaces, which will
  hopefully be improved under OS 2.0) for this wonderful machine named
  AMIGA.

P.S. 2: Maybe it is a nice idea to put together all good ideas/improvements
        on the net and send them to SAS and Manx?

---------------------------
	Greetings, Hans Dolfing (dolfing@cs.utwente.nl)

nj@teak.berkeley.edu (...) (11/13/90)

[I was originally going to respond to this via email, but it
 bounced.  Hopefully n-million other people won't post followups
 as well.]

I've seen Turbo C on an Itty Bitty Machine, and the integrated
environment is nice.  However, some of your objections have in fact
been addressed to some extent with SAS C 5.10, and (at least according
to what they say in the documentation) will be improved even more in
the next release.

>- The user-interface (which user-interface?) can be improved. Options
>  should be enabled/disabled by clicking on them, just like it is done in
>  Turbo-C++/PC.

SAS C 5.10 comes with an Intuitionized tool that can set most compiler
options.  The interface has 2.0-style gadgets (even under 1.3--don't
know how they pulled this off; maybe they grabbed the gadtools code
from 2.0 and put it directly in the program) and makes sure everything
fits together right (e.g. if you select registerized parameters, it'll
link with the registerized library, etc.).  It's not perfect yet, but
they acknowledge this in the dox, and promise to improve it in the
future.

>- I like the project files of Turbo-C/ST and Turbo-C++/PC. Maybe, this
>  can be implemented too.

I don't know what project files are, but I assume they're similar to
makefiles, which SAS has.  Granted, makefiles are a bit weird, but since
SAS was partially targeted at people coming from UN*X, it's understandable.

>- Compiler and linker should be integrated in a package of around 200/300K
>  size. 

There's no provision for this yet.

>  The editor can possibly be integrated using ARexx so that a user
>  can decide which editor to use. LSE goes a bit in this direction
>  but since  I like to use Cygnus Ed 2.0, this is a problem.

Version 5.10 comes with a program that will let you invoke CED instead of
LSE after the compiler finds an error.  I guess the problem with using
other editors is that they may not have the same AREXX commands for
moving to a specific line or whatever.  

>- Why did the size of Manx Aztec increase from 70K to 150K when going
>  from version 3.6 to 5.0? This seems an indication that the compiler
>  code is growing too large (like the Apple Finder 7.0!). So, please
>  reduce the compiler size of lc1, lc2 and cc.
>- The compiling process can be done faster (see Turbo-C/ST 2.0).

Don't know about these two, though the compiler tends to increase in
speed with each release.

>- The libraries of Lattice and Manx should be reworked. 

This is a bit annoying, both in terms of disk space and getting the
command line straight.  Not being a compiler guru I don't know how
easy it would be for them to fix this; if they ever got around to
integrating the compiler and the linker, they might find a solution.
In the interim, the Intuitionized interface to the compiler will keep
track of all the libraries for you.

>- Why not put all variables and arguments in registers? If
>  stack args are really needed (eg varargs), we can use __stdargs or
>  something comparable. 

This is just a matter of compiling with the -rr option and linking with
the right library.

>  Please always use a small data model. If a table
>  becomes too large, the compiler should notice this and change the
>  addressing of this (and only this) table to a large model.

In version 5.10, LC1 defaults to "near" on everything; if it runs out
of room, it starts making things "far".  (I assume this is what you
mean by "small data model".)  Also, blink has a SMALLDATA option for
merging all the near data into one hunk.
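
For what it's worth, you can also mark the big tables yourself.  Something
like this, assuming the __far/__near keywords work the way I understand
them to (check the manual before relying on the details):

    __far long bigtable[32768];  /* too big for the 64K near section: 32-bit absolute addressing */
    long counter;                /* stays "near": a 16-bit offset from the A4 base register */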

>- Last but not least, the library routines should be as fast as possible.
>  Now I sometimes have to wonder whether the library routine I'm using
>  is fast enough (memcpy).

Don't know about this.  In the interim, of course, you can use the Exec
routine CopyMemQuick().

There are many improvements to be made to SAS C, but I think they're
getting a little better.  I do hope they add more to their
Intuitionized interface, and integrate it with the makefiles (right
now you can make it so it only compiles files that have been recently
changed, but it won't take dependencies into account).


nj

dillon@overload.Berkeley.CA.US (Matthew Dillon) (11/13/90)

>In article <1149@teslab.lab.OZ> andrew@teslab.lab.OZ (Andrew Phillips) writes:
>Over the years I have been intrigued by the code generated by
>different C compilers, and have been comparing Lattice C code with
>Aztec C.  From the first it always seemed that Lattice performed more
>optimizations but that Aztec did better simply because of better code
>generation.  Nowadays, they seem to be much closer, producing
>reasonable code with simple optimizations - but there is a lot of
>room for improvement.
>
>Recently I have been comparing Lattice C 5.04, Aztec C 5.0, DICE 2.02
>and PDC 3.34 using several benchmarks.  On disassembling the
> ...
>for this loop (with all optimizations on).
>
>     LATTICE/AZTEC	      DICE		      PDC
>
>     MOVE.W  I,K	      MOVE.W  I,D0	      EXT.L   I
>     ADD.W   I,K	      EXT.L   D0	      EXT.L   I
>			      MOVE,W  K,D1	      MOVE.L  I,D0
>			      EXT.L   D1	      ADD.L   I,D0
>			      ADD.L   D0,D1	      MOVE.L  D0,K
>			      MOVE.W  D1,D3
>     BRA.B   IN	      BRA.B   IN
>
>LOOP LEA     f(A4),A0   LOOP LEA     f(A4),A0   LOOP CMPI.L  #8190,K
>     CLR.B   0(A0,K.W)       ADDA.W  K,A0            BGT.B   OUT
>			      MOVE.B  #0,(A0)         LEA     f(A4),A0
>     ADD.W   I,K	      ADD.W   I,K	      ADDA.L  K,A0
>IN   CMPI.W  #8190,K	 IN   CMPI.W  #8190,K	      CLR.B   (A0)
>     BLE.B   LOOP	      BLE.B   LOOP	      ADD.L   I,K
>						      BRA.B   LOOP
>						 OUT  ...
>
>I calculated the total 68000 clock cycles for the inner loop
>(excluding initialization) to be: Lattice 48, Aztec 48, DICE 50 and

    Neat!  BTW, DICE now optimizes short adds when the result is also a
    short, the initialization part of the loop generates:

	move.w	D0,D1
	add.w	D0,D1
	bra	IN

    It also optimizes other arithmetic and logical operations that act
    entirely on shorts (DICE is a 32bit-int compiler only, BTW.  In the
    above code, were Aztec and Lattice run in 32bit-int mode? Probably, but
    just wondering...).

    As far as the inner loop goes, I'm kind of proud of DICE in that it
    does a pretty good job without any real optimization at all. Lattice
    and Aztec could actually get more speed out of their code if they did
    not use CLR.  CLR always reads the location before writing a 0.

    PDC looks like it needs a lot of work.

>Andrew.
>--
>Andrew Phillips (andrew@teslab.lab.oz.au) Phone +61 (Aust) 2 (Sydney) 289 8712

					-Matt

--

    Matthew Dillon	    dillon@Overload.Berkeley.CA.US
    891 Regal Rd.	    uunet.uu.net!overload!dillon
    Berkeley, Ca. 94708
    USA

markv@kuhub.cc.ukans.edu (11/14/90)

Dont forget about SAS/Lattice's support for __builtin functions like
memcpy, memset, etc that use inline code rather than function calls.
(By flipping the compiler switch for processor you can also get such
loops to use DBxx loops for 68010 and 32 bit instructions for 68020).

-- 
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Mark Gooderum			Only...		\    Good Cheer !!!
Academic Computing Services	       ///	  \___________________________
University of Kansas		     ///  /|         __    _
Bix:	  markgood	      \\\  ///  /__| |\/| | | _   /_\  makes it
Bitnet:   MARKV@UKANVAX		\/\/  /    | |  | | |__| /   \ possible...
Internet: markv@kuhub.cc.ukans.edu
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

ben@epmooch.UUCP (Rev. Ben A. Mesander) (11/14/90)

>In article <1149@teslab.lab.OZ> andrew@teslab.lab.OZ (Andrew Phillips) writes:
>Over the years I have been intrigued by the code generated by
>different C compilers, and have been comparing Lattice C code with
>Aztec C.  From the first it always seemed that Lattice performed more
>optimizations but that Aztec did better simply because of better code
>generation.  Nowadays, they seem to be much closer, producing
>reasonable code with simple optimizations - but there is a lot of
>room for improvement.

Fascinating! Here's the output that the GCC port I'm working on produces
when the following C code is compiled:
main()
{
	register short i, k;
	int flags[8190];

	i=10;

	for (k = i+i; k <= 8190; k += i)
		flags[k] = 0;
}

#NO_APP
gcc_compiled.:
.text
	.even
.globl _main
_main:
	link a6,#-32760
	movel d2,sp@-
	moveq #10,d2
	moveq #20,d1
L5:
	movew d1,d0
	extl d0
	asll #2,d0
	lea a6@(0,d0:l),a0
	clrl a0@(-32760)
	addw d2,d1
	cmpw #8190,d1
	jle L5
	movel a6@(-32764),d2
	unlk a6
	rts

(Note that GCC uses a different assembler language format than Amigans
 usually use - however, it doesn't look _too_ different)

Before anyone gets excited, I can produce this sort of assembler code,
but I can't do anything with it yet.  No, I'm not the author of the port
either; I'm just an alpha-tester.  Andrew, if you post your entire
test program, I'll compile it for you so that the results are directly
comparable to the Lattice or Aztec runs.

The following cretinous invocation of the GNU compiler was used (the
front-end doesn't work right yet...).  Optimization is turned on.

/cpp test2.c -I include:/compiler_headers |
/cc1 -O -m68000 -msoft-float -o test2.s

>Andrew Phillips (andrew@teslab.lab.oz.au) Phone +61 (Aust) 2 (Sydney) 289 8712

--
| ben@epmooch.UUCP   (Ben Mesander)       | "Cash is more important than |
| ben%servalan.UUCP@uokmax.ecn.uoknor.edu |  your mother." - Al Shugart, |
| !chinet!uokmax!servalan!epmooch!ben     |  CEO, Seagate Technologies   |

dillon@overload.Berkeley.CA.US (Matthew Dillon) (11/15/90)

In article <26893.273fe96d@kuhub.cc.ukans.edu> markv@kuhub.cc.ukans.edu writes:
>Dont forget about SAS/Lattice's support for __builtin functions like
>memcpy, memset, etc that use inline code rather than function calls.
>(By flipping the compiler switch for processor you can also get such
>loops to use DBxx loops for 68010 and 32 bit instructions for 68020).

    Well, actually, while the built-in stuff is cute it is also pretty
    useless in most cases.  For example, the code for a 'full' version of
    setmem()/memset(), movmem()/memmov(), etc.... is pretty big, but also
    can be a hell of a lot faster (using MOVEM's or at least long ops
    instead of char ops).  I think the only real builtin function that
    is useful is, maybe, strlen().  This applies to all processors since
    a DBxx loop using a BYTE transfer size is still a BYTE transfer loop,
    even if all the instructions are cached.
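
    To make "long ops instead of char ops" concrete, compare the two copy
    loops below.  This is only an illustration (not DICE's or anybody's
    actual library code); the second loop assumes the pointers are word
    aligned and the length is a multiple of four:

        /* one byte moved per iteration -- what a naive inline expansion
         * amounts to
         */
        void copy_bytes(char *dst, char *src, unsigned long len)
        {
            while (len--)
                *dst++ = *src++;
        }

        /* one longword moved per iteration -- what a full routine can do
         * once it has checked alignment (a MOVEM burst is better still)
         */
        void copy_longs(char *dst, char *src, unsigned long len)
        {
            while (len >= 4) {
                *(long *)dst = *(long *)src;
                dst += 4;
                src += 4;
                len -= 4;
            }
        }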

    The DBxx loops are nothing more than a simple optimization in my book,
    though one that DICE does not currently do.

    Frankly, I just do not see any advantage and it can be *really*
    confusing.

>--

				    -Matt

    Matthew Dillon	    dillon@Overload.Berkeley.CA.US
    891 Regal Rd.	    uunet.uu.net!overload!dillon
    Berkeley, Ca. 94708
    USA

cedman@golem.ps.uci.edu (Carl Edman) (11/15/90)

In article <dillon.7256@overload.Berkeley.CA.US> dillon@overload.Berkeley.CA.US (Matthew Dillon) writes:

   In article <26893.273fe96d@kuhub.cc.ukans.edu> markv@kuhub.cc.ukans.edu writes:
   >Dont forget about SAS/Lattice's support for __builtin functions like
   >memcpy, memset, etc that use inline code rather than function calls.
   >(By flipping the compiler switch for processor you can also get such
   >loops to use DBxx loops for 68010 and 32 bit instructions for 68020).

       Well, actually, while the built-in stuff is cute it is also pretty
       useless in most cases.  For example, the code for a 'full' version of
       setmem()/memset(), movmem()/memmov(), etc.... is pretty big, but also
       can be a hell of a lot faster (using MOVEM's or at least long ops
       instead of char ops).  I think the only real builtin function that
       is useful is, maybe, strlen().  This applies to all processors since
       a DBxx loop using a BYTE transfer size is still a BYTE transfer loop,
       even if all the instructions are cached.

       The DBxx loops are nothing more than a simple optimization in my book,
       though one that DICE does not currently do.

       Frankly, I just do not see any advantage and it can be *really*
       confusing.

It may well be true that really optimal memmove() functions are quite
large.  But most of that complexity results from an analysis of the
parameters and choosing the corresponding algorithm to deal optimally
with these parameters (e.g. overlapping/non-overlapping memory areas,
odd/word-even/long-word addresses/lengths, downward/upward copy,
large arrays/small arrays, and so on).  Each combination of these
parameters requires a different routine to be optimal.  So the code
which analyses the parameters and the different codes for different
parameter sets make up most of the code.  But now imagine a C compiler
which does the parameter analysis (as far as possible) at compile time
and only inserts the 'correct' routine for these parameter sets.

I think you will have to admit that in this case you could have significant
speedups and space savings.
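
To give a toy illustration of the kind of thing I mean (not something any
existing compiler is claimed to do; memmove() takes (dest, src, len) as in
ANSI):

    #include <string.h>

    static long a[4], b[4];

    void with_library_call(void)
    {
        /* the general routine tests overlap, alignment and length at run time */
        memmove(a, b, sizeof(a));
    }

    void what_the_compiler_could_emit(void)
    {
        /* everything about the parameters is known, so the tests collapse */
        a[0] = b[0];
        a[1] = b[1];
        a[2] = b[2];
        a[3] = b[3];
    }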

        Carl Edman




Theoretical Physicist, N.: A physicist whose | Send mail
existence is postulated, to make the numbers |  to
balance but who is never actually observed   | cedman@golem.ps.uci.edu
in the laboratory.                           | edmanc@uciph0.ps.uci.edu

jeh@sisd.kodak.com (Ed Hanway) (11/16/90)

dillon@overload.Berkeley.CA.US (Matthew Dillon) writes:
>    Well, actually, while the built-in stuff is cute it is also pretty
>    useless in most cases.  For example, the code for a 'full' version of
>    setmem()/memset(), movmem()/memmov(), etc.... is pretty big, but also
>    can be a hell of a lot faster (using MOVEM's or at least long ops
>    instead of char ops).  I think the only real builtin function that
>    is useful is, maybe, strlen().

__builtin_strlen() is definitely useful. SAS/C evaluates 
strlen("string constant") at compile time.  I don't know how good the
builtin versions of the other functions are, but if the compiler knows
anything about arguments to the function at compile time, it can
use different versions of the move/set code for small/large size,
aligned/unaligned, etc.  If no information is available, it can always
call the general routine.

Also, because the compiler knows about any side-effects of the builtin, it
can optimize the code around the builtin more than if it was just an arbitrary
function call.
--
Ed Hanway --- uunet!sisd!jeh
Some of the trademarks mentioned in this product are for identification
purposes only.  All models are over 18 years of age.

micke@slaka.sirius.se (Mikael Karlsson) (11/16/90)

In message <1990Nov12.164804.5490@agate.berkeley.edu>,
    nj@teak.berkeley.edu writes:

>>  The editor can possibly be integrated using ARexx so that a user
>>  can decide which editor to use. LSE goes a bit in this direction
>>  but since  I like to use Cygnus Ed 2.0, this is a problem.
>
>Version 5.10 comes with a program that will let you invoke CED instead of
>LSE after the compiler finds an error.  I guess the problem with using
>other editors is that they may not have the same AREXX commands for
>moving to a specific line or whatever.

The readme file for this program tells you to add the switch that
turns off ANSI-sequences in error messages.

I can't find any mention of this switch in the documentation.
Can anybody tell me what it looks like?

Thanks.

Kevin Morwood <EETY1478@Ryerson.CA> (11/16/90)

Actually one definite problem I've had with __builtins is that when
you want to do something like:

qsort(&base,num,size,strcmp);

The compiler tries to inline the reference to strcmp and then bitches
rather profusely because it doesn't like it.  Then you have to undefine the
builtin at the point of the qsort call and redefine it afterward
(if you want to get the benefit of ANY of the __builtin availability).

Generally...unimpressed.

dillon@overload.Berkeley.CA.US (Matthew Dillon) (11/18/90)

In article <1990Nov15.170810.5868@sisd.kodak.com> jeh@sisd.kodak.com (Ed Hanway) writes:
>dillon@overload.Berkeley.CA.US (Matthew Dillon) writes:
>>    Well, actually, while the built-in stuff is cute it is also pretty
>>    useless in most cases.  For example, the code for a 'full' version of
>>    setmem()/memset(), movmem()/memmov(), etc.... is pretty big, but also
>>    can be a hell of a lot faster (using MOVEM's or at least long ops
>>    instead of char ops).  I think the only real builtin function that
>>    is useful is, maybe, strlen().
>
>__builtin_strlen() is definitely useful. SAS/C evaluates
>strlen("string constant") at compile time.  I don't know how good the
>builtin versions of the other functions are, but if the compiler knows
>anything about arguments to the function at compile time, it can
>use different versions of the move/set code for small/large size,
>aligned/unaligned, etc.  If no information is available, it can always
>call the general routine.

    Uh, I NEVER use strlen() on a string constant.  That's the most
    ridiculous thing I've ever heard of in my life!

    I use (sizeof("string-constant") - 1).  And, if you are worried about
    things looking 'neat', simply write a little preprocessor macro to do
    it.
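
    For the record, such a macro is a one-liner (call it whatever you like):

        /* length of a string literal, minus the terminating NUL */
        #define CONSTLEN(s)     (sizeof(s) - 1)

        Write(Output(), "hello, world\n", CONSTLEN("hello, world\n"));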

    No, I was thinking strlen() is useful as a builtin function -- one of
    the few -- because it takes just a little more code than the equivalent
    call (push/jsr/addq.l#4,sp) would take.

>Also, because the compiler knows about any side-effects of the builtin, it
>can optimize the code around the builtin more than if it was just an arbitrary
>function call.

    About the only thing the compiler can optimize are the stupid mistakes
    either it or the programmer makes ... useless, because only relatively
    good programmers are worried about such trivial optimizations and they
    do not make the mistakes in the first place.

    Many optimizations fall into that category .. reading the description
    makes you feel good because your compiler is 'optimizing' but the
    reality is that they do not do a pittling thing.  Inexperienced
    programmers do not code well enough for them to make much of a
    difference, and experienced programmers code well enough that they do
    not make much of a difference either.  Of course, there are many, many
    optimizations that *do* do major good things, but a small plethora of
    'built in' functions is not one of them.  Really a huge waste of time;
    as far as I can see, Lattice would have spent their time better on other
    optimizations.

					-Matt


>--
>Ed Hanway --- uunet!sisd!jeh
>Some of the trademarks mentioned in this product are for identification
>purposes only.  All models are over 18 years of age.

--


    Matthew Dillon	    dillon@Overload.Berkeley.CA.US
    891 Regal Rd.	    uunet.uu.net!overload!dillon
    Berkeley, Ca. 94708
    USA

dillon@overload.Berkeley.CA.US (Matthew Dillon) (11/18/90)

In article <CEDMAN.90Nov14221934@lynx.ps.uci.edu> cedman@golem.ps.uci.edu (Carl Edman) writes:
>
>That e.g. memmove() functions which are really optimal are quite large
>might be true. But most of that complexity results from an analysis
>of the parameters and choosing the corresponding algorithm to deal
>...
>make up most of the code. But now imagine a C compiler which does
>the parameter analysis (as far as possible) at compile time and only
>inserts the 'correct' routine for these parameter sets.
>
>I think you will have to admit that in this case you could have significant
>speedups and space savings.
>
>	 Carl Edman

--

    Uh huh, right.	movmem(s, d, len)

    Now, unless all the parameters are basically globals of known alignment
    AND the 'length' is a constant, the compiler will not be able to make
    any major assumptions about the copy.

    Basically, a movmem would have to use the same operational guidelines
    as a structural assignment for the compiler to be able to make any
    assumptions, and then you might as well use a structure / structure
    assignment rather than a movmem.
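
    Put another way, the one case where the compiler really could
    specialize the copy is exactly the case where you can write the
    assignment yourself.  An illustration (not any particular compiler's
    output; movmem's argument order follows the movmem(s, d, len) above):

        struct Foo { long a, b, c, d; };

        /* the compiler knows the size and alignment here ...             */
        void copy_by_assignment(struct Foo *d, struct Foo *s)
        {
            *d = *s;
        }

        /* ... and knows exactly as much here, so the call buys nothing   */
        void copy_by_movmem(struct Foo *d, struct Foo *s)
        {
            movmem(s, d, sizeof(*d));
        }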

    Unless  you can find several optimizable examples that would be
    *WIDELY* used in random source code (at least as a percentage of
    movmem()s in the source code), my opinion will not change :-)

					    -Matt

    Matthew Dillon	    dillon@Overload.Berkeley.CA.US
    891 Regal Rd.	    uunet.uu.net!overload!dillon
    Berkeley, Ca. 94708
    USA

limonce@pilot.njin.net (Tom Limoncelli) (11/19/90)

In article <dillon.7260@overload.Berkeley.CA.US> dillon@overload.Berkeley.CA.US (Matthew Dillon) writes:

> In article <1990Nov15.170810.5868@sisd.kodak.com> jeh@sisd.kodak.com (Ed Hanway) writes:
>     Many optimizations fall into that category .. reading the description
>     makes you feel good because your compiler is 'optimizing' but the
>     reality is that they do not do a pittling thing.  Inexperienced
>     programmers do not code well enough for them to make much of a
>     difference, and experienced programmers code well enough that they do
>     not make much of a difference either.  Of course, there are many, many

I write a lot of code that must look like it was written by a
non-"experienced programmer".  I do this on purpose because (1) I feel
it is easier to read and (2) I assume it's syntactic sugar that the
compiler will (internally) rewrite into the "experienced" form.

I guess it's not "macho" to write readable, maintainable code (just
kidding folks, if you want to re-ignite that useless flame war take it
to comp.misc!).  :-) :-)

Anything a compiler company can do to encourage programmers to create
maintainable code should be encouraged.  (Now that's a sentence!)

Does that sway you?

Tom
P.S.  Obligatory ungrateful user question:  I own the non-shareware
version of DICE... so Matt, when will there be a debugger? :-)
-- 
tlimonce@drew.edu     Tom Limoncelli      "Flash!  Flash!  I love you!
tlimonce@drew.bitnet  +1 201 408 5389        ...but we only have fourteen
tlimonce@drew.uucp    limonce@pilot.njin.net       hours to save the earth!"

jeh@sisd.kodak.com (Ed Hanway) (11/19/90)

dillon@overload.Berkeley.CA.US (Matthew Dillon) writes:
>    Uh, I NEVER use strlen() on a string constant.  That's the most
>    ridiculous thing I've ever heard of in my life!
>
>    I use (sizeof("string-constant") - 1).  And, if you are worried about
>    things looking 'neat', simply write a little preprocessor macro to do
>    it.

I'd never use strlen("constant") either, unless I knew that it was
evaluated at compile time. I have used (albeit in a toy program):

#define SAY(s)	Write(backstdout, s, strlen(s))

which works for both SAY("const") and SAY(var).

>                                   ... Of course, there are many, many
>    optimizations that *do* do major good things, but a small plethora of
>    'built in' functions is not one of them.  Really a huge waste of time;
>    As far as I can, lattice would have spent their time better on other
>    optimizations.

I tend to agree that builtin functions _by themselves_ are not much good,
but in combination with a good optimizer I think that they are worthwhile,
at the very least because arguments wouldn't need to be stuffed into
specific registers for the function call.
--
Ed Hanway --- uunet!sisd!jeh
Must be 18 or older to play.  Prerecorded for this time zone.
Do not read while operating a motor vehicle or heavy equipment.

bruce@zuhause.MN.ORG (Bruce Albrecht) (11/20/90)

>In article <dillon.7260@overload.Berkeley.CA.US> dillon@overload.Berkeley.CA.US (Matthew Dillon) writes:
>In article <1990Nov15.170810.5868@sisd.kodak.com> jeh@sisd.kodak.com (Ed Hanway) writes:
>>__builtin_strlen() is definitely useful. SAS/C evaluates
>>strlen("string constant") at compile time.  I don't know how good the
>>builtin versions of the other functions are, but if the compiler knows
>>anything about arguments to the function at compile time, it can
>>use different versions of the move/set code for small/large size,
>>aligned/unaligned, etc.  If no information is available, it can always
>>call the general routine.
>
>    Uh, I NEVER use strlen() on a string constant.  That's the most
>    ridiculous thing I've ever heard of in my life!
>
>    I use (sizeof("string-constant") - 1).  And, if you are worried about
>    things looking 'neat', simply write a little preprocessor macro to do
>    it.

If the string constant is created via #define, it's probably not a good
idea to use sizeof() to get its length.  The #define could later be
replaced by a char array, and sizeof() would no longer produce the
correct result if the actual string length were smaller than the size
of the array.  The sizeof would still be syntactically correct, and
possibly difficult to locate.
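
For example:

    #define GREETING "hello"          /* sizeof(GREETING) - 1 == 5, as intended     */

    char greeting[40] = "hello";      /* sizeof(greeting) - 1 == 39, silently wrong */
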
--


bruce@zuhause.mn.org	   

GIAMPAL@auvm.auvm.edu (11/21/90)

In article <1990Nov19.130657.19380@sisd.kodak.com>, jeh@sisd.kodak.com (Ed
Hanway) says:
>dillon@overload.Berkeley.CA.US (Matthew Dillon) writes:
>>    Uh, I NEVER use strlen() on a string constant.  That's the most
>>    ridiculous thing I've ever heard of in my life!

>I'd never use strlen("constant") either, unless I knew that it was
>evaluated at compile time. I have used (albeit in a toy program):
>
>#define SAY(s)  Write(backstdout, s, strlen(s))
>
>which works for both SAY("const") and SAY(var).
Yes this does work, but if the string is a constant, then you get a duplicate
copy of the string put in your data segment.  I use :

#define MSG(s) { char *s; Write(Output(), s, strlen(s)); }

(note: those are supposed to be curly braces, but this is an APL keyboard,
       so I don't know what it will be on your screen)


This way you only get one copy of the string constant, and you get a nice
function.  BTW, don't use MSG() if you are running from WB with no
output file handle, you'll hang (a definite bug in WB, IMHO).

--dominic

jeh@sisd.kodak.com (Ed Hanway) (11/21/90)

GIAMPAL@auvm.auvm.edu writes:
>In article <1990Nov19.130657.19380@sisd.kodak.com>, jeh@sisd.kodak.com (Ed
>Hanway) says:
>>#define SAY(s)  Write(backstdout, s, strlen(s))

>#define MSG(s) { char *s; Write(Output(), s, strlen(s)); }
>
>This way you only get one copy of the string constant, and you get a nice
>function.

I guess you really mean

#define MSG(s)	{ char *tmp = s; Write(whatever, tmp, strlen(tmp)); }

which is fine, but my version was posted as an example of when a builtin
version of strlen() came in handy.  In Lattice (now SAS) C, strlen("constant")
is evaluated at compile time, so, using my version, SAY("foo") would
compile as Write(backstdout, "foo", 3).  This not only eliminates the
extra copy of the string constant, it eliminates the strlen() operation
altogether.

--
Ed Hanway --- uunet!sisd!jeh
This message is packed as full as practicable by modern automated equipment.
Contents may settle during shipment.

mwm@raven.relay.pa.dec.com (Mike (My Watch Has Windows) Meyer) (11/22/90)

In article <90324.204949GIAMPAL@auvm.auvm.edu> GIAMPAL@auvm.auvm.edu writes:
   In article <1990Nov19.130657.19380@sisd.kodak.com>, jeh@sisd.kodak.com (Ed
   Hanway) says:
   >I'd never use strlen("constant") either, unless I knew that it was
   >evaluated at compile time. I have used (albeit in a toy program):
   >
   >#define SAY(s)  Write(backstdout, s, strlen(s))
   >
   >which works for both SAY("const") and SAY(var).
   Yes this does work, but if the string is a constant, then you get a
   duplicate copy of the string put in your data segment.  I use :

   #define MSG(s) { char *s; Write(Output(), s, strlen(s)); }

So use the flag on your compiler that forces identical string
constants into the same string. That way, you don't have the
extraneous variables (and manipulations thereof) in your code.

Or take the approach of one programmer I knew - he never coded string
constants, except for the place where he defined a variable to point
to them. But the compiler can do that for you.
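
Which is to say, something like this (my example, not his actual code):

    static char hello_msg[] = "hello, world\n";  /* the only place the literal appears */

    void greet_twice(void)
    {
        Write(Output(), hello_msg, sizeof(hello_msg) - 1);
        Write(Output(), hello_msg, sizeof(hello_msg) - 1);  /* still only one copy */
    }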

	<mike
--
Cats will be cats and cats will be cool			Mike Meyer
Cats can be callous and cats can be cruel		mwm@relay.pa.dec.com
Cats will be cats, remember this words!			decwrl!mwm
Cats will be cats and cats eat birds.

andrew@teslab.lab.OZ (Andrew Phillips) (11/22/90)

In article <dillon.7176@overload.Berkeley.CA.US> dillon@overload.Berkeley.CA.US (Matthew Dillon) writes:
>>In article <1149@teslab.lab.OZ> andrew@teslab.lab.OZ (Andrew Phillips) writes:
>    In the above code was Aztec and Lattice run in 32bit-int modes?
>    Probably, but just wondering...).

They used 32 bit ints (the default).  But this wouldn't matter, would
it since the program only used shorts (16 bits) not ints.

BTW I used no compiler options except to turn on maximum
optimizations.  With no command line options at all (i.e. all
defaults) DICE did better than both Lattice and Aztec. 

>    As far as the inner loop goes, I'm kind of proud of DICE in that it
>    does a pretty good job without any real optimization at all. Lattice
>    and Aztec could actually get more speed out of their code if they did
>    not use CLR.  CLR always reads the location before writing a 0.

According to my interpretation of the Motorola M68000 Microprocessor
User's Manual (8th edition) pages 8-2 and 8-6 both instructions (i.e.
CLR.B 0(A0,D0.W) and MOVE.B #0,0(A0,D0.W) ) take 18 clock cycles.  Of
course the CLR instruction is never BETTER than the equivalent MOVE #0.

Andrew.
-- 
Andrew Phillips (andrew@teslab.lab.oz.au) Phone +61 (Aust) 2 (Sydney) 289 8712

dolf@idca.tds.PHILIPS.nl (Dolf Grunbauer) (11/22/90)

In article <1990Nov21.131206.2634@sisd.kodak.com> jeh@sisd.kodak.com (Ed Hanway) writes:
>GIAMPAL@auvm.auvm.edu writes:
>>In article <1990Nov19.130657.19380@sisd.kodak.com>, jeh@sisd.kodak.com (Ed
>>Hanway) says:
>>>#define SAY(s)  Write(backstdout, s, strlen(s))
>>#define MSG(s) { char *s; Write(Output(), s, strlen(s)); }
>>This way you only get one copy of the string constant, and you get a nice
>>function.
>I guess you really mean
>#define MSG(s)	{ char *tmp = s; Write(whatever, tmp, strlen(tmp)); }
>which is fine, but my version was posted as an example of when a builtin
>version of strlen() came in handy.  In Lattice (now SAS) C, strlen("constant")
>is evaluated at compile time, so, using my version, SAY("foo") would
>compile as Write(backstdout, "foo", 3).
Our compiler (for Unix, not for my Amiga) allows 'sizeof("constant")'
which is eliminated at compile time. This obviously doesn't work for dynamic
strings.
-- 
   _ _ 
  / U |  Dolf Grunbauer  Tel: +31 55 433233 Internet dolf@idca.tds.philips.nl
 /__'<   Philips Information Systems        UUCP     ...!mcsun!philapd!dolf
88  |_\  If you are granted one wish do you know what to wish for right now ?

jmeissen@oregon.oacis.org ( Staff OACIS) (11/23/90)

In article <90324.204949GIAMPAL@auvm.auvm.edu> GIAMPAL@auvm.auvm.edu writes:
>>#define SAY(s)  Write(backstdout, s, strlen(s))
>>which works for both SAY("const") and SAY(var).
>Yes this does work, but if the string is a constant, then you get a duplicate
>copy of the string put in your data segment.  I use :

Not if you are using SAS/Lattice :-)  The Lattice compiler has an option
that will cause it to generate only a single copy of duplicate string
constants (it should be the default, IMHO.  It isn't because of Unix
weirdos who modify string constants).

dillon@overload.Berkeley.CA.US (Matthew Dillon) (11/27/90)

In article <90324.204949GIAMPAL@auvm.auvm.edu> GIAMPAL@auvm.auvm.edu writes:
>
>>I'd never use strlen("constant") either, unless I knew that it was
>>evaluated at compile time. I have used (albeit in a toy program):
>>
>>#define SAY(s)  Write(backstdout, s, strlen(s))
>>
>>which works for both SAY("const") and SAY(var).
>Yes this does work, but if the string is a constant, then you get a duplicate
>copy of the string put in your data segment.  I use :
>
>#define MSG(s) { char *s; Write(Output(), s, strlen(s)); }
>
>(note: those are supposed to be curly braces, but this is an APL keyboard,
>	so I don't know what it will be on your screen)
>
>
>This way you only get one copy of the string constant, and you get a nice
>function.  BTW, don't use MSG() if you are running from WB with no
>output file handle, you'll hang (a definite bug in WB, IMHO).
>
>--dominic

    I generally do this:

    Say(s)
    char *s;
    {
	return(Write(Output(), s, strlen(s)));
    }

    Or the equivalent, which takes much less room (in terms of code size).
    Also, Write() and Output() take so much overhead that the difference in
    execution speed of the subroutine, Say(), versus the SAY macro would
    not be noticeable.

    builtins are generally useless and I sometimes wonder if Lattice added
    them simply to optimize inefficiencies in their own source code.

					-Matt

--


    Matthew Dillon	    dillon@Overload.Berkeley.CA.US
    891 Regal Rd.	    uunet.uu.net!overload!dillon
    Berkeley, Ca. 94708
    USA

dillon@overload.Berkeley.CA.US (Matthew Dillon) (11/27/90)

In article <530@ssp9.idca.tds.philips.nl> dolf@idca.tds.PHILIPS.nl (Dolf Grunbauer) writes:
>
>Our compiler (for Unix, not for my Amiga) allows 'sizeof("constant")'
>which is eliminated at compile time. This obviously doesn't work for dynamic
>strings.
>--
>   _ _
>  / U |  Dolf Grunbauer  Tel: +31 55 433233 Internet dolf@idca.tds.philips.nl
> /__'<   Philips Information Systems        UUCP     ...!mcsun!philapd!dolf
>88  |_\  If you are granted one wish do you know what to wish for right now ?

--

    Actually, *all* compilers do sizeof("constant") at compile time.
    Unfortunately, many also declare storage for the string even though it
    is never referenced.  That has always amused me.

    Perhaps you were talking about strlen("constant"); ??  This whole
    argument is over builtins and compiler optimization of said.

					    -Matt

    Matthew Dillon	    dillon@Overload.Berkeley.CA.US
    891 Regal Rd.	    uunet.uu.net!overload!dillon
    Berkeley, Ca. 94708
    USA

dillon@overload.Berkeley.CA.US (Matthew Dillon) (11/28/90)

In article <1159@teslab.lab.OZ> andrew@teslab.lab.OZ (Andrew Phillips) writes:
>In article <dillon.7176@overload.Berkeley.CA.US> dillon@overload.Berkeley.CA.US (Matthew Dillon) writes:
>>>In article <1149@teslab.lab.OZ> andrew@teslab.lab.OZ (Andrew Phillips) writes:
>>    In the above code was Aztec and Lattice run in 32bit-int modes?
>>    Probably, but just wondering...).
>
>They used 32 bit ints (the default).  But this wouldn't matter, would
>it since the program only used shorts (16 bits) not ints.

    Actually, it can matter.  A 16 bit compiler does not need to optimize
    arithmetic expressions at all.  A 32 bit compiler such as Lattice / DICE
    must start out by virtually EXTing the 16 bit quantities to 32 bits
    and then optimize them back down to 16 bits.  There are many expressions
    that cannot be optimized.  It does not matter whether, for 16 bit
    compilation, one uses 'short' or 'int'; they are identical (just as
    'int' and 'long' are identical for 32 bit compilers).

    For example:

	short a, b;
	foo (a * b);    /*  16 bit argument for 16 bit compiler, 32 bit
			 *  argument for 32 bit compiler
			 */


    Another good example is array indexing.  Most IBM C compilers use 16
    bit indexes for array indexing and ignore possible overflows (e.g. when
    you have an array of 65536 shorts).  Turbo C on the IBM is even worse...
    it uses a 16 bit array index even if you use a long quantity as the
    index!  As far as I know, all Amiga C compilers (Aztec,Lattice,DICE),
    use 32 bit array indexes when forced to multiply by 2 or more for
    non-char arrays.
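
    For example (just an illustration):

        long table[20000];      /* 80000 bytes of data */

        long fetch(short i)
        {
            /* the index has to be sign extended to 32 bits and scaled by 4;
             * a 16 bit scaled offset would overflow for large values of i
             */
            return(table[i]);
        }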

    Yet another is the question of bit field packing ... do you pack into
    16 bit fields when ints are 16 bits, even if your compiler supports
    32 bit ints (i.e. disallow bit fields wider than 16 bits when in
    16 bit mode)?

    These are simple examples, of course, but you get the idea.  There are
    many differences between 16 bit and 32 bit implementations.  I'm not
    arguing or anything, I'm rather pleased myself!

>BTW I used no compiler options except to turn on maximum
>optimizations.  With no command line options at all (i.e. all
>defaults) DICE did better than both Lattice and Aztec.

    That is very interesting.  One of DICE's big points is that compilation
    time is at least as fast as Aztec (haven't tested it formally).  It is
    definitely much faster than Lattice.  Lattice has always taken a while
    to do compiles, and with the -O option it goes even slower! Personally,
    I've never trusted Lattice's -O option, having been bitten by
    bugs in previous versions of Lattice C (5.02). I dunno about later
    versions because I stopped using the option.  I have no experience with
    Aztec's optimization options.

>Andrew.
>--
>Andrew Phillips (andrew@teslab.lab.oz.au) Phone +61 (Aust) 2 (Sydney) 289 8712

			    -Matt

--


    Matthew Dillon	    dillon@Overload.Berkeley.CA.US
    891 Regal Rd.	    uunet.uu.net!overload!dillon
    Berkeley, Ca. 94708
    USA

ben@epmooch.UUCP (Rev. Ben A. Mesander) (12/01/90)

>In article <dillon.7352@overload.Berkeley.CA.US> dillon@overload.Berkeley.CA.US (Matthew Dillon) writes:
[discussion of various builtins in SAS C]
>    builtins are generally useless and I sometimes wonder if Lattice added
>    them simply to optimize inefficiencies in their own source code.
>
Most of them seem to be rather useless. However, the builtin printf stuff
can really cut down code size, because linking with the huge general
purpose printf is not necessary. I think that it's probably best to
inline calls to strlen and movmem if you have chosen to optimise speed
over space.

>    Matthew Dillon	    dillon@Overload.Berkeley.CA.US

--
| ben@epmooch.UUCP   (Ben Mesander)       | "Cash is more important than |
| ben%servalan.UUCP@uokmax.ecn.uoknor.edu |  your mother." - Al Shugart, |
| !chinet!uokmax!servalan!epmooch!ben     |  CEO, Seagate Technologies   |