andrew@teslab.lab.OZ (Andrew Phillips) (11/08/90)
Over the years I have been intrigued by the code generated by different C
compilers, and have been comparing Lattice C code with Aztec C.  From the
first it always seemed that Lattice performed more optimizations but that
Aztec did better simply because of better code generation.  Nowadays, they
seem to be much closer, producing reasonable code with simple optimizations
- but there is a lot of room for improvement.

Recently I have been comparing Lattice C 5.04, Aztec C 5.0, DICE 2.02 and
PDC 3.34 using several benchmarks.  On disassembling the innermost loop of
the sieve of Eratosthenes I found that the four compilers had generated the
code shown below.  The C code for this loop was:

	register short i, k;
	...
	for (k = i + i; k <= 8190; k += i)
		flags[k] = 0;

In the assembler code below the first part is the loop initialization
(k = i + i), and the names I and K represent the data registers
corresponding to the variables i and k.  Interestingly, Lattice and Aztec
took the same time in the benchmark and generated the same code for this
loop (with all optimizations on).

  LATTICE/AZTEC            DICE                     PDC

     MOVE.W I,K               MOVE.W I,D0              EXT.L  I
     ADD.W  I,K               EXT.L  D0                EXT.L  I
                              MOVE.W K,D1              MOVE.L I,D0
                              EXT.L  D1                ADD.L  I,D0
                              ADD.L  D0,D1             MOVE.L D0,K
                              MOVE.W D1,D3
     BRA.B  IN                BRA.B  IN

LOOP LEA    f(A4),A0     LOOP LEA    f(A4),A0     LOOP CMPI.L #8190,K
     CLR.B  0(A0,K.W)         ADDA.W K,A0              BGT.B  OUT
                              MOVE.B #0,(A0)           LEA    f(A4),A0
     ADD.W  I,K               ADD.W  I,K               ADDA.L K,A0
IN   CMPI.W #8190,K      IN   CMPI.W #8190,K           CLR.B  (A0)
     BLE.B  LOOP              BLE.B  LOOP              ADD.L  I,K
                                                       BRA.B  LOOP
                                                  OUT  ...

I calculated the total 68000 clock cycles for the inner loop (excluding
initialization) to be: Lattice 48, Aztec 48, DICE 50 and PDC 64.  These
correspond roughly to the ratios of run times that I got when timing the
whole program.

Even with all optimizations on, both Lattice and Aztec left the first
instruction of the loop inside the loop despite the fact that it is "loop
invariant".  They also seem to make poor use of the available registers.
It is interesting to note that PDC appears to treat shorts as 32-bit
quantities, like ints and longs.  It also seems that BOTH of the lines
with "EXT.L I" are redundant, as I is already 32 bits.

So I think Lattice and Aztec still have work to do.  I hope someone finds
this of interest.

Andrew.
-- 
Andrew Phillips (andrew@teslab.lab.oz.au)  Phone +61 (Aust) 2 (Sydney) 289 8712
dolfing@cs.utwente.nl (Hans Dolfing) (11/12/90)
In article 2033 of comp.sys.amiga.tech, andrew@teslab.lab.OZ (Andrew
Phillips) said:
>Over the years I have been intrigued by the code generated by
>different C compilers, and have been comparing Lattice C code with
>Aztec C. From the first it always seemed that Lattice performed more
>optimizations but that Aztec did better simply because of better code
>generation. Nowadays, they seem to be much closer, producing
>reasonable code with simple optimizations - but there is a lot of
>room for improvement.

Hello everybody,

My name is Hans Dolfing.  I am a Computer Science student, currently
graduating.  Of course I own an Amiga.  When I read the article above,
some thoughts and ideas came up which may be interesting for everybody.

Although the C compilers of Lattice 5.05 and Manx 5.0 work fine, they can
definitely be improved.  Maybe some people on the net are aware of
Borland Turbo-C 2.0 for the Atari ST.  If not, you should take a look at
it.  It is simply the best C compiler I have ever seen.  Comparing this
compiler to Lattice (now SAS) and Manx for the Amiga, you can see that
Turbo-C beats them on almost everything.  Turbo-C is:

- An integrated package which works fine.
- It compiles 4 times faster than Lattice and Manx (4000 lines/min).
- The produced code is better.  Especially the register allocation
  strategy is exciting.
- The library routines are good!  Just look at the code of 'memcpy'.

Why isn't there a firm that makes such a compiler for the Amiga?  Maybe
SAS and Manx should think about the following proposals:

- The user interface (which user interface?) can be improved.  Options
  should be enabled/disabled by clicking on them, just like it is done in
  Turbo-C++/PC.
- I like the project files of Turbo-C/ST and Turbo-C++/PC.  Maybe this
  can be implemented too.
- Compiler and linker should be integrated in a package of around
  200-300K in size.  The editor can possibly be integrated using ARexx
  so that a user can decide which editor to use.
  LSE goes a bit in this direction, but since I like to use Cygnus Ed
  2.0, this is a problem.
- Why did the size of Manx Aztec increase by 70K to 150K when going from
  version 3.6 to 5.0?  This seems an indication that the compiler code is
  growing too large (like the Apple Finder 7.0!).  So, please reduce the
  compiler size of lc1, lc2 and cc.
- The compiling process can be done faster (see Turbo-C/ST 2.0).
- The libraries of Lattice and Manx should be reworked.  Why do we need
  different libraries for register and stack variables/arguments?  Can't
  the compiler keep track of this?  The same is true for 16- and 32-bit
  integers and for small and large data/code models.  Can't the compiler
  keep some marks which are finally written into the generated .o file so
  that the linker knows which size/model to use?  Summarizing, it seems
  to me that at least 6 libraries are superfluous!
- Why not put all variables and arguments in registers?  If stack args
  are really needed (e.g. varargs), we can use __stdargs or something
  comparable.  Please always use a small data model.  If a table becomes
  too large, the compiler should notice this and change the addressing of
  this (and only this) table to a large model.
- Last but not least, the library routines should be as fast as possible.
  Now I sometimes have to worry whether the library routine I use is fast
  enough (memcpy).

P.S. I'm not trying to put down the compilers of SAS and Manx.  I'm just
wondering why there are compilers on other machines (Turbo-C 2.0/ST and
Turbo-C++/PC) which have a really good user interface, work fine and
produce good code.  Therefore, I gave some hints which may help the
'compiler-builders' to improve their products and to produce even more
professional software packages (especially user interfaces, which will
hopefully be improved under OS 2.0) for this wonderful machine named
AMIGA.

P.S. 2: Maybe it is a nice idea to put together all good
ideas/improvements on the net and send them to SAS and Manx?
---------------------------
Greetings,
	Hans Dolfing  (dolfing@cs.utwente.nl)
nj@teak.berkeley.edu (...) (11/13/90)
[I was originally going to respond to this via email, but it bounced.
Hopefully n-million other people won't post followups as well.]

I've seen Turbo C on an Itty Bitty Machine, and the integrated
environment is nice.  However, some of your objections have in fact been
addressed to some extent with SAS C 5.10, and (at least according to what
they say in the documentation) will be improved even more in the next
release.

>- The user interface (which user interface?) can be improved.  Options
>  should be enabled/disabled by clicking on them, just like it is done
>  in Turbo-C++/PC.

SAS C 5.10 comes with an Intuitionized tool that can set most compiler
options.  The interface has 2.0-style gadgets (even under 1.3 - I don't
know how they pulled this off; maybe they grabbed the gadtools code from
2.0 and put it directly in the program) and makes sure everything fits
together right (e.g. if you select registerized parameters, it'll link
with the registerized library, etc.).  It's not perfect yet, but they
acknowledge this in the dox, and promise to improve it in the future.

>- I like the project files of Turbo-C/ST and Turbo-C++/PC.  Maybe this
>  can be implemented too.

I don't know what project files are, but I assume they're similar to
makefiles, which SAS has.  Granted, makefiles are a bit weird, but since
SAS was partially targeted at people coming from UN*X, it's
understandable.

>- Compiler and linker should be integrated in a package of around
>  200-300K in size.

There's no provision for this yet.

>  The editor can possibly be integrated using ARexx so that a user
>  can decide which editor to use.  LSE goes a bit in this direction,
>  but since I like to use Cygnus Ed 2.0, this is a problem.

Version 5.10 comes with a program that will let you invoke CED instead of
LSE after the compiler finds an error.  I guess the problem with using
other editors is that they may not have the same ARexx commands for
moving to a specific line or whatever.
>- Why did the size of Manx Aztec increase by 70K to 150K when going from
>  version 3.6 to 5.0?  This seems an indication that the compiler code
>  is growing too large (like the Apple Finder 7.0!).  So, please reduce
>  the compiler size of lc1, lc2 and cc.
>- The compiling process can be done faster (see Turbo-C/ST 2.0).

Don't know about these two, though the compiler tends to increase in
speed with each release.

>- The libraries of Lattice and Manx should be reworked.

This is a bit annoying, both in terms of disk space and getting the
command line straight.  Not being a compiler guru I don't know how easy
it would be for them to fix this; if they ever got around to integrating
the compiler and the linker, they might find a solution.  In the interim,
the Intuitionized interface to the compiler will keep track of all the
libraries for you.

>- Why not put all variables and arguments in registers?  If stack args
>  are really needed (e.g. varargs), we can use __stdargs or something
>  comparable.

This is just a matter of compiling with the -rr option and linking with
the right library.

>  Please always use a small data model.  If a table becomes too large,
>  the compiler should notice this and change the addressing of this
>  (and only this) table to a large model.

In version 5.10, LC1 defaults to "near" on everything; if it runs out of
room, it starts making things "far".  (I assume this is what you mean by
"small data model".)  Also, blink has a SMALLDATA option for merging all
the near data into one hunk.

>- Last but not least, the library routines should be as fast as
>  possible.  Now I sometimes have to worry whether the library routine
>  I use is fast enough (memcpy).

Don't know about this.  In the interim, of course, you can use the Exec
routine CopyMemQuick().

There are many improvements to be made to SAS C, but I think they're
getting a little better.
I do hope they add more to their Intuitionized interface, and integrate
it with the makefiles (right now you can make it so it only compiles
files that have been recently changed, but it won't take dependencies
into account).

					nj
dillon@overload.Berkeley.CA.US (Matthew Dillon) (11/13/90)
In article <1149@teslab.lab.OZ> andrew@teslab.lab.OZ (Andrew Phillips) writes:
>Over the years I have been intrigued by the code generated by
>different C compilers, and have been comparing Lattice C code with
>Aztec C. From the first it always seemed that Lattice performed more
>optimizations but that Aztec did better simply because of better code
>generation. Nowadays, they seem to be much closer, producing
>reasonable code with simple optimizations - but there is a lot of
>room for improvement.
>
>Recently I have been comparing Lattice C 5.04, Aztec C 5.0, DICE 2.02
>and PDC 3.34 using several benchmarks. On disassembling the
> ...
>for this loop (with all optimizations on).
>
>  LATTICE/AZTEC            DICE                     PDC
>
>     MOVE.W I,K               MOVE.W I,D0              EXT.L  I
>     ADD.W  I,K               EXT.L  D0                EXT.L  I
>                              MOVE.W K,D1              MOVE.L I,D0
>                              EXT.L  D1                ADD.L  I,D0
>                              ADD.L  D0,D1             MOVE.L D0,K
>                              MOVE.W D1,D3
>     BRA.B  IN                BRA.B  IN
>
>LOOP LEA    f(A4),A0     LOOP LEA    f(A4),A0     LOOP CMPI.L #8190,K
>     CLR.B  0(A0,K.W)         ADDA.W K,A0              BGT.B  OUT
>                              MOVE.B #0,(A0)           LEA    f(A4),A0
>     ADD.W  I,K               ADD.W  I,K               ADDA.L K,A0
>IN   CMPI.W #8190,K      IN   CMPI.W #8190,K           CLR.B  (A0)
>     BLE.B  LOOP              BLE.B  LOOP              ADD.L  I,K
>                                                       BRA.B  LOOP
>                                                  OUT  ...
>
>I calculated the total 68000 clock cycles for the inner loop
>(excluding initialization) to be: Lattice 48, Aztec 48, DICE 50 and

    Neat!  BTW, DICE now optimizes short adds when the result is also a
short; the initialization part of the loop now generates:

	move.w	D0,D1
	add.w	D0,D1
	bra	IN

It also optimizes other arithmetic and logical operations that act
entirely on shorts.  (DICE is a 32-bit-int compiler only, BTW.  In the
above code, were Aztec and Lattice run in 32-bit-int modes?  Probably,
but just wondering...)

    As far as the inner loop goes, I'm kind of proud of DICE in that it
does a pretty good job without any real optimization at all.  Lattice and
Aztec could actually get more speed out of their code if they did not use
CLR.  CLR always reads the location before writing a 0.

    PDC looks like it needs a lot of work.

>Andrew.
>--
>Andrew Phillips (andrew@teslab.lab.oz.au) Phone +61 (Aust) 2 (Sydney) 289 8712

					-Matt

--
    Matthew Dillon		dillon@Overload.Berkeley.CA.US
    891 Regal Rd.		uunet.uu.net!overload!dillon
    Berkeley, Ca. 94708
    USA
markv@kuhub.cc.ukans.edu (11/14/90)
Don't forget about SAS/Lattice's support for __builtin functions like
memcpy, memset, etc. that use inline code rather than function calls.
(By flipping the compiler switch for processor you can also get such
loops to use DBxx loops for 68010 and 32-bit instructions for 68020.)
--
Mark Gooderum				Bix:      markgood
Academic Computing Services		Bitnet:   MARKV@UKANVAX
University of Kansas			Internet: markv@kuhub.cc.ukans.edu
ben@epmooch.UUCP (Rev. Ben A. Mesander) (11/14/90)
In article <1149@teslab.lab.OZ> andrew@teslab.lab.OZ (Andrew Phillips) writes:
>Over the years I have been intrigued by the code generated by
>different C compilers, and have been comparing Lattice C code with
>Aztec C. From the first it always seemed that Lattice performed more
>optimizations but that Aztec did better simply because of better code
>generation. Nowadays, they seem to be much closer, producing
>reasonable code with simple optimizations - but there is a lot of
>room for improvement.

Fascinating!  Here's the output that the GCC port I'm working on produces
when the following C code is compiled:

main()
{
	register short i, k;
	int flags[8190];

	i = 10;
	for (k = i + i; k <= 8190; k += i)
		flags[k] = 0;
}

#NO_APP
gcc_compiled.:
.text
	.even
.globl _main
_main:
	link a6,#-32760
	movel d2,sp@-
	moveq #10,d2
	moveq #20,d1
L5:
	movew d1,d0
	extl d0
	asll #2,d0
	lea a6@(0,d0:l),a0
	clrl a0@(-32760)
	addw d2,d1
	cmpw #8190,d1
	jle L5
	movel a6@(-32764),d2
	unlk a6
	rts

(Note that GCC uses a different assembler language format than Amigans
usually use - however, it doesn't look _too_ different.)

Before anyone gets excited: I can produce this sort of assembler code,
but I can't do anything with it yet.  No, I'm not the author of the port
either; I just be an alpha-tester.

Andrew, if you post your entire test program, I'll compile it for you so
that the results are directly comparable to the Lattice or Aztec runs.
The following cretinous invocation of the GNU compiler was used (the
front-end doesn't work right yet...).  Optimization is turned on.

/cpp test2.c -I include:/compiler_headers | /cc1 -O -m68000 -msoft-float -o test2.s

>Andrew Phillips (andrew@teslab.lab.oz.au) Phone +61 (Aust) 2 (Sydney) 289 8712
--
| ben@epmooch.UUCP (Ben Mesander)           | "Cash is more important than |
| ben%servalan.UUCP@uokmax.ecn.uoknor.edu   |  your mother." - Al Shugart, |
| !chinet!uokmax!servalan!epmooch!ben       |  CEO, Seagate Technologies   |
dillon@overload.Berkeley.CA.US (Matthew Dillon) (11/15/90)
In article <26893.273fe96d@kuhub.cc.ukans.edu> markv@kuhub.cc.ukans.edu writes:
>Don't forget about SAS/Lattice's support for __builtin functions like
>memcpy, memset, etc. that use inline code rather than function calls.
>(By flipping the compiler switch for processor you can also get such
>loops to use DBxx loops for 68010 and 32-bit instructions for 68020.)

    Well, actually, while the built-in stuff is cute it is also pretty
useless in most cases.  For example, the code for a 'full' version of
setmem()/memset(), movmem()/memmove(), etc. is pretty big, but also can
be a hell of a lot faster (using MOVEMs or at least long ops instead of
char ops).  I think the only real builtin function that is useful is,
maybe, strlen().

    This applies to all processors, since a DBxx loop using a BYTE
transfer size is still a BYTE transfer loop, even if all the instructions
are cached.  The DBxx loops are nothing more than a simple optimization
in my book, though one that DICE does not currently do.

    Frankly, I just do not see any advantage, and it can be *really*
confusing.

					-Matt

--
    Matthew Dillon		dillon@Overload.Berkeley.CA.US
    891 Regal Rd.		uunet.uu.net!overload!dillon
    Berkeley, Ca. 94708
    USA
cedman@golem.ps.uci.edu (Carl Edman) (11/15/90)
In article <dillon.7256@overload.Berkeley.CA.US> dillon@overload.Berkeley.CA.US
(Matthew Dillon) writes:
>In article <26893.273fe96d@kuhub.cc.ukans.edu> markv@kuhub.cc.ukans.edu writes:
>>Don't forget about SAS/Lattice's support for __builtin functions like
>>memcpy, memset, etc. that use inline code rather than function calls.
>>(By flipping the compiler switch for processor you can also get such
>>loops to use DBxx loops for 68010 and 32-bit instructions for 68020.)
>
>    Well, actually, while the built-in stuff is cute it is also pretty
>useless in most cases.  For example, the code for a 'full' version of
>setmem()/memset(), movmem()/memmove(), etc. is pretty big, but also can
>be a hell of a lot faster (using MOVEMs or at least long ops instead of
>char ops).  I think the only real builtin function that is useful is,
>maybe, strlen().

That memmove() functions which are really optimal are quite large may be
true.  But most of that complexity results from an analysis of the
parameters and choosing the corresponding algorithm to deal optimally
with these parameters (e.g. overlapping/non-overlapping memory areas,
odd/word-even/long-word addresses/lengths, downward/upward copy, large
arrays/small arrays and so on).  Each combination of these parameters
requires a different routine to be optimal.  So the code which analyses
the parameters and the different codes for different parameter sets make
up most of the code.

But now imagine a C compiler which does the parameter analysis (as far as
possible) at run time and only inserts the 'correct' routine for these
parameter sets.
I think you will have to admit that in this case you could have
significant speedups and space savings.

	Carl Edman

Theoretical Physicist, N.: A physicist whose   | Send mail to
existence is postulated to make the numbers    | cedman@golem.ps.uci.edu
balance, but who is never actually observed    | edmanc@uciph0.ps.uci.edu
in the laboratory.                             |
jeh@sisd.kodak.com (Ed Hanway) (11/16/90)
dillon@overload.Berkeley.CA.US (Matthew Dillon) writes:
> Well, actually, while the built-in stuff is cute it is also pretty
> useless in most cases.  For example, the code for a 'full' version of
> setmem()/memset(), movmem()/memmove(), etc. is pretty big, but also
> can be a hell of a lot faster (using MOVEMs or at least long ops
> instead of char ops).  I think the only real builtin function that
> is useful is, maybe, strlen().

__builtin_strlen() is definitely useful.  SAS/C evaluates
strlen("string constant") at compile time.  I don't know how good the
builtin versions of the other functions are, but if the compiler knows
anything about the arguments to the function at compile time, it can use
different versions of the move/set code for small/large sizes,
aligned/unaligned addresses, etc.  If no information is available, it can
always call the general routine.

Also, because the compiler knows about any side effects of the builtin,
it can optimize the code around the builtin more than if it were just an
arbitrary function call.
--
Ed Hanway --- uunet!sisd!jeh
Some of the trademarks mentioned in this product are for identification
purposes only.  All models are over 18 years of age.
micke@slaka.sirius.se (Mikael Karlsson) (11/16/90)
In message <1990Nov12.164804.5490@agate.berkeley.edu>, nj@teak.berkeley.edu
writes:
>> The editor can possibly be integrated using ARexx so that a user
>> can decide which editor to use.  LSE goes a bit in this direction,
>> but since I like to use Cygnus Ed 2.0, this is a problem.
>
>Version 5.10 comes with a program that will let you invoke CED instead of
>LSE after the compiler finds an error.  I guess the problem with using
>other editors is that they may not have the same ARexx commands for
>moving to a specific line or whatever.

The readme file for this program tells you to add the switch that turns
off ANSI sequences in error messages.  I can't find any mention of this
switch in the documentation.  Can anybody tell me what it looks like?

Thanks.
Kevin Morwood <EETY1478@Ryerson.CA> (11/16/90)
Actually, one definite problem I've had with __builtins is that when you
want to do something like:

	qsort(&base, num, size, strcmp);

the compiler tries to inline the reference to strcmp and then bitches
rather profusely because it doesn't like it.  Then you have to undefine
the builtin at the point of the qsort call and redefine it afterward (if
you want to get the benefit of ANY of the __builtin availability).

Generally... unimpressed.
dillon@overload.Berkeley.CA.US (Matthew Dillon) (11/18/90)
In article <1990Nov15.170810.5868@sisd.kodak.com> jeh@sisd.kodak.com (Ed Hanway) writes:
>dillon@overload.Berkeley.CA.US (Matthew Dillon) writes:
>> Well, actually, while the built-in stuff is cute it is also pretty
>> useless in most cases.  For example, the code for a 'full' version of
>> setmem()/memset(), movmem()/memmove(), etc. is pretty big, but also
>> can be a hell of a lot faster (using MOVEMs or at least long ops
>> instead of char ops).  I think the only real builtin function that
>> is useful is, maybe, strlen().
>
>__builtin_strlen() is definitely useful.  SAS/C evaluates
>strlen("string constant") at compile time.  I don't know how good the
>builtin versions of the other functions are, but if the compiler knows
>anything about the arguments to the function at compile time, it can
>use different versions of the move/set code for small/large sizes,
>aligned/unaligned addresses, etc.  If no information is available, it
>can always call the general routine.

    Uh, I NEVER use strlen() on a string constant.  That's the most
ridiculous thing I've ever heard of in my life!

    I use (sizeof("string-constant") - 1).  And, if you are worried about
things looking 'neat', simply write a little preprocessor macro to do it.

    No, I was thinking strlen() is useful as a builtin function -- one of
the few -- because it takes just a little more code than the equivalent
call (push/jsr/addq.l #4,sp) would take.

>Also, because the compiler knows about any side effects of the builtin,
>it can optimize the code around the builtin more than if it were just an
>arbitrary function call.

    About the only thing the compiler can optimize are the stupid
mistakes either it or the programmer makes ... useless, because only
relatively good programmers are worried about such trivial optimizations,
and they do not make the mistakes in the first place.  Many optimizations
fall into that category ...
reading the description makes you feel good because your compiler is
'optimizing', but the reality is that they do not do a pittling thing.
Inexperienced programmers do not code well enough for them to make much
of a difference, and experienced programmers code well enough that they
do not make much of a difference either.  Of course, there are many, many
optimizations that *do* do major good things, but a small plethora of
'built in' functions is not one of them.  Really a huge waste of time; as
far as I can tell, Lattice would have spent their time better on other
optimizations.

					-Matt

>--
>Ed Hanway --- uunet!sisd!jeh
>Some of the trademarks mentioned in this product are for identification
>purposes only.  All models are over 18 years of age.
--
    Matthew Dillon		dillon@Overload.Berkeley.CA.US
    891 Regal Rd.		uunet.uu.net!overload!dillon
    Berkeley, Ca. 94708
    USA
dillon@overload.Berkeley.CA.US (Matthew Dillon) (11/18/90)
In article <CEDMAN.90Nov14221934@lynx.ps.uci.edu> cedman@golem.ps.uci.edu (Carl Edman) writes:
>That e.g. memmove() functions which are really optimal are quite large
>might be true. But most of that complexity results from an analysis
>of the parameters and choosing the corresponding algorithm to deal
> ...
>make up most of the code. But now imagine a C compiler which does
>the parameter analysis (as far as possible) at run time and only
>inserts the 'correct' routine for these parameter sets.
>
>I think you will have to admit that in this case you could have
>significant speedups and space savings.
>
>	Carl Edman

    Uh huh, right.

	movmem(s, d, len)

    Now, unless all the parameters are basically globals of known
alignment AND the 'length' is a constant, the compiler will not be able
to make any major assumptions about the copy.  Basically, a movmem would
have to use the same operational guidelines as a structure assignment for
the compiler to be able to make any assumptions, and then you might as
well use a structure assignment rather than a movmem.

    Unless you can find several optimizable examples that would be
*WIDELY* used in random source code (at least as a percentage of the
movmem()s in the source code), my opinion will not change :-)

					-Matt

    Matthew Dillon		dillon@Overload.Berkeley.CA.US
    891 Regal Rd.		uunet.uu.net!overload!dillon
    Berkeley, Ca. 94708
    USA
limonce@pilot.njin.net (Tom Limoncelli) (11/19/90)
In article <dillon.7260@overload.Berkeley.CA.US> dillon@overload.Berkeley.CA.US (Matthew Dillon) writes:
> In article <1990Nov15.170810.5868@sisd.kodak.com> jeh@sisd.kodak.com (Ed Hanway) writes:
> Many optimizations fall into that category .. reading the description
> makes you feel good because your compiler is 'optimizing' but the
> reality is that they do not do a pittling thing. Inexperienced
> programmers do not code well enough for them to make much of a
> difference, and experienced programmers code well enough that they do
> not make much of a difference either. Of course, there are many, many

I write a lot of code that must look like a non-"experienced programmer"
wrote it.  I do this on purpose because (1) I feel it is easier to read,
and (2) I assume it's syntactic sugar that the compiler will (internally)
rewrite into the "experienced" form.

I guess it's not "macho" to write readable, maintainable code (just
kidding folks; if you want to re-ignite that useless flame war, take it
to comp.misc!).  :-) :-)

Anything a compiler company can do to encourage programmers to create
maintainable code should be encouraged.  (Now that's a sentence!)

Does that sway you?

Tom

P.S. Obligatory ungrateful user question: I own the non-shareware version
of DICE... so Matt, when will there be a debugger?  :-)
--
tlimonce@drew.edu		Tom Limoncelli	"Flash! Flash! I love you!
tlimonce@drew.bitnet		+1 201 408 5389	 ...but we only have fourteen
tlimonce@drew.uucp				 hours to save the earth!"
limonce@pilot.njin.net
jeh@sisd.kodak.com (Ed Hanway) (11/19/90)
dillon@overload.Berkeley.CA.US (Matthew Dillon) writes:
> Uh, I NEVER use strlen() on a string constant.  That's the most
> ridiculous thing I've ever heard of in my life!
>
> I use (sizeof("string-constant") - 1).  And, if you are worried about
> things looking 'neat', simply write a little preprocessor macro to do
> it.

I'd never use strlen("constant") either, unless I knew that it was
evaluated at compile time.  I have used (albeit in a toy program):

#define SAY(s) Write(backstdout, s, strlen(s))

which works for both SAY("const") and SAY(var).

> ... Of course, there are many, many
> optimizations that *do* do major good things, but a small plethora of
> 'built in' functions is not one of them.  Really a huge waste of time;
> as far as I can tell, Lattice would have spent their time better on
> other optimizations.

I tend to agree that builtin functions _by themselves_ are not much good,
but in combination with a good optimizer I think that they are
worthwhile, at the very least because arguments wouldn't need to be
stuffed into specific registers for the function call.
--
Ed Hanway --- uunet!sisd!jeh
Must be 18 or older to play.  Prerecorded for this time zone.
Do not read while operating a motor vehicle or heavy equipment.
bruce@zuhause.MN.ORG (Bruce Albrecht) (11/20/90)
In article <dillon.7260@overload.Berkeley.CA.US> dillon@overload.Berkeley.CA.US (Matthew Dillon) writes:
>In article <1990Nov15.170810.5868@sisd.kodak.com> jeh@sisd.kodak.com (Ed Hanway) writes:
>>__builtin_strlen() is definitely useful.  SAS/C evaluates
>>strlen("string constant") at compile time.  I don't know how good the
>>builtin versions of the other functions are, but if the compiler knows
>>anything about the arguments to the function at compile time, it can
>>use different versions of the move/set code for small/large sizes,
>>aligned/unaligned addresses, etc.  If no information is available, it
>>can always call the general routine.
>
> Uh, I NEVER use strlen() on a string constant.  That's the most
> ridiculous thing I've ever heard of in my life!
>
> I use (sizeof("string-constant") - 1).  And, if you are worried about
> things looking 'neat', simply write a little preprocessor macro to do
> it.

If the string constant is created via #define, it's probably not a good
idea to use sizeof() to get its length.  The #define could later be
replaced by a char array, and sizeof() would not produce the correct
result if the actual string length were smaller than the size of the
array.  The sizeof would still be syntactically correct, and therefore
possibly difficult to locate.
--
bruce@zuhause.mn.org
GIAMPAL@auvm.auvm.edu (11/21/90)
In article <1990Nov19.130657.19380@sisd.kodak.com>, jeh@sisd.kodak.com
(Ed Hanway) says:
>dillon@overload.Berkeley.CA.US (Matthew Dillon) writes:
>> Uh, I NEVER use strlen() on a string constant.  That's the most
>> ridiculous thing I've ever heard of in my life!
>
>I'd never use strlen("constant") either, unless I knew that it was
>evaluated at compile time.  I have used (albeit in a toy program):
>
>#define SAY(s) Write(backstdout, s, strlen(s))
>
>which works for both SAY("const") and SAY(var).

Yes, this does work, but if the string is a constant, then you get a
duplicate copy of the string put in your data segment.  I use:

#define MSG(s) { char *s; Write(Output(), s, strlen(s)); }

(note: those are supposed to be curly braces, but this is an APL
keyboard, so I don't know what it will be on your screen)

This way you only get one copy of the string constant, and you get a nice
function.

BTW, don't use MSG() if you are running from WB with no output file
handle; you'll hang (a definite bug in WB, IMHO).

--dominic
jeh@sisd.kodak.com (Ed Hanway) (11/21/90)
GIAMPAL@auvm.auvm.edu writes:
>In article <1990Nov19.130657.19380@sisd.kodak.com>, jeh@sisd.kodak.com
>(Ed Hanway) says:
>>#define SAY(s) Write(backstdout, s, strlen(s))
>
>#define MSG(s) { char *s; Write(Output(), s, strlen(s)); }
>
>This way you only get one copy of the string constant, and you get a
>nice function.

I guess you really mean

#define MSG(s) { char *tmp = s; Write(whatever, tmp, strlen(tmp)); }

which is fine, but my version was posted as an example of when a builtin
version of strlen() came in handy.  In Lattice (now SAS) C,
strlen("constant") is evaluated at compile time, so, using my version,
SAY("foo") would compile as Write(backstdout, "foo", 3).  This not only
eliminates the extra copy of the string constant, it eliminates the
strlen() operation altogether.
--
Ed Hanway --- uunet!sisd!jeh
This message is packed as full as practicable by modern automated
equipment.  Contents may settle during shipment.
mwm@raven.relay.pa.dec.com (Mike (My Watch Has Windows) Meyer) (11/22/90)
In article <90324.204949GIAMPAL@auvm.auvm.edu> GIAMPAL@auvm.auvm.edu writes:
>In article <1990Nov19.130657.19380@sisd.kodak.com>, jeh@sisd.kodak.com
>(Ed Hanway) says:
>>I'd never use strlen("constant") either, unless I knew that it was
>>evaluated at compile time.  I have used (albeit in a toy program):
>>
>>#define SAY(s) Write(backstdout, s, strlen(s))
>>
>>which works for both SAY("const") and SAY(var).
>Yes this does work, but if the string is a constant, then you get a
>duplicate copy of the string put in your data segment.  I use :
>
>#define MSG(s) { char *s; Write(Output(), s, strlen(s)); }

So use the flag on your compiler that forces identical string constants
into the same string.  That way, you don't have the extraneous variables
(and manipulations thereof) in your code.

Or take the approach of one programmer I knew - he never coded string
constants, except for the place where he defined a variable to point to
them.  But the compiler can do that for you.

	<mike
--
Cats will be cats and cats will be cool		Mike Meyer
Cats can be callous and cats can be cruel	mwm@relay.pa.dec.com
Cats will be cats, remember this words!		decwrl!mwm
Cats will be cats and cats eat birds.
andrew@teslab.lab.OZ (Andrew Phillips) (11/22/90)
In article <dillon.7176@overload.Berkeley.CA.US> dillon@overload.Berkeley.CA.US (Matthew Dillon) writes:
>>In article <1149@teslab.lab.OZ> andrew@teslab.lab.OZ (Andrew Phillips) writes:
>    In the above code was Aztec and Lattice run in 32bit-int modes?
>    Probably, but just wondering...).

They used 32-bit ints (the default).  But this wouldn't matter, would
it, since the program only used shorts (16 bits), not ints?

BTW, I used no compiler options except to turn on maximum optimizations.
With no command line options at all (i.e. all defaults) DICE did better
than both Lattice and Aztec.

>    As far as the inner loop goes, I'm kind of proud of DICE in that it
>    does a pretty good job without any real optimization at all.  Lattice
>    and Aztec could actually get more speed out of their code if they did
>    not use CLR.  CLR always reads the location before writing a 0.

According to my interpretation of the Motorola M68000 Microprocessor
User's Manual (8th edition), pages 8-2 and 8-6, both instructions (i.e.
CLR.B 0(A0,D0.W) and MOVE.B #0,0(A0,D0.W)) take 18 clock cycles.  Of
course, the CLR instruction is never BETTER than the equivalent MOVE #0.

Andrew.
--
Andrew Phillips (andrew@teslab.lab.oz.au)  Phone +61 (Aust) 2 (Sydney) 289 8712
dolf@idca.tds.PHILIPS.nl (Dolf Grunbauer) (11/22/90)
In article <1990Nov21.131206.2634@sisd.kodak.com> jeh@sisd.kodak.com (Ed Hanway) writes:
>GIAMPAL@auvm.auvm.edu writes:
>>In article <1990Nov19.130657.19380@sisd.kodak.com>, jeh@sisd.kodak.com (Ed
>>Hanway) says:
>>>#define SAY(s) Write(backstdout, s, strlen(s))
>>#define MSG(s) { char *s; Write(Output(), s, strlen(s)); }
>>This way you only get one copy of the string constant, and you get a nice
>>function.
>I guess you really mean
>#define MSG(s) { char *tmp = s; Write(whatever, tmp, strlen(tmp)); }
>which is fine, but my version was posted as an example of when a builtin
>version of strlen() came in handy.  In Lattice (now SAS) C, strlen("constant")
>is evaluated at compile time, so, using my version, SAY("foo") would
>compile as Write(backstdout, "foo", 3).

Our compiler (for Unix, not for my Amiga) allows 'sizeof("constant")',
which is eliminated at compile time.  This obviously doesn't work for
dynamic strings.
--
  _  _
 / U  | Dolf Grunbauer  Tel: +31 55 433233  Internet dolf@idca.tds.philips.nl
/__'< | Philips Information Systems         UUCP ...!mcsun!philapd!dolf
88 |_\ If you are granted one wish do you know what to wish for right now ?
jmeissen@oregon.oacis.org ( Staff OACIS) (11/23/90)
In article <90324.204949GIAMPAL@auvm.auvm.edu> GIAMPAL@auvm.auvm.edu writes:
>>#define SAY(s) Write(backstdout, s, strlen(s))
>>which works for both SAY("const") and SAY(var).
>Yes this does work, but if the string is a constant, then you get a duplicate
>copy of the string put in your data segment.  I use :

Not if you are using SAS/Lattice :-)  The Lattice compiler has an option
that causes it to generate only a single copy of duplicate string
constants (it should be the default, IMHO; it isn't because of Unix
weirdos who modify string constants).
dillon@overload.Berkeley.CA.US (Matthew Dillon) (11/27/90)
In article <90324.204949GIAMPAL@auvm.auvm.edu> GIAMPAL@auvm.auvm.edu writes:
>>I'd never use strlen("constant") either, unless I knew that it was
>>evaluated at compile time.  I have used (albeit in a toy program):
>>
>>#define SAY(s) Write(backstdout, s, strlen(s))
>>
>>which works for both SAY("const") and SAY(var).
>Yes this does work, but if the string is a constant, then you get a duplicate
>copy of the string put in your data segment.  I use :
>
>#define MSG(s) { char *s; Write(Output(), s, strlen(s)); }
>
>(note: those are supposed to be curly braces, but this is an APL keyboard,
> so I don't know what it will be on your screen)
>
>This way you only get one copy of the string constant, and you get a nice
>function.  BTW, don't use MSG() if you are running from WB with no
>output file handle, you'll hang (a definite bug in WB, IMHO).
>
>--dominic

I generally do this:

    Say(s)
    char *s;
    {
        return(Write(Output(), s, strlen(s)));
    }

Or the equivalent, which takes much less room (in terms of code size).
Also, Write() and Output() take so much overhead that the difference in
execution speed of the subroutine, Say(), versus the SAY macro would not
be noticeable.

Builtins are generally useless, and I sometimes wonder if Lattice added
them simply to optimize inefficiencies in their own source code.

                                        -Matt
--
    Matthew Dillon          dillon@Overload.Berkeley.CA.US
    891 Regal Rd.           uunet.uu.net!overload!dillon
    Berkeley, Ca. 94708     USA
dillon@overload.Berkeley.CA.US (Matthew Dillon) (11/27/90)
In article <530@ssp9.idca.tds.philips.nl> dolf@idca.tds.PHILIPS.nl (Dolf Grunbauer) writes:
>Our compiler (for Unix, not for my Amiga) allows 'sizeof("constant")'
>which is eliminated at compile time.  This obviously doesn't work for
>dynamic strings.

Actually, *all* compilers do sizeof("constant") at compile time.
Unfortunately, many also declare storage for the string even though it
is never referenced.  That has always amused me.

Perhaps you were talking about strlen("constant")?  This whole argument
is over builtins and compiler optimization of same.

                                        -Matt
--
    Matthew Dillon          dillon@Overload.Berkeley.CA.US
    891 Regal Rd.           uunet.uu.net!overload!dillon
    Berkeley, Ca. 94708     USA
dillon@overload.Berkeley.CA.US (Matthew Dillon) (11/28/90)
In article <1159@teslab.lab.OZ> andrew@teslab.lab.OZ (Andrew Phillips) writes:
>In article <dillon.7176@overload.Berkeley.CA.US> dillon@overload.Berkeley.CA.US (Matthew Dillon) writes:
>>    In the above code was Aztec and Lattice run in 32bit-int modes?
>>    Probably, but just wondering...).
>
>They used 32 bit ints (the default).  But this wouldn't matter, would
>it since the program only used shorts (16 bits) not ints.

Actually, it can matter.  A 16-bit compiler does not need to optimize
arithmetic expressions at all.  A 32-bit compiler such as Lattice or
DICE must start out by virtually EXTing the 16-bit quantities to 32 bits
and then optimize them back down to 16 bits.  There are many expressions
that cannot be optimized.  It does not matter whether, for 16-bit
compilation, one uses 'short' or 'int'; they are identical (just as
'int' and 'long' are identical for 32-bit compilers).  For example:

    short a, b;

    foo(a * b);     /* 16 bit argument for 16 bit compiler, 32 bit
                     * argument for 32 bit compiler */

Another good example is array indexing.  Most IBM C compilers use 16-bit
indexes for array indexing and ignore possible overflows (i.e. you have
an array of 65536 shorts).  Turbo C on the IBM is even worse... it uses
a 16-bit array index even if you use a long quantity as the index!  As
far as I know, all Amiga C compilers (Aztec, Lattice, DICE) use 32-bit
array indexes when forced to multiply by 2 or more for non-char arrays.

Yet another is the question of bit field packing... do you pack in
16-bit fields when ints are 16 bits, even if your compiler supports
32-bit ints (i.e. disallow bit fields larger than 16 bits wide when in
16-bit mode)?

These are simple examples, of course, but you get the idea.  There are
many differences between 16-bit and 32-bit implementations.  I'm not
arguing or anything; I'm rather pleased myself!

>BTW I used no compiler options except to turn on maximum
>optimizations.  With no command line options at all (i.e. all
>defaults) DICE did better than both Lattice and Aztec.

That is very interesting.  One of DICE's big points is that compilation
time is at least as fast as Aztec's (I haven't tested it formally).  It
is definitely much faster than Lattice.  Lattice has always taken a
while to do compiles, and with the -O option it goes even slower!

Personally, I've never trusted Lattice's -O option, having been
personally bitten by bugs in previous versions of Lattice C (5.02).  I
dunno about later versions because I stopped using the option.  I have
no experience with Aztec's optimization options.

                                        -Matt
--
    Matthew Dillon          dillon@Overload.Berkeley.CA.US
    891 Regal Rd.           uunet.uu.net!overload!dillon
    Berkeley, Ca. 94708     USA
ben@epmooch.UUCP (Rev. Ben A. Mesander) (12/01/90)
>In article <dillon.7352@overload.Berkeley.CA.US> dillon@overload.Berkeley.CA.US (Matthew Dillon) writes:
[discussion of various builtins in SAS C]
>    builtins are generally useless and I sometimes wonder if Lattice added
>    them simply to optimize inefficiencies in their own source code.

Most of them seem to be rather useless.  However, the builtin printf
stuff can really cut down code size, because linking with the huge
general-purpose printf is not necessary.  I think it's probably best to
inline calls to strlen and movmem if you have chosen to optimise speed
over space.

>    Matthew Dillon          dillon@Overload.Berkeley.CA.US
--
| ben@epmooch.UUCP (Ben Mesander)          | "Cash is more important than |
| ben%servalan.UUCP@uokmax.ecn.uoknor.edu  | your mother." - Al Shugart,  |
| !chinet!uokmax!servalan!epmooch!ben      | CEO, Seagate Technologies    |