gnu@hoptoad.UUCP (04/05/87)
In article <1537@husc6.UUCP>, reiter@endor.harvard.edu (Ehud Reiter) writes:
> 2) Simple routines like strcpy should be adjusted to perform well on a
> particular architecture (if the microVAX doesn't have a hardware locc
> instruction, then is it too much to ask that the run-time library supplied
> for the microVAX be changed not to use locc, at least in small and frequently
> used routines like strcpy?)

It only becomes reasonable to tailor a system for a particular piece of
hardware when there are only a small number of variants that run that
architecture.  In other words, this might have been fine when there was
the 780 and the 750 (nobody counted the 730 or MV-1 anyway) but once you
have a bunch of models, you just have to make the code straightforward
and not do anything that *really* breaks on some machine.  I presume in
the VAX case this means mostly avoiding the unimplemented instructions.

I worked on an APL system for the IBM 360/370 and just finding out the
timings for the 15 or 20 models that could run the code was too much work,
let alone figuring out which combination would be best until IBM's next
release.  (No flames on 15..20, this was in 1973!)

(Of course, the same applies to an "architecture" like C/Unix -- write
code that's straightforward and doesn't do anything that really breaks
anywhere.  Super optimizing your C source is kinda hard these days --
are you *sure* it's better to code it this way on the Cray?  IBM?  DG?
DEC?  8080?)

It's true that a tailored shared library could give some benefit, but
the general problem extends to what code to generate inline, not just
in library routines.

> 3) Simple routines like strcpy should be recoded in assembler, at least to
> the degree of having their procedure prologues simplified, and so that they
> use registers which don't have to be restored.
> 4) In-line expansion of common (and simple) library routines should be
> considered.

These should both be done automatically by a good compiler.  Compilers
that put in large procedure prologues/epilogues and don't simplify them
when possible have no excuses.  Those that won't use the scratch registers
for variables when possible have excuses but newer compilers are beating
them -- excuses don't benchmark very well.
-- 
Copyright 1987 John Gilmore; you can redistribute only if your recipients can.
(This is an effort to bend Stargate to work with Usenet, not against it.)
{sun,ptsfa,lll-crg,ihnp4,ucbvax}!hoptoad!gnu	       gnu@ingres.berkeley.edu
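
To make the "straightforward code" position above concrete, here is a
minimal sketch in C of a strcpy that uses no machine-specific instructions
at all.  The names simple_strcpy and check_simple_strcpy are hypothetical,
chosen for illustration; this is not the routine from any vendor's
run-time library, and it says nothing about how fast such a loop would be
on a particular VAX model.  It only shows the kind of small leaf routine
that a compiler could plausibly give a trivial prologue or expand in-line.

```c
#include <assert.h>
#include <string.h>

/*
 * simple_strcpy: hypothetical name, illustration only.
 * Byte-at-a-time copy with no machine-specific instructions.
 * dst must have room for src including its terminating NUL,
 * and the two buffers must not overlap.
 */
char *simple_strcpy(char *dst, const char *src)
{
    char *d = dst;

    while ((*d++ = *src++) != '\0')
        ;               /* copies up to and including the NUL */
    return dst;
}

/* Small self-check: the result should match the library strcpy. */
void check_simple_strcpy(void)
{
    char a[16], b[16];

    simple_strcpy(a, "hoptoad");
    strcpy(b, "hoptoad");
    assert(strcmp(a, b) == 0);
}
```

The function needs only a couple of pointer variables and calls nothing,
which is what makes it a candidate for the scratch-register and in-line
treatment discussed in points 3 and 4.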
nelson@ohlone.UUCP (04/06/87)
In article <1959@hoptoad.uucp>, gnu@hoptoad.uucp (John Gilmore) writes:
> It only becomes reasonable to tailor a system for a particular piece of
> hardware when there are only a small number of variants that run that
> architecture.
> [...]
> (Of course, the same applies to an "architecture" like C/Unix -- write
> code that's straightforward and doesn't do anything that really breaks
> anywhere.  Super optimizing your C source is kinda hard these days --
> are you *sure* it's better to code it this way on the Cray?  IBM?  DG?
> DEC?  8080?)

Right on!  Things like
	a[i++] = *++p;
are totally lost on a Cray.  Unfortunately, the current Cray C compiler is
none too great ... we're working on it!

-----------------------
Bron Nelson     {ihnp4, lll-lcc}!ohlone!nelson
Not the opinions of Cray Research
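
For readers unfamiliar with the idiom, a small sketch in C of the
statement quoted above next to a plain indexed loop that produces the same
result.  The function names are hypothetical.  The remark in the comments
that the indexed form may be easier for a vectorizing compiler to
recognize is an assumption for illustration, not a claim from the post and
not a measurement on any Cray.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Idiom-heavy form: the statement from the post, repeated n times.
 * Both i and p are modified as side effects of the assignment.
 * p must point at an array of at least n + 1 elements.
 */
void copy_idiom(int *a, const int *p, size_t n)
{
    size_t i = 0;

    while (i < n)
        a[i++] = *++p;      /* reads p[1], p[2], ..., p[n] */
}

/*
 * Plain indexed form: same result, but each iteration depends only
 * on the loop index (assumed to be easier for a compiler to analyze).
 */
void copy_indexed(int *a, const int *p, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++)
        a[i] = p[i + 1];
}

/* Small self-check: both forms must fill the destination identically. */
void check_copies(void)
{
    int src[4] = {7, 8, 9, 10};
    int x[3], y[3];
    size_t k;

    copy_idiom(x, src, 3);
    copy_indexed(y, src, 3);
    for (k = 0; k < 3; k++)
        assert(x[k] == y[k]);
}
```

Note that because of the pre-increment, the first element read is p[1],
not p[0]; the indexed version has to say so explicitly.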
jda@mas1.UUCP (04/06/87)
In article <1959@hoptoad.uucp>, gnu@hoptoad.uucp (John Gilmore) writes:
> In article <1537@husc6.UUCP>, reiter@endor.harvard.edu (Ehud Reiter) writes:
> > 2) Simple routines like strcpy should be adjusted to perform well on a
> > particular architecture....
>
> It only becomes reasonable to tailor a system for a particular piece of
> hardware when there are only a small number of variants that run that
> architecture....
> I worked on an APL system for the IBM 360/370 and just finding out the
> timings for the 15 or 20 models that could run the code was too much work,
> let alone figuring out which combination would be best until IBM's next
> release.  (No flames on 15..20, this was in 1973!)

A simple but common logic flaw in my opinion.  Granted that it can require
up to 15 or 20 times the effort to support 15 or 20 models, but the issue is
whether any such model is worth added support.

I can understand a statement like "I'm not going to optimize for the
Lemon III Model B since Lemon Computer Corporation hasn't even sold one yet."

But John Gilmore seems to be saying: "IBM was selling thousands of machines
a month so the only sensible thing was to move my product to a company whose
market was so small they wouldn't confuse me with multiple models."

Apologies to John -- the problem is likely to be with budgeting
misconceptions rather than the technical staff.

> It's true that a tailored shared library could give some benefit, but
> the general problem extends to what code to generate inline, not just
> in library routines.

The user doesn't demand a general solution.  He just doesn't like his
application running 20 times slower than necessary.  The plain fact is that
major savings can result from optimizing a few routines (strcpy, ldiv being
good examples).

> > 3) Simple routines like strcpy should be recoded in assembler, at least to
> > the degree of having their procedure prologues simplified, and so that they
> > use registers which don't have to be restored.
>
> These should both be done automatically by a good compiler....

"should be" but not necessarily "is".  There are some *really pathetic*
compilers out there.  From a recent poison remedy pamphlet:

	Induce vomiting.  If necessary show the subject the output of
	Whitesmiths 68k C compiler.

James D. Allen -- opinions not necessarily necessary.
guy@gorodish.UUCP (04/07/87)
>A simple but common logic flaw in my opinion.  Granted that it can require
>up to 15 or 20 times the effort to support 15 or 20 models, but the issue is
>whether any such model is worth added support.

Do you know of any major vendor who *does* provide that kind of support?
Does DEC provide different versions of the VMS libraries for different VAX
models?  Does IBM provide different versions of the MVS libraries for
different 370 models?

>But John Gilmore seems to be saying: "IBM was selling thousands of machines
>a month so the only sensible thing was to move my product to a company whose
>market was so small they wouldn't confuse me with multiple models."

John is obviously NOT saying anything even remotely resembling that.  Show
me where he said *anything* about *moving* his product from an IBM machine.
He's merely saying that it wasn't worth the effort producing N versions of
the APL system for N different IBM 370 models.

>The user doesn't demand a general solution.  He just doesn't like his
>application running 20 times slower than necessary.  The plain fact is that
>major savings can result from optimizing a few routines (strcpy, ldiv being
>good examples).

OK, show me a plain fact that indicates that, for anybody supplying large
volumes of some software package, you can get a 20X speed up by tweaking the
code for different models of a line of machines (not "tweaking the code for
machines with or without a given hardware option", but "tweaking the code
for different models of a line of machines" - e.g., a VAX-11/780, VAX 8600,
and VAX 8200, or a 370/168, 3033, 4381, and 3090).

>> These should both be done automatically by a good compiler....
>
>"should be" but not necessarily "is".  There are some *really pathetic*
>compilers out there.

OK, are the compilers in question "pathetic" or "good"?  They can't be
both....