doug@terak.UUCP (Doug Pardee) (04/04/85)
I continue to receive mail on the "assembly versus high-level language" issue. Many of my correspondents claim that a good optimizing compiler can produce code as good as an assembler programmer can. I obviously disagree. If we are talking about "mediocre" programmers, okay, I'll accept that a mediocre C programmer with an optimizing compiler can produce code as good as a mediocre assembler programmer's. But if we're talking about sharp programmers (like me :-), there is no way that I can get a C compiler to produce code as good as what I can write in assembler.

I decided to run a (not too badly rigged) experiment. I took a Pascal program and rewrote it into both C and assembler for the NS32016 chip. The program I chose as the guinea pig is the ol' classroom favorite, the Ackermann function. This is a very recursive function, with little actual computation. I chose it because it was short and I knew that it would give me the best results (from my point of view) in the test. (I did admit to a bit of rigging...)

It took me about five times as long to write the assembler version as the C version (no surprise). My goal was to include as many refinements as possible in the assembler version that were flat-out impossible to code in C. There were three major areas of attack. First, the 32000 series *hates* to branch, so I wrote the main recursive loop as totally straight-line code. I also put all branch labels on word boundaries; that cuts down the branching penalty. Second, the instructions were carefully chosen based on the published 32016 timing charts. And most importantly, the "call" overhead was decimated by passing parameters in registers instead of on the stack, by stacking only the one variable that would be needed later, and by using the 32000's simple call instruction instead of the "do-all" instruction that C has to use.

And the results?? The optimized C program did not run faster than the assembler program. It did not run as fast as the assembler. It didn't take only 10%, or only 25%, longer. It couldn't finish given even *twice* as much time. In fact, it took more than five times as long as the assembler version. The unoptimized C version took six times as long as the assembler version. And both C versions required six times as much memory for stack space as the assembler program did.

I don't claim that the factor of 5 or 6 is at all representative. I did choose Ackermann because I knew that calling and branching are the biggest chinks in the armor, and that's about *all* that Ackermann does. But the point is still valid: there is no way that an optimizing compiler can equal a sharp assembler programmer when code efficiency is crucial.
--
Doug Pardee -- Terak Corp.
!{hao,ihnp4,decvax}!noao!terak!doug
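For anyone who wants to reproduce the experiment, the C side of such a benchmark looks roughly like the sketch below. Doug did not post his source, so the argument values and the harness here are illustrative guesses, not his code.

    #include <stdio.h>

    /* The classic two-argument Ackermann function: deeply recursive,
     * almost no arithmetic, so nearly all of the cost is call and
     * branch overhead -- exactly the "chinks in the armor" Doug names. */
    long ack(long m, long n)
    {
        if (m == 0)
            return n + 1;
        if (n == 0)
            return ack(m - 1, 1);
        return ack(m - 1, ack(m - 1, n - 1));
    }

    int main(void)
    {
        /* ack(3, 6) = 509: big enough to time, small enough to finish. */
        printf("ack(3, 6) = %ld\n", ack(3, 6));
        return 0;
    }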
ksbszabo@wateng.UUCP (Kevin Szabo) (04/07/85)
All this chatter about Assembler vs. H.L.L. is silly. Doesn't anybody read `Programming Pearls' by Jon Bentley in CACM? You drop into assembler only when you absolutely HAVE to.

But first, you get the program right. Once you have it correct you leave the correct code in as comments, and you spend some time on the most time-critical portion. Before you drop into assembler, though, I hope you have a) examined your algorithm, and b) profiled your code. You wouldn't want to waste all that time you are going to spend trying to make it fast!

Doug spent 5 times as much time coding the algorithm so it was fast. He could very easily have made a small error. Asm coding forces you to think about the implementation, not about what you are really trying to do, which is solve a problem. A fast program that gives the wrong answers is useless.

Kevin
--
Kevin Szabo  watmath!wateng!ksbszabo  (U of Waterloo VLSI Group, Waterloo Ont.)
doug@terak.UUCP (Doug Pardee) (04/08/85)
My original posting attacked the popular notion that there exists in some Nirvana an optimizing compiler which produces code as good as expertly coded assembler. The response was on a different subject:

> You drop into assembler only when you absolutely HAVE to.
> ...
> Before you drop into assembler though I hope you have
> a) examined your algorithm, and b) profiled your code.

The Unix community apparently disapproves of such practices, since there is no profiling program in the "toolbox" that is part of 4.2BSD. I dunno about System V.

> Doug spent 5 times as much time coding the algorithm so
> it was fast. He could very easily have made a small error.

Heresy! I'll have you know that I live in the desert so that no one notices that my shoes never touch the sidewalk when I take a walk in the rain :-)
--
Doug Pardee -- Terak Corp.
!{hao,ihnp4,decvax}!noao!terak!doug
brownc@utah-cs.UUCP (Eric C. Brown) (04/09/85)
In article <492@terak.UUCP> doug@terak.UUCP (Doug Pardee) writes:
>My original posting attacked the popular notion that there exists in
>some Nirvana an optimizing compiler which produces code as good as
>expertly coded assembler.

Again, I would suggest that you look at BLISS; BLISS produces code that is almost always *smaller* and usually much *faster* than most code written by assembler programmers.

>The Unix community apparently disapproves of such practices, since there
>is no profiling program in the "toolbox" that is part of 4.2BSD. I
>dunno about System V.

Gee, why don't you look at 'cc -p' and prof? They're in *this* 4.2BSD installation.

Eric C. Brown
brownc@utah-cs
...!seismo!utah-cs!brownc
guy@sun.uucp (Guy Harris) (04/10/85)
> The Unix community apparently disapproves of such practices, since there
> is no profiling program in the "toolbox" that is part of 4.2BSD. I
> dunno about System V.

Try looking at PROF(1) in the:

    V7
    4.1BSD
    4.2BSD
    System III
    System V, Release 1
    System V, Release 2

manuals. Then look at GPROF(1) in the 4.2BSD manual, and at PROF(5) in the System V, Release 2 Programmer's Manual.

Guy Harris
sun!guy
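As a minimal sketch of how those tools get used, here is a made-up toy program with the typical compile-and-profile commands in its header comment; the flags are the standard ones from the manuals Guy lists, the program itself is invented for illustration.

    /* profdemo.c -- toy program for demonstrating prof(1).
     *
     * Typical session (4.2BSD):
     *   cc -p profdemo.c -o profdemo    # use -pg instead for gprof
     *   ./profdemo                      # writes mon.out at exit
     *   prof profdemo                   # per-routine time breakdown
     *
     * The routines prof flags as hot are the only ones worth even
     * considering rewriting in assembler. */
    #include <stdio.h>

    static long slow_sum(long n)
    {
        long i, s = 0;
        for (i = 0; i < n; i++)         /* deliberately dumb inner loop */
            s += i;
        return s;
    }

    int main(void)
    {
        long k, result = 0;

        for (k = 0; k < 10000; k++)     /* hot loop: ~100 million adds */
            result = slow_sum(10000);
        printf("slow_sum(10000) = %ld\n", result);
        return 0;
    }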
hes@ecsvax.UUCP (Henry Schaffer) (04/10/85)
My BASIC interpreter also produces smaller "code" than does an assembly language program. (Of course there is a 16k run-time file!)

How can any language (I don't know BLISS) produce a smaller program than does assembler (assuming a competent assembly programmer)? I'm more interested in HOW than in a discussion of how many competent programmers there are.

--henry schaffer
laura@utzoo.UUCP (Laura Creighton) (04/11/85)
The great problem with assembler is that once you get the sucker working, you never want to touch it again. The linear sort that was just fine when you were sorting 5 elements is going to be intolerably slow when you sort 500,000 elements -- but do you *really* want to write quicksort in (let's pick a horrid example) pdp-8 assembler?

I have mostly used assembler not so much when I needed something *fast* as when I needed something *real small*. But there it is easy -- write the code in C, produce the assembler and then tighten that. (But not on the pdp-8, though. I don't think there *is* a C compiler for the pdp-8.)

Laura Creighton
utzoo!laura
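A rough sketch of the work flow Laura describes, using an invented example routine; on most Unix C compilers "cc -S" leaves the generated assembler in a .s file that can then be tightened by hand:

    /* copystr.c -- a routine you might want to hand-tune.
     * "cc -S copystr.c" leaves the compiler's assembler output in
     * copystr.s; tighten that by hand and assemble it with as(1). */
    char *copystr(char *dst, char *src)
    {
        char *d = dst;

        while ((*d++ = *src++) != '\0')
            ;                      /* copy up to and including the NUL */
        return dst;
    }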
herbie@watdcsu.UUCP (Herb Chong [DCS]) (04/11/85)
In article <5461@utzoo.UUCP> laura@utzoo.UUCP (Laura Creighton) writes:
>I have mostly used assembler not so much when I needed something *fast* as
>when I needed something *real small*. But there it is easy -- write the
>code in C, produce the assembler and then tighten that. (But not on the
>pdp-8, though. I don't think there *is* a C compiler for the pdp-8.)
>
>Laura Creighton
>utzoo!laura

on some systems other than unix, assembler is used because of the special restrictions of the design of the system. in VM/CMS, many system services are awkward to get at directly from a high-level language (poor design) but are very useful. other system services are designed to not be accessible from a high-level language at all. CMS itself occupies a part of the user's address space but is protected against write by ordinary programs. a special area (called the transient area) is reserved for commands that are not part of the kernel but are often used. it's only 8k in size, but there are many compelling reasons to write commands that use that area.

i wrote a version of the Boyer-Moore string matching algorithm to fit in 8K, and it contains a shell sort, error handler, and interrupt handler. the most important reasons for making it transient were to make it available to be called by any high-level language program and to use system services not available otherwise. because CMS is in the same address space as a user program, it is possible to use kernel code by branching directly to it instead of doing a system call (SVC), provided certain conditions are met. i do i/o by branching directly to the i/o routines in the kernel. execution time is reduced by 1/3 because i bypass the system call interface. this is a well documented, but seldom used, feature of CMS because of the restrictions on write access to the kernel. it is possible (using our Waterloo C compiler) to write a C program that fits in the transient area, but the minimal library occupies about 4K, leaving 4K for code and variables.

using assembler partly depends also upon the coding style used. everyone writes it their own way unless compelled to do otherwise. writing code that is meant to be modified by others requires discipline and consistency. at waterloo, we modify and maintain our own version of VM/SP3 that is sufficiently different from others that it takes getting used to. it's all written in /370 assembler. maintaining it is no more trouble, in my opinion, than 4.2bsd code, not because /370 assembler is so good, but because badly written C code (as most of 4.2bsd is) is worse than well written assembler.

miscellaneous ramblings from the keyboard of Herb Chong...
I'm user-friendly -- I don't byte, I nybble....

UUCP:  {decvax|utzoo|ihnp4|allegra|clyde}!watmath!water!watdcsu!herbie
CSNET: herbie%watdcsu@waterloo.csnet
ARPA:  herbie%watdcsu%waterloo.csnet@csnet-relay.arpa
NETNORTH, BITNET, EARN: herbie@watdcs, herbie@watdcsu
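Herb's transient-area version (presumably in assembler, given the 8K constraint) isn't shown here. As a rough illustration of how compact the core of Boyer-Moore can be, the following is a C sketch of the simplified bad-character-only (Horspool) variant -- not his implementation, and without the shell sort, error handler, or interrupt handler he mentions.

    /* Boyer-Moore-Horspool: find pat (length m) in text (length n).
     * Returns the offset of the first match, or -1 if there is none.
     * Only the bad-character shift table is built; full Boyer-Moore
     * adds a good-suffix table on top of this. */
    long bmh_search(const unsigned char *text, long n,
                    const unsigned char *pat, long m)
    {
        long shift[256];
        long i, k;

        if (m <= 0 || m > n)
            return -1;

        for (i = 0; i < 256; i++)          /* default: skip whole pattern */
            shift[i] = m;
        for (i = 0; i < m - 1; i++)        /* distance from last occurrence */
            shift[pat[i]] = m - 1 - i;

        for (i = m - 1; i < n; i += shift[text[i]]) {
            for (k = 0; k < m && pat[m - 1 - k] == text[i - k]; k++)
                ;                          /* compare right to left */
            if (k == m)
                return i - m + 1;          /* match starts here */
        }
        return -1;
    }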
brownc@utah-cs.UUCP (Eric C. Brown) (04/12/85)
In article <1002@ecsvax.UUCP> hes@ecsvax.UUCP (Henry Schaffer) writes:
>How can any language (I don't know BLISS) produce a smaller program
>than does assembler (assuming a competent assembly programmer)? I'm
>more interested in HOW than in a discussion of how many
>competent programmers there are.
>--henry schaffer

Well, it does things that any competent programmer could do, such as global data-flow optimization, constant folding, variable life-death analysis, and so forth, but it does them over and over again without a) getting confused or b) breaking the program.

The best description of the BLISS-11 compiler is "The Design of an Optimizing Compiler", by Wulf et al., published by (I believe) American Elsevier.

Eric C. Brown
brownc@utah-cs
..!seismo!utah-cs!brownc

Execute People, not Programs!!
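A tiny before-and-after illustration of the kind of transformation Eric lists (constant folding plus removal of a dead variable); the function is made up, and what any particular compiler actually emits will of course vary.

    /* What the programmer writes: */
    int area(void)
    {
        int width  = 3 * 8;        /* folded to 24 at compile time   */
        int height = 2 * 5;        /* folded to 10                   */
        int unused = width + 1;    /* dead: never referenced again   */
        return width * height;     /* folded all the way to 240      */
    }

    /* Roughly what an optimizing compiler is entitled to emit: */
    int area_optimized(void)
    {
        return 240;                /* no variables, no multiplies    */
    }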
henry@utzoo.UUCP (Henry Spencer) (04/14/85)
> How can any language (I don't know BLISS) produce a smaller program
> than does assembler (assuming a competent assembly programmer)? I'm
> more interested in HOW than in a discussion of how many
> competent programmers there are.

Because producing really tight code is an essentially mechanical job, requiring complex bookkeeping and application of many special tricks. Programs are much better at this than human beings. My recollection is that a human assembler programmer could, given a lot of work, get code comparable to that of the BLISS compilers for any given small piece of code. But the amount of work involved is prohibitive for any substantial program.

Remember, we are not talking run-of-the-mill assembler programming here, we are talking serious crunch-the-last-bit, maximum-use-of-every-register efforts. Mundane compilers generally do not do as well as good human assembler programmers, since the human programmer applies the simpler sorts of optimizations routinely. But real optimizing compilers put more work into every piece of code than a human programmer can afford for anything but the most crucial stuff.
--
Henry Spencer @ U of Toronto Zoology
{allegra,ihnp4,linus,decvax}!utzoo!henry
keithd@cadovax.UUCP (Keith Doyle) (04/18/85)
> > How can any language (I don't know BLISS) produce a smaller program
> > than does assembler (assuming a competent assembly programmer)? I'm
> > more interested in HOW than in a discussion of how many
> > competent programmers there are.
>
> Because producing really tight code is an essentially mechanical job,
> requiring complex bookkeeping and application of many special tricks.
> Programs are much better at this than human beings. My recollection
> is that a human assembler programmer could, given a lot of work, get
> code comparable to that of the BLISS compilers for any given small
> piece of code. But the amount of work involved is prohibitive for any
> substantial program.
>
> Remember, we are not talking run-of-the-mill assembler programming here,
> we are talking serious crunch-the-last-bit, maximum-use-of-every-register
> efforts. Mundane compilers generally do not do as well as good human
> assembler programmers, since the human programmer applies the simpler
> sorts of optimizations routinely. But real optimizing compilers put
> more work into every piece of code than a human programmer can afford
> for anything but the most crucial stuff.
> --
> Henry Spencer @ U of Toronto Zoology
> {allegra,ihnp4,linus,decvax}!utzoo!henry

However, size is only one criterion that may exist for a given program. Performance can also be an issue, and size and performance have no direct correlation. In some parts of a program, performance may be the primary issue; size may be a secondary, but still important, issue. Thus a machine-language programmer can decide where to use which techniques.

This may be an argument for smart compilers that produce really small code but allow for machine-coded subroutines that can use table techniques, which cost space if it buys appropriate performance. Tradeoffs such as these have to be weighed, and different parts of the same program may require different techniques in order to optimize it for several conditions at once.

Keith Doyle
# {ucbvax,ihnp4,decvax}!trwrb!cadovax!keithd
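A small example of the space-for-speed table technique Keith mentions: counting the one bits in a byte. The 256-byte table is pure data cost, but it replaces a per-bit loop with a single indexed load. (The function names here are invented for illustration.)

    #include <stdio.h>

    /* Loop version: tiny, but does up to 8 iterations per byte. */
    int bits_loop(unsigned char b)
    {
        int n = 0;
        while (b) {
            n += b & 1;
            b >>= 1;
        }
        return n;
    }

    /* Table version: 256 bytes of data buy a single indexed load per byte. */
    static unsigned char bit_table[256];

    void init_bit_table(void)
    {
        int i;
        for (i = 0; i < 256; i++)
            bit_table[i] = (unsigned char)bits_loop((unsigned char)i);
    }

    int bits_table(unsigned char b)
    {
        return bit_table[b];
    }

    int main(void)
    {
        init_bit_table();
        printf("bits in 0xB7: loop=%d table=%d\n",
               bits_loop(0xB7), bits_table(0xB7));
        return 0;
    }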
henry@utzoo.UUCP (Henry Spencer) (04/22/85)
With regard to speed vs. space in optimizing compilers, it is not unheard-of for optimizing compilers to vary their tradeoffs depending on the code, e.g. optimizing for speed rather than space inside nested loops. -- Henry Spencer @ U of Toronto Zoology {allegra,ihnp4,linus,decvax}!utzoo!henry
@RUTGERS.ARPA:Bruce_A._Hamilton.OsbuSouth@Xerox.ARPA (04/24/85)
From: Hamilton.OsbuSouth@Xerox.ARPA

------------
From: Doug Bryan <BRYAN@SU-SIERRA.ARPA>

    domains where size and speed are very important... are becoming
    fewer and fewer... readability and understandability become the
    major qualities of "good code".

------------
From: Keith Doyle {ucbvax,ihnp4,decvax}!trwrb!cadovax!keithd

    Size and performance have no direct correlation.

---------------

This discussion should move to SOFT-ENG. There's a lot of horsefeathers floating around here.

First, size and speed will always be very important, except in the most trivial applications. After 30 years or so, I hope we've all learned that EVERY computing resource (like every other human resource) gets used to the max. Especially in today's cut-throat marketplace, all of us except MAYBE Grid and Cray will continue to try to stuff ten people into a VW, so to speak.

Size and speed "versus" readability and understandability is a strawman. If code is elegant (i.e. readable and understandable), it will tend to be efficient in both size and speed. There's no magic in HOL's. The Xerox Star and Network Services are coded entirely in Mesa (similar to Modula-2), in the million-lines-of-code ballpark, supported by a smattering of special-purpose microcode, and have encountered the whole ball of wax in performance issues: time vs. space tradeoffs, project management questions, how to indoctrinate new hires... I don't care HOW high-level the language, questions of individual style arise and local conventions must be arbitrated in any large system. I grant that HOL's give some readability advantage and (with type checking) prevent a lot of dumb mistakes, but they also make it easy for the incompetent to generate hundreds of lines of HOL garbage instead of dozens of lines of assembler garbage. Nothing substitutes for talent.

Size and performance sure ARE correlated. My disk doesn't swap in zero time. Even if you will soon be blessed with a personal workstation with multimegs of RAM that can always hold your entire working set, that only moves the problem to the database servers, bitmap processors, etc., that you interact with.

--Bruce
jans@mako.UUCP (Jan Steinman) (04/25/85)
In article <5523@utzoo.UUCP> henry@utzoo.UUCP (Henry Spencer) writes:
>With regard to speed vs. space in optimizing compilers, it is not
>unheard-of for optimizing compilers to vary their tradeoffs depending
>on the code, e.g. optimizing for speed rather than space inside nested
>loops.

This is why Ada has:

    pragma OPTIMIZE (TIME);   -- Tell compiler to make it quick.
    pragma OPTIMIZE (SPACE);  -- Tell compiler to make it small.

Too bad nobody is wholeheartedly implementing OPTIMIZE yet. (Apologies in advance if your favorite Ada compiler does an adequate job of this!)

Most of the people in the "optimizing compiler camp" forget that the basic mechanisms of a particular language are often slow and/or unwieldy. I will readily admit that a good optimizer can do about as good a job on the internals of a procedure as an assembly coder, but it doesn't have much say in the code needed to invoke the procedure. (I don't know anything about the much-discussed BLISS, so I can't comment on it.) Most languages get in the way of optimizers by having a "deep" structure. If a program is *designed* for a particular architecture, and is *skillfully* coded in assembly, I have to throw my vote to Doug and the "assembly camp".

STANDARD "ASSEMBLY CAMP" COWARDLY DISCLAIMER: Of course, the time and expense of assembly code should only be incurred when it has been determined that the best compilers available simply do not generate the desired code, based on size and time specifications! (Such as for a Smalltalk Virtual Machine!)
--
:::::: Jan Steinman           Box 1000, MS 61-161     (w)503/685-2843 ::::::
:::::: tektronix!tekecs!jans  Wilsonville, OR 97070   (h)503/657-7703 ::::::