billw@navajo.STANFORD.EDU (William E. Westfield) (12/19/86)
Well, I'll admit to doing almost all of my current programming in assembly languages of various sorts (mostly TOPS-20 monitor and utility hacking - it's already in macro, so I don't have much choice. On the other hand, the PDP-10, like the PDP-11, has an artistically beautiful instruction set. The other thing I program a lot in assembly language is Intel 808x based MSDOS machines...)

While higher level languages certainly have portability advantages in many cases (but not all), and there are clear areas where complexity forbids assembly (I hope never to have to write a package doing heavy floating point math in assembler, even on a machine that has floating point instructions!), I'd like to argue against the idea that HLL code is only a "little bit" slower than assembly code.

It is true that any compiler can generate code as good as or better than a human for straight-line math code. A good compiler ought to be able to optimize loops, array accesses, and register allocation for complex expressions as well as a human. But where HLLs fall down is when you start to do function and procedure calls. And of course you should do a lot of those in a well structured program.

An assembly language programmer has the option of writing "fast" subroutines that pass arguments in registers and don't bother setting up stack frames, and so on. He can make sure that the arguments he is passing in registers are already in the registers that they are supposed to be in, and that results go where they are ready to be used next. Arguments can be passed by value or by reference, whichever is more convenient, or faster. And of course there are special instructions to be exploited.

As an extreme example, which I have actually done, consider writing some kind of parser in a high level language. One of the things you have to do is push and pop tokens from a stack. Great. Almost every popular processor has instructions that do PUSH and POP in hardware.
Some processors do overflow and/or underflow detection at the same time. Many processors allow multiple stacks to exist concurrently (multiple stack pointers in registers). In other cases it is perfectly possible to use the hardware CALL/RETURN/etc. stack for user values too.

So what happens when you try to do this in PASCAL, huh? Well, you write functions called PUSH and POP, and your arguments get put on a stack frame on the hardware stack, and you do a call instruction which puts the return address on a stack and goes off to code that probably fetches the value of the stack pointer from memory and puts it back, and does a return instruction that fetches the return address back, and then you clean up the stack frame and continue on. Wow, that was only 10 or 20 times slower than the hardware instruction. Boy am I glad that I was able to write that in a high level language with error checking and things like that instead of having to remember the strange mnemonic for the PUSH instruction (gee, it could have been something really nasty like MOVW source,A2@-). That was a frustrating Pascal program to write.

[OK, I'll allow that in C this might end up written as a macro doing something like *sp++ = &source that would have a good chance of being optimized to a single instruction too, but this is not specifically a C vs Assembler debate...]

Sigh.

BillW, writing small, fast, programs, in assembler, and happy.
amos@instable.UUCP (Amos Shapir) (12/19/86)
In article <1233@navajo.STANFORD.EDU> billw@navajo.STANFORD.EDU (William E. Westfield) writes:

>... I'd like to argue against the idea that HLL code
>is only a "little bit" slower than assembly code.
>
> [describing a complicated way to do stack ops in PASCAL subroutines,
> and the complicated code that is generated]

What makes you think that a *good* optimizing compiler can't unroll function calls, put them in-line, and optimize them to a point that comes very close to the intended machine instruction? Any logic that you can put into the programming process can be translated to an algorithm and done by the compiler. It only takes a lot of CPU power and memory (at compile time only!) and these become cheaper all the time. (I know most of you still can't use such compilers, but they also become cheaper and more available - no flames on that, please)

>BillW, writing small, fast, programs, in assembler, and happy.

Amos Shapir, writing small, fast, programs, in C, and happy - much happier than I'd be if I had to write them in assembler!
--
Amos Shapir
National Semiconductor (Israel)
6 Maskit st. P.O.B. 3007, Herzlia 46104, Israel  (011-972) 52-522261
amos%nsta@nsc 34.48'E 32.10'N
shebs@utah-cs.UUCP (Stanley Shebs) (12/19/86)
In article <1233@navajo.STANFORD.EDU> billw@navajo.STANFORD.EDU (William E. Westfield) writes:

>But where HLL's fall down is when you
>start to do function and procedure calls. And of course you should do
>a lot of those in a well structured program.

Can you say "interprocedural analysis"? A really hot compiler would flush every subroutine call that it possibly could. Disadvantages are that you must have both the source code for caller and callee available to mess with, and the resulting code is harder to debug at the object level (no stack frames to look at).

>An assembly language programmer has the option of writing "fast"
>subroutines that pass arguments in registers and don't bother setting
>up stack frames, and so on.

Some language implementations (for instance Portable Standard Lisp) *do* pass arguments in registers. But of course since a function may be called before it is defined, the calling protocol has to be more conservative and save registers in non-tail-recursive cases.

>... One of the things you
>have to do is push and pop tokens from a stack. Great. Almost every
>popular processor has instructions that do PUSH and POP in hardware.
>[... various hardware features]
>So what happens when you try to do this in PASCAL, huh?
>Well, you write functions called PUSH and POP, and your arguments get
>put on a stack frame on the hardware stack, and you do a call
>instruction which puts the return addresses on a stack and goes off to
>code that probably fetches the value of stack pointer from memory and
>puts it back, and does a return instruction that fetches the return
>address back, and then you clean up the stack frame and continue on.

The first right answer is "don't use Pascal, use a high-level language". The second right answer has already been mentioned by others, and that is to "fix the compiler". You need a sufficiently high-level language (like Lisp) so that you can write the push and pop operations as normal functions.
For prototyping, you can use a dumb implementation; then, when it's time to optimize, you hack the compiler to open-code the function calls into an appropriate sequence of instructions. Of course, hacking the compiler may not be easy! On the other hand, some compilers (such as the PSL and PCLS compilers) are highly declarative, and it is not unusually hard to add optimizations. At the moment, you still have to know quite a bit about how the system works internally, and we've been working on ways to make optimizations even easier to add/modify. Ultimately, we want a compiler that is completely open and accessible to programmers.

>BillW, writing small, fast, programs, in assembler, and happy.

stan shebs (orion.utah.edu)
rentsch@unc.UUCP (Tim Rentsch) (12/24/86)
In some article William E. Westfield writes:

> It is true that any compiler can generate code as good or better than a
> human for straight line math code. A good compiler ought to be able to
> optimize loops and array accesses and register allocation for complex
> expressions as well as a human. But where HLL's fall down is when you
> start to do function and procedure calls. And of course you should do
> a lot of those in a well structured program.

It seems to me that a well-structured programming style argues in favor of a HLL, not against it. The reason is that a provision for "inline" procedure calls supports a highly procedurized style, while not incurring the procedure call overhead (at runtime). Contrast with assembly, where you write 'PUSHJ' (or whatever) to call a subroutine. Oh, I know all about macro assemblers, but macro processors are never quite transparent, and then you have two *different* mechanisms (with peculiar interactions). With inlineable procedures, on the other hand, there is uniformity of syntax, identical semantics, strong type checking, all those other good things, AND no procedure call overhead. Just like having your cake and eating it, too.

> An assembly language programmer has the option of writing "fast"
> subroutines that pass arguments in registers and don't bother setting
> up stack frames, and so on. He can make sure that the arguments he is
> passing in registers are already in the registers that they are
> supposed to be in, and results go where they are ready to be used next.
> Arguments can be passed by value or by reference, whichever is more
> convenient, or faster. And of course there are special instructions to
> be exploited.

Again, inline procedures give a safer, easier to use mechanism, and they run faster as well. Argument passing?
Good compilers (some actually exist) will check the body of the procedure and pass parameters by renaming the variables in the procedure body, if the renaming preserves the semantics of the procedure call. The result is that no data is moved (as in the assembler case), but without the programmer having to think about whether the improved code will "work". Here is the crucial difference between HLL and assembler -- when I write in assembler I have to think about each such optimization to be sure it does the right thing. Using inline procedure calls, the semantics of the call is guaranteed not to change if the procedure is made inline, so I never have to worry whether passing an argument by reference is identical to passing it by value -- the compiler does that for me.

Exploiting special instructions is something most compilers do poorly (if at all), so there should be an (infrequently used) escape mechanism to generate them directly. BUT, this escape should be much more structured than the assembly language style of "anything goes". For example, the generated instructions should be encapsulated within a procedure. This way, the caller need not know whether he is using the special instruction, and one implementation can be exchanged for another completely transparently. If combined with the inline mechanism, the procedure call overhead again disappears, and we obtain the best of both worlds.

Lest anyone think such a language is just a fantasy, I should say that the MESA programming language supports both INLINE procedures and procedures escaped to machine code. If your view is that therefore MESA is not an HLL but rather an assembler, well, I won't disagree with you, but I will say in that case that the MESA assembler is better than your assembler.

cheers,

txr
rh@mit-eddie.MIT.EDU (Randy Haskins) (12/25/86)
There are problems with INLINE, however. I have forsaken the world of big things and started writing code for PCs, and for the first time in my life I have to give serious thought to trading speed for space in some cases. Inline coding may be faster, but it takes up more space. The coder has to decide the relative importance for each case.

Just to add to the argument, here is an example of what a human can do that I'd seriously doubt a compiler would do. I first saw something like this in some local hacks to the Monitor for a Tops-20 at MIT. At first I was offended, but then I saw the sheer elegance of it.

PRSPC:  Skipa T2,[32.]    ; move a space into 2, skip next instruction
PRTAB:  Movei T2,9.       ; put a tab in 2
        Movei T1,.PRIOU   ; primary output
        Bout%             ; byte output
        Ercal .+1         ; ignore error
        Popj P            ; return

Not only does it avoid having to make an extraneous jump or call (in order to use common code), it saves space. Would a compiler "know" to do something like this? I doubt it.

Random
--
Randwulf (Randy Haskins);  Path= genrad!mit-eddie!rh
ggs@ulysses.homer.nj.att.com (Griff Smith) (12/27/86)
In article <4375@mit-eddie.MIT.EDU>, rh@mit-eddie.MIT.EDU (Randy Haskins) writes:

> Just to add to the argument, here is an example of what a human can do
> that I'd seriously doubt that a compiler would do. I first saw something
> like this in some local hacks to the Monitor for a Tops-20 at MIT. At
> first, I was offended, but then I saw the sheer elegance of it.
>
> PRSPC:  Skipa T2,[32.]    ; move a space into 2, skip next instruction
> PRTAB:  Movei T2,9.       ; put a tab in 2
>         Movei T1,.PRIOU   ; primary output
>         Bout%             ; byte output
>         Ercal .+1         ; ignore error
>         Popj P            ; return
>
> Not only does it avoid having to make an extraneous jump or call (in order
> to use common code), it saves space. Would a compiler "know" to do something
> like this? I doubt it.
>
> Randwulf (Randy Haskins); Path= genrad!mit-eddie!rh

Actually, this kind of optimization was standard for the TOPS-10 and TOPS-20 Fortran and LISP compilers. A simple peephole optimizer could find these cases easily. The first step would be for the optimizer to recognize that the ends of the two functions are identical; it would then do a space optimization by creating a jump to the common code. The {skipa, movei} trick would then fall out while applying standard peephole optimizations of moves and jumps.
As an example of how far a compiler could carry it, consider the following hypothetical output from a (probably non-existent) optimizing C compiler. Given the statement:

    if (a == 0 || a == 1)
        b += 1;
    else
        b -= 1;

the raw PDP-10 assembly language from the code generator would be something like

        MOVE R1,a       ; if (a == 0
        JUMPE R1,L1
        CAIE R1,1       ; || a == 1)
        JRST L2
    L1: AOS b           ; b += 1
        JRST L3
    L2: SOS b           ; b -= 1
    L3:

A simple optimization of the last five lines of text would reduce this to:

        MOVE R1,a       ; if (a == 0
        JUMPE R1,L1
        CAIE R1,1       ; || a == 1)
        JRST L2
    L1: AOSA b          ; b += 1 (skip next instruction)
    L2: SOS b           ; b -= 1

This can then be further optimized by inverting the sense of the compare and eliminating the jump (JRST):

        MOVE R1,a       ; if (a == 0
        JUMPE R1,L1
        CAIN R1,1       ; || a == 1)
    L1: AOSA b          ; b += 1 (skip next instruction)
        SOS b           ; b -= 1

Finally, by replacing the {move, jumpe} with an instruction that loads and tests in one operation, we get:

        SKIPE R1,a      ; if (a == 0
        CAIN R1,1       ; || a == 1)
        AOSA b          ; b += 1 (skip next instruction)
        SOS b           ; b -= 1

None of these substitutions is any more complicated than the ones commonly used by the UNIX(R) C compiler, yet the resulting sequence of instructions appears to be the result of clever design. Given how long it took to make sure I didn't have any mistakes in the above example, I would much rather have a compiler do it for me.
--
Griff Smith    AT&T (Bell Laboratories), Murray Hill
Phone: 1-201-582-7736
UUCP: {allegra|ihnp4}!ulysses!ggs
Internet: ggs@ulysses.uucp
rentsch@unc.UUCP (Tim Rentsch) (12/31/86)
In article <4375@mit-eddie.MIT.EDU> rh@eddie.MIT.EDU (Randy Haskins) writes:

> There are problems with INLINE, however. I have forsaken the
> world of big things and started writing code for PC's, and for
> the first time in my life, have to give serious thought to trading
> speed for space in some cases. Inline coding may be faster, but
> it takes up more space. The coder has to decide the relative
> importance for each case.

Good point. On the other hand, any procedure which is called only once (such as those used to provide structure) will run faster and be smaller if expanded inline.

> Just to add to the argument, here is an example of what a human can do
> that I'd seriously doubt that a compiler would do. I first saw something
> like this in some local hacks to the Monitor for a Tops-20 at MIT. At
> first, I was offended, but then I saw the sheer elegance of it.
>
> PRSPC:  Skipa T2,[32.]    ; move a space into 2, skip next instruction
> PRTAB:  Movei T2,9.       ; put a tab in 2
>         Movei T1,.PRIOU   ; primary output
>         Bout%             ; byte output
>         Ercal .+1         ; ignore error
>         Popj P            ; return

Yes, clever code, but be careful not to mistake saving space in the small for saving space in the large. If the routine were duplicated, it would cost only a few extra instructions, in other words not much. Programs usually get much bigger through more global design choices, and these little optimizations don't contribute much one way or the other. As with coding speed, it's the *big* decisions that matter, and this isn't one of them.

cheers,

txr