schow@bnr-public.uucp (Stanley Chow) (04/27/89)
Since this discussion of complex instructions is venturing outside
architectural issues and into language/OS issues, I have cross-posted to
what I think are the appropriate groups. Hopefully, we can get some other
viewpoints on this issue. Please limit followups to the appropriate group(s).

In article <25384@amdcad.AMD.COM> prem@crackle.amd.com (Prem Sobel) writes:
>Many years ago, there was a machine called the Interdata Model 70 which
>had instructions for atomically adding or removing items from circular
>double ended queues. The data structure was defined reasonably efficiently
>and the machine was microcoded.
>
>Yet no one, no compiler seriously used these instructions. The reason was,
>amazingly, that individual instructions were faster!!! I never looked at
>the microcode, so I cannot comment why that was.

Strangely enough, we have a proprietary machine that has micro-coded
instructions for much the same functions. The queueing instructions happen
to be at the top of the usage list.

Even more amazing, micro-coding of frequently used instruction sequences
essentially doubled performance. Since I wrote much of the micro-code (and
did much of the analysis to begin with), I can state that the main reasons
are:
  - reduced program bandwidth
  - better pipelining of program and data access
  - better parallelism in using the hardware units.

All this is done with a peephole optimizer! And *all* the instructions
fit into 4K by 40 bits of micro-code!

You think the VAX procedure calling instructions are big? We have special
instructions for swapping processes in and out! The code for swapping
processes is something like:

        SaveRegisters();                ; one instruction.
                                        ; old process is implicit
        RestoreRegisters(new_process);  ; another instruction

These instructions play with the hardware registers and firmware registers,
help the scheduler do software stuff, calculate the CPU time spent in the
current process, save/restore the runtime stacks, and do some other things
that I cannot remember offhand.

The end result is that process swapping happens at data memory bandwidth.
We looked at the options and concluded that, even with absolutely no
program store wait-states, it is impossible for any software (compiled or
hand-tuned) to even come close to this performance.

Note that this is on a machine designed for micro-coding in the early 70's,
so the comparisons may not be valid for current machines. Considering that
it uses only MSI TTL on 4-layer boards, we get very good throughput. [We
have already come out with a 68K based replacement and are working on more
fun stuff, more about that next decade.]

A word of caution for people who want to look into micro-coding: get
control of your operating system and compiler before you try it. There is
no point in micro-coding instructions for your application unless you can
make the OS and the compiler like it. (I managed to introduce new syntax
into the language and changed whole chunks of the OS to support some of the
fancy micro-code.)

Stanley Chow    ..!utgpu!bnr-vpa!bnr-fos!schow%bnr-public
                (613) 763-2831

Disclaimer: Since I am only talking about an old system, and all the
            information has already been published in one form or another,
            I don't think my employer minds me talking about it. That does
            not mean I represent anyone.
--
Send compilers articles to compilers@ima.isc.com or, perhaps, Levine@YALE.EDU
Plausible paths are { decvax | harvard | yale | bbn}!ima
Please send responses to the originator of the message -- I cannot forward
mail accidentally sent back to compilers.  Meta-mail to ima!compilers-request
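
For readers who have not met queueing instructions of the kind discussed
above, here is a minimal sketch in Python of a circular double-ended queue
with add/remove at both ends. Everything in it is an assumption made for
illustration: the class name, the method names, and the buffer layout are
hypothetical and do not reproduce the Interdata instruction set or the
micro-code of the proprietary machine. The lock merely stands in for the
atomicity that a single micro-coded instruction would give you.

```python
import threading


class CircularDeque:
    """Fixed-capacity double-ended queue kept in a circular buffer.

    Hypothetical layout: a slot array, a head index, and an item count.
    """

    def __init__(self, capacity):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._slots = [None] * capacity
        self._capacity = capacity
        self._head = 0    # index of the current front item
        self._count = 0   # number of occupied slots
        # Assumption: a lock models "one indivisible instruction".
        self._lock = threading.Lock()

    def add_front(self, item):
        """Insert at the front; return False if the queue is full."""
        with self._lock:
            if self._count == self._capacity:
                return False
            # Step the head backwards, wrapping around the buffer.
            self._head = (self._head - 1) % self._capacity
            self._slots[self._head] = item
            self._count += 1
            return True

    def add_back(self, item):
        """Insert at the back; return False if the queue is full."""
        with self._lock:
            if self._count == self._capacity:
                return False
            tail = (self._head + self._count) % self._capacity
            self._slots[tail] = item
            self._count += 1
            return True

    def remove_front(self):
        """Remove and return the front item, or None if empty."""
        with self._lock:
            if self._count == 0:
                return None
            item = self._slots[self._head]
            self._slots[self._head] = None
            self._head = (self._head + 1) % self._capacity
            self._count -= 1
            return item

    def remove_back(self):
        """Remove and return the back item, or None if empty."""
        with self._lock:
            if self._count == 0:
                return None
            tail = (self._head + self._count - 1) % self._capacity
            item = self._slots[tail]
            self._slots[tail] = None
            self._count -= 1
            return item

    def __len__(self):
        with self._lock:
            return self._count
```

Each method is one short critical section: a bounds check, an index
update with wraparound, and a store or load. In this sketch a None return
from the remove methods means "empty", so None itself cannot be stored as
an item.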