firth@sei.cmu.edu (Robert Firth) (04/13/89)
In article <5064@hubcap.clemson.edu> mark@hubcap.clemson.edu (Mark Smotherman) writes:
[quoting Clark & Strecker]
> "Anecdotal accounts of irrational implementations are certainly
> interesting.  Is it *typical*, however, that composite instructions
> run more slowly than equivalent sequences of simple instructions?

In my experience, yes, it is indeed typical.  Taking only the example
of the VAX, I have found naive code faster than the block operations
(MOVC3, CMPC3), the loop instruction (ACB), INDEX, and, of course, the
call instructions.

Moving to RISC machines, one point to note is that the authors'
assertion that, other things being equal, several instructions cannot
be faster than one is false: the several instructions can be
interleaved by the compiler with other code, so as to overlap delays
with useful work.
slackey@bbn.com (Stan Lackey) (04/13/89)
In article <3169@bd.sei.cmu.edu> firth@sei.cmu.edu (Robert Firth) writes:
>In article <5064@hubcap.clemson.edu> mark@hubcap.clemson.edu (Mark Smotherman) writes:
> [quoting Clark & Strecker]
>
>> "Anecdotal accounts of irrational implementations are certainly
>> interesting.  Is it *typical*, however, that composite instructions
>> run more slowly than equivalent sequences of simple instructions?
>
>In my experience, yes, it is indeed typical.  Taking only the example
>of the VAX, I have found naive code faster than the block operations
>(MOVC3, CMPC3), the loop instruction (ACB), INDEX, and, of course, the
>call instructions.

You should include ALL of the data here.  The proper way of saying
this is: "The implementation of the subset that I need in a particular
case is faster than the full operation."  For example, the VAX CALL
includes aligning the stack and saving enough state so it can be fully
restored, saving the condition codes, pushing a minimum of five
longwords, picking up a mask at the called procedure, and pushing only
those registers with corresponding bits set in the mask.  I'll bet
your substitution code did not do all that.

The string instructions have logic in them to make sure overlapping
strings worked, zero-length strings worked etc., regardless of the
case.  Did your code do that?  INDEX was another example where some of
the exception checking was left out of the emulation.  I guess it
USUALLY works!

CALL was never intended to be faster than JSR; it was designed to ease
the programming burden of a consistent calling standard, so
cross-language calls would just work.  Now if there were no JSR, there
would be something to complain about.  But all the simple instructions
are present, and implemented in an optimized way!  So the user has a
choice: use the fully functional instructions if he wants the machine
to do the work, or reinvent all the way down to the bare metal.  With
RISC, there is no choice.

So there! :-)
Stan
joel@pandora.pa.dec.com (Joel McCormack) (04/14/89)
I too can vouch for the fact that sequences of simple VAX instructions are often faster than a single ``smart'' instruction. The WRL Modula-2 compiler generates decent, but not great, code. Nonetheless, it is quite competitive against most of the
bcase@cup.portal.com (Brian bcase Case) (04/15/89)
>You should include ALL of the data here.  The proper way of saying
>this is: "The implementation of the subset that I need in a particular
>case is faster than the full operation."  For example, ...
>[examples...]
>CALL was never intended to be faster than JSR; it was designed to ease
>the programming burden of a consistent calling standard, so
>cross-language calls would just work.  Now if there were no JSR, there
>would be something to complain about.  But all the simple instructions
>are present, and implemented in an optimized way!  So the user has a
>choice: use the fully functional instructions if he wants the machine
>to do the work, or reinvent all the way down to the bare metal.  With
>RISC, there is no choice.

You make a valid point: the complex instructions do a lot of work, and
that work is sequenced in a transparent way so that instruction fetch
overhead is eliminated and advantage of low-level special cases can be
taken in microcode (well, you didn't say all that; I guessed you were
thinking it).  Unfortunately, all the work that the complex
instructions do is rarely needed.

And in the case of the call instruction, I find the VAX complex
instruction sorta insulting: assuming that compiler writers are too
stupid to adhere to a common spec for inter-language call protocol.
Having the microcode do it is no better than having a subroutine in
the compiler emit the sequence of instructions that implements the
agreed-upon protocol.  And the agreed-upon protocol can do just what
is needed.  Maybe compiler writers are that stupid, in general, or
were at the time of the VAX's conception.

You are also right when you say with RISC there is no choice: one is
FORCED to use the fast instructions! :-)
ok@quintus.UUCP (Richard A. O'Keefe) (04/16/89)
In article <17167@cup.portal.com> bcase@cup.portal.com (Brian bcase Case) writes:
>And in the case of the call instruction, I find the VAX complex instruction
>sorta insulting: assuming that compiler writers are too stupid to adhere to
>a common spec for inter-language call protocol.

Try calling Pascal from PL/I under MVS sometime.
bcase@cup.portal.com (Brian bcase Case) (04/17/89)
>>And in the case of the call instruction, I find the VAX complex instruction
>>sorta insulting: assuming that compiler writers are too stupid to adhere to
>>a common spec for inter-language call protocol.
>
>Try calling Pascal from PL/I under MVS sometime.

I don't want this to get out of hand, and it might belong elsewhere
anyway.

BUT, was calling Pascal from PL/I under MVS a design goal?  If so,
then why doesn't the existence of the VAX's CALLS instruction
guarantee this goal will be met?  I guess this is yet another point:
if the CALLS instruction was included so that inter-language calls
"would just work," then it fails even at that!
leichter@CS.YALE.EDU (Jerry Leichter (LEICHTER-JERRY@CS.YALE.EDU)) (04/17/89)
In article <17296@cup.portal.com>, bcase@cup.portal.com (Brian bcase Case) writes...
>>
>>Try calling Pascal from PL/I under MVS sometime.
>
>I don't want this to get out of hand, and it might belong elsewhere anyway.
>
>BUT, was calling Pascal from PL/I under MVS a design goal?  If so, then why
>doesn't the existence of the VAX's CALLS instruction guarantee this goal
>will be met?  I guess this is yet another point: if the CALLS instruction
>was included so that inter-language calls "would just work," then it fails
>even at that!

Ahem.  Last time I checked, MVS didn't run on VAXes; it ran on IBM
mainframes.  VMS runs on VAXes.

It was a design goal of VMS that Pascal code be callable from PL/I
code.  It is.  (Yes, there are complications when you cross over into
an area where both languages specify "how the world should look" - for
example, the I/O system.)  I have no idea what goals MVS may have had
in this area.

							-- Jerry
ok@quintus.UUCP (Richard A. O'Keefe) (04/18/89)
In article <17296@cup.portal.com> bcase@cup.portal.com (Brian bcase Case) writes:
>>>And in the case of the call instruction, I find the VAX complex instruction
>>>sorta insulting: assuming that compiler writers are too stupid to adhere to
>>>a common spec for inter-language call protocol.
>>
>>Try calling Pascal from PL/I under MVS sometime.
>
>I don't want this to get out of hand, and it might belong elsewhere anyway.
>
>BUT, was calling Pascal from PL/I under MVS a design goal?  If so, then why
>doesn't the existence of the VAX's CALLS instruction guarantee this goal
>will be met?  I guess this is yet another point: if the CALLS instruction
>was included so that inter-language calls "would just work," then it fails
>even at that!

I am *sick* of people not reading what I wrote.  Watch carefully now.
EM.  VEE.  ESS.  I did not say "try calling Pascal from PL/I under
<VMS> sometime."  To the best of my knowledge that works perfectly
well and has done ever since it was possible to try.  And yes, the
CALLS/CALLG instructions and VIA were important in achieving that.

My point was that IBM have a software convention for System/370, and
that their Pascal compiler (which is not a fully supported product,
but it is the compiler they will provide to customers who want Pascal)
was written after that convention was well established and described
in hundreds of books, and it _does_ _not_ follow that convention.

My point, then, is in DEC's defence: yes, there _are_ compiler writers
who will not adhere to a common spec for inter-language call protocol.
That's not to say that they are stupid.  They had to trade off
incompatibility with other languages against inefficiency for Pascal.
DEC tried to make that a non-issue.

On the subject of calling conventions, I just had my first look at a
68030 manual.  I know, I'm way behind the times, but it was supposed
to be just more of the same.  Why did Motorola delete CALLM?
richt@breakpoint.ksr.com (Rich Title) (04/18/89)
In article <57252@yale-celray.yale.UUCP> leichter@CS.YALE.EDU (Jerry Leichter (LEICHTER-JERRY@CS.YALE.EDU)) writes:
> [Discussion of VMS call conventions & stack unwinding...]
>Now, you can argue about whether such a condition handling mechanism is
>important.  Unix gets by without it...

In UNIX, setjmp/longjmp are the mechanism for stack unwinding.  UNIX
didn't make the VMS mistake of getting stack-unwind mixed up with
condition handling: longjmp can be executed from anywhere (though it
is most commonly done from within a handler).  setjmp simply saves an
entire register set, so longjmp need not do the frame-by-frame unwind
that VMS's $UNWIND must do.

The advantage of the UNIX method of stack unwinding is that it is
relatively portable, and doesn't make demands on the calling
conventions.  The disadvantage is that the establish-an-unwind-point
(the setjmp call) is more expensive: an entire register set must be
saved, as opposed to just putting a word on the stack as in VMS.  So
there are pros & cons to each approach.

>In fact, what often gets overlooked in discussing the VAX architecture was
>that it was NOT designed by hardware people alone.  It was the result of a
>lot of give and take between hardware and software people.  It was the
>software people who said "we want a condition handling facility which provides
>certain features (in particular, no cost to code which doesn't use it...

Well, to rebut that one, let me simply quote you again (!)

>Here's a simple example: It's common for procedures to have two different
>paths, one for a simple case, one for a complex case.  It may very well be
>that the simple case path uses fewer registers than the complex case path.
>So you'd be tempted to say "on entry (i.e., using the procedure entry mask)
>just save the registers that the simple case uses (or that both use); if the
>complex case occurs, save and restore those explicitly".  This is illegal
>under the procedure call standard, for a simple reason: If the stack gets
>unwound through the complex case's code, the registers it had saved "by hand"
>would not be restored...

Aha!  So there *is* a cost to the code that is not using it after all...

    - Rich
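For readers who haven't bumped into it, the UNIX mechanism Rich
describes looks like this in use.  A minimal sketch -- the point is
that longjmp() simply reloads the register set setjmp() saved,
however many frames down the call stack it is executed:

    #include <setjmp.h>
    #include <stdio.h>

    static jmp_buf unwind_point;        /* holds the saved register set */

    static void deep_in_the_stack(void)
    {
        /* Unwind back to main() in one shot: longjmp() reloads the
           registers saved by setjmp(), however many frames down we
           are -- no frame-by-frame walk as in VMS's $UNWIND. */
        longjmp(unwind_point, 1);
    }

    int main(void)
    {
        if (setjmp(unwind_point) == 0)  /* expensive part: save registers */
            deep_in_the_stack();
        else
            printf("unwound\n");        /* arrive here after longjmp() */
        return 0;
    }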
rob@tolerant.UUCP (Rob Kleinschmidt) (04/19/89)
Once in a while, the atomic nature of a complex instruction can have
real advantages over a sequence of fast single cycle instructions.
Because the complex instruction cannot be interrupted in mid sequence,
there is no need to modify processor priority.  This can allow better
manipulation of shared data items (linked lists, counters, etc.)
especially by user level processes.  Anybody got a good way for a user
to execute an uninterrupted sequence of a dozen RISC instructions?
chase@Ozona.orc.olivetti.com (David Chase) (04/19/89)
In article <4101@tolerant.UUCP> rob@tolerant.UUCP (Rob Kleinschmidt) writes:
>Anybody got a good way for a user to execute an uninterrupted sequence
>of a dozen RISC instructions?

Saw (via cable) a talk at Stanford by one of the 80860 people from
Intel where he said (I believe) that the chip provided the ability to
do just that.  Upon examining the Programmer's Reference Manual, I
find that up to 32 user-mode instructions can be protected by a LOCK
instruction (no branching allowed, must be restartable from the LOCK
in case of traps).  That sounds like what you want.

David
tim@crackle.amd.com (Tim Olson) (04/19/89)
In article <4101@tolerant.UUCP> rob@tolerant.UUCP (Rob Kleinschmidt) writes:
| Once in a while, the atomic nature of a complex instruction can have
| real advantages over a sequence of fast single cycle instructions.
| Because the complex instruction cannot be interrupted in mid sequence,
| there is no need to modify processor priority.  This can allow better
| manipulation of shared data items (linked lists, counters, etc.)
| especially by user level processes.  Anybody got a good way for a user
| to execute an uninterrupted sequence of a dozen RISC instructions?

I believe the i860 has a Lock bit which can be set by the user,
disabling interrupts and setting an external Lock pin.  It is tracked
by a watchdog timer, so that after a certain number of cycles (10?) it
is automatically reset (to prevent the user from disabling interrupts
for too long).

On the Am29000, the on-chip timer is disabled separately from the
external interrupts.  Thus, an operating system could supply a
user-invokable disable() routine which masked interrupts and set the
timer to interrupt in a small number of cycles.  The user can then
perform a series of operations, then call an enable() routine
(resetting the timer and the interrupt mask).  If enable() were not
called in time, the timer would catch it and force the interrupt mask
off.

One other (software) method is to associate a semaphore with each
critical region (or a collection of critical regions), and manipulate
it with an atomic load/set instruction (which many RISC processors
have).  Interrupts don't have to be disabled if all users of the
critical region obey the semaphore.  I guess one may then have to
worry about deadlock conditions, though...

	-- Tim Olson
	Advanced Micro Devices
	(tim@amd.com)
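Tim's third (software) method, sketched in C.  Here test_and_set() is
a hypothetical wrapper for whatever atomic load/set instruction the
processor provides; the name and linkage are invented for
illustration:

    /* test_and_set(): assumed wrapper around the machine's atomic
       load/set instruction -- atomically sets *lock to 1 and returns
       its previous value. */
    extern int test_and_set(volatile int *lock);

    volatile int queue_lock = 0;    /* semaphore guarding the region */

    void enter_critical(void)
    {
        while (test_and_set(&queue_lock) != 0)
            ;                       /* spin until we obtain the lock */
    }

    void leave_critical(void)
    {
        queue_lock = 0;             /* a plain store releases it */
    }

As the post says, interrupts stay enabled; the scheme only works if
every path to the shared data takes the lock first.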
tim@crackle.amd.com (Tim Olson) (04/19/89)
In article <531@ksr.UUCP> richt@ksr.UUCP (Rich Title) writes:
| In article <57252@yale-celray.yale.UUCP> leichter@CS.YALE.EDU (Jerry Leichter (LEICHTER-JERRY@CS.YALE.EDU)) writes:
| The advantage of the UNIX method of stack unwinding is that it is
| relatively portable, and doesn't make demands on the calling conventions.
| The disadvantage is that the establish-an-unwind-point (the setjmp call)
| is more expensive: an entire register set must be saved, as opposed to
| just putting a word on the stack as in VMS.  So there are pros & cons to
| each approach.

Not necessarily.  As has been pointed out here in the past,
register-windowed machines can get by with very simple
setjmp()/longjmp() implementations.  Setjmp() only needs to save the
current stack pointer and stack-cache bounds in the jmpbuf; longjmp()
restores these values (the standard return from a function makes sure
that the frame being returned into is entirely in the stack cache).

This implementation also makes "register" variables act like memory
variables -- after a longjmp() both contain the same values as of the
time of the longjmp() call, rather than having register values
suddenly restored to their contents at the time of the setjmp() call.

	-- Tim Olson
	Advanced Micro Devices
	(tim@amd.com)
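The semantic difference Tim mentions shows up in a toy program.  A
sketch, assuming the register-window implementation above; portable C
makes no promise about n at the marked point unless it is declared
volatile:

    #include <setjmp.h>
    #include <stdio.h>

    static jmp_buf env;

    int main(void)
    {
        register int n = 0;      /* may live in a windowed register */

        if (setjmp(env) == 0) {
            n = 42;              /* modified after the setjmp() */
            longjmp(env, 1);
        }
        /* Under the implementation described above, n is 42 here --
           the value as of the longjmp() -- not 0 as it would be if
           registers were restored to their setjmp()-time contents. */
        printf("%d\n", n);
        return 0;
    }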
rpw3@amdcad.AMD.COM (Rob Warnock) (04/19/89)
[Although the implementation discussed below is Am29000-specific, the
general notion is probably feasible on the other current RISCs, and
maybe even CISCs.]

In article <25280@amdcad.AMD.COM> tim@amd.com (Tim Olson) writes:
+---------------
| On the Am29000, the on-chip timer is disabled separately from the
| external interrupts.  Thus, an operating system could supply a
| user-invokable disable() routine which masked interrupts and set the timer
| to interrupt in a small number of cycles.  The user can then perform a
| series of operations, then call an enable() routine (resetting the timer
| and the interrupt mask).  If enable() were not called in time, the timer
| would catch it and force the interrupt mask off.
+---------------

[Funny you should mention... Our internal 29k/4.3bsd Unix port group
was talking about this just yesterday. ;-} ]

Actually, Tim, you can do a lot better than that, by remembering to
optimize for the *expected* case, which is: There will be no attempt
to break the lock.  The main thing you are trying to protect against
is rescheduling and "signals" due to external events such as typing
<^C>.  For that, you don't need a separate enable() call.  Instead,
you can use a temp_disable() call [which can be a *very* lightweight
sys call, on the 29k] which needs to do just two things:

1. Check a u_page flag for a pending event that may have been held off
   by a previous temporary_disable() call.  [This prevents the user
   from keeping the scheduler locked out.]

2. Store the current value of the CPU timer in the u_page.

In 29000 assembler, this is:

	; User does assert instruction "asne V_UIOFF,gr1,gr1", which traps
	; here through vector V_UIOFF, since gr1 *does* equal gr1.
	user_ioff:
		add	it0,upage,BRK_UIOFF_OFS	; find u.u_break_uioff
		load	0, 0, it0, it0		; see if we need a resched
		mfsr	it1, TCV		; get current timer value
		add	it1, it1, 1		; make sure non-zero
		add	it2,upage,UIOFF_OFS	; find u.u_uioff
		jmpt	it0, user_ioff_broken	; take traps first...
		store	0, 0, it1, it2		; save timer+1 in u.u_uioff
		iret

Assuming hits in the BTC [user doing this a lot], this is 9+8+2+2 = 21
cycles or just under one microsecond on a 25 MHz machine [with
2-first/1-burst memory].

Now, when you get an external signal like <^C>, you do all the normal
Unix signal stuff, but "issig()" will check the u.u_uioff and, if it's
non-zero and not more than "delta" more than the current timer value,
will just set u.u_break_uioff and *not* ask for the signal.  Likewise,
"sched()" will delay rescheduling a uioff'd process.  The timer
interrupt code checks the current process to see if the u.u_uioff word
is "old" [mod 2^32], and if so, clears it (and causes a resched if
needed).  I'm sure there are more places to touch than that, but you
get the idea.

That works for user-mode code, but for the kernel you *would* need
hardware.  What we used back at Dig. Comm. Assoc. (in a Z-80 comm
system) was a counter chip, wired so it counted down to zero and hung
there until reloaded.  Whenever it was *not* zero, interrupts and DMA
requests were held off (simple gate).  The trick is that the store
which loads the counter *enables* interrupts and DMA [at least long
enough for them to "take"], thus preventing infinite lockouts.

Hmmm... You could do the same thing for user mode, if your MMU allows
mapping user pages into I/O space.  [The 29k does.]  Dedicate a whole
I/O page to the "temp_ioff" counter, put it in every process's address
space, including the kernel's.  Cost: A 25-cent external chip.  [I
assume there are enough misc.
I/O decodes lying around in a PAL somewhere.]

Rob Warnock
Systems Architecture Consultant

UUCP:	{amdcad,fortune,sun}!redwood!rpw3
DDD:	(415)572-2607
USPS:	627 26th Ave, San Mateo, CA 94403
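For concreteness, the issig() check Rob describes might look something
like the following sketch in C.  The field names follow his post, but
the structure, the delta constant, and the function itself are
assumptions for illustration, not 4.3bsd source:

    #define UIOFF_DELTA 64UL        /* assumed bound on a lock's lifetime */

    struct upage_uioff {
        unsigned long u_uioff;      /* timer+1 at temp_disable(), or 0 */
        int u_break_uioff;          /* event arrived while locked out */
    };

    extern struct upage_uioff u;    /* per-process u-area (assumed) */

    /* Called from issig(): nonzero means hold the signal, because the
       process is inside a fresh temp_disable() window.  The 29k timer
       counts down, so (saved - now) measures the age of the window. */
    int uioff_holds_signal(unsigned long now)
    {
        if (u.u_uioff != 0 && u.u_uioff - now <= UIOFF_DELTA) {
            u.u_break_uioff = 1;    /* note the pending event */
            return 1;
        }
        return 0;
    }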
alan@rnms1.paradyne.com (Alan Lovejoy) (04/20/89)
In article <17363@cup.portal.com> bcase@cup.portal.com (Brian bcase Case) writes:
>>On the subject of calling conventions, I just had my first look at a
>>68030 manual.  I know, I'm way behind the times, but it was supposed to
>>be just more of the same.  Why did Motorola delete CALLM?
>
>Did they really do this?  Maybe it is because the PMMU subset on the 030
>can't support the protection semantics, but I'm not sure about that.  Maybe
>it's because nobody used the instruction. :-) :-)
I heard that they checked, and nobody was using the instruction. I don't
remember now what the motivation for dropping CALLM was, other than lack
of use. That seems like sufficient justification in itself.
Alan Lovejoy; alan@pdn; 813-530-2211; AT&T Paradyne: 8550 Ulmerton, Largo, FL.
Disclaimer: I do not speak for AT&T Paradyne. They do not speak for me.
_________________________Design Flaws Travel In Herds_________________________
Motto: If nanomachines will be able to reconstruct you, YOU AREN'T DEAD YET.
guy@auspex.auspex.com (Guy Harris) (04/21/89)
>You should include ALL of the data here.  The proper way of saying
>this is: "The implementation of the subset that I need in a particular
>case is faster than the full operation."  For example, the VAX CALL
>includes aligning the stack and saving enough state so it can be fully
>restored, saving the condition codes, pushing a minimum of five longwords,
>picking up a mask at the called procedure, and pushing only
>those registers with corresponding bits set in the mask.  I'll bet
>your substitution code did not do all that.

If it didn't *have* to, it probably didn't.  Does VAX compiled code
really care about having the condition code saved and restored?  Does
it care about all five of those longwords?  The compiled code in the
callee can save the registers itself - or the designer of the calling
sequence can choose to have the caller save the registers (which, as I
remember, Firth has advocated in the past).

>INDEX was another example where some of the exception checking was
>left out of the emulation.  I guess it USUALLY works!

In a lot of cases the exception checking *can* be left out, if the
compiler already knows that the subscript won't be out of bounds.
Admittedly, I think IBM may have a patent on one way of doing this, so
unless you do it differently that may be an extra cost :-).
schow@bnr-public.uucp (Stanley Chow) (04/21/89)
In article <134@dg.dg.com> rec@dg.UUCP (Robert Cousins) writes:
>
>As a result of this difficulty, architects began to build in
>instructions to handle data structures more directly.  However,
>few of these instructions can be compiled since the C compilers
                                  ^^^^
>have difficulty deciphering the code well enough to know when
>to use the "insert into circular queue" instruction as opposed to
>smaller operations to manipulate pointers.  Also, few programmers
>are sufficiently versed in the underlying hardware to generate
>compatible data structures which could allow a very smart compiler
>to generate the instructions in the first place.
>

There are many other compilers that can take advantage of complex
instructions.  Actually, even in C, it is relatively easy to set up
include files that will declare the right data types and inline
procedures [assuming your compiler lets you].  Throwing out
architectural features just because C can't use them?  There are other
languages.  How about throwing out C instead? :-)

Also, it is not necessary for all programmers to know enough to use
the special instructions.  Some guru can set it up and everyone can
then use the pre-built modules.  Depending on how you set everything
up, the compiler does not have to be smart at all.  For example, we
have a system with really wild instructions that get generated even
though the users don't know what they are and the compiler does only
peephole optimization.  As it turns out, having atomic operations for
free is really nice.

>I personally prefer the way DG/UX handles things.  (I'm biased, I
>work for DG.)  Each virtual processor contains a value called "atomic
>operation count" which can be raised or lowered (somewhat like parentheses
>can be nested).  Whenever the AOC is > 0, the processor cannot be suspended
>for anything but a simple interrupt service.  This guarantees that atomic
>operations can complete without interference.  If more stringent control
>is required, then some form of event counter solution is used.

I like this design.  Sounds like it is implemented by the kernel for
virtual processors.  How do you handle the case of the kernel not
wanting to be interrupted?

Stanley Chow      ..!utgpu!bnr-vpa!bnr-fos!schow%bnr-public

My opinions are mine alone.  Since there has been no interest in others
being represented by them, I am lowering the representation fee.
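The include-file approach Stanley mentions might look like the sketch
below.  The enqcirc mnemonic and the HAVE_ENQCIRC flag are invented,
and the asm() syntax assumes a compiler (such as GCC) that accepts
inline assembly; machines without the instruction get a portable
pointer version behind the same interface:

    /* queue.h -- a guru wraps the machine's circular-queue insert
       once; everyone else just calls enqueue(). */
    typedef struct qelem {
        struct qelem *next, *prev;
    } qelem;

    static inline void enqueue(qelem *head, qelem *elem)
    {
    #ifdef HAVE_ENQCIRC     /* machine with the queue instruction */
        asm volatile ("enqcirc %0,%1"
                      : : "r" (head), "r" (elem) : "memory");
    #else                   /* portable fallback, same interface */
        elem->next = head->next;
        elem->prev = head;
        head->next->prev = elem;
        head->next = elem;
    #endif
    }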
cik@l.cc.purdue.edu (Herman Rubin) (04/21/89)
In article <5965@pdn.paradyne.com>, alan@rnms1.paradyne.com (Alan Lovejoy) writes:
> In article <17363@cup.portal.com> bcase@cup.portal.com (Brian bcase Case) writes:
> >>On the subject of calling conventions, I just had my first look at a
> >>68030 manual.  I know, I'm way behind the times, but it was supposed to
> >>be just more of the same.  Why did Motorola delete CALLM?
> >
> >Did they really do this?  Maybe it is because the PMMU subset on the 030
> >can't support the protection semantics, but I'm not sure about that.  Maybe
> >it's because nobody used the instruction. :-) :-)
>
> I heard that they checked, and nobody was using the instruction.  I don't
> remember now what the motivation for dropping CALLM was, other than lack
> of use.  That seems like sufficient justification in itself.

Could this be another example of the vicious cycle?  The language does
not use the instruction, therefore the instruction does not get used,
therefore the instruction gets dropped.  Which algorithm I use depends
on the hardware instructions; if the instruction is not there, I may
very well know that simulating the instruction is too costly to
consider, so it would even be impossible to tell from a careful
examination of code that it would be desirable.
--
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907
Phone: (317)494-6054
hrubin@l.cc.purdue.edu (Internet, bitnet, UUCP)
barmar@think.COM (Barry Margolin) (04/21/89)
In article <134@dg.dg.com> rec@dg.UUCP (Robert Cousins) writes:
>As a result of this difficulty, architects began to build in
>instructions to handle data structures more directly.  However,
>few of these instructions can be compiled since the C compilers
>have difficulty deciphering the code well enough to know when
>to use the "insert into circular queue" instruction as opposed to
>smaller operations to manipulate pointers.  Also, few programmers
>are sufficiently versed in the underlying hardware to generate
>compatible data structures which could allow a very smart compiler
>to generate the instructions in the first place.

The solution to this is to hide these instructions away in subroutine
libraries and/or compiler intrinsics.  In addition to making it easy
for the compiler to recognize these operations and optimize them
appropriately, it also makes it easier to port the code and still get
good performance.

This is how computer languages and processors evolve in general.
Consider floating point and trigonometric functions.  Once upon a time
programmers might have been required to code these things themselves.
But this would make it difficult to use hardware which has these
operations built in.  So just about every language provides them as
standard features, with provisions for open-coding them.  What's the
difference between operations on floating point numbers and operations
on sophisticated data structures such as queues?

Barry Margolin
Thinking Machines Corp.

barmar@think.com
{uunet,harvard}!think!barmar
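Barry's floating-point analogy in miniature: the source just says
sqrt(), and a compiler for a machine with a square-root instruction is
free to open-code it, while everyone else calls the library routine.
A hypothetical insert_queue() intrinsic could be treated the same way:

    #include <math.h>

    /* The caller neither knows nor cares whether sqrt() is a library
       subroutine or a single hardware instruction open-coded by the
       compiler -- exactly the property one would want for a queue
       intrinsic. */
    double hypotenuse(double a, double b)
    {
        return sqrt(a * a + b * b);
    }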
rpw3@amdcad.AMD.COM (Rob Warnock) (04/22/89)
In article <1244@l.cc.purdue.edu> cik@l.cc.purdue.edu (Herman Rubin) writes:
+---------------
| In article <5965@pdn.paradyne.com>, alan@rnms1.paradyne.com writes:
| > In article <17363@cup.portal.com> bcase@cup.portal.com writes:
| > > ... Why did Motorola delete CALLM?
| > > Maybe it's because nobody used the instruction. :-) :-)
| > don't remember now what the motivation for dropping CALLM was, other
| > than lack of use.  That seems like sufficient justification in itself.
| Could this be another example of the vicious cycle?  The language does not
| use the instruction, therefore the instruction does not get used, therefore
| the instruction gets dropped...
+---------------

Gerry Weinberg, in his neat little book "Re-thinking Systems
Analysis", calls this "The Railroad Paradox".  The 2:30pm train to the
city zooms on by without stopping, but some suburban wives would like
to take that one into the city sometimes so they can shop before
meeting hubby for dinner.  They write several notes and petitions,
finally getting back a polite letter from the railroad which says, "We
have stationed an observer on the platform at your stop for a week now
who reports that there is never anyone waiting for the 2:30pm train to
stop.  Therefore, we must regretfully deny your request."

He also gives computer-related examples...

Rob Warnock
Systems Architecture Consultant

UUCP:	{amdcad,fortune,sun}!redwood!rpw3
DDD:	(415)572-2607
USPS:	627 26th Ave, San Mateo, CA 94403
henry@utzoo.uucp (Henry Spencer) (04/23/89)
In article <1011@quintus.UUCP> ok@quintus.UUCP (Richard A. O'Keefe) writes:
>My point was that IBM have a software convention for System/370, and that
>their Pascal compiler (which is not a fully supported product, but it is
>the compiler they will provide to customers who want Pascal) was written
>after that convention was well established and described in hundreds of
>books, and it _does_ _not_ follow that convention...

Might this have something to do with how ill-designed that convention
is?  "If you want people to use standards, make it easier and faster
to be standard than to be non-standard."
--
Mars in 1980s: USSR, 2 tries,  |     Henry Spencer at U of Toronto Zoology
2 failures; USA, 0 tries.      | uunet!attcan!utzoo!henry henry@zoo.toronto.edu
cik@l.cc.purdue.edu (Herman Rubin) (04/24/89)
In article <39609@think.UUCP>, barmar@think.COM (Barry Margolin) writes:
> In article <134@dg.dg.com> rec@dg.UUCP (Robert Cousins) writes:
> ......................
> The solution to this is to hide these instructions away in subroutine
> libraries and/or compiler intrinsics.  In addition to making it easy
> for the compiler to recognize these operations and optimize them
> appropriately, it also makes it easier to port the code and still get
> good performance.
>
> This is how computer languages and processors evolve in general.
> Consider floating point and trigonometric functions.  Once upon a time
> programmers might have been required to code these things themselves.
> But this would make it difficult to use hardware which has these
> operations built in.  So just about every language provides them as
> standard features, with provisions for open-coding them.  What's the
> difference between operations on floating point numbers and operations
> on sophisticated data structures such as queues?

And suppose you want multiple precision.  In that case you need good
coding AND GOOD INTEGER INSTRUCTIONS.  There are multiple precision
packages written in floating point.  They simulate integer arithmetic
in floating point, and are necessarily VERY slow.  Current hardware on
most machines throws real obstacles into efficient multiple precision
coding.  Slight differences in the multiplication and division
instructions can cause a factor of 2-20 or more in these.

It is not true that all languages provide procedures for open coding.
Most provide some procedures for attempting to open-code, but most
implementations I have seen discourage this.
--
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907
Phone: (317)494-6054
hrubin@l.cc.purdue.edu (Internet, bitnet, UUCP)
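A concrete instance of the obstacle Herman describes: one step of a
multiple-precision addition, in portable C.  With no way to name the
machine's carry flag, the carry must be recomputed by comparisons; an
add-with-carry instruction does the same work in one step, which is
one reason small instruction-set differences swing these routines by
large factors:

    #include <stddef.h>

    typedef unsigned long digit;    /* one "digit" of a big number */

    /* r = a + b, all n digits, least significant digit first;
       returns the final carry out. */
    digit mp_add(digit *r, const digit *a, const digit *b, size_t n)
    {
        digit carry = 0;
        size_t i;

        for (i = 0; i < n; i++) {
            digit s = a[i] + carry;
            carry = (s < carry);    /* carry out of the first add */
            r[i] = s + b[i];
            carry += (r[i] < s);    /* carry out of the second add */
        }
        return carry;
    }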
prem@crackle.amd.com (Prem Sobel) (04/25/89)
Many years ago, there was a machine called the Interdata Model 70
which had instructions for atomically adding or removing items from
circular double ended queues.  The data structure was defined
reasonably efficiently and the machine was microcoded.

Yet no one, no compiler seriously used these instructions.  The reason
was, amazingly, that individual instructions were faster!!!  I never
looked at the microcode, so I cannot comment on why that was.

Prem
schow@bnr-public.uucp (Stanley Chow) (04/27/89)
Since this discussion of complex instructions is venturing outside of
architectural issues and getting into language/OS issues, I have
cross-posted to what I think are appropriate groups.  Hopefully, we
can get some other viewpoints on this issue.  Please limit followups
to appropriate group(s).

In article <25384@amdcad.AMD.COM> prem@crackle.amd.com (Prem Sobel) writes:
>Many years ago, there was a machine called the Interdata Model 70 which
>had instructions for atomically adding or removing items from circular
>double ended queues.  The data structure was defined reasonably efficiently
>and the machine was microcoded.
>
>Yet no one, no compiler seriously used these instructions.  The reason was,
>amazingly, that individual instructions were faster!!!  I never looked at
>the microcode, so I cannot comment on why that was.

Strangely enough, we have a proprietary machine that has micro-coded
instructions for much the same functions.  The queueing instructions
happen to be at the top of the usage list.  Even more amazing,
micro-coding of frequently used instruction sequences essentially
doubled performance.  Since I wrote much of the micro-code (and did
much of the analysis to begin with), I can state that the main reasons
are:

 - reduced program bandwidth
 - better pipelining of program and data access
 - better parallelism for using the hardware units.

All this is done with a peephole optimizer!  And *all* the
instructions fitted into 4K by 40 bits of micro-code!

You think the VAX procedure calling instructions are big?  We have
special instructions for swapping processes in and out!  The code for
swapping processes is something like:

	SaveRegisters();		; one instruction.
					; old process is implicit
	RestoreRegisters(new_process);	; another instruction

These instructions play with the hardware registers, firmware
registers, help the scheduler do software stuff, calculate the CPU
time spent in the current process, save/restore the runtime stacks and
some other things that I cannot remember offhand.  The end result is
that process swapping happens at data memory bandwidth.  We looked at
the options, and concluded that even with absolutely no program store
wait-states, it is impossible for any software (compiled or
hand-tuned) to even come close to this performance.

Note that this is on a machine designed for micro-coding in the early
70's, so the comparisons may not be valid for current machines.
Considering that it uses only MSI TTL on 4-layer boards, we get very
good through-put.  [We have already come out with a 68K based
replacement and are working on more fun stuff, more about that next
decade.]

A word of caution for people that want to look into micro-coding: get
control of your operating system and compiler before you try it.
There is no point in micro-coding instructions for your application
unless you can make the OS and the compiler like it.  (I managed to
introduce new syntax into the language and changed whole chunks of the
OS to support some of the fancy micro-code.)

Stanley Chow
..!utgpu!bnr-vpa!bnr-fos!schow%bnr-public
(613) 763-2831

Disclaimer: Since I am only talking about an old system, and all the
information has already been published in one form or another, I don't
think my employer minds me talking about it.  That does not mean I
represent anyone.
--
Send compilers articles to compilers@ima.isc.com or, perhaps,
Levine@YALE.EDU.  Plausible paths are { decvax | harvard | yale | bbn}!ima
Please send responses to the originator of the message -- I cannot
forward mail accidentally sent back to compilers.
Meta-mail to ima!compilers-request
aglew@mcdurb.Urbana.Gould.COM (04/29/89)
>Once in a while, the atomic nature of a complex instruction can have
>real advantages over a sequence of fast single cycle instructions.
>Because the complex instruction cannot be interrupted in mid sequence,
>there is no need to modify processor priority.  This can allow better
>manipulation of shared data items (linked lists, counters, etc.)
>especially by user level processes.  Anybody got a good way for a user
>to execute an uninterrupted sequence of a dozen RISC instructions?

Don't rely on "the atomic nature of a complex instruction".

1) They may be restartable or continuable across interrupts, but they
   may not necessarily guarantee atomicity across an interrupt -
   i.e., if the interrupt routine messes with the data structures the
   complex instruction is manipulating, atomicity may be lost.

   Simple example: what does atomicity of a block copy mean to you?
   It might mean that exactly what was present in memory when the
   instruction was first issued should be copied - which is seldom
   true (in fact, I know of RT systems that depended on interrupt
   routines being able to modify copies in progress.)

   Smaller examples, like compare-and-swap, the processors usually do
   guarantee atomicity wrt interrupts on.  But not always...

2) Much more frequently, although complicated instructions may be
   atomic wrt interrupts, they are *not* atomic wrt other processors
   in the system.
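The usual multiprocessor-safe idiom is a retry loop around a small
atomic primitive rather than reliance on a big "atomic" instruction.
A sketch in C, where compare_and_swap() is a hypothetical wrapper for
a machine compare-and-swap (atomically: if *p still equals old, store
new and return nonzero):

    extern int compare_and_swap(volatile long *p, long old, long new);

    void atomic_increment(volatile long *counter)
    {
        long old;

        do {
            old = *counter;          /* snapshot the current value */
        } while (!compare_and_swap(counter, old, old + 1));
        /* If another processor changed *counter in between, the CAS
           fails and we simply retry -- atomic wrt other CPUs, which a
           merely uninterruptible instruction sequence is not. */
    }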
jkrueger@daitc.daitc.mil (Jonathan Krueger) (05/03/89)
In article <426@bnr-fos.UUCP>, schow@bnr-public (Stanley Chow) writes:
>it is not necessary for all programmers to know enough to use
>the special instructions.  Some guru can set it up and everyone can
>then use the pre-built modules.  Depending on how you set everything
>up, the compiler does not have to be smart at all.

Of course, but you lose the performance advantage if the modules get
called frequently and require significant setup (e.g. can't maintain
state for multiple callers or sequences) with respect to work
performed.  Or to put it another way, That Way Lies Threaded Code.
However, this isn't an argument for Better Living Through Inlining,
rather an appeal for analysis and measurement of the tradeoffs.

>For example, we have a system with really wild instructions that get
>generated even though the users don't know what they are and the
>compiler does only peephole optimization.  As it turns out, having
>atomic operations for free is really nice.

Sounds like you've managed to let your users express what they want in
terms familiar to them but also understandable to the language
translator, following which optimization to particular hardware can be
done in a reasonable way.  By any chance, would this be done by way of
a function library?

-- Jon
cik@l.cc.purdue.edu (Herman Rubin) (05/03/89)
In article <504@daitc.daitc.mil>, jkrueger@daitc.daitc.mil (Jonathan Krueger) writes:
> In article <426@bnr-fos.UUCP>, schow@bnr-public (Stanley Chow) writes:
> >it is not necessary for all programmers to know enough to use
> >the special instructions.  Some guru can set it up and everyone can
> >then use the pre-built modules.  Depending on how you set everything
> >up, the compiler does not have to be smart at all.

I agree it is not necessary for ALL programmers to know how to really
use the computer, any more than it is necessary for the house painter
to be an artist.  But the artist cannot set up the modules for a house
painter to produce a work of art.

It is also not true that even the use of modules will handle what is
needed.  I may see tomorrow an ingenious use of the hardware that I do
not see today.  It is quite common when I see a different architecture
that I see that some instruction, intended for a totally different
purpose, does what I want.  Also, I can see that some easily
implemented hardware will do not only what I want, but what others can
use.

> Of course, but you lose the performance advantage if the modules get
> called frequently and require significant setup (e.g. can't maintain
> state for multiple callers or sequences) with respect to work
> performed.  Or to put it another way, That Way Lies Threaded Code.
> However, this isn't an argument for Better Living Through Inlining,
> rather an appeal for analysis and measurement of the tradeoffs.

What I ask for here is flexibility.  The guru is not a universal guru.

> >For example, we have a system with really wild instructions that get
> >generated even though the users don't know what they are and the
> >compiler does only peephole optimization.  As it turns out, having
> >atomic operations for free is really nice.
>
> Sounds like you've managed to let your users express what they want in
> terms familiar to them but also understandable to the language
> translator, following which optimization to particular hardware can be
> done in a reasonable way.  By any chance, would this be done by way of
> a function library?

And what if your users find that the language is inadequate?  I know
of no adequate one.  Do not cripple the programmer.
--
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907
Phone: (317)494-6054
hrubin@l.cc.purdue.edu (Internet, bitnet, UUCP)
les@unicads.UUCP (Les Milash) (05/03/89)
In article <1277@l.cc.purdue.edu> cik@l.cc.purdue.edu (Herman Rubin) writes:
>What I ask for here is flexibility.  The guru is not a universal guru.

the guru that is not universal is not the true guru.

0 :-)-|X
cdshaw@alberta.UUCP (Chris Shaw) (05/04/89)
In article <1277@l.cc.purdue.edu> cik@l.cc.purdue.edu (Herman Rubin) writes:
>In article <504@daitc.daitc.mil>, jkrueger@daitc.daitc.mil (Jonathan Krueger) writes:
>>>For example, we have a system with really wild instructions that get
>>>generated even though the users don't know what they are and the
>>>compiler does only peephole optimization.  As it turns out, having
>>>atomic operations for free is really nice.
>>
>>Sounds like you've managed to let your users express what they want in
>>terms familiar to them but also understandable to the language
>>translator, following which optimization to particular hardware can be
>>done in a reasonable way.  By any chance, would this be done by way of
>>a function library?
>
>And what if your users find that the language is inadequate?  I know of
>no adequate one.  Do not cripple the programmer.

What if your stock Ford Escort doesn't do 180MPH on the straights?
Don't cripple the driver!  What if your 2 micron CMOS gate array
doesn't give 0.3 nanosecond inverter propagation delay with fanout of
8?  Don't cripple the logic designer!

Get it?  The point I'm illustrating is that if you want to do
something that 99% of the rest of the world is unwilling to pay for,
it should come as no surprise that the stock solutions fall short.
Griping and complaining that the stock solutions are inadequate is
simply vacuous.  If you invest the time and money, you can get a Ford
Escort to go 180.  But it won't be stock.  If you invest the time and
money, you can get 2.0 micron CMOS to give 0.3 ns propagation delay.
But it won't be a gate array, it'll be full custom.

Frankly, I'm getting very tired of reading Herman Rubin's complaints
about C, because he never seems to have a coherent solution.  All I
see is calls for someone else to solve his particular problem, coupled
with the unspoken attitude that his particular complaint should have
been taken care of long ago.  "The meter's running, buddy!  Get to
work!"

So what?

Maybe people do seriously listen, but come to the conclusion that it
isn't worth the time and effort to give Herman Rubin what he wants.  I
don't know, but it seems to me that Herman Rubin is wasting his
breath.  Time for action, I think.  Time for Herman Rubin to write
"SL", for "Statistical Language", and to report on its results.
Calling other people stupid is valuable only as long as they are
willing to listen.  For me, time was up long ago.

>Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907
--
Chris Shaw    cdshaw@alberta.UUCP (or via watmath or ubc-vision)
University of Alberta
CatchPhrase: Bogus as HELL !
cik@l.cc.purdue.edu (Herman Rubin) (05/04/89)
In article <2249@pembina.UUCP>, cdshaw@alberta.UUCP (Chris Shaw) writes:
> In article <1277@l.cc.purdue.edu> cik@l.cc.purdue.edu (Herman Rubin) writes:
> >In article <504@daitc.daitc.mil>, jkrueger@daitc.daitc.mil (Jonathan Krueger) writes:
 .......................
> >And what if your users find that the language is inadequate?  I know of
> >no adequate one.  Do not cripple the programmer.
>
> What if your stock Ford Escort doesn't do 180MPH on the straights?  Don't
> cripple the driver!  What if your 2 micron CMOS gate array doesn't give
> 0.3 nanosecond inverter propagation delay with fanout of 8?  Don't
> cripple the logic designer!

This is the wrong analogy.  YOU are saying that I should not complain
if the hardware is not capable of doing everything I want it to.  I do
complain about this, but not very loudly.  I do not ask for
technological impossibilities, or things requiring great cost.  I may
ask for better brakes, but not to stop the car from 100mph in 50 feet.

But my real complaint has nothing to do with the hardware.  You would
provide me with detailed instructions about how to get from one
location to another, but you would not allow me to use a road map to
change the route.  You would not even allow me to back the car into
the driveway to clean out the garage.  The car has a steering wheel.
I do not wish to put it under your control.

> The point I'm illustrating is that if you want to do something that 99% of the
> rest of the world is unwilling to pay for, .........................

What I am asking for is cheap.  I am asking that you unchain the
steering wheel.

> Frankly, I'm getting very tired of reading Herman Rubin's complaints about C,
> because he never seems to have a coherent solution.  All I see is calls
> for someone else to solve his particular problem, coupled with the unspoken
> attitude that his particular complaint should have been taken care of long ago.

You jump to conclusions and assume more than was stated.  I have
consistently stated that the best that can be produced is a fair
language, and that to produce such a language, the producers must be
aware that that is all that is possible.  Also, one person cannot
produce a language.  I am not in a CS department.  Funding for people
in mathematics and statistics to hire programmers is difficult and
niggardly, even to hire students.

I have made one concrete suggestion for something which is not a
language, but which would make for versatile, relatively easy
programming.  It is a versatile macro processor; maybe one should call
it a user-controlled semi-compiler.  Let me give an example for the
use of a single machine instruction on the CYBER 205.  The format of
the way I would like to write the macro is

	c{'z} ={t} {-}{|} a{'x} OP{mod} {|} b{'y} { /\ {~} w}

where the {} delineate optional fields.  The interpretation of this
depends on the types of the arguments, and covers more than 40
instructions, some of which have up to 8 optional bits.  For starters,
I would like to translate expressions like this into other statements
which are in the syntax recognized by a processor.  Of course, macros
could be recursive.

Such a processor would allow me to produce a customized means of
writing machine instructions, or sequences of machine instructions,
instead of the clumsy assembler mnemonics.  With this type of macro
assembler, I believe it would be easy to produce good semi-portable
code.  Since I am customizing the way I write instructions, I expect
to have to provide you with the description of what I am doing.
We may get groups of people to agree on some of these; that could be
called a partial language.

I am not asking for the perfect language.  I am asking for a means
whereby a programmer can use a reasonable notation to tell the
modified assembler what instructions to use.  It should be possible to
use more complicated macros to achieve what the current languages do,
and more.  The current languages are specialized macro statement
languages.
--
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907
Phone: (317)494-6054
hrubin@l.cc.purdue.edu (Internet, bitnet, UUCP)
chuck@melmac.harris-atd.com (Chuck Musciano) (05/04/89)
In article <2249@pembina.UUCP> cdshaw@pembina.UUCP (Chris Shaw) writes:
>Frankly, I'm getting very tired of reading Herman Rubin's complaints about C,
>because he never seems to have a coherent solution.  All I see is calls
>for someone else to solve his particular problem, coupled with the unspoken
>attitude that his particular complaint should have been taken care of long ago.
>"The meter's running, buddy!  Get to work!"
>
>So what?
>
>Maybe people do seriously listen, but come to the conclusion that it isn't
>worth the time and effort to give Herman Rubin what he wants.  I don't know,
>but it seems to me that Herman Rubin is wasting his breath.  Time for action, I
>think.  Time for Herman Rubin to write "SL", for "Statistical Language", and
>to report on its results.  Calling other people stupid is valuable only as
>long as they are willing to listen.  For me, time was up long ago.

Hear!  Hear!  I was just thinking this morning about how the
statistical world would feel about a bunch of programmers announcing
that they were completely dissatisfied with the current state of
statistical analysis, and demanding that statisticians begin "doing
statistics right!"  I anxiously await release 1.0 of SL.  I'll need an
implementation for my Sun-3/60, of course, and I'm running SunOS 3.4,
if that helps.

I would suggest that Mr. Rubin get a firm grounding before beginning
his project.  I would suggest as a basic curriculum a compilers
course, including study of lexical analysis, grammar theory, top down
and bottom up parsing, LL, LR, and LALR grammar generation, basic code
generation, code optimizations, and flow analysis.  Then, dive into a
semester of machine architecture, including the fundamentals of stack,
one, two, and three address machines, pipelining, the merits of CISC
and RISC, and instruction set design.  Since languages have users,
you'll need some time in a few psychology courses, to study how people
think about problems and perceive tools.  It is also helpful to take a
language survey course, and write a few programs in about 10 or 15
languages, just to get a feel for what is out there.  A few operating
systems courses will bring you up to speed on linkers and loaders,
resource management, device interfacing, and I/O issues.  Finally,
some training in software methodology, so that your compilers will be
easy to build, modify, and port, and so that you'll be able to include
good design features in your language.

After you have done all this, like I and countless other computing
professionals have, then you are qualified to make the hard tradeoff
decisions that arise in language design and implementation.  There is
no perfect decision, and you can't please all users.  Like my old
office mate used to say, you can't be all tools for all fools.

This may seem a little harsh, and I guess it is because I'm a little
peeved.  The point is, until you fully understand an issue, you are
only qualified to make suggestions.  When a more qualified
professional overrides your suggestion, you should defer to his or her
opinion.  I don't try to tell auto engineers how to build cars.  I
might have some ideas about how my car could be better, but if they
say there is a good reason why not, that pretty much closes the issue
for me.  I'm simply not informed enough to dispute them.

Chris Shaw also raises the point of effort versus payoff for a large
group of users.  I think Mr. Rubin is in a small majority of users, and it
simply does not behoove system designers to expend the effort to
support his desires.
Classic case of supply and demand in a free market.

Chuck Musciano			ARPA  : chuck@trantor.harris-atd.com
Harris Corporation		Usenet: ...!uunet!x102a!trantor!chuck
PO Box 37, MS 3A/1912		AT&T  : (407) 727-6131
Melbourne, FL 32902		FAX   : (407) 727-{5118,5227,4004}
chuck@melmac.harris-atd.com (Chuck Musciano) (05/05/89)
In article <2024@trantor.harris-atd.com> chuck@trantor.harris-atd.com (Chuck Musciano) writes:
>group of users.  I think Mr. Rubin is in a small majority of users, and it
                                                  ^^^^^^^^
Ahem.  Make that "minority".

Chuck Musciano			ARPA  : chuck@trantor.harris-atd.com
Harris Corporation		Usenet: ...!uunet!x102a!trantor!chuck
PO Box 37, MS 3A/1912		AT&T  : (407) 727-6131
Melbourne, FL 32902		FAX   : (407) 727-{5118,5227,4004}
lfoard@wpi.wpi.edu (Lawrence C Foard) (05/06/89)
In article <2024@trantor.harris-atd.com> chuck@trantor.harris-atd.com (Chuck Musciano) writes:
>In article <2249@pembina.UUCP> cdshaw@pembina.UUCP (Chris Shaw) writes:
>.....
> This may seem a little harsh, and I guess it is because I'm a little
>peeved.  The point is, until you fully understand an issue, you are only
>qualified to make suggestions.  When a more qualified professional overrides
>your suggestion, you should defer to his or her opinion.  I don't try to
>tell auto engineers how to build cars.  I might have some ideas about how
>my car could be better, but if they say there is a good reason why not, that
>pretty much closes the issue for me.  I'm simply not informed enough to
>dispute them.
>
>....

Come on, who defines "qualified professional"???  Of course it will
depend on how many expensive slips of paper you have (degrees).
History is full of great ideas that were at first rejected by the
"qualified professionals".  If the car maker's only explanation is "it
isn't done that way", there is no reason why you should accept that
for an answer.  There may be a hundred reasons why SL might be a bad
idea, but you didn't name any; you just stated that you were a
qualified professional and that peons aren't allowed to question you!

This isn't the dark ages where the pope was infallible; people, no
matter how well trained or professional, can still make mistakes, and
many times it takes an outsider to see them.  If you had brought the
idea of RISC up to the "qualified professionals" on the VAX
development team, I wonder what the answer would have been?
--
Disclaimer: My school does not share my views about FORTRAN.
            FORTRAN does not share my views about my school.
cdshaw@alberta.UUCP (Chris Shaw) (05/06/89)
In article <1281@l.cc.purdue.edu> cik@l.cc.purdue.edu (Herman Rubin) writes:
me>In article <2249@pembina.UUCP>, cdshaw@alberta.UUCP (Chris Shaw) writes:
HR>>In article <1277@l.cc.purdue.edu> cik@l.cc.purdue.edu (Herman Rubin) writes
HR>>>And what if your users find that the language is inadequate?  I know of
HR>>>no adequate one.  Do not cripple the programmer.

me>> <Silly automotive analogy.  Don't cripple the driver, etc.>

HR>This is the wrong analogy.  I do not ask for technological impossi-
HR>bilities, or things requiring great cost.

On the contrary, special cases that HR desires DO come at great cost.
Read on.

HR>The car has a steering wheel.  I do not wish to put it under your control.

Granted.  Of course, cars aren't computers, so chasing this analogy is
a waste of time.  Explaining an analogy is worse than explaining a
joke.

me>>The point I'm illustrating is that if you want to do something that 99% of
me>>the rest of the world is unwilling to pay for....

HR>What I am asking for is cheap.  I am asking that you unchain the steering
HR>wheel....Also, one person cannot produce a language....

I dispute this.  Many one-person languages exist and are in use today.
A one-person language has the advantage of conceptual wholeness.  Now
I admit, it won't be easy to create SL all by yourself, but if it was
easy, your solution would be for sale on PC's for $75.  Or you could
hack it in an afternoon.

HR>I am not in a CS department.  Funding for people in mathematics and statis-
HR>tics to hire programmers is difficult and niggardly, even to hire students.

I suppose the real thrust of my argument is that you can do anything
you like, but if nobody else is interested, expect to do it yourself!
HR states that he has no money for programmers, and I infer from this
that he has either no time or no ability to do this himself.  Perhaps
he has neither, I don't know.  But the latter statement from HR about
needing $$ to implement his plan contradicts his earlier statement
that he's not looking for something at great cost.

In a nutshell: HR wants a niche product, and tacitly admits that
substantial design and development time are required to produce the
desired product.  It's going to be expensive, because you can't
amortize the development cost over a lot of customers.

HR>I have made one concrete suggestion for something which is not a language, but
HR>which would make for versatile relatively easy programming.  It is a versatile
HR>macro processor; maybe one should call it a user-controlled semi-compiler.
HR>...an example for the use of a single machine instruction on the CYBER 205...

I think that HR's goals conflict.  He wants something portable but at
the same time able to maximize machine performance.  In other words,
you need an expression syntax that is able to encompass all available
modifications of a particular 205 instruction, plus all mods for Cray,
plus SX-1, etc.  But at the same time, portability is desired.  What
if 205 option A is not available on Cray?  What do you do?  Cut the
expression into inefficient little bits?  How is this different from
normal C code?

I guess my point is that if you only want nifty expressions for the
205, you're simply asking for a spoonful of syntactic sugar, assuming
that the #asm thing allows proper data structure access.  If the
compiler-asm link is inadequate, then this is probably a
straightforward system fix.  I'd wager that the grail -- portable and
maximally efficient -- is impossible.
HR>I am asking for a means whereby a programmer can use a reasonable notation to
HR>tell the modified assembler what instructions to use.

Machine-specific, unportable, and therefore window dressing, unless you
are content with giving hints only. But this means you have to say the
same thing twice. A hard problem.

HR>Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907
--
Chris (not a guru) Shaw    cdshaw@alberta.UUCP (or via watmath or ubc-vision)
University of Alberta
CatchPhrase: Bogus as HELL !
milne@ics.uci.edu (Alastair Milne) (05/06/89)
prem@crackle.amd.com (Prem Sobel) writes:
>Many years ago, there was a machine called the Interdata Model 70 which
>had instructions for atomically adding or removing items from circular
>double-ended queues. The data structure was defined reasonably efficiently
>and the machine was microcoded.
>
>Yet no one, no compiler, seriously used these instructions. The reason was,
>amazingly, that individual instructions were faster!!! I never looked at
>the microcode, so I cannot comment on why that was.

Is this sort of finding not precisely the reason behind RISC architecture?
The realisation that many an instruction to handle a relatively
sophisticated job turned out to be slower than a series of very simple
instructions to do the equivalent thing?

I myself am not quite clear yet on why this should be (is there perhaps a
reason that larger, more sophisticated microcode routines can't be made
optimally fast?). Yet it does seem to be a fact of life.

Alastair Milne
cik@l.cc.purdue.edu (Herman Rubin) (05/06/89)
In article <2253@pembina.UUCP>, cdshaw@alberta.UUCP (Chris Shaw) writes:
> In article <1281@l.cc.purdue.edu> cik@l.cc.purdue.edu (Herman Rubin) writes:
> me>In article <2249@pembina.UUCP>, cdshaw@alberta.UUCP (Chris Shaw) writes:
> HR>>In article <1277@l.cc.purdue.edu> cik@l.cc.purdue.edu (Herman Rubin) writes
> HR>>>And what if your users find that the language is inadequate? I know of
> HR>>>no adequate one. Do not cripple the programmer.
>
> me>> <Silly automotive analogy. Don't cripple the driver, etc.>
>
> HR>This is the wrong analogy. I do not ask for technological impossi-
> HR>bilities, or things requiring great cost.
>
> On the contrary, the special cases that HR desires DO come at great cost.
> Read on.
>
> HR>The car has a steering wheel. I do not wish to put it under your control.
>
> Granted. Of course, cars aren't computers, so chasing this analogy is a
> waste of time. Explaining an analogy is worse than explaining a joke.
>
> me>>The point I'm illustrating is that if you want to do something that 99% of
> me>>the rest of the world is unwilling to pay for....

So we should not train professional sports figures. We should not produce
Ph.D.'s. We should not produce professional programmers. We should not
produce doctors. More than 99% of the population is unable to use the
training and education for these fields. Therefore we should not provide
it for those who can use it.

> HR>What I am asking for is cheap. I am asking that you unchain the steering
> HR>wheel....Also, one person cannot produce a language....
>
> I dispute this. Many one-person languages exist and are in use today. A
> one-person language has the advantage of conceptual wholeness. Now I admit,
> it won't be easy to create SL all by yourself, but if it were easy, your
> solution would be for sale on PCs for $75. Or you could hack it in an
> afternoon.

If you think I want a language for statistics, you are wrong. I want a
language in which a lot of things in mathematics and statistics can be
done. And I do not want the language to restrict my thinking.

You want a program to be portable. But what if only junk is portable? You
would insist that a good program on a machine, easy to envision, not be
used because another machine cannot use it efficiently.

> HR>I am not in a CS department. Funding for people in mathematics and statis-
> HR>tics to hire programmers is difficult and niggardly, even to hire students.

It is not the problem of getting people to agree it is worthwhile. I can
do that. It is the fact that universities will not fund things in
mathematics and statistics; they insist that the money be obtained from
the government. And the NSF budget does not permit the funding of enough
faculty, and graduate students are more expensive than faculty. We have no
problem getting computing time, but no faculty member in our department
can get non-routine programming help. And we are considered one of the top
statistics departments.

> It's going to be expensive, because you can't amortize the development
> cost over many customers.
>
> HR>I have made one concrete suggestion for something which is not a language,but
> HR>which would make for versatile relatively easy programming. It is a versatile
> HR>macro processor; maybe one should call it a user-controlled semi-compiler.

I believe that I could do this with one CS hacker in a few months.

> HR>...an example for the use of a single machine instruction on the CYBER 205...
>
> I think that HR's goals conflict.
He wants something portable but at the same
> time able to maximize machine performance. In other words, you need an
> expression syntax that is able to encompass all available modifications of
> a particular 205 instruction, plus all mods for the Cray, plus the SX-1,
> etc. But at the same time, portability is desired. What if 205 option A is
> not available on the Cray? What do you do? Cut the expression into
> inefficient little bits? How is this different from normal C code?
>
> I guess my point is that if you only want nifty expressions for the 205,
> you're simply asking for a spoonful of syntactic sugar, assuming that the
> #asm thing allows proper data structure access. If the compiler-asm link is
> inadequate, then this is probably a straightforward system fix. I'd wager
> that the grail -- portable and maximally efficient -- is impossible.

I agree. But semi-portable and close to efficient is not impossible. You
may have to abandon an algorithm completely on some machines. This is
especially the case in generating random variables, where it is the method
which must be correct, NOT the answer. The CRAY-1 is so bad I consider it
a basket case for vectorization; I would suggest connecting a mini, or
even a micro, to do this part of the job. The algorithm I would use on the
205 differs from what I would use on the IBM 3090; the 205 algorithm could
be implemented on the 3090, but would be decidedly more costly.
Implementing the 3090 algorithm on the 205 would require breaking up the
vector into short pieces, which is required on vector register machines
such as the 3090 and the CRAY, but is not required on vector stream
machines such as the 205. But to me, these considerations are immediate
and obvious.

> HR>I am asking for a means whereby a programmer can use a reasonable notation to
> HR>tell the modified assembler what instructions to use.
>
> Machine-specific, unportable, and therefore window dressing, unless you
> are content with giving hints only. But this means you have to say the
> same thing twice. A hard problem.

There is a heck of a lot of portability which can be achieved, but not in
any of the current HLLs. The problem of adding vectors is portable, but
the C code to do it efficiently on one machine is inefficient on another.
In this respect C is too low-level. The same holds for string management.

But C does not handle transfers too well. There are situations where what
is wanted is a branch to subroutine, not a subroutine call. C does not
handle this at all. Here C is too high-level. C can do recursion, but not
coroutines. And what about a recursion in which one wishes to force the
use of quantities kept unchanged in registers?

The ideal language does not exist, and never will. Portability means using
the same strategies with the automobile and airplane as with the horse and
buggy. There may be similarities, but full portability is totally
incompatible with even fair efficiency.
--
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907
Phone: (317)494-6054
hrubin@l.cc.purdue.edu (Internet, bitnet, UUCP)
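To make the strip-mining point concrete, here is a minimal C sketch
(modern C, purely illustrative: the register length VLEN and the name
vadd are assumptions, not any particular machine's):

    #include <stddef.h>

    /* Strip-mined vector add for a vector *register* machine: the loop
       is cut into chunks no longer than the register length, and each
       chunk maps onto one register-to-register vector operation.  On a
       memory-to-memory vector *stream* machine like the 205, the whole
       loop could be a single instruction with no strip loop at all. */
    #define VLEN 64   /* hypothetical vector register length */

    void vadd(double *a, const double *b, const double *c, size_t n)
    {
        for (size_t i = 0; i < n; i += VLEN) {
            size_t m = (n - i < VLEN) ? n - i : VLEN;  /* last strip */
            for (size_t j = 0; j < m; j++)
                a[i + j] = b[i + j] + c[i + j];
        }
    }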
cdshaw@alberta.UUCP (Chris Shaw) (05/07/89)
In article <1287@l.cc.purdue.edu> cik@l.cc.purdue.edu (Herman Rubin) writes:
HR>In article <2253@pembina.UUCP>, cdshaw@alberta.UUCP (Chris Shaw) writes:

me>The point I'm illustrating is that if you want to do something that 99% of
me>the rest of the world is unwilling to pay for....

HR>So we should not train professional sports figures. We should not produce
HR>Ph.D.'s. We should not produce professional programmers. We should not
HR>produce doctors. More than 99% of the population is unable to use the
HR>training and education for these fields. Therefore we should not provide
HR>it for those who can use it.

This comparison is too fallacious to be taken seriously. The training of
professionals has demonstrable value to society, so society as a whole
pays some of the expense for such training. By contrast, SL is an unproven
concept. It's not even clear that its goals are achievable. The burden of
proof for SL therefore rests upon its advocates. Therefore, they must pay
all costs.

HR>But what if only junk is portable?

We agree on the technical aspects of SL, but we fundamentally disagree on
its worth. HR wants a language and compiler system that programs as well
in assembler as is humanly possible. He wants maximum efficiency. If
necessary, machine-specific detail must be provided, since the paramount
goal is speed.

My point is that with a specification such as this, HR is really looking
for a friendlier assembler, since the machine-specific statements that
MUST be provided for (say) the 205 are useless on the 3090. In other
words, one must state the algorithm differently for each target machine.
This is called unportability. So why not just use 205 assembler and not
bother with SL at all? Well, I can see a few reasons why you'd want SL --
control structures, storage allocation, etc. But if half your code is
205-specific, then you're wasting your time with a high-level language. A
syntactically-sugared assembler is still an assembler.

Why is portability important? Because one might want to run your program
on a machine different from the one you originally wrote it for.

HR>You (CDS) would insist that a good program on a machine, easy to envision,
HR>not be used because another machine cannot use it efficiently.

I have no idea where HR got this from what I was saying. What I'm saying
is that this mythical easily-envisioned program must be written in one
language that can be compiled on all machines. If, for the sake of speed,
the program is really 7 programs mutually excluded by #ifdefs, then this
program could be better written in the assembly language for each of those
7 machines. In fact, if it's 7 different programs, then the choice of
language is moot. The specifics of the compiler and the machine matter a
lot more than the features of the language itself.

HR>It is not the problem of getting people to agree it is worthwhile. I can do
HR>that. (but) the NSF budget does not permit the funding of enough faculty..

Well, it would seem that SL is low on the list of priorities. A pity,
perhaps, but I think that HR is tilting at windmills.

me>I'd wager that the grail --portable and maximally efficient-- is impossible.

HR>I agree. But semi-portable and close to efficient is not impossible. You
HR>may have to abandon an algorithm completely on some machines.

Then this is UNportable, and for good reason. HR's example of correct
random numbers shows that you can't do some things portably. Fine. This is
no different than device driver code.
But if HALF your code is device driver code, why bother with portability
at all? You can have portability or speed. Not both. The boundaries are
being pushed, but I don't think that a syntactic patch-up job will do what
you want.

HR>There is a heck of a lot of portability which can be achieved, but not in any
HR>of the current HLLs. The problem of adding vectors is portable, but the C
HR>code to do it efficiently on one machine is inefficient on another.

I think this might have a lot to do with a lack of consensus on what is a
"good" vector add instruction. Or maybe the C compiler sucks on one of the
machines. Sue the compiler manufacturer. It's not the language's fault.
Anyway, QUANTIFY "heck of a lot". Quantify that across 3 machines. You may
be able to hand-compile your C program nicely, but can a program (or 3
programs) do it?

HR>But C does not handle transfers too well. There are situations where what is
HR>wanted is a branch to subroutine, not a subroutine call. C does not handle
HR>this at all. And what about a recursion in which
HR>one wishes to force the use of quantities kept unchanged in registers?

This is bogus. The implementation of a subroutine call is up to the
compiler writer, and a bad implementation does not imply that C is bad.

HR>There may be similarities, but full portability is totally incompatible
HR>with even fair efficiency.

Exactly. So what's your point? You want to have your cake and eat it too,
it seems, and are crying because nobody will deliver. Tough bananas. Prove
how stupid the rest of the world is, and come up with a better
alternative. I doubt you can do it, because the maximal use of a
particular machine probably requires too much specific and hence
non-portable information about that machine. If portability is not your
goal, then who cares? Assembler fits your bill.

>Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907
--
Chris Shaw    cdshaw@alberta.UUCP (or via watmath or ubc-vision)
University of Alberta
CatchPhrase: Bogus as HELL !
dave@lethe.UUCP (Dave Collier-Brown) (05/07/89)
In article <1281@l.cc.purdue.edu> cik@l.cc.purdue.edu (Herman Rubin) writes:
HR>I have made one concrete suggestion for something which is not a language,but
HR>which would make for versatile relatively easy programming. It is a versatile
HR>macro processor; maybe one should call it a user-controlled semi-compiler.
HR>...an example for the use of a single machine instruction on the CYBER 205...

In article <2253@pembina.UUCP> cdshaw@pembina.UUCP (Chris Shaw) writes:
CS>I guess my point is that if you only want nifty expressions for the 205, you're
CS>simply asking for a spoonful of syntactic sugar, assuming that the #asm
CS>thing allows proper data structure access. If the compiler-asm link is
CS>inadequate, then this is probably a straightforward system fix. I'd wager that
CS>the grail -- portable and maximally efficient -- is impossible.

I'm inclined to agree that portability and maximal "fit" to the machine
are probably incompatible, but I rather like the idea of using a
general-purpose macroprocessor to mock up a mini-language. Andy Tanenbaum
(of Minix fame) used to suffer from similar financial strictures, and used
the same solution: he carefully misused a macroprocessor. (The article, I
believe, was in SP&E many moons ago.)

--dave (1: this is getting awfully far from architecture
        2: I'd use lisp to write the translator, myself) c-b
--
David Collier-Brown,     | {toronto area...}lethe!dave
72 Abitibi Ave.,         | Joyce C-B:
Willowdale, Ontario,     |   He's so smart he's dumb.
CANADA. 223-8968         |
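For flavor, here is one way such a mocked-up mini-language might look
using nothing but the C preprocessor. This is a sketch; CYBER205 and
vadd_205 are hypothetical names standing in for whatever machine-specific
sequence the local guru supplies:

    /* A "mini-language" of vector statements built from macros.  Each
       statement expands to the best available code for the target; the
       fallback is portable C that any compiler can digest. */
    #ifdef CYBER205
      /* hand-tuned, machine-specific routine (hypothetical) */
      extern void vadd_205(double *a, const double *b, const double *c,
                           long n);
      #define VADD(a, b, c, n)  vadd_205((a), (b), (c), (n))
    #else
      #define VADD(a, b, c, n)                        \
          do {                                        \
              for (long _i = 0; _i < (n); _i++)       \
                  (a)[_i] = (b)[_i] + (c)[_i];        \
          } while (0)
    #endif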
dean@columbine.Berkeley.EDU (R. Drew Dean) (05/07/89)
In article <1287@l.cc.purdue.edu> cik@l.cc.purdue.edu (Herman Rubin) writes:
>I agree. But semi-portable and close to efficient is not impossible. You
>may have to abandon an algorithm completely on some machines. This is
>especially the case in generating random variables, where it is the method
>which must be correct, NOT the answer. The CRAY-1 is so bad I consider it
>a basket case for vectorization; I would suggest connecting a mini, or
>even a micro, to do this part of the job. The algorithm I would use on the
>205 differs from what I would use on the IBM 3090; the 205 algorithm could
>be implemented on the 3090, but would be decidedly more costly.
>Implementing the 3090 algorithm on the 205 would require breaking up the
>vector into short pieces, which is required on vector register machines
>such as the 3090 and the CRAY, but is not required on vector stream
>machines such as the 205. But to me, these considerations are immediate
>and obvious.
>
>--
>Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907

Can you please provide examples of the different algorithms you use on
these different machines? I refer you to the ACM Turing Award lecture
book, from which I paraphrase (I forget whose lecture this was; the book
is in the library and I'm not, but if my memory hasn't dropped too many
refresh cycles, it's McCarthy's...):

    Look at generating Fibonacci numbers. The obvious algorithm
    F(n) = F(n-1) + F(n-2) generates a process with exponential
    running time, but this can be transformed into a process with
    linear running time...[new algorithm omitted] I don't care
    how great your "optimizing" compiler is -- a compiler which
    sees this transformation (reducing the order of growth),
    generating not necessarily perfect code, will trounce your
    compiler....

Note: OK, this would be a really difficult compiler to write... I don't
know of anybody doing it yet. But it's the principle that matters here:
it's _not_ how micro-efficient your code is that matters. If it's too
slow, rethink the algorithm. That's where the real difference is -- if you
reduce the order of growth, wonderful things start happening.

If your random variable generator already uses an algorithm that executes
in the least known time, then please forgive this posting....

Drew Dean
Internet: dean@xcssun.berkeley.edu
UUCP: ...!ucbvax!xcssun!dean
FROM Disclaimers IMPORT StandardDisclaimer;
cik@l.cc.purdue.edu (Herman Rubin) (05/07/89)
In article <13396@pasteur.Berkeley.EDU>, dean@columbine.Berkeley.EDU (R. Drew Dean) writes:
> In article <1287@l.cc.purdue.edu> cik@l.cc.purdue.edu (Herman Rubin) writes:
> >I agree. But semi-portable and close to efficient is not impossible. You
> >may have to abandon an algorithm completely on some machines. This is
> >especially the case in generating random variables, where it is the method
> >which must be correct, NOT the answer. The CRAY-1 is so bad I consider it
> >a basket case for vectorization; I would suggest connecting a mini, or
> >even a micro, to do this part of the job. The algorithm I would use on the
> >205 differs from what I would use on the IBM 3090; the 205 algorithm could
> >be implemented on the 3090, but would be decidedly more costly.
> >Implementing the 3090 algorithm on the 205 would require breaking up the
> >vector into short pieces, which is required on vector register machines
> >such as the 3090 and the CRAY, but is not required on vector stream
> >machines such as the 205. But to me, these considerations are immediate
> >and obvious.
>
> Can you please provide examples of the different algorithms you use on
> these different machines?

I am in the process of producing a report on this. I will send a copy to
anyone who requests it. It is possible I may post (not to this group) an
abbreviated source form.

> I refer you to the ACM Turing Award lecture book, from which I paraphrase
> (I forget whose lecture this was; the book is in the library and I'm not,
> but if my memory hasn't dropped too many refresh cycles, it's
> McCarthy's...):
>
>     Look at generating Fibonacci numbers. The obvious algorithm
>     F(n) = F(n-1) + F(n-2) generates a process with exponential
>     running time, but this can be transformed into a process with
>     linear running time...[new algorithm omitted] I don't care
>     how great your "optimizing" compiler is -- a compiler which
>     sees this transformation (reducing the order of growth),
>     generating not necessarily perfect code, will trounce your
>     compiler....
>
> Note: OK, this would be a really difficult compiler to write... I don't
> know of anybody doing it yet. But it's the principle that matters here:
> it's _not_ how micro-efficient your code is that matters. If it's too
> slow, rethink the algorithm. That's where the real difference is -- if you
> reduce the order of growth, wonderful things start happening.

This is definitely the case if you want only F(1384257). I am not sure
about F(100). But if you want F(1), ..., F(100), I suggest you might want
to use the obvious algorithm. Vector machines are another problem; I have
no idea how to vectorize this algorithm. On some vector machines, I can
come up with a safe way to do it, but I do not know how good it is.

> If your random variable generator already uses an algorithm that executes
> in the least known time, then please forgive this posting....

Given sufficient resources, I can come arbitrarily close to the algorithm
which uses the least possible expected number of bits. Possibly even the
"arbitrarily close" can be removed, in an algorithm with a finite expected
time. The algorithms which are bit-efficient are not likely to be
time-efficient, although I have some hope of some being able to do a fair
job of both. Technical reports on this are available, although incomplete.

> Drew Dean
> Internet: dean@xcssun.berkeley.edu
> UUCP: ...!ucbvax!xcssun!dean
> FROM Disclaimers IMPORT StandardDisclaimer;
--
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907
Phone: (317)494-6054
hrubin@l.cc.purdue.edu (Internet, bitnet, UUCP)
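For the record, the "obvious algorithm" when the whole table is wanted, as
a small C program; one add per entry, linear in n. (Unsigned 64-bit
arithmetic is an anachronism here, but convenient: F(93) is the last
Fibonacci number that fits in such a word.)

    #include <stdio.h>

    int main(void)
    {
        unsigned long long f[94];
        f[1] = 1; f[2] = 1;
        for (int i = 3; i <= 93; i++)
            f[i] = f[i - 1] + f[i - 2];   /* one add per entry */
        for (int i = 1; i <= 93; i++)
            printf("F(%d) = %llu\n", i, f[i]);
        return 0;
    }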
mike@thor.acc.stolaf.edu (Mike Haertel) (05/07/89)
In article <1287@l.cc.purdue.edu> cik@l.cc.purdue.edu (Herman Rubin) writes:
>There is a heck of a lot of portability which can be achieved, but not in
>any of the current HLLs. The problem of adding vectors is portable, but
>the C code to do it efficiently on one machine is inefficient on another.
>In this respect C is too low-level. The same holds for string management.
>
>But C does not handle transfers too well. There are situations where what
>is wanted is a branch to subroutine, not a subroutine call. C does not
>handle this at all. Here C is too high-level. C can do recursion, but not
>coroutines. And what about a recursion in which one wishes to force the
>use of quantities kept unchanged in registers?

You seem to have the language C confused with the existing implementations
of C. Granted, coroutines cannot be expressed in C. But there is no reason
a C compiler couldn't generate more efficient procedure calls by using the
simpler instructions, or do smart register allocation across multiple
procedures.

Saying "C is inadequate because it leads to inefficient code" is like
saying "Fortran is good because it generates efficient numeric code." I
bet I could write a Fortran compiler that generated awful numeric code!
The Lisp community has understood this difference for years. As long as
anyone can remember, people have been saying "Lisp is slow." Well, the
MACLISP compiler (on the PDP-10) generated better numeric code than DEC's
Fortran compiler.

Please try not to confuse a language itself (an abstraction, really) with
the implementations you have seen.
--
Mike Haertel <mike@stolaf.edu>
main() ??< printf("hello, world??/n"); ??>
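For what it's worth, a POSIX system does let you fake coroutines outside
the language proper. A minimal sketch using the <ucontext.h> interface
(which postdates this discussion, and is nowadays considered obsolescent,
but is real); main and the coroutine hand control back and forth with
swapcontext():

    #include <stdio.h>
    #include <ucontext.h>

    static ucontext_t main_ctx, co_ctx;
    static char co_stack[64 * 1024];   /* the coroutine's own stack */

    static void coroutine(void)
    {
        for (int i = 0; i < 3; i++) {
            printf("coroutine: step %d\n", i);
            swapcontext(&co_ctx, &main_ctx);   /* yield to main */
        }
    }

    int main(void)
    {
        getcontext(&co_ctx);
        co_ctx.uc_stack.ss_sp = co_stack;
        co_ctx.uc_stack.ss_size = sizeof co_stack;
        co_ctx.uc_link = &main_ctx;        /* where to go if it returns */
        makecontext(&co_ctx, coroutine, 0);

        for (int i = 0; i < 3; i++) {
            swapcontext(&main_ctx, &co_ctx);   /* resume the coroutine */
            printf("main: resumed after step %d\n", i);
        }
        return 0;
    }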
aglew@mcdurb.Urbana.Gould.COM (05/08/89)
>prem@crackle.amd.com (Prem Sobel) writes:
>>Many years ago, there was a machine called the Interdata Model 70 which
>>had instructions for atomically adding or removing items from circular
>>double-ended queues. The data structure was defined reasonably efficiently
>>and the machine was microcoded.
>>
>>Yet no one, no compiler, seriously used these instructions. The reason was,
>>amazingly, that individual instructions were faster!!! I never looked at
>>the microcode, so I cannot comment on why that was.
>
>Is this sort of finding not precisely the reason behind RISC architecture?
>The realisation that many an instruction to handle a relatively
>sophisticated job turned out to be slower than a series of very simple
>instructions to do the equivalent thing?
>
>I myself am not quite clear yet on why this should be (is there perhaps a
>reason that larger, more sophisticated microcode routines can't be made
>optimally fast?). Yet it does seem to be a fact of life.
>
>Alastair Milne

The Gould NP1 had queue manipulation instructions that provided atomicity.
And, yes, the OS avoided using them, because a hand-crafted set of
individual instructions was faster. But the CISCy queue manipulation
instructions were left in because the hand-crafted sequences used
privileged operations not callable from user mode. Creating a system call
to do the same operation would have been more expensive than leaving in
the microcoded instructions - especially since the system call would have
had to do privilege checks that the hand-crafted inside-the-OS version
didn't.

Providing user-level access to these instructions was only important
because Gould's Real-Time Simulator customers regularly did explicit
parallel programming (the queue operations were message primitives, if you
will) - and they *did* *not* want to have to run their entire application
privileged.

Now, I would much rather have decomposed these CISCy operations into a
RISCy set of primitives that might be used to implement other
multiprocessing primitives. But the normal arithmetic and logic primitives
are not sufficient. Several times I have attempted to start discussion in
comp.arch about what such primitives might be - here is what I remember of
the previous discussions (I have the highlights archived, but not easily
accessible). It may be time for another go-around.

RISCy PRIMITIVES FOR PARALLEL PROCESSING
========================================

Active Memory
-------------
Some manufacturers implement special kinds of memory that provide extended
semantics suitable for multiprocessing - like Sequent's MULTIBUS semaphore
locations. Somebody at MIPS said something similar, but it wasn't clear if
the approach was actually used.
PRO: no extra CPU operations.
CON: resource allocation problem - especially if you want to make these
accessible to ordinary user processes, and if the active memory is limited
in size (the HEP's full/empty bits on all memory locations wouldn't suffer
this).

Atomicity Boundaries
--------------------
Letting the user block interrupts, but only for a maximum amount of time
(erroring if still blocked at the end of the max time), has been tried,
e.g. in Honeywell and Norsk Data(?) machines.
Similarly, architecting that interrupts may occur only at, say, modulo-N
instruction boundaries (either static or dynamic - static requiring
special provision for loops) can let the user specify some uninterruptible
(atomic wrt interrupts on the same processor) operations, with help from
the language system to locate operations between the appropriate
boundaries. Apparently the ARM does something similar, only checking for
interrupts at branches.

Obviously, approaches similar to those for interrupts can be applied to
multiple-processor synchronization.

Split Synchronization
---------------------
My contribution has been to point out that such synchronization operations
may take a long time, and may be split into START-SYNCHRONIZATION and
STOP-SYNCHRONIZATION operations - and that other instructions may be
placed between these synchronization points. So you do not necessarily
have to stop the entire processor when performing one of these operations.
Recently a similar idea appeared in ASPLOS III, in the paper titled "Fuzzy
Barriers...".

Andy "Krazy" Glew   aglew@urbana.mcd.mot.com   uunet!uiucdcs!mcdurb!aglew
Motorola Microcomputer Division, Champaign-Urbana Design Center
1101 E. University, Urbana, Illinois 61801, USA.

My opinions are my own, and are not the opinions of my employer, or any
other organisation. I indicate my company only so that the reader may
account for any possible bias I may have towards our products.
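As a strawman for such a primitive set: given only an atomic
compare-and-swap, the CISCy insert-into-queue instruction decomposes into
a short user-level loop. A sketch in C11 atomics (obviously anachronistic
for this discussion), pushing onto a shared LIFO list:

    #include <stdatomic.h>

    struct node { struct node *next; /* ... payload ... */ };

    void push(_Atomic(struct node *) *head, struct node *n)
    {
        struct node *old = atomic_load(head);
        do {
            n->next = old;   /* link to the current head */
        } while (!atomic_compare_exchange_weak(head, &old, n));
        /* on failure, old has been reloaded and the link is retried */
    }

The same single primitive also builds spin locks, counters, and the rest,
which is the RISCy decomposition argument in miniature.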
schow@bnr-public.uucp (Stanley Chow) (05/08/89)
In article <504@daitc.daitc.mil> jkrueger@daitc.daitc.mil (Jonathan Krueger) writes:
>In article <426@bnr-fos.UUCP>, schow@bnr-public (Stanley Chow) writes:
>>it is not necessary for all programmers to know enough to use
>>the special instructions. Some guru can set it up and everyone can
>>then use the pre-built modules. Depending on how you set everything
>>up, the compiler does not have to be smart at all.
>
>Of course, but you lose the performance advantage if the modules get
>called frequently and require significant setup (e.g. can't maintain
>state for multiple callers or sequences) with respect to work
>performed. Or to put it another way, That Way Lies Threaded Code.
>However, this isn't an argument for Better Living Through Inlining,
>rather an appeal for analysis and measurement of the tradeoffs.

This depends a lot on the system and language you use. We spend
considerable time making sure the utilities are suited to the application.
(This is easy, since this whole system is a single application, and we
have measured it to death.) In our operating system, maintaining state for
multiple callers is easy and essentially free. Most of the funny opcodes
can easily be generated by a peepholer, and the really strange opcodes are
what we call Intrinsics - halfway between macros and inlining.

>>For example, we have a system with really wild instructions that get
>>generated even though the users don't know what they are and the
>>compiler does only peephole optimization. As it turns out, having
>>atomic operations for free is really nice.
>
>Sounds like you've managed to let your users express what they want in
>terms familiar to them but also understandable to the language
>translator, following which optimization to particular hardware can be
>done in a reasonable way. By any chance, would this be done by way of
>a function library?

We use just about every trick in the book, and some that aren't. Since we
have a project that is big enough to support our own language and
operating system, we have a lot of freedom. Usually, we figure out the
hardware/firmware limits, decide how we want it in the language, then
change the language/compiler and operating system to suit. Sorry I can't
give details.

Stanley Chow        BitNet:  schow@BNR.CA
BNR                 UUCP:    ..!psuvax1!BNR.CA.bitnet!schow
(613) 763-2831               ..!utgpu!bnr-vpa!bnr-fos!schow%bnr-public
I am just a small cog in a big machine. I don't represent nobody.
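A latter-day illustration of the intrinsic idea (hedged accordingly: this
is not BNR's system): the programmer writes what looks like a call, and a
compiler that has a matching instruction substitutes just that one
instruction. GCC's __builtin_popcount is a real example; the fallback
shows the instruction sequence it replaces:

    #ifdef __GNUC__
    /* one instruction where the target has a population count */
    #define POPCOUNT(x) __builtin_popcount(x)
    #else
    static inline int POPCOUNT(unsigned x)
    {
        int c = 0;
        while (x) { x &= x - 1; c++; }   /* clear lowest set bit */
        return c;
    }
    #endif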
schow@bnr-public.uucp (Stanley Chow) (05/08/89)
Since this is moving from architecture to language, I have set Followup-To:
comp.lang.misc.

In article <1277@l.cc.purdue.edu> cik@l.cc.purdue.edu (Herman Rubin) writes:
>In article <504@daitc.daitc.mil>, jkrueger@daitc.daitc.mil (Jonathan Krueger) writes:
>> In article <426@bnr-fos.UUCP>, schow@bnr-public (Stanley Chow) writes:
>> >it is not necessary for all programmers to know enough to use
>> >the special instructions. Some guru can set it up and everyone can
>> >then use the pre-built modules. Depending on how you set everything
>> >up, the compiler does not have to be smart at all.
>
>I agree it is not necessary for ALL programmers to know how to really
>use the computer, any more than it is necessary for the house painter
>to be an artist. But the artist cannot set up the modules for a
>house painter to produce a work of art.

This is probably a major difference in opinion between groups. You are an
artist and feel constrained by the languages that are designed for
painting houses. Much of the programming in this world is done by house
painters as opposed to artists.

In the case of our project, all "artists" who want/need to are turned into
gurus and set up utilities for themselves and others to use. The system is
by no means frozen.

I can understand your plea for flexibility. Fortunately, I have quite a
bit of it. I am allowed to change (or at least influence) everything from
hardware, firmware, language, compiler, and operating system through to
any application code. If someone sees a new use for some instructions, we
change the utilities or compiler as needed. New hardware that would be
useful? We will just throw it into the next revision.

Having said that, I must point out that we are (very?) unusual. Most
projects are not big enough to support this style of working. If you have
to rely on a commercial compiler, chances are the compiler was not
optimized for your needs. This means the compiler writer had to trade off
many different demands. Unfortunately (from your point of view),
portability is a high priority, a good debugger is important, and good
object code from portable source is compulsory. Since most people don't
care about flexibility to take advantage of different architectures, you
don't get that from most commercial compilers.

Stanley Chow        BitNet:  schow@BNR.CA
BNR                 UUCP:    ..!psuvax1!BNR.CA.bitnet!schow
(613) 763-2831               ..!utgpu!bnr-vpa!bnr-fos!schow%bnr-public
I am just a small cog in a big machine. I don't represent nobody.
dietz@cs.rochester.edu (Paul Dietz) (05/09/89)
In article <13396@pasteur.Berkeley.EDU> dean@columbine.Berkeley.EDU (R. Drew Dean) writes:
> Look at generating Fibonacci numbers. The obvious algorithm
> F(n) = F(n-1) + F(n-2) generates a process with exponential
> running time, but this can be transformed into a process with
> linear running time...[new algorithm omitted]

You can generate the nth Fibonacci number using O(log n) arithmetic
operations (albeit operations on numbers with O(n) bits). Hint: if A is
the 2x2 matrix ((0 1) (1 1)), the lower right corner of A^n contains
Fib(n+1).

Paul F. Dietz
dietz@cs.rochester.edu
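Coding the hint out as a short C sketch: repeated squaring gives O(log n)
2x2 matrix multiplies. Plain 64-bit words cap n at 93; past that, the
O(n)-bit bignums Dietz mentions take over, with the same control
structure.

    #include <stdio.h>

    typedef unsigned long long u64;

    static void matmul(u64 r[2][2], u64 a[2][2], u64 b[2][2])
    {
        u64 t[2][2];   /* temporary, so r may alias a or b */
        for (int i = 0; i < 2; i++)
            for (int j = 0; j < 2; j++)
                t[i][j] = a[i][0] * b[0][j] + a[i][1] * b[1][j];
        r[0][0] = t[0][0]; r[0][1] = t[0][1];
        r[1][0] = t[1][0]; r[1][1] = t[1][1];
    }

    static u64 fib(unsigned n)
    {
        u64 a[2][2] = {{0, 1}, {1, 1}};   /* the matrix A */
        u64 p[2][2] = {{1, 0}, {0, 1}};   /* identity; becomes A^n */
        for (; n; n >>= 1) {              /* square-and-multiply */
            if (n & 1) matmul(p, p, a);
            matmul(a, a, a);
        }
        return p[0][1];                   /* A^n[0][1] == Fib(n) */
    }

    int main(void) { printf("%llu\n", fib(93)); return 0; }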
josh@klaatu.rutgers.edu (J Storrs Hall) (05/09/89)
Paul F. Dietz writes:
    You can generate the nth Fibonacci number using O(log n) arithmetic
    operations (albeit operations on numbers with O(n) bits).
    Hint: if A is the 2x2 matrix ((0 1) (1 1)), the lower right corner
    of A^n contains Fib(n+1).
Or, using the same trick, generate all the Fibonacci numbers up to
the nth in log n time in arithmetic vector operations.
However, there is a catch: the arithmetic operations are multiply
rather than add. And since we're talking about architecture here,
remember that for any of this to make any difference at all you
must be considering values of n where the size of the number is way
over any reasonable word size. Every arbitrary-precision arithmetic
package I've seen has linear add and quadratic multiply. Given
that, both algorithms are quadratic in n.
--JoSH
chuck@melmac.harris-atd.com (Chuck Musciano) (05/10/89)
In article <2223@wpi.wpi.edu> lfoard@wpi.wpi.edu (Lawrence C Foard) writes:
>Come on, who defines "qualified professional"?  Of course it will depend on
>how many expensive slips of paper you have (degrees).  History is full of
>great ideas that were at first rejected by the "qualified professionals".
>If the car maker's only explanation is "it isn't done that way", there is
>no reason why you should accept that for an answer.  There may be a hundred
>reasons why SL might be a bad idea, but you didn't name any; you just
>stated that you were a qualified professional and that peons aren't allowed
>to question you!  This isn't the dark ages, where the pope was infallible.
>People, no matter how well trained or professional, can still make
>mistakes, and many times it takes an outsider to see them.  If you had
>brought the idea of RISC up to the "qualified professionals" on the VAX
>development team, I wonder what the answer would have been?

"Qualified professional" is a term defined by other qualified
professionals. If they came from real schools, those "expensive slips of
paper" represent large amounts of knowledge. When you have the slips of
paper, you are a graduate, nothing more. After you go into the world and
do something with the knowledge, you begin to attain the rank of
professional.

In my example, the car maker doesn't necessarily say "it isn't done that
way"; he will probably say "it can't be done that way". I don't have
enough knowledge to dispute him. If he is a responsible professional,
he'll mull over my suggestions, and perhaps modify his thinking. If I'm an
interested third party, I'll educate myself to better understand the
problem.

I didn't say anything about SL. I can't, since it doesn't exist. The point
I was trying to make is that before one begins taking aim at language
designers, one should learn a lot about language design. Without
sufficient background, you simply cannot comment. I can't count the times
people have come to me with some new language feature, not realizing its
impact on the rest of the language. Lots of things that people think are
simple represent massive work, based upon the way a particular compiler
has been built. Other features are completely incompatible with the
language's philosophy and aesthetics. Still others are not extensible to
include existing language features. My job as a professional is to sift
the wheat from the chaff, and do the things that are right.

Anyone can feel free to question my work. The weight I give their
questions is commensurate with their professional background. I try to
give an acceptable answer to anyone who asks me things, couched in a way
that matches their knowledge. When my Mom asks how computers work, I don't
launch into a discussion of the impact of cache loading on pipeline
stalls. When a coworker asks why I chose a particular algorithm in some
piece of code, I'll go into great detail, as needed.

My wife is an industrial engineer. I don't presume to tell her how to
design automated factories. She doesn't tell me how to build compilers. I
might offer some input when she is designing assembly method sheets, since
the psychology of instruction writing is somewhat similar to the
psychology of user interface design. In that realm, we are qualified
peers.

Certainly, professionals make mistakes. I've made my share. Almost all of
them have been pointed out by other, equally qualified professionals.
This field is so complex, and the issues so detailed, that it is difficult
for laymen to understand the subtle shadings particular issues have. While
the merits of the VAX architecture design have been debated in this group
ad infinitum, I would still hazard the guess that the VAX team would have
correctly rejected RISC had it been presented to them. They were filling a
particular market niche that RISC, at that time, did not meet. That isn't
to say that they would not have given it a lot of thought, and perhaps
begun to think of their next machine, which might use RISC technology.

Chuck Musciano                  ARPA  : chuck@trantor.harris-atd.com
Harris Corporation              Usenet: ...!uunet!x102a!trantor!chuck
PO Box 37, MS 3A/1912           AT&T  : (407) 727-6131
Melbourne, FL 32902             FAX   : (407) 727-{5118,5227,4004}
jkrueger@daitc.daitc.mil (Jonathan Krueger) (05/10/89)
In article <2254@pembina.UUCP>, cdshaw@alberta (Chris Shaw) writes:
>HR>It is not the problem of getting people to agree it is worthwhile. I can do
>HR>that. (but) the NSF budget does not permit the funding of enough faculty..
>
>Well, it would seem that SL is low on the list of priorities. A pity,
>perhaps, but I think that HR is tilting at windmills.

Well, let's just say that Knuth's ideas about computer typesetting are
better understood with reference to TeX. It exists, it runs, and it can be
critically examined. And he didn't complain on the net until someone else
built it.

-- Jon --