cik@l.cc.purdue.edu (Herman Rubin) (07/08/89)
In article <13970@haddock.ima.isc.com>, suitti@haddock.ima.isc.com
(Stephen Uitti) writes:

> >For division with remainder, the code should be machine language,
> >not C.  Would you use C to do a machine-independent implementation
> >of division?  You would not think of it.  Neither would I think of
> >making similar operations such as frexp, divrem, exponentiation,
> >absolute value, integer part, etc., machine-independently coded in
> >C.  I would not consider coding the =UP above in C.  It is an
> >additional primitive operation, like +, *, /, &.
>
> Not only would i think of it, i've done it.  A check is made to see
> that the compiler really generates the right instructions.

Of course algorithms should be checked.  And it is quite likely that
the checking procedure will be relatively inefficient.  But to include
fantastically slow machine-independent code in a VAX-specific source
library is criminal, especially since even a klutz who understands the
VAX instructions could produce an algorithm that beats it every time.
And making abs and fabs function calls!  This in a widely distributed
product!  It is not at all difficult to give natural problems in which
the frexp provided would be 1000 times as slow as crude inline code.
And frexp was the first thing done in log and sqrt in 4.2BSD.

...................

> While i find such papers hard to read (especially since the
> definition of the new notation syntax is seldom given), for
> programs it is worse.  Even with the C preprocessor's limited
> capabilities, one can write code that doesn't bear any
> resemblance to C.  Maintenance of this code is nearly impossible.
> Thus, sadly, notation flexibility and maintainability are at odds.

If the definition of the new notation is not given, it is considered
by the referee and associate editor not to be new.  It is expected
that someone with the necessary background to read the paper already
knows the notation.
I agree that this is not always the case, but even here the author
typically refers to previous papers for the notation.

> >A simple example of the lack of desirability of C portability is
> >the addition of two vectors.  The good coding of this depends on
> >the costs of the *x++ in loading and storing and the use of index
> >in loading and storing.  These depend both on the machine and the
> >storage classes of the addresses.
>
> The code sequence is:
>
>	void lvecadd(a, b, c, s)	/* a = b + c, length s */
>	long *a; long *b; long *c; long s;
>	{
>		do {
>			*a++ = *b++ + *c++;
>		} while (--s != 0);
>	}

I am afraid I will have to give you a D- on this.  Most of the time I
would not even bother with a call, considering the code length.  But
the code is bad for vectors of length > 2.  How about

	{
		long *end = a + s;
		do {
			*a++ = *b++ + *c++;
		} while (a < end);
	}

I hope I am not unusual, but I consider things like this extremely
obvious.

> or:
>
>	void lvecadd(a, b, c, s)	/* a = b + c, length s */
>	long *a; long *b; long *c; long s;
>	{
>		register long t;
>		for (t = 0; t < s; t++)
>			a[t] = b[t] + c[t];
>	}
>
> A '205 C compiler should (doesn't but should) take either
> piece of code and translate it into the instruction.

This was not the question.  I find the first code natural and
efficient on a VAX, and the second natural and efficient on a
PYRAMID.  This from reading the description of the addressing modes
on the two machines.

> I've seen a C compiler which did better for the second example
> than the first.  What was odd was that the target machine, a
> 68000, does better with the first algorithm.  Worse, the second
> piece of code was effectively translated into the first, using
> redundant temporary variables for the pointers.  Somehow, the
> compiler thought that extra work had to take place for the first
> example.  The point, though, is that the programmer-visible
> syntax need not bear any resemblance to the quality of the
> produced code.

I have yet to see a decent compiler.
However, I see no reason why the human programmer should have any
difficulty with this type of problem.  Is it the case that our
training of programmers removes their native ability to see things
like this?

> Portability has its costs - usually about 10 to 20% of the
> optimum machine speed.  Compiler technology has been attempting
> to narrow this.  However, it has enormous benefits.  UNIX, being
> somewhat portable, allows new architectures to become useful in
> an extremely short time after manufacture (something like a
> year).  The speed penalty that comes from UNIX not being designed
> for the architecture is offset by the years of development that
> it takes to create a new OS.  By that time, the architecture may
> well be obsolete.  User applications are the same way.  The value
> of using a portable language is more than just being able to use
> code that was written on one machine on another.  It takes time
> for programmers to become proficient in a language.  If you write
> a new language for every program you write, no one will be able
> to make heads or tails of what you've done.

I can put up with 10 to 20%.  But I do not see that; I see 50% or
more.

It certainly does take time for a programmer to become proficient in
a language.  Languages are complicated.  And there is no way the
language designers can anticipate what is needed.  They make no
attempt to find out what additional features the knowledgeable users
want.  They complain, rightly, that mathematicians do not use the
computer enough, and then go out of their way, almost, to leave out
what is wanted.

This extends to the hardware people, also.  I was told by an IBM
physicist that the IBM scientific people had complained about the
total lack of communication on the 360 and its successors between the
integer registers and the floating-point registers.  At that time,
the fix would have been easy.  Another example: if a machine has
hardware division, it should have hardware square root.
SOME hardware people see this as obvious, but others do not see why
there is any difference between square root and the transcendental
functions.

As for operating systems: maybe UNIX is a simplified version of
MULTICS, but I still consider it horribly complicated.  I believe
that the OS for a machine should be of the KISS variety, with
everything else added on.  And the syntax!  Who would use mv for
rename?  I will not go into further details.
--
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN 47907
Phone: (317) 494-6054
hrubin@l.cc.purdue.edu (Internet, bitnet, UUCP)
suitti@haddock.ima.isc.com (Stephen Uitti) (07/09/89)
In various articles, cik & suitti argue...  (>> suitti, > cik)

>> The code sequence is: ...
>>
>>	void lvecadd(a, b, c, s)	/* a = b + c, length s */
>>	long *a; long *b; long *c; long s;
>>	{
>>		do {
>>			*a++ = *b++ + *c++;
>>		} while (--s != 0);
>>	}

For the VAX:

	L18:	addl3	(r9)+,(r10)+,r0
		movl	r0,(r11)+
		decl	r8
		jneq	L18

>I am afraid I will have to give you a D- on this.  Most of the time I
>would not even bother with a call, considering the code length.  But
>the code is bad for vectors of length > 2.  How about
>
>	{
>		long *end = a + s;
>		do {
>			*a++ = *b++ + *c++;
>		} while (a < end);
>	}

	L26:	addl3	(r9)+,(r10)+,r0
		movl	r0,(r11)+
		cmpl	r11,r7
		jlss	L26

These are the loops as coded by the VAX PCC-based C compiler.  I've
no idea why it felt the movl was needed.  In any case, both code
fragments compile to the same number of instructions, and my decl has
fewer operands than your cmpl.  Without running the test, there is no
way to know which will run faster; it depends on which VAX.  A
microVAX might (and probably does) have different relative times than
a 780.

Still, i expected the compiler to use some sort of
subtract-one-and-branch-if-not-zero instruction instead of a
decrement and a branch.  Unfortunately, i didn't check my VAX
architecture handbook in advance, and i'd forgotten (through disuse)
that the VAX has an SOBGTR rather than an SOBNE.  I should have used

	} while (--s > 0);

On a PDP-11, the "SOB" instruction uses a test for not equal to
zero.  Further, on a PDP-11 (and many other machines) an extra
variable may strain the register resources of the machine, requiring
some high-use variable to be put into slower storage.

A really good compiler should have been able to convert the end
condition and use the right instruction.  The VAX PCC-based compiler
is not a "really good compiler".  Also, i was considering inlined
code, even if originally presented as a function.
If the vector lengths are large (and experience shows that they never
are), it doesn't matter if it is a subroutine or not, just as the
computation for 'end' in your example is immaterial.

Yeah, i know, facts in an argument.

Fortunately, i went to a school that didn't have letter grades.  It
was pass/fail.  Many schools, including Purdue, often treat their
grad students with some dignity and respect.  I had the good fortune
to find an undergrad school that did so.  I never had to argue with
instructors.  It was always real clear if i'd passed or failed (did i
do the work?  yes: pass, no: fail).  Thus, i never got a D- for code
that worked.  Once, however, in high school, i managed to get an A
(the best you could get), with a U (unsatisfactory) for effort.

Stephen.
hascall@atanasoff.cs.iastate.edu (John Hascall) (07/09/89)
In article <13979@haddock.ima.isc.com> suitti@haddock.ima.isc.com
(Stephen Uitti) writes:

>In various articles, cik & suitti argue...
>>> The code sequence is: ...
>>>
>>>	void lvecadd(a, b, c, s)	/* a = b + c, length s */
>>>	long *a; long *b; long *c; long s;
>>>	{
>>>		do {
>>>			*a++ = *b++ + *c++;
>>>		} while (--s != 0);
>>>	}
>
>For the VAX:
>
>	L18:	addl3	(r9)+,(r10)+,r0
>		movl	r0,(r11)+
>		decl	r8
>		jneq	L18
>
>>I am afraid I will have to give you a D- on this.  Most of the time I
>>would not even bother with a call, considering the code length.  But
>>the code is bad for vectors of length > 2.  How about
>>
>>	{
>>		long *end = a + s;
>>		do {
>>			*a++ = *b++ + *c++;
>>		} while (a < end);
>>	}
>
>	L26:	addl3	(r9)+,(r10)+,r0
>		movl	r0,(r11)+
>		cmpl	r11,r7
>		jlss	L26

Of course, both routines fall in the dumper if the vector length is 0 :-)

	void lvecadd(a, b, c, s)
	register long *a, *b, *c, s;	/* a[0..s-1] = b[0..s-1] + c[0..s-1] */
	{
		if (s <= 0)		/* never trust your callers */
			return;
		do {
			*a++ = *b++ + *c++;
		} while (--s > 0);
	}

giving (using VAX C V2.4-026 [VMS]):

	                 0000	lvecadd:
	007C             0000		.entry	lvecadd,^m<r2,r3,r4,r5,r6>
	5E 04 C2         0002		subl2	#4,sp
	53 04 AC D0      0005		movl	4(ap),r3
	52 08 AC D0      0009		movl	8(ap),r2
	51 0C AC D0      000D		movl	12(ap),r1
	50 10 AC D0      0011		movl	16(ap),r0
	01 14            0015		bgtr	sym.1
	04               0017		ret
	                 0018	sym.1:
	83 81 82 C1      0018		addl3	(r2)+,(r1)+,(r3)+
	50 D7            001C		decl	r0
	F8 14            001E		bgtr	sym.1

I'm not sure why "sobgtr r0,sym.1" wasn't used for the last two
instructions; is "decl r0 / bgtr sym.1" faster??  Does the optimizer
not recognize the decrement/branch sequence??  [Maybe someone with a
newer version of the compiler could try it.]

Why are r4, r5, and r6 being saved?  And what the devil is "subl2
#4,sp" for?  Can't the optimizer clean up after itself??  At least it
got the important part (the loop) right by using (rn)+ for all the
operands.

John Hascall