terry@wsccs.UUCP (terry) (02/23/88)
We have been porting software at the company I work for for a number of years... consider: our primary product runs on over 130 unix machines, all of the 286/386 UNIX cloning OS's (UNIX/Xenix/Venix/etc), Berk, SysV, SysIII, VMS, CTOS, BTOS, MS-DOS, CP/M, MP/M, Turbodos, Whatever Apple calls their normal MAC OS, and some nasty things you never heard of. We also run on one purportedly RISC system, the RT (IBM). Further, it is communications software which EMULATES 9 terminals EXACTLY (no, it is NOT the simplistic 'cu', and NO, we do not do 132 columns on a VT50) as possible given the limitations of the hardware. This means it has to talk to some pretty bizzarre serial/ethernet devices. It does this correctly. At 9600+ baud. [Lest this be consired an ad, let me point out that 1) I have not mentioned the product name and 2) the reader is supposed to glean an idea of portablity. It is the same code on all machines except CP/M and MP/M, and they don't count] We have been able to port to all machines we attempted, except one (no, not VMS... although that's a horse of a different wheelbase). That one was SUN's new (not so new any more) RISC machine, errorneously labeled via an increment of a prior product, thus fooling the user into believing his code will run. During our many-houred visit in this strange dimension (a usual port takes no longer than 2 hours usually... the Harris HCX9 took 15 minutes, including writing 5 tapes), we attempted to port 5 products, not one of which runs on less than 60 machines. The only one that didn't grunk (read Segmentation fault, core dumped) was the one I have described above in great gory detail, and that took tweaking. We currently do not distribute this due to an insane fear of technical support calls, although, again, it does run. THE REASON: Type-casting. You can't. Small programs seem to, but it doesn't work. Bytes tend to be word aligned. Other messy stuff. It was not a pretty sight (site?). I am sure there are other problems, but geez, this is demonstrably portable code. I am all for RISC machines when reasonably implimented. My idea of RISC is an instruction set that is sufficently small to allow the manufacturer to call it RISC and not get sued, but sufficiently varied to allow me to go off and have the assembler impliment enough macro's that my compiler thinks it's running on a 680x0. ----Oh rats! If I can't have that, at LEAST my portable C compiler should be. Sun must have some good people to be able to have ported a semblance of UNIX to this thing. I shudder when I hear people say "Won't it be neat when we can buy a RISC workstation based on the Sun chip!". EEAaauuUGGhah! | Terry Lambert UUCP: ...!decvax!utah-cs!century!terry | | @ Century Software or : ...utah-cs!uplherc!sp7040!obie!wsccs!terry | | SLC, Utah | | These opinions are not my companies, but if you find them | | useful, send a $20.00 donation to Brisbane Australia... | | 'There are monkey boys in the facility. Do not be alarmed; you are secure' |
steve@nuchat.UUCP (Steve Nuchia) (02/28/88)
From article <179@wsccs.UUCP>, by terry@wsccs.UUCP (terry):
[ lots of self-congratulation about how portable his code is, followed
  by complaints that it isn't portable to the SPARC ]
> THE REASON: Type-casting. You can't. Small programs seem to, but it doesn't
> work. Bytes tend to be word aligned. Other messy stuff. It was not a
> pretty sight (site?). I am sure there are other problems, but geez, this is
> demonstrably portable code.
  ^^^^^^^^^^^^^^^^^^^^^^^^^^

FLAME ON!  ( I love this! )

WRONG. It is demonstrably NON-portable code - it failed to port to a working
compiler on a reasonable machine. If the bloody unix kernel runs (and it
does) your silly application should, too.

FLAME OFF.

How about some code fragments and a discussion of how they meet the standards
(de facto or otherwise) and how the SPARC compiler fails to properly
implement them.

Get a clue - portable doesn't mean "runs on X processors", it means "conforms
to standards".
--
Steve Nuchia       | [...] but the machine would probably be allowed no mercy.
uunet!nuchat!steve | In other words then, if a machine is expected to be
(713) 334 6720     | infallible, it cannot be intelligent.  - Alan Turing, 1947
cruff@scdpyr.UUCP (Craig Ruff) (02/28/88)
In article <696@nuchat.UUCP> steve@nuchat.UUCP (Steve Nuchia) writes:
>From article <179@wsccs.UUCP>, by terry@wsccs.UUCP (terry):
>[ lots of self-congratulation about how portable his code is, followed
> by complaints that it isn't portable to the SPARC ]
>
>> THE REASON: Type-casting. You can't.
>
>FLAME ON!  ( I love this! )
>
>WRONG. It is demonstrably NON-portable code - it failed to port
>to a working compiler on a reasonable machine. If the bloody
>unix kernel runs (and it does) your silly application should, too.
>
>FLAME OFF.

I've just ported several thousand lines of code to the Sun-4. The only
problem I had was in the use of the ndbm library. I was doing a structure
copy with the dptr result from dbm_fetch, which turned out to be non-word
aligned. Of course the documentation does not state that the dptr will be
word aligned; in fact, it doesn't state anything at all about this. Anyway,
type casts do exist in this code, though in small numbers.

Since I knew this code would eventually reside on the Sun-4, I was careful to
write it with portability in mind. I also ported many more thousand lines of
both C (written with Unix portability in mind) and Fortran (gak! written with
many-vendor machine portability in mind). Not one single problem related to
data alignment or casting.

In all, I'd say that the Sun-4 does not present any unsolvable problems in
porting code. If your code is really portable, that is. :-)
--
Craig Ruff        NCAR                  INTERNET: cruff@scdpyr.UCAR.EDU
(303) 497-1211    P.O. Box 3000         CSNET:    cruff@ncar.CSNET
                  Boulder, CO  80307    UUCP:     cruff@scdpyr.UUCP
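[Editorial sketch: the safe way around the dbm_fetch alignment trap Craig
describes is to copy the bytes out rather than doing a structure assignment
through a cast of dptr. The record type and function name below are invented
for illustration; only the ndbm calls are real.]

#include <ndbm.h>
#include <string.h>

struct rec { long id; short count; };    /* hypothetical record layout */

/* Fetch a record without assuming dbm_fetch() returns aligned storage.
 * A structure copy through (struct rec *)val.dptr can trap on a
 * strict-alignment machine; memcpy() works everywhere. */
int fetch_rec(DBM *db, datum key, struct rec *out)
{
    datum val = dbm_fetch(db, key);

    if (val.dptr == NULL || val.dsize < (int) sizeof *out)
        return -1;                          /* not found or too short */
    memcpy(out, val.dptr, sizeof *out);     /* byte copy, no alignment assumed */
    return 0;
}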
elg@killer.UUCP (Eric Green) (02/28/88)
in article <179@wsccs.UUCP>, terry@wsccs.UUCP (terry) says:
[describes a program which runs on systems with a minimum of porting, e.g.
 2 hours for a Harris minicomputer; then details hours trying to port to a
 Sun 4, and failing:]
> THE REASON: Type-casting. You can't. Small programs seem to, but it doesn't
> work. Bytes tend to be word aligned. Other messy stuff. It was not a
> pretty sight (site?). I am sure there are other problems, but geez, this is
> demonstrably portable code.
>
> I am all for RISC machines when reasonably implemented. My idea of
> RISC is an instruction set that is sufficiently small to allow the
> manufacturer to call it RISC and not get sued, but sufficiently varied to
> allow me to go off and have the assembler implement enough macros that my
> compiler thinks it's running on a 680x0. ----Oh rats! If I can't have that,
> at LEAST my portable C compiler should be. Sun must have some good people
> to be able to have ported a semblance of UNIX to this thing.

There's a couple of possible problems that may be bugging you:

1) Sun's Unix isn't. That is, it's a huge superset of Unix, with features of
   both BSD4.x and Sys V all mashed together.

2) Sun's "C" compiler apparently isn't very good at handling a lot of things,
   from your description. Or maybe there's a flag you didn't set, or
   something similar.

I've used a Pyramid 90x and an IBM RT. I've read papers on the AMD29000 and
the MIPSco chip. I see no inherent reason for portable programs to not run on
any of them, except possibly the 29000 (which lacks byte addressing in its
native rendition).
--
Eric Lee Green                             elg@usl.CSNET
Snail Mail P.O. Box 92191                  {cbosgd,ihnp4}!killer!elg
Lafayette, LA 70509

Come on, girl, I ain't looking for no fight/You ain't no beauty, but hey
you're alright/And I just need someone to love/tonight
daveb@geac.UUCP (David Collier-Brown) (02/29/88)
In article <696@nuchat.UUCP> steve@nuchat.UUCP (Steve Nuchia) writes:
>WRONG. It is demonstrably NON-portable code - it failed to port
>to a working compiler on a reasonable machine. If the bloody
>unix kernel runs (and it does) your silly application should, too.

Agreed. However....

>Get a clue - portable doesn't mean "runs on X processors", it
>means "conforms to standards".

The purpose of the standard is to allow portability, by making machines
similar enough that code `ports' (an oversimplification, of course).
Therefore a claim that XXX runs on N Unix boxes of at least 2 different OS
families is a stronger claim for portability than meeting a standard that
describes (and I'm thinking of the SVID here) only one family.

Much more info required, Terry. (terry@wsccs.UUCP)

--dave
--
David Collier-Brown.                 {mnetor yunexus utgpu}!geac!daveb
Geac Computers International Inc.,   | Computer Science loses its
350 Steelcase Road,Markham, Ontario, | memory (if not its mind)
CANADA, L3R 1B3 (416) 475-0525 x3279 | every 6 months.
bs@linus.UUCP (Robert D. Silverman) (02/29/88)
In article <284@scdpyr.UUCP: cruff@scdpyr.UUCP (Craig Ruff) writes:
:In article <696@nuchat.UUCP> steve@nuchat.UUCP (Steve Nuchia) writes:
:>From article <179@wsccs.UUCP>, by terry@wsccs.UUCP (terry):
:>[ lots of self-congratulation about how portable his code is, followed
:> by complaints that it isn't portable to the SPARC ]
:>
:>> THE REASON: Type-casting. You can't.
:>
:>FLAME ON!  ( I love this! )
:>
:>WRONG. It is demonstrably NON-portable code - it failed to port
:>to a working compiler on a reasonable machine. If the bloody
:>unix kernel runs (and it does) your silly application should, too.

There's something about RISC architectures in general that I find confusing.
Since they (read SPARC or equivalent) have no integer multiply instructions,
any code which has a fair number of these is going to be slow. This would
include any program which had access to 2-D arrays since one must do
multiplications (unless the array sizes are a convenient power of 2) to get
the array indices right. Any code that accesses a[i][j] should run like a pig
on such machines. I've seen some benchmarks that suggest SUN-4's are in fact
slower than SUN-3's on programs that do a large amount of integer
multiplies/divides. What good is a computer that can't multiply?

Anyone remember the CADET? It was, I believe, the IBM 1630 and stood for
'Can't Add, Doesn't Even Try'. It did its addition by table lookup.

For people who want to use computers to COMPUTE, it appears that SPARC is a
step in the wrong direction.

Bob Silverman
dfk@duke.cs.duke.edu (David Kotz) (03/01/88)
> There's something about RISC architectures in general that I find
> confusing. Since they (read SPARC or equivalent) have no integer multiply
> instructions, any code which has a fair number of these is going to
> be slow. This would include any program which had access to 2-D arrays
> since one must do multiplications (unless the array sizes are a convenient
> power of 2) to get the array indices right. ...
> Bob Silverman

Multiplication is not necessary to access 2-D arrays if the array is set up
like most arrays in C, where each row is a typical vector and the 2-D array
is just a vector of pointers to each row vector. Then double-indirection is
necessary, rather than multiplication. I won't say that's any better, but you
don't *need* multiplication. (It might not be so bad once the first two
pointers are cached, and a good programmer puts them in registers anyway.)

David Kotz
--
ARPA: dfk@cs.duke.edu      CSNET: dfk@duke
UUCP: {ihnp4!decvax}!duke!dfk
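[Editorial sketch of the representation David describes, in C. The function
name is invented and error checking is omitted. With this layout, a[i][j]
needs two loads plus shifts and adds, but no multiply by the row length.]

#include <stdlib.h>

double **make_matrix(int n, int m)
{
    double **a = malloc(n * sizeof *a);     /* vector of row pointers */
    int i;

    for (i = 0; i < n; i++)
        a[i] = malloc(m * sizeof **a);      /* one row vector per row */
    return a;                               /* use as a[i][j] */
}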
tim@amdcad.AMD.COM (Tim Olson) (03/01/88)
In article <3530@killer.UUCP> elg@killer.UUCP (Eric Green) writes:
| I've used a Pyramid 90x and an IBM RT. I've read papers on the AMD29000 and
| the MIPSco chip. I see no inherent reason for portable programs to not run
| on any of them, except possibly the 29000 (which lacks byte addressing in
| its native rendition).

Correction: the Am29000 has byte and halfword (16-bit) extract and insert
instructions to allow you to interface easily with word-only addressable
memories. It also has explicit byte and halfword loads and stores (encoded in
the control field) to interface with memory that performs those accesses
directly.
--
Tim Olson
Advanced Micro Devices
(tim@amdcad.amd.com)
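[Editorial sketch, in C, of the idea behind extract-style byte access on a
word-only addressable memory: fetch the enclosing 32-bit word, then shift
and mask. This is just the concept, not the Am29000's actual instruction
encoding; big-endian byte numbering within the word is assumed.]

#include <stdint.h>

unsigned char load_byte(const uint32_t *mem, uint32_t byte_addr)
{
    uint32_t word  = mem[byte_addr >> 2];            /* full-word fetch */
    uint32_t shift = (3 - (byte_addr & 3)) * 8;      /* byte position   */
    return (unsigned char)((word >> shift) & 0xff);  /* extract it      */
}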
tim@amdcad.AMD.COM (Tim Olson) (03/01/88)
In article <25699@linus.UUCP> bs@gauss.UUCP (Robert D. Silverman) writes:
| There's something about RISC architectures in general that I find
| confusing. Since they (read SPARC or equivalent) have no integer multiply
| instructions, any code which has a fair number of these is going to
| be slow.

What makes you think that a single integer multiply instruction would be
faster? On CISC machines, these are normally expanded to a series of
multiply-step iterations (which most RISC machines have as an instruction).
Only if you have architectural support for fast multiplication (i.e. a large
multiplier array) is a single multiply instruction beneficial.

| This would include any program which had access to 2-D arrays
| since one must do multiplications (unless the array sizes are a convenient
| power of 2) to get the array indices right. Any code that accesses a[i][j]
| should run like a pig on such machines.

A multiply by a constant (as in the case of a 2-dimensional array access in
C) is almost always performed faster with an inline series of shifts and adds
than with a multiply.
--
Tim Olson
Advanced Micro Devices
(tim@amdcad.amd.com)
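[Editorial sketch of the constant-multiply expansion Tim mentions, for the
index calculation of a C array with 20-element rows. The function name is
invented; a compiler would emit the equivalent shifts and adds inline.]

/* Element number of a[i][j] in an array declared as type a[N][20]:
 * i*20 + j, with the multiply strength-reduced to shifts and adds. */
int index_2d(int i, int j)
{
    int i_times_20 = (i << 4) + (i << 2);   /* i*16 + i*4 == i*20 */
    return i_times_20 + j;
}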
bs@linus.UUCP (Robert D. Silverman) (03/01/88)
In article <11199@duke.cs.duke.edu: dfk@duke.cs.duke.edu (David Kotz) writes:
:> There's something about RISC architectures in general that I find
:> confusing. Since they (read SPARC or equivalent) have no integer multiply
:> instructions, any code which has a fair number of these is going to
:> be slow. This would include any program which had access to 2-D arrays
:> since one must do multiplications (unless the array sizes are a convenient
:> power of 2) to get the array indices right. ...
:> Bob Silverman
:
:Multiplication is not necessary to access 2-D arrays if the array is
:set up like most arrays in C, where each row is a typical vector and
:the 2-D array is just a vector of pointers to each row vector. Then
etc.
This isn't intended as a flame but your response is too 'language dependent'.
Not all code is written in a language that has pointers. There are many
numerical packages out there, written in FORTRAN (yech). It not only
doesn't have pointers, but it also stores things in column-major rather than
row-major order.
Please. We are discussing hardware, not software. Let's not introduce
language dependencies. Your solution also fails to address those numerical
problems that have multiplies in them.
Bob
edw@IUS1.CS.CMU.EDU (Eddie Wyatt) (03/01/88)
> Multiplication is not necessary to access 2-D arrays if the array is
> set up like most arrays in C, where each row is a typical vector and
> the 2-D array is just a vector of pointers to each row vector.

EXCUSE ME, but when I declare an array to be an n by m matrix in C
(float foo[n][m]) I get a contiguous block of memory. The representation IS
NOT row-vector or column-vector. So when I access index number i,j someone
has to perform the calculation base + sizeof(type)*(i*m+j).
--
Eddie Wyatt
e-mail: edw@ius1.cs.cmu.edu
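[Editorial note: Eddie's point in code. A true C 2-D array is one contiguous
block, so element (i,j) really does sit i*M+j elements past the start. The
sizes and function name below are made up for illustration.]

#define N 4
#define M 5

float foo[N][M];            /* one contiguous block of N*M floats */

float *addr_of(int i, int j)
{
    /* &foo[i][j] and (float *)foo + (i*M + j) are the same location */
    return (float *)foo + (i * M + j);
}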
bcase@Apple.COM (Brian Case) (03/01/88)
In article <3530@killer.UUCP> elg@killer.UUCP (Eric Green) writes:
>I've used a Pyramid 90x and an IBM RT. I've read papers on the AMD29000 and
>the MIPSco chip. I see no inherent reason for portable programs to not run on
>any of them, except possibly the 29000 (which lacks byte addressing in its
>native rendition).

Wait a minute, I beg to differ with "lacks byte addressing in its native
rendition." There are byte and halfword (16-bit) insert and extract
operations. These give you everything you need (yes, yes, I know with
arguable efficiency). Now, if you meant arbitrary byte *alignment,* then
yeah, but the 29K isn't unique there. Note that the Pyramid machines are
little endian (aren't they?). This helps non-portable code become portable.
hansen@mips.COM (Craig Hansen) (03/01/88)
In article <25699@linus.UUCP>, bs@linus.UUCP (Robert D. Silverman) writes:
> In article <284@scdpyr.UUCP: cruff@scdpyr.UUCP (Craig Ruff) writes:
> :In article <696@nuchat.UUCP> steve@nuchat.UUCP (Steve Nuchia) writes:
> :>From article <179@wsccs.UUCP>, by terry@wsccs.UUCP (terry):
> :>[ lots of self-congratulation about how portable his code is, followed
> :> by complaints that it isn't portable to the SPARC ]
> :>
> :>> THE REASON: Type-casting. You can't.
> :>
> :>FLAME ON!  ( I love this! )
> :>
> :>WRONG. It is demonstrably NON-portable code - it failed to port
> :>to a working compiler on a reasonable machine. If the bloody
> :>unix kernel runs (and it does) your silly application should, too.

This of course pre-supposes that the SPARC architecture yields reasonable
machines. While Sun claims that the Sun4 is source-code compatible with the
Sun3, what that really means is that if it ports to the Sun4, it was
portable, and if it doesn't port, it wasn't portable. It's ridiculous to
claim the Sun4 machine is source-code compatible if _all_ software written
for the Sun3 doesn't port, as "portable" code written for the Sun3 would port
to many machines besides the Sun4 anyway.

In fact, the SPARC architecture has a real problem with source-code
compatibility with the Sun3 machines - the alignment rules are different
between the 68020 and SPARC, and code that depends on misaligned data is hard
to port to SPARC. The MIPS architecture and compiler system is in a better
position to port such code because efficient instructions are available to
handle unaligned (32-bit) words, and our compiler system can be set to use
them in such code. We also provide utilities that can help to pinpoint where
in the program these problems occur, and can also fix up references to such
unaligned pointers within an exception handler, as an aid to porting the code
quickly and then going back to tune the code later.

> There's something about RISC architectures in general that I find
> confusing. Since they (read SPARC or equivalent) have no integer multiply
> instructions, any code which has a fair number of these is going to
> be slow. This would include any program which had access to 2-D arrays
> since one must do multiplications (unless the array sizes are a convenient
> power of 2) to get the array indices right. Any code that accesses a[i][j]
> should run like a pig on such machines. I've seen some benchmarks that
> suggest SUN-4's are in fact slower than SUN-3's on programs that do a
> large amount of integer multiplies/divides. What good is a computer that
> can't multiply?

All RISC architectures are NOT created equal, particularly with respect to
integer multiply instructions. The MIPS R-Series processors have explicit
signed and unsigned integer multiply and divide instructions, that are
executed in special-purpose hardware. A 32-bit multiply takes 12 cycles, with
up to 10 instructions that can be executed in parallel with the multiply. We
considered that to be a much superior solution than multiply-step, which
would have been slower and harder to implement. (Multiply-step has too many
operands and too many results.)

In many cases, 2-D arrays have sizes that are known at compile-time, and so
become multiplications by constants. Multiplication by constants can be
handled efficiently by most RISC machines, but is generally a little faster
in cycle count on the HP "Precision" architecture (I still think of it as
Spectrum, but then I'm getting old and set in my ways....), which has
single-cycle shift-and-add operations that are good for doing multiplies by
constants that aren't powers of 2. The MIPS compiler picks either an explicit
multiply operation or software shift-and-add sequences, depending on the
value and form (variable vs constant) of the operands. The end result is that
multiplies are most often faster than the 12 cycle worst-case figure.
--
Craig Hansen
Manager, Architecture Development
MIPS Computer Systems, Inc.
...{ames,decwrl,prls}!mips!hansen or hansen@mips.com
408-991-0234
hansen@mips.COM (Craig Hansen) (03/01/88)
In article <7507@apple.Apple.Com>, bcase@Apple.COM (Brian Case) writes:
> In article <3530@killer.UUCP> elg@killer.UUCP (Eric Green) writes:
> >I've used a Pyramid 90x and an IBM RT. I've read papers on the AMD29000
> >and the MIPSco chip. I see no inherent reason for portable programs to not
> >run on any of them, except possibly the 29000 (which lacks byte addressing
> >in its native rendition).
>
> Wait a minute, I beg to differ with "lacks byte addressing in its native
> rendition." There are byte and halfword (16-bit) insert and extract
> operations. These give you everything you need (yes, yes, I know with
> arguable efficiency). Now, if you meant arbitrary byte *alignment,*
> then yeah, but the 29K isn't unique there. Note that the Pyramid machines
> are little endian (aren't they?). This helps non-portable code become
> portable.

Tim says that you can use either the insert and extract operations, or, in a
machine that has external hardware to provide byte addressing in the memory
system itself, you can use direct load and store byte/halfword operations. It
would seem that if you wished, you could provide unaligned load and store
halfword/word operations with a great deal of additional external hardware,
except that there's no way to handle items that cross page boundaries. [Not
an issue if the MMU isn't in use, of course.]

In any case, I have two questions:

1) In AMD's performance models, which memory model is used? For full clock
   rate systems, I don't see how you could reasonably build the direct
   partial-word addressed machine. [The MIPS processors perform the required
   shifting/extracting at full clock rate on the processor chip, and so
   directly handle partial-word operands without additional hardware.]

2) Which memory model does the compiler generate? What effect does this lack
   of architectural specificity have on software compatibility? I presume
   that I can't do anything to make code that uses direct partial-word
   addressing work on a machine that has only full-word addressing.
--
Craig Hansen
Manager, Architecture Development
MIPS Computer Systems, Inc.
...{ames,decwrl,prls}!mips!hansen or hansen@mips.com
408-991-0234
jbs@eddie.MIT.EDU (Jeff Siegal) (03/01/88)
In article <25723@linus.UUCP> bs@gauss.UUCP (Robert D. Silverman) writes:
>In article <11199@duke.cs.duke.edu: dfk@duke.cs.duke.edu (David Kotz) writes:
>:Multiplication is not necessary to access 2-D arrays if the array is
>:set up like most arrays in C, where each row is a typical vector and
>:the 2-D array is just a vector of pointers to each row vector. Then
>
>This isn't intended as a flame but your response is too 'language dependent'.
>Not all code is written in a language that has pointers.

There is nothing language dependent about row vector representation of 2-d
arrays. It is an implementation tool (i.e. the compiler writer, library
writer, or software architecture designer decides such a thing). Once
implemented in the system innards, the programmer need not be concerned with
it (i.e. you still reference a[x][y] or a[x,y] or whatever). Of course, C
gives you the ability to represent your "arrays" this way even when the
built-in array representation doesn't.

Even without doing any pointer dereferencing, it is easy to avoid doing any
multiplication by having the compiler set up a table giving the address
offset of each row of the array. So a[x][y] is found at address:

	a + offset[x] + y

>Please. We are discussing hardware, not software. Let's not introduce
>language dependencies. Your solution also fails to address those numerical
>problems that have multiplies in them.

True, there is no answer that is going to address these problems short of
specialized hardware. I think it's pretty obvious that for applications which
primarily multiply, you want fast hardware support for multiplication.
Machines without such support are compromising multiply performance for
better performance at (hopefully) more common operations like register loads
and stores, comparisons, adds, etc.

Jeff Siegal
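[Editorial sketch of the offset-table idea in C: compute each row's starting
offset once, and every a[x][y] access becomes a table lookup plus an add. All
names and sizes are invented.]

#include <stddef.h>

#define ROWS 100
#define COLS 37                      /* deliberately not a power of two */

static double a[ROWS * COLS];
static size_t row_offset[ROWS];

void init_offsets(void)              /* run once, not per access */
{
    size_t r;
    for (r = 0; r < ROWS; r++)
        row_offset[r] = r * COLS;
}

double get(size_t x, size_t y)
{
    return a[row_offset[x] + y];     /* no multiply on the access path */
}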
hankd@pur-ee.UUCP (Hank Dietz) (03/01/88)
In article <11199@duke.cs.duke.edu>, dfk@duke.cs.duke.edu (David Kotz) writes:
> > confusing. Since they (read SPARC or equivalent) have no integer multiply
> > instructions, any code which has a fair number of these is going to
> > be slow. This would include any program which had access to 2-D arrays
[stuff]
> Multiplication is not necessary to access 2-D arrays if the array is
> set up like most arrays in C, where each row is a typical vector and
> the 2-D array is just a vector of pointers to each row vector. Then
> double-indirection is necessary, rather than multiplication. I won't

Indirection isn't pretty in a lot of ways (because it is random addressing
unless you're very clever), but that's not the point. The multiply for
indexing is, first of all, often converted into bumping pointers if you have
a good optimizing compiler and, secondly, unless you have a VERY FAST
multiply, multiplying by VIRTUALLY ANY compile-time CONSTANT (e.g., sizeof an
element) is faster using shift and add/subtract instructions. For example, a
multiply by 7 (not a nice power of 2) is really shift to multiply by 8 and
then subtract the original number. This sort of thing is widely known by
compiler writers -- references are available -- and the big disadvantage to
it is that it may require the use of more registers for temporaries.
jesup@pawl19.pawl.rpi.edu (Randell E. Jesup) (03/01/88)
In article <25699@linus.UUCP> bs@gauss.UUCP (Robert D. Silverman) writes:
>There's something about RISC architectures in general that I find
>confusing. Since they (read SPARC or equivalent) have no integer multiply
>instructions, any code which has a fair number of these is going to
>be slow.

Ever noticed how long a multiply takes on a 68000? And that's only a 16x16=32
multiply! The fact that it takes multiple instructions to do a multiply
doesn't mean it isn't a FAST multiply. Almost all (or maybe all) RISC
machines have some multiply support in hardware to make multiplies take a
reasonably small amount of time. The only way to do one fast (on any chip) is
to throw a giant array-multiplier at it, which takes up an ungodly amount of
chip area. The result is RISC chips have things like MSTEP instructions, that
do 1 or 2 bits of a multiply, so they can do a multiply in 16 or 32 cycles
(approx). The 68000 requires up to 70+ cycles to do a 16x16 multiply.

>For people who want to use computers to COMPUTE, it appears that SPARC
>is a step in the wrong direction.

SPARC may not be perfect, and a maxed out 68030 (or maybe even 020) might be
able to beat it on a benchmark or two, but in general it seems like a win for
most applications.

     //	Randell Jesup                     Lunge Software Development
    //	Dedicated Amiga Programmer        13 Frear Ave, Troy, NY 12180
 \\//	beowulf!lunge!jesup@steinmetz.UUCP       (518) 272-2942
  \/	(uunet!steinmetz!beowulf!lunge!jesup)    BIX: rjesup
(-: The Few, The Proud, The Architects of the RPM40 40MIPS CMOS Micro :-)
tim@amdcad.AMD.COM (Tim Olson) (03/01/88)
In article <1721@mips.mips.COM> hansen@mips.COM (Craig Hansen) writes:
| 1) In AMD's performance models, which memory model is used?
|    For full clock rate systems, I don't see how you could
|    reasonably build the direct partial-word addressed machine.
|    [The MIPS processors perform the required shifting/extracting
|    at full clock rate on the processor chip, and so directly
|    handle partial-word operands without additional hardware.]

We use the "full-word load/store with extract/insert instructions" model.

| 2) Which memory model does the compiler generate?

All of our compilers default to the above-mentioned model. Pragmas in the
dpANS-tracking (can't really say ANSI yet, can we? ;-) compiler allow code
for the second model to be generated.

| What effect does this lack of architectural specificity
| have on software compatibility? I presume that I can't
| do anything to make code that uses direct partial-word
| addressing work on a machine that has only full-word addressing.

We have stated many times that one of the main design philosophies behind the
29000 was to provide usable mechanisms, not dictate policies. If you wish to
run at the current 25MHz and add some external logic to directly perform byte
and half-word accesses, fine. We provide those hooks for you. However, we
felt that it was much more important to speed up *all* memory accesses (by
providing direct load-forwarding) than to slow all of them down through
shift-mux, sign-or-zero extend logic which most of the time isn't required.

To answer your question directly -- if you want to be portable, you use the
full-word, extract/insert model, since it works on all systems.
--
Tim Olson
Advanced Micro Devices
(tim@amdcad.amd.com)
tpmsph@ecsvax.UUCP (Thomas P. Morris) (03/01/88)
In response to a posting about RISC having "bad" performance due to array
indexing which might require heavy integer multiplication, David Kotz points
out:

> Multiplication is not necessary to access 2-D arrays if the array is
> set up like most arrays in C, where each row is a typical vector and
> the 2-D array is just a vector of pointers to each row vector. Then
> double-indirection is necessary, rather than multiplication. I won't
> say that's any better, but you don't *need* multiplication. (It might
> not be so bad once the first two pointers are cached, and a good
> programmer puts them in registers anyway).
  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Two points: (1) a good programmer shouldn't have to put those pointers into
registers. That's what good compilers are for! ;-) (2) If your array elements
are being accessed sequentially, a good _optimizing_ compiler ought to be
replacing those array-index computations with additively computed offsets
from a base. Strength reduction, code hoisting, and elimination or reduction
of invariant computations can do wonders for code! Of course, a good
programmer _would_ indicate to his/her "C" compiler that the indices or the
pointers ought to go in registers, too!
-----------------------------------------------------------------------------
Tom Morris                        BITNET: TOM@UNCSPHVX
UNC School of Public Health       UUCP  : ...!mcnc!ecsvax!tpmsph
"It's a joke, boy! I say, it's a joke! Don't ya get it?" - Foghorn Leghorn
-----------------------------------------------------------------------------
firth@sei.cmu.edu (Robert Firth) (03/01/88)
In article <998@PT.CS.CMU.EDU> edw@IUS1.CS.CMU.EDU (Eddie Wyatt) writes:
>EXCUSE ME, but when I declare an array to be an n by m matrix in
>C (float foo[n][m]) I get a contiguous block of memory. The
>representation IS NOT row-vector or column-vector. So when I access
>index number i,j someone has to perform the calculation
>base + sizeof(type)*(i*m+j).

However, going by Knuth's statistics on array usage, more than 95% of those
multiplications occur in loops where the index is the induction variable. The
optimisation called "strength reduction" can then remove them.

If your target machine has a slow multiply, such an optimisation is probably
necessary. How to do it is explained in detail in many books, including eg
Wulf's 'The Design of an Optimising Compiler'. The first Fortran compiler
included this optimisation, so it's been around for quite a while.
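[Editorial sketch of strength reduction applied to the 95% case: the i*M
inside the loop becomes a pointer bumped by M each iteration. Array sizes and
function names are invented; a good compiler does this transformation
itself.]

#define N 100
#define M 37

extern float a[N][M];

float sum_col_naive(int j)           /* as written: an i*M per iteration */
{
    float s = 0;
    int i;

    for (i = 0; i < N; i++)
        s += a[i][j];
    return s;
}

float sum_col_reduced(int j)         /* after strength reduction */
{
    const float *p = &a[0][j];
    float s = 0;
    int i;

    for (i = 0; i < N; i++, p += M)  /* add M elements; no multiply */
        s += *p;
    return s;
}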
firth@sei.cmu.edu (Robert Firth) (03/01/88)
In article <8332@eddie.MIT.EDU> jbs@eddie.MIT.EDU (Jeff Siegal) writes:
>There is nothing language dependent about row vector representation of
>2-d arrays. It is an implementation tool (i.e. the compiler writer,
>library writer, or software architecture designer decides such a
>thing). Once implemented in the system innards, the programmer need
>not be concerned with it...

Jeff, I find this an interesting assertion, and one I'd dearly like to
believe. However, my reading of ANSI X3.9-1978, especially Section 5, on
arrays, leads me to conclude that array representation in column-major form,
and array access by chain multiplication-&-addition of subscripts, is the
only feasible implementation choice.

If you have a design that does everything right, and allows access by chained
indirection-&-offsetting, then I'd be most interested in seeing it.
scottg@hpiacla.HP.COM (Scott Gulland) (03/02/88)
> / hpiacla:comp.arch / terry@wsccs.UUCP (terry) /  8:08 pm  Feb 22, 1988 /
>
> THE REASON: Type-casting. You can't. Small programs seem to, but it doesn't
> work. Bytes tend to be word aligned. Other messy stuff. . . .
>
> . . . , but sufficiently varied to allow me to go
> off and have the assembler implement enough macros that my compiler thinks
> it's running on a 680x0. ----Oh rats! If I can't have that, at LEAST my
> portable C compiler should be. Sun must have some good people to be able
> to have ported a semblance of UNIX to this thing.

I think that what you have run into is porting from a machine with very loose
data alignment requirements to one with very strict data alignment
requirements, probably in combination with a marginal compiler. The 680x0 and
286/386 class of machines have very loose data alignment requirements.
Integers, floats, and pointers can be placed on any byte address boundary and
still work. RISC machines, however, may require integers, floats, pointers,
etc. to be placed on a 4-byte address, doubles on an 8-byte address, etc. If
you then take a pointer to a short and cast it to a pointer to float or int,
there is a good chance that the data does not sit on the correct byte address
boundary, and so your program aborts. This can also happen if the C compiler
does not insure correct data alignment within structures and unions.

**************************************************************************
* Scott Gulland             | {ucbvax,hplabs}!hpda!hpiacla!scottg  [UUCP] *
* Indus. Appl. Center (IAC) | scottg@hpiacla                       [SMTP] *
* Hewlett-Packard Co.       | (408) 746-5498                       [AT&T] *
* 1266 Kifer Road           | 1-746-5498                      [HP-TELNET] *
* Sunnyvale, CA 94086 USA   | "What If..."                 [HP-TELEPATHY] *
**************************************************************************
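[Editorial sketch of the failure mode Scott describes. The cast compiles
everywhere, but the dereference faults (or fetches garbage) on a machine that
requires longs to be 4-byte aligned; the byte-copy version is portable. Names
are invented.]

#include <string.h>

static char buf[8];                  /* byte-aligned storage */

long broken(void)
{
    return *(long *)(buf + 1);       /* misaligned access: may trap */
}

long portable(void)
{
    long v;
    memcpy(&v, buf + 1, sizeof v);   /* byte copy works on any alignment */
    return v;
}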
baum@apple.UUCP (Allen J. Baum) (03/02/88)
--------
[]

RE: needing multiplies in array accesses

Accessing arbitrary array elements is not extremely common. Most of the array
references are inside loops, and use the loop induction variables as part of
the subscript expression. A good compiler will strength reduce these multiply
operations and turn them into add/subtract immediates. Even if this is not
possible, the multiply operation is generally reg*imm, which reduces to a
shift in the common cases (power of two array sizes are pretty common!), or a
SMALL series of shifts and adds.
--
{decwrl,hplabs,ihnp4}!nsc!apple!baum		(408)973-3385
csg@pyramid.pyramid.com (Carl S. Gutekunst) (03/02/88)
In article <7507@apple.Apple.Com> bcase@apple.UUCP (Brian Case) writes:
>Now, if you meant arbitrary byte *alignment,* then yeah, but the 29K isn't
>unique there. Note that the Pyramid machines are little endian (aren't they?).

No, the Pyramid CPU is big-endian, like the 68020. Alignment is restricted to
natural boundaries, except that doubles need only be 32-bit aligned. Binary
operations are always performed on 32-bit or 64-bit objects, but the load and
store instructions can reference 8-bit, 16-bit, or 32-bit words; loads can be
with or without sign-extension.

An unaligned version of the Pyramid was done as a special for HHB Systems,
many eons ago; it needed an extra clock tick per memory reference to rotate
the word into place. Some odd timings happened too if part of the load was in
cache, and part caused a cache miss.

I find all the flamage about aligned operations in the SPARC amusing; Pyramid
went through the same flamage some years ago. Having grown up on machines
that forced alignment, I thought the objections were pretty silly then. Still
think they're silly now.

<csg>
bcase@Apple.COM (Brian Case) (03/02/88)
In article <7649@pur-ee.UUCP> hankd@pur-ee.UUCP (Hank Dietz) writes:
>The multiply for indexing is, first of all, often converted into
>bumping pointers if you have a good optimizing compiler and,
>secondly, unless you have a VERY FAST multiply, multiplying by
>VIRTUALLY ANY compile-time CONSTANT (e.g., sizeof an element)
>is faster using shift and add/subtract instructions. For example, a
>multiply by 7 (not a nice power of 2) is really shift to multiply
>by 8 and then subtract the original number. This sort of thing is
>widely known by compiler writers -- references are available --
>and the big disadvantage to it is that it may require the use of
>more registers for temporaries.

Yes, you might need more registers. Plus, now that you have computed the
multiply by 8, it is possible to reuse this computation later if it is
needed. This is one of the easiest ways to see that lots of registers,
three-address operations, and a smart compiler are good. I've said it before
and I'll say it again: RISC is both "hardware with a very short cycle time"
and a philosophy: "reuse rather than recompute." Note that on a RISC machine,
a memory reference can be a significant computation.
rbbb@acornrc.UUCP (David Chase) (03/02/88)
It seems that people out there have invented a new meaning for RISC. You can
read about the old meaning in (for example) "Reduced Instruction Set
Computers", an IEEE Tutorial by William Stallings. Comments like

> Please. We are discussing hardware, not software.

and

> I am all for RISC machines when reasonably implemented. My idea of
> RISC is an instruction set that is sufficiently small to allow the
> manufacturer to call it RISC and not get sued, but sufficiently varied to
> allow me to go off and have the assembler implement enough macros that my
> compiler thinks it's running on a 680x0. ----Oh rats! If I can't have that,
> at LEAST my portable C compiler should be.

demonstrate how misunderstood RISC is. To quote from the above reference,

    A vital element in achieving high performance on a RISC system is an
    optimizing compiler.

When you have a high-quality register allocator, instruction scheduler, loop
invariant code mover, strength reducer, and constant propagator implemented
in assembler macros, let me know (I'm morbidly curious). It may be the case
that these optimizations aren't common in C compilers, but so it goes. You
are certainly free to learn about them and implement them in a compiler and
sell it to the world.

Understand also that (as has been noted by other posters) complaining about
the slowness of a rare operation is silly; RISC designers HAVE done
instruction mix profiles, and try to make their compiler+machine combinations
fast for those mixes. Of course, with a stupid compiler there will be a lot
more multiply operations. That's why you can't exclude software from this
discussion.

On an unrelated note, the Acorn RISC Machine appears to do its shift-and-OP
instructions without overhead provided that the shift is constant (that is,
not obtained from a register). A second pair of odd features (that I'm sure
will be the subject of furious discussion for the next week or so) is (1) a
SET-CC bit -- if 0, then the condition code is left unmodified -- and (2) a
condition flag for every instruction (not just branches). (The reference for
this is the A.R.M CPU Software Manual, Version 1.00. Since it is over two
years old things have undoubtedly changed slightly.) When you combine all
these you get very compact loops for (general) multiplies, byte-swapping, and
division (division is not as compact as the other two).

David Chase
Olivetti Research Center, Menlo Park
rgh@mtund.ATT.COM (Ronald Hiller) (03/02/88)
In article <25699@linus.UUCP> bs@gauss.UUCP (Robert D. Silverman) writes:
>There's something about RISC architectures in general that I find
>confusing. Since they (read SPARC or equivalent) have no integer multiply
>instructions, any code which has a fair number of these is going to
>be slow.
[more complaints about lack of multiply instruction and effect on accessing
 2-D arrays]

It turns out that shift/add (or shift/add/subtract) sequences are really
quite fast when multiplication by a constant is required. As an example, on
an 8086 (I know... I shouldn't compare it with SPARC!!), you can nearly
always do a multiply by a constant faster using the shift/add sequence than
using the multiply instruction.

Ron
dik@cwi.nl (Dik T. Winter) (03/02/88)
In article <7482@apple.UUCP> baum@apple.UUCP (Allen Baum) writes:
> Accessing arbitrary array elements is not extremely common. Most of the
> array references are inside loops, and use the loop induction variables as
> part of the subscript expression. A good compiler will strength reduce these
> multiply operations and turn them into add/subtract immediates. Even if this
> is not possible, the multiply operation is generally reg*imm, which reduces
> to a shift in the common cases (power of two array sizes are pretty common!),
> or a SMALL series of shifts and adds.

I agree with most (the CDC Cyber does not have a true integer multiply
either). I disagree with the last: when you start programming on supers you
will soon learn that power of two array sizes are the worst choice you can
make. Even if nominally your array size ought to be a power of two, make it
one more in the declaration to prevent degradation of performance.
--
dik t. winter, cwi, amsterdam, nederland
INTERNET   : dik@cwi.nl
BITNET/EARN: dik@mcvax
mcdonald@uxe.cso.uiuc.edu (03/02/88)
>Anyone remember the CADET? It was, I believe, the IBM 1630 and stood for
>'Can't Add, Doesn't Even Try'. It did its addition by table lookup.

IBM 1620. It also used table lookup for multiplies. The table was at a fixed
location in memory (100, I seem to remember) and had to be read in at boot
time by the card reader. This machine used 2N404 GERMANIUM!!! transistors. It
normally ran at 125 kHz, but it had a special circuit so that when the
(hardware) instruction to write to its typewriter occurred the clock slowed
down to 10 Hz - driven by a cam on the motor shaft!!!!!!!!!!! It was fully,
absolutely, DECIMAL. It was variable "word" length, up to 20000 decimal
digits!!!!!!!!!

Doug McDonald
alan@pdn.UUCP (Alan Lovejoy) (03/03/88)
In article <25699@linus.UUCP> bs@gauss.UUCP (Robert D. Silverman) writes:
>There's something about RISC architectures in general that I find
>confusing. Since they (read SPARC or equivalent) have no integer multiply
>instructions, any code which has a fair number of these is going to
>be slow. This would include any program which had access to 2-D arrays
>since one must do multiplications (unless the array sizes are a convenient
>power of 2) to get the array indices right. Any code that accesses a[i][j]
>should run like a pig on such machines. I've seen some benchmarks that
>suggest SUN-4's are in fact slower than SUN-3's on programs that do a
>large amount of integer multiplies/divides. What good is a computer that
>can't multiply?

Given

	VAR a: ARRAY RowIndex OF ARRAY ColumnIndex OF Element;

If the compiler's code generator (or the assembly hack) stores the ADDRESS of
each row at a[row], so that a[row][column] is actually a[row]^[column] (but
the rows are stored on the stack, not the heap), then a[row][column] not only
does not require multiplication, but will be FASTER than multiplication. Of
course, a[MIN(RowIndex)..MAX(RowIndex)] would need to be initialized
(possibly using multiplication) once each time the block in which "a" is
declared is entered. This scheme takes more memory. Good, fast, cheap: choose
any two.

On the other hand, fine-tuning a computer architecture so that it gets the
best possible performance executing "average" code is guaranteed to result in
poor performance in some cases when executing decidedly unaverage code. The
question is, to what degree should an architecture cater to pathological
cases at the expense of normal ones (or vice versa)? The answer, of course,
depends on who will use the architecture to do what. And people won't agree
on the answer, which is why there are so many different computer
architectures in the world.

Which brings me to my main point: don't judge RISC on the characteristics of
SPARC. There are other ways to skin a cat, and other cats to skin. Some RISCs
do have hardware multiply.

--alan@pdn
barmar@think.COM (Barry Margolin) (03/03/88)
In article <7514@boring.cwi.nl> dik@cwi.nl (Dik T. Winter) writes:
>In article <7482@apple.UUCP> baum@apple.UUCP (Allen Baum) writes:
> > (power of two array sizes are pretty common!),

For scientific programming (the major application of supercomputers), is this
really true? What scientific applications naturally map onto power-of-two
arrays? I suspect that making arrays a power of two in size is a habit mostly
of systems programmers, who know to make things fit neatly into memory pages
in order to reduce paging.

>I agree with most (CDC Cyber does not have a true integer multiply either).
>when you start programming on supers you will soon
>learn that power of two array sizes are the worse choice you can make.

Could you explain why this is so? Maybe there are architectures where it
doesn't make a difference, but I can't imagine why power of two would be
WORSE than other sizes.

Barry Margolin
Thinking Machines Corp.

barmar@think.com
uunet!think!barmar
daveb@geac.UUCP (David Collier-Brown) (03/03/88)
In article <4400@aw.sei.cmu.edu> firth@bd.sei.cmu.edu.UUCP (Robert Firth) writes:
[discussion about array representation, and the hardware support appropriate
 to multiplicative addressing]
>However, my reading of ANSI X3.9-1978, especially Section 5, on arrays,
>leads me to conclude that array representation in column-major form,
>and array access by chain multiplication-&-addition of subscripts, is
>the only feasible implementation choice.

My reading of the ANSI proposal leads me to the same conclusion. This is both
good (it eliminates an ambiguity) and bad (it removes the freedom of a
compiler-writer to do what is best applicable to her architecture).

Could someone who is more of a language-lawyer comment on this
interpretation? Was it in fact the intention of the committee?

--dave (redirected to comp.lang.c) c-b
--
David Collier-Brown.                 {mnetor yunexus utgpu}!geac!daveb
Geac Computers International Inc.,   | Computer Science loses its
350 Steelcase Road,Markham, Ontario, | memory (if not its mind)
CANADA, L3R 1B3 (416) 475-0525 x3279 | every 6 months.
bs@linus.UUCP (Robert D. Silverman) (03/03/88)
In article <17415@think.UUCP: barmar@fafnir.think.com.UUCP (Barry Margolin) writes:
:In article <7514@boring.cwi.nl: dik@cwi.nl (Dik T. Winter) writes:
::In article <7482@apple.UUCP> baum@apple.UUCP (Allen Baum) writes:
:: > (power of two array sizes are pretty common!),
:
:For scientific programming (the major application of supercomputers),
:is this really true? What scientific applications naturally map onto
:power-of-two arrays? I suspect that making arrays a power of two in
:size is a habit mostly of systems programmers, who know to make things
:fit neatly into memory pages in order to reduce paging.
:
::I agree with most (CDC Cyber does not have a true integer multiply either).
::when you start programming on supers you will soon
::learn that power of two array sizes are the worse choice you can make.
:
:Could you explain why this is so? Maybe there are architectures where
:it doesn't make a difference, but I can't imagine why power of two
:would be WORSE than other sizes.

Somewhat simplified explanation: this is relatively easy to explain. When
doing vectorization on arrays whose length is a power of two, memory
interleaving conflicts occur. You find that instead of grabbing different
elements of an array from different banks you try to grab multiple elements
from the SAME banks. This defeats the purpose of interleaving and slows down
vectorization.

Bob Silverman
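[Editorial note: the usual cure, as Dik suggested above, is to pad the
leading dimension so that striding through a "column" does not land in the
same memory bank every time. A C-flavoured sketch with an invented size:]

#define N 512

double bad[N][N];        /* column stride of 512: revisits the same banks  */
double good[N][N + 1];   /* column stride of 513: spreads across the banks */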
lisper@yale.UUCP (Bjorn Lisper) (03/04/88)
In article <17415@think.UUCP> barmar@fafnir.think.com.UUCP (Barry Margolin) writes:
>In article <7514@boring.cwi.nl> dik@cwi.nl (Dik T. Winter) writes:
>>In article <7482@apple.UUCP> baum@apple.UUCP (Allen Baum) writes:
>> > (power of two array sizes are pretty common!),
>
>For scientific programming (the major application of supercomputers),
>is this really true? What scientific applications naturally map onto
>power-of-two arrays? I suspect that making arrays a power of two in
>size is a habit mostly of systems programmers, who know to make things
>fit neatly into memory pages in order to reduce paging.

FFT.

Bjorn Lisper
fouts@orville.nas.nasa.gov (Marty Fouts) (03/04/88)
In article <17415@think.UUCP> barmar@fafnir.think.com.UUCP (Barry Margolin) writes:
>>In article <7482@apple.UUCP> baum@apple.UUCP (Allen Baum) writes:
>> > (power of two array sizes are pretty common!),
>
>For scientific programming (the major application of supercomputers),
>is this really true? What scientific applications naturally map onto
>power-of-two arrays? I suspect that making arrays a power of two in
>size is a habit mostly of systems programmers, who know to make things
>fit neatly into memory pages in order to reduce paging.

Much scientific programming takes the form of solving partial differential
equations using finite difference approximations of the initial equation. By
selecting the differencing delta, the programmer controls the problem size,
so there is really a range of sizes which work. The lower limit is the
minimum necessary to achieve some kind of numerical stability in the result.
The upper limit is the amount of computer time available/desirable for
solving the problem.

On many vector machines, the use of multiple memory banks to increase
apparent memory bandwidth dictates that arrays not be powers of two in all
dimensions, in order to avoid bank conflict when marching through the array.
Some computations are done with arrays in which bounds are arbitrarily set to
relatively prime values.
oconnor@sungoddess.steinmetz (Dennis M. O'Connor) (03/05/88)
An article by daveb@geac.UUCP (David Collier-Brown) says:
] In article <4400@aw.sei.cmu.edu> firth@bd.sei.cmu.edu.UUCP (Robert Firth) writes:
] [discussion about array representation, and the hardware support
] appropriate to multiplicative addressing]
]
] >However, my reading of ANSI X3.9-1978, especially Section 5, on arrays,
] >leads me to conclude that array representation in column-major form,
] >and array access by chain multiplication-&-addition of subscripts, is
] >the only feasible implementation choice.
]
] My reading of the ANSI proposal leads me to the same conclusion.
] This is both good (it eliminates an ambiguity) and bad (it removes
] the freedom of an compiler-writer to do what is best applicable to
] her architecture).
] --
] David Collier-Brown. {mnetor yunexus utgpu}!geac!daveb
Why is it we have to reinvent the wheel so often ? N-dimensional
arrays, whether column or row major order, can always be accessed
without run-time multiplies. And can still be in a contiguous block of
memory. The method for doing this is at least 14 years old : I heard
about it in 1974.
An example will illustrate. BTW, this is NOT the most optimized
algorithm, but they are simple permutations of this.
Given an array that is N1 by N2 by N3 et cetera. First, allocate
your block of memory for the N1*N2*N3*etc elements of the array. Then,
for each dimension EXCEPT the minor one (in this example, Nn, but it
could be N1 or any Nx ) allocate space for a vector of words with a number
of elements equal to the cardinality of that dimension. Then simply
fill in these vectors with (range 1..Cardinality(Nx)) * (whatever
multiplier is used for this dimension in the multiplication-&-addition
method). Note that this is either done at compile time, or ONCE when
the array is allocated, and uses very little multiplication.
Now, when you need to access ARRAY[a,b,c,...,n] you get the address
by using ARRAY_BASE + N1VEC[a] + N2VEC[b] + N3VEC[c] ... + n.
No multiplies.
This, I understand, is the ORIGINAL meaning of the term "VECTORIZED
ARRAY". Here's a concrete example :
type FAST_ARRAY is array[1..6,1..4,1..7] of WHATEVER ;
produces :
type MEMBLOCK is array [ 6*4*7 ] of WHATEVER ;
N1VEC : constant array [1..6] of INTEGER := ( 0, 28, 56, 84, 112, 140 ) ;
N2VEC : constant array [1..4] of INTEGER := ( 0, 7, 14, 21 ) ;
PRAGMA INLINE( GET_FAST_ARRAY, SET_FAST_ARRAY ) ;
function GET_FAST_ARRAY( M : MEMBLOCK; A,B,C : INTEGER ) return WHATEVER is
begin
return M( N1VEC( A ) + N2VEC( B ) + C ) ;
end GET_FAST_ARRAY ;
procedure SET_FAST_ARRAY( M : in out MEMBLOCK; a,b,c : INTEGER;
INPUT : WHATEVER ) is
begin
M( N1VEC( A ) + N2VEC( B ) + C ) := INPUT ;
end SET_FAST_ARRAY ;
This is just an algorithmic description; I haven't compiled it.
But you get the idea. And this will work for column-major,
row-major, 3rd-dimension-major, whatever.
It's essentially just having a look-up table for frequently used multiplies.
--
Dennis O'Connor oconnor%sungod@steinmetz.UUCP
ARPA: OCONNORDM@ge-crd.arpa
(-: The Few, The Proud, The Architects of the RPM40 40MIPS CMOS Micro :-)
jbuck@epimass.EPI.COM (Joe Buck) (03/05/88)
In article <998@PT.CS.CMU.EDU> edw@IUS1.CS.CMU.EDU (Eddie Wyatt) writes:
>> Multiplication is not necessary to access 2-D arrays if the array is
>> set up like most arrays in C, where each row is a typical vector and
>> the 2-D array is just a vector of pointers to each row vector.
>EXCUSE ME, but when I declare an array to be an n by m matrix in
>C (float foo[n][m]) I get a contiguous block of memory. The
>representation IS NOT row-vector or column-vector. So when I access
>index number i,j someone has to perform the calculation
>base + sizeof(type)*(i*m+j).

No, the compiler doesn't have to do things that way. It can create an extra
row of pointers to the beginning of each row: p_foo[n], where p_foo[i]
contains &foo[i][0]. Then foo[i][j] is at *p_foo[i] + j. No multiply
required, but there might be a shift or two to accommodate two-byte or
four-byte array elements.

The old Fortran compiler for PDP-11's running RT-11 had an option to do this:
you're trading space for speed. They called it "array vectoring". The array
itself is still allocated the way you think it is, but there are extra
pointers.
--
- Joe Buck  {uunet,ucbvax,sun,<smart-site>}!epimass.epi.com!jbuck
  Old Internet mailers: jbuck%epimass.epi.com@uunet.uu.net
edw@IUS1.CS.CMU.EDU (Eddie Wyatt) (03/06/88)
First, a comment about induction on loop variables. Even if 95% of array
usage fits into this category, not all those applications may use it (or at
least not without some serious dataflow analysis). As soon as a procedure or
function call appears within the body of a loop, one can no longer assume
nice behaviour of induction variables or base addresses. I'm referring to the
problems of aliasing.

> No, the compiler doesn't have to do things that way. It can create
> an extra row of pointers to the beginning of each row: p_foo[n],
> where p_foo[i] contains &foo[i][0]. Then foo[i][j] is at *p_foo[i] +
> j. No multiply required, but there might be a shift or two to
> accommodate two-byte or four-byte array elements. The old Fortran
> compiler for PDP-11's running RT-11 had an option to do this: you're
> trading space for speed. They called it "array vectoring". The
> array itself is still allocated the way you think it is, but there
> are extra pointers.

Interesting thought there - replace the representation. It may have been easy
to do for Fortran, but it seems to me that C may have some features that
wouldn't allow it (or at least would make it a pain in the ass). One is the
interchangeability of pointer and array. Currently this is permissible (and
easily done) because the reps are the same - pointers to the data blocks.
Therefore an array may be passed to a function that expects a pointer (and
still have reasonable things happen). If an alternate rep is used, the
problem of parameter passing becomes determining which rep is intended to be
used. Since function descriptions currently are not required to be within
scope of the call, a correct choice can not be made.
--
Eddie Wyatt
e-mail: edw@ius1.cs.cmu.edu
crick@bnr-rsc.UUCP (Bill Crick) (03/12/88)
The comment about good programmers putting the array pointers in registers is
probably moot. Don't most modern(?) compilers ignore the register directive
and do their own allocation?
bpendlet@esunix.UUCP (Bob Pendleton) (03/17/88)
From article <9788@steinmetz.steinmetz.UUCP>, by oconnor@sungoddess.steinmetz (Dennis M. O'Connor):
> An article by daveb@geac.UUCP (David Collier-Brown) says:
> Why is it we have to reinvent the wheel so often ? N-dimensional
> arrays, whether column or row major order, can always be accessed
> without run-time multiplies. And can still be in a contiguous block of
> memory. The method for doing this is at least 14 years old : I heard
> about it in 1974.

If memory serves, the ancient FORTRAN-5 compiler on the Univac 1108 did
arrays this way. The old beast had hardware support for subscripting arrays
using index vectors. This means that the idea goes back into the late '50s at
least.
--
              Bob Pendleton @ Evans & Sutherland
UUCP Address:  {decvax,ucbvax,ihnp4,allegra}!decwrl!esunix!bpendlet
Alternate:     {ihnp4,seismo}!utah-cs!utah-gr!uplherc!esunix!bpendlet
        I am solely responsible for what I say.
chasm@killer.UUCP (Charles Marslett) (03/17/88)
In article <641@bnr-rsc.UUCP>, crick@bnr-rsc.UUCP (Bill Crick) writes:
> The comment about good programmers putting the array pointers in
> registers is probably moot. Don't most modern(?) compilers ignore
> the register directive and do their own allocation?

If my experience is any indication, register usage in modern micro C
compilers is drifting toward the golden ideal of 1970-era Fortran compilers
(IBM's Fortran H for example) -- Microsoft C 5.1 seems to completely ignore
register declarations except for generating some warnings if they are used
incorrectly, and earlier beta copies of 5.1 seemed to have real problems with
having more than 4 or 5 register declarations (couldn't juggle its own
register allocation and the programmer's too!).

At least one of the 68000 compilers I used a couple of years ago also had the
characteristic that if it needed a register (for its optimization) it would
take it from your set of register variables and might (sometimes) let you use
the register elsewhere in the code. This is a much harder task for the 68000
since there are more candidates for register variables and optimization
sequences.

I guess my point is, if the demand for good code generation from compilers
wins out over fast debugging cycles for compiler company resources, we may
reach the stage where the author of the program need not worry about the
architecture of the underlying machine when writing code, because the
compiler writer is optimizing generic-CPU code well enough that he can ignore
the application writer's optimizations. I hope we get there before everyone
gets sidetracked to more "interesting" or "marketable" features in compilers
-- whatever they might be in 1992.

========================  Charles Marslett
|Said the iconoclast:  |  chasm@killer.UUCP
|   This is better?    |
========================
fouts@orville.nas.nasa.gov (Marty Fouts) (03/18/88)
In article <3723@killer.UUCP> chasm@killer.UUCP (Charles Marslett) writes:
>
>I guess my point is, if the demand for good code generation from compilers
>wins out over fast debugging cycles for compiler company resources, we
>may reach the stage where the author of the program need not worry about
>the architecture of the underlying machine when writing code because the
>compiler writer is optimizing generic-CPU code well enough that he can
>ignore the application writer's optimizations. I hope we get there before
>everyone gets sidetracked to more "interesting" or "marketable" features
>in compilers -- whatever they might be in 1992.

I used to hope this too, until I started writing code which runs on a wide
variety of architectures. I suspect that the best the compiler will be able
to do (which is actually quite good) is fairly local optimization, in the
sense of changes which implement an equivalent algorithm to that expressed in
the original program, rather than solving an equivalent problem, but perhaps
with a different algorithm.

For example, there are two basic algorithms for finding all of the primes
less than some integer. One can apply the sieve algorithm explicitly by
marking all of the multiples of each prime as it becomes known, or one can
divide each candidate integer by all primes less than the square root of the
integer, looking for an exact divisor. On most of the machines I've coded
these algorithms on, the explicit sieve is a major win because counting is
much faster than division. However, on the Connection Machine, the division
algorithm is faster. Although I expect various compilers to optimize the loop
invariant constructs out of my sieve, I would be surprised if any of them
converted it into a division finder, even on the machines where that was the
better approach.

I don't think we'll reach the point where authors never need to worry, but I
do think we'll reach the point where once the machine is chosen, the author
can depend on the compiler to get the best performance out of a particular
choice of algorithm. It then becomes the author's problem to figure out which
algorithm has the best chance of winning on the class of machines the program
is intended for.
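[Editorial sketch of the explicit sieve Marty describes: marking multiples is
nothing but additions and stores, which is why it usually beats trial
division on machines with slow divides. LIMIT is an arbitrary choice.]

#include <stdio.h>

#define LIMIT 1000

int main(void)
{
    static char composite[LIMIT + 1];        /* zero-initialized */
    int i, j;

    for (i = 2; i * i <= LIMIT; i++)
        if (!composite[i])
            for (j = i * i; j <= LIMIT; j += i)   /* counting, no division */
                composite[j] = 1;

    for (i = 2; i <= LIMIT; i++)
        if (!composite[i])
            printf("%d\n", i);
    return 0;
}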