chris@mimsy.UUCP (Chris Torek) (09/07/88)
In article <5654@june.cs.washington.edu> pardo@june.cs.washington.edu (David Keppel) writes:
>I believe that the VAX "movc" command takes arbitrary pointers and
>does the following:
>
>* If both are word-aligned, do a word copy (I mean a 4-byte word).
>* If both are non-aligned and could be aligned with 1, 2, or 3 bytes
>  of byte-copy at either end, then do a byte copy at either end and do
>  a word copy down the middle.
>* If neither aligned then ??
>
>Unfortunately, my VAX hardware reference is out of town for a couple
>of weeks, so I can't ask him about neither aligned. Anybody know?

I do not *know*, but I predict that the answer is machine-dependent: that BI machines use octaword transfers, while SBI machines use quadword transfers and CMI machines use longword transfers. I believe that at least the 780 and faster VAXen have an alignment network, and that the microcode can use this directly, so that even if the two addresses cannot become simultaneously aligned, the copy can proceed as if they were, with intermediate 64-bit results accumulated in a series of latches behind the alignment network.

Incidentally, the microcode has a harder job than simply aligning: the formats of the two instructions are

	movc3	count.rw,src.ab,dst.ab
and
	movc5	srclen.rw,src.ab,fill.rb,dstlen.rw,dst.ab

(r = read-reference, a = address-reference; b = byte, w = word; these tell how the argument is used and what increments and shifts are applied to postincrement, predecrement, and indexed addressing modes). In both cases, if the source and destination overlap, the copy is done in whichever direction is nondestructive.

Alas, since the count (movc3) and length (movc5) fields are only read as words, one instruction can move at most 65535 bytes.
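
As a rough illustration of the two operand formats above, here is a minimal C model of what each instruction does to memory. The names `model_movc3` and `model_movc5` are made up for this sketch. The padding behavior of movc5 (copy the shorter of the two lengths, then fill the rest of the destination with the fill byte) comes from the VAX architecture definition rather than from the post, and the model ignores the values the real instructions leave in r0..r5 and the condition codes.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of movc3: the count is a 16-bit word, so at most
 * 65535 bytes move per call.  memmove stands in for the "copy in
 * whichever direction is nondestructive" rule. */
static void model_movc3(uint16_t count, const void *src, void *dst)
{
    assert(src != NULL && dst != NULL);
    memmove(dst, src, count);
}

/* Hypothetical model of movc5: copy min(srclen, dstlen) bytes, then
 * pad any remaining destination bytes with `fill`.  A longer source
 * is simply truncated. */
static void model_movc5(uint16_t srclen, const void *src, uint8_t fill,
                        uint16_t dstlen, void *dst)
{
    uint16_t n = srclen < dstlen ? srclen : dstlen;

    assert(src != NULL && dst != NULL);
    memmove(dst, src, n);
    if (dstlen > n)
        memset((unsigned char *)dst + n, fill, (size_t)(dstlen - n));
}
```

Because both length arguments are `uint16_t`, the 65535-byte ceiling shows up in the function signatures themselves; a caller passing a larger `size_t` would silently lose the high bits, which is the trap the next paragraph is about.
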
To make these work as a general copy routine, one must surround them with loops that also determine the appropriate direction; moreover, since the results are left in specific registers (r0..r5), the loops must be carefully written so as to hold the source and destination fields in the appropriate result-registers to avoid unnecessary moves. (Fortunately, a C compiler has the sizes of structures available directly, and can generate the proper series of movc3 instructions for a structure assignment, but in fact the 4BSD PCC cheats and assumes that no structure contains more than 65535 bytes. You get a compiler error if you try to assign one larger than this. Well, at least it does not generate bad code....)
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain: chris@mimsy.umd.edu	Path: uunet!mimsy!chris
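
The wrapper loop described in the post above might look roughly like the following in C. This is only a sketch: `big_copy`, `movc3_chunk`, and `MOVC_MAX` are names invented here, `memmove` stands in for a single movc3, and the register bookkeeping (keeping the running source and destination in the registers movc3 leaves them in) cannot be expressed in portable C at all.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MOVC_MAX 65535u   /* largest count a 16-bit word operand can hold */

/* Stand-in for one movc3: overlap-safe within a single call. */
static void movc3_chunk(uint16_t count, const unsigned char *src,
                        unsigned char *dst)
{
    memmove(dst, src, count);
}

/* Hypothetical general-purpose copy built from pieces of at most
 * 65535 bytes.  The outer loop must pick its own direction, because
 * each chunk is only nondestructive with respect to itself. */
void *big_copy(void *dstv, const void *srcv, size_t len)
{
    unsigned char *dst = dstv;
    const unsigned char *src = srcv;

    assert(len == 0 || (dst != NULL && src != NULL));

    if ((uintptr_t)dst <= (uintptr_t)src) {
        /* Destination is below the source: copy low chunks first. */
        while (len > 0) {
            uint16_t n = len > MOVC_MAX ? (uint16_t)MOVC_MAX : (uint16_t)len;
            movc3_chunk(n, src, dst);
            src += n;
            dst += n;
            len -= n;
        }
    } else {
        /* Destination is above the source: copy high chunks first so
         * source bytes are read before a later chunk overwrites them. */
        while (len > 0) {
            uint16_t n = len > MOVC_MAX ? (uint16_t)MOVC_MAX : (uint16_t)len;
            len -= n;
            movc3_chunk(n, src + len, dst + len);
        }
    }
    return dstv;
}
```

A compiler doing structure assignment knows the size at compile time and could emit the chunk sequence directly instead of looping, which is the shortcut the parenthetical about PCC refers to.
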
cjh@hpausla.HP.COM (Clifford Heath) (09/26/88)
I played with Duff's device on an HP 9000/850 (RISC machine), and got some interesting results. Duff's is faster than the comparable non-unrolled loop, but only by about 20-30%. memcpy was heaps faster, so I looked at the (memcpy) assembly code using a debugger. As a result of this I changed the unrolling factor in Duff's to 4 (not much change), changed the auto-increment pointer addressing to short offset indexing (using a pointer adjustment before the loop and a single increment before the while) and got about 30% more. The 850 has auto-increment, but it still takes time that doesn't need to be wasted. It also has a good global optimizer, which seemed to do sensible things even for this strange device.

Duff's was STILL slower than memcpy by about 50%, and couldn't handle byte-size moves, non-aligned moves, etc. Duff's is really only a way of saving the code size required to perform the additional moves left after the unrolled loop has run, which is a fairly poor excuse for using a device that's so hard to read. The only additional benefit is that the extra instructions may be in the I-cache, which isn't really such a big deal.

The memcpy on the 850 is quite an astonishing effort, using word moves with double register 8/16/24 bit shifts for unequally non-aligned moves. It also has a very small setup time, so that small moves get caught early and handled quickly. Congratulations to the coder, a very good effort.

Before this experiment, I was convinced that C with a good optimizer could get within 10% of assembly code for anything. I now have a convincing counter-example.

In short, use the system-supplied routines for preference, and if they prove to be slow, replace them yourself AND SEND THE CODE to the company that wrote it. They'll probably be grateful.

Clifford Heath, Hewlett Packard Australian Software Operation.
(UUCP: hplabs!hpfcla!hpausla!cjh, ACSnet: cjh@hpausla.oz)
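
For readers who have not seen the two loop shapes compared in the post above, here is a sketch in C. It is not the code that was benchmarked: the function names are invented, the copy is word-to-word (Duff's original wrote to a single device register), and the indexed version uses a signed index in place of the pre-loop pointer adjustment so that no pointer is ever formed before the start of the array. Whether a given compiler turns the second form into short-offset addressing with one increment per pass is an assumption, not something the C source guarantees.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Duff's device unrolled by 4, with auto-increment style pointers.
 * The switch jumps into the middle of the loop body to dispose of
 * the count % 4 leftover words on the first pass. */
void duff_copy4(uint32_t *to, const uint32_t *from, size_t count)
{
    size_t passes;

    if (count == 0)
        return;
    assert(to != NULL && from != NULL);
    passes = (count + 3) / 4;
    switch (count % 4) {
    case 0: do { *to++ = *from++;
    case 3:      *to++ = *from++;
    case 2:      *to++ = *from++;
    case 1:      *to++ = *from++;
            } while (--passes > 0);
    }
}

/* Same device with fixed offsets and a single increment per pass.
 * On a short first pass the index starts negative, but only the
 * offsets that land at or after element 0 are ever executed. */
void duff_copy4_indexed(uint32_t *to, const uint32_t *from, size_t count)
{
    size_t rem = count % 4;
    ptrdiff_t i = rem ? (ptrdiff_t)rem - 4 : 0;
    ptrdiff_t end = (ptrdiff_t)count;

    if (count == 0)
        return;
    assert(to != NULL && from != NULL);
    switch (rem) {
    case 0: do { to[i]     = from[i];
    case 3:      to[i + 1] = from[i + 1];
    case 2:      to[i + 2] = from[i + 2];
    case 1:      to[i + 3] = from[i + 3];
                 i += 4;
            } while (i < end);
    }
}
```

Neither version handles byte counts or misaligned pointers, which matches the limitation noted in the post; a library memcpy has to deal with both before it reaches its inner loop.
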
mcdonald@uxe.cso.uiuc.edu (09/28/88)
>In short, use the system-supplied routines for preference, and if they
>prove to be slow, replace them yourself AND SEND THE CODE to the company
>that wrote it. They'll probably be grateful.
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

They won't be grateful. They (particularly IBM) won't even look at it. If you send code to IBM it gets looked at by a special person whose job it is to see if the code is USER WRITTEN APPLICATIONS CODE illustrating a bug in THEIR software. If it is that, this person then sends a description off to the responsible group. If, on the other hand, you send in a proposed improvement in THEIR software, two things may happen: one is that the special filter-person shreds your suggestions and then goes off to a special super-secret room where, using the fruits of super-secret research, his brain is wiped of all memory of the event. OR, he sends it to their legal department for legal action: they sue the sender for having looked inside their software to find the bad code.

Big companies aren't interested in suggestions for improvements direct from customers. They are afraid that if they were to even look at it, someone else might have used it in the past and could sue them. They want to code THEIR WAY and only their way. Indirectly, of course, they must know whether their code is any good (through benchmarks and user comments to support reps).
jps@wucs1.wustl.edu (James Sterbenz) (10/03/88)
In article <46500026@uxe.cso.uiuc.edu> mcdonald@uxe.cso.uiuc.edu writes:
>>In short, use the system-supplied routines for preference, and if they
>>prove to be slow, replace them yourself AND SEND THE CODE to the company
>>that wrote it. They'll probably be grateful.
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>They won't be grateful. They (particularly IBM) won't even look at it.
>IF you send code to IBM it gets looked at by a special person whose
>job it is to see if the code is USER WRITTEN APPLICATIONS CODE
>illustrating a bug in THEIR software. If it is that, this person then
>sends a description off to the responsible group. If, on the
>other hand, you send in a proposed improvement in THEIR software,
>two things may happen: one is that the special filter-person shreds
>your suggestions and then goes off to special super-secret room where,
>using the fruits of super-secret research, his brain is wiped of
>all memory of the event. OR, he sends it to their legal department
>for legal action: they sue the sender for having looked inside their
>software to find the bad code. Big companies aren't interested
>in suggestions for improvements direct from customers. They are afraid
>that if they were to even look at it, someone else might have used
>it in the past and could sue them. They want to code THEIR WAY and
>only their way. ...

There are various official mechanisms for suggesting improvements to products of most companies, IBM included. For IBM it's called (I believe) a PASR.

Much of IBM's source code is licensed, in which case (assuming you're licensed for the code you're using) there's nothing wrong with looking at, modifying, and making suggestions for improvement of code. If, on the other hand, you've disassembled an OCO (object code only) program, that might be another matter.

There are a lot of programs IBM licenses that were WRITTEN by users. These are in a category called program offerings (used to be installed user programs). These are normally offered 'as-is', but if IBM likes them enough, they will take over full support and development.
-- 
James Sterbenz  Computer and Communications Research Center
Washington University in St. Louis   314-726-4203
INTERNET: jps@wucs1.wustl.edu   UUCP: wucs1!jps@uunet.uu.net