rwa@auvax.UUCP (Ross Alexander) (05/09/87)
In article <4016@necntc.NEC.COM>, pec@necntc.NEC.COM (Paul Cohen) writes:
>                  _______________________________________________
>                 | struct sttyp {
>                 |     unsigned first, second, third;
>                 |     struct sttyp *fourth;
>                 |     double fifth, a,b,c,d,e;} *stru;
>  C code:        |
>                 |
>                 |     stru->fourth->fourth->third = 2;
>                 |==============================================
>                 |     mov.w  _stru+0xc,r0      # &(stru->fourth) in R0
>  Assembly Code: |
>                 |     mov.w  #2,0x8[0xc[r0]]   # stru->fourth->
>                 |                              #   fourth->third = 2;
>                 |______________________________________________
> No doubt about it, the V70 is a complex chip; it is also fast. It packs
> a great deal of functionality to provide high performance at a reasonable
> cost in a real system.

I don't want to rain on anybody's parade, especially on such tenuous
evidence as this... but how often do you, dear reader, write
'w->x->x->z = 2;'? Perhaps not very often? Or another way of phrasing
this might be: "your machine is a _h*ll_ of a lot smarter than I am!".

Now quite honestly, I am no great screaming h*ll as far as brains go
(you may quote me on that :-} ) but the example has a contrived feel to
it; perhaps some statistical analysis of C frequencies (i.e., real usage)
would be apropos in justifying the significance of the above? But I
would love to believe that new horizons of speed and efficiency are just
waiting for me!

Please don't take the above mild reproof as a wild flame. I just want to
hear some real, informed, constructive information on the chip. And your
information is very tantalizing. It's just not very significant, and
that bothers me.

...!ihnp4!alberta!auvax!rwa	Ross Alexander, Athabasca University

snappy quote: (very roughly) "the set of all sets is disallowed"
(Russell, Lord Whitehead); "oh, really ?!" (Kurt Goedel).
[ I don't claim to understand the above, my cats did it... I wish I did ]
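The 0x8 and 0xc displacements in the quoted V70 code are simply structure field offsets. Below is a minimal C sketch (the function names are hypothetical) that compiles the quoted declaration and statement and reports those offsets. It assumes a 32-bit target, as the V70 is, where `unsigned` and pointers are both 4 bytes, so `third` lands at offset 8 and `fourth` at offset 12; on a typical 64-bit host `fourth` is aligned to offset 16, so only the offset of `third` carries over unchanged.

```c
#include <stddef.h>  /* offsetof, size_t */

/* The structure from the quoted example, with the missing ';' supplied. */
struct sttyp {
    unsigned first, second, third;
    struct sttyp *fourth;
    double fifth, a, b, c, d, e;
};

struct sttyp *stru;

/* The statement under discussion: a chain of pointer loads, then one store. */
void assign_example(void)
{
    stru->fourth->fourth->third = 2;
}

/* Byte offsets that end up as the displacements in the addressing mode. */
size_t offset_of_third(void)  { return offsetof(struct sttyp, third); }
size_t offset_of_fourth(void) { return offsetof(struct sttyp, fourth); }
```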
tyler@drivax.UUCP (William Tyler) (05/11/87)
In article <162@auvax.UUCP> rwa@auvax.UUCP (Ross Alexander) criticizes
the following example C code & V60/V70 implementation as being contrived.
>>                 _______________________________________________
>>                | struct sttyp {
>>                |     unsigned first, second, third;
>>                |     struct sttyp *fourth;
>>                |     double fifth, a,b,c,d,e;} *stru;
>> C code:        |
>>                |
>>                |     stru->fourth->fourth->third = 2;
>>                |==============================================
>>                |     mov.w  _stru+0xc,r0      # &(stru->fourth) in R0
>> Assembly:      |
>>                |     mov.w  #2,0x8[0xc[r0]]   # stru->fourth->
>>                |                              #   fourth->third = 2;
>>                |______________________________________________

In programs dealing with complex data structures, this sort of
construction comes up fairly often. Just as an exercise, I used egrep on
one modest-sized (about 1K lines) C program of mine that deals primarily
with data structure manipulation. I came up with 2 pages of lines that
contained '->' at least three times, most of them in constructs roughly
equivalent to the example above. In the context of the application, the
usages were not contrived, nor were they particularly difficult to
understand.
--
Bill Tyler ... {seismo,hplabs,sun,ihnp4}!amdahl!drivax!tyler
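Bill's egrep exercise can be approximated with a few lines of C. This is a sketch, not his actual command: `count_arrows` and `count_lines_with_arrows` are hypothetical helpers, and for `min_arrows == 3` the count should roughly match `egrep -c -e '->.*->.*->'`. Note that it counts every '->' on a line, so `p->x = q->y->z` qualifies even though no single chain is three deep, and arrows inside comments or strings are counted too.

```c
#include <stdio.h>
#include <string.h>

/* Count occurrences of the two-character token "->" in one line. */
int count_arrows(const char *line)
{
    int n = 0;
    const char *p = line;
    while ((p = strstr(p, "->")) != NULL) {
        n++;
        p += 2;  /* skip past this arrow */
    }
    return n;
}

/* Count lines in fp containing at least min_arrows "->" tokens.
   Lines longer than the buffer are split and counted piecewise. */
long count_lines_with_arrows(FILE *fp, int min_arrows)
{
    char buf[1024];
    long lines = 0;
    while (fgets(buf, sizeof buf, fp) != NULL) {
        if (count_arrows(buf) >= min_arrows)
            lines++;
    }
    return lines;
}
```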
rwa@auvax.UUCP (Ross Alexander) (05/14/87)
In article <1526@drivax.UUCP>, tyler@drivax.UUCP (William Tyler)
justly takes me to task for an intemperate criticism of the use of
'a->b->c->d=2;' as a meaningful example of code generation for a very
CISCy processor (this is the essence of the example under
discussion). I apologize profusely for a rather hectoring tone and
overbearing attitude. One shouldn't post to the net after happy hour
:-(, I won't do that again.
On a more positive note, Mr. Tyler provides some concrete evidence
that triple indirection is indeed used and useful. I concur. I
have written hacks like that in assembly and C many times, and was
immoderately pleased with them at the time :-). I just wonder if
this construction is common enough and useful enough to deserve
silicon real estate for its support. It seems to me that this sort
of thing ought to be done in user code rather than in microcode,
and that the user should take the hit rather than the system. This
argument is predicated on a belief that there are more constructive
things to do with the real estate, such as (for example) Wallace
tree multipliers and barrel shifters, large register files, and so
on ad nauseam (I am not a 'real' hardware hacker, just a dilettante).
In support of this position, I have just spent a happy half hour
grepping my kernel sources (Ultrix 1.2 with DECnet support). A few
facts emerged: there were 365 occurrences of 'a->b->c'
constructions. Most of these were in the sockets and inet code (and
in the DECnet code too, interestingly enough), and the remainder
were scattered around, with clustering in the vm code and the bus
autoconfiguration code. There were 5 occurrences of 'a->b->c->d',
which I consider rather herculean, and 0 occurrences of
'a->b->c->d->e', which came as no surprise (to me). Auvax spends
from 25 to 50 percent of its time executing the kernel, since our
installation workload emphasizes character pushing (we drive lots of
laser printers, etc.) and also database applications.
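Ross's tally separates 'a->b->c' from 'a->b->c->d', i.e. it measures chain depth rather than the raw number of arrows on a line. A minimal sketch of such a classifier, assuming chains are written without embedded spaces or subscripts; `longest_arrow_chain` is a hypothetical name, and since it is a text heuristic rather than a C parser it also counts arrows inside comments and strings.

```c
#include <ctype.h>

/* Length, in arrows, of the longest unbroken chain ident->ident->... in a
   line: 2 for "a->b->c", 3 for "a->b->c->d". Whitespace, brackets or any
   other punctuation end a chain, so "a->b[i]->c" counts as two separate
   one-arrow chains. */
int longest_arrow_chain(const char *s)
{
    int best = 0, cur = 0;
    while (*s != '\0') {
        if (s[0] == '-' && s[1] == '>') {
            cur++;                /* another link in the current chain */
            if (cur > best)
                best = cur;
            s += 2;
        } else if (isalnum((unsigned char)*s) || *s == '_') {
            s++;                  /* identifier character: chain continues */
        } else {
            cur = 0;              /* anything else breaks the chain */
            s++;
        }
    }
    return best;
}
```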
My conclusion from this evidence is that deep levels of indirection
a la a->b->c... are a lightly used feature. I welcome constructive
discussion.
By the way :-) the H6050, a real beater of a 1960s-architecture
machine, could do the original example in at most two instructions (if
the compiler were _incredibly_ clever and we made an assumption or two
about where the operands are). I would have to think carefully for about
20 minutes to write the equivalent assembly; I think it would come
out to something like 'ldx0 2,du; stxl0 fake,i', where fake is the
beginning of a long descriptor chain dragging in a, b, c, establishing
the offset of d, etc., etc. It's been years.
...!ihnp4!alberta!auvax!rwa Ross Alexander, Athabasca University
pec@necntc.NEC.COM (Paul Cohen) (05/18/87)
I earlier posted the telephone numbers for obtaining documentation on
the V60 or V70 (or any other NEC components):

	1-800-NEC-ELEC		(California)
	1-800-NEC-ELE1		(During California working hours)
	0049-211-6503-333	(Europe)

Apparently, there are problems with dialing the 800 numbers from places
like Canada and Mexico, so I would like to add another number to this
list:

	1-415-960-6000 ext. 6158

Another possibility is to write to:

	NEC Electronics
	401 Ellis Street
	P.O. Box 7241
	Mountain View, CA 94039
daniels@cae780.UUCP (05/20/87)
In article <166@auvax.UUCP> rwa@auvax.UUCP (Ross Alexander) writes:
[ discussion about the frequency of a->b->c->d ]
>365 occurrences of 'a->b->c'
>5 occurrences of 'a->b->c->d'
>0 occurrences of 'a->b->c->d->e'
>My conclusion from this evidence is that deep levels of indirection
>a la a->b->c... are a lightly used feature.

One thing that is being overlooked here is the layers of effective
"p->f" which are not in the source code. Typically each procedure has a
"frame pointer" which points to its argument and local variables area.
This means that code like:

	prog(ptr)
	struct some_type *ptr;
	{
		ptr->field->link_field = 0;
	}

translates to:

	frame_ptr->arg_offset_ptr->field->link_field = 0;

The problem is worse when you consider languages like Algol and Pascal,
which provide definitions of functions inside functions and allow
up-level access to the containing function's variables. For example:

	function some_function(ptr : Pascal_pointer_type) : integer;

		function inner_function : integer;
		begin
			with ptr^ do inner_function := 12 + numeric_field
		end;

	begin
		some_function := Another_function(inner_function)
	end;

The code for inner_function is like:

	return 12 + ptr^.numeric_field
	== return 12 + ptr->numeric_field
	== return 12 + inner_function_frame->parent_ptr->arg_offset_ptr->numeric_field

So the multi-level structure offset is not quite so weird as it seems at
first glimmer (the compiler is generating some extra layers).

FROM:	Scott Daniels, Tektronix CAE
	5302 Betsy Ross Drive, Santa Clara, CA 95054
UUCP:	tektronix!teklds!cae780!daniels
	{ihnp4, decvax!decwrl}!amdcad!cae780!daniels
	{nsc, hplabs, resonex, qubix, leadsv}!cae780!daniels
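Scott's up-level access can be modeled in C by making the static link explicit. This is a sketch of one possible frame layout, not what any particular Pascal compiler emits; all structure and field names are hypothetical. Here the argument `ptr` lives in `some_function`'s frame and `inner_function` reaches it through a static link, so the source expression `ptr^.numeric_field` becomes three indirections starting from `inner_function`'s own frame pointer.

```c
/* Hypothetical target type of Pascal_pointer_type. */
struct record {
    int numeric_field;
};

/* Hypothetical activation record of some_function: holds its argument. */
struct some_function_frame {
    struct record *ptr;
};

/* Hypothetical activation record of inner_function: its only route to
   some_function's variables is the static link to the enclosing frame. */
struct inner_function_frame {
    struct some_function_frame *parent;
};

/* Body of inner_function: "12 + ptr^.numeric_field" spelled out as the
   pointer chain the compiler has to generate. */
int inner_function(const struct inner_function_frame *frame)
{
    return 12 + frame->parent->ptr->numeric_field;
}
```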
earl@mips.UUCP (05/20/87)
I agree with Scott Daniels and Ross Alexander that a->b->c and such are
definitely not silly examples. I write such constructs frequently. But
that does not necessarily mean it is a good idea to add an instruction
to implement them. Perhaps someone with a data sheet can post the cycle
counts for these instructions so we can compare.

An R2000 will do a load of, or a store to, a->b->c in 2-4 cycles
depending on how well the load delays are scheduled (we typically
schedule 75% of these, so say 2.5 cycles), and a->b->c->d in 3-6 (3.75).
I'm assuming a is in a register, which with the MIPS compiler is a
fairly safe assumption.

The ability to schedule the load delays is an excellent reason NOT to
provide such an addressing mode. If you implement the mode, you'll just
find your microcode waiting all the time. If you generate separate
instructions and let the compiler schedule them, then most of the time
you won't wait at all.

Note that I'm assuming that hardware can't take the output of the cache,
do an add to get the new address, perhaps translate it, and feed it back
to the cache in a single cycle. If it took a single cycle, I'd say the
cycle time was artificially slow. The R2000 takes two cycles to do this,
so loads have a delay of one cycle before the result is usable.
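Earl's figures follow from a simple model: with a already in a register, a chain with n dereferences issues n loads, and each load is followed by a one-cycle delay that the compiler either fills with useful work or leaves empty. The function below is a back-of-the-envelope sketch of that arithmetic under those assumptions (the name `expected_cycles` and its parameters are hypothetical); it treats every access in the chain as a load and ignores cache misses.

```c
/* Expected cycles to walk a pointer chain of `levels` dereferences when a
   fraction `fill_rate` of the one-cycle load delays are filled by the
   scheduler: each load costs 1 cycle plus its unfilled delay. */
double expected_cycles(int levels, double fill_rate)
{
    double wasted_per_load = 1.0 - fill_rate;  /* average empty delay slot */
    return levels * (1.0 + wasted_per_load);
}
```

With `fill_rate` = 0.75 this gives 2.5 cycles for a->b->c (2 levels) and 3.75 for a->b->c->d (3 levels); a `fill_rate` of 1.0 or 0.0 gives the 2-4 and 3-6 cycle extremes.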
johnw@astroatc.UUCP (05/26/87)
In article <3962@cae780.TEK.COM> daniels@cae780.UUCP (Scott Daniels) writes:
>In article <166@auvax.UUCP> rwa@auvax.UUCP (Ross Alexander) writes:
> [ discussion about the frequency of a->b->c->d ]
>One thing that is being overlooked here is the layers of effective "p->f"
>which are not in the source code. Typically each procedure has a "frame
>pointer" which points to its argument and local variables area. This means
>that code like:
>	prog(ptr)
>	struct some_type *ptr;
>	{
>		ptr->field->link_field = 0;
>	}
>translates to:
>	frame_ptr->arg_offset_ptr->field->link_field = 0;
>
>The problem is worse when you consider languages like Algol and Pascal,
>which provide definitions of functions inside functions and allow up-level

These questions all depend heavily on the machine architecture *AND* on
the compiler implementation!

To generalize: all compiler implementors will make a maximum effort to
keep the frame pointer in a register (i.e., on anything short of a
memory-to-memory or accumulator-only machine).

For languages like Pascal, it is COMMON to use "displays", which means
that a register is used for the `frame pointer' of each statically
nested function or procedure. (Thus all accesses are at worst
reg->variable, in C notation.)

For any language where the compiler can COUNT the arguments [YES] or the
local vars [probably] (but NOT necessarily both), it is possible to
reference both from a single pointer register (this assumes a signed
index). If this doesn't work, then you can have 2 pointer registers (one
for args, one for locals).

This *SEEMS* to indicate that an addressing mode for "MEM [reg+const]" is
worth having (hopefully with a one-cycle operand fetch). Of course one
must PROVE this with simulation (or some such) results!

As for the multiple indirects and other addressing modes, there appears
to be little gain there, especially in light of overlapping loads. I vote
for simpler and faster!
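The difference between following a static chain and using a display can be sketched in C. Assumptions: the display is modeled as a global array rather than as the dedicated registers John describes, a real compiler would also save and restore display entries on procedure entry and exit (omitted here), and `MAX_NESTING`, `struct frame` and both functions are hypothetical names. Following the static chain costs one extra indirection per nesting level crossed, while the display turns every up-level access into a single `reg->variable` style load.

```c
#define MAX_NESTING 8  /* hypothetical maximum static nesting depth */

/* Hypothetical activation record. */
struct frame {
    struct frame *static_link;  /* frame of the lexically enclosing routine */
    int local;                  /* one local variable, for illustration */
};

/* Static chain: walk `hops` links outward, one load per level crossed. */
int load_via_static_chain(const struct frame *fp, int hops)
{
    while (hops-- > 0)
        fp = fp->static_link;
    return fp->local;
}

/* Display: display[k] points at the current frame of static level k, so
   any up-level variable is one indirection away: display[k]->local. */
struct frame *display[MAX_NESTING];

int load_via_display(int level)
{
    return display[level]->local;
}
```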
Also note that even the NEC chip had to generate multiple instructions
for multiple -> operators!

Another argument in favor of RISCy machines and GOOD code schedulers:
we got a 2:1 improvement by "tuning" our code scheduler... maybe I
should say "fixing" instead of "tuning". (No, both were run *WITHOUT*
the optimizer!)

			John W
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Name:	John F. Wardale
UUCP:	... {seismo | harvard | ihnp4} !uwvax!astroatc!johnw
arpa:	astroatc!johnw@rsch.wisc.edu
snail:	5800 Cottage Gr. Rd. ;;; Madison WI 53716
audio:	608-221-9001 eXt 110

To err is human, to really foul up world news requires the net!