[comp.arch] 32-bit CPUs

rwa@auvax.UUCP (Ross Alexander) (05/09/87)

In article <4016@necntc.NEC.COM>, pec@necntc.NEC.COM (Paul Cohen) writes:
> 		_______________________________________________
> 		| struct sttyp {
> 		|	unsigned first, second, third;
> 		|	struct sttyp *fourth;
> 		|	double fifth, a,b,c,d,e;} *stru
> C code:		|
> 		|
> 		| stru->fourth->fourth->third = 2;
> 		|==============================================
> 		| mov.w _stru+0xc,r0	# &(stru->fourth) in R0
> Assembly Code:	|
> 		| mov.w #2,0x8[0xc[r0]]	# stru->fourth->
> 		|			#    fourth->third = 2;
> 		|______________________________________________
> No doubt about it, the V70 is a complex chip; it is also fast.  It packs 
> a great deal of functionality to provide high performance at a reasonable 
> cost in a real system.  


I don't want to rain on anybody's parade, especially on such tenuous
evidence as this...  but how often do you, dear reader, write
'w->x->x->z = 2;' ?  perhaps not very often ?  Or another way of
phrasing this might be:  "your machine is a _h*ll_ of a lot smarter
than I am !".  Now quite honestly, I am no great screaming h*ll as far
as brains go (you may quote me on that :-} ) but the example has a
contrived feel to it; perhaps some statistical analysis of C
frequencies (i.e., real usage) would be apropos in justifying the
significance of the above ?  

But I would love to believe that new horizons of speed and efficiency are
just waiting for me!  Please don't take the above mild reproof as a wild
flame.  I just want to hear some real, informed, constructive information
on the chip.  And your information is very tantalizing.  It's just
not very significant, and that bothers me.

...!ihnp4!alberta!auvax!rwa		Ross Alexander, Athabasca University

snappy quote: (very roughly) "the set of all sets is disallowed"
(Russel, Lord Whitehead);  "oh, really ?!" (Kurt Goedel).

[ I don't claim to understand the above, my cats did it... I wish I did ]

tyler@drivax.UUCP (William Tyler) (05/11/87)

In article <162@auvax.UUCP> rwa@auvax.UUCP (Ross Alexander) criticizes the
following example C code & V60/V70 implementation as being contrived.

 >> 		_______________________________________________
 >> 		| struct sttyp {
 >> 		|	unsigned first, second, third;
 >> 		|	struct sttyp *fourth;
 >> 		|	double fifth, a,b,c,d,e;} *stru
 >> C code:	|
 >> 		|
 >> 		| stru->fourth->fourth->third = 2;
 >> 		|==============================================
 >> 		| mov.w _stru+0xc,r0	# &(stru->fourth) in R0
 >> Assembly:	|
 >> 		| mov.w #2,0x8[0xc[r0]]	# stru->fourth->
 >> 		|			#    fourth->third = 2;
 >> 		|______________________________________________


In programs dealing with complex data structures, this sort of construction 
comes up fairly often.  Just as an exercise, I used egrep on one modest sized
(about 1K lines) C program of mine that deals primarily with data structure
manipulation.  I came up with 2 pages of lines that contained '->' at least
three times, most of them in constructs roughly equivalent to the example
above.  In the context of the application, the usages were not contrived, 
nor were they particularly difficult to understand.  
-- 

Bill Tyler ... {seismo,hplabs,sun,ihnp4}!amdahl!drivax!tyler

rwa@auvax.UUCP (Ross Alexander) (05/14/87)

In article <1526@drivax.UUCP>, tyler@drivax.UUCP (William Tyler)
justly takes me to task for an intemperate criticism of the use of
'a->b->c->d=2;' as a meaningful example of code generation for a very
CISCy processor (this is the essence of the example under
discussion).  I apologize profusely for a rather hectoring tone and
overbearing attitude.  One shouldn't post to the net after happy hour
:-(, I won't do that again.  

On a more positive note, Mr. Tyler provides some concrete evidence
that triple indirection is indeed used and useful.  I concur.  I
have written hacks like that in assembly and C many times, and was
immoderately pleased with them at the time :-).  I just wonder if
this construction is common enough and useful enough to deserve
silicon real estate for its support.  It seems to me that this sort
of thing ought to be done in user code rather than microcode,
and that the user should take the hit rather than the system.  This
argument is predicated on a belief that there are more constructive
things to do with the real estate, such as (for example) Wallace
tree multipliers and barrel shifters, large register files, and so
on ad nauseam (I am not a 'real' hardware hacker, just a dilettante).

In support of this position, I have just spent a happy half hour
grepping my kernel sources (Ultrix 1.2 with Decnet support).  A few
facts emerged:  there were 365 occurrences of 'a->b->c'
constructions.  Most of these were in the sockets and inet code (and
in the decnet code too, interestingly enough), and the remainder
were scattered around with clustering in the vm code and the bus
autoconfiguration code.  There were 5 occurrences of 'a->b->c->d',
which I consider rather herculean, and 0 occurrences of
'a->b->c->d->e', which came as no surprise (to me).  Auvax spends
from 25 to 50 percent of its time executing the kernel, since our
installation workload emphasizes character pushing (we drive lots of
laser printers, etc.) and also database applications.

My conclusion from this evidence is that deep levels of indirection
a la a->b->c... are a lightly used feature.  I welcome constructive
discussion.
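
A count like the one above can be approximated with a few lines of C.
The sketch below is not the grep pattern actually used; the function
name max_arrow_chain is hypothetical, and the scan is purely textual
(it ignores comments, strings, spaces around '->', and subscripts or
calls in the middle of a chain), so its numbers are only rough.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* True for characters that may appear in a C identifier. */
static int is_ident_char(int c)
{
    return isalnum(c) || c == '_';
}

/* Return the largest number of "->" operators that join plain
 * identifiers back to back on one line: 2 for "a->b->c", 3 for
 * "a->b->c->d", 0 for a line with no such chain. */
int max_arrow_chain(const char *line)
{
    int best = 0;
    size_t i = 0;

    assert(line != NULL);
    while (line[i] != '\0') {
        if (!is_ident_char((unsigned char)line[i])) {
            i++;
            continue;
        }
        /* Skip over the identifier that starts a possible chain. */
        while (is_ident_char((unsigned char)line[i]))
            i++;
        /* Extend the chain for every "->identifier" that follows. */
        int depth = 0;
        while (line[i] == '-' && line[i + 1] == '>' &&
               is_ident_char((unsigned char)line[i + 2])) {
            i += 2;
            while (is_ident_char((unsigned char)line[i]))
                i++;
            depth++;
        }
        if (depth > best)
            best = depth;
    }
    return best;
}
```

Calling this on each line read with fgets() and tallying the return
values gives a histogram of chain depths for a source tree.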

By the way :-) the H6050, a real beater of a 1960's-architecture
machine, could do the original example in at most two instructions (if
the compiler was _incredibly_ clever and we make an assumption or two
about where operands are; I would have to think carefully for about
20 minutes to write the equivalent assembly.  I think it would come
out to something like 'ldx0 2,du; stxl0 fake,i' where fake is the
beginning of a long descriptor chain dragging in a, b, c, establishing
the offset of d, etc., etc.  It's been years.)

...!ihnp4!alberta!auvax!rwa	Ross Alexander, Athabasca University

pec@necntc.NEC.COM (Paul Cohen) (05/18/87)

I earlier posted the telephone numbers for obtaining documentation on
the V60 or V70 (or any other NEC components):

		1-800-NEC-ELEC (California)
		1-800-NEC-ELE1 (During California working hours)
		0049-211-6503-333 (Europe)

Apparently, there are problems with dialing the 800 numbers from places
like Canada and Mexico, so I would like to add another number to this 
list:

		1-415-960-6000 ext. 6158
		
Another possibility is to write to 

		NEC Electronics
		401 Ellis Street
		P.O. Box 7241
		Mountain View, CA
			94039

daniels@cae780.UUCP (05/20/87)

In article <166@auvax.UUCP> rwa@auvax.UUCP (Ross Alexander) writes:
 [ discussion about the frequency of a->b->c->d ]
>365 occurrences of 'a->b->c'
>5 occurrences of 'a->b->c->d'
>0 occurrences of 'a->b->c->d->e'
>My conclusion from this evidence is that deep levels of indirection
>a la a->b->c... are a lightly used feature.

One thing that is being overlooked here is the layers of effective "p->f"
which are not in the source code.  Typically each procedure has a "frame 
pointer" which points to its argument and local variables area.  This means
that code like: 
	prog(ptr) struct some_type *ptr; 
	{
	 ptr->field->link_field = 0;
	}
translates to:
	 frame_ptr->arg_offset_ptr->field->link_field = 0;
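
The extra layer can be made visible in C itself.  This is only a
sketch, assuming a machine that keeps the argument in a memory frame;
struct prog_frame and prog_via_frame are hypothetical names, and a
real compiler's frame layout will differ.

```c
#include <assert.h>
#include <stddef.h>

struct some_type {
    struct some_type *field;
    struct some_type *link_field;
};

/* Hypothetical picture of prog's stack frame: one argument slot. */
struct prog_frame {
    struct some_type *ptr;
};

/* What the source code says: two visible levels of "->". */
void prog(struct some_type *ptr)
{
    ptr->field->link_field = 0;
}

/* What the machine does if the argument lives in the frame:
 * the frame pointer adds a third level in front of the other two. */
void prog_via_frame(struct prog_frame *frame_ptr)
{
    assert(frame_ptr != NULL);
    struct some_type *p = frame_ptr->ptr;   /* load argument from frame */
    struct some_type *f = p->field;         /* load ptr->field */
    f->link_field = 0;                      /* store to link_field */
}
```

Both functions have the same effect; the second just writes out the
load that the first leaves to the calling convention.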

The problem is worse when you consider languages like Algol and Pascal
which provide definitions of functions inside functions and allow up-level
access of the containing function's variables.  For example:
	function some_function( ptr : Pascal_pointer_type ): integer;
	   function inner_function: integer;
	   begin
	    with ptr^ do inner_function := 12 + numeric_field
	   end;
	begin
	 some_function := Another_function( inner_function )
	end;

The code for inner_function is effectively:
	   return 12 + ptr^.numeric_field 
	== return 12 + ptr->numeric_field 
	== return 12 + inner_function_frame->parent_ptr->arg_offset_ptr->numeric_field 

So the multi-level structure offset is not quite so weird as it seems
at first glimmer (the compiler is generating some extra layers).
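
The same chain can be written out in C by modelling the frames as
structs.  This is a sketch under assumptions: the struct names are
hypothetical, and it shows the static-link scheme only, which is one
of several ways a compiler may implement up-level access.

```c
#include <assert.h>
#include <stddef.h>

struct record {
    int numeric_field;
};

/* Frame of some_function: holds its argument ptr. */
struct outer_frame {
    struct record *ptr;
};

/* Frame of inner_function: holds a link to the enclosing frame. */
struct inner_frame {
    struct outer_frame *parent_ptr;
};

/* Body of inner_function with every hidden level spelled out:
 * own frame -> parent frame -> ptr -> numeric_field. */
int inner_function(const struct inner_frame *frame)
{
    assert(frame != NULL);
    return 12 + frame->parent_ptr->ptr->numeric_field;
}
```

One source-level dereference (ptr^) has become three memory
references by the time the up-level access is resolved.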

FROM:   Scott Daniels, Tektronix CAE
	5302 Betsy Ross Drive, Santa Clara, CA  95054
UUCP:   tektronix!teklds!cae780!daniels
	{ihnp4, decvax!decwrl}!amdcad!cae780!daniels 
        {nsc, hplabs, resonex, qubix, leadsv}!cae780!daniels 

earl@mips.UUCP (05/20/87)

I agree with Scott Daniels and Ross Alexander that a->b->c and such
are definitely not silly examples.  I write such constructs
frequently.  But that does not necessarily mean it is a good idea to
add an instruction to implement them.  Perhaps someone with a data
sheet can post the cycle count for these instructions so we can
compare.

An R2000 will do a load of or a store to a->b->c in 2 - 4 cycles
depending on how well the load delays are scheduled (we typically
schedule 75% of these so say 2.5 cycles).  a->b->c->d in 3 - 6 (3.75).
I'm assuming a is in a register, which with the MIPS compiler is a
fairly safe assumption.
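
One reading of these figures, offered as an assumption rather than as
the R2000's documented timing: each memory operation in the chain
costs one issue cycle plus a one-cycle delay, and the delay is hidden
whenever the compiler finds something to schedule into it.  The
function name expected_cycles is hypothetical.

```c
#include <assert.h>

/* Expected cycles for `ops` dependent memory operations when each
 * costs one cycle plus a one-cycle delay, and a fraction `scheduled`
 * (0.0 to 1.0) of those delays is filled with useful work. */
double expected_cycles(int ops, double scheduled)
{
    assert(ops >= 0);
    assert(scheduled >= 0.0 && scheduled <= 1.0);
    return ops + ops * (1.0 - scheduled);
}
```

With a in a register, a->b->c is two operations and a->b->c->d is
three, so expected_cycles(2, 0.75) gives 2.5 and
expected_cycles(3, 0.75) gives 3.75, matching the numbers quoted; the
extremes 0.0 and 1.0 give the 4 and 2 (or 6 and 3) cycle bounds.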

The ability to schedule the load delays is an excellent reason NOT to
provide such an addressing mode.  If you implement the mode, you'll
just find your microcode waiting all the time.  If you generate
separate instructions and let the compiler schedule them, then most of
the time you won't wait at all.
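
As an illustration of what scheduling buys, here are two independent
pointer chains written in the order a scheduler might issue the loads.
This is a sketch only: struct node and sum_two_chains are hypothetical,
and a C compiler is free to reorder these statements itself, so the
source order shows the idea rather than forcing it.

```c
#include <assert.h>
#include <stddef.h>

struct node {
    struct node *next;
    int value;
};

/* Compute a->next->value + b->next->value.  Each chain is two
 * dependent loads; interleaving the two chains means the second
 * load of each chain is issued after the other chain's load, so
 * no load result is needed in the cycle right after it is issued. */
int sum_two_chains(const struct node *a, const struct node *b)
{
    assert(a != NULL && b != NULL);
    const struct node *an = a->next;  /* load 1                        */
    const struct node *bn = b->next;  /* load 2, fills load 1's delay  */
    int av = an->value;               /* load 3, an is ready by now    */
    int bv = bn->value;               /* load 4, bn is ready by now    */
    return av + bv;
}
```

A single double-indirect instruction has no such freedom: its second
memory access can only wait for its first.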

Note that I'm assuming that hardware can't take the output of the
cache, do an add to get the new address, perhaps translate it, and
feed it back to the cache in a single cycle.  If it took a single
cycle, I'd say the cycle time were artificially slow.  The R2000 takes
two cycles to do this, so loads have a delay of one cycle before the
result is usable.

johnw@astroatc.UUCP (05/26/87)

In article <3962@cae780.TEK.COM> daniels@cae780.UUCP (Scott Daniels) writes:
>In article <166@auvax.UUCP> rwa@auvax.UUCP (Ross Alexander) writes:
> [ discussion about the frequency of a->b->c->d ]
>One thing that is being overlooked here is the layers of effective "p->f"
>which are not in the source code.  Typically each procedure has a "frame 
>pointer" which points to its argument and local variables area.  This means
>that code like: 
>	prog(ptr) struct some_type *ptr; 
>	{
>	 ptr->field->link_field = 0;
>	}
>translates to:
>	 frame_ptr->arg_offset_ptr->field->link_field = 0;
>
>The problem is worse when you consider languages like Algol and Pascal
>which provide definitions of functions inside functions and allow up-level

These questions all depend heavily on the machine architecture
*AND* on the compiler implementation!

To generalize:

All compiler implementors will make a maximum effort to keep the 
frame pointer in a register (i.e., on anything short of a mem-mem or 
accumulator-only machine).

For languages like Pascal, it is COMMON to use "displays" which
means that a register is used for the `frame pointer' of each
statically nested function or procedure.  (Thus all accesses are
at worst: reg->variable (in C notation).)  For any language where
the compiler can COUNT the arguments [YES] or the local vars
[probably] (but NOT necessarily both) it is possible to reference
both from a single pointer-reg (this assumes a signed index).
If this doesn't work, then you can have 2 pointer-regs (one for
args, one for locals).
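
A display can be pictured in C as a small table of frame pointers,
one per static nesting level.  This is a sketch with hypothetical
names (display, struct frame, read_uplevel); in the scheme described
above each entry would sit in a register, not in a memory array.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_NESTING 8

/* A frame reduced to a few local-variable slots. */
struct frame {
    int locals[4];
};

/* display[k] points at the active frame of the procedure at static
 * nesting level k; it is updated on procedure entry and exit. */
struct frame *display[MAX_NESTING];

/* Any up-level variable is one indexed access away from its
 * display entry, however deeply the reader is nested. */
int read_uplevel(int level, int slot)
{
    assert(level >= 0 && level < MAX_NESTING);
    assert(display[level] != NULL);
    return display[level]->locals[slot];
}
```

Compared with following a chain of parent links, the cost of an
up-level access no longer grows with the nesting distance.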

This *SEEMS* to indicate that an addressing mode for 
"MEM [reg+const]" is worth having (hopefully with a one cycle 
operand fetch).  Of course one must PROVE this with simulation
(or some such) results!
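
In C terms, "MEM [reg+const]" is exactly what p->field asks for: a
base pointer plus a constant the compiler knows.  The sketch below
reuses the first fields of the struct from the original example;
store_third is a hypothetical name, and the offsets depend on the
sizes of unsigned and of pointers on the machine at hand.

```c
#include <assert.h>
#include <stddef.h>

struct sttyp {
    unsigned first, second, third;
    struct sttyp *fourth;
};

/* p->third = v, written as an explicit base-plus-constant store. */
void store_third(struct sttyp *p, unsigned v)
{
    assert(p != NULL);
    unsigned char *base = (unsigned char *)p;             /* the "reg"   */
    size_t disp = offsetof(struct sttyp, third);          /* the "const" */
    *(unsigned *)(base + disp) = v;                       /* MEM[reg+const] */
}
```

With a 4-byte unsigned the displacement of third is 8, the 0x8 that
appears in the V70 assembly; the 0xc for fourth additionally assumes
4-byte pointers.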

As for the multiple indirects and other addressing modes, there
appears to be little gain there, especially in light of
overlapping loads.  I vote for simpler and faster!

Also note that even the NEC chip had to generate multiple
instructions for multiple -> operators!

Another argument in favor of RISCy machines and GOOD code
schedulers:  we got a 2:1 improvement by "tuning" our code
scheduler....maybe I should say "fixing" instead of "tuning"
(No, both were run *WITHOUT* the optimizer!)


			John W

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 
Name:	John F. Wardale
UUCP:	... {seismo | harvard | ihnp4} !uwvax!astroatc!johnw
arpa:   astroatc!johnw@rsch.wisc.edu
snail:	5800 Cottage Gr. Rd. ;;; Madison WI 53716
audio:	608-221-9001 eXt 110

To err is human, to really foul up world news requires the net!