[comp.arch] Bad RISC

terry@wsccs.UUCP (terry) (02/28/88)

In article <179@wsccs.UUCP> I write:
>We have been able to port to all machines we attempted, except ...  SUN's new
>... RISC machine ...
>
>THE REASON:  Type-casting.  You can't.  Small programs seem to, but it doesn't
>work.  Bytes tend to be word aligned.  Other messy stuff.  It was not a
>pretty sight (site?).  I am sure there are other problems, but geez, this is
>demonstrably portable code.

I have since been asked to post a more complete description of the problem.

> Could you post a more complete description of the problem?
> -- 
>   Lawrence Crowl		716-275-9499	University of Rochester
>		      crowl@cs.rochester.edu	Computer Science Department
> ...!{allegra,decvax,rutgers}!rochester!crowl	Rochester, New York,  14627

	Ok, here goes.  I can't give an exact example, as I have signed a
non-disclosure agreement.  The RISC machine primarily attacked my boss,
not me, so I can't fake one up that I can guarantee to work.

	1) The processor thinks in words (bus-words) always;  this means if
	   you want to do something, even if it is manipulation of character
	   data, it is stored in words.  The problem comes in that the C
	   compiler on the machine did not make that distinction, and broke
	   on parameter passes to functions, where a single character element
	   is sign-extended to int to be passed.  This means that the top bits
	   were not cleared.  As this was the portable C compiler, I blame it
	   on the architecture not taking normal (read existing) operations
	   into account.  If the chip supported byte references, this would not
	   have happened.

	2) Implicit type casting, which tries to take advantage of the register
	   architecture of the machine it runs on does not always work as is
	   expected:

		float val = 314.15926535;
		float j;

		j = val % 75.0123;

	   or something similar using an integer-only operator does not
	   correctly truncate values past the decimal.

	3) Non-aligned var-args code not using the varargs definition, but
	   rather relying on the method a standard architecture machine has
	   of plunking arguments into a function (no, I do *NOT* refer herein
	   to the argument push order!) stack to be read by a function leaves
	   it indeterminate as to whether the function arguments are pointers
	   or what.  For instance

		int foo;

		foo = 'x';	/* possibly buggy implicit cast */

		printf( "the character %c has an ascii value of ", foo);
		printf( "%d decimal and %x hex", foo, foo);

	   apparently works, but if you write your own function output and call
	   it in place of printf, using the argc/argv style of parameter
	   passing, the processor apparently confuses the stack.  (Yes, I
	   _know_ function prototypes would probably alleviate this, but
	   until some kind soul rewrites all the C code out there, I will
	   almost always be referring to K&R C).

	4) Passing back pointers from functions whose type is resolved by the
	   linker is dangerous (I don't write code like this anyway, but if
	   you want an example try 'gets()'), and passing integer values back
	   to character assignment statements (getc()) is also unhappy.

	I realize that most of my examples are vague;  I don't keep broken
code around so I can pull it up as an example; I 'rm' it.  The only examples
of code that wouldn't port belong to another company, so I can't post them so
you can break it yourself.

	I also realize that even if you accept my examples at face value, they
appear to be flaws in the portable C compiler, some of which can be taken out
simply by adding a baroque code generator capable of producing the workarounds
for you... macros to replace the instructions which aren't there because it is
a RISC machine.  Such macros sort of defeat the purpose of having a RISC
machine... what's the use of having a 'more compact' instruction set if you
just have to generate more of them?

	While a new code generator is an agreed upon necessity for a new
processor, the compiler itself should not have to be changed... quads are
supposed to be quads.  If you have to change the compiler, you may avoid
changing your own code, but obviously you still have to change code (the
compiler) which is supposed to be portable anyway.

	I think the whole problem could have been avoided if SUN had seen
fit to provide byte/short manipulation instructions, even primitive ones,
in their chip.  If they're there, they should have provided docs on them to
the guys who had to write the code generator.  Either way, users are going
to pay the price in execution time if they have to have an additional layer
of instructions between their executable code and their machine.

	I can't really be a bigot about having to write extra code to make
a machine do something it apparently wasn't designed to do... after all, I
run MS-DOS on my Amiga and dBase III .cmd files on a VAX ...but I feel that
either their development tools need a rewrite (how are you going to fix the
assembler? :-) ) or the manner in which RISC design at Sun is approached
needs reconsideration.

My definition of portable:  Everything I write, everything my friends write, and
everything I run that was written by somebody else, should run with minor
changes.  These changes should only be necessary as symptoms of differences
in O/S implementation philosophy, and not as differences in basic structure.
The programmer should know about the hardware he is running on in order to
make appropriate design decisions as far as speed of operation, but he should
not be _required_ to know (brow-beaten into it).  A user should never know,
or even suspect.  If a company says they run UNIX V.3, I should be able to
take my UNIX V.3 software off my 3B2 (or Arete 1100, or NCR Tower, etc) and
compile it with the same flags and have it run exactly the same.

Other definition of portability:  If it ran on UNIX and you ported it to VMS
so that all you had to do was #define UNIX or VMS, it should be relatively
trivial to port it to other machines, even those running Atari DOS wedge on a
6502 :-).

| Terry Lambert           UUCP: ...!decvax!utah-cs!century!terry              |
| @ Century Software       or : ...utah-cs!uplherc!sp7040!obie!wsccs!terry    |
| SLC, Utah                                                                   |
|                   These opinions are not my company's, but if you find them |
|                   useful, send a $20.00 donation to Brisbane Australia...   |

cruff@scdpyr.UUCP (Craig Ruff) (03/03/88)

In article <216@wsccs.UUCP> terry@wsccs.UUCP (terry) writes:
 >>We have been able to port to all machines we attempted, except ...  SUN's new
 >>... RISC machine ...

 >	1) The processor thinks in words (bus-words) always;  this means if
 >	   you want to do something, even if it is manipulation of character
 >	   data, it is stored in words.  The problem comes in that the C
 >	   compiler on the machine did not make that distinction, and broke
 >	   on parameter passes to functions, where a single character element
 >	   is sign-extended to int to be passed.  This means that the top bits
 >	   were not cleared.  As this was the portable C compiler, I blame it
 >	   on the architecture not taking normal (read existing) operations
 >	   into account.  If the chip supported byte references, this would not
 >	   have happened.

Wrong.  According to "The SPARC Architecture Manual", the SPARC includes
load operations for both signed and unsigned bytes and words.  Store operations
are present for both bytes and words.  As far as C goes, both chars and
shorts are extended to ints before being used in arithmetic or logical
operations and when being passed to functions as parameters.  The choice
about signed/unsigned chars during extension is usually made by the compiler
writers, with large hints from the underlying machine.  If you are relying
on operations being done at byte or word sizes from C source code, you
are relying on a compiler dependent feature.  As far as data alignment
goes, the "Porting Software to SPARC Systems" manual states "all quantities
must be aligned on boundaries corresponding to their sizes".  This means
a char followed by an int will have 3 bytes padding between.  This is true
for many machines.
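
A minimal sketch of the alignment point, not taken from any of the posts;
the struct and function names are hypothetical.  With 4-byte ints and
natural alignment the padding comes out as 3 bytes, but other ABIs may
differ, so the code computes the figures rather than assuming them.

```c
#include <stddef.h>

/* Hypothetical record: a char followed by an int. */
struct char_then_int {
    char tag;
    int  value;
};

/* Bytes of padding the compiler inserted between tag and value. */
size_t padding_after_tag(void)
{
    return offsetof(struct char_then_int, value) - sizeof(char);
}

/* Nonzero if value sits on a boundary that is a multiple of its size. */
int value_is_size_aligned(void)
{
    return offsetof(struct char_then_int, value) % sizeof(int) == 0;
}
```

Code that copies such a struct to disk or over a wire byte for byte is
relying on this layout, which is one way "portable" code stops porting.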

 >	2) Implicit type casting, which tries to take advantage of the register
 >	   architecture of the machine it runs on does not always work as is
 >	   expected:
 >
 >		float val = 314.15926535;
 >		float j;
 >
 >		j = val % 75.0123;
 >
 >	   or something similar using an integer-only operator does not
 >	   correctly truncate values past the decimal.

All compilers I know of complain about non-integral operands for the %
operator.  K&R does not define a floating mod operator.
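
For illustration only (hypothetical function names, not from the thread):
if a remainder of floating values is wanted, either truncate explicitly and
use the integer %, or compute a floating remainder.  The second function
below does the latter by hand and assumes the quotient fits in a long;
fmod() in <math.h> is the library routine for the general case.

```c
/* Integer remainder after explicitly truncating both operands.
 * (int)y must not be zero. */
int truncated_mod(double x, double y)
{
    return (int)x % (int)y;
}

/* Floating remainder with the sign of x: subtract y times the
 * truncated quotient.  Assumes x / y fits in a long. */
double float_remainder(double x, double y)
{
    long quotient = (long)(x / y);   /* conversion truncates toward zero */
    return x - y * (double)quotient;
}
```

With the values from the quoted example, truncated_mod() yields 14 and
float_remainder() yields roughly 14.11.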

 >	3) Non-aligned var-args code not using the varargs definition, but
 >	   rather relying on the method a standard architecture machine has
 >	   of plunking arguments into a function (no, I do *NOT* refer herein
 >	   to the argument push order!) stack to be read by a function leaves
 >	   it indeterminate as to whether the function arguments are pointers
 >	   or what.  For instance
 >
 >		int foo;
 >
 >		foo = 'x';	/* possibly buggy implicit cast */
 >		printf( "the character %c has an ascii value of ", foo);
 >		printf( "%d decimal and %x hex", foo, foo);
 >
 >	   apparently works, but if you write your own function output and call
 >	   it in place of printf, using the argc/argv style of parameter
 >	   passing, the processor apparently confuses the stack.  (Yes, I
 >	   _know_ function prototypes would probably alleviate this, but
 >	   until some kind soul rewrites all the C code out there, I will
 >	   almost always be referring to K&R C).

I don't see what your example has to do with varargs.  When using varargs,
you must pay close attention to the manual.  When you pass values that
are widened (char -> short -> int, float -> double), you must use the widest
type in the va_arg call, then assign that to a variable of the appropriate
size.  You must also pay attention to signed/unsigned distinctions.  The
value passed to a function (i.e. pointer versus value) is always defined
for C.  Non-pointer values (including whole structures and unions) are passed
by value.  Pointers (variables and explicit addresses (&var)) are passed
by value, which happens to be an address.
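
A sketch of the widening rule described above, written with <stdarg.h>
rather than <varargs.h>; the function and its tag string are hypothetical.
The point is only that va_arg names the promoted type (int, double) and the
result is then assigned to the narrower variable.

```c
#include <stdarg.h>

/* Hypothetical helper: kinds is a string of 'c' (char) and 'f' (float)
 * tags, one per variadic argument.  Returns the sum of the arguments
 * and stops at the first tag it does not recognize. */
double sum_tagged(const char *kinds, ...)
{
    va_list ap;
    double total = 0.0;

    va_start(ap, kinds);
    for (; *kinds != '\0'; kinds++) {
        if (*kinds == 'c') {
            /* A char argument arrives widened to int. */
            char c = (char)va_arg(ap, int);
            total += c;
        } else if (*kinds == 'f') {
            /* A float argument arrives widened to double. */
            float f = (float)va_arg(ap, double);
            total += f;
        } else {
            break;
        }
    }
    va_end(ap);
    return total;
}
```

Writing va_arg(ap, char) or va_arg(ap, float) instead is undefined, even
though it may appear to work on some machines.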

 >	4) Passing back pointers from functions whose type is resolved by the
 >	   linker is dangerous (I don't write code like this anyway, but if
 >	   you want an example try 'gets()'), and passing integer values back
 >	   to character assignment statements (getc()) is also unhappy.

Agree.  This is guaranteed to cause problems.  The warning section of the
manual page for getc talks about this.
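
A minimal sketch of the getc point (the function name is hypothetical): the
return value is held in an int, so a data byte of 0xFF cannot compare equal
to EOF on a signed-char machine, and EOF is still detected on an
unsigned-char machine.

```c
#include <stdio.h>

/* Count bytes up to end-of-file.  The result of getc() is kept in an
 * int so that every byte value stays distinct from EOF. */
long count_bytes(FILE *fp)
{
    long n = 0;
    int c;                      /* int, not char */

    while ((c = getc(fp)) != EOF)
        n++;
    return n;
}
```

Declaring c as char would stop early at a 0xFF byte where char is signed,
and would never see EOF where char is unsigned.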

 >	I also realize that even if you accept my examples at face value, they
 >appear to be flaws in the portable C compiler, some of which can be taken out
 >simply by adding a baroque code generator capable of producing the workarounds
 >for you... macros to replace the instructions which aren't there because it is
 >a RISC machine.  Such macros sort of defeat the purpose of having a RISC
 >machine... what's the use of having a 'more compact' instruction set if you
 >just have to generate more of them?

None of your examples point to flaws in the portable C compiler.  I gather
the macros you are referring to are the assembler macros that map common
operations into actual machine instructions.  Since this mapping is mostly
one-to-one, these are equivalent, but just easier for humans (including
compiler writers) to understand.  RISC does not imply more compact instructions,
just regular instructions.

 >	I can't really be a bigot about having to write extra code to make
 >a machine do something it apparently wasn't designed to do... after all, I
 >run MS-DOS on my Amiga and dBase III .cmd files on a VAX ...but I feel that
 >either their development tools need a rewrite (how are you going to fix the
 >assembler? :-) ) or the manner in which RISC design at Sun is approached
 >needs reconsideration.

You appear to be way off base here.

 >My definition of portable:  Everything I write, everything my friends write, and
 >everything I run that was written by somebody else, should run with minor
 >changes.  These changes should only be necessary as symptoms of differences
 >in O/S implementation philosophy, and not as differences in basic structure.
 >The programmer should know about the hardware he is running on in order to
 >make appropriate design decisions as far as speed of operation, but he should
 >not be _required_ to know (brow-beaten into it).  A user should never know,
 >or even suspect.  If a company says they run UNIX V.3, I should be able to
 >take my UNIX V.3 software off my 3B2 (or Arete 1100, or NCR Tower, etc) and
 >compile it with the same flags and have it run exactly the same.

I agree, but this is the real world. :-)

 >Other definition of portability:  If it ran on UNIX and you ported it to VMS
 >so that all you had to do was #define UNIX or VMS, it should be relatively
 >trivial to port it to other machines, even those running Atari DOS wedge on a
 >6502 :-).

Not always.  Unix and VMS are more alike than different compared to many
micros and mainframe operating systems.
-- 
Craig Ruff      NCAR                         INTERNET: cruff@scdpyr.UCAR.EDU
(303) 497-1211  P.O. Box 3000                   CSNET: cruff@ncar.CSNET
		Boulder, CO  80307               UUCP: cruff@scdpyr.UUCP

chris@trantor.umd.edu (Chris Torek) (03/03/88)

[This is really a language argument, so I have redirected followups.]

In article <216@wsccs.UUCP> terry@wsccs.UUCP (terry) writes:
>... I can't give an exact example, as I have signed a
>non-disclosure agreement.

Perhaps, then, you really have found problems with the Sun 4 C compiler;
but by your examples, you have found only problems with your code:

>manipulation of character data ... broke on parameter passes to
>functions, where a single character element is sign-extended to
>int to be passed.  This means that the top bits were not cleared.

This is the way (more precisely, one of two ways, and the one that
is implemented on the Vax, at that) in which char parameters are
supposed to work.  If you mean that

	f(c) char c; { if (*&c != c) printf("oops\n"); }
	main() { char x = 'x'; f(x); }

prints `oops' (or crashes), then that would indeed be a bug.  I
suspect, though, that you mean

	f(c) int c; { if (*(char *)&c ... )

fails.  Good!
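
One reading of the second fragment, sketched with hypothetical function
names: taking the address of an int parameter and reading it as a char
picks up whichever byte is stored first, which depends on byte order.

```c
/* Portable: convert the value; byte order never enters into it. */
char char_by_value(int c)
{
    return (char)c;
}

/* Not portable: reads whichever byte of the int is stored first.
 * That is the low-order byte on a little-endian machine (VAX) and the
 * high-order byte on a big-endian one (SPARC, 68000). */
char char_by_address(int c)
{
    return *(char *)&c;
}
```

Called with 'x', the first function returns 'x' everywhere; the second
returns 'x' on a little-endian machine and 0 on a big-endian one.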

>float val = 314.15926535;
>float j;
>j = val % 75.0123;

This is utterly undefined.  `%' works only on integral types.

>3) Non-aligned var-args code not using the varargs definition,

HAS NEVER BEEN SUPPORTED IN C (well, since V7).

>... if you write your own function output and call
>it in place of printf, using the argc/argv style of parameter
>passing,

you goofed.  As Ron Natalie put it, the compiler is under no
obligation to use any specific method of parameter passing.  It is
free to stuff them in an envelope and mail them to the function.
The ONLY portable way to write a `printf'-like function is to use
<varargs.h> (or, as per the dpANS, <stdarg.h>).
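
A minimal sketch of a printf-like routine written the <stdarg.h> way.  The
name output and its buffer interface are assumptions for illustration;
vsnprintf is a later (C99) library function, used here only to keep the
example short.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical printf-like routine: formats into buf (at most size
 * bytes, always terminated when size > 0) and returns what vsnprintf
 * returns. */
int output(char *buf, size_t size, const char *fmt, ...)
{
    va_list ap;
    int n;

    va_start(ap, fmt);
    n = vsnprintf(buf, size, fmt, ap);
    va_end(ap);
    return n;
}
```

A call such as output(buf, sizeof buf, "%c is %d decimal and %x hex", foo,
foo, foo) then makes no assumption about where the compiler puts the
arguments.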

Let me guess:  You have never ported your code to the Pyramid
either.  Gosling Emacs used to have the same incorrect code;
when we got our Pyramid, I fixed the code, rather than (or perhaps,
I must admit, in addition to) complaining about the architecture.

>Passing back pointers from functions whose type is resolved by the
>linker is dangerous

Rather, it is nonsensical: the linker does not resolve types.
Failure to properly declare functions before calling them is
just that.  I myself have done it in the past; it is still wrong.
Even lint will tell you that.

>My definition of portable:  Everything I write, everything my friends write, and
>everything I run that was written by somebody else, should run with minor
>changes.

Your definition is remarkably convenient for you.  It matches that
of no one else I know.  According to various standards documents,
the definition of `portable' is `code which uses the functions
defined by the standard in accordance with the semantics specified
by the standard'.  That is not what you have just described.  The
semantics defined by K&R are *very* loose, making it remarkably
easy to write unportable code.

>Other definition of portability:  If it ran on UNIX and you ported it to VMS
>so that all you had to do was #define UNIX or VMS, it should be relatively
>trivial to port it to other machines, even those running Atari DOS wedge on a
>6502 :-).

Portability is not defined by example.  If you insist on picking two
architectures, however, and claiming that code that runs on both with
no changes is `portable', you should pick two sufficiently different
architectures.  I suggest, rather, three: (1) Data General MV series;
(2) IBM PC using mixed model; (3) [something with 64 bit `long's].
You may substitute a PR1ME for the D/G.  (Both use different formats
for `char *' vs. `int *' or `other *', and are good at shaking out
bugs with, e.g., functions called from qsort.)
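
A sketch of a qsort comparison function that survives machines where
pointer formats differ.  It is written in ANSI style with const void *
(in K&R C of the period the generic pointer would have been char *); the
names are hypothetical.

```c
#include <stdlib.h>

/* Comparison function with exactly the type qsort() expects.  The
 * generic pointers are converted back to int pointers inside the
 * function, never by casting the function pointer itself. */
static int compare_ints(const void *a, const void *b)
{
    int x = *(const int *)a;
    int y = *(const int *)b;

    return (x > y) - (x < y);   /* avoids overflow of x - y */
}

/* Sort n ints into ascending order. */
void sort_ints(int *v, size_t n)
{
    qsort(v, n, sizeof v[0], compare_ints);
}
```

Declaring the comparison function to take int * directly happens to work
where all pointers share one format, which is exactly the kind of bug the
machines listed above shake out.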
-- 
In-Real-Life: Chris Torek, Univ of MD Computer Science, +1 301 454 7163
(still on trantor.umd.edu because mimsy is not yet re-news-networked)
Domain: chris@mimsy.umd.edu		Path: ...!uunet!mimsy!chris

petolino%joe@Sun.COM (Joe Petolino) (03/03/88)

>>We have been able to port to all machines we attempted, except ...  SUN's new
>>... RISC machine ...
>
>I have since been asked to post a more complete description of the problem.
>	Ok, here goes.   . . .
>
>	1) The processor thinks in words (bus-words) always;  this means if
>	   you want to do something, even if it is manipulation of character
>	   data, it is stored in words.  The problem comes in that the C
>	   compiler on the machine did not make that distinction, and broke
>	   on parameter passes to functions, where a single character element
>	   is sign-extended to int to be passed.  This means that the top bits
>	   were not cleared.  As this was the portable C compiler, I blame it
>	   on the architecture not taking normal (read existing) operations
>	   into account.  If the chip supported byte references, this would not
>	   have happened.

The Sun4 processor chip in fact does fully support byte references to memory,
both signed and unsigned, with all the same addressing modes (and the same
execution speed) as word references.  I believe that C compilers always use
the signed variants.  The above problem description is not very complete, but
it sounds like someone is trying to pass a char into an int parameter and
expects to see the high-order bits cleared.  That doesn't even work on a VAX -
K&R specifically says that sign-extension is machine-dependent.  Who knows
what the problem is?  It's certainly not caused by a lack of byte reference
instructions in the architecture.

-Joe

peter@athena.mit.edu (Peter J Desnoyers) (03/03/88)

The main complaints we seem to have heard about the C compiler for the
SUN 4 are:

   1) it sign-extends characters passed as arguments. This is, I
believe, required by K&R, so it better do so. The problem that bites
everyone is whether the type being used is 'unsigned char' or signed
'char' - you don't always have a choice. Any code that relies on the
existence of unsigned/signed chars or longs, especially when it is
careless about specifying their signedness, is a definite lose on a
lot of machines. (I assume it would be a lose on many RISC's, as it
loses on an awful lot of CISC, byte-addressing machines.)

   2) it doesn't handle return values where the type is 'resolved' by
the linker. I have never worked with a C compiler, to my knowledge,
which runs any passes after the linker to incorporate these 'resolved'
types :-). I can see two problems here - (1) the compiler is paranoid
and wants type information to ensure that the value is aligned. Then
the compiler is not source-code compatible with the Sun 3 one, even
though it's the fault of Sun and not their Sparc chip. (2) you are
returning a non-aligned value at runtime - then your code is WRONG,
and shouldn't have worked on the Sun 3, either.

I think the problem here may be some people (like you said, your boss
had more problems than you did) who have spent most of their time
working on one machine, and have never seen how to fix a portability
problem like characters getting sign-extended. It is compounded by the
fact that Sun probably did not make their compiler source-code
compatible when, except for unportable code, they could have (at the
cost of more run-time crashes while debugging).  Take heart, though - at
least you got newer and faster machines in return for all this B.S.
Some people have had to go through this just for a new operating
system release.

				Peter Desnoyers

franka@mmintl.UUCP (Frank Adams) (03/05/88)

In article <216@wsccs.UUCP> terry@wsccs.UUCP (terry) writes:
>My definition of portable:  Everything I write ... should run with minor
>changes.  These changes should only be necessary as symptoms of differences
>in O/S implementation philosophy, and not as differences in basic structure.

There are two kinds of portability issues: those based on differences in the
operating system, and those based on hardware differences.  As the following
makes clear:

>Other definition of portability:  If it ran on UNIX and you ported it to VMS
>so that all you had to do was #define UNIX or VMS, it should be relatively
>trivial to port it to other machines.

Terry has, until now, dealt only with the operating system differences.
But hardware differences are very real, and wishing won't make
them go away.  Writing code which does not depend on the details of the
hardware (mostly) will.
-- 

Frank Adams                           ihnp4!philabs!pwa-b!mmintl!franka
Ashton-Tate          52 Oakland Ave North         E. Hartford, CT 06108

john@caeco.uucp (John Rigby) (03/05/88)

From article <216@wsccs.UUCP>, by terry@wsccs.UUCP (terry):
> In article <179@wsccs.UUCP> I write:
...
> 
> I have since been asked to post a more complete description of the problem.
> 
> 	Ok, here goes.  I can't give an exact example, as I have signed a
> non-disclosure agreement.  The RISC machine primarily attacked my boss,
> not me, so I can't fake one up that I can guarantee to work.
> 
> 	1) The processor thinks in words (bus-words) always;  this means if
> 	   you want to do something, even if it is manipulation of character
> 	   data, it is stored in words. 

Where do you come up with this nonsense?  The machine does support
byte references, and stores characters as bytes, not words.
Have you read the SUN-4 assembly manual, or tried looking at the
result of a cc -S on a sample C program?

>                                         The problem comes in that the C
> 	   compiler on the machine did not make that distinction, and broke
> 	   on parameter passes to functions, where a single character element
> 	   is sign-extended to int to be passed.  This means that the top bits
> 	   were not cleared.  As this was the portable C compiler, I blame it

For your information the SUN-3 sign extends characters passed as arguments
(As do most other 680XX compilers I am familiar with).
If you don't want to do this then pass them as unsigned char's.

> 
> 	2) Implicit type casting, which tries to take advantage of the register
> 	   architecture of the machine it runs on does not always work as is
> 	   expected:
> 
> 		float val = 314.15926535;
> 		float j;
> 
> 		j = val % 75.0123;
> 
> 	   or something similar using an integer-only operator does not
> 	   correctly truncate values past the decimal.

What is modulo supposed to mean on floating point values?

> 	3) Non-aligned var-args code not using the varargs definition, but
> 	   rather relying on the method a standard architecture machine has

By "standard" I suppose you mean a machine that your unportable code runs 
correctly on.

From the varargs man page:
	This set of macros provides a means of writing portable
	procedures that accept variable argument lists.  Routines
	having variable argument lists (such as printf(3S)) but do not
	use varargs are inherently nonportable, since different
	machines use different argument passing conventions.

> 	   of plunking arguments into a function (no, I do *NOT* refer herein
> 	   to the argument push order!) stack to be read by a function leaves
> 	   it indeterminate as to whether the function arguments are pointers
> 	   or what.  For instance
> 
> 		int foo;
> 
> 		foo = 'x';	/* possibly buggy implicit cast */
> 
> 		printf( "the character %c has an ascii value of ", foo);
> 		printf( "%d decimal and %x hex", foo, foo);
> 
> 	   apparently works, but if you write your own function output and call
> 	   it in place of printf, using the argc/argv style of parameter
> 	   passing, the processor apparently confuses the stack.  (Yes, I
> 	   _know_ function prototypes would probably alleviate this, but
> 	   until some kind soul rewrites all the C code out there, I will
> 	   almost always be referring to K&R C).

Don't confuse argc, argv with argc, firstarg, secondarg, thirdarg ...
and then setting an argv = & firstarg.  Yes, this works on some
machines, but that doesn't mean it's portable.

> 
> 	4) Passing back pointers from functions whose type is resolved by the
> 	   linker is dangerous (I don't write code like this anyway, but if
> 	   you want an example try 'gets()'), and passing integer values back
> 	   to character assignment statements (getc()) is also unhappy.

When you call anything that doesn't return integer then you had better
declare it before calling it.  In the case of gets() all you need to do
is include stdio.h.

> 
> 	I realize that most of my examples are vague;  I don't keep broken
> code around so I can pull it up as an example; I 'rm' it.  The only examples
> of code that wouldn't port belong to another company, so I can't post them so
> you can break it yourself.
> 
> 	I also realize that even if you accept my examples at face value, they
> appear to be flaws in the portable C compiler, some of which can be taken out

WRONG!! They are flaws in your code.

> My definition of portable:  Everything I write, everything my friends write, and
> everything I run that was written by somebody else, should run with minor
> changes.

Oh, I see, if you or your friends write it, it should run on any machine
no matter how brain damaged the code is.

>| Terry Lambert           UUCP: ...!decvax!utah-cs!century!terry              |
>| @ Century Software       or : ...utah-cs!uplherc!sp7040!obie!wsccs!terry    |
>| SLC, Utah                                                                   |

John Rigby
utah-cs!caeco!john

wcs@ho95e.ATT.COM (Bill.Stewart.<ho95c>) (03/05/88)

In article <3406@bloom-beacon.MIT.EDU> peter@athena.mit.edu (Peter J Desnoyers) writes:
:The main complaints we seem to have heard about the C compiler for the
:SUN 4 are:
:   1) it sign-extends characters passed as arguments. This is, I
:believe, required by K&R, so it better do so. The problem that bites
Gorf, no!  K&R says (see pp 40-42) that function arguments are
expressions, so they go through the normal type conversions
float->double, char->int, short->int.
On machines with signed characters, this does a sign-extend; on machines with
unsigned characters it doesn't.  I assume unsigned chars on a signed-char
machine get promoted to unsigned for passing.
While I think unsigned characters are better, K&R leaves this up to the
machine designer, and explicitly warns you to watch out for the difference.
So don't blame K&R for Sun's problems, but don't blame Sun if they've followed
the rules (as they seem to here).
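
A small sketch of the difference being warned about, with hypothetical
function names and assuming 8-bit chars: widening a plain char gives
different results under the two conventions, while masking gives the same
result under both.

```c
#include <limits.h>

/* Widening a plain char: sign-extends where char is signed, and
 * zero-extends where it is unsigned. */
int widen_plain(char c)
{
    return c;
}

/* Masking gives the same answer under either convention
 * (assuming 8-bit chars). */
int widen_masked(char c)
{
    return c & 0xFF;
}

/* Nonzero if plain char is signed with this compiler. */
int plain_char_is_signed(void)
{
    return CHAR_MIN < 0;
}
```

For a char holding the bit pattern 0xE9, widen_plain() gives -23 on a
signed-char machine and 233 on an unsigned-char one; widen_masked() gives
233 on both.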
-- 
#				Thanks;
# Bill Stewart, AT&T Bell Labs 2G218, Holmdel NJ 1-201-949-0705 ihnp4!ho95c!wcs