[comp.lang.c] machines with oddball char * formats

dave@murphy.UUCP (H. Munster) (11/13/86)

I can think of one, from my college days: the Univac 1100 mainframes.  I have
never seen an implementation of C on one of these, but I can make a few
guesses at what it would have to do for that architecture.

First of all, the machine uses a 36-bit word size.  Normally, addressing
is done in words; a pointer to a word is a 36-bit type (of which only 18
bits are normally used, but let's not worry about that).  However, to
address anything smaller than a word, a weird "byte pointer" format is
used; this format has an 18-bit word address field, a 3-bit field that
somehow indicates what part of the word to use (the relationship between
the value in this field and the indicated byte isn't clear to me at
all), a field that indicates the byte size, and a couple of other bits
whose functions I don't remember but which would probably not be needed. 
So, if short is 18 bits and char is 9 bits, you would have three formats
of pointers: ordinary word addresses for ints, floats, and other word-
aligned things, a byte pointer format for shorts, and another byte ptr
format for chars.  (They also have a 12-bit size, but I can't think of
any good use for it.)
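
[A minimal sketch of the idea, not the real Univac 1100 encoding, which
the post itself says is unclear.  It assumes memory is an array of
36-bit words held in uint64_t, and a hypothetical sub-word pointer made
of a word address, a field size, and a field index counted from the
left.  The names subword_ptr, load_subword and next_subword are invented
for illustration.]

```c
#include <stdint.h>

#define WORD_BITS 36

/* Hypothetical sub-word pointer; the field layout is an assumption. */
typedef struct {
    uint32_t word_addr;  /* 18-bit word address */
    unsigned size;       /* field width in bits: 9 for char, 18 for short */
    unsigned index;      /* which field in the word, counted from the left */
} subword_ptr;

/* Fetch the field that a sub-word pointer designates. */
static uint64_t load_subword(const uint64_t *mem, subword_ptr p)
{
    unsigned shift = WORD_BITS - p.size * (p.index + 1);
    uint64_t mask = (UINT64_C(1) << p.size) - 1;
    return (mem[p.word_addr] >> shift) & mask;
}

/* Step to the next field, moving on to the next word when this one is used up. */
static subword_ptr next_subword(subword_ptr p)
{
    if (++p.index == WORD_BITS / p.size) {
        p.index = 0;
        p.word_addr++;
    }
    return p;
}
```

[With size 9 this behaves like a char pointer and with size 18 like a
short pointer; an int pointer would just be word_addr on its own, which
is why converting between the three kinds takes real work.]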

A C compiler implemented in this way would have to do a lot of pointer
conversions!  Not only that, but it isn't even *possible* to cast a
short or char ptr into an int ptr if the address pointed to is not on
a word boundary.

I would like to hear from anyone who has used an actual C compiler on
one of these beasts to see if they work like what I described above.
---
It's been said by many a wise philosopher that when you die and your soul
goes to its final resting place, it has to make a connection in Atlanta.

Dave Cornutt, Gould Computer Systems, Ft. Lauderdale, FL
UUCP:  ...{sun,pur-ee,brl-bmd}!gould!dcornutt
 or ...!ucf-cs!novavax!houligan!dcornutt
ARPA: wait a minute, I've almost got it...

"The opinions expressed herein are not necessarily those of my employer,
not necessarily mine, and probably not necessary."

dlc@zog.cs.cmu.edu (Daryl Clevenger) (11/17/86)

Although I don't know the details, from what I have heard the DEC-20 suffers
from a similar problem.  Wordsize is 36 bits, I think all pointers are 18 bits,
and chars cannot be packed if you want to have a pointer to them.  Also, you can define
how many bits are in a byte (although this may not necessarily be a char).
There is a version of pcc that I have seen for the 20, but I have never used
it and I hear that it doesn't generate correct code anyhow.

billw@navajo.STANFORD.EDU (William E. Westfield) (11/17/86)

Yes, the DEC-20 and DEC-10 computers (the same processor, actually)
are word addressable machines, and pointers to bytes smaller than
36 bits have a special format (containing the position within a word,
the byte size, and the address of the word).  There are several
successful C compilers for the 20 better than the version of PCC
written for it at MIT.  There is one from New Mexico, and one from
Stanford.  A variety of byte sizes are used: chars are variously
7, 8 or 9 bits (7 allows efficient text packing, 5 chars/word.  8 is
what most people writing "portable" code assume a char has.  9 allows
structs to be copied using, say, cpystr, since it hits all the bits.)
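
[A small sketch of the 7-bit case, assuming a 36-bit word held in a
uint64_t with the five characters left-justified and the one leftover
bit at the right.  The names pack5 and unpack5 are made up for the
example.  With 8- or 9-bit chars only four fit in a word (36/8 = 4 with
4 bits left over, 36/9 = 4 exactly).]

```c
#include <stdint.h>

/* Pack five 7-bit characters into one 36-bit word (35 bits of text, 1 spare). */
static uint64_t pack5(const char c[5])
{
    uint64_t w = 0;
    for (int i = 0; i < 5; i++)
        w = (w << 7) | ((uint64_t)c[i] & 0x7F);
    return w << 1;                     /* leave the rightmost bit unused */
}

/* Extract character i from a packed word; i = 0 is the leftmost. */
static char unpack5(uint64_t w, int i)
{
    return (char)((w >> (1 + 7 * (4 - i))) & 0x7F);
}
```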

Personally, I feel that a major weakness of "C" as a "portable"
language is its assumption of byte addressability.

Here's an excerpt from the Stanford TOPS-20 C compiler (KCC) doc:

------
All pointers are a word long.  Char pointers (and someday short
pointers) are either local or one word global byte pointers to the
byte itself (LDB rather than ILDB pointers).  Pointers to word and
multi-word quantities are simply (global) machine addresses.

<Coercions>

(char *) of a word pointer (pointer to int, float, struct etc)
produces a byte pointer that points to the leftmost 9-bit byte that
would have occupied the word pointed to by the int pointer.  The
exception to this is that (char *) (int *) NULL remains zero.

(int *) of a char pointer produces an address that points to the word
occupied by the char that the char pointer points to.
Converting a char pointer into an int pointer is slightly slower than
the reverse transformation.  Again, coercing zero doesn't change it.
-----
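
[The coercion rules in the excerpt can be sketched as follows.  This is
an illustration only: the char pointer encoding here, (byte index + 1)
in the bits above an 18-bit word address, is a hypothetical stand-in
and not KCC's actual byte pointer format.  It is chosen so that zero is
never a valid non-null char pointer.]

```c
#include <stdint.h>

typedef uint64_t word_ptr;  /* plain word address; 0 is NULL */
typedef uint64_t char_ptr;  /* hypothetical: ((byte index + 1) << 18) | address */

#define ADDR_BITS 18        /* address width is an assumption for the sketch */
#define ADDR_MASK ((UINT64_C(1) << ADDR_BITS) - 1)

/* (char *) of a word pointer: the leftmost 9-bit byte of that word. */
static char_ptr to_char_ptr(word_ptr w)
{
    if (w == 0)
        return 0;                              /* NULL stays zero */
    return (UINT64_C(1) << ADDR_BITS) | (w & ADDR_MASK);
}

/* (int *) of a char pointer: the word that contains the byte. */
static word_ptr to_word_ptr(char_ptr c)
{
    return c & ADDR_MASK;                      /* byte position is dropped */
}

/* Advance a non-null char pointer by one 9-bit byte (four per word). */
static char_ptr char_ptr_inc(char_ptr c)
{
    uint64_t index = (c >> ADDR_BITS) - 1;     /* 0..3 */
    uint64_t addr = c & ADDR_MASK;
    if (++index == 4) {
        index = 0;
        addr++;
    }
    return ((index + 1) << ADDR_BITS) | addr;
}
```

[Note that a round trip through a word pointer loses the byte position:
to_char_ptr(to_word_ptr(p)) always lands on the first byte of the word.]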

BillW

stuart@bms-at.UUCP (Stuart D. Gathman) (11/18/86)

In article <177@houligan.UUCP>, dave@murphy.UUCP (H. Munster) writes:
> I can think of one, from my college days: the Univac 1100 mainframes.  I have
	[. . .]
> First of all, the machine uses a 36-bit word size.  Normally, addressing
> is done in words; a pointer to a word is a 36-bit type (of which only 18
> bits are normally used, but let's not worry about that).  However, to
> address anything smaller than a word, a weird "byte pointer" format is
> used; this format has an 18-bit word address field, a 3-bit field that

The DEC-10 (and DEC-20 I assume) also have this addressing format.  The
C compiler uses, as you expected, 9-bit chars, 18-bit shorts, and 36-bit
ints and longs.  All pointers are 36-bit (the word pointers use the
first 18 bits for specifying indirection and an index register), but
with three different formats.  Since instructions using index registers
(like accessing the stack and structure members through a pointer) require
copying the 18-bit part to a work area or register anyway and are so common,
you could probably reasonably make word pointers the size of shorts.  (But
this would of course break all the brain-damaged code that assumes
sizeof(int)==sizeof(int *)).

The actual C-implementation that I have seen, however, stores pointers
as follows:

	type		0 . . . . . . . 17 18 . . . . . . . 35
	word		18-bit address     0
	short		18-bit address	   0400000 or 0
	char		18-bit address	   0 through 0600000

This is so that brain damaged code that assumes sizeof(int)==sizeof(int *)
*and* doesn't bother to convert pointers doesn't break.  (Incrementing
a character pointer by sizeof(foo) and then passing without casting
to a function expecting (foo *) still works with this setup.)  The result
of this is that code must be generated to convert pointers every time
they are referenced (instead of just when converted via cast).
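
[One reading of the table above, sketched in C: treat the whole pointer
as a 36-bit number with the word address in the high half and the byte
position in the low half, in steps of 0200000 (a quarter word).  The
type ptr36 and the helper names are invented here, and the 0200000 step
is inferred from the "0 through 0600000" range, not stated in the post.]

```c
#include <stdint.h>

#define MASK36    ((UINT64_C(1) << 36) - 1)
#define CHAR_STEP UINT64_C(0200000)    /* one 9-bit byte = a quarter word */

typedef uint64_t ptr36;                /* 36-bit pointer held in 64 bits */

/* Word pointer: address in the high 18 bits, low half zero. */
static ptr36 make_ptr(uint32_t addr)
{
    return ((uint64_t)addr << 18) & MASK36;
}

/* Advance by n chars; a carry out of the low half bumps the word address. */
static ptr36 add_chars(ptr36 p, uint64_t n)
{
    return (p + n * CHAR_STEP) & MASK36;
}

static uint32_t word_addr(ptr36 p)
{
    return (uint32_t)(p >> 18);
}

static unsigned byte_in_word(ptr36 p)
{
    return (unsigned)((p & 0777777) / CHAR_STEP);
}
```

[Under this reading, adding sizeof(int) == 4 chars to a word-aligned
pointer gives exactly the next word pointer, which is why the uncast
pointer can still be handed to a function expecting (foo *).]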
-- 
Stuart D. Gathman	<..!seismo!{vrdxhq|dgis}!bms-at!stuart>

ggs@ulysses.UUCP (Griff Smith) (11/18/86)

> Although I don't know the details, from what I have heard the DEC-20 suffers
> from a similar problem.

Suffers is a strong word, and it wasn't a problem.  Remember, the machine
was around years before C was invented.  I think you are saying that any
machine that punishes violation of strict pointer typing is broken.

> Wordsize is 36 bits, I think all pointers are 18-bits
> and chars can not be packed to have a pointer to them.

If this is true, you are talking about a brain-damaged C compiler.   A
pointer can be 18 bits, but REAL pointers are 36 bits and they can
point to any arbitrary byte in a word.

Probably the most unfortunate feature of the pointer structure is that
it doesn't happen to fit accepted C coding idioms.  The hardware works
best if a byte copy loop is written as

	while (*++op = *++ip);

when we all know that the "correct" (i.e. VAX order) way to do it is

	while (*op++ = *ip++);
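
[For comparison, a sketch of both idioms as complete functions; the
names copy_pre and copy_post are made up for the example.  The
pre-increment form expects each pointer to start one element BEFORE the
data, so the caller here is assumed to leave a pad element at the front
of each buffer to keep the pointers inside the arrays.]

```c
/* Post-increment form: op and ip start AT the first element. */
static void copy_post(char *op, const char *ip)
{
    while ((*op++ = *ip++) != '\0')
        ;
}

/* Pre-increment form: op and ip start one element before the first one,
   matching hardware that increments the pointer and then accesses the byte. */
static void copy_pre(char *op, const char *ip)
{
    while ((*++op = *++ip) != '\0')
        ;
}
```

[Given char src[] = "?hello" and char dst[8], copy_pre(dst, src) skips
the pad character and leaves "hello" at dst + 1, while
copy_post(dst, src + 1) leaves it at dst.]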

What does all this prove about the "goodness" or "badness" of the
machine?  Nothing!  It was a clean, RISC-like architecture that was a
joy to program.  But people don't program in assembly much any more and
the idioms for the machine don't happen to map into ones that are
favored in current high-level languages.  The machine was a victim
of culture drift.

Perhaps we should be taking courses in the history of design, with proper
attention given to the cultural forces that lead to design decisions.
-- 

Griff Smith	AT&T (Bell Laboratories), Murray Hill
Phone:		(201) 582-7736
UUCP:		{allegra|ihnp4}!ulysses!ggs
Internet:	ggs@ulysses.uucp

garry@batcomputer.tn.cornell.edu (Garry Wiegand) (11/19/86)

In a recent article billw@navajo.STANFORD.EDU (William E. Westfield) wrote:
>....  A variety of byte sizes are used: chars are variously
>7, 8 or 9 bits (7 allows efficient text packing 5 chars/word.  8 is
>what most people writing "portable" code assume a char has.  9 allows
>structs to be copied using say, cpystr, since it hits all the bits.)
>
>Personally, I feel that a major weakness of "C" as a "portable"
>language is its assumption of byte addressability.
>...

Forgive my ignorance, but why don't the compiler writers on these "odd"
machines just designate a "char" and a "byte" to be the identical width
to a "short" ?   What will go wrong ?  

(Would very many real-life application programs actually be hurt by the 
added memory usage? - I'm excluding text editors!)

It seems so simple - give some memory, get a lot more speed.

garry wiegand   (garry%cadif-oak@cu-arpa.cs.cornell.edu)

billw@navajo.STANFORD.EDU (William E. Westfield) (11/20/86)

In article <1534@batcomputer.tn.cornell.edu>, garry@batcomputer writes:
> 
> Forgive my ignorance, but why don't the compiler writers on these "odd"
> machines just designate a "char" and a "byte" to be the identical width
> to a "short" ?   What will go wrong ?  
> 
> (Would very many real-life application programs actually be hurt by the 
> added memory usage? - I'm excluding text editors!)
> 
> It seems so simple - give some memory, get a lot more speed.
> 

This is a word addressable machine.  A short is 36 bits.  Giving up
some memory for the sake of speed is one thing, but you are talking
about wasting 77% of the memory.  Given paging, it probably wouldn't
even be faster.  For typical C code dealing with character strings or
arrays, the byte operations aren't that much slower than individual
memory moves, given cache, and the fact that the byte instructions
auto-increment, and memory instructions don't.
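
[The arithmetic behind that figure, as a tiny sketch; wasted_fraction
is a made-up helper.  One 8-bit char per 36-bit word wastes 28/36, or
about 77.8%; one 9-bit char wastes 27/36, exactly 75%.]

```c
/* Fraction of each word left unused when it holds a single char. */
static double wasted_fraction(int char_bits, int word_bits)
{
    return 1.0 - (double)char_bits / word_bits;
}
```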

BillW

guido@mcvax.uucp (Guido van Rossum) (11/20/86)

In article <1534@batcomputer.tn.cornell.edu> garry%cadif-oak@cu-arpa.cs.cornell.edu writes:
>Forgive my ignorance, but why don't the compiler writers on these "odd"
>machines just designate a "char" and a "byte" to be the identical width
>to a "short" ?   What will go wrong ?  
>
>(Would very many real-life application programs actually be hurt by the 
>added memory usage? - I'm excluding text editors!)

You'll have to exclude a lot more programs (think of {n,t}roff, and all
sorts of "compiler"-type programs like awk, dc and bc).

Another reason is compatibility with the rest of the world on such
systems.  Text files are usually written in a packed format (using 2
bytes per word if you have 16- or 18-bit words), so stdio or the
read/write system calls would have to do a lot of (un)packing.  String
parameters to the native operating system also have to be converted, of
course -- not an unsurmountable problem for the standard library, but a
real pain in some part of the body for system hackers.  And such system
hackers have always been a big part of the C community!  (Maybe that's
changing now; it certainly was true a few years ago when these compilers
were designed).

On the CDC Cyber I have used a few systems like this.  The BCPL compiler
did not waste an entire word (60 bits!) for a character, but rather
packed 7 ASCII characters in it, rather than 10 Display Code characters
as the standard convention on this machine.  Nice, until you start
reading binary files and occasionally have to extract strings from
them... The Algol-68 compiler, on the other hand, *did* waste a 60-bit
word for a character.  I had to do all my system hacking in assembler or
(gasp!) Fortran.

- - -

But then again, you mention "real-life applications".  I suppose this is
a fairly restricted class of programs, only applying to programs over
10,000 lines of source code, generally dealing with image processing or
statistics...

	Guido van Rossum, CWI, Amsterdam <guido@mcvax.uucp>

monk@uwmacc.UUCP (Chau Wang) (11/20/86)

    Can someone help me locate a linear programming package that is written in
C or Modula2?  Any hint/pointer will be greatly appreciated.

    Public domain or licensed package doesn't matter;  machine/os that it runs
on/under also is not a major concern.

    Thanks in advance.

Please respond to:

Voice:  (608)262-0475 (collect call welcome)
Arpa:   monk@unix.macc.wisc.edu
Bitnet: CWANG@WISCMACC