[comp.arch] What *should* architectural pointers point at?

throopw@sheol.UUCP (Wayne Throop) (08/29/90)

> From: meissner@osf.org (Michael Meissner)
> IMHO, the 64 bit machine should represent all addresses in bits [...]
> Before people lynch me, let me explain, that I think that the
> addresses that are not appropriately aligned should trap.

Why should people lynch someone for that, when they don't lynch people
for advocating byte addresses that may or may not trap (or silently
fail) on unaligned operations?  (Or IS that a lynching offense? ...)

Surely bit-granular addressing is the only sensible way to go...  what
reasons are against it in a 64-bit world?
--
Wayne Throop <backbone>!mcnc!rti!sheol!throopw or sheol!throopw@rti.rti.org

mash@mips.COM (John Mashey) (08/29/90)

In article <0887@sheol.UUCP> throopw@sheol.UUCP (Wayne Throop) writes:

>Surely bit-granular addressing is the only sensible way to go...  what
>reasons are against it in a 64-bit world?

I doubt this is the only sensible way to go, but it does lead to an
interesting exercise.  First, let us observe that with 64-bit addressing,
one could afford to sacrifice 3 bits for this, which is probably not true in
the 32-bit world.  However, for concreteness, and ease of looking at it,
let's think about the 32-bit version.  So here's the exercise:

1) Start with any of the popular RISC architectures (or a synthetic,
like P&H's DLX).

2) Change the addressing so it uses bit-addressing.

3) (Architecture): what must you change?
	what instructions would be added or deleted?
	are there common code sequences that would change?
		for better, for worse.

4) Implementation: are there any interesting critical-path or area layout
issues that show up?

5) Software: which existing languages can make use of this feature, and how?
	Which languages can PORTABLY make use of this feature, i.e.,
	least-common-denominator theory of software causes most people
	to avoid unusual features unless there's a big benefit.
	Are there language extensions one would propose that can
	portably take advantage of such features?

------
Note: it would probably be good to assume that if an architecture
traps on unaligned accesses, it would continue to do so.
-- 
-john mashey	DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: 	 mash@mips.com OR {ames,decwrl,prls,pyramid}!mips!mash 
DDD:  	408-524-7015, 524-8253 or (main number) 408-720-1700
USPS: 	MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086

daveg@near.cs.caltech.edu (Dave Gillespie) (08/30/90)

>>>>> On 29 Aug 90 06:00:33 GMT, mash@mips.COM (John Mashey) said:
> In article <0887@sheol.UUCP> throopw@sheol.UUCP (Wayne Throop) writes:

>>Surely bit-granular addressing is the only sensible way to go...  what
>>reasons are against it in a 64-bit world?

> I doubt this is the only sensible way to go, but it does lead to an
> interesting exercise.

Okay...

You could provide two variants of the load/store instructions, one set
that trap on unaligned accesses and one set that don't (but are possibly
much slower).  This latter instruction replaces the specialized
"bit-field" instructions that some machines have now.  Compilers could
have an option to generate only the slow-but-safe instructions for the
benefit of fast-but-reckless programmers.  I've heard of at least one
architecture (SPARC?) that uses this approach.
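
For concreteness, here is a rough C sketch of what the slow-but-safe
variant (or a software-emulation handler) has to do anyway, assuming
32-bit words and little-endian, LSB-is-bit-0 numbering; the name and
types are mine, not from any real compiler or architecture:

    #include <stdint.h>

    /* Load 32 bits from an arbitrary bit address: at most two aligned
       loads and a shift.  Sketch only. */
    uint32_t load32_from_bitaddr(const uint32_t *mem, uint64_t bitaddr)
    {
        uint64_t word  = bitaddr >> 5;      /* which aligned 32-bit word */
        unsigned shift = bitaddr & 31;      /* bit offset within that word */
        uint64_t lo = mem[word];
        if (shift == 0)
            return (uint32_t)lo;            /* aligned case: one load */
        uint64_t hi = mem[word + 1];        /* field straddles a word: two loads */
        return (uint32_t)((lo >> shift) | (hi << (32 - shift)));
    }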

It seems to me that a bit-addressed, aligned-accessing machine can be
thought of as just like a byte-addressed machine with three zeros on
the right that would have been on the left.  (Once we are confident that
bit-addressing doesn't cost too much from this point of view, we can go
on to see what bit-addressing gains us beyond what we can do now.)

The only problem I can think of is that you lose bits in your
instruction word.  I cringe every time I look at the 68000's
short-branch instruction that uses an 8-bit displacement measured in
bytes; instructions must fall on even addresses, so one precious bit
of the displacement is completely wasted.  Now imagine a bit-addressed
68000: four bits are wasted!  Most newer machines (like the 88000)
don't have this problem; they shift the displacement enough bits to
the left so that all bits are useful.  You would probably want to
adopt this approach for all displacements.  Accessing an aligned
double-precision float with a 16-bit offset from the frame pointer
wastes three bits now, but with bit-addressing it would waste six
bits.  On a machine like the 88000, there is a huge extra cost if the
target doesn't lie within a 16-bit displacement of the index register,
and this cost would become much more significant if you didn't
auto-shift your displacements to avoid wasting those bits.

Certain programming tricks (putting an odd address in a register and
then using an odd displacement to access a word on an even address)
would be much harder to do with auto-shifted displacements, but I
don't think this would be a serious problem.

It's much easier to be little-endian than big-endian if you use bit
addresses.  In bit-addressing, little-endian translates to "LSB is bit
zero", and big-endian translates to "LSB is bit 31 (or 15, or 7, or
63, or whatever)".  Just look at the 68020's bit-field instructions:
Because the rest of the architecture is big-endian, they had to number
bits from the left in the bit-field instructions, even though the rest
of the architecture numbers bits from the right.  If anything,
switching to bit-addressing might put an end to one brand of flame war
on comp.arch. :-)
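
(To spell out the two numbering conventions in code -- a tiny
illustrative sketch, no particular machine implied:)

    /* "LSB is bit zero" vs. "LSB is bit w-1", for a w-bit word. */
    unsigned get_bit_le(unsigned long word, unsigned n)
    {
        return (word >> n) & 1;                 /* little-endian numbering */
    }

    unsigned get_bit_be(unsigned long word, unsigned n, unsigned w)
    {
        return (word >> (w - 1 - n)) & 1;       /* big-endian numbering */
    }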

The hardware would probably be about the same.  I hear really fast
machines are limited by the carry chain in the incrementer for the
program counter; this gets to be three bits shorter if the PC is a
bit address.  Those auto-shifted displacements might make your
addressing unit a bit more complicated, but most new machines already
have a shifter there to handle scaled indexing.

I doubt software could tell the difference unless it casts pointers
to ints and back.  If you are writing a compatibility-oriented compiler,
probably all your pointers will be multiples of eight.  So you could
cure even this by doing a three-bit shift when converting between
pointers and ints.  (Not that I'd advocate such a kludge for any
purpose other than to provide an all-the-world's-a-VAX mode for those
who absolutely must get software package X to compile effortlessly on
your box.)
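
(Spelled out, the kludge is just a three-bit shift at each cast -- a
sketch, not any real compiler's convention:)

    /* Pointers hold bit addresses internally; casts to and from
       integers are made to look byte-granular.  Sketch only. */
    long ptr_to_long(char *p) { return (long)p >> 3; }       /* bit address -> "byte" integer */
    char *long_to_ptr(long n) { return (char *)(n << 3); }   /* and back again */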

Let's see...  Operations on C character arrays now require shifts where
previously they didn't.  Many of these shifts can be factored into a
scaled-indexing addressing mode, but some will not be so amenable.
(For example, "p = p + i" or "i = p - q", with "int i; char *p, *q;".)
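
Roughly, the compiler would have to turn those into something like the
following (hand-written sketch; in practice the shift could often be
folded into a scaled addressing mode):

    /* "p = p + i" and "i = p - q" on a bit-addressed machine. */
    char *add_index(char *p, long i)
    {
        return (char *)((unsigned long)p + ((unsigned long)i << 3));  /* p + i */
    }

    long sub_ptrs(char *p, char *q)
    {
        /* the raw difference comes out in bits, so scale it back down */
        return (long)((unsigned long)p - (unsigned long)q) / 8;
    }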

Operations on BCD digit strings would be slightly cleaner.

Some Lisp implementations stuff tag bits into the low two bits of a
longword's byte address.  Now you could fit five tag bits down there.
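
For instance (a hypothetical layout, not any particular Lisp system):

    /* On a bit-addressed machine a longword-aligned object's address
       ends in five zeros, so five tag bits fit where two used to. */
    #define TAG_BITS 5
    #define TAG_MASK ((1UL << TAG_BITS) - 1)

    unsigned long make_tagged(void *obj, unsigned tag)  /* obj longword-aligned */
    {
        return (unsigned long)obj | (tag & TAG_MASK);
    }
    void     *untag(unsigned long t)  { return (void *)(t & ~TAG_MASK); }
    unsigned  tag_of(unsigned long t) { return (unsigned)(t & TAG_MASK); }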

Bit arrays are nicer to work with.  It may now be practical to pack
a Pascal "array of 0..3" into two-bit chunks, when formerly it might
have been inefficient to do so.  Features like subranges would
suddenly be much more popular among programmers.  Oh, no!  This would
give Ada an advantage over C.  :-)

Enough!  It's someone else's turn now.

								-- Dave
--
Dave Gillespie
  256-80 Caltech Pasadena CA USA 91125
  daveg@csvax.cs.caltech.edu, ...!cit-vax!daveg

cik@l.cc.purdue.edu (Herman Rubin) (08/30/90)

In article <DAVEG.90Aug29225127@near.cs.caltech.edu>, daveg@near.cs.caltech.edu (Dave Gillespie) writes:
> >>>>> On 29 Aug 90 06:00:33 GMT, mash@mips.COM (John Mashey) said:
> > In article <0887@sheol.UUCP> throopw@sheol.UUCP (Wayne Throop) writes:
> 
> >>Surely bit-granular addressing is the only sensible way to go...  what
> >>reasons are against it in a 64-bit world?
> 
> > I doubt this is the only sensible way to go, but it does lead to an
> > interesting exercise.
> 
> Okay...
> 
> You could provide two variants of the load/store instructions, one set
> that trap on unaligned accesses and one set that don't (but are possibly
> much slower).

Why much slower?  At most two items would have to be loaded, and a shift
made.

			....................

> The only problem I can think of is that you lose bits in your
> instruction word.  I cringe every time I look at the 68000's
> short-branch instruction that uses an 8-bit displacement measured in
> bytes; instructions must fall on even addresses, so one precious bit
> of the displacement is completely wasted.  Now imagine a bit-addressed
> 68000: four bits are wasted!

We are discussing machines where the memory is sufficiently large that
32-bit addressing is inadequate.  Four bits is 1/16 of an address.

> Most newer machines (like the 88000)
> don't have this problem; they shift the displacement enough bits to
> the left so that all bits are useful.  You would probably want to
> adopt this approach for all displacements.  Accessing an aligned
> double-precision float with a 16-bit offset from the frame pointer
> wastes three bits now, but with bit-addressing it would waste six
> bits.

There are quite a few machines now which have both addresses and indices.
For an index, the appropriate shifting is done.  This is not new hardware.
Again, we are discussing 64-bit address machines.  "Wasting" even 8 bits
in addressing is only 1/8 of the length of an address.
-- 
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907
Phone: (317)494-6054
hrubin@l.cc.purdue.edu (Internet, bitnet)	{purdue,pur-ee}!l.cc!cik(UUCP)

aglew@dwarfs.crhc.uiuc.edu (Andy Glew) (08/31/90)

At the cost of an extra bit, and decoding logic, you can encode
aligned data field size in the address as follows:

    abcdefghijkl0	=> 1-bit address   abcdefghijkl
    abcdefghijk01	=> 2-bit address   abcdefghijk0 to abcdefghijk1
    abcdefghij011	=> 4-bit address   abcdefghij00 to abcdefghij11
    abcdefghi0111	=> 8-bit address   abcdefghi000 to abcdefghi111
    abcdefgh01111	=> 16-bit address  abcdefgh0000 to abcdefgh1111
    abcdefg011111	=> 32-bit address  abcdefg00000 to abcdefg11111
    abcdef0111111	=> 64-bit address  abcdef000000 to abcdef111111
    abcde01111111	=> 128-bit address abcde0000000 to abcde1111111
    ...

You could, for example, use such an encoding in the literal index or
offset field of your instruction.  You lose the ability, e.g., to take
an odd address and add an odd offset to it, in an indexed addressing
mode, to get a properly aligned even address, but that's not too much
of a loss. Gould used this approach in the PN and NP computers.
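
In C terms the encode/decode might look roughly like this (my names,
and I am only guessing at details beyond what the table above shows):

    #include <stdint.h>

    /* Pack "aligned bit address + field size of 1<<k bits" into one
       word: k trailing ones, a zero, then the significant address bits. */
    uint64_t encode(uint64_t bitaddr, unsigned k)   /* bitaddr aligned to 1<<k bits */
    {
        return ((bitaddr >> k) << (k + 1)) | ((1ULL << k) - 1);
    }

    void decode(uint64_t enc, uint64_t *bitaddr, unsigned *k)
    {
        unsigned n = 0;
        while (enc & 1) { enc >>= 1; n++; }  /* count trailing ones = log2(size) */
        *k = n;
        *bitaddr = (enc >> 1) << n;          /* aligned start, as a bit address */
    }

For example, encode(0xA8, 3) marks an aligned 8-bit field starting at
bit address 0xA8, and decode recovers both the size and the address.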

The same approach can be used on a bus to get sub-field selection.
However, byte lane select flags cost only 7 bits (up to 128 bits), and
let you handle data that is misaligned wrt bus boundaries in fewer
operations.  (E.g., MIPS' load-left and load-right conceivably could be
done across a bus in this way).

--
Andy Glew, a-glew@uiuc.edu [get ph nameserver from uxc.cso.uiuc.edu:net/qi]

aglew@dwarfs.crhc.uiuc.edu (Andy Glew) (08/31/90)

>> You could provide two variants of the load/store instructions, one set
>> that trap on unaligned accesses and one set that don't (but are possibly
>> much slower).
>
>Why much slower?  At most two items would have to be loaded, and a shift
>made.

You don't think 2X or 3X is much slower? 
Then let me sell you a CISC system - it's only 2X slower than a RISC.
--
Andy Glew, a-glew@uiuc.edu [get ph nameserver from uxc.cso.uiuc.edu:net/qi]

tif@doorstop.austin.ibm.com (Paul Chamberlain/32767) (08/31/90)

In article <DAVEG.90Aug29225127@near.cs.caltech.edu> daveg@near.cs.caltech.edu (Dave Gillespie) writes:
>It seems to me that a bit-addressed, aligned-accessing machine can be
>thought of as just like a byte-addressed machine with three zeros on
>the right that would have been on the left.

Now there's an interesting thought.  Why not order these 64 bits so that
the 3 on the left are the bit offset.  Mere mortals would use it just like
a byte addressed machine but wizards could use those 3 bits anyway they like.

Hmm, I guess that could complicate addressing the n-th bit in the system.
You'd have to make an instruction which rearranges the bits in a pointer.

Paul Chamberlain | I do NOT represent IBM         tif@doorstop, sc30661@ausvm6
512/838-7008     | ...!cs.utexas.edu!ibmaus!auschs!doorstop.austin.ibm.com!tif

petolino@joe.Eng.Sun.COM (Joe Petolino) (08/31/90)

>>You could provide two variants of the load/store instructions, one set
>>that trap on unaligned accesses and one set that don't (but are possibly
>>much slower).

Although you don't say so, I assume that the slower variants would actually
give the right results.  Remember that there are architectures out there
that neither trap nor work correctly when presented with unaligned addresses.

>>               This latter instruction replaces the specialized
>>"bit-field" instructions that some machines have now.  Compilers could
>>have an option to generate only the slow-but-safe instructions for the
>>benefit of fast-but-reckless programmers.  I've heard of at least one
>>architecture (SPARC?) that uses this approach.

No, SPARC doesn't do this, and I'd be surprised if any other architecture
did.  See answer to next question.  However, I think there is a SPARC
compiler option to do unaligned loads and stores using software emulation.
This is of course very slow.

>Why much slower?  At most two items would have to be loaded, and a shift 
>made.

The main reason not to allow unaligned accesses is that supporting
unaligned accesses greatly increases the *complexity* of the memory system 
in a machine with virtual addressing and caches.  Speed is probably not an
issue - in the implementations I've seen (various IBM 370 implementations),
you only pay a speed penalty if you actually use an unaligned address.
If you did implement instructions that work with unaligned addresses, there
would be little reason to also implement the trap-on-unaligned variants -
the complexity has to be there anyway.  As an example of the kind of 
complexity that arises with support for unaligned accesses, imagine a
store that spans a virtual page boundary, and only one of the pages is
write-protected.

-Joe

harpermh@mentor.cc.purdue.edu (Matthew Harper) (08/31/90)

In article <3318@awdprime.UUCP> tif@reed.UUCP (Paul Chamberlain/32767) writes:
>In article <DAVEG.90Aug29225127@near.cs.caltech.edu> daveg@near.cs.caltech.edu (Dave Gillespie) writes:
>>It seems to me that a bit-addressed, aligned-accessing machine can be
>>thought of as just like a byte-addressed machine with three zeros on
>>the right that would have been on the left.
>
>Now there's an interesting thought.  Why not order these 64 bits so that
>the 3 on the left are the bit offset.  Mere mortals would use it just like
>a byte addressed machine but wizards could use those 3 bits anyway they like.
>
   This isn't a new idea - Texas Instruments' line of graphics coprocessor
   chips, the TMS34010 & TMS34020, can address bits with nearly every instruction...

   But, as noted previously, performance lags when access is unaligned ---
   even when running out of cache...

daveg@near.cs.caltech.edu (Dave Gillespie) (08/31/90)

>> The only problem I can think of is that you lose bits in your
>> instruction word.  I cringe every time I look at the 68000's
>> short-branch instruction that uses an 8-bit displacement measured in
>> bytes; instructions must fall on even addresses, so one precious bit
>> of the displacement is completely wasted.  Now imagine a bit-addressed
>> 68000: four bits are wasted!

Herman Rubin responds in <2491@l.cc.purdue.edu>:
> We are discussing machines where the memory is sufficiently large that
> 32-bit addressing is inadequate.  Four bits is 1/16 of an address.

That's not the point.  The 68000 puts a short branch in a 16-bit
instruction.  Eight bits of that instruction encode the displacement.
Using a naive encoding, you are wasting four of eight bits, *NOT*
four of 64.  The problem here is wasting bits in the instruction word,
which reflects directly on performance unless your instruction cache is
100% perfect.

>> Most newer machines (like the 88000)
>> don't have this problem; they shift the displacement enough bits to
>> the left so that all bits are useful.  You would probably want to
>> adopt this approach for all displacements.  Accessing an aligned
>> double-precision float with a 16-bit offset from the frame pointer
>> wastes three bits now, but with bit-addressing it would waste six
>> bits.

> There are quite a few machines now which have both addresses and indices.
> For an index, the appropriate shifting is done.  This is not new hardware.

I don't think you are using the terminology in quite the way I do, but,
yes, the hardware is already there, which is why I mentioned the idea
shortly after this paragraph.

> Again, we are discussing 64-bit address machines.  "Wasting" even 8 bits
> in addressing is only 1/8 of the length of an address.

Once again:  Many compilers for 68000-style machines require functions
to have no more than 32K bytes worth of local variables.  This is so
that they can use a fast 16-bit displacement from the frame pointer to
refer to those variables.  Without auto-scaling, those same 16-bit
displacements now allow only 32K *bits* of local variables.  So to do
the same job a 68000 does now would require a compiler to generate
longer displacements, i.e., larger instructions, which take more time
to fetch from memory, which in turn slows everything down.

Many new machines support auto-scaling of index registers, but none
that I have seen support auto-scaling of fixed displacements in the
instruction word.  I was trying to show why bit addressing would create
a much more compelling need for this feature.

Of course, few functions have more than 32K bits worth of locals, so
perhaps it's not that great a sacrifice.  But compilers that previously
could get away with imposing a hard limit to make their code generators
simpler would now pretty much have to be willing to generate longer
displacements when the need arises.  Many architectures, like the
68000 and the 88000, make full 32-bit displacements significantly slower
or more awkward to use.

								-- Dave
--
Dave Gillespie
  256-80 Caltech Pasadena CA USA 91125
  daveg@csvax.cs.caltech.edu, ...!cit-vax!daveg

daveg@near.cs.caltech.edu (Dave Gillespie) (08/31/90)

> = Joe Petolino, >> = Herman Rubin, >>> = Me

>>>You could provide two variants of the load/store instructions, one set
>>>that trap on unaligned accesses and one set that don't (but are possibly
>>>much slower).

> Although you don't say so, I assume that the slower variants would actually
> give the right results.  Remember that there are architectures out there
> that neither trap nor work correctly when presented with unaligned addresses.

Yes, I meant that one variety would trap, the other would do the extra
work to do an unaligned access correctly.

>>>               This latter instruction replaces the specialized
>>>"bit-field" instructions that some machines have now.  Compilers could
>>>have an option to generate only the slow-but-safe instructions for the
>>>benefit of fast-but-reckless programmers.

>>Why much slower?  At most two items would have to be loaded, and a shift 
>>made.

> The main reason not to allow unaligned accesses is that supporting
> unaligned accesses greatly increases the *complexity* of the memory system 
> in a machine with virtual addressing and caches.  Speed is probably not an
> issue - in the implementations I've seen (various IBM 370 implementations),
> you only pay a speed penalty if you actually use an unaligned address.
> If you did implement instructions that work with unaligned addresses, there
> would be little reason to also implement the trap-on-unaligned variants -
> the complexity has to be there anyway.  As an example of the kind of 
> complexity that arises with support for unaligned accesses, imagine a
> store that spans a virtual page boundary, and only one of the pages is
> write-protected.

I thought perhaps the extra hardware to handle unaligned bit addresses
might be unpleasant; you would have to shift, e.g., 64 possible ways,
so you would probably want to use the ALU's shifter for this rather than
a special shifter in the bus interface.  So I envisioned one set of
instructions that are slow because they need the ALU to be available,
and one that can use a maximally-simple bus interface directly and can
run simultaneously with ALU instructions.

Of course, a clever processor could have a single load instruction
that computed the address, tested if it was aligned, and, if not, only
then went on to requisition the ALU.  Then you would need only one
type of instruction, but it would make the hardware more complicated.
It would now be data-dependent whether a load instruction and an ALU
instruction could be allowed to run simultaneously.

Maybe barrel shifters are fast and easy enough these days that this
isn't a big deal.  But my guess is that it still is.

								-- Dave
--
Dave Gillespie
  256-80 Caltech Pasadena CA USA 91125
  daveg@csvax.cs.caltech.edu, ...!cit-vax!daveg

chip@tct.uucp (Chip Salzenberg) (08/31/90)

According to cik@l.cc.purdue.edu (Herman Rubin):
>We are discussing machines where the memory is sufficiently large that
>32-bit addressing is inadequate.

Not quite.  We are discussing machines where the *address space* is
too large for 32 bits.  For widely-used machines, physical memory will
usually be less than four gigabytes, and thus 32-bit addressable.

>Four bits is 1/16 of an address.

I would not so blithely throw away 15/16 of my address space.

>Again, we are discussing 64-bit address machines.  "Wasting" even 8 bits
>in addressing is only 1/8 of the length of an address.

I most certainly would not throw away 255/256 of my address space.
-- 
Chip Salzenberg at Teltronics/TCT     <chip@tct.uucp>, <uunet!pdn!tct!chip>

colin@array.UUCP (Colin Plumb) (08/31/90)

In article <3318@awdprime.UUCP> tif@reed.UUCP (Paul Chamberlain/32767) writes:
> Now there's an interesting thought.  Why not order these 64 bits so that
> the 3 on the left are the bit offset.  Mere mortals would use it just like
> a byte addressed machine but wizards could use those 3 bits anyway they like.

Because it's unnecessary.  Where is some code that depends on the fact that
incrementing a character pointer adds 1 to the value of its bit pattern, if
read as an integer?  I'm sure there is some, buried somewhere, but for the
most part, the only code that does such pointer diddling is in C, and C hides
the representation of pointers sufficiently, unless you really try.

If you're extra paranoid, define the cast from a pointer to a long and back
to do a shift by 3 bits; then you only have to worry about unions.
-- 
	-Colin

dave@fps.com (Dave Smith) (09/01/90)

In article <3318@awdprime.UUCP> tif@reed.UUCP (Paul Chamberlain/32767) writes:
>Now there's an interesting thought.  Why not order these 64 bits so that
>the 3 on the left are the bit offset.  Mere mortals would use it just like
>a byte addressed machine but wizards could use those 3 bits anyway they like.

That's a sick idea, right up there with the way that Intel arranged the bits
in the segment descriptors for the '286/'386.  You can't increment your
bit address and walk right through memory, you've got to do all kinds of
funky addressing.

This bit addressing stuff is all fine and well, but it seems like a lot of
extra baggage to carry around for a little bit-twiddling every now and then.
Why not just have operations that will return the nth bit?


--
David L. Smith
FPS Computing, San Diego        
ucsd!celerity!dave or dave@fps.com

throopw@sheol.UUCP (Wayne Throop) (09/02/90)

> From: mash@mips.COM (John Mashey)
> However, it is still clear that bit-addressing would break more programs
> than byte-addressing, so all I wanted was for someone to work through
> a complete example to show why it would be A Good Thing on balance.

Looking at it from another perspective, I think it certain that many,
many MORE programs (and mostly the same sloppy programs, BTW, IMHO) will
be broken by making pointers 64 bits long, whether or not integers are
also made 64 bits long.

Further, programs that count on pointer formats being byte granular are
already not maximally portable, since many machines are still word
addressed (though, granted, this is becoming quite rare).

This in mind, the backwards compatibility issue is a small one, IMHO.

Of course, the complete example showing the net effects is a good idea.


> From: daveg@near.cs.caltech.edu (Dave Gillespie)
> It's much easier to be little-endian than big-endian if you use bit
> addresses.  In bit-addressing, little-endian translates to "LSB is bit
> zero", and big-endian translates to "LSB is bit 31 (or 15, or 7, or
> 63, or whatever)".

Well, OK, but why go after the least significant bit anyways?  Isn't
the MSB more likely to be the one needed (at least, for many operations)?
Or, to put it another way, I think the situation is still balanced
endian-wise on a bit-addressed machine.  Some operations are easier on
little-endian machines, others on big-endian. 

Certainly as a practical matter, a bit-addressed machine can quite
easily be big-endian: the bit-addressed machine I'm most familiar
with was big-endian for example, and this didn't interfere with
the addressing granularity one whit.


> From: chip@tct.uucp (Chip Salzenberg)
> I would not so blithely throw away 15/16 of my address space.

Why not?  You are (or at least the industry at large is) willing
to throw away 3/4 of it so that addresses are byte-granular despite
the fact that so many loads/stores are 32-bit aligned.

What's that I hear from the cheap seats? It's important to be able to
reference characters, since so many things are text-oriented? Really? So
why isn't it important to indicate pixels, bitmaps, positions in Huffman
(and other bit granular) code strings, page table property bits, boolean
arrays, and any of the other zillions and zillions of
naturally-bit-granular things that abound in the software 
written every day?

I still claim that trading address space for finer granularity is
a Good Thing (even a Very Good Thing), IF we are talking about
at-least-64 bit addresses.  (I'd even argue it for 32 bit addresses,
but even I have to admit it's dodgier there.)

> From: daveg@near.cs.caltech.edu (Dave Gillespie)
> The 68000 puts a short branch in a 16-bit
> instruction.  Eight bits of that instruction encode the displacement.
> Using a naive encoding, you are wasting four of eight bits, *NOT*
> four of 64.

From the rest of Dave's posting, it's not clear to me that he's really
objecting to bit-granular addresses on the above grounds.  (In fact, it
seems not to be the case.)  But let me respond as to a hypothetical
person who *does* think this is an objection.  I hope Dave takes no
offense. 

The point is, why use a naive encoding? Presumably a RISC would insist
on word-of-some-size aligned istreams, so word-granular offsets in
instructions would remain, just as today.  Similarly for offsets and
partial addresses in instructions that manipulate other aligned things. 

Sure, let the people who want to operate on entities of bit granularity
pay for the privilege.  But there's no reason to require them to tie
their hands behind their backs and stand on their heads and spin
hula-hoops on their feet while they type with their noses either.
--
Wayne Throop <backbone>!mcnc!rti!sheol!throopw or sheol!throopw@rti.rti.org

daveg@near.cs.caltech.edu (Dave Gillespie) (09/02/90)

>>>>> On 2 Sep 90 03:27:13 GMT, throopw@sheol.UUCP (Wayne Throop) said:

> Looking at it from another perspective, I think it certain that many,
> many MORE programs (and mostly the same sloppy programs, BTW, IMHO) will
> be broken by making pointers 64 bits long, whether or not integers are
> also made 64 bits long.

Good point!

> From: daveg@near.cs.caltech.edu (Dave Gillespie)
>> The 68000 puts a short branch in a 16-bit
>> instruction.  Eight bits of that instruction encode the displacement.
>> Using a naive encoding, you are wasting four of eight bits, *NOT*
>> four of 64.

> From the rest of Dave's posting, it's not clear to me that he's really
> objecting to bit-granular addresses on the above grounds.  (In fact, it
> seems not to be the case.)  But let me respond as to a hypothetical
> person who *does* think this is an objection.  I hope Dave takes no
> offense. 

No offense taken---you got my point exactly.  Many existing
architectures put byte offsets into the instruction word even though
alignment constraints mean some of the bits of the offset will always
be zero.  Why not omit the zeros and put those bits on the left where
they can do some good!  If it's only one or two bits it's not such a
big deal, but a bit-addressed machine would have to take the wastage
of, say, 4-6 bits much more seriously.  Some machines do this now for
branch offsets, but I don't know of any that do it for all types of
offsets.
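
Concretely, the address-unit arithmetic for an auto-scaled displacement
is just the following (a sketch; the field widths are made up):

    #include <stdint.h>

    /* Effective bit address for an access to a (1 << log2size)-bit
       datum, with the 16-bit displacement stored pre-divided by the
       access size. */
    uint64_t effective_bitaddr(uint64_t base, int16_t disp, unsigned log2size)
    {
        return base + (int64_t)disp * ((int64_t)1 << log2size);
    }

A 16-bit field then still reaches +/-32K *items* of whatever size is
being accessed, rather than +/-32K bits.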

A couple other people have implied that 64-bit machines would have
64-bit instruction words, with plenty of bits to spare.  But I really
doubt this will happen:  The temptation to use that wide bus to fetch
two 32-bit instructions in one gulp is probably too great.  Even
32-bit instruction sizes are controversial.

> Sure, let the people who want to operate on entities of bit granularity
> pay for the privilege.  But there's no reason to require them to tie
> their hands behind their backs and stand on their heads and spin
> hula-hoops on their feet while they type with their noses either.

Um, er, my sentiments exactly...

								-- Dave
--
Dave Gillespie
  256-80 Caltech Pasadena CA USA 91125
  daveg@csvax.cs.caltech.edu, ...!cit-vax!daveg

cik@l.cc.purdue.edu (Herman Rubin) (09/02/90)

In article <11192@celit.fps.com>, dave@fps.com (Dave Smith) writes:
> In article <3318@awdprime.UUCP> tif@reed.UUCP (Paul Chamberlain/32767) writes:

			.....................

> This bit addressing stuff is all fine and well, but it seems like a lot of
> extra baggage to carry around for a little bit-twiddling every now and then.
> Why not just have operations that will return the nth bit?

But what if k bits, starting from the n-th bit, are wanted?  This is not
all that uncommon, and may be needed frequently, including on supercomputers.
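
(That is, given a bit address n and a width k, something with the
effect of the following, but in one instruction -- a C sketch, assuming
little-endian byte and bit order, with k no larger than 57 so the field
fits in one eight-byte window:)

    #include <stdint.h>
    #include <string.h>

    /* Fetch the k bits starting at bit address n, using ordinary
       byte-aligned loads, a shift, and a mask. */
    uint64_t extract_bits(const unsigned char *mem, uint64_t n, unsigned k)
    {
        uint64_t window;
        memcpy(&window, mem + n / 8, sizeof window);  /* 8-byte window covering the field */
        window >>= n % 8;                             /* discard bits below bit n */
        return window & ((1ULL << k) - 1);            /* keep the low k bits */
    }
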
-- 
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907
Phone: (317)494-6054
hrubin@l.cc.purdue.edu (Internet, bitnet)	{purdue,pur-ee}!l.cc!cik(UUCP)

dave@fps.com (Dave Smith) (09/05/90)

In article <2504@l.cc.purdue.edu> cik@l.cc.purdue.edu (Herman Rubin) writes:
>But what if k bits, starting from the n-th bit, are wanted?  This is not
>all that uncommon, and may be needed frequently, including supercomputers.

Shift to the left, shift to the right, do some ORs.  What you're asking
for makes the hardware more complex and therefore SLOWER!  I'm not a hardware
designer, but I listen to them whine and moan all the time about these kinds
of things, so let me put forth what I think would be needed to do that
(y'all can flame me if you like).

Ok, here's some ways to give k bits starting at the n-th bit:
a)	A bit-addressable memory, n (call it 32) bits wide.  So, it has
	a lot of 1 bit wide memory cells.  Any one of these bits can be
	bit 0 thru n on the output bus.  How do you hook the memories to
	the bus?  Each cell to each line of the bus?  Ick!  Lots of circuitry
	and lots of (overlapping!) traces on your board.  A cross-bar circuit
	would work but those are expensive.  We could load the memories
	into a barrel shifter and then shift and then do another load.  
b)	Do the work on the processor end.  This pretty much means you've gotta
	do shifts since your memory will be giving you word aligned data.
	No crossbars here.  You'll have to do two memory fetches on non-word
	aligned fetches.
c)	1 bit data path.  This is silly, but it does supply the wanted
	functionality.

Suppose you go ahead and do the circuitry for this.  This means this 
circuitry is in the critical path for data fetches, ALWAYS.  Even when
it's not used, it's another layer of circuitry.  When you're working with
faster machines, nano-seconds count.  Most of the time you'll be working
with 32-bit word, or even double-word aligned data.  So why pay the extra
penalty for this circuitry on _every_ data fetch?  All you're doing is
hiding the bus from the programmer.  Let the compiler do that for you
(sorry, Herman, I know how you feel about HLL's).

Byte addressing is bad enough, and a lot of people probably wouldn't support
it if it weren't for the fact that a lot of I/O devices require you to be
able to write only one byte to a certain address and nothing more.
Communications protocols that are byte-aligned make processing them go
slower, and don't even get me started on bit-stuffed formats (thank YOU, ISO,
for ASN.1!).
--
David L. Smith
FPS Computing, San Diego        
ucsd!celerity!dave or dave@fps.com

pauls@apple.com (Paul Sweazey) (09/06/90)

The whole issue of bit/byte/etc. address pointers reminds me a lot of the 
big-endian vs. little-endian wars.  Most (but not all) people argue for 
one or the other based on what they think is the "natural" structure.  The 
idea of bit-address pointers was brought up as an architectural issue 
because the bit is the only non-arbitrary basic quantum of data in binary 
logic.

This is a great academic exercise.  I would also add data types of any 
bit length.  For such a machine I want to know what kinds of hardware and 
software systems we might end up with.

It is obvious to most that today's computing systems would not get faster 
by having the hardware support bit-level addressing, much less an infinite 
number of data types.  But there are entire classes of systems that don't 
get explored (or even conceived of) because they don't map well into the 
architectural constraints of computing systems today.

I am mildly surprised that brainstorms about revolutionary architectures 
are rare in comp.arch.  We could be just a brainstorm away from the next 
killer-micro-wave.

Paul Sweazey
pauls@apple.com
408-974-0253

crowl@cs.rochester.edu (Lawrence Crowl) (09/07/90)

In article <26DE7EE3.58FC@tct.uucp> chip@tct.uucp (Chip Salzenberg) writes:
>We are discussing machines where the *address space* is too large for 32
>bits.  For widely-used machines, physical memory will usually be less than
>four gigabytes, and thus 32-bit addressable.

A bit granularity in virtual addresses does not imply a bit granularity in
physical addresses.  For instance, we could have a bit-grain 64-bit virtual
address and a word-grain 32-bit physical address.  Before anyone jumps on
me for such a bizarre notion, note that some microprocessors do not provide
the low bit (or two) of the physical address on the pinout.  (Although some
have byte select pins, from which you can infer the address.)
-- 
  Lawrence Crowl		716-275-9499	University of Rochester
		      crowl@cs.rochester.edu	Computer Science Department
	  ...!{ames,rutgers}!rochester!crowl	Rochester, New York,  14627

mshute@cs.man.ac.uk (Malcolm Shute) (09/07/90)

I've come in in the middle of this discussion, so forgive me for any
misconceptions.  Can I first check that I understand some of the background
to this correctly?  Is the following correct:

	1) People want to move on from 32-bit addressing
	   (either for genuine reasons of running out of address space
	   or 'fashion' ones of seeing 32-bit as passé).
	2) A single leap to 64-bit seems rather drastic,
	   but there are good engineering and fashion reasons
	   for not going to an intermediate non-power-of-two.
	3) Bit addressing would be one method of disguising the
	   introduction of a 58-bit machine as a power-of-two one
	   whilst at the same time moving some useful bit-twiddling
	   hardware into the data-fetch logic instead of in the
	   occasional ALU instruction logic.

> In article <3318@awdprime.UUCP> tif@reed.UUCP (Paul Chamberlain/32767) writes:
> > Now there's an interesting thought.  Why not order these 64 bits so that
> > the 3 on the left are the bit offset. [...]

In article <643@array.UUCP> colin@array.UUCP (Colin Plumb) writes:
> [It need not be axiomatic that
> incrementing a character pointer adds 1 to the value of its bit pattern, if
> read as an integer]

Moreover, can I comment on the obvious:
integer data are just special cases of fixed-point.
It is simple enough to think of this 64-bit machine as being
a fixed-point, word-addressed one (with 6 binary places of fraction
to indicate fractions of a word).  Let's use octal to represent
an example of such a quantity:

	12345671234567123456.71		(each digit represents 3 bits,
					except the most significant one).

To 'increment' this number correctly requires some knowledge of what we
think 'increment' means.  Thus we would add 1.00 to it in order to increment
to the same position within the next word, or add 0.10 to it in order to
increment to the same position within the next byte, or add 0.01 to it
to point to the next bit.
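
In C-ish terms the three increments are just three different constants
added to the same 64-bit quantity (types and names are mine, purely
for illustration):

    /* A "word address with 6 binary places of fraction" is just a bit
       address; the octal increments above become: */
    typedef unsigned long long bitptr;

    bitptr next_word(bitptr p) { return p + 0100; }  /* add 1.00 octal = 64 bits */
    bitptr next_byte(bitptr p) { return p + 010;  }  /* add 0.10 octal =  8 bits */
    bitptr next_bit (bitptr p) { return p + 01;   }  /* add 0.01 octal =  1 bit  */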

It is also necessary to allow for different data types being used throughout
the memory.  A table of string pointers might be 'compacted' to a fixed-point
format with only 3 binary places of fraction.  A jump table, or a block of
integer variables might be compacted to 0-binary-place fixed-point
(i.e. integer); the offset on a branch instruction would make sensible use
of this compaction too.

Lastly, it is necessary to determine how to represent the result of the
increment.  An increment to a 6-bp (binary-place) fixed-point pointer
would presumably need to deliver a 6-bp result to the address hardware.
An increment to a 0-bp number (an integer) would need to deliver a 0-bp
result.  And an increment to a 0 or 3-bp compacted pointer in a table
would need to return a 0, 3 or 6-bp result, depending on context.

So, our 'increment' operation needs to ask three questions:
	1) By how much should I increment?
	2) What is the data-type of the object I am incrementing?
	3) What is the data-type of the result I must deliver?

I don't think that I have said anything in this article which has not
already been stated or implied.  However, I wanted to spell it out on
paper, just to check in my mind that I understand thus far.
--

Malcolm SHUTE.         (The AM Mollusc:   v_@_ )        Disclaimer: all

throopw@sheol.UUCP (Wayne Throop) (09/10/90)

> From: pauls@apple.com (Paul Sweazey)
> It is obvious to most that todays computing systems would not get faster 
> by having the hardware support bit-level addressing, much less an infinite 
> number of data types.

I slightly disagree.  I think there is a largish class of things that
would get faster just by having the architectural address format be bit
granular, thus avoiding hauling around and paying for loading and storing
pointer-plus-bit-descriptor instead of just pointers.  Such as packed
subrange integer types or boolean types, as might be used in compression
schemes, in symbol tables, in graphics applications, in hardware descriptors,
and so on and on.

Note well, I'm not necessarily supposing that the memory system per se
needs to support bit granular access.  Just that the standard pointer
format ought to be able to indicate any bit in the virtual address
space.  Because otherwise, bit entities must live in a restricted part
of the address space, or it becomes much more costly than necessary to
represent bit granular or bit aligned entities.  I claim that such
entities are artificially less common than they "ought" to be, simply
because putting them into byte granular and byte aligned containers,
while wasting memory (sometimes a significant amount of memory),
is currently more convenient.
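
To make the contrast concrete (illustrative declarations only, not
taken from any particular system):

    /* Today: pointing at an arbitrary bit means hauling two things around. */
    struct bit_descriptor {
        unsigned char *byte;   /* container byte */
        unsigned       bit;    /* which bit within it, 0..7 */
    };

    /* With a bit-granular address format: one ordinary pointer-sized value. */
    typedef unsigned long bit_address;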

I therefore think there is a significant class of algorithms which would
indeed get faster... perhaps a couple of tens of percent faster, perhaps
even a couple of times faster.  And along with that is a significant class
of algorithms which would consume significantly less memory.
--
Wayne Throop <backbone>!mcnc!rti!sheol!throopw or sheol!throopw@rti.rti.org

ok@goanna.cs.rmit.oz.au (Richard A. O'Keefe) (09/10/90)

> From: pauls@apple.com (Paul Sweazey)
> It is obvious to most that today's computing systems would not get faster 
> by having the hardware support bit-level addressing, ...

I'll agree with that, but not for Sweazey's reason.
Computing systems wouldn't get faster if bit-level addressing were
supported because some of them (such as the NS32?32) *already* support
bit-level addressing.

The NS32?32 series has
	CBIT[I]	offset, base	(base).[offset:1] := 0
	IBIT    offset, base	(base).[offset:1] ^= 1
	SBIT[I]	offset, base	(base).[offset:1] := 1
	TBIT   	offset, base	(base).[offset:1] ^= 0
all of which first do		F := (base).[offset:1]
Here the base can be a register (in which case offset is 0..31) or
a memory address (in which case offset can be anything).  There is
also	CVTP	offset, base, dest	dest := (&base)*8 + offset
which converts a byte address and a bit offset to a bit address.
Thus:
	struct { int a:2, b:1; } x;
	int (*p):1 = &(x.b);
	*p = 1;
would turn into
	; x at -4(FP), p at -8(FP)
	CVTP	2, -4(FP), -8(FP)	; p := &(x.b)
	SBITD	-8(FP), @0		; *p := 1
	
I haven't an M68020 manual handy, but I believe the bit-field instructions
there could be used similarly.  (The 32?32 also has bit-field instructions.)

This might suggest a compromise position:  a machine with 64-bit data
paths and ALU, 32-bit *word* addresses, but load/store 1/8/16/32 instructions
which combine a word address with a 32-bit bit offset.  A 32-bit word
address would let you address 32Gb.

-- 
Psychiatry is not about cure.  Psychiatry is about power.