[net.arch] Boundary alignment: Ken Reek responds

kar@ritcv.UUCP (05/07/84)


	Having collected all of the replies to my posting, it is now time
for a reply.  First, a disclaimer -- I am not advocating that
reduced-instruction-set machines themselves go away.  Whether the concept
is good or not is a completely different debate.  It is the Pyramid's
boundary alignment requirement that I object to.

	I'll answer arguments in more or less the order in which I received
the articles here.

Henry Spencer (Univ of Toronto Zoology) writes, in part:

> Simple hardware wins on more counts than just making life easier for
> lazy engineers.  It is simpler and cheaper to build, ..., more reliable,
> easier to fix when it breaks, etc etc.

Yes, simple hardware is easier to design, easier to fix, and all of the rest.
But it is a lot less useful.  How simple should it be?  An abacus is pretty
simple, for example.

> (omitted from above) simpler for the software to run (comparing an 11/44 to
> the nightmare complexity of the VAX i/o structure, for example, tells you
> why 4.2BSD is so bloated)

You are confusing implementation with interface.  The functions you choose
to implement do not determine the interface between the hardware and the
software; consider the many different ways that different machines support
I/O.  On the Vax, it is the complexity of the interface that causes the
problems, not the underlying implementation.

> Don't forget that magic word "cheaper".  It has become fashionable
> to say "software costs totally dominate hardware costs", but most
> people forget to add "...unless you can't afford the hardware in the
> first place".  Hardware and software money don't usually come out of
> the same pot, and the people who make decisions about such things are
> not necessarily as enlightened as we are.

Then enlighten them!  If you buy a machine on which software development is
difficult only because that machine is cheaper, you're not making a very
good decision.  It's up to those who know about such things to educate those
who "make the decisions" about the false economy of choosing a machine based
only on price.

> And once again, don't forget the example of the VAX:  sure, it looks
> like a nice machine, but it's grossly overpriced for its market now.
> This is despite massive use of [semi-]custom ICs on the more recent
> VAXen -- and you would not believe what a disaster that is for
> production and maintenance!  (There is an awful lot to be said for
> using standard parts, which means restricting yourself to things that
> can be built economically with them.)

Are you suggesting that it would have been better to build Vaxen from
7400 series logic?  I think not.

> I have heard, from reliable sources, that if/when the successor to the VAX
> emerges, the biggest difference will be that it will be much simpler.

That's the interface again, not necessarily the implementation.

> If you can show me a way to eliminate alignment constraints without a
> speed penalty, WITHOUT adding large amounts of hardware (which I could
> use better to make the aligned version faster), I'd love to hear about
> it.  It's hard.

Page 200 of the Vax Hardware Handbook describes how it is done with a cache
on the 780.  The same can be (is!) done with data, using a larger cache to
compensate for the less sequential nature of data accesses.

> But actually, most of this is beside the [original] point.  We are not
> talking about some decision which makes life a lot harder for the poor
> software jockey.  We are talking about a decision which requires more
> memory to get equivalent performance.  There is a perfectly straight-
> forward hardware-vs-hardware tradeoff here:  is it cheaper to build
> a machine that doesn't care about alignment, or to just stack more
> memory on a machine that does care?  I would give long odds that the
> latter approach wins, even just on initial cost.  When you think about
> things like reliability and maintenance, it wins big.

Good point.  However, for programs that dynamically allocate space, the size
of the largest problem that can be handled is determined by how efficiently
you use that space.  For ANY given memory size, the program that more
efficiently uses the space can handle larger problems.

Of greater concern, though, is where all that data resides when the program
is not actually running.  Can you add additional disk space to hold all of
the wasted space in your data structures as cheaply as you can add main
memory?  I would be upset if I had to buy another drive because all of my
existing ones were full of data that was 25% wasted space.  Sure, you can
pack them on disk and unpack them when you read them, but you are then
trading away execution efficiency.
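
To put a number on it, here is a sketch -- the sizes are assumptions (a
hypothetical machine with 4-byte longs that must fall on 4-byte
boundaries), not any particular vendor's:

	/*
	 * On the assumed machine, 6 bytes of real data occupy 12:
	 * the compiler pads after "tag" so that "count" is aligned,
	 * and after "flag" so that arrays of these stay aligned.
	 * The padding follows the data onto disk if the structure
	 * is written out whole.
	 */
	struct record {
		char	tag;	/* offset 0; 3 pad bytes follow */
		long	count;	/* offset 4                     */
		char	flag;	/* offset 8; 3 pad bytes follow */
	};			/* sizeof(struct record) == 12  */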

> I agree that this doesn't help the poor people who have made a big
> investment in data structures that assume no alignment constraints.
> These people have made a mistake, period:  they have imbedded a major
> machine-dependent assumption in software that obviously should have
> been portable.  

This is my whole point -- alignment should NOT be machine dependent.


This from Spencer W. Thomas:

> A current trend in computer design is to assume that the user will only
> be writing in a high-level language, and that the compiler will do the
> hard work of generating machine code.  This is the theory behind RISC
> machines, in particular.  Making the hardware simpler makes it run faster.

Note: RISC machines simplify the interface to the machine, the machine
language.  The point of this is to simplify the generation of optimal code.
The speed of the machine is determined by the implementation, not the
interface.

> Once we start getting really convoluted machines (such as ELI, or some
> pipelined machines which execute several instructions after a
> conditional branch, before actually branching), all your clever hacks
> based on assumptions about the hardware will just go straight down the
> tubes.  If the compiler were smart enough, it would say "Oh, he's trying to
> access a longword, but it's not on the right boundary", and generate a byte
> move instruction to align it before accessing.

Huh?  If the implementation is allowed to screw up the interface, then the
instructions won't be doing what you think they should (e.g. executing
several instructions after a conditional branch before actually branching).
To overcome this, the compiler would have to be pretty smart.

As for automatically checking whether a byte move is necessary, that is fine
for statically allocated structures.  For any structure accessed through a
pointer, however, a run-time check would be required.  Again, we're trading
hardware complexity for performance.  If you want blinding speed, do it in
hardware.
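
To see what the software alternative costs, here is a minimal sketch of
the check -- the function is hypothetical, and a 4-byte long is assumed:

	/*
	 * Hypothetical sketch: fetch a long through a pointer whose
	 * alignment is unknown until run time.  The test and the byte
	 * moves are the software price for what aligned-only hardware
	 * leaves out.
	 */
	long
	fetch_long(p)
	char *p;
	{
		long val;
		int i;

		if (((unsigned long) p & 3) == 0)
			return *(long *) p;	/* aligned: one fetch */
		for (i = 0; i < 4; i++)		/* unaligned: byte moves */
			((char *) &val)[i] = p[i];
		return val;
	}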

> The basic problem is that generality is slower.  For really blinding
> speed, you always have to give up something.  With big memories,
> arbitrary alignment is not too hard to give up.  (I bet that the
> original application never put longwords on odd byte boundaries, now did
> it?)

The original application DID have "longwords on odd byte boundaries" -- that's
what caused the whole discussion.  Given that and the discussion above, the
fastest solution is to have the hardware (not the software) take care of
non-aligned data.


From mprvaxa!tbray:

> His argument is that byte addressability is a win because of the
> greater ease in writing software and the high cost today of software
> vs hardware.

> Not so!  Because...

> 1. All significant machine code today is generated by compilers, not
>    by people, and the compilers do the messy work of aligning everything.

Only if you're willing to pay the price -- see the discussion of disk space
and maximum problem size above.

> 2. Removing the byte-addressability constraint allows the hardware boys
>    to build much more cost-effective architectures, and to build them
>    quicker.

It should be no surprise that less useful hardware is cheaper and faster to
build.

> 3. Point number 2 is vital since the line about the rising cost of 
>    software with respect to hardware is so much horse puckey.  All those
>    graphs of the future that showed a graph that looked like an X, the
>    rising line being software and the falling line hardware, never happened.

Let's look at the Vax again.  Compare the cost of developing the hardware
to the cost of all of the software that runs on it, and see the complaint
above about how the complexity of the Vax resulted in a "bloated" 4.2 Unix.

>    The reason being that the demand grows at a phenomenal rate and every
>    year software becomes more layered, more functional, and less hardware-
>    efficient (*see note).  Which is as it should be.  So quick, cheap 
>    architectures are IMPORTANT.

If you're talking about operating systems, then yes.  How many operating
systems do you know of, though, that provide application-specific
functionality?  Until that happens, complex application systems will remain
expensive to implement, especially PORTABLE ones.  Reducing needless
differences between machines will make this simpler.

> If somebody can build, say, a Ridge 32 and it runs a really nice UNIX (it
> doesn't yet) and goes like hell for < $50K (it does), I'll cheerfully
> grapple with nUxi and alignment problems in my existing software.

I wouldn't.

> As to the reduced machine efficiency of modern software, this was really
> brought home to me one time I was touring a DEC manufacturing plant, and
> there were these 6 huge backplane-wiring machines running flat out, all
> being driven by a little wee PDP-8 (!).  When I expressed surprise, the
> manager explained that the 8 could do it with room to spare because there
> was no operating system to get in the way...

So what?  This particular application didn't need any of the capabilities
provided by a modern operating system, such as multiple users, paging,
device-independent I/O, networking, etc.


And finally, from hhb!bob:

> Now let me flame at the folkz who felt compelled to tell me that we had
> written the code completely wrong.  These responses were just typical
> (and as I had expected) of UN*X snob types with little understanding of
> what it takes to develop major software systems.  With attitudes like
> that we ought to just throw most of UN*X out the window.  Do you have
> any idea how much effort we spent making the UN*X utilities work on a
> machine that did not have character pointers the same size as all other
> pointers ?  (This was for the word addressed machine I had previously
> mentioned).  It was months, and an extremely tedious job.  So obviously
> they wrote UN*X wrong PERIOD.

Right on.  Remember that, prior to 4.2, block addresses in inodes were stored
on the disk as 3 bytes.  Why?  To save space on the disk, that's why!  Of
course Unix is not "wrong PERIOD" -- it is a REAL software product, the
result of compromise and continual change.
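
Expanding those 3-byte addresses into longs in core took an explicit
conversion loop.  A sketch in the spirit of V7's l3tol() -- the byte
order shown is an assumption for illustration, not copied from the real
routine:

	/*
	 * Expand n 3-byte disk block addresses at cp into longs at lp.
	 */
	void
	l3_to_long(lp, cp, n)
	long *lp;
	char *cp;
	int n;
	{
		while (n--) {
			*lp++ = ((long) (cp[0] & 0377) << 16)
			      | ((long) (cp[1] & 0377) << 8)
			      |  (long) (cp[2] & 0377);
			cp += 3;
		}
	}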

In conclusion -- the fewer differences there are between machines, the
easier it will be to port software.  I do not mean to imply that all
machines should be identical; it still makes sense to have "small" machines
for small applications and large machines for large applications, or
machines with special quirks for quirky applications.  Within any given
group, however, non-essential differences should be eliminated from the
architectures!  Manufacturers that do this will be rewarded with increased
sales IF the software engineers educate those who hold the purse strings
about the economics of producing software.

	Ken Reek, Rochester Institute of Technology
	{allegra,seismo}!rochester!ritcv!kar

dmmartindale@watcgl.UUCP (Dave Martindale) (05/10/84)

To address your last point first, I do not see how you can agree that
UNIX was written "wrong" because parts of it assume that all pointers
will fit in an int, and then say that your software is right and the
hardware is wrong for not allowing non-aligned references.
If you believe that all hardware should allow non-aligned references
for the benefit of the software, why don't you also argue that all
hardware should use byte addressing, with all pointers the same length
and pointers the same size as ints?  These latter restrictions
are actually far more important to porting much code than having the
processor do unaligned fetches.
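
That assumption looks harmless in the source, which is exactly the
problem.  A hypothetical fragment -- the 16-bit-int, 32-bit-pointer
machine is assumed for illustration:

	/*
	 * On a machine with 16-bit ints and 32-bit pointers, this
	 * compiles without complaint and silently truncates the
	 * address.  Not from any real program.
	 */
	char
	fetch_via_int(p)
	char *p;
	{
		int i;

		i = (int) p;		/* loses the high-order bits    */
		return *(char *) i;	/* fetches from the wrong place */
	}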

Hardware design is a series of tradeoffs.  It would be nice to have
hardware that would accept data on arbitrary byte boundaries.  It would
be even nicer to extend that to arbitrary bit boundaries.  It would be
nice if no machine had an address shorter than 24 bits; 32 would be
much better.  It would be nice if all machines had floating-point
instruction times which are comparable to their integer instruction times.
It would be nice to have virtual memory capability on all machines (this
makes a much greater difference in "size of problem that can be handled"
than the ability to pack data to eliminate wasted space).

But all "it would be nice to have" features cost something - in speed,
cost, power, size of board, or somewhere.  Manufacturers will continue
to decide to include or exclude a feature based on these tradeoffs.
Computer users will continue to evaluate the machines produced in light
of their capabilities, restrictions, and the application they are intended
for.  Now, if you choose to ignore all machines which will not do unaligned
fetches, that is your right.  But please do not badmouth other people who
see other issues as more important to what they do.

And if you really want to call your software "portable", it should run
on machines which have different sizes of pointers, different byte
orders within the word, different native character sets, and even restrictions
on the alignment of data.  You make software which is portable by making
it independent of the details of the implementations of a wide variety
of machines, not by arguing that a certain class of those machines should
not be allowed to exist.
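
As one small example, you can define an external data format as a
sequence of bytes and assemble values explicitly, so that neither the
machine's byte order nor its alignment rules matter.  A sketch, assuming
only 8-bit bytes:

	/*
	 * Extract a 32-bit quantity from an external byte stream.
	 * The value is built up explicitly, so this works unchanged
	 * on machines of either byte order, aligned or not.
	 */
	long
	get32(cp)
	unsigned char *cp;
	{
		return ((long) cp[0] << 24) | ((long) cp[1] << 16)
		     | ((long) cp[2] << 8)  |  (long) cp[3];
	}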

	Dave Martindale

henry@utzoo.UUCP (Henry Spencer) (05/17/84)

Dave Martindale has addressed some of Ken Reek's comments; here's
some more rebuttal...

   > Simple hardware wins on more counts than just making life easier for
   > lazy engineers.  It is simpler and cheaper to build, ..., more reliable,
   > easier to fix when it breaks, etc etc.
   
   Yes, simple hardware is easier to design, easier to fix, and all of the rest.
   But it is a lot less useful.  How simple should it be?  An abacus is pretty
   simple, for example.

The whole point of the RISC notion is that the hardware can be made
dramatically simpler *without* losing anything important.  You haven't
demonstrated that "aligned" machines lose anything important -- the
inability to run unportable software is hardly significant, or the RISC
would be doomed by its inability to run VMS.

   > (omitted from above) simpler for the software to run (comparing an 11/44 to
   > the nightmare complexity of the VAX i/o structure, for example, tells you
   > why 4.2BSD is so bloated)
   
   You are confusing implementation with interface.  The functions you choose
   to implement do not determine the interface between the hardware and the
   software; consider the many different ways that different machines support
   I/O.  On the Vax, it is the complexity of the interface that causes the
   problems, not the underlying implementation.

Complexity of implementation usually rears its ugly head in the
interface as well.  Even things like caches and unaligned-operand
features do show up, if you think about it carefully.  Ask someone who's
written operating-system memory-management code what he thinks of
unaligned operands that can straddle page boundaries.  No, unaligned
operands are *not* free of software complexities, although the extra
complexity they introduce is at least localized.
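
Concretely, here is the test such code must make for every unaligned
operand -- a sketch only; the 512-byte page is the VAX's:

	/*
	 * Does an operand of "len" bytes at "addr" straddle a page
	 * boundary?  If so, one instruction's operand fetch may need
	 * two translations and can fault twice.
	 */
	#define	PGSIZE	512

	int
	straddles(addr, len)
	unsigned long addr;
	int len;
	{
		return (addr / PGSIZE) != ((addr + len - 1) / PGSIZE);
	}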

   > Don't forget that magic word "cheaper".  It has become fashionable
   > to say "software costs totally dominate hardware costs", but most
   > people forget to add "...unless you can't afford the hardware in the
   > first place".  Hardware and software money don't usually come out of
   > the same pot, and the people who make decisions about such things are
   > not necessarily as enlightened as we are.
   
   Then enlighten them!  If you buy a machine on which software development is
   difficult only because that machine is cheaper, you're not making a very
   good decision.  It's up to those who know about such things to educate those
   who "make the decisions" about the false economy of choosing a machine based
   only on price.

I don't know about you, but my software development would not be one cent
cheaper on an unaligned machine.  (My current machine, a PDP11, is aligned.)
Clean, portable software has no problem with such a machine.

   > And once again, don't forget the example of the VAX:  sure, it looks
   > like a nice machine, but it's grossly overpriced for its market now.
   > This is despite massive use of [semi-]custom ICs on the more recent
   > VAXen -- and you would not believe what a disaster that is for
   > production and maintenance!  (There is an awful lot to be said for
   > using standard parts, which means restricting yourself to things that
   > can be built economically with them.)
   
   Are you suggesting that it would have been better to build Vaxen from
   7400 series logic?  I think not.

Read that last parenthesized sentence again -- the point is not that
you should use standard parts even for jobs that they can't do, but that
you should restrict your jobs to things that standard parts *can* do.
It actually is possible to implement a VAX in 7400's -- what do you think
the 780 is made of? -- it's just hard and expensive.

   > I have heard, from reliable sources, that if/when the successor to the VAX
   > emerges, the biggest difference will be that it will be much simpler.
   
   That's the interface again, not necessarily the implementation.

See earlier for comments on the visibility of implementation complexities.

   > If you can show me a way to eliminate alignment constraints without a
   > speed penalty, WITHOUT adding large amounts of hardware (which I could
   > use better to make the aligned version faster), I'd love to hear about
   > it.  It's hard.
   
   Page 200 of the Vax Hardware Handbook describes how it is done with a cache
   on the 780.  The same can be (is!) done with data, using a larger cache to
   compensate for the less sequential nature of data accesses.

Please read your Vax Hardware Handbook more carefully.  The same is *not*
done with data, unless the data item fortuitously happens to fall within
an 8-byte-aligned doubleword.  A data item that straddles an 8-byte boundary
in memory will take two fetches, i.e. a speed penalty.  Nor is the presence
of the cache an "out":  caches cost hardware.  Lots.  The VAX has already
paid this particular price for other reasons -- stupid memory-system
interface design -- but this doesn't make the cache free.  Well-designed
machines in this speed range don't need caches at all.
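
The bookkeeping is simple enough to sketch -- this is the arithmetic,
not any real machine's data path:

	/*
	 * How many aligned 8-byte fetches does an operand of "len"
	 * bytes at "addr" require?  Two whenever it straddles an
	 * 8-byte boundary -- the speed penalty in question.
	 */
	int
	fetches_needed(addr, len)
	unsigned long addr;
	int len;
	{
		unsigned long first = addr & ~7L;
		unsigned long last = (addr + len - 1) & ~7L;

		return (int) ((last - first) >> 3) + 1;
	}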

   > But actually, most of this is beside the [original] point.  We are not
   > talking about some decision which makes life a lot harder for the poor
   > software jockey.  We are talking about a decision which requires more
   > memory to get equivalent performance.  There is a perfectly straight-
   > forward hardware-vs-hardware tradeoff here:  is it cheaper to build
   > a machine that doesn't care about alignment, or to just stack more
   > memory on a machine that does care?  I would give long odds that the
   > latter approach wins, even just on initial cost.  When you think about
   > things like reliability and maintenance, it wins big.
   
   Good point.  However, for programs that dynamically allocate space, the size
   of the largest problem that can be handled is determined by how efficiently
   you use that space.  For ANY given memory size, the program that more
   efficiently uses the space can handle larger problems.
   
   Of greater concern, though, is where all that data resides when the program
   is not actually running.  Can you add additional disk space to hold all of
   the wasted space in your data structures as cheaply as you can add main
   memory?  I would be upset if I had to buy another drive because all of my
   existing ones were full of data that was 25% wasted space.  Sure, you can
   pack them on disk and unpack them when you read them, but you are then
   trading away execution efficiency.

You have not refuted my point at all.  Granted that adding memory (be it
main memory or disk) costs money, it's still cheaper and simpler than
adding unaligned-operand hardware.

   > I agree that this doesn't help the poor people who have made a big
   > investment in data structures that assume no alignment constraints.
   > These people have made a mistake, period:  they have imbedded a major
   > machine-dependent assumption in software that obviously should have
   > been portable.  
   
   This is my whole point -- alignment should NOT be machine dependent.

There is an anecdote attributed to Abraham Lincoln.  He asked a riddle:
"If you call a dog's tail a leg, how many legs does a dog have?".  The
correct answer is:  "Four.  Calling the tail a leg doesn't make it one."
The fact is, alignment *IS* machine dependent, and all the wishing in
the world won't change it.  To quote Dave Martindale:

   .......................... You make software which is portable by making
   it independent of the details of the implementations of a wide variety
   of machines, not by arguing that a certain class of those machines should
   not be allowed to exist.

-- 
				Henry Spencer @ U of Toronto Zoology
				{allegra,ihnp4,linus,decvax}!utzoo!henry