[net.arch] Pyramid architectural restraints

guy@rlgvax.UUCP (Guy Harris) (04/21/84)

> We currently have a pyramid 90x which I am evaluating in my copious spare
> time.  It's configured with 4m main, and ~900m disk.  We do not have
> data cache installed yet.  So far, a great deal of fun has been had, but
> the 90x faults on non-long-aligned 32-bit memory operations.  Any comments,
> questions, opinions, etc, would be of interest.  Especially regarding the
> above item.  Also, I am interested in exchanging benchmark stats (vs
> 780, 750).  Lastly, I am curious about up-and-coming architectures (esp.
> DEC) with the above constraint.

The Motorola M68000 CPU chips (MC68000 and MC68010, at least) fault on non-word-
aligned 16-bit or 32-bit memory operations.  The 360 would also fault on non-
long-aligned 32-bit operations or non-(half)word-aligned 16-bit operations;
the 370 (and 360/85) wouldn't, *but* would run slower if the operand wasn't
properly aligned.  I believe the same is true on the VAX; i.e., you can put
things off the right boundary, but you pay for it.  Since C puts things on the
right boundary, and since you're better off putting them there in a lot of
cases even on machines which permit you to do otherwise, I presume Pyramid
figured it wasn't worth the trouble to permit unaligned operands.  I suspect
it'll be a cold day in June before any VAXen refuse to support unaligned
operands (compatibility and all that), but I suspect any other machines
DEC is working on (like some RISC project supposedly being cooked up) may not
feel any obligation to support them.

I can't speak for most other VAX-class superminis (such as the Ridge 32), but
there *is* at least one other 32-bit supermini which requires proper alignment:
the CCI Power 6/32.  (It's not announced yet, but it's *very* fast.  Keep
your eyes peeled.  - unpaid commercial announcement)  When we ported UNET
to our 6/32, we did get bit by that one but it was simple to fix.  (The code
was using a "short *" to point into a buffer and either casting or assigning
that pointer to "long *" and pulling a 32-bit quantity out of that buffer.
It probably never failed on other machines because most of them are 16-bit
or 16/32 bit machines and only require 16-bit alignment, or are 32-bit
machines but don't require any alignment.)  I suspect we took the same tack;
namely, C puts things on the right boundary, and there's no guarantee that
unaligned operands work on all machines, so portable code won't assume that
they do - so there's no point in putting effort into supporting unaligned
operands.
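
For the curious, here is a sketch of the kind of code that bit us
(invented names; not the actual UNET source):

	/* "bp" walks a packet buffer in 16-bit steps, so it is only
	 * guaranteed 16-bit alignment; the cast below faults on the
	 * 6/32 (or the Pyramid) whenever bp isn't 32-bit aligned.
	 */
	long
	pull32(bp)
	short *bp;
	{
		return *(long *)bp;
	}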

Consider that lots of machines read the contents of a memory buffer register
onto some internal 32-bit data bus; if the operand is aligned, you need not
monkey with it, but if it's unaligned you have to fetch the next longword,
combine the two longwords, shift the result by the appropriate number of
bytes, and pull out the desired 32 bits to put onto the internal data
bus.  If you don't really need all this fanciness, why bother?
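
In C terms, the extra work amounts to something like this (a sketch
only; assumes big-endian byte order and 32-bit longwords, with "mem"
standing in for memory):

	/* Model an unaligned 32-bit read as two aligned longword
	 * fetches plus a shift-and-merge.
	 */
	unsigned long
	fetch32(addr, mem)
	unsigned long addr;		/* byte address */
	unsigned long mem[];		/* memory as longwords */
	{
		unsigned long w0, w1;
		int off = addr & 3;	/* byte offset in longword */

		w0 = mem[addr >> 2];		/* aligned fetch #1 */
		if (off == 0)
			return w0;		/* aligned: done */
		w1 = mem[(addr >> 2) + 1];	/* aligned fetch #2 */
		return ((w0 << (8 * off)) | (w1 >> (8 * (4 - off))))
		    & 0xffffffff;
	}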

	Guy Harris
	{seismo,ihnp4,allegra}!rlgvax!guy

bob@hhb.UUCP (04/23/84)

My company created and now sells the CADAT digital simulation system which
incorporates hierarchical concurrent fault simulation capabilities.
The software package represents at least 20 man-years of effort and
is not trivial in any respect.

The problem is this:

	It was originally written in C and was first implemented for a
16-bit architecture.  Since we are dealing with such vast amounts
of information to describe the topology of a circuit, we went to
great pains to make our data structures as lean as possible.
Of course this meant that many of our data structures were
`hand' built via pointer arithmetic and such.  A typical example is
the data structure we use to describe one device in a circuit
(i.e. and, nand, nor, jk-flop, ...).  It has a fixed header, but the
data that follows it is variable in length, with its structure
being derivable from the information in the header.  If we went to
using C structures to define this information, the padding done
by the compiler would increase each device entry's size by 25%.
There can be millions of devices in a single circuit.
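
To make that concrete, here is a sketch of how such an entry is built
by hand (simplified, with invented field names):

	/* Fixed header; a variable-length body follows immediately,
	 * with no compiler padding anywhere.
	 */
	struct dev_hdr {
		char	type;	/* and, nand, nor, jk-flop, ... */
		char	n_in;	/* number of inputs */
		short	state;
	};

	/* Step from one device entry to the next.  If a body ever
	 * holds an odd number of bytes, the next entry (and any longs
	 * in it) lands on an unaligned address -- harmless on a VAX,
	 * a fault on the Pyramid.
	 */
	char *
	next_dev(p)
	char *p;
	{
		struct dev_hdr *h = (struct dev_hdr *)p;

		return p + sizeof(struct dev_hdr)
		    + h->n_in * sizeof(short);	/* body: fanin list */
	}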

When we are simulating VLSI components, we often have `device tables'
which have sizes of 2-3 meg.  This 25% increase in space could have a
serious impact on our performance if we then started being paged by
the system.  In digital simulation, performance is the name of the
game, because test engineers just aren't cheap.

I know, memory is `cheap', but no matter how much memory I have,
this simulation system will want to use it, and paying a 25%
penalty for `structured programming' is hard to stomach.

We were doing fine.....

The system now runs under 12 different versions of UNIX on both
68000s and VAXen.  It also runs under VMS.  We hit our first
snag on the Pyramid system we are evaluating.  It compiled
fine... but you know the rest.  We `core dumped' when we accessed
a long word on a non-quad-byte boundary.  We also found that the Ridge
exhibits the same behavior as the Pyramid, as does the CCI Power 6/32.


We now have a dilemma.  Spend 10 man-months rewriting the
data structures we have for these 3 machines, and incur the
25% memory penalty?

Or just avoid architectures of this type?

How many architectures of the future will have this restraint?
A machine we are looking at now for a CADAT implementation is
the ELXSI.  It is a super-mini with IBM mainframe performance,
but it offers VAX-like addressing and won't pose any problem
to us.

What are the opinions in net-land?  Should we rewrite, or just
ignore the (few?) machines with this restraint?


======================================= Be

kar@ritcv.UUCP (Kenneth A. Reek) (04/24/84)

Regarding the question of whether to invest 10 man-months to rewrite a large
system to make it run on machines that require boundary alignment of data, or
to just ignore those machines...

	Ignore those machines and maybe they'll go away.  Architectures that
require aligning multi-byte quantities on particular boundaries sound to me
like they were designed by an engineer who was interested in simplifying his
own task at the expense of the software designers who will use the machine.
The computer is designed once, but there are an infinite number of programs
that might be run on it.  This short-sighted engineering was OK for the early
360's, but is not appropriate for modern computers, and if we software types
tolerate it, it is likely to keep happening.

	Ergo: don't buy any of these machines, and maybe people will stop
making them.

	Ken Reek, Rochester Institute of Technology
	{allegra,seismo}!rochester!ritcv!kar

PS:  Allowing non-aligned data to be accessed with a performance penalty is
only a little less short-sighted.  When designing a computer, anything that
makes the job of writing software easier will be justified in the marketplace,
especially the OEM marketplace.  Given the current comparison of hardware
costs to software (i.e. people) costs, a more expensive CPU that is easier
to program will be vastly cheaper in the long run.

chris@umcp-cs.UUCP (04/25/84)

Simple solution: write a subroutine to access longwords that might
not be aligned.  Since subroutine calls are so fast on the Pyramid,
you'll probably still run faster than a Vax....
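
Such a subroutine is only a few lines (a sketch; big-endian byte order
assumed, that part being machine-dependent):

	/* Fetch a longword from an address of any alignment, a byte
	 * at a time, so no alignment trap can occur.
	 */
	long
	getlong(p)
	char *p;
	{
		unsigned char *u = (unsigned char *)p;

		return ((long)u[0] << 24) | ((long)u[1] << 16)
		     | ((long)u[2] <<  8) |  (long)u[3];
	}
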
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci (301) 454-7690
UUCP:	{seismo,allegra,brl-bmd}!umcp-cs!chris
CSNet:	chris@umcp-cs		ARPA:	chris@maryland

stan@clyde.UUCP (04/25/84)

Sounds like it should have been a compiler option to have the data
structures aligned or not.

(I realize that I'm RISC-king my neck by saying that, though.)

		Stan King			phone: 201-386-7433
		Bell Labs, Whippany, NJ		Cornet:  8+232-7433
		room 2A-111			uucp:	 clyde!stan

dmmartindale@watcgl.UUCP (Dave Martindale) (04/26/84)

	PS:  Allowing non-aligned data to be accessed with a
	performance penalty is only a little less short-sighted.  When
	designing a computer, anything that makes the job of writing
	software easier will be justified in the marketplace,
	especially the OEM marketplace.  Given the current comparison
	of hardware costs to software (i.e. people) costs, a more
	expensive CPU that is easier to program will be vastly cheaper
	in the long run.

There are always going to be tradeoffs.  Allowing the largest data type
accessed by the machine to be accessed on any boundary with no performance
penalty requires that the data path from memory be twice as wide as the
largest operand fetch that will be done from it, and the presence of
a data aligner which is fast (probably combinational logic) and also
that wide.  On the VAX, for example, where the longest data type is 8
bytes (ignoring H-floating for the moment), you'd need a 128-bit wide data
path from memory.  That's four times its current width; frankly, I'd rather
see the extra money spent on features that would speed the machine up
all the time, not just when doing unaligned data references.

wm@tekchips.UUCP (Wm Leler) (04/26/84)

1- I have heard Dr. Fred Brooks state that it was a mistake
to allow non-aligned word accesses in the IBM/370.

2- I don't understand why you must pay a 25% space penalty
to use the Pyramid machine.  You may have to rewrite your accessing
functions, but why change your data structures?  In the worst
case you could grab everything a byte at a time and assemble it.
Not that you should do this; I'm just pointing out that a
machine that requires word alignment can handle any data structure.
3- <enter sarcasm mode> Well, if you are complaining about
machines that require word alignment, how about all those
machines out there that require *byte* alignment!  I want
to be able to store my double precision floating point numbers
starting with any bit in memory I desire!  What about the
waste when C programmers use ints (32 bits long!) for boolean
flags?  Or all those structures that contain padding?  Wouldn't
this solve the problem of structure comparisons?  And I know
how many bits wide my integers should be.  I should be able
to have 19 bit integers, or 129 bit floats.  Foo on alignment.
I mean, you hardware guys are making my job as a software
hacker much harder.  Like someone said, making a machine
cheaply at the expense of making software harder to write
is a big lose.
		<exit sarcasm mode> :-O :-) ;-)

Please don't send me mail about bit aligned machines.  I already
know about them.

			Wm Leler    503/627-5151
			wm.Tektronix@csnet-relay
		{ucbvax|allegra|decvax|ihnp4}!tektronix!wm

henry@utzoo.UUCP (Henry Spencer) (04/27/84)

Kenneth Reek contends, in part:

	Architectures that require aligning multi-byte quantities on
	particular boundaries sound to me like they were designed by an
	engineer who was interested in simplifying his own task at the
	expense of the software designers who will use the machine.

Simple hardware wins on more counts than just making life easier for
lazy engineers.  It is simpler and cheaper to build, simpler for the
software to run (comparing an 11/44 to the nightmare complexity of the
VAX i/o structure, for example, tells you why 4.2BSD is so bloated),
more reliable, easier to fix when it breaks, etc etc.

Don't forget that magic word "cheaper".  It has become fashionable
to say "software costs totally dominate hardware costs", but most
people forget to add "...unless you can't afford the hardware in the
first place".  Hardware and software money don't usually come out of
the same pot, and the people who make decisions about such things are
not necessarily as enlightened as we are.  And once again, don't forget
the example of the VAX:  sure, it looks like a nice machine, but it's
grossly overpriced for its market now.  This is despite massive use of
[semi-]custom ICs on the more recent VAXen -- and you would not believe
what a disaster that is for production and maintenance!  (There is an
awful lot to be said for using standard parts, which means restricting
yourself to things that can be built economically with them.)  I have
heard, from reliable sources, that if/when the successor to the VAX
emerges, the biggest difference will be that it will be much simpler.

	Allowing non-aligned data to be accessed with a performance
	penalty is only a little less short-sighted.  When designing a
	computer, anything that makes the job of writing software easier
	will be justified in the marketplace, especially the oem marketplace.

If you can show me a way to eliminate alignment constraints without a
speed penalty, WITHOUT adding large amounts of hardware (which I could
use better to make the aligned version faster), I'd love to hear about
it.  It's hard.

	Given the current comparison of hardware costs to software
	(i.e. people) costs, a more expensive cpu that is easier to
	program will be vastly cheaper in the long run.

See my first comments for some reasons why the software will be easier
if the hardware is simpler.

But actually, most of this is beside the [original] point.  We are not
talking about some decision which makes life a lot harder for the poor
software jockey.  We are talking about a decision which requires more
memory to get equivalent performance.  There is a perfectly
straightforward hardware-vs-hardware tradeoff here:  is it cheaper to build
a machine that doesn't care about alignment, or to just stack more
memory on a machine that does care?  I would give long odds that the
latter approach wins, even just on initial cost.  When you think about
things like reliability and maintenance, it wins big.

I agree that this doesn't help the poor people who have made a big
investment in data structures that assume no alignment constraints.
These people have made a mistake, period:  they have imbedded a major
machine-dependent assumption in software that obviously should have
been portable.  The merits of the assumption are debatable; what is
not debatable is that it shouldn't have been wired in so deeply!
They have asked whether they should spend 10 man-months recoding to
imbed the opposite assumption, i.e. alignment constraints.  I think
they should spend however much time it takes to eliminate such
deeply-wired-in assumptions completely.  Or they will surely be bitten
by something like this again some day.  (Case in point:  does the code
assume that a long int is 32 bits?  Might be a mistake if they ever
want to move it to an Amdahl Unix -- and the big Amdahls are supposed
to be pretty good Unix machines if you need lots of crunch.)
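
(Isolating such assumptions needn't be expensive, either.  A sketch of
the usual trick, with invented names:

	/* Defined once per machine; the rest of the code uses GET32
	 * and never knows whether unaligned access is legal.
	 */
	typedef long	val32;			/* some 32-bit type */
	#ifdef UNALIGNED_OK
	#define	GET32(p)	(*(val32 *)(p))
	#else
	extern val32	getlong();		/* byte-at-a-time fetch */
	#define	GET32(p)	getlong(p)
	#endif

One #ifdef in one header file, instead of an assumption wired into
every data structure.)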

[I can just hear N netnews readers firing up their afterburners to
accuse me of being a degenerate anti-software hardware hacker...
I'm a software specialist, degree in Computer Science (specifically,
compilers), experience mostly in software.  But I know a good deal
about how hardware works and about the practical aspects of building
it, and have personal experience with some of the problems.]
-- 
				Henry Spencer @ U of Toronto Zoology
				{allegra,ihnp4,linus,decvax}!utzoo!henry

thomas@utah-gr.UUCP (Spencer W. Thomas) (04/28/84)

>	Ignore those machines and maybe they'll go away.  Architectures that
> require aligning multi-byte quantities on particular boundaries sound to me
> like they were designed by an engineer who was interested in simplifying his
> own task at the expense of the software designers who will use the machine.
> The computer is designed once, but there are an infinite number of programs
> that might be run on it.  This short-sighted engineering was OK for the early
> 360's, but is not appropriate for modern computers, and if we software types
> tolerate it, it is likely to keep happening.

A current trend in computer design is to assume that the user will only
be writing in a high-level language, and that the compiler will do the
hard work of generating machine code.  This is the theory behind RISC
machines, in particular.  Making the hardware simpler makes it run
faster.  Once we start getting really convoluted machines (such as ELI,
or some pipelined machines which execute several instructions after a
conditional branch, before actually branching), all your clever hacks
based on assumptions about the hardware will just go straight down the
tubes.  If the compiler were smart enough, it would say "Oh, he's trying
to access a longword, but it's not on the right boundary", and generate
a byte move instruction to align it before accessing.
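
Failing that smart a compiler, the same fixup can be written by hand; a
sketch of what the generated code would amount to (assuming 4-byte
longs; byte order doesn't matter, since the bytes keep their native
order):

	/* Copy the possibly-unaligned longword into an aligned
	 * temporary with byte moves, then access the temporary.
	 */
	long
	load_long(p)
	char *p;
	{
		long temp;
		char *t = (char *)&temp;
		int i;

		for (i = 0; i < 4; i++)		/* the byte moves */
			t[i] = p[i];
		return temp;
	}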

The basic problem is that generality is slower.  For really blinding
speed, you always have to give up something.  With big memories,
arbitrary alignment is not too hard to give up.  (I bet that the
original application never put longwords on odd byte boundaries, now did
it?)

=Spencer

eric@gang.UUCP (Eric Kiebler) (05/01/84)

[Nuke the smurfs]

Since when does having nice hardware and well thought-out software
make any difference?  People have been buying machines with horrible
software and lousy architectures for years, and continue to do so.
In fact, I heard a rumor that "they" are producing a particular
horrible machine with a lousy architecture at the rate of 1 every
16 seconds.  As long as lousy hardware and horrible software get
the job done, however, people are happy.  Inertia is a wonderful/bad
thing.  Until machines are sufficiently sophisticated that increasing
performance is *not* an issue, people will tweak and prod and make
trade-offs that are not in their best long-term interest.

Meanwhile, we all pull our hair out in little gray bunches...

eric
-- 
from the gang down at...  38.37.45 N   90.12.22 W
	..!ihnp4!afinitc!{gang|wucs!gang}!eric

Any sufficiently advanced technology is indistinguishable from a rigged demo.
Copyright (C) 1984 All Rights Reserved.

bob@hhb.UUCP (05/02/84)

It's time for a response.

When I posed my question regarding the problem we were having with the
Pyramid system, I really wasn't looking for technical solutions to the
problem, and I wasn't looking for replies telling me that we had just
plain written the code wrong (PERIOD).  (Implied in that statement was
that we must be bumbling fools who shouldn't be allowed behind the
wheel of a C compiler.)  All I was looking for was whether this type of
architecture would be dominant in the future, so that if it is, we can
schedule a fix for it.

Now for a credibility speech.

My company and I have been involved with UN*X for quite a while.  We
have done two native ports to a word-addressed computer: one of
UN*X V7m, the other of System III.  We have also done 3 C
compiler code generators, 2 for word-addressed machines and the other
for the HP 1000 series machines (2 registers, no byte addressing!).  We
also did the XENIX adaptation for the IBM Instruments CS9000 computer.
I feel we are quite competent at what we do, and have an excellent
knowledge of compilers and machine architectures.

Now for a plea.

Please - no more solutions to our problem, like: write subroutines to
fetch the bytes individually, use macros, trap the fault in the kernel
and fix it there... and on and on and on.  We've already thought
of the different ways to fix it, and feel, as stated, that to do it
CORRECTLY would take 10 man-months.

Now for some exhaust from my after-burners ----->>>>>>>>>>********

Now let me flame at the folkz who felt compelled to tell me that we had
written the code completely wrong.  These responses were just typical
(and as I had expected) of UN*X snob types with little understanding of
what it takes to develop major software systems.  With attitudes like
that, we ought to just throw most of UN*X out the window.  Do you have
any idea how much effort we spent making the UN*X utilities work on a
machine that did not have character pointers the same size as all other
pointers?  (This was for the word-addressed machine I previously
mentioned.)  It was months, and an extremely tedious job.  So obviously
they wrote UN*X wrong PERIOD.

No, you're right, it was written wrong, from a purely technical,
non-realistic viewpoint.  Maybe those who write code perfectly and
think of ALL considerations before they code do not have deadlines,
schedules, and real-world problems to address.  I'm also sure that they
have plenty of time to conduct seminars for all the new people they
hire, expounding to them the perfect way to write C code.
again, maybe the most ambitious system they have developed is
comparable to a `grep' program.  Or else maybe they have large
contracts from the government.  (I worked for a defense contractor
once....)

Did we make assumptions about the architecture of the machine we would
run on?  Yup; for instance, we'll never run on an 8086 microprocessor,
because we assume we have a linear address space greater than 64K
(unless someone comes out with a true large-model compiler for it), and
we won't lose much from that decision.  It boils down to speed/space vs.
generality, and we chose a compromise position.  It bit us with the RISC
machines, but unless they are the wave of the future (which was the
original question I was posing) we won't disrupt our development cycle
for them.

Anyway, what the flamers ignored was that we now run on 12 different
systems, and will run on many more to come, without mods to our
system.  So I fail to see that we wrote it completely wrong.  And I
admit it won't port to a 67-bit flammigabar machine, but it sure seems
to be a useful product in the market it enjoys.  So how about coming
out of your ivory towers and just trying to put things in perspective?



========================================================== Be

Company:      HHB-Softron
UUCP address: {decvax,allegra}!philabs!hhb!bob

pete@lvbull.UUCP (pete) (06/21/84)

Hear, hear for UNIX hacks for those old word-addressed machines!  I would
rather run a hacked UNIX than GCOS/MOD400, for example.

I too have been battling with word-addressed machines for a few years
now, and can appreciate the problems.

Maybe we should exchange software that has been hacked for
word-addressed machines; e.g. my 4BSD networking on a V7 filesystem.

Anyway, it would be nice to see word-addressing bugs removed from
the UNIX sources; it gets a little boring making the same hack
in the kernel, in iomove() and uiomove(), again and again.

Will Bell and Berkeley accept word-addressing fixes as worthwhile?
I believe all the networking hacks were in mbuf.h and sys_inode.c.