[comp.parallel] Byte order in tightly coupled systems.

andy@research.canon.oz.au (Andy Newman) (06/18/91)

We have a rather interesting problem with a system we are putting
together and were wondering if others had any experience with any
similar situations.

Basically we have a tightly coupled (i.e., shared memory) system
with two processors, one little-endian, the other big-endian.
We were going to fix it in hardware but that didn't work as
expected, so it's up to us software people to attack it.

The system transfers *lots* of complex data structures between the
two processors (mainly in one direction, big-endian to
little-endian). There's certainly no problem in providing some
macros that read and write different primitive types but this
is a little low level. We thought about having macros that check
the size of an object and apply the appropriate byte swapping (yes
this can be fooled with certain structures) but weren't impressed
with the possibility of side effects and the great risk of people
making mistakes at 2am.
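
A minimal sketch of the primitive-type helpers mentioned above, written
as inline functions rather than macros so that an argument with side
effects is evaluated exactly once. The names swap16 and swap32 are
illustrative assumptions, not part of any tool named in this thread.

```c
#include <stdint.h>

/* Reverse the two bytes of a 16-bit value. */
static inline uint16_t swap16(uint16_t v)
{
    return (uint16_t)((v << 8) | (v >> 8));
}

/* Reverse the four bytes of a 32-bit value. */
static inline uint32_t swap32(uint32_t v)
{
    return ((v & 0x000000FFu) << 24) |
           ((v & 0x0000FF00u) <<  8) |
           ((v & 0x00FF0000u) >>  8) |
           ((v & 0xFF000000u) >> 24);
}
```

Because these are functions, a call such as swap32(*p++) advances p
once; a macro that mentions its argument four times would not be safe
to call that way.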

We thought about a pre-processor that would do the necessary magic
but in the general case you have to parse C properly and understand
type declarations. We could limit what is legal to simplify things
but we'd rather not.

Before we do anything ourselves, has anyone developed any tools for
dealing with this sort of situation? Any ideas would be appreciated.

--
Andy Newman (andy@research.canon.oz.au)

kenton@decvax.dec.com (Jeff Kenton OSG/UEG) (06/19/91)

In article <1991Jun18.173855.29052@hubcap.clemson.edu>,
andy@research.canon.oz.au (Andy Newman) writes:

|> Basically we have a tightly coupled (i.e., shared memory) system
|> with two processors, one little-endian, the other big-endian.
|> We were going to fix it in hardware but that didn't work as
|> expected so its up to us software people to attack it.
|> 
|> Before we do anything ourselves has anyone developed any tools for
|> dealing with this sort of situation? Any ideas would be appreciated.
|> 

Some musings on the subject:

  .  You already discovered that the problem is dependent on the specific
	data items and their layout in memory.  Unless you severely restrict
	structure layouts, you must deal with the problem item by item.

  .  If you are willing to run only Fortran and restrict yourself to Fortran
	formatted I/O, the runtime I/O package will have the necessary size
	information to do the proper byte swapping.

  .  You don't say which two processors you are using, but several of the
	current chips will allow you to run individual processes reverse-
	endian.  It might save you grief in the long run to find a way to
	run all relevant processes the same endian.  88000's and the newer
	MIPS chips allow this (others probably do).

  .  Are you sure it's too late to go back and re-design the hardware?
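
The "item by item" point in the first musing can be sketched with a
per-structure table of field offsets and sizes. Everything here
(struct field_desc, struct message, message_fields, swap_fields) is a
hypothetical illustration, and it assumes both processors lay the
structure out with the same offsets and padding.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical descriptor: where a field sits and how wide it is. */
struct field_desc {
    size_t offset;
    size_t size;
};

/* Hypothetical record shared between the two processors. */
struct message {
    uint32_t id;
    uint16_t kind;
    uint16_t flags;
    uint8_t  tag[4];   /* byte array: must not be swapped */
};

/* Only the multi-byte scalar fields are listed; tag is left alone. */
static const struct field_desc message_fields[] = {
    { offsetof(struct message, id),    sizeof(uint32_t) },
    { offsetof(struct message, kind),  sizeof(uint16_t) },
    { offsetof(struct message, flags), sizeof(uint16_t) },
};

/* Reverse the bytes of each described field in place. */
static void swap_fields(void *obj, const struct field_desc *f, size_t n)
{
    unsigned char *base = obj;
    for (size_t i = 0; i < n; i++) {
        unsigned char *p = base + f[i].offset;
        if (f[i].size < 2)
            continue;              /* nothing to reverse */
        for (size_t lo = 0, hi = f[i].size - 1; lo < hi; lo++, hi--) {
            unsigned char t = p[lo];
            p[lo] = p[hi];
            p[hi] = t;
        }
    }
}
```

The tag member shows why object size alone is not enough: it occupies
four bytes, just like id, yet swapping it would corrupt it.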

-----------------------------------------------------------------------------
==	jeff kenton		Consulting at kenton@decvax.dec.com        ==
==	(617) 894-4508			(603) 881-0011			   ==
-----------------------------------------------------------------------------

eugene@nas.nasa.gov (Eugene N. Miya) (06/20/91)

In article <1991Jun18.193731.4911@hubcap.clemson.edu>
kenton@decvax.dec.com (Jeff Kenton OSG/UEG) writes:
>  .  Are you sure it's too late to go back and re-design the hardware?

A quote from Wulf's book which I used in an unpublished NASA survey:

[Wulf81, p. 276]:

  In general, we believe that it's possible to make two major mistakes at the
  outset of a project like C.mmp.  One is to design one's own processor;
  doing so is guaranteed to add two years to the length of the project and,
  quite possibly, sap the energy of the project staff to the point that nothing
  beyond the processor ever gets done.  The second mistake is to use someone
  else's processor.  Doing so forecloses a number of critical decisions, and thus
  sufficiently muddies the water that crisp evaluations of the results are
  difficult.  We can offer no advice.  We have now made the second mistake [*]
  -- for variety, next time we'd like to make the first!  Given the chance, our
  processor would:

  [*] [Wulf81]: Twice, in fact.
  The second multiprocessor project at C-MU, Cm*, also uses the PDP-11.

Do not ask for my survey.  It is hopelessly out of date (I wrote that
RISCs would be significant, and my boss at the time didn't believe me).
No one ever reads surveys anyway (I discovered this during my survey).  
That's why I did a machine readable bibliography (now that is useful,
and you can find real gems).

What is sad is that we are constantly reinventing ILLIAC IVs, reinventing
C.mmps and Cm*s.  Wulf's book is good.

	"Those who forget the past are doomed to repeat it."  -- G.S.
	"Those who remember the past are doomed to repeat it."
						-- Suzanne Fuller

--eugene miya, NASA Ames Research Center, eugene@orville.nas.nasa.gov
  Resident Cynic, Rock of Ages Home for Retired Hackers
  Program Committee, Hacker's 7.0
  {uunet,mailrus,other gateways}!ames!eugene

bdb@uunet.UU.NET (Bruce D. Becker) (06/24/91)

In article <1991Jun18.173855.29052@hubcap.clemson.edu> andy@research.canon.oz.au (Andy Newman) writes:
|
|Basically we have a tightly coupled (i.e., shared memory) system
|with two processors, one little-endian, the other big-endian.
|We were going to fix it in hardware but that didn't work as
|expected so its up to us software people to attack it.
[lots of stuff deleted]

	You might want to think about trying to
	adapt Sun's XDR system to your problem.
	XDR is a data-representation language
	which has C-like syntax, and which was
	designed to handle exactly such problems.

	The source for compiler & library should
	be easily available at many FTP sites.
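
	As a rough illustration of the idea behind XDR (a
	canonical big-endian encoding, built by composing one
	routine per item), here is a hand-written sketch.  It
	does not use the real XDR library API; put_u32_be,
	get_u32_be, struct point, encode_point and decode_point
	are hypothetical names.

```c
#include <stdint.h>

/* Store v at buf in big-endian order, whatever the host byte order. */
static void put_u32_be(unsigned char *buf, uint32_t v)
{
    buf[0] = (unsigned char)(v >> 24);
    buf[1] = (unsigned char)(v >> 16);
    buf[2] = (unsigned char)(v >> 8);
    buf[3] = (unsigned char)v;
}

/* Read a big-endian 32-bit value from buf into host order. */
static uint32_t get_u32_be(const unsigned char *buf)
{
    return ((uint32_t)buf[0] << 24) | ((uint32_t)buf[1] << 16) |
           ((uint32_t)buf[2] << 8)  |  (uint32_t)buf[3];
}

/* Hypothetical structure, encoded by composing the per-item routines. */
struct point {
    uint32_t x;
    uint32_t y;
};

static void encode_point(unsigned char *buf, const struct point *p)
{
    put_u32_be(buf, p->x);
    put_u32_be(buf + 4, p->y);
}

static void decode_point(const unsigned char *buf, struct point *p)
{
    p->x = get_u32_be(buf);
    p->y = get_u32_be(buf + 4);
}
```

	Since the routines work on individual bytes, the same
	source compiles unchanged for either processor.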


-- 
  ,u,	 Bruce Becker	Toronto, Ontario
a /i/	 Internet: bdb@becker.UUCP, bruce@gpu.utcs.toronto.edu
 `\o\-e	 UUCP: ...!utai!mnetor!becker!bdb
 _< /_	 "Ferget yer humanity, do the poot" - devo

alan@uc.msc.edu (Alan Klietz) (06/25/91)

In article <1991Jun24.122015.19112@hubcap.clemson.edu> becker!bdb@uunet.UU.NET (Bruce D. Becker) writes:
<
<	You might want to think about trying to
<	adapt Sun's XDR system to your problem.
<	XDR is a data-representation language
<	which has C-like syntax, and which was
<	designed to handle exactly such problems.

Yes, but XDR is sloow.  The overhead of a procedure call per item is too
much, esp for regular data like arrays. 

XDR wants to build data structures by composing XDR functions for each object.
To get decent performance on a vector or parallel machine, you have to rewrite
it to make it "flat" so that the compiler can recognize the regularity of the
data layout and generate decent code.  Perhaps the compilers will be smart
enough to do this automatically someday, but not yet.
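
A sketch of what "flat" might look like for a regular array of 32-bit
words: one loop with the swap written inline, so there is no procedure
call per item.  The name swap32_array is a hypothetical illustration,
and whether a given compiler vectorizes the loop is not guaranteed.

```c
#include <stddef.h>
#include <stdint.h>

/* Byte-swap n 32-bit words from src into dst in a single flat loop. */
static void swap32_array(uint32_t *dst, const uint32_t *src, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        uint32_t v = src[i];
        dst[i] = ((v & 0x000000FFu) << 24) |
                 ((v & 0x0000FF00u) <<  8) |
                 ((v & 0x00FF0000u) >>  8) |
                 ((v & 0xFF000000u) >> 24);
    }
}
```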

--
Alan E. Klietz
Minnesota Supercomputer Center, Inc.
1200 Washington Avenue South
Minneapolis, MN  55415
Ph: +1 612 626 1737	       Internet: alan@msc.edu

-- 
=========================== MODERATOR ==============================
Steve Stevenson                            {steve,fpst}@hubcap.clemson.edu
Department of Computer Science,            comp.parallel
Clemson University, Clemson, SC 29634-1906 (803)656-5880.mabell