[comp.lang.c] Portability across architectures

dandc@simsdevl.UUCP (Dan DeClerck) (08/27/88)

I've run across a need to have data files in various forms of UN*X
be portable to each other. Mostly, this deals with Intel to Motorola and 
vice-versa. I could write data out to files in ASCII, but this is cumbersome, 
slow and may hamper the products' marketability. 

The problem lies in writing integers as well as structures to files, and
allowing those files to be transferred among a multitude of machines
without a data transformation taking place.

A fellow programmer suggested an "XDR" standard from SUN, but this seems to only work with inter-process communication. Has anyone encountered this problem??

--------------------------------------------------------------------------------
Dan DeClerck               dandc@rutgers!mcdchg!simsdevl

Motorola SIMS development group...     "SIMS", it anticipates every need.....
Schaumburg, Il

gwyn@smoke.ARPA (Doug Gwyn ) (08/28/88)

In article <103@simsdevl.UUCP> dandc@simsdevl.UUCP (Dan DeClerck) writes:
>The problem lies in writing integers as well as structures to files,
>and allow those files to be transferred between a multitude of machines
>without a data transformation taking place.

I know your main need is "Intel to Motorola", whatever that means (8086 to
68000?), but the issue is a general one and that's what I'll be discussing.

In general, a data transformation HAS to take place; the main issue is
where it is done.  If you map into some "universal" data representation,
then most architectures/implementations have to translate the data.  (If
the "universal" format matches the native binary format of some
implementation, then that implementation can take advantage of the fact.)

If one is exchanging data dynamically, e.g. via a network connection,
then it is possible to do some initial probing to discover how much
mapping is necessary and often reduce the amount of work from what
would be required if a "universal" format had been used.  I have a
package that does exactly this.

When coding up data translation algorithms in C, portability is fairly
tedious to achieve, since one can only count on AT LEAST 8 bits in a
char (there may be more), ordinary chars may sign-extend in expressions,
integers have to be built by shifting their pieces into the right place,
unions are not usually guaranteed to have the right properties, etc.
Floating-point format is even harder now that IEEE has introduced non-
numbers into the game.  And the "standard" apparently does not control
byte order within a floating-point number, so incompatibilities exist
even between IEEE implementations.

Sun's XDR spells out methods of describing structures; such descriptors
are not necessary if your application knows in advance what it will be
reading.  However, you still need some way to construct the in-memory
structure.  Even if every struct member has the same representation as
in the transferred file, there are still alignment/padding requirements
that may differ from one implementation to the next.  Generally one
ends up building a struct by translating each member separately.

Sorry there is no simple way to achieve what you're after, at least not
in general.  Don't rule out ASCII representation too quickly.

bill@proxftl.UUCP (T. William Wells) (08/29/88)

In article <103@simsdevl.UUCP> dandc@simsdevl.UUCP (Dan DeClerck) writes:
: I've run across a need to have data files in various forms of UN*X
: be portable to each other. Mostly, this deals with Intel to Motorola and
: vice-versa. I could write data out to files in ASCII, but this is cumbersome,
: slow and may hamper the products' marketability.
:
: The problem lies in writing integers as well as structures to
: files, and allow those files to be transferred between a
: multitude of machines without a data transformation taking
: place.
:
: A fellow programmer suggested an "XDR" standard from SUN, but
: this seems to only work with inter-process communication.  Has
: anyone encountered this problem??

As I understand it, XDR includes a description of the data *as it
is transmitted* as well as a set of conversion routines.  For
example, on the man page for our Sun 3 is:

xdr_array()  translate arrays to/from external representation
xdr_bool()   translate Booleans to/from external representation
xdr_bytes()  translate counted byte strings to/from external representation
xdr_double() translate double precision to/from external representation
xdr_enum()   translate enumerations to/from external representation
xdr_float()  translate floating point to/from external representation
xdr_int()    translate integers to/from external representation
xdr_long()   translate long integers to/from external representation
xdr_opaque() translate fixed-size opaque data to/from external representation
xdr_short()  translate short integers to/from external representation
xdr_string() translate null-terminated strings to/from external representation
xdr_u_int()  translate unsigned integers to/from external representation
xdr_u_long() translate unsigned long integers to/from external representation
xdr_u_short() translate unsigned short integers to/from external representation
xdr_union()  translate discriminated unions to/from external representation

Presumably, what you want to do is to use these routines to
convert your data and then write the data to a file; then use
them to convert the data back when you read it.

---
Bill
novavax!proxftl!bill

ok@quintus.uucp (Richard A. O'Keefe) (08/30/88)

In article <103@simsdevl.UUCP> dandc@simsdevl.UUCP (Dan DeClerck) writes:
>A fellow programmer suggested an "XDR" standard from SUN,
>but this seems to only work with inter-process communication.

Have you *tried* it?
Look at the name of the manual:
	"External Data Representation Protocol Specification"
in "Networking on the Sun Workstation".  You can send XDR-encoded
data through sockets, but there is no necessary connexion between
XDR and IPC.  For your application, you want to use
	xdrstdio_create()
to convert stdio streams to your files to XDR streams, and then
just read and write the data with appropriate XDR calls.  In fact,
you will probably use the same routine for reading and writing, as
the direction is encoded in the XDR stream, not in the calls.

It's really very easy to use.  (Easier than scanf(), anyway.)

SHANE@UTDALVM1.BITNET (Shane Davis) (09/08/88)

>I've run across a need to have data files in various forms of UN*X
>be portable to each other. Mostly, this deals with Intel to Motorola and
>vice-versa. I could write data out to files in ASCII, but this is cumbersome,
>slow and may hamper the products' marketability.
>
>The problem lies in writing integers as well as structures to files, and allow
> those files to be transferred between a multitude of machines without a data
> transformation taking place.
>
>A fellow programmer suggested an "XDR" standard from SUN, but this seems to
> only work with inter-process communication. Has anyone encountered this problem??

XDR should do exactly what you need. Here is an example:

#include <stdio.h>
#include <rpc/rpc.h>
#define  MAXARRAYLEN 20

main()
  {
    XDR xdrs;
    static int foo[MAXARRAYLEN];
    int *fooptr = foo;     /* xdr_array wants the address of a data pointer */
    u_int arraylen = MAXARRAYLEN;
    int i;
    FILE *foo_out;

    foo_out = fopen ("fooarray","w");
    if (foo_out == NULL)
        return 1;
    for (i = 0; i < MAXARRAYLEN; i++)
        foo[i] = i;
    xdrstdio_create (&xdrs, foo_out, XDR_ENCODE);
    xdr_array (&xdrs, (char **)&fooptr, &arraylen, MAXARRAYLEN,
               sizeof (int), xdr_int);
    xdr_destroy (&xdrs);
    fclose (foo_out);
    return 0;
  }

This program writes, in standard XDR binary representation, the entire contents
of the array 'foo', which can in turn be read by a program on another
architecture using XDR_DECODE rather than XDR_ENCODE. The last parameter to
the 'xdr_array' call is the name of the XDR "primitive" to be used on each
element of the array; as 'foo' is an int array, the function is 'xdr_int'.
Other primitives include 'xdr_float','xdr_short', etc. XDR functions are
also provided for structs.

Actually, I have not tested that program, so don't flame me too badly if it
doesn't work...

You can't move data from one architecture to another without *some* sort of
data transformation; XDR is much more compact and reasonable than ASCII files,
though.

--Shane Davis
  Systems Programmer, Univ. of Texas at Dallas Academic Computer Center
  SHANE@{UTDALVM1.BITNET|utdalvm1.dal.utexas.edu},rsd@engc1.dal.utexas.edu

scs@athena.mit.edu (Steve Summit) (09/13/88)

In article <103@simsdevl.UUCP> dandc@simsdevl.UUCP (Dan DeClerck) writes:
> I've run across a need to have data files in various forms of UN*X
> be portable to each other.
> I could write data out to files in ASCII, but this is cumbersome,
> slow and may hamper the products' marketability.

Please strongly consider using ASCII after all.  The advantages
are many; the disadvantages are comparatively minor.

     1.	ASCII is well-nigh universal; portability is virtually
	assured.  Even if you ever want to go to an EBCDIC
	machine, conversion utilities are bound to be readily
	available (and conversion may indeed happen implicitly
	when transferring a text file to such a machine).

     2.	It's usually not nearly as inefficient as you'd think.
	Ironically, even sophisticated computer programmers
	commonly ignore the fact that computers are just
	blisteringly fast and can usually complete a seemingly
	inefficient ASCII parse in far less time than it takes
	to think about it.  (I am aware that there are high-
	bandwidth, high-performance systems which cannot afford
	the luxury of an ASCII parse, and are well-advised to use
	binary transfer methods.  I maintain that surprisingly
	many real applications do not fall into this category,
	and can use ASCII without paying a performance penalty.)

     3.	Reading and writing ASCII formats isn't really that
	cumbersome; in fact I'd argue that binary formats, when
	properly designed to account for word ordering and other
	difficulties which ASCII formats easily overcome, are
	more cumbersome in the long run.

     4.	Don't overlook debugging.  ASCII formats can be
	inspected with cat, piped through grep and sed and other
	familiar utilities, patched with ordinary text editors,
	etc., etc.  The first program you write for your binary
	format is usually not the application you were trying to
	write, but the disassembler you find you need for
	debugging; getting the disassembler working is often a
	prerequisite for getting the end application working.

     5.	ASCII formats can make good, backwards-compatible
	version number schemes easy to implement.  Data formats
	inevitably require revision to accommodate new features.
	Fixed binary formats, especially those that simply write
	structures out as bytes, are usually not amenable to such
	changes, unless you did a lot of work to make them
	extensible (which is another aspect that makes binary
	formats more, not less, cumbersome than ASCII).
	Introducing a "version 2" format then requires a host of
	extra translation utilities, and nasty incompatibility
	problems when programs try to read files of the wrong
	format.  (These compatibility problems can be successfully
	worked around, but only if all files contain a version
	number, which is usually not recognized or implemented
	until version 1 is in place and version 2 is being
	contemplated, by which time it's too late.)

	Suppose, on the other hand, that your ASCII format
	consists of arbitrary lines of text, with a keyword at
	the beginning of each line indicating what kind of data
	(e.g. what field of a structure) that line contains.  If
	programs ignore unrecognizable lines (a good practice),
	"version 1" programs can read "version 2" files without
	modification, if the version 2 keywords are a superset of
	version 1's.  Version 1 filters and editors can even
	modify version 2 files, without losing version-2-specific
	information, by saving, and echoing to the output,
	unrecognized lines without interpretation.

	(It's true that a binary format employing variable-length
	records with a type field in a consistent place would
	also enjoy these advantages.  Such records are in fact
	common in network protocols.)

The only real problem I've ever had with ASCII data interchange
formats is that you tend to lose a bit of precision when reading
and writing doubles, but you can minimize this by printfing
things with %.ne, for n sufficiently large.  If the precision
inherent in the data is less than that of a double, you're only
"losing" something you didn't have in the first place.

I'm not sure how using ASCII data formats could "hamper a
products' marketability."  If not an efficiency concern, it's
probably some attempt to keep information hidden in a cryptic
binary format rather than having it in plain text that anyone
could read.

                                            Steve Summit
                                            scs@adam.pika.mit.edu

dhesi@bsu-cs.UUCP (Rahul Dhesi) (09/14/88)

In article <7038@bloom-beacon.MIT.EDU> scs@adam.pika.mit.edu (Steve Summit)
writes:
[numerous arguments in favor of using ASCII text for portability]

Counterarguments:

1.   ASCII text is likely to be very bulky.

2.   All modern hardware architectures can use 8-bit bytes, so ASCII
     is unnecessary except for older machines.  (Think "networks".)

I suggest encoding the data in bytes using a known byte order.
-- 
Rahul Dhesi         UUCP:  <backbones>!{iuvax,pur-ee,uunet}!bsu-cs!dhesi

daveb@geac.UUCP (David Collier-Brown) (09/14/88)

In article <103@simsdevl.UUCP> dandc@simsdevl.UUCP (Dan DeClerck) writes:
| I've run across a need to have data files in various forms of UN*X
| be portable to each other.
| I could write data out to files in ASCII, but this is cumbersome,
| slow and may hamper the products' marketability.
 
From article <7038@bloom-beacon.MIT.EDU>, by scs@athena.mit.edu (Steve Summit):
| Please strongly consider using ASCII after all.  The advantages
| are many; the disadvantages are comparatively minor.

  In fact, you can get better efficiency in many cases by writing in
ASCII: zeros and blanks are represented by single characters instead
of whole records...
  Once upon a time, I used to beam with pleasure when watching a
spreadsheet stored in ASCII load faster than the same thing written
in binary by a competitor's product.  It didn't happen all the time,
but it wasn't all that rare (many spreadsheets in those days were
sparse arrays, written to disk in an inefficient manner).

--dave
-- 
 David Collier-Brown.  | yunexus!lethe!dave
 78 Hillcrest Ave,.    | He's so smart he's dumb.
 Willowdale, Ontario.  |        --Joyce C-B

bill@proxftl.UUCP (T. William Wells) (09/15/88)

In article <7038@bloom-beacon.MIT.EDU> scs@adam.pika.mit.edu (Steve Summit) writes:
: I'm not sure how using ASCII data formats could "hamper a
: products' marketability."  If not an efficiency concern, it's
: probably some attempt to keep information hidden in a cryptic
: binary format rather than having it in plain text that anyone
: could read.

One major reason for using binary over ASCII is that the binary
is usually going to be more compact.  If your data files *have*
to fit in some specified amount of disk space (like one floppy
disk), the difference might be critical.

---
Bill
novavax!proxftl!bill

henry@utzoo.uucp (Henry Spencer) (09/15/88)

In article <3942@bsu-cs.UUCP> dhesi@bsu-cs.UUCP (Rahul Dhesi) writes:
>1.   ASCII text is likely to be very bulky.

The question is whether reducing the bulk somewhat (often not as much as
you'd think) is worth the complications that result.  This is one of those
cases where the right thing to do is to implement it the obvious way, then
measure it to find out whether you NEED anything better.  Often you don't.
-- 
NASA is into artificial        |     Henry Spencer at U of Toronto Zoology
stupidity.  - Jerry Pournelle  | uunet!attcan!utzoo!henry henry@zoo.toronto.edu

news@amdcad.AMD.COM (Network News) (09/15/88)

In article <1988Sep14.230820.28652@utzoo.uucp> henry@utzoo.uucp (Henry Spencer) writes:
| In article <3942@bsu-cs.UUCP> dhesi@bsu-cs.UUCP (Rahul Dhesi) writes:
| >1.   ASCII text is likely to be very bulky.
| 
| The question is whether reducing the bulk somewhat (often not as much as
| you'd think) is worth the complications that result.  This is one of those
| cases where the right thing to do is to implement it the obvious way, then
| measure it to find out whether you NEED anything better.  Often you don't.

This was the conclusion of David Hansen of the University of Arizona,
when he proposed a linker that worked with ASCII object files rather
than some arcane binary standard.  It allowed the object modules to be
viewed and edited with standard tools, and was quite compact, because
many data constants could be written in one or two ASCII bytes, which
would have taken up a full word of memory when stored as binary.

In addition, macros allowed small text strings to represent the ASCII
encodings for instructions, so they also took up little space.  This
also allowed the entire assembly phase to be bypassed, since the linker
could understand a language that looked reasonably like standard
assembly.

	-- Tim Olson
	Advanced Micro Devices
	(tim@crackle.amd.com)

scottg@hpiacla.HP.COM (Scott Gulland) (09/16/88)

/ hpiacla:comp.lang.c / dhesi@bsu-cs.UUCP (Rahul Dhesi) /  8:43 pm  Sep 13, 1988 /
>> In article <7038@bloom-beacon.MIT.EDU> scs@adam.pika.mit.edu (Steve Summit)
>>writes:
>>[numerous arguments in favor of using ASCII text for portability]

> Counterarguments:
> 
> 1.   ASCII text is likely to be very bulky.

True, ASCII text may be very bulky, but isn't this just an efficiency issue?
When portability to many machine architectures is truly needed, most people
will gladly sacrifice efficiency for ease of portability in their data files.


> 2.   All modern hardware architectures can use 8-bit bytes, so ASCII
>      is unnecesssary except for older machines.  (Think "networks".)

I'm sorry, but I don't understand how this statement relates to portability
of data between heterogeneous architectures.  In any event, this is a simple
statement, NOT a counterargument.  How about explaining your rationale in 
a little more detail.

>
> I suggest encoding the data in bytes using a known byte order.

Bad idea!  Many architectures use different byte orderings for integers,
reals, etc.  This also does not seem to address differences between
floating point formats, sizes of integers, etc.  You also have to deal
with differences in representation of data types in the same language, but
on different architectures.  You will find that different implementations
of a language will store data in subtly different ways.

**************************************************************************
* Scott Gulland	            | {ucbvax,hplabs}!hpda!hpiacla!scottg [UUCP] *
* Indus. Appl. Center (IAC) | scottg@hpiacla                      [SMTP] *
* Hewlett-Packard Co.       | (408) 746-5498                      [AT&T] *
* 1266 Kifer Road           | 1-746-5498                     [HP-TELNET] *
* Sunnyvale, CA  94086  USA | "What If..."                [HP-TELEPATHY] *
**************************************************************************

libes@cme-durer.ARPA (Don Libes) (09/18/88)

There's another possibility besides ASCII and native form: ASN.1

ASN.1 is a set of ISO standards which address the OSI Presentation
Layer - very similar to the problem you face.

The time/space overhead for ASN.1 data encoding is low.  Each datum
has a few extra bytes on it for length and type info but is otherwise
compactly stored.  For example, an ascii string is stored as an ascii
string.  Ints are stored in binary and allow for arbitrary size.

The nice thing is, you can select the way you want the datatype to be
encoded.  E.g., you can have an int encoded as an integer or string or
any of the other appropriate built-in types - and the result is still
portable.  In other words, you have pretty good control over
everything.

You can also define your own datatypes.  For example, I can (and do)
send linked lists between two processes this way.  And they can be
written in different languages, running on different machines and
different operating systems.

ASN.1 is really several pieces.  I believe there is a complete
implementation in ISODE (available via ftp from udel.edu).

We actively use ASN.1 here.  (We tried ASCII, native forms and mixes
of them, but eventually gave up with them.)  We have our own
implementation that one of our hackers wrote.  It's minimal but small
(object code is only 3K) and fast.  It doesn't do any parsing of the ASN.1
notation (ISO 8824), just value encoding/decoding (ISO 8825), although we
write our standards
following the ASN.1 syntax anyway.  (And yes, you can have it, but it
is undocumented.)

Don Libes          cme-durer.arpa      ...!uunet!cme-durer!libes

mh@wlbr.EATON.COM (Mike Hoegeman) (09/19/88)

In article <641@muffin.cme-durer.ARPA> libes@cme-durer.arpa (Don Libes) writes:
>There's another possibility besides ASCII and native form: ASN.1
>
>ASN.1 is a set of ISO standards which address the OSI Presentation
>Layer - very similar to the problem you face.
>

You may want to check out the eXternal Data Representation (XDR)
specification as defined by Sun Microsystems.  It is used in NFS and the
other network services built on the RPC (remote procedure call) protocol.
The sources, as well as the specs, are available in the sun-spots
archives on titan.rice.edu.  The number of machines that NFS runs on is a
testament to its portability.


-mike

libes@cme-durer.ARPA (Don Libes) (09/20/88)

In article <23344@wlbr.EATON.COM> mh@wlbr.eaton.com.UUCP (Mike Hoegeman) writes:
>In article <641@muffin.cme-durer.ARPA> libes@cme-durer.arpa (Don Libes) writes:
>>There's another possibility besides ASCII and native form: ASN.1
>
>You may want to check out the eXternal Data Representation specifcation
>as defined by Sun Microsystems. 

Is anyone familiar with both ASN.1 and XDR to give a good comparison?
I've never seen or heard of one, although I assume the XDR authors
must have known about ASN.1 (or X.409 as it used to be called).

I'm aware that ASN.1 was not complete when Sun did RPC.  I always
wondered if 1) they ever considered switching over at some time in the
future, 2) if the two are too functionally dissimilar, or 3) RPC is
better, faster, whatever, than ASN.1.

Since Sun (for example) is moving towards ISO application services,
they will have both ASN.1 and RPC (in source, in memory, etc).

Don Libes          cme-durer.arpa      ...!uunet!cme-durer!libes

jsp@marvin.UUCP (Johnnie Peters) (09/23/88)

In article <4940003@hpiacla.HP.COM>, scottg@hpiacla.HP.COM (Scott Gulland) writes:
! / hpiacla:comp.lang.c / dhesi@bsu-cs.UUCP (Rahul Dhesi) /  8:43 pm  Sep 13, 1988 /
!!! In article <7038@bloom-beacon.MIT.EDU> scs@adam.pika.mit.edu (Steve Summit)
!!!writes:
!!![numerous arguments in favor of using ASCII text for portability]
! 
!! Counterarguments:
!! 
!! 1.   ASCII text is likely to be very bulky.
! 
! True, ASCII text may be very bulky, but isn't this just an efficiency issue.
! When portability to many machine architectures is truly needed, most people
! will gladly sacrifice efficiency for ease of portability in their data files.
!! 2.   All modern hardware architectures can use 8-bit bytes, so ASCII
! I'm sorry, but I don't understand how this statement relates to portability
!! I suggest encoding the data in bytes using a known byte order.
! 
! Bad idea !  Many architectures use different byte orderings for integers
! , reals, etc.  This also does not seem to address differences between
! 

	Why not take the road that many databases do?  Write two utilities, one
to export the data in ASCII form and one to import it back in.  This would
allow data to be taken across machines and still be used in their native
format.  It also allows backups that are more portable.

				--  Johnnie  --