dandc@simsdevl.UUCP (Dan DeClerck) (08/27/88)
I've run across a need to have data files in various forms of UN*X be portable to each other. Mostly, this deals with Intel to Motorola and vice-versa. I could write data out to files in ASCII, but this is cumbersome, slow and may hamper the products' marketability. The problem lies in writing integers as well as structures to files, and allow those files to be transferred between a multitude of machines without a data transformation taking place. A fellow programmer suggested an "XDR" standard from SUN, but this seems to only work with inter-process communication. Has anyone encountered this problem??
--------------------------------------------------------------------------------
Dan DeClerck dandc@rutgers!mcdchg!simsdevl
Motorola SIMS development group... "SIMS", it anticipates every need.....
Schaumburg, Il
gwyn@smoke.ARPA (Doug Gwyn ) (08/28/88)
In article <103@simsdevl.UUCP> dandc@simsdevl.UUCP (Dan DeClerck) writes:
>The problem lies in writing integers as well as structures to files,
>and allow those files to be transferred between a multitude of machines
>without a data transformation taking place.

I know your main need is "Intel to Motorola", whatever that means (8086 to 68000?), but the issue is a general one and that's what I'll be discussing.

In general, a data transformation HAS to take place; the main issue is where it is done. If you map into some "universal" data representation, then most architectures/implementations have to translate the data. (If the "universal" format matches the native binary format of some implementation, then that implementation can take advantage of the fact.) If one is exchanging data dynamically, e.g. via a network connection, then it is possible to do some initial probing to discover how much mapping is necessary and often reduce the amount of work from what would be required if a "universal" format had been used. I have a package that does exactly this.

When coding up data translation algorithms in C, portability is fairly tedious to achieve, since one can only count on AT LEAST 8 bits in a char (there may be more), ordinary chars may sign-extend in expressions, integers have to be built by shifting their pieces into the right place, unions are not usually guaranteed to have the right properties, etc. Floating-point format is even harder now that IEEE has introduced non-numbers into the game. And the "standard" apparently does not control byte order within a floating-point number, so incompatibilities exist even between IEEE implementations.

Sun's XDR spells out methods of describing structures; such descriptors are not necessary if your application knows in advance what it will be reading. However, you still need some way to construct the in-memory structure. Even if every struct member has the same representation as in the transferred file, there are still alignment/padding requirements that may differ from one implementation to the next. Generally one ends up building a struct by translating each member separately.

Sorry there is no simple way to achieve what you're after, at least not in general. Don't rule out ASCII representation too quickly.
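The member-by-member translation described above might look roughly like the following. This is only a sketch under stated assumptions: the struct, its field widths in the file (4 and 2 bytes), the choice of most-significant-byte-first order, and every function name are invented for illustration and come from nobody's package.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record, invented for this sketch. */
struct sample {
    unsigned long id;     /* written as 4 bytes (low 32 bits only) */
    unsigned int  count;  /* written as 2 bytes (low 16 bits only) */
};

#define SAMPLE_FILE_SIZE 6   /* 4 + 2 bytes, no padding in the file */

/* Store the low 32 bits of v, most significant byte first. */
void put_u32(unsigned char *p, unsigned long v)
{
    p[0] = (unsigned char)((v >> 24) & 0xFF);
    p[1] = (unsigned char)((v >> 16) & 0xFF);
    p[2] = (unsigned char)((v >> 8) & 0xFF);
    p[3] = (unsigned char)(v & 0xFF);
}

/* Rebuild the value by shifting each byte into place. The & 0xFF
   guards against chars that are wider than 8 bits. */
unsigned long get_u32(const unsigned char *p)
{
    return ((unsigned long)(p[0] & 0xFF) << 24) |
           ((unsigned long)(p[1] & 0xFF) << 16) |
           ((unsigned long)(p[2] & 0xFF) << 8)  |
            (unsigned long)(p[3] & 0xFF);
}

void put_u16(unsigned char *p, unsigned int v)
{
    p[0] = (unsigned char)((v >> 8) & 0xFF);
    p[1] = (unsigned char)(v & 0xFF);
}

unsigned int get_u16(const unsigned char *p)
{
    return ((unsigned int)(p[0] & 0xFF) << 8) | (unsigned int)(p[1] & 0xFF);
}

/* Translate each member separately; the struct is never written
   as raw memory, so alignment and padding do not reach the file. */
void sample_encode(const struct sample *s, unsigned char *buf)
{
    assert(s != NULL && buf != NULL);
    put_u32(buf, s->id);
    put_u16(buf + 4, s->count);
}

void sample_decode(struct sample *s, const unsigned char *buf)
{
    assert(s != NULL && buf != NULL);
    s->id = get_u32(buf);
    s->count = get_u16(buf + 4);
}
```

A caller would fill a buffer of SAMPLE_FILE_SIZE bytes with sample_encode() and hand it to fwrite(), and do the reverse with fread() and sample_decode(). Only unsigned integers are covered here; signed values and floating point need more care, as the post points out.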
bill@proxftl.UUCP (T. William Wells) (08/29/88)
In article <103@simsdevl.UUCP> dandc@simsdevl.UUCP (Dan DeClerck) writes:
: I've run across a need to have data files in various forms of UN*X
: be portable to each other. Mostly, this deals with Intel to Motorola and
: vice-versa. I could write data out to files in ASCII, but this is cumbersome,
: slow and may hamper the products' marketability.
:
: The problem lies in writing integers as well as structures to
: files, and allow those files to be transferred between a
: multitude of machines without a data transformation taking
: place.
:
: A fellow programmer suggested an "XDR" standard from SUN, but
: this seems to only work with inter-process communication. Has
: anyone encountered this problem??
As I understand it, XDR includes a description of the data *as it
is transmitted* as well as a set of conversion routines. For
example, on the man page for our Sun 3 is:
xdr_array() translate arrays to/from external representation
xdr_bool() translate Booleans to/from external representation
xdr_bytes() translate counted byte strings to/from external representation
xdr_double() translate double precision to/from external representation
xdr_enum() translate enumerations to/from external representation
xdr_float() translate floating point to/from external representation
xdr_int() translate integers to/from external representation
xdr_long() translate long integers to/from external representation
xdr_opaque() translate fixed-size opaque data to/from external representation
xdr_short() translate short integers to/from external representation
xdr_string() translate null-terminated strings to/from external representation
xdr_u_int() translate unsigned integers to/from external representation
xdr_u_long() translate unsigned long integers to/from external representation
xdr_u_short() translate unsigned short integers to/from external representation
xdr_union() translate discriminated unions to/from external representation
Presumably, what you want to do is to use these routines to
convert your data and then write the data to a file; then use
them to convert the data back when you read it.
---
Bill
novavax!proxftl!bill
ok@quintus.uucp (Richard A. O'Keefe) (08/30/88)
In article <103@simsdevl.UUCP> dandc@simsdevl.UUCP (Dan DeClerck) writes:
>A fellow programmer suggested an "XDR" standard from SUN,
>but this seems to only work with inter-process communication.

Have you *tried* it? Look at the name of the manual: "External Data Representation Protocol Specification" in "Networking on the Sun Workstation". You can send XDR-encoded data through sockets, but there is no necessary connexion between XDR and IPC. For your application, you want to use xdrstdio_create() to turn the stdio streams for your files into XDR streams, and then just read and write the data with appropriate XDR calls. In fact, you will probably use the same routine for reading and writing, as the direction is encoded in the XDR stream, not in the calls. It's really very easy to use. (Easier than scanf(), anyway.)
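The "same routine for reading and writing" idea can be shown without the Sun library at all. The sketch below is NOT the XDR API: struct codec, codec_init(), codec_long() and codec_point() are hypothetical names made up here, and the stream is just a memory buffer. It only imitates the design in which the stream handle carries the direction, so one routine per data type serves both ways.

```c
#include <assert.h>
#include <stddef.h>

enum codec_op { CODEC_ENCODE, CODEC_DECODE };

/* Hypothetical stream handle: it, not the caller, knows the direction. */
struct codec {
    enum codec_op op;
    unsigned char *buf;
    size_t len;   /* capacity of buf */
    size_t pos;   /* next byte to read or write */
};

void codec_init(struct codec *c, enum codec_op op,
                unsigned char *buf, size_t len)
{
    assert(c != NULL && buf != NULL);
    c->op = op;
    c->buf = buf;
    c->len = len;
    c->pos = 0;
}

/* Encode or decode one value that fits in 32 bits, as 4 bytes,
   most significant first, two's complement. Returns 1 on success,
   0 if the buffer is exhausted. */
int codec_long(struct codec *c, long *v)
{
    unsigned long u;
    unsigned char *p;

    if (c->len - c->pos < 4)
        return 0;
    p = c->buf + c->pos;
    if (c->op == CODEC_ENCODE) {
        u = (unsigned long)*v;
        p[0] = (unsigned char)((u >> 24) & 0xFF);
        p[1] = (unsigned char)((u >> 16) & 0xFF);
        p[2] = (unsigned char)((u >> 8) & 0xFF);
        p[3] = (unsigned char)(u & 0xFF);
    } else {
        u = ((unsigned long)(p[0] & 0xFF) << 24) |
            ((unsigned long)(p[1] & 0xFF) << 16) |
            ((unsigned long)(p[2] & 0xFF) << 8)  |
             (unsigned long)(p[3] & 0xFF);
        /* Sign-extend from 32 bits without relying on native layout. */
        if (u & 0x80000000UL)
            *v = -(long)(0xFFFFFFFFUL - u) - 1;
        else
            *v = (long)u;
    }
    c->pos += 4;
    return 1;
}

/* Hypothetical application type with its single translation routine. */
struct point { long x; long y; };

int codec_point(struct codec *c, struct point *p)
{
    return codec_long(c, &p->x) && codec_long(c, &p->y);
}
```

The writer calls codec_point() on a handle opened with CODEC_ENCODE and the reader calls the very same function on a handle opened with CODEC_DECODE, so the two sides cannot drift apart in field order.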
SHANE@UTDALVM1.BITNET (Shane Davis) (09/08/88)
>I've run across a need to have data files in various forms of UN*X
>be portable to each other. Mostly, this deals with Intel to Motorola and
>vice-versa. I could write data out to files in ASCII, but this is cumbersome,
>slow and may hamper the products' marketability.
>
>The problem lies in writing integers as well as structures to files, and allow
> those files to be transferred between a multitude of machines without a data
> transformation taking place.
>
>A fellow programmer suggested an "XDR" standard from SUN, but this seems to only
> work with inter-process communication. Has anyone encountered this problem??

XDR should do exactly what you need. Here is an example:

    #include <stdio.h>
    #include <rpc/rpc.h>

    #define MAXARRAYLEN 20

    int main(void)
    {
        XDR xdrs;                      /* the handle itself, not an unset pointer */
        static int foo[MAXARRAYLEN];
        char *fooptr = (char *)foo;    /* xdr_array() wants the address of this */
        u_int arraylen = MAXARRAYLEN;
        u_int i;
        FILE *foo_out;

        foo_out = fopen("fooarray", "w");
        if (foo_out == NULL)
            return 1;
        for (i = 0; i < MAXARRAYLEN; i++)
            foo[i] = (int)i;
        xdrstdio_create(&xdrs, foo_out, XDR_ENCODE);
        if (!xdr_array(&xdrs, &fooptr, &arraylen, MAXARRAYLEN,
                       sizeof(int), (xdrproc_t)xdr_int))
            return 1;
        xdr_destroy(&xdrs);            /* flushes the stream, does not close it */
        fclose(foo_out);
        return 0;
    }

This program writes, in standard XDR binary representation, the entire contents of the array 'foo', which can in turn be read by a program on another architecture using XDR_DECODE rather than XDR_ENCODE. The last parameter to the 'xdr_array' call is the name of the XDR "primitive" to be used on each element of the array; as 'foo' is an int array, the function is 'xdr_int'. Other primitives include 'xdr_float', 'xdr_short', etc. XDR functions are also provided for structs.

Actually, I have not tested that program, but don't flame me too bad if it doesn't work...

You can't move data from one architecture to another without *some* sort of data transformation; XDR is much more compact and reasonable than ASCII files, though.

--Shane Davis
Systems Programmer, Univ. of Texas at Dallas Academic Computer Center
SHANE@{UTDALVM1.BITNET|utdalvm1.dal.utexas.edu},rsd@engc1.dal.utexas.edu
scs@athena.mit.edu (Steve Summit) (09/13/88)
In article <103@simsdevl.UUCP> dandc@simsdevl.UUCP (Dan DeClerck) writes:
> I've run across a need to have data files in various forms of UN*X
> be portable to each other.
> I could write data out to files in ASCII, but this is cumbersome,
> slow and may hamper the products' marketability.

Please strongly consider using ASCII after all. The advantages are many; the disadvantages are comparatively minor.

1. ASCII is well-nigh universal; portability is virtually assured. Even if you ever want to go to an EBCDIC machine, conversion utilities are bound to be readily available (and conversion may indeed happen implicitly when transferring a text file to such a machine).

2. It's usually not nearly as inefficient as you'd think. Ironically, even sophisticated computer programmers commonly ignore the fact that computers are just blisteringly fast and can usually complete a seemingly inefficient ASCII parse in far less time than it takes to think about it. (I am aware that there are high-bandwidth, high-performance systems which cannot afford the luxury of an ASCII parse, and are well-advised to use binary transfer methods. I maintain that surprisingly many real applications do not fall into this category, and can use ASCII without paying a performance penalty.)

3. Reading and writing ASCII formats isn't really that cumbersome; in fact I'd argue that binary formats, when properly designed to account for word ordering and other difficulties which ASCII formats easily overcome, are more cumbersome in the long run.

4. Don't overlook debugging. ASCII formats can be inspected with cat, piped through grep and sed and other familiar utilities, patched with ordinary text editors, etc., etc. The first program you write for your binary format is usually not the application you were trying to write, but the disassembler you find you need for debugging; getting the disassembler working is often a prerequisite for getting the end application working.

5. ASCII formats can make good, backwards-compatible version number schemes easy to implement. Data formats inevitably require revision to accommodate new features. Fixed binary formats, especially those that simply write structures out as bytes, are usually not amenable to such changes, unless you did a lot of work to make them extensible (which is another aspect that makes binary formats more, not less, cumbersome than ASCII). Introducing a "version 2" format then requires a host of extra translation utilities, and nasty incompatibility problems when programs try to read files of the wrong format. (These compatibility problems can be successfully worked around, but only if all files contain a version number, which is usually not recognized or implemented until version 1 is in place and version 2 is being contemplated, by which time it's too late.)

Suppose, on the other hand, that your ASCII format consists of arbitrary lines of text, with a keyword at the beginning of each line indicating what kind of data (e.g. what field of a structure) that line contains. If programs ignore unrecognizable lines (a good practice), "version 1" programs can read "version 2" files without modification, if the version 2 keywords are a superset of version 1's. Version 1 filters and editors can even modify version 2 files, without losing version-2-specific information, by saving, and echoing to the output, unrecognized lines without interpretation. (It's true that a binary format employing variable-length records with a type field in a consistent place would also enjoy these advantages. Such records are in fact common in network protocols.)

The only real problem I've ever had with ASCII data interchange formats is that you tend to lose a bit of precision when reading and writing doubles, but you can minimize this by printfing things with %.ne, for n sufficiently large. If the precision inherent in the data is less than that of a double, you're only "losing" something you didn't have in the first place.

I'm not sure how using ASCII data formats could "hamper a products' marketability." If not an efficiency concern, it's probably some attempt to keep information hidden in a cryptic binary format rather than having it in plain text that anyone could read.

Steve Summit
scs@adam.pika.mit.edu
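A keyword-per-line format of the kind described in this post could be sketched as below. Everything specific is an assumption made for illustration: the struct, the keywords "version", "value" and "count", and the function names are invented. The sketch uses %.17g, which is enough digits to round-trip a double on machines with IEEE 754 doubles and a correctly rounding C library; other floating-point formats may need a different digit count.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical record; field and keyword names are invented. */
struct reading {
    int version;
    double value;
    long count;
};

/* Format the record as keyword lines. Returns 1 if it fit in buf. */
int reading_format(const struct reading *r, char *buf, size_t size)
{
    int n;

    assert(r != NULL && buf != NULL);
    n = snprintf(buf, size, "version %d\nvalue %.17g\ncount %ld\n",
                 r->version, r->value, r->count);
    return n >= 0 && (size_t)n < size;
}

/* Interpret one line. Returns 1 if the keyword was recognized,
   0 if the line was ignored (unknown keyword, blank line, etc.). */
int reading_parse_line(struct reading *r, const char *line)
{
    if (sscanf(line, "version %d", &r->version) == 1) return 1;
    if (sscanf(line, "value %lf", &r->value) == 1) return 1;
    if (sscanf(line, "count %ld", &r->count) == 1) return 1;
    return 0;
}

/* Parse a whole text, one line at a time, skipping lines this
   reader does not understand. Returns the number of skipped lines. */
int reading_parse(struct reading *r, const char *text)
{
    char line[128];
    int skipped = 0;

    while (*text != '\0') {
        size_t n = strcspn(text, "\n");
        size_t copy = n < sizeof line - 1 ? n : sizeof line - 1;

        memcpy(line, text, copy);
        line[copy] = '\0';
        if (!reading_parse_line(r, line))
            skipped++;
        text += n;
        if (*text == '\n')
            text++;
    }
    return skipped;
}
```

A "version 1" reader built this way keeps working when a later writer adds lines with new keywords, since those lines are simply counted and skipped. A filter that must preserve them would echo the skipped lines to its output instead of dropping them.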
dhesi@bsu-cs.UUCP (Rahul Dhesi) (09/14/88)
In article <7038@bloom-beacon.MIT.EDU> scs@adam.pika.mit.edu (Steve Summit) writes:
[numerous arguments in favor of using ASCII text for portability]

Counterarguments:

1. ASCII text is likely to be very bulky.

2. All modern hardware architectures can use 8-bit bytes, so ASCII is unnecessary except for older machines. (Think "networks".)

I suggest encoding the data in bytes using a known byte order.
--
Rahul Dhesi UUCP: <backbones>!{iuvax,pur-ee,uunet}!bsu-cs!dhesi
daveb@geac.UUCP (David Collier-Brown) (09/14/88)
In article <103@simsdevl.UUCP> dandc@simsdevl.UUCP (Dan DeClerck) writes:
| I've run across a need to have data files in various forms of UN*X
| be portable to each other.
| I could write data out to files in ASCII, but this is cumbersome,
| slow and may hamper the products' marketability.

From article <7038@bloom-beacon.MIT.EDU>, by scs@athena.mit.edu (Steve Summit):
| Please strongly consider using ASCII after all. The advantages
| are many; the disadvantages are comparatively minor.

In fact, you can get better efficiency in many cases by writing in ASCII: zeros and blanks are represented by single characters instead of whole records...

Once upon a time, I used to beam with pleasure when watching a spreadsheet stored in ASCII load faster than the same thing written in binary by a competitor's product. It didn't happen all the time, but it wasn't all that rare (many spreadsheets in those days were sparse arrays, written to disk in an inefficient manner).

--dave
--
David Collier-Brown.  | yunexus!lethe!dave
78 Hillcrest Ave.,    | He's so smart he's dumb.
Willowdale, Ontario.  | --Joyce C-B
bill@proxftl.UUCP (T. William Wells) (09/15/88)
In article <7038@bloom-beacon.MIT.EDU> scs@adam.pika.mit.edu (Steve Summit) writes:
: I'm not sure how using ASCII data formats could "hamper a
: products' marketability." If not an efficiency concern, it's
: probably some attempt to keep information hidden in a cryptic
: binary format rather than having it in plain text that anyone
: could read.
One major reason for using binary over ASCII is that the binary
is usually going to be more compact. If your data files *have*
to fit in some specified amount of disk space (like one floppy
disk), the difference might be critical.
---
Bill
novavax!proxftl!bill
henry@utzoo.uucp (Henry Spencer) (09/15/88)
In article <3942@bsu-cs.UUCP> dhesi@bsu-cs.UUCP (Rahul Dhesi) writes:
>1. ASCII text is likely to be very bulky.

The question is whether reducing the bulk somewhat (often not as much as you'd think) is worth the complications that result. This is one of those cases where the right thing to do is to implement it the obvious way, then measure it to find out whether you NEED anything better. Often you don't.
--
NASA is into artificial       | Henry Spencer at U of Toronto Zoology
stupidity. - Jerry Pournelle  | uunet!attcan!utzoo!henry henry@zoo.toronto.edu
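One cheap way to do that kind of measuring before committing to a format is to count bytes for representative data. The sketch below makes two assumptions that are not from the thread: the text form is one decimal number per line, and the binary form spends a fixed 4 bytes per value. The function names are invented.

```c
#include <assert.h>
#include <stdio.h>
#include <stddef.h>

/* Bytes needed to store the values as decimal text, one per line. */
size_t text_size(const long *v, size_t n)
{
    char tmp[32];
    size_t total = 0;
    size_t i;

    assert(v != NULL || n == 0);
    for (i = 0; i < n; i++)
        total += (size_t)sprintf(tmp, "%ld\n", v[i]);
    return total;
}

/* Bytes needed at an assumed fixed width of 4 bytes per value. */
size_t binary_size(size_t n)
{
    return 4 * n;
}
```

For the six values 0, 7, 12, 0, 0, 1000000 the text form comes to 19 bytes against 24 for the fixed-width form, while two nine-digit values such as 123456789 and -123456789 take 21 bytes as text against 8 in binary. Which way it goes depends on the data, which is the reason to measure.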
news@amdcad.AMD.COM (Network News) (09/15/88)
In article <1988Sep14.230820.28652@utzoo.uucp> henry@utzoo.uucp (Henry Spencer) writes:
| In article <3942@bsu-cs.UUCP> dhesi@bsu-cs.UUCP (Rahul Dhesi) writes:
| >1. ASCII text is likely to be very bulky.
|
| The question is whether reducing the bulk somewhat (often not as much as
| you'd think) is worth the complications that result. This is one of those
| cases where the right thing to do is to implement it the obvious way, then
| measure it to find out whether you NEED anything better. Often you don't.

This was the conclusion of David Hansen of the University of Arizona, when he proposed a linker that worked with ASCII object files rather than some arcane binary standard. It allowed the object modules to be viewed and edited with standard tools, and was quite compact, because many data constants could be written in one or two ASCII bytes, which would have taken up a full word of memory when stored as binary. In addition, macros allowed small text strings to represent the ASCII encodings for instructions, so they also took up little space. This also allowed the entire assembly phase to be bypassed, since the linker could understand a language that looked reasonably like standard assembly.

-- Tim Olson
Advanced Micro Devices
(tim@crackle.amd.com)
scottg@hpiacla.HP.COM (Scott Gulland) (09/16/88)
/ hpiacla:comp.lang.c / dhesi@bsu-cs.UUCP (Rahul Dhesi) / 8:43 pm Sep 13, 1988 /
>> In article <7038@bloom-beacon.MIT.EDU> scs@adam.pika.mit.edu (Steve Summit)
>>writes:
>>[numerous arguments in favor of using ASCII text for portability]

> Counterarguments:
>
> 1. ASCII text is likely to be very bulky.

True, ASCII text may be very bulky, but isn't this just an efficiency issue? When portability to many machine architectures is truly needed, most people will gladly sacrifice efficiency for ease of portability in their data files.

> 2. All modern hardware architectures can use 8-bit bytes, so ASCII
> is unnecessary except for older machines. (Think "networks".)

I'm sorry, but I don't understand how this statement relates to portability of data between heterogeneous architectures. In any event, this is a simple statement, NOT a counterargument. How about explaining your rationale in a little more detail?

>
> I suggest encoding the data in bytes using a known byte order.

Bad idea! Many architectures use different byte orderings for integers, reals, etc. This also does not seem to address differences between floating point formats, sizes of integers, etc.

You also have to deal with differences in representation of data types in the same language, but on different architectures. You will find that different implementations of a language will store data in subtly different ways.

**************************************************************************
* Scott Gulland             | {ucbvax,hplabs}!hpda!hpiacla!scottg [UUCP] *
* Indus. Appl. Center (IAC) | scottg@hpiacla                      [SMTP] *
* Hewlett-Packard Co.       | (408) 746-5498                      [AT&T] *
* 1266 Kifer Road           | 1-746-5498                     [HP-TELNET] *
* Sunnyvale, CA 94086 USA   | "What If..."                [HP-TELEPATHY] *
**************************************************************************
libes@cme-durer.ARPA (Don Libes) (09/18/88)
There's another possibility besides ASCII and native form: ASN.1 ASN.1 is a set of ISO standards which address the OSI Presentation Layer - very similar to the problem you face. The time/space overhead for ASN.1 data encoding is low. Each datum has a few extra bytes on it for length and type info but is otherwise compactly stored. For example, an ascii string is stored as an ascii string. Ints are stored in binary and allow for arbitrary size. The nice thing is, you can select the way you want the datatype to be encoded. E.g., you can have an int encoded as an integer or string or any of the other appropriate built-in types - and the result is still portable. In other words, you have pretty good control over everything. You can also define your own datatypes. For example, I can (and do) send linked lists between two processes this way. And they can be written in different languages, running on different machines and different operating systems. ASN.1 is really several pieces. I believe there is a complete implementation in ISODE (available via ftp from udel.edu). We actively use ASN.1 here. (We tried ASCII, native forms and mixes of them, but eventually gave up with them.) We have our own implementation that one of our hackers wrote. It's minimal but small (object code is only 3K) and fast. It doesn't do any parsing (8824), just value encoding/decoding (8825), although we write our standards following the ASN.1 syntax anyway. (And yes, you can have it, but it is undocumented.) Don Libes cme-durer.arpa ...!uunet!cme-durer!libes
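To make the "few extra bytes for length and type" concrete, here is a rough sketch of how a small integer looks under the Basic Encoding Rules for ASN.1: a tag byte (0x02 for INTEGER), a length byte, then the value in the fewest two's-complement bytes, most significant first. The function names are invented, only values that fit in 32 bits and the short length form are handled, and this is not the implementation mentioned in the post.

```c
#include <assert.h>
#include <stddef.h>

/* Encode v as a BER INTEGER into out (at most 6 bytes).
   Returns the number of bytes written. */
size_t ber_put_int(unsigned char *out, long v)
{
    unsigned char c[4];
    unsigned long u = (unsigned long)v & 0xFFFFFFFFUL;
    size_t skip = 0;
    size_t n, i;

    assert(out != NULL);
    assert(v >= -2147483647L - 1 && v <= 2147483647L);
    c[0] = (unsigned char)((u >> 24) & 0xFF);
    c[1] = (unsigned char)((u >> 16) & 0xFF);
    c[2] = (unsigned char)((u >> 8) & 0xFF);
    c[3] = (unsigned char)(u & 0xFF);

    /* Drop a leading byte while it only repeats the sign bit of
       the byte that follows it. */
    while (skip < 3 &&
           ((c[skip] == 0x00 && (c[skip + 1] & 0x80) == 0) ||
            (c[skip] == 0xFF && (c[skip + 1] & 0x80) != 0)))
        skip++;

    n = 4 - skip;
    out[0] = 0x02;               /* universal tag: INTEGER */
    out[1] = (unsigned char)n;   /* short-form length */
    for (i = 0; i < n; i++)
        out[2 + i] = c[skip + i];
    return 2 + n;
}

/* Decode a BER INTEGER of 1 to 4 content bytes.
   Returns the bytes consumed, or 0 if the input is not usable. */
size_t ber_get_int(const unsigned char *in, size_t len, long *v)
{
    unsigned long u;
    size_t n, i;

    if (len < 3 || in[0] != 0x02)
        return 0;
    n = in[1];
    if (n < 1 || n > 4 || len < 2 + n)
        return 0;
    u = (in[2] & 0x80) ? 0xFFFFFFFFUL : 0UL;   /* start from the sign fill */
    for (i = 0; i < n; i++)
        u = ((u << 8) | (unsigned long)(in[2 + i] & 0xFF)) & 0xFFFFFFFFUL;
    if (u & 0x80000000UL)
        *v = -(long)(0xFFFFFFFFUL - u) - 1;
    else
        *v = (long)u;
    return 2 + n;
}
```

With this encoding the value 0 occupies three bytes (02 01 00) and 128 occupies four (02 02 00 80), so small numbers cost less than a fixed 4-byte field plus nothing, and large ones cost two bytes more.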
mh@wlbr.EATON.COM (Mike Hoegeman) (09/19/88)
In article <641@muffin.cme-durer.ARPA> libes@cme-durer.arpa (Don Libes) writes:
>There's another possibility besides ASCII and native form: ASN.1
>
>ASN.1 is a set of ISO standards which address the OSI Presentation
>Layer - very similar to the problem you face.
>

You may want to check out the eXternal Data Representation specification as defined by Sun Microsystems. This is used in NFS and the other network services using the RPC (remote procedure call) protocol. The sources, as well as the specs, are available in the sun-spots archives on titan.rice.edu.

The number of machines that NFS runs on is a testament to its portability.

-mike
libes@cme-durer.ARPA (Don Libes) (09/20/88)
In article <23344@wlbr.EATON.COM> mh@wlbr.eaton.com.UUCP (Mike Hoegeman) writes:
>In article <641@muffin.cme-durer.ARPA> libes@cme-durer.arpa (Don Libes) writes:
>>There's another possibility besides ASCII and native form: ASN.1
>
>You may want to check out the eXternal Data Representation specification
>as defined by Sun Microsystems.

Is anyone familiar enough with both ASN.1 and XDR to give a good comparison? I've never seen or heard of one, although I assume the XDR authors must have known about ASN.1 (or X.409 as it used to be called).

I'm aware that ASN.1 was not complete when Sun did RPC. I always wondered if 1) they ever considered switching over at some time in the future, 2) the two are too functionally dissimilar, or 3) RPC is better, faster, whatever, than ASN.1. Since Sun (for example) is moving towards ISO application services, they will have both ASN.1 and RPC (in source, in memory, etc).

Don Libes cme-durer.arpa ...!uunet!cme-durer!libes
jsp@marvin.UUCP (Johnnie Peters) (09/23/88)
In article <4940003@hpiacla.HP.COM>, scottg@hpiacla.HP.COM (Scott Gulland) writes:
! / hpiacla:comp.lang.c / dhesi@bsu-cs.UUCP (Rahul Dhesi) / 8:43 pm Sep 13, 1988 /
!!! In article <7038@bloom-beacon.MIT.EDU> scs@adam.pika.mit.edu (Steve Summit)
!!!writes:
!!![numerous arguments in favor of using ASCII text for portability]
!
!! Counterarguments:
!!
!! 1. ASCII text is likely to be very bulky.
!
! True, ASCII text may be very bulky, but isn't this just an efficiency issue.
! When portability to many machine architectures is truly needed, most people
! will gladly sacrifice efficiency for ease of portability in their data files.
!! 2. All modern hardware architectures can use 8-bit bytes, so ASCII
! I'm sorry, but I don't understand how this statement relates to portability
!! I suggest encoding the data in bytes using a known byte order.
!
! Bad idea ! Many architectures use different byte orderings for integers
! , reals, etc. This also does not seem to address differences between
!

Why not take the road that many databases do? Write 2 utilities, one to export the data in ASCII form and one to import it back in. This would allow data to be taken across machines and still be used in their native format. Also this allows backups that will be more portable.

-- Johnnie --